Dependent continuous queries at multiple resolutions

influxdb
#1

I’m using InfluxDB to store sensor data (each sensor measuring every 30 seconds or so, a few thousand sensors). I often want to plot data for longer timeframes, so I want to use continuous queries to downsample the data. Currently my plan is to store min, max, mean, and count for each sensor at resolutions of 1 hour, 1 day, and 1 week.

Because a day is 24 hours and a week is 7 days, I’d like to do the week aggregation by querying from the day measurement, and the day from the hours, that way I don’t have to query a week’s worth of raw data.

The issue is that if the week CQ runs before the day CQ, it will be missing the last day’s data. So I’d either need to specify the order I want the CQs to run in, or somehow express a dependency between them that could be used to set their ordering. Is there any way to do this in Influx, or some better approach to downsampling to multiple resolutions?

#2

@ssfrr There are RESAMPLE and EVERY paramaters for CQs designed to solve exactly this problem! This will ensure that if there are timing issues they are resolved for the historical data.

#3

Thanks @jackzampolin, I ran across that part of the docs but I’m not 100% sure how that targets this issue (maybe I’m being dense).

Maybe a more concrete example of what I want will help. I want the following 3 continuous queries:

CREATE CONTINUOUS QUERY "cq_1h"
ON "mydb"
BEGIN
  SELECT mean("value") as "mean", max("value") as "max", min("value") as "min", count("value") as "count",
  INTO "sensordata_1h"
  FROM "sensordata"
  GROUP BY time(1h), "sensor_id"
END

CREATE CONTINUOUS QUERY "cq_1d"
ON "mydb"
BEGIN
  SELECT mean("mean") as "mean", max("max") as "max", min("min") as "min", sum("count") as "count",
  INTO "sensordata_1d"
  FROM "sensordata_1h"
  GROUP BY time(1d), "sensor_id"
END

CREATE CONTINUOUS QUERY "cq_1w"
ON "mydb"
BEGIN
  SELECT mean("mean") as "mean", max("max") as "max", min("min") as "min", sum("count") as "count",
  INTO "sensordata_1w"
  FROM "sensordata_1d"
  GROUP BY time(1w), "sensor_id"
END

The issue is that I want them to run in the order “cq_1h”, “cq_1d”, “cq_1w” so that the 1d query isn’t missing the last hour and the 1w query isn’t missing the last day. One way to sort of solve this would be to use RESAMPLE FOR to always do the last 2 days daily and 2 weeks weekly so I can reach back and correct the previous aggregation, but it seems wasteful. Would you mind clarifying your suggestion? It would be great if I could specify a DEPENDS ON "cq_1h" or something to enforce the run order (or I just need to better understand how RESAMPLE applies here).

#4

This is the suggested method to perform this type of downsampling.

#5

Is there some syntax where I can have the CQ run on now()-2d to now()-1d, so that I’m not putting potentially wrong data in the DB? In addition to performance concerns (which might not be a real issue here, I don’t have a good gut feeling for how expensive these are), my bigger concern is that when I query data from the most-recent aggregate (an hour ago, yesterday, last week), there will be data in the DB but it could be wrong. I’d rather just not write the data until I’m writing the correct data.