Backfilling a Continuous Query?

influxdb
#1

We have one measurement that stores fine-grained event data.

From this, we create several continuous queries that roll up the data in specific ways.

Let’s say that later we realize we want to have a new continuous query, but we wish we had created it at the beginning! Is it possible to “backfill” that continuous query so it runs not just on data going forward, but also fills in historical data as well?

Get highest value per month
#2

Yeah you can use the same CQ with time clauses under “where”.

e.g.

CQ
CREATE CONTINUOUS QUERY "1h_event_count"
ON "db_name"
BEGIN
SELECT sum(“count”) as "count"
INTO “2_years”."events"
FROM “6_months”."events"
GROUP BY time(1h)
END;

Backfill the measurement using:

SELECT sum(“count”) as "count"
INTO “2_years”."events"
FROM “6_months”."events"
where time > and time < "End Time"
GROUP BY time(1h)

#3

Just be careful if your data is sparse. The SELECT INTO will iterate over every possible time interval between the specified start and end times, which could be a lot of wasted queries if there is no data for an extended time period.

For example, if your devices only take readings during the day and are quiescent overnight, the SELECT INTO will generate a lot of pointless data searches looking for overnight data.

#4

@sbains Thanks for the tip! If I’m backfilling months of data, is this query going to potentially cause InfluxDB to choke? My understanding about CQs is that they perform well because they are only looking at a small slice of time at once. With a very wide time range in the “where” clause, would this cause a huge mega-query that could cause Influx to crash or slow down for many hours?

#5

Yeah you will need to split the query into smaller time frames which will ensure that it doesn’t cause any performance issues.