Continuous query on the full database

thebuleon29 · April 10, 2019, 9:17am

Hi,
I am a new InfluxDB user and I need some advice on how to correctly use Continuous Queries for downsampling. If I understood the documentation correctly, a Continuous Query to group data by an interval i will be executed on data with timestamps in a range [now()-i, now()]. That means that if I insert data with timestamps older than now()-i, it won’t be concerned by the CQ.

I understand that normally CQ are used to save disk space by copying new data in databases with longer Retention Policies and a lower level of detail. I don’t want to use it that way. I am using downsampled databases to make queries on a large time ranges faster and the main database (the one with the non downsampled data) for queries on smaller time ranges. I will also have to insert data with old timestamps from CSV files in the database I am building. That means that I need the CQ to covers the full database.

I have two ideas :
1- run a SELECT … INTO … FROM … GROUP BY … query after importing a CSV file.
2- make a CQ with the EVERY and FOR keywords like :

CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
RESAMPLE EVERY <interval1> FOR <interval2>
BEGIN
  <cq_query>
END

This should run the query every EVERY intervals on data with timestamps between now()-FOR and now(). So my idea is to make the FOR interval “infinite”.

What do you guys think about these ideas ? Is there a better way to do this ?

rawkode · April 10, 2019, 9:38am

Hopefully this example will help

CREATE CONTINUOUS QUERY "rollup_1h" ON "system_stats"
BEGIN
   SELECT mean(*) INTO forever.cpu FROM twoweeks.cpu GROUP BY time(1h)
END

This rolls up 1s resolution into 1h resolution, rolling up data from twoweeks.cpu into forever.cpu

thebuleon29 · April 10, 2019, 9:53am

Thanks for your answer
But if I quote the documentation (InfluxQL Continuous Queries | InfluxDB OSS 1.7 Documentation):

" CQs operate on realtime data. They use the local server’s timestamp, the GROUP BY time() interval, and InfluxDB’s preset time boundaries to determine when to execute and what time range to cover in the query.
CQs execute at the same interval as the cq_query ’s GROUP BY time() interval, and they run at the start of InfluxDB’s preset time boundaries. If the GROUP BY time() interval is one hour, the CQ executes at the start of every hour.
When the CQ executes, it runs a single query for the time range between now() and now() minus the GROUP BY time() interval. If the GROUP BY time() interval is one hour and the current time is 17:00, the query’s time range is between 16:00 and 16:59.999999999."

Doesn’t this mean that only the data from the last hour will be downsampled into “rollup_1h” ?

rawkode · April 10, 2019, 10:26am

From the docs:

Issue 3: Backfilling results for older data

CQs operate on realtime data, that is, data with timestamps that occur relative to now() . Use a basic INTO query to backfill results for data with older timestamps.

So the following query, as is my understanding, should handle backfilling older data, as it uses INTO

CREATE CONTINUOUS QUERY "rollup_1h" ON "system_stats"
BEGIN
   SELECT mean(*) INTO forever.cpu FROM twoweeks.cpu GROUP BY time(1h)
END

thebuleon29 · April 10, 2019, 10:50am

According to InfluxQL Continuous Queries | InfluxDB OSS 1.7 Documentation, the INTO clause is mandatory in a CQ (“The cq_query requires a function, an INTO, and a GROUP BY time() clause”). So my understanding is that the INTO clause does not change the behavior of the CQ regarding the time range on which it performs the query.

I think that the “basic INTO query” you mentioned refers to a regular query (not a continuous one) with an INTO clause, which corresponds to what I called “solution 1” in my original question.

rawkode · April 10, 2019, 10:54am

Hi @thebuleon29,

Let me check with my colleagues and I’ll get back to you

rawkode · April 10, 2019, 11:27am

Hi @thebuleon29,

My colleagues haven’t gotten back to me yet (timezones ), but I’ve done a lot more reading and I think you’re correct. I’ll submit a PR to the docs to remove any ambiguity in them.

You’d need to execute a one time SELECT INTO to backfill older data

thebuleon29 · April 10, 2019, 11:28am

Ok, thank you very much !

Topic		Replies	Views
Continuous query downsampling issue Store influxdb , influxql	8	1233	October 13, 2020
Downsample and Downsampling a Downsample	2	982	April 5, 2017
Continuous query gives different results than manual query execution InfluxDB 2 influxdb , query	2	666	June 24, 2020
Continuous Query >> count aggregation >> group by time >> not fetching correct count from the data ingested	0	559	December 11, 2018
Downsampling batch-imported-data automatically influxdb	0	929	February 14, 2018

Continuous query on the full database

Related topics