Hello everyone! I had a question about continuous queries and retention policies. Our team can't seem to get ours to work properly. Ideally we'd like a default policy that keeps metrics at 1s resolution for two days; this policy seems to be working well. Then we'd like a policy that keeps metrics at 10s resolution for two weeks, populated by a continuous query reading from the default policy. Finally, we'd have quarterly and long-term policies at 1m and 5m resolution, respectively. However, the non-default policies are not getting populated with any data. We have also noticed some server write errors, which may be related to the failing queries.
These are our retention policies:
```
name       duration   shardGroupDuration replicaN default
----       --------   ------------------ -------- -------
two_days   48h0m0s    24h0m0s            1        true
two_weeks  336h0m0s   24h0m0s            1        false
quarterly  2160h0m0s  24h0m0s            1        false
long_term  26280h0m0s 168h0m0s           1        false
```
These are our continuous queries:
```
name: telegraf
name                query
----                -----
tg_two_weeks_10s_cq CREATE CONTINUOUS QUERY tg_two_weeks_10s_cq ON telegraf BEGIN SELECT mean(*) INTO telegraf.two_weeks.:MEASUREMENT FROM telegraf.two_days./.*/ GROUP BY time(10s), * END
tg_quarterly_1m_cq  CREATE CONTINUOUS QUERY tg_quarterly_1m_cq ON telegraf BEGIN SELECT mean(*) INTO telegraf.quarterly.:MEASUREMENT FROM telegraf.two_days./.*/ GROUP BY time(1m), * END
tg_long_term_5m_cq  CREATE CONTINUOUS QUERY tg_long_term_5m_cq ON telegraf BEGIN SELECT mean(*) INTO telegraf.long_term.:MEASUREMENT FROM telegraf.two_days./.*/ GROUP BY time(5m), * END
```
If anyone has any pointers or places to look, I’d really appreciate it!
Continuous queries as they're currently designed will most likely have a problem trying to aggregate everything across the entire database. If you're looking to aggregate all measurements and fields, you'll be better off using Kapacitor as an aggregator. See the example here: Kapacitor as a Continuous Query engine | InfluxData Documentation Archive
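The pattern from that doc is a scheduled batch task per aggregation. Here's a minimal sketch adapted to your setup (your `telegraf` database and `two_days`/`two_weeks` policies from above; the `cpu` measurement, `usage_idle` field, and `mean_cpu` target name are just illustrative):

```
batch
    // Run the aggregation on a schedule, like a CQ would.
    |query('SELECT mean(usage_idle) AS usage_idle FROM "telegraf"."two_days"."cpu"')
        .period(10s)
        .every(10s)
        .groupBy(*)
    |influxDBOut()
        // Write the downsampled points into the coarser retention policy.
        .database('telegraf')
        .retentionPolicy('two_weeks')
        .measurement('mean_cpu')
        .precision('s')
```

You'd define one such task per downsampling tier and register it with `kapacitor define` / `kapacitor enable`.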
We have open issues against the database to add better support for aggregating everything. See these "Intelligent Rollups" issues:
influxdata/influxdb#6910 (RFC), opened 24 Jun 2016, closed 24 Aug 2016:
There is a need to specify rollup rules for everything, or almost everything, in the database. These should also be tied in with the query language so that queries can use whatever downsample makes sense based on the time range and level of precision required. The largest rollup interval will be 1h.
[Useful information about requirements is in a comment below](/influxdata/influxdb/issues/6910#issuecomment-238904814)
# API
The two sides of the API are the queries to create and show the downsampling rules, and the extra syntax to specify that a query should automatically scale to lower precision data if needed.
## Working with Rollups
For the downsampling rules, you'd specify a matcher, the functions (and optionally the new field names), and the downsampling periods (each tied to a retention policy). Here's an idea for what that might look like.
```
CREATE ROLLUP "lagged_data" ON "mydb"
measurements("field_sensors")
fields(*)
functions(mean, count)
periods(5m, "5m_rollup_retention_policy", 1h, "1h_rollup_retention_policy)
recompute(20m)
CREATE ROLLUP "foo" ON "mydb"
measurements("foo", "bar")
fields(*)
functions(mean, count, max, percentile(90) as perc_90, percentile(99) as perc_99)
periods(5m, "5m_rollup_retention_policy", 1h, "1h_rollup_retention_policy)
CREATE ROLLUP "everything" ON "mydb"
measurements(*)
fields(*)
functions(mean, count, max)
periods(5m, "5m_rollup_retention_policy", 1h, "1h_rollup_retention_policy)
SHOW ROLLUPS
DROP ROLLUP "foo"
BACKFILL "foo" FROM "2015-01-01" TO "2016-06-23" THROTTLE 10s
```
The measurements and fields functions should be able to take `*`, strings, or regexes. The number of arguments in periods will always be a multiple of 2, with each pair consisting of a duration followed by the retention policy that will get written into. The `recompute` function tells the system to keep the current interval around and do a final compute of the rollup after that amount of time has passed.
Backfill will backfill a specific rollup from a start time to an end time. The query should return immediately, and the backfill should run in the background on the server. The optional THROTTLE will pause between backfilling measurements to avoid overloading.
The TO is optional; if not specified, backfill would go all the way up to now. Times should be in RFC3339 format.
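For example, under this proposed syntax, backfilling the "everything" rollup from a fixed start time up to now (omitting TO), pausing between measurements, might look like:

```
BACKFILL "everything" FROM "2016-01-01T00:00:00Z" THROTTLE 10s
```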
## Querying Rollups
And the query syntax for specifying autoscaling to different rollups based on time range:
```
SELECT mean(count) FROM "foo"
WHERE time > now() - 30d
SCALE(mean, mean)
```
Scale would take arguments that specify which field name from the select clause to scale and the rollup value to use if it needs one. The query should either compute using the raw values or compute using whichever rollup is appropriate for that time range. When to scale could get tricky, so we may want to have other arguments to the scale function to specify it more granularly.
The query should return a warning in the query results if the scale logic specified that it should use a rollup, but an appropriate one couldn't be found and it had to compute off the raw results.
If the start time of the query falls before the oldest data kept by the raw values' retention policy, an error should be returned that the query can't properly scale if no rollup exists.
# Implementation
These rollups should be done all the time on data getting written in that is in the current interval. This will have to work for all series in the database so it'll probably have to be kept hot in memory and updated on every write. This will obviously reduce performance since there's additional overhead, but I think it's better than the alternative of trying to query all data every 5m or whatever the rollup period is.
When a write comes in it should check against the rollup rules in order and use the first one that matches the measurement name. The rollups should take all tag values along with them. The naming convention for rollup values should be `<field_name>_<rollup function or AS name>`
So if you had a field named `used_bytes` on measurement `hd` and you were rolling up all fields with `max`, you'd have fields called `used_bytes_max`. Ultimately, the idea is that you'd have each different precision of rollup stored in a separate retention policy. So the measurement name and tags would remain the same.
The field names would be different, but that would be handled automatically for the user by the `SCALE` option. They could always query the lower precision data directly by asking for that field name from the specific retention policy.
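For example, continuing the `used_bytes` example, querying the 5m rollup directly might look like this (retention policy name taken from the CREATE ROLLUP sketch above):

```
SELECT used_bytes_max FROM "mydb"."5m_rollup_retention_policy"."hd" WHERE time > now() - 30d
```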
If the user does a restart of the database, they'll need to do a backfill query if they want that period's rollups to be accurate.
A follow-up feature request (labels: area/queries, kind/feature-request, 1.x), opened 24 Aug 2016:
### Feature Request
The database should support more intelligent rollups and querying of aggregated data. Currently, the only way to roll up data is through manually setting up continuous queries and then manually modifying the select statements to query that data, which requires the user to know which retention policies exist rather than it being discovered automatically.
**Proposal:**
It should be simple for an administrator to set up rollups for an entire database, and users should not need knowledge of the rollups for them to automatically start using them. Using rollups should be automatic and performant.
**Current behavior:**
Rollups require an administrator to create a retention policy and a
continuous query like this:
```
CREATE RETENTION POLICY "5m" ON mydb DURATION 0s REPLICATION 1;
CREATE CONTINUOUS QUERY mycq ON mydb BEGIN
SELECT mean(*) INTO mydb."5m".:MEASUREMENT FROM /.*/ GROUP BY time(5m)
END;
```
It then requires the user to query the mean of a measurement like this:
```
SELECT mean_value FROM mydb."5m".cpu
```
If the server is not running when an interval should be calculated, that
interval will never be run and the user needs to run that query
manually. There is no way to automatically reuse the information in
continuous queries to backfill data either.
Also, if data is written after the last calculation, it will never enter
the aggregation.
It is possible to obtain partial data, but this involves telling the
continuous query to resample more frequently than the default time.
```
CREATE CONTINUOUS QUERY mycq ON mydb RESAMPLE EVERY 1m BEGIN
SELECT mean(*) INTO mydb."5m".:MEASUREMENT FROM /.*/ GROUP BY time(5m)
END;
```
This will obtain partial data every minute, but it will not be an accurate result of all of the data that is available.
**Desired behavior:**
Administrators should have easier commands to create rollups (optional
since the commands above are fairly easy to write).
Users should not need to care about retention policies when trying to get the data they want. Instead of the query above, the user should simply write:
```
SELECT mean(value) FROM cpu GROUP BY time(5m)
```
This should use the rollup automatically if one is available and would
return the same value as querying the raw data.
Along with using the rollup automatically, we would also include syntax to automatically select the appropriate precision interval to be used based on the time range or number of points requested. So if we have raw data that is retained for 1 week, 1 minute aggregated data for 2 weeks, and 5 minute aggregated data for 4 weeks and we said this:
```
SELECT mean(value) FROM cpu WHERE time >= now() - 1w GROUP BY time(auto)
```
This would automatically select the 1 minute precision interval because that is the finest precision whose retention covers the requested time range. If we scaled this query to the past 3 weeks, we would return the 5 minute precision level.
**Use case:**
Downsampling long term data into new retention policies and greater
performance by precalculating certain aggregates for certain intervals.
This was the original use case for continuous queries, but the current continuous queries are too cumbersome for it.
### Documentation
- [Requirements Doc](https://github.com/influxdata/influxdb/pull/7208)
Thank you so much for your speedy response. I really appreciate it. I'm looking into Kapacitor, but I'm having a hard time finding examples of how it would be used in my case, beyond single recordings. Can I use wildcards (`*`) with Kapacitor to downsample all measurements to my various retention policies with only a single TICKscript for each policy? Something like this has not worked so far for my first 10s retention policy:
```
stream
    |from()
        .database('myDatabase')
        .measurement('*')
        .groupBy(*)
    |window()
        .period(10s)
        .every(10s)
        .align()
    |mean('*')
    |influxDBOut()
        .database('myDatabase')
        .retentionPolicy('myTenSecondRP')
        .measurement(*)
        .precision('s')
```
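Or does each measurement and field need its own task? Piecing together the docs, I'm guessing a per-measurement version would look something like the sketch below (`cpu` and `usage_idle` standing in for one of our real measurement/field names), but that would mean a lot of scripts:

```
// Hypothetical per-measurement task; 'cpu' and 'usage_idle' are placeholders.
stream
    |from()
        .database('myDatabase')
        .measurement('cpu')
        .groupBy(*)
    |window()
        .period(10s)
        .every(10s)
        .align()
    |mean('usage_idle')
        // Keep the original field name on the aggregated point.
        .as('usage_idle')
    |influxDBOut()
        .database('myDatabase')
        .retentionPolicy('myTenSecondRP')
        // Write back under the same measurement name.
        .measurement('cpu')
        .precision('s')
```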