Hello everyone! I had a question about continuous queries and retention policies. Our team can't seem to get ours to work properly. Ideally we'd like a default policy that keeps metrics at 1s resolution for two days; this policy seems to be working well. Then we'd like a policy that keeps metrics at 10s resolution for two weeks, populated by a continuous query reading from the default policy. Finally, we'd have quarterly and long-term policies at 1m and 5m resolution, respectively. However, the non-default policies are not getting populated with any data. We have also noticed some server write errors, which may be related to the failing queries.
These are our retention policies:
```
name       duration   shardGroupDuration replicaN default
----       --------   ------------------ -------- -------
two_days   48h0m0s    24h0m0s            1        true
two_weeks  336h0m0s   24h0m0s            1        false
quarterly  2160h0m0s  24h0m0s            1        false
long_term  26280h0m0s 168h0m0s           1        false
```
These are our continuous queries:
```
name: telegraf
name                query
----                -----
tg_two_weeks_10s_cq CREATE CONTINUOUS QUERY tg_two_weeks_10s_cq ON telegraf BEGIN SELECT mean(*) INTO telegraf.two_weeks.:MEASUREMENT FROM telegraf.two_days./.*/ GROUP BY time(10s), * END
tg_quarterly_1m_cq  CREATE CONTINUOUS QUERY tg_quarterly_1m_cq ON telegraf BEGIN SELECT mean(*) INTO telegraf.quarterly.:MEASUREMENT FROM telegraf.two_days./.*/ GROUP BY time(1m), * END
tg_long_term_5m_cq  CREATE CONTINUOUS QUERY tg_long_term_5m_cq ON telegraf BEGIN SELECT mean(*) INTO telegraf.long_term.:MEASUREMENT FROM telegraf.two_days./.*/ GROUP BY time(5m), * END
```
If anyone has any pointers or places to look, I’d really appreciate it!
Continuous queries as they're currently designed will most likely have a problem trying to aggregate everything across the entire database. If you're looking to aggregate all measurements and fields, you'll be better off using Kapacitor as an aggregator. See the example here: Kapacitor as a Continuous Query engine | InfluxData Documentation Archive
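The pattern from that doc is a scheduled batch task per aggregation. Here's a minimal sketch adapted to your setup (your `telegraf` database and `two_days`/`two_weeks` policies from above; the `cpu` measurement, `usage_idle` field, and `mean_cpu` target name are just illustrative):

```
batch
    // Run the aggregation on a schedule, like a CQ would.
    |query('SELECT mean(usage_idle) AS usage_idle FROM "telegraf"."two_days"."cpu"')
        .period(10s)
        .every(10s)
        .groupBy(*)
    |influxDBOut()
        // Write the downsampled points into the coarser retention policy.
        .database('telegraf')
        .retentionPolicy('two_weeks')
        .measurement('mean_cpu')
        .precision('s')
```

You'd define one such task per downsampling tier and register it with `kapacitor define` / `kapacitor enable`.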
We have open issues against the database to add better support for aggregating everything. See these "Intelligent Rollups" issues:
influxdata/influxdb#6910 (RFC), opened 24 Jun 2016, closed 24 Aug 2016:
There is a need to specify rollup rules for everything, or almost everything, in the database. These should also be tied in with the query language so that queries can use whatever downsample makes sense based on the time range and level of precision required. The largest rollup interval will be 1h.
[Useful information about requirements is in a comment below](/influxdata/influxdb/issues/6910#issuecomment-238904814)
# API
The two sides of the API are the queries to create and show the downsampling rules, and the extra syntax to specify that a query should automatically scale to lower precision data if needed.
## Working with Rollups
For the downsampling rules, you'd specify a matcher, the functions (and optionally the new field names), and the downsampling periods (each tied to a retention policy). Here's an idea for what that might look like.
```
CREATE ROLLUP "lagged_data" ON "mydb"
measurements("field_sensors")
fields(*)
functions(mean, count)
periods(5m, "5m_rollup_retention_policy", 1h, "1h_rollup_retention_policy)
recompute(20m)
CREATE ROLLUP "foo" ON "mydb"
measurements("foo", "bar")
fields(*)
functions(mean, count, max, percentile(90) as perc_90, percentile(99) as perc_99)
periods(5m, "5m_rollup_retention_policy", 1h, "1h_rollup_retention_policy)
CREATE ROLLUP "everything" ON "mydb"
measurements(*)
fields(*)
functions(mean, count, max)
periods(5m, "5m_rollup_retention_policy", 1h, "1h_rollup_retention_policy)
SHOW ROLLUPS
DROP ROLLUP "foo"
BACKFILL "foo" FROM "2015-01-01" TO "2016-06-23" THROTTLE 10s
```
The measurements and fields functions should be able to take `*`, strings, or regexes. The number of arguments in periods will always be a multiple of 2, with each pair consisting of a duration followed by the retention policy that will get written into. The `recompute` function tells the system to keep the current interval around and do a final compute of the rollup after that amount of time has passed.
Backfill will backfill a specific rollup from a start time to an end time. The query should return immediately, and the backfill should run in the background on the server. The optional THROTTLE will pause between backfilling measurements to avoid overloading.
The TO is optional; if not specified, backfill would go all the way up to now. Times should be in RFC3339 format.
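For example, under this proposed syntax, backfilling the "everything" rollup from a fixed start time up to now (omitting TO), pausing between measurements, might look like:

```
BACKFILL "everything" FROM "2016-01-01T00:00:00Z" THROTTLE 10s
```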
## Querying Rollups
And the query syntax for specifying autoscaling to different rollups based on time range:
```
SELECT mean(count) FROM "foo"
WHERE time > now() - 30d
SCALE(mean, mean)
```
Scale would take arguments that specify which field name from the select clause to scale and the rollup value to use if it needs one. The query should either compute using the raw values or compute using whichever rollup is appropriate for that time range. When to scale could get tricky, so we may want to have other arguments to the scale function to specify it more granularly.
The query should return a warning in the query results if the scale logic specified that it should use a rollup, but an appropriate one couldn't be found and it had to compute off the raw results.
If the start time of the query falls before the oldest data kept by the raw values' retention policy, an error should be returned that the query can't properly scale if no rollup exists.
# Implementation
These rollups should be done all the time on data getting written in that is in the current interval. This will have to work for all series in the database so it'll probably have to be kept hot in memory and updated on every write. This will obviously reduce performance since there's additional overhead, but I think it's better than the alternative of trying to query all data every 5m or whatever the rollup period is.
When a write comes in it should check against the rollup rules in order and use the first one that matches the measurement name. The rollups should take all tag values along with them. The naming convention for rollup values should be `<field_name>_<rollup function or AS name>`
So if you had a field named `used_bytes` on measurement `hd` and you were rolling up all fields with `max`, you'd have fields called `used_bytes_max`. Ultimately, the idea is that you'd have each different precision of rollup stored in a separate retention policy. So the measurement name and tags would remain the same.
The field names would be different, but that would be handled automatically for the user by the `SCALE` option. They could always query the lower precision data directly by asking for that field name from the specific retention policy.
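For example, continuing the `used_bytes` example, querying the 5m rollup directly might look like this (retention policy name taken from the CREATE ROLLUP sketch above):

```
SELECT used_bytes_max FROM "mydb"."5m_rollup_retention_policy"."hd" WHERE time > now() - 30d
```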
If the user does a restart of the database, they'll need to do a backfill query if they want that period's rollups to be accurate.
A follow-up feature request (labels: area/queries, kind/feature-request, 1.x), opened 24 Aug 2016:
### Feature Request
The database should support more intelligent rollups and querying of aggregated data. Currently, the only way to roll up data is through manually setting up continuous queries and then manually modifying the select statements to query that data, which requires the user to know which retention policies exist rather than it being discovered automatically.
**Proposal:**
It should be simple for an administrator to set up rollups for an entire database, and users should not need knowledge of the rollups for them to automatically start using them. Using rollups should be automatic and performant.
**Current behavior:**
Rollups require an administrator to create a retention policy and a
continuous query like this:
```
CREATE RETENTION POLICY "5m" ON mydb DURATION 0s REPLICATION 1;
CREATE CONTINUOUS QUERY mycq ON mydb BEGIN
SELECT mean(*) INTO mydb."5m".:MEASUREMENT FROM /.*/ GROUP BY time(5m)
END;
```
It then requires the user to query the mean of a measurement like this:
```
SELECT mean_value FROM mydb."5m".cpu
```
If the server is not running when an interval should be calculated, that
interval will never be run and the user needs to run that query
manually. There is no way to automatically reuse the information in
continuous queries to backfill data either.
Also, if data is written after the last calculation, it will never enter
the aggregation.
It is possible to obtain partial data, but this involves telling the
continuous query to resample more frequently than the default time.
```
CREATE CONTINUOUS QUERY mycq ON mydb RESAMPLE EVERY 1m BEGIN
SELECT mean(*) INTO mydb."5m".:MEASUREMENT FROM /.*/ GROUP BY time(5m)
END;
```
This will obtain partial data every minute, but it will not be an accurate result of all of the data that is available.
**Desired behavior:**
Administrators should have easier commands to create rollups (optional
since the commands above are fairly easy to write).
Users should not need to care about retention policies when trying to get the data they want. Instead of the query above, the user should simply write:
```
SELECT mean(value) FROM cpu GROUP BY time(5m)
```
This should use the rollup automatically if one is available and would
return the same value as querying the raw data.
Along with using the rollup automatically, we would also include syntax to automatically select the appropriate precision interval to be used based on the time range or number of points requested. So if we have raw data that is retained for 1 week, 1 minute aggregated data for 2 weeks, and 5 minute aggregated data for 4 weeks and we said this:
```
SELECT mean(value) FROM cpu WHERE time >= now() - 1w GROUP BY time(auto)
```
This would automatically select the 1 minute precision interval because that is the finest precision whose retention covers the requested time range. If we scaled this query to the past 3 weeks, we would return the 5 minute precision level.
**Use case:**
Downsampling long term data into new retention policies and greater
performance by precalculating certain aggregates for certain intervals.
This was the original use case for continuous queries, but the current continuous queries are too cumbersome for it.
### Documentation
- [Requirements Doc](https://github.com/influxdata/influxdb/pull/7208)
Thank you so much for your speedy response. I really appreciate it. I'm looking into Kapacitor, but I'm having a hard time finding examples of how it would be used in my case, beyond single recordings. Can I use wildcards (`*`) with Kapacitor to downsample all measurements to my various retention policies with only a single TICKscript for each policy? Something like this has not worked so far for my first 10s retention policy:
```
stream
    |from()
        .database('myDatabase')
        .measurement('*')
        .groupBy(*)
    |window()
        .period(10s)
        .every(10s)
        .align()
    |mean('*')
    |influxDBOut()
        .database('myDatabase')
        .retentionPolicy('myTenSecondRP')
        .measurement(*)
        .precision('s')
```
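Or does each measurement and field need its own task? Piecing together the docs, I'm guessing a per-measurement version would look something like the sketch below (`cpu` and `usage_idle` standing in for one of our real measurement/field names), but that would mean a lot of scripts:

```
// Hypothetical per-measurement task; 'cpu' and 'usage_idle' are placeholders.
stream
    |from()
        .database('myDatabase')
        .measurement('cpu')
        .groupBy(*)
    |window()
        .period(10s)
        .every(10s)
        .align()
    |mean('usage_idle')
        // Keep the original field name on the aggregated point.
        .as('usage_idle')
    |influxDBOut()
        .database('myDatabase')
        .retentionPolicy('myTenSecondRP')
        // Write back under the same measurement name.
        .measurement('cpu')
        .precision('s')
```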