ryantxu
December 20, 2018, 4:47pm
1
I am working on the standard problem of needing to pick the correct rollup for the selected time interval.
We have lots of data stored every 1ms, and build grafana dashboards to show it. We currently have a very custom grafana datasource that picks a different rollup (retention policy) based on the query interval.
Is there any way to do this directly with flux? Something like:
from( my_func_that_somehow_knows_the_range() )
|> range(start: -1h)
Any ideas? The real goal is:
opened 12:17AM - 24 Aug 16 UTC
area/queries
kind/feature-request
1.x
### Feature Request
The database should support more intelligent rollups and qu… erying of
aggregated data. Currently, the only way to rollup data is through
manually setting up continuous queries and then manually modifying the
select statements to query that data which requires the user to know
which retention policies exist rather than it being discovered
automatically.
**Proposal:**
It should be simple for an administrator to setup rollups for an entire
database and users should not need knowledge of the rollups for them to
automatically start using them. Using rollups should be automatic and
performant.
**Current behavior:**
Rollups require an administrator to create a retention policy and a
continuous query like this:
```
CREATE RETENTION POLICY "5m" ON mydb DURATION 0s REPLICATION 1;
CREATE CONTINUOUS QUERY mycq ON mydb BEGIN
SELECT mean(*) INTO mydb."5m".:MEASUREMENT FROM /.*/ GROUP BY time(5m)
END;
```
It then requires the user to query the mean of a measurement like this:
```
SELECT mean_value FROM mydb."5m".cpu
```
If the server is not running when an interval should be calculated, that
interval will never be run and the user needs to run that query
manually. There is no way to automatically reuse the information in
continuous queries to backfill data either.
Also, if data is written after the last calculation, it will never enter
the aggregation.
It is possible to obtain partial data, but this involves telling the
continuous query to resample more frequently than the default time.
```
CREATE CONTINUOUS QUERY mycq ON mydb RESAMPLE EVERY 1m BEGIN
SELECT mean(*) INTO mydb."5m".:MEASUREMENT FROM /.*/ GROUP BY time(5m)
END;
```
This will obtain partial data every minute, but it will not be an active
result of all of the data that is available.
**Desired behavior:**
Administrators should have easier commands to create rollups (optional
since the commands above are fairly easy to write).
Users should not need to care about retention policies when trying to
get the data they want. The above query the user should write is:
```
SELECT mean(value) FROM cpu GROUP BY time(5m)
```
This should use the rollup automatically if one is available and would
return the same value as querying the raw data.
Along with using the rollup automatically, we would also include syntax to automatically select the appropriate precision interval to be used based on the time range or number of points requested. So if we have raw data that is retained for 1 week, 1 minute aggregated data for 2 weeks, and 5 minute aggregated data for 4 weeks and we said this:
```
SELECT mean(value) FROM cpu WHERE time >= now() - 1w GROUP BY time(auto)
```
This would automatically select the 1 minute precision interval because that is the lowest precision. If we scaled this query for the past 3 weeks, we would return the 5 minute precision level.
**Use case:**
Downsampling long term data into new retention policies and greater
performance by precalculating certain aggregates for certain intervals.
This was the original use case for continuous queries, but the current
continuous queries are too cumbersome for this currently.
### Documentation
- [Requirements Doc](https://github.com/influxdata/influxdb/pull/7208)
but that solves the general case. I wonder if flux would let me put custom logic into a single function