Hi everyone,
I’m trying to set up InfluxDB to collect data history for use with Grafana. In order to reduce storage I want to use CQs.
My current setup looks like this:
DB is “node”
Data is put in the DB whenever it’s available to the default-RP “rp_full”.
I also set up a RP “rp_hourly” which will store data for a longer period. This is supposed to have one entry per hour. On the downsampled data I want to have the min, mean and max of the condensed input data.
So my CQ looks like this:
CREATE CONTINUOUS QUERY cq_1hour ON node BEGIN
SELECT mean(value) AS mean_val, min(value) AS min_val, max(value) AS max_val, stddev(value) AS stddev_val INTO node.rp_hourly.:MEASUREMENT
FROM (SELECT mean(value) AS value FROM node.rp_full./.*/ GROUP BY time(1m) fill(previous)) GROUP BY time(1h), *
END
I need the subquery as my input data is submitted non-regularly whenever the source values change.
The thing is that the resulting data in node.rp_hourly.* does not make any sense.
For example purposes we will be looking at measurement “4/0/0” and the timeframe 8 to 9 am.
The “value” field on the source data for this timespan looks like this:
time value
---- -----
2020-06-11T07:58:44.54209695Z 14.949999809265137
2020-06-11T08:05:27.58344215Z 9.630000114440918
2020-06-11T08:05:37.269541956Z 16.100000381469727
2020-06-11T08:05:42.112164073Z 9.789999961853027
2020-06-11T08:05:45.738821711Z 16.09000015258789
2020-06-11T08:11:22.206091906Z 9.460000038146973
2020-06-11T08:15:41.221205537Z 6.75
2020-06-11T08:15:43.652781104Z 10.710000038146973
2020-06-11T08:17:13.209461204Z 7.389999866485596
2020-06-11T08:18:39.138983555Z 15.300000190734863
2020-06-11T08:23:30.837954423Z 8.039999961853027
2020-06-11T08:23:34.468048259Z 16.389999389648438
2020-06-11T08:23:55.044761502Z 9.699999809265137
2020-06-11T08:25:56.071753243Z 12.25
2020-06-11T08:26:10.5976115Z 8.239999771118164
2020-06-11T08:27:20.797840049Z 16.09000015258789
2020-06-11T08:28:00.73611395Z 11.390000343322754
2020-06-11T08:28:10.429800854Z 14.65999984741211
2020-06-11T08:30:36.873489326Z 9.850000381469727
2020-06-11T08:31:43.447027025Z 6.949999809265137
2020-06-11T08:32:17.332299146Z 9.470000267028809
2020-06-11T08:33:55.368249355Z 6.989999771118164
2020-06-11T08:34:13.523218428Z 9.880000114440918
2020-06-11T08:34:54.679193217Z 6.929999828338623
2020-06-11T08:35:20.086006869Z 9.579999923706055
2020-06-11T08:35:34.61665826Z 6.869999885559082
2020-06-11T08:36:52.083092042Z 10.279999732971191
2020-06-11T08:38:08.335660835Z 6.659999847412109
2020-06-11T08:38:27.696576342Z 9.869999885559082
2020-06-11T08:39:58.467251499Z 12.359999656677246
2020-06-11T08:40:05.738706411Z 9.210000038146973
2020-06-11T08:40:26.302864007Z 13.0600004196167
2020-06-11T08:41:21.981400515Z 9.180000305175781
2020-06-11T08:45:05.897584778Z 6.449999809265137
2020-06-11T08:45:55.51365124Z 11.329999923706055
2020-06-11T08:47:12.973742678Z 6.829999923706055
2020-06-11T08:47:32.346527124Z 11.489999771118164
2020-06-11T08:48:05.022669143Z 8.329999923706055
2020-06-11T08:51:36.835643437Z 15.890000343322754
2020-06-11T08:52:43.401826926Z 10.789999961853027
2020-06-11T08:52:51.867819687Z 15.649999618530273
2020-06-11T08:54:20.228000897Z 9.90999984741211
2020-06-11T08:54:22.64427344Z 15.180000305175781
2020-06-11T08:55:43.750395401Z 11.300000190734863
2020-06-11T08:57:01.20851979Z 8.1899995803833
2020-06-11T08:59:31.292350109Z 10.3100004196167
Running the query from the CQ (looked at InfluxDB’s log to make sure it is exactly the same) as below manually returns the expected data:
SELECT mean(value) AS mean_val, min(value) AS min_val, max(value) AS max_val, stddev(value) AS stddev_val FROM (SELECT mean(value) AS value FROM node.rp_full.“4/0/0” GROUP BY time(1m) fill(previous)) WHERE time >= ‘2020-06-11T08:00:00Z’ AND time < ‘2020-06-11T09:00:00Z’ GROUP BY time(1h),*
name: 4/0/0
time mean_val min_val max_val stddev_val
2020-06-11T08:00:00Z 10.854787903121025 6.949999809265137 16.09000015258789 2.5455139565628055
Now, querying the value from node.rp_hourly.“4/0/0” gives this:
SELECT * FROM rp_hourly.“4/0/0” WHERE time >= ‘2020-06-11T08:00:00Z’ AND time < ‘2020-06-11T09:00:00Z’
name: 4/0/0
time max_val mean_val min_val stddev_val
2020-06-11T08:00:00Z 23.2 11.883555577860937 6.949999809265137 4.2154280061646405
Notice the difference in all columns, even though the log says it executed the same query for the CQ. And obviously those values are simply wrong (as in the longer output above the higest value recorded in that timeframe was indeed 16 and not 23.2 and so on. I even had one of those rp_hourly values be 100 even though the source values rarely exceed 20 at all.
Any idea on what is going wrong here? Or am I misunderstanding the CQ behaviour itself somehow?
Regards,
Chris
PS: Sorry for the display style of the outputs, the “preformatted text” formatting doesn’t seem to do anything so I had to go with simple quotes