As discussed in the #influxdb channel on Slack, here is my complete write-up of the issue I'm having.
Our set-up
My database has one large measurement, log_record, which stores the values of a large number of sensors. There are two retention policies:
- four_weeks: four weeks of “raw” sensor data, at roughly 10 s intervals
- forever: 2-minute averages, kept forever
Every point in these retention policies is tagged with a sensor_id, the identifier of that sensor's metadata record in our MySQL database.
The data in the forever retention policy are averages calculated from the data in the four_weeks retention policy.
Continuous query
First, to calculate the averages stored in the forever retention policy, I used a continuous query like this one:
CREATE CONTINUOUS QUERY cq_aggregate_log_record ON mydb
BEGIN
SELECT mean(value) AS value
INTO mydb.forever.log_record
FROM mydb.four_weeks.log_record
GROUP BY time(2m), *
END
This worked for a while, until I found out that not all sensor data was being averaged correctly. In fact, many sensors received no averages at all, even though their raw data was definitely present in the four_weeks retention policy.
After trying and failing to fix the continuous query, I gave up and switched to a script that runs every 2 minutes and executes this query:
SELECT MEAN(value) AS value
INTO mydb.forever.log_record
FROM mydb.four_weeks.log_record
WHERE time > {{{ ten minutes ago calculated by my script }}}
GROUP BY time(2m), *
This worked like a charm and correctly calculated the averages for all sensors. The continuous query issue was never solved, but at least the script did the job.
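A minimal Python sketch of how such a script might build that query is shown below. It is not the actual script. The names `lower_bound`, `build_downsample_query`, `BUCKET` and `LOOKBACK` are hypothetical. Two details go beyond the query above and are assumptions of the sketch. First, the "ten minutes ago" bound is floored to a 2-minute boundary. Second, it is compared with `>=`. Together these ensure the oldest `GROUP BY time(2m)` bucket is averaged over its full window, rather than over a partial one that INTO would then overwrite.

```python
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=2)     # matches GROUP BY time(2m)
LOOKBACK = timedelta(minutes=10)  # "ten minutes ago"

def lower_bound(now: datetime) -> datetime:
    """now - LOOKBACK, floored to a 2-minute boundary; `now` must be timezone-aware."""
    since = int((now - LOOKBACK).timestamp())
    step = int(BUCKET.total_seconds())
    return datetime.fromtimestamp(since - since % step, tz=timezone.utc)

def build_downsample_query(now: datetime) -> str:
    """Return the INTO query the periodic script would run at time `now`."""
    since = lower_bound(now).strftime("%Y-%m-%dT%H:%M:%SZ")  # RFC3339, UTC
    return (
        "SELECT MEAN(value) AS value "
        "INTO mydb.forever.log_record "
        "FROM mydb.four_weeks.log_record "
        f"WHERE time >= '{since}' "
        "GROUP BY time(2m), *"
    )

print(build_downsample_query(datetime(2017, 1, 1, 12, 1, 30, tzinfo=timezone.utc)))
```

Run at 12:01:30 UTC, the bound becomes `'2017-01-01T11:50:00Z'`. InfluxQL also accepts a relative bound such as `time >= now() - 10m`. The explicit timestamp is used here only because it can be aligned to a bucket boundary.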
Kapacitor
Recently, however, we started using Kapacitor for alerting, and I found out that Kapacitor also works well as a continuous query engine. So I set out to build a Kapacitor task that would aggregate my sensor data, so I could finally get rid of my dirty script.
Batch task
I found this page and adapted it slightly, ending up with this:
batch
    |query('SELECT mean(value) as value from "mydb"."four_weeks"."log_record"')
        .period(10m)
        .every(2m)
        .groupBy(time(2m), *)
        .align()
        .fill('none')
    |influxDBOut()
        .database('mydb')
        .retentionPolicy('forever')
        .measurement('log_record')
        .precision('s')
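The sketch below illustrates how `.period(10m)`, `.every(2m)` and `.align()` interact. It is a minimal Python model. It assumes that with `.align()` each run queries the range [stop - period, stop), where stop is the run time truncated to a multiple of `every`. That is my reading of the Kapacitor docs, not something verified here. `batch_window` and `buckets` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

PERIOD = timedelta(minutes=10)  # .period(10m)
EVERY = timedelta(minutes=2)    # .every(2m)
BUCKET = timedelta(minutes=2)   # .groupBy(time(2m), *)

def batch_window(tick: datetime) -> tuple:
    """Assumed [start, stop) range queried by a run at `tick` when .align() is set."""
    ts = int(tick.timestamp())
    step = int(EVERY.total_seconds())
    stop = datetime.fromtimestamp(ts - ts % step, tz=timezone.utc)
    return stop - PERIOD, stop

def buckets(start: datetime, stop: datetime) -> list:
    """Start times of the 2-minute buckets that fall inside [start, stop)."""
    out, t = [], start
    while t < stop:
        out.append(t)
        t += BUCKET
    return out

start, stop = batch_window(datetime(2017, 1, 1, 12, 3, 17, tzinfo=timezone.utc))
print(start, stop, len(buckets(start, stop)))
```

Under this assumption, a run at 12:03:17 covers 11:52 to 12:02, which is five buckets. Each bucket therefore appears in five consecutive runs and is written to forever five times. Because points with the same series and timestamp overwrite each other, the last write wins.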
This worked, until I found out that exactly the same issue was occurring as with the continuous query I had tried before: a lot of my sensor data was simply not being averaged.
So I tried the stream example instead, but it behaved exactly the same: not all sensor data was being averaged.
While testing the stream task by recording a query and replaying that data through the task, I found that InfluxDB itself was correctly returning averages for all my sensors. This is a gist of the response I got from InfluxDB when running the query with curl.
Note that all data up to line 2414 is correctly written back to InfluxDB by Kapacitor via the InfluxDBOut node. None of the data after line 2414 is written back.
Nothing seems to be wrong with the data on line 2141 though…
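One way to narrow down which series are dropped is to compare two sets of sensor_id tags: those in that curl response, and those that actually end up in forever (for example, from a second curl of a query on `mydb.forever.log_record` with `GROUP BY *`). Below is a minimal Python sketch. It assumes the InfluxDB 1.x JSON response layout (`results` → `series` → `tags`). `sensor_ids` and `missing_sensors` are hypothetical helpers.

```python
import json

def sensor_ids(response_body: str) -> set:
    """Collect the sensor_id tag of every series in an InfluxDB 1.x /query JSON body."""
    ids = set()
    for result in json.loads(response_body).get("results", []):
        for series in result.get("series", []):
            sensor_id = series.get("tags", {}).get("sensor_id")
            if sensor_id is not None:
                ids.add(sensor_id)
    return ids

def missing_sensors(raw_body: str, written_body: str) -> set:
    """sensor_ids present in the raw averages but absent from what landed in forever."""
    return sensor_ids(raw_body) - sensor_ids(written_body)
```

Sorting the result of `missing_sensors` would show whether the dropped series share something in common, such as their sensor_id values or their position in the response.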
I am completely at a loss as to where this issue comes from. @nathaniel, do you have any idea why this is happening?