Hello,
Moving the questions from here.
I’m trying to understand how an InfluxDB CQ behaves compared with a Kapacitor batch task and a Kapacitor stream task.
To that end, I built what I intend to be equivalent configurations for the three cases. Each one calculates the mean of a field called responseTime, which contains the response time of a service request, grouped into 1-minute intervals:
CREATE CONTINUOUS QUERY meanResponseTime_1m_cq ON events
RESAMPLE EVERY 1m
BEGIN
    SELECT mean(responseTime) INTO meanResponseTime_1m_cq FROM test123_11 GROUP BY time(1m)
END
batch
    |query('
        SELECT responseTime
        FROM "events"."retention_policy".test123_11
    ')
        .every(1m)
        .period(1m)
        .align()
    |mean('responseTime')
    |influxDBOut()
        .database('events')
        .measurement('meanResponseTime_1m_batch')
        .flushInterval(1s)
batch
    |query('
        SELECT mean(responseTime)
        FROM "events"."retention_policy".test123_11
    ')
        .every(1m)
        .period(1m)
        .align()
    |influxDBOut()
        .database('events')
        .measurement('meanResponseTime_1m_batch_mean_in_query')
        .flushInterval(1s)
stream
    |from()
        .database('events')
        .retentionPolicy('retention_policy')
        .measurement('test123_11')
    |window()
        .period(1m)
        .every(1m)
        .align()
    |mean('responseTime')
    |influxDBOut()
        .database('events')
        .measurement('meanResponseTime_1m_stream')
        .flushInterval(1s)
The first question is: are these configurations equivalent?
Visualizing the data in Grafana (with the following queries) shows that the results are quite different:
SELECT mean("responseTime") FROM "test123_11" WHERE $timeFilter GROUP BY time(1m)
SELECT mean FROM "meanResponseTime_1m_cq" WHERE $timeFilter
SELECT mean FROM "meanResponseTime_1m_stream" WHERE $timeFilter
SELECT mean FROM "meanResponseTime_1m_batch" WHERE $timeFilter
SELECT mean FROM "meanResponseTime_1m_batch_mean_in_query" WHERE $timeFilter
- Why do they behave differently?
- The CQ result is exactly the same as querying the original data directly, but with a delay of up to one period, in this case 1 minute. So if it is now 13:02:XX, the last period that can be queried is 13:01:00 to 13:02:00. Why is that? Why doesn’t the CQ wait until 2 periods have passed before querying (which is what the batch task that calculates the mean in the query does)?
- Why does a Kapacitor batch task behave differently depending on whether the mean is calculated in the batch query or in the task as a chained node? (Calculating the mean in the query node looks more accurate/correct.)
- Is the behaviour of the stream task comparable to that of the CQ or to that of the batch task?
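To make the timing in the CQ question concrete, here is a minimal Python sketch of what I mean by the last period that can be queried. It is plain arithmetic, not InfluxDB or Kapacitor code; the function name last_complete_interval and the example date are made up for illustration, and it only reflects my understanding of how the interval boundaries are derived.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def last_complete_interval(now, period):
    """Return (start, end) of the most recent fully elapsed, period-aligned interval."""
    # Floor "now" to the nearest period boundary at or before it.
    end = EPOCH + ((now - EPOCH) // period) * period
    return end - period, end

# Hypothetical "now" of 13:02:30 (the date is arbitrary).
start, end = last_complete_interval(datetime(2024, 1, 1, 13, 2, 30), timedelta(minutes=1))
print(start.time(), end.time())  # 13:01:00 13:02:00
```

With a 1-minute period, any "now" between 13:02:00 and 13:02:59 gives the same interval, which matches the delay I see in the CQ output.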
Additionally, similarly to the last example, I’m trying to understand how windows behave. I would like to calculate the mean response time every 2 seconds over the last 5 minutes. That is, at time point 00:00 (mm:ss), the mean response time of the events between -05:00 and 00:00; at 00:02, the mean of the events between -04:58 and 00:02; at 00:04, the mean of the events between -04:56 and 00:04; and so on:
CREATE CONTINUOUS QUERY meanResponseTime_window2s5m_cq ON events
RESAMPLE EVERY 2s
BEGIN
    SELECT mean(responseTime) INTO meanResponseTime_window2s5m_cq FROM test123_11 GROUP BY time(5m)
END
stream
    |from()
        .database('events')
        .retentionPolicy('retention_policy')
        .measurement('test123_11')
    |window()
        .period(5m)
        .every(2s)
        .align()
    |mean('responseTime')
    |influxDBOut()
        .database('events')
        .measurement('meanResponseTime_window2s5m_stream')
        .flushInterval(1s)
batch
    |query('
        SELECT responseTime
        FROM "events"."retention_policy".test123_11
    ')
        .every(2s)
        .period(5m)
        .align()
    |mean('responseTime')
    |influxDBOut()
        .database('events')
        .measurement('meanResponseTime_window2s5m_batch')
        .flushInterval(1s)
batch
    |query('
        SELECT mean(responseTime)
        FROM "events"."retention_policy".test123_11
    ')
        .every(2s)
        .period(5m)
        .align()
    |influxDBOut()
        .database('events')
        .measurement('meanResponseTime_window2s5m_batch_mean_in_query')
        .flushInterval(1s)
Visualizing the data in Grafana (with the following queries) again shows that the results are quite different:
SELECT mean FROM "meanResponseTime_window2s5m_cq" WHERE $timeFilter
SELECT mean FROM "meanResponseTime_window2s5m_stream" WHERE $timeFilter
SELECT mean FROM "meanResponseTime_window2s5m_batch" WHERE $timeFilter
SELECT mean FROM "meanResponseTime_window2s5m_batch_mean_in_query" WHERE $timeFilter
[Graph: window with every 2s and period 5m]
According to the documentation, the CQ executes at the boundary of the last complete interval, not every 2s over the last 5m as I want. So I cannot use it to obtain the desired behaviour.
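To show the difference I mean, here is a minimal Python sketch that contrasts the sliding window I want with a bucket aligned to multiples of 5 minutes, which is how I understand GROUP BY time(5m) to group points. This is only arithmetic on seconds, not InfluxDB or Kapacitor code; the function names sliding_window and aligned_bucket are made up for illustration, and the aligned-bucket behaviour is my assumption from reading the documentation.

```python
PERIOD = 5 * 60  # window length in seconds (5m)
EVERY = 2        # emit interval in seconds (2s)

def sliding_window(t, period=PERIOD):
    """Window I want at emit time t: the `period` seconds immediately before t."""
    return (t - period, t)

def aligned_bucket(t, period=PERIOD):
    """Bucket containing t when time is cut at multiples of `period` (assumed GROUP BY time behaviour)."""
    start = t - (t % period)
    return (start, start + period)

for t in range(0, 3 * EVERY, EVERY):  # t = 0, 2, 4 seconds
    print(t, sliding_window(t), aligned_bucket(t))
# 0 (-300, 0) (0, 300)
# 2 (-298, 2) (0, 300)
# 4 (-296, 4) (0, 300)
```

The sliding window moves forward by 2 seconds on every step (-300 s is -05:00, -298 s is -04:58), while the aligned bucket stays fixed until the next 5-minute boundary.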
The Kapacitor batch task with the mean calculated in the query seems to wait until the period is complete; that is, the last point it writes is at now() - 5 minutes. Why?
As for the Kapacitor stream task and the others: why are they so different? Is something wrong in the configuration? If the behaviour is correct, how do they really work?
Sorry if these questions are obvious, but after some time trying to understand this I haven’t found the answer.
Thanks in advance.