Summing multiple series

vgrivel · February 26, 2018, 2:54pm

Hi,

I’m using influxdb 1.2.2 to store metrics from Kafka. Each brokers send their metrics every 15 seconds to influx via jmxTrans.
One of the metrics I’m interested in is the number of msg/s for one topic. To get the throughput of that kafka topic for the cluster I need to sum the msg/seconds on each broker for a time period.

For instance, I have 3 servers and that ingest 1/3rd of the data of the topic example.
Server 1: 10 msg/sec
Server 2: 10 msg/sec
Server 3: 10 msg/sec

Total, I should have 30 msg/sec

My problem is that for some reasons, when I query the data I can have:

SELECT sum(oneMinRate) AS sum FROM myRP.kafka_server_topic WHERE cluster = ‘myKafkaCluster’ AND typeName = ‘type=BrokerTopicMetrics,name=MessagesInPerSec,topic=example’ AND time >= ‘2017-12-19T13:07:30Z’ AND time < ‘2017-12-19T13:09:00Z’ GROUP BY time(15s, 0s), cluster

name: kafka_server_topic
tags: cluster=myKafkaCluster

time sum

1513688850000000000 30
1513688865000000000 30
1513688880000000000 30
1513688895000000000 20
1513688910000000000 40
1513688925000000000 30

At the time: 1513688895000000000 one server didn’t store any value and at the next window, I have 2 values.

Steps taken:
JMXTrans try to spread the load and start to collect data at random(15seconds) and then every 15 seconds. This means one server can report the data at seconds 01, 16, 31 and 46. Then another one can report at 14, 29, 44 and 59.
As you can understand, between the trigger and the actual collection time, it can happen 1 seconds and instead of collecting at second 59, it will be at 0.
My mitigation was to narrow down the collection windows. It is now between second 5 and 10.
But this does not work, because influx does not always do the window at 0, 15, 30 and 45, but sometimes at 7, 22, 37 and 52.

As you can see on the screenshot, if I groupBy(15s) I have all the bumps, but if I groupBy(15s,5s) the charts is right.

My question: Is there a way to force influx to create the time window at 0, 15, 30, 45? The groupby(15s,5s) does not work on each situation, sometimes it is groupby(15s,Ns) where N is between 1 and 14.

Thanks for the help.

Igor · May 2, 2018, 9:12pm

The problem is that multiple points in the downsampling interval get summed up by the sum() aggregator, intended for aggregating only by cluster.

The way InfluxQL works is that a single query only supports a single aggregator, but you can use nested queries!
So the inner query will group by time and * (all tags) and average all points in the interval,
and the outer query will group by time (same interval) and cluster tag and sum up all the averaged points per cluster.

Here is example query with downsampling:

SELECT sum("mean")
FROM
(
	SELECT mean("oneMinRate")
	FROM myRP.kafka_server_topic
	WHERE ...
	GROUP BY time(15s, 0s), *
)
GROUP time(15s, 0s), cluster

Although I would recommend to use a larger interval, just in case if there are missing values per node.

Actually I found this issue because Grafana query editor for InfluxDB doesn’t support downsampling aggregator, so I had to manually create nested InfluxQL query by toggling edit mode.
I am transitioning from OpenTSDB and the downsampling aggregator has always been available there as the first class citizen, so I was surprised that Grafana still doesn’t support that for InfluxDB, especially as they are partners. My guess is that it is because InfluxDB is still young and subqueries were only introduced in version 1.2:

vgrivel · May 11, 2018, 1:53pm

Hi Igor,

Thanks for your response. I think downsampling with a subquery should solve my problem.
I do not have much time those day to try. I will come back to you to confirm the solution.

Vincent

Topic		Replies	Views
Query too slow when group by small time interval InfluxDB 2 influxdb , time	1	983	January 25, 2021
Double nested query yields unexpected results Dashboards influxdb , influxql	0	529	November 23, 2018
Graphing the sum of several counters Dashboards	5	6608	March 16, 2017
Graphing the sum of packet sizes per minute as mbits/sec Dashboards	2	812	May 11, 2020
Aggregation of data arriving ”late” in kapacitor kapacitor	5	957	October 16, 2020

Summing multiple series

Related topics