Aggregation of data arriving ”late” in kapacitor

oplehto · January 31, 2019, 7:49am

We have a data stream with sends points every 1s per host and would like to aggregate this as mean rate over 10s interval for groups of hosts, differentiated by a tag (“bu”), and written to InfluxDB.

The challenge is that a handful of points may land in Kapacitor so late that they miss the initial 10s window where data is aggregated, creating a new window and ending up overwriting the previous point in InfluxDB.

I’ve tried various methods focused mainly around having another window to buffer the results from the first one but have failed to find a workable solution. Any help is greatly appreciated.

One constraint is that the amount of incoming data is significant (total amount of points is around 50k / sec) so the solution should be fairly resource-efficient.

Below is the best I’ve come up with thus far where the late points are dropped completely but is not really useful as we need to have all the points. Another solution would be to make the window larger but we would lose the 10s resolution which is also important.

var data = stream
|from()
  .measurement('m')
  .groupBy('bu','host')
  .where(lambda: isPresent("val"))
  .round(1s)
|derivative('val')
    .unit(1s)
    .nonNegative()
    .as('val')
|window()
  .period(10s)
  .every(10s)
  .align()
|where(lambda: unixNano(now()) - unixNano("time") < 10000000000)
|mean('val')
|GroupBy('bu')
|sum('mean')
|InfluxDBOut()
  .measurement('faststats')
  .database('tg_udp')
  .retentionPolicy('tg_udp')

Ashish_Sikarwar · July 14, 2020, 6:56pm

have you ever got a solution/workaround for this?

oplehto · July 15, 2020, 5:28am

Unfortunately not. The workaround of dropping old points has been working well enough in conjunction of reducing the tail latency of our metrics pipeline.

A more elegant workaround could be to do a relatively simple UDF that sets up a “fixed accumulating window” so all the data for a given fixed time period gets accumulated into a window which gets passed on only after it reaches a certain age.

Any data that arrives really late for that time period and misses the window is either dropped or written to a new window which has its timestamp is slightly jittered so that they will not overwrite the initial window. I think both approaches for managing old data can be useful depending a bit on how one wants to use it.

Ashish_Sikarwar · October 12, 2020, 5:57pm

Thanks a lot @oplehto

Anaisdg · October 16, 2020, 5:41pm

Hello @Ashish_Sikarwar,
You might be interested in the execd processor plugin
telegraf/plugins/processors/execd at master · influxdata/telegraf · GitHub.

Ashish_Sikarwar · October 16, 2020, 6:08pm

Thanks a lot @Anaisdg will try it.

Topic		Replies	Views
groupBy time not working with stream influxdb , kapacitor	5	698	December 15, 2020
Combine after groupBy Kapacitor kapacitor	0	568	March 10, 2020
Kapacitor - Missing from point cannot aggregate influxdb , kapacitor	0	636	April 15, 2020
Processing irregular (in time) sensor data Kapacitor kapacitor , iot , datalifecycle , time , influxql	0	1146	October 14, 2017
How to aggregate data on duration Telegraf kapacitor	2	707	October 30, 2018

Aggregation of data arriving ”late” in kapacitor

Related topics