Hello. I manage an InfluxDB instance which is currently handling about 3 MB per second (roughly 25,000 points per second). I have been experimenting with implementing filters and re-formatting for some of the incoming metrics by inserting Kapacitor between our hundreds of Telegraf instances and our InfluxDB instance.
Currently we have Telegraf posting metrics to InfluxDB, and Kapacitor subscribing over UDP to get some metrics for alerts.
But I was experimenting with having the hundreds of Telegraf instances hit the /write endpoint on Kapacitor first, then defining a task which can filter / modify the data before sending it on to InfluxDB.
I started out with the following task because I needed to be able to filter on the measurement name:
stream
    |from()
        .database('aws')
    @createTagFromMeasurementName()
    |delete()
        .tag('measurementName')
    |influxDBOut()
        .database('aws')
        .retentionPolicy('autogen')
The source code for the UDF is here: https://gist.github.com/forestjohnsonpeoplenet/a5fb6fd2916b696e167e753c37fd9f10
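For anyone who doesn't want to open the gist: as the UDF name and the later delete of the measurementName tag suggest, the idea is to copy each point's measurement name into a tag so that other nodes can work with it. Below is a simplified Python sketch of that idea only. It is not the actual UDF code, and the dict-based point shape and the function name add_measurement_tag are assumptions made up for illustration.

```python
def add_measurement_tag(point, tag_key="measurementName"):
    """Return a copy of the point with its measurement name copied into a tag.

    `point` is a hypothetical dict with "name", "tags" and "fields" keys;
    this shape is an illustration, not Kapacitor's real UDF data model.
    """
    tags = dict(point.get("tags", {}))  # copy so the input point is not mutated
    tags[tag_key] = point["name"]       # expose the measurement name as a tag
    return {**point, "tags": tags}


example = {"name": "cpu", "tags": {"host": "web-1"}, "fields": {"usage_idle": 97.5}}
tagged = add_measurement_tag(example)
print(tagged["tags"])
```

The per-point work is tiny, which is part of why the memory growth described below was surprising to me.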
It worked, but Kapacitor started allocating about 100MB of RAM per second and not releasing it, eventually resulting in a crash. This was the version 1.2.1 Docker image of Kapacitor with the following patch applied to kapacitord: https://github.com/influxdata/kapacitor/compare/master...forestjohnsonpeoplenet:patch-inputvalidation?expand=1
Also, while it was allocating all that RAM, the HTTP API would respond with 404 for an unknown URL, but any real methods would hang forever, such as kapacitor list tasks or kapacitor show xyz.
During this time I looked at top, and the kapacitord process was the one consuming the memory, not the UDF process.
I tried removing the measurementName UDF from the script, yielding:
stream
    |from()
        .database('aws')
    |influxDBOut()
        .database('aws')
        .retentionPolicy('autogen')
That script seemed to run just fine, allocating about 10 MB of RAM per second and topping out at about 400MB. The HTTP API also remained perfectly available while running this task.
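To put the numbers in this post side by side, here is a rough back-of-the-envelope calculation in Python. It assumes decimal megabytes and that the full 3 MB per second stream was going through Kapacitor during both tests; the figures themselves are the approximate ones quoted above, not precise measurements.

```python
# Approximate figures quoted in this post (decimal units assumed).
ingest_bytes_per_sec = 3_000_000     # ~3 MB/s of incoming metrics
points_per_sec = 25_000              # ~25,000 points/s
alloc_with_udf = 100_000_000         # ~100 MB/s allocated with the UDF task
alloc_without_udf = 10_000_000       # ~10 MB/s allocated with the plain task
plateau_without_udf = 400_000_000    # ~400 MB ceiling with the plain task

# Average size of one incoming point.
bytes_per_point = ingest_bytes_per_sec / points_per_sec

# How much memory was allocated per byte of incoming data.
ratio_with_udf = alloc_with_udf / ingest_bytes_per_sec
ratio_without_udf = alloc_without_udf / ingest_bytes_per_sec

# Extra allocation per point when the UDF is in the pipeline.
extra_bytes_per_point = (alloc_with_udf - alloc_without_udf) / points_per_sec

# Time for the plain task to reach its plateau at the observed rate.
seconds_to_plateau = plateau_without_udf / alloc_without_udf

print(f"average point size:        {bytes_per_point:.0f} bytes")
print(f"allocation ratio with UDF: {ratio_with_udf:.1f}x")
print(f"allocation ratio without:  {ratio_without_udf:.1f}x")
print(f"extra bytes per point:     {extra_bytes_per_point:.0f}")
print(f"seconds to plateau:        {seconds_to_plateau:.0f}")
```

So with the UDF in the pipeline, Kapacitor was holding on to a few kilobytes per point for points that average roughly 120 bytes each on the wire.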
I don’t know if this matters, but we are running Kapacitor in Rancher as a Docker container along with a couple of configuration containers. The same host is running HAProxy and InfluxDB. It's a 16 core host with 64GB of RAM (m4.4xlarge in AWS).