Efficient way to process lots of data with Kapacitor

Hello,

I am working on processing sensor values that I store in InfluxDB with Kapacitor. I have a UI that lets users define transform functions for sensor values. The approach I implemented is:

  • First write all sensor data to InfluxDB
  • Then, using TICKscripts, read that data back, apply the transforms (I am using the stream node), and write the results into another database (a minimal sketch of one such script is below).
    I am not sure whether the data in the database can be replaced in place; I couldn’t find a way to do this, so I am writing the processed data to another database with the same measurement and field names.
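For reference, here is a minimal sketch of what one of my automatically generated TICKscripts looks like. The database, measurement, and field names, and the transform itself, are placeholders for whatever a user defines in the UI:

```
stream
    |from()
        .database('sensors')
        .measurement('temperature')
    // Apply the user-defined transform to the raw field value
    // (this placeholder converts Celsius to Fahrenheit).
    |eval(lambda: "value" * 1.8 + 32.0)
        .as('value')
    // Write the result into a second database, keeping the same
    // measurement and field names.
    |influxDBOut()
        .database('sensors_processed')
        .measurement('temperature')
```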

I don’t like this approach since:

  1. I am storing data in 2 different databases. Even though one stores processed data, some sensor values may not need any pre-processing (the transform functions are defined by users through the UI I implemented). In that case, my automatically created TICKscript writes the same data, unchanged, to the other database. This is a replication of data and wastes storage unnecessarily.

  2. If I have lots of sensors, let’s say 1,000, then I will have at least 1,000 TICKscripts. I am concerned that this will put too much load on the CPU (since I am using the stream node to transform data points).

I couldn’t think of a more efficient way. Maybe someone has encountered these problems as well and can help me, or can share ideas for a better approach.

Thanks in advance,
Aysenur

Hello @aysenur,
Welcome!
Yeah, I wouldn’t recommend using Kapacitor for this type of transformation work.
Are you against using a client to handle the transformation and write to a new bucket? Or would you rather simply perform the transformation and visualize the results for the user?
Why does the transformed data need to be written back into InfluxDB?
Are users performing different transformations on the same datasets?
Finally, have you considered using InfluxDB 2.0, Flux, and Tasks?

Thank you for your response! @Anaisdg

I want to let users play with the processed data so they can create their own dashboards, and maybe use this new data for machine learning models (like anomaly detection) that are provided in my app. This is why I need it in the database.

In my implementation (I used Telegraf, InfluxDB, and Kapacitor, and changed some of the UI in Chronograf), every field value can be assigned a transform function. And since I’m almost done with my project, I’m not willing to move to 2.0.

The main problem seems to be Kapacitor, since it uses a lot of CPU. Maybe I’m not experienced enough with it, and maybe, as you suggested, I should look into a client to handle the transformations. At one point I even ended up filling my whole disk with Kapacitor logs :smile:

Hello @aysenur,
Yes, I’d recommend trying out a client. Please let me know how you’re getting along. This seems like a really cool project.
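For example, a minimal sketch with the 1.x Python client might look like the following. The database, measurement, and field names and the transform are placeholders; you’d substitute whatever your users define in the UI:

```python
from influxdb import InfluxDBClient  # the 1.x client: pip install influxdb

client = InfluxDBClient(host='localhost', port=8086)

# Read the raw points (database and measurement names are placeholders).
result = client.query('SELECT "value" FROM "temperature"', database='sensors')

# Apply the user-defined transform to each point
# (this placeholder converts Celsius to Fahrenheit).
processed = [
    {
        'measurement': 'temperature',
        'time': point['time'],
        'fields': {'value': point['value'] * 1.8 + 32.0},
    }
    for point in result.get_points()
]

# Write the processed points into the second database.
client.write_points(processed, database='sensors_processed')
```

You could run something like this on a schedule (cron or a small service), and only for the series that actually have a transform defined, so you’re not duplicating untransformed data or keeping a thousand stream tasks resident in Kapacitor.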

I will! Thank you for your help! @Anaisdg