I have a setup with InfluxDB + Grafana, to which I write various data through a Python script. I want to optimize how this script writes that data.
In my script I convert my data to line protocol. I limit and divide the data so that the written data is grouped by timestamp and by the frequency of the data, and I make sure to limit the number of fields in each line.
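For reference, a minimal sketch of the kind of conversion described above. It is not my actual script: the `(timestamp, field_name, value)` sample layout, the `to_lines` name, and the `max_fields` default are assumptions for illustration, and it assumes measurement and field names need no escaping and that all values are numeric.

```python
from collections import defaultdict
from itertools import islice


def _chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def to_lines(measurement, samples, max_fields=40):
    """Turn (timestamp, field_name, value) samples into line protocol strings.

    Samples that share a timestamp are merged into one line. A timestamp with
    more than `max_fields` fields is split across several lines.
    """
    by_ts = defaultdict(dict)
    for ts, name, value in samples:
        by_ts[ts][name] = value

    lines = []
    for ts in sorted(by_ts):  # sorted timestamps keep the output deterministic
        for group in _chunks(sorted(by_ts[ts].items()), max_fields):
            fields = ",".join(f"{key}={float(val)}" for key, val in group)
            lines.append(f"{measurement} {fields} {ts}")
    return lines
```

Lines that share a measurement and timestamp but carry different field keys end up as one point in InfluxDB, so splitting a wide timestamp across several lines should not lose data.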
Usually I write about 1 GB per hour; sometimes it arrives all at once, and sometimes periodically over that hour. I realize that isn’t much. The thing is that I would like to increase that amount in some instances - for example if I need to reprocess the data, or add an extra-large amount of data from a different source. In such cases I’ve encountered some performance issues when trying to write too much.
I’m still trying to optimize the data injection to make it quicker and more resource efficient.
I was wondering about the settings for the writeOptions:
For example I’ve tried playing with the batch_size:
I tried a dynamically set batch_size, as I thought it would be better to size it according to the amount of data that needs to be written. So currently the batch size is a third of the number of data points that need to be written.
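To make the "third of the data" rule concrete, here is a rough sketch, assuming the `influxdb-client` Python package. The helper names `pick_batch_size` and `make_write_api`, the cap of 5000, and the interval values are assumptions for illustration, not settings I have validated. The cap is something I am only considering, since an uncapped third of a very large write becomes a very large batch.

```python
import math


def pick_batch_size(n_points, cap=5000):
    """Return a third of the pending points, but never more than `cap`."""
    return max(1, min(cap, math.ceil(n_points / 3)))


def make_write_api(client, n_points):
    """Build a batching write API on an existing InfluxDBClient.

    `influxdb_client` is imported lazily so that `pick_batch_size` can be
    used and tested without the package installed.
    """
    from influxdb_client import WriteOptions

    options = WriteOptions(
        batch_size=pick_batch_size(n_points),
        flush_interval=10_000,  # ms to wait before flushing a partial batch
        jitter_interval=2_000,  # ms of random delay to spread out flushes
        retry_interval=5_000,   # ms before the first retry of a failed write
        max_retries=5,
    )
    return client.write_api(write_options=options)
```

With these assumed values, 9000 pending points give a batch size of 3000, while anything above 15000 points stays at 5000.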
How should I improve my writing script?
What other parameters and strategies should I implement to improve the writing performance and reduce the amount of post-processing that InfluxDB itself needs to do?
Thanks for the quick reply Jay!
In the table, is the “Writes per second” value counted in data points or in batches/lines?
Currently I’m running a 2-core, 4 GB machine. Depending on a few parameters, I might send write requests of 50K lines with 40 data points each. Would you say it’s too much?
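To put numbers on that request size, here is a small arithmetic sketch. The `request_size` helper is hypothetical, and `batch_lines=5_000` assumes the 5000 figure from the optimize writes document is counted in lines, which is part of what I am unsure about.

```python
def request_size(lines, fields_per_line, batch_lines=5_000):
    """Summarise one write request.

    `batch_lines` assumes the recommended batch size is counted in lines.
    """
    return {
        "field_values": lines * fields_per_line,
        "batches": -(-lines // batch_lines),  # ceiling division
    }


summary = request_size(lines=50_000, fields_per_line=40)
print(summary)  # {'field_values': 2000000, 'batches': 10}
```

So under that assumption, one of my requests carries 2 million field values and is ten times the recommended batch.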
I’ve seen the optimize writes document, and my first batch_size was derived from there, but I had trouble during some writes - especially with a large number of lines with a small timestamp difference (for example, many 40 Hz signals over 50K lines).
This is a learning experience for me, so I’m trying to understand where I’m going over the limits.
Would you say that it’s too much?
In the optimization document, it’s said that the optimal batch size is 5000. So if we are talking about data points, and I have 1.5-2 million data points to write, should I make roughly 300-400 write requests of 5000 data points each?
I tried passing the 1.5 million data points to the Influx write API with a batch_size of 5K. I guess this was one of the issues that impacted performance? So should my write requests match the defined batch_size?
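In case it helps to see what I mean by requests that match the batch size, here is a minimal sketch of splitting the lines before handing them over. The `write` argument is a hypothetical callable that sends one batch, and the names `batches` and `write_in_batches` are mine, not part of any library.

```python
from itertools import islice


def batches(lines, batch_size=5000):
    """Yield lists of at most `batch_size` lines from any iterable."""
    it = iter(lines)
    while batch := list(islice(it, batch_size)):
        yield batch


def write_in_batches(write, lines, batch_size=5000):
    """Call `write(batch)` once per batch and return the number of calls."""
    count = 0
    for batch in batches(lines, batch_size):
        write(batch)
        count += 1
    return count
```

With the `influxdb-client` package, `write` could be something like `lambda batch: write_api.write(bucket=my_bucket, record=batch)`, where `my_bucket` is a placeholder. For 1.5 million lines and a batch size of 5000 this makes 300 calls.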