Optimizing writing performance

DavidHy · November 25, 2022, 10:58am

Hi all,

I have a setup with Influx + Grafana, to which I write various data through a Python script. I want to optimize how this script writes that data.
In my script I convert my data to line protocol format, splitting and grouping it so that the written data is grouped by timestamp and sampling frequency, and I limit the number of fields in each line.
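
A minimal sketch of the grouping step described above, assuming samples arrive as `(timestamp_ns, field_name, value)` tuples, that all values are numeric, and that names need no line-protocol escaping. The function name `to_lines` and the `max_fields` parameter are hypothetical, not taken from the actual script.

```python
from collections import defaultdict
from itertools import islice


def to_lines(measurement, samples, max_fields=40):
    """Group (timestamp_ns, field_name, value) samples into line-protocol lines.

    Samples that share a timestamp are placed on the same line. A line is
    split into several lines once it would carry more than max_fields fields.
    Assumes numeric values and names without spaces, commas or equals signs.
    """
    by_ts = defaultdict(dict)
    for ts, name, value in samples:
        by_ts[ts][name] = value

    lines = []
    for ts in sorted(by_ts):  # write in ascending time order
        fields = iter(by_ts[ts].items())
        while True:
            chunk = list(islice(fields, max_fields))
            if not chunk:
                break
            field_set = ",".join(f"{k}={float(v)}" for k, v in chunk)
            lines.append(f"{measurement} {field_set} {ts}")
    return lines


if __name__ == "__main__":
    demo = [(2, "a", 1), (1, "a", 0.5), (1, "b", 2), (1, "c", 3)]
    for line in to_lines("sig", demo, max_fields=2):
        print(line)
```

Lines that share a measurement, tag set and timestamp are merged by InfluxDB into one point, so splitting a wide line this way should not lose fields.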

Usually I write about 1 GB per hour; sometimes it arrives all at once, and sometimes periodically over that hour. I realize that isn’t much. However, I would like to increase that amount in some cases, for example if I need to reprocess the data or add an extra-large amount of data from a different source. In such cases I’ve run into performance issues when trying to write too much at once.

I’m still trying to optimize the data injection to make it quicker and more resource efficient.

I was wondering about the settings for WriteOptions:

For example, I’ve tried playing with the batch_size:

    write_client = self.client.write_api(
        write_options=WriteOptions(
            batch_size=50_000,
            flush_interval=100,
            jitter_interval=2_000,
            retry_interval=2_500,
        )
    )

I tried setting batch_size dynamically, as I thought it would be better for it to depend on the amount of data that needs to be written. So currently the batch size is a third of the number of data points that need to be written.

How should I improve my writing script?
What other parameters and strategies should I use to improve the write performance and reduce the amount of post-processing that Influx itself needs to do?

Jay_Clifford · November 25, 2022, 12:07pm

Hi @DavidHy
1 GB at once is a fair amount for a single ingest. This table gives you an idea of InfluxDB’s ingest capabilities.

I would check out these pages:

and this example:

github.com

influxdata/influxdb-client-python/blob/master/examples/write_batching_by_bytes_count.py

"""
How to use RxPY to prepare batches by maximum bytes count.
"""

from csv import DictReader
from functools import reduce
from typing import Collection

import reactivex as rx
from reactivex import operators as ops, Observable

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write.retry import WritesRetry
from influxdb_client.client.write_api import SYNCHRONOUS


def csv_to_generator(csv_file_path):
    """
    Parse your CSV file into generator
    """

This file has been truncated.
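
The linked example builds its batches with RxPY. The block below is not the content of the truncated file; it is a plain-Python sketch of the same idea, batching line-protocol strings by a maximum payload size in bytes. The name `batch_by_bytes` and the `max_bytes` parameter are assumptions made for illustration.

```python
def batch_by_bytes(lines, max_bytes):
    """Yield lists of lines whose newline-joined UTF-8 payload fits max_bytes.

    A single line that is larger than max_bytes is emitted as its own batch
    rather than dropped.
    """
    batch, size = [], 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # One extra byte for the newline separating this line from the previous one.
        extra = line_size + (1 if batch else 0)
        if batch and size + extra > max_bytes:
            yield batch
            batch, size = [], 0
            extra = line_size
        batch.append(line)
        size += extra
    if batch:
        yield batch


if __name__ == "__main__":
    demo = ["aaaa", "bbbb", "cc", "dddddddddddd"]
    for b in batch_by_bytes(demo, max_bytes=10):
        print(len("\n".join(b).encode("utf-8")), b)
```

Each yielded batch could then be joined with newlines and sent as one write request, which keeps request sizes predictable even when line lengths vary a lot.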

DavidHy · November 25, 2022, 1:42pm

Thanks for the quick reply Jay!
In the table, is the “Writes per second” value in data points or in batches/lines?

Currently I’m running a 2-core, 4 GB machine. Depending on a few parameters, I might send write requests of 50K lines with 40 data points each. Would you say it’s too much?

I’ve seen the optimize writes document, and my first batch_size was derived from there, but I had trouble during some writes, especially with a large number of lines with a small timestamp difference (for example, many 40 Hz signals over 50K lines).
This is a learning experience for me, so I’m trying to understand where I’m going over the limits.

Would you say that it’s too much?

The optimization document says that the optimal batch size is 5000, so if we are talking about data points and I have 1.5-2 million data points to write, should I make ~300 write requests with 5000 data points each?
I tried passing the 1.5 million data points to the Influx write API with a batch_size of 5K. I guess this was one of the issues that hurt performance? Should my write requests fit the defined batch_size?

Thank you for your help,
David
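
The arithmetic in the question above can be checked with a short sketch. It assumes the 5000 figure counts lines per request; the thread does not confirm whether it means lines or data points. The helper names `chunked` and `request_count` are hypothetical.

```python
from itertools import islice
from math import ceil


def chunked(lines, size=5_000):
    """Yield consecutive lists of at most `size` items from `lines`."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def request_count(n_lines, size=5_000):
    """Number of write requests needed to send n_lines in chunks of `size`."""
    return ceil(n_lines / size)


if __name__ == "__main__":
    print(request_count(1_500_000))  # 300
    print(request_count(2_000_000))  # 400
    # Each chunk would be handed to one write call; the call itself is
    # left out here because it depends on the client setup.
    for chunk in chunked((f"sig v={i}.0 {i}" for i in range(12)), size=5):
        print(len(chunk))
```

Under that assumption, 1.5 million lines come to 300 requests and 2 million to 400, so the ~300 estimate holds for the lower end of the range.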
