InfluxDB v2 Java client write performance

Hello,

I have been using InfluxDB v1.x for nearly 1.5 years now. Recently, we started exploring InfluxDB v2.0. However, initial analysis suggests that v2.0 write performance, or the WriteApi in influxdb-client-java for v2.0, substantially underperforms an older version like v1.8 using the older Java client write APIs. A write task of about 80K rows took ~350ms with v1.x, and it takes around 1200ms for the exact same data and time precision in v2.0. While it is understandable that optimizations would be incorporated in v2.0 in the near future, I was surprised to see a slowdown of roughly 3.5x.

I would like to know if this is a known issue and if there are workarounds to overcome this issue. I would be happy to provide more information if needed.

Hi @dhruvgarg,

There isn't a known issue that impacts performance for the v2 client.

Could you share what your code looks like for the v1 and v2 clients?

Regards

Hello @bednar ,

I wrote code that reads 8 files of line protocol data (nearly 1MB each, 5760 rows per file) → obtains a connection to the InfluxDB instance → performs the writes in parallel using a thread pool of size 8. For both v1 and v2, one connection is created and that same connection is passed to the 8 threads. For now, I am attaching the snippets that perform the write to InfluxDB.

Influx v1 client

Creating a connection with the v1.8 Influxdb instance.

InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086", "root", "root");

Each data file is read in advance and stored as a List<String>, as accepted by the InfluxDB write APIs. The connection and the file data are then passed to the task given below.

class TaskV1 implements Runnable {
    public InfluxDB influxDB;
    public List<String> filedata;

    public TaskV1(InfluxDB influxDB, List<String> filedata) {
        this.influxDB = influxDB;
        this.filedata = filedata;
    }

    public void run() {
        long ingest_start = System.currentTimeMillis();
        influxDB.write("temp", "autogen", InfluxDB.ConsistencyLevel.ONE, TimeUnit.MILLISECONDS, filedata);
        long ingest_end = System.currentTimeMillis();
        System.out.println("Data ingestion time: " + (ingest_end - ingest_start));
    }
}

Similarly, for InfluxDB v2 client

Creating a connection

InfluxDBClient influxDBClient1 = InfluxDBClientFactory.create("http://localhost:8086", token, org, bucket);

Passing the connection and file data to the parallel task

class TaskV2 implements Runnable {
    public InfluxDBClient influxDBClient;
    public List<String> filedata;

    public TaskV2(InfluxDBClient influxDBClient, List<String> filedata) {
        this.influxDBClient = influxDBClient;
        this.filedata = filedata;
    }

    public void run() {
        long ingest_start = System.currentTimeMillis();
        try (WriteApi writeApi = influxDBClient.getWriteApi()) {
            writeApi.writeRecords("temp", "org", WritePrecision.MS, filedata);
        }
        long ingest_end = System.currentTimeMillis();
        System.out.println("Data ingestion time: " + (ingest_end - ingest_start));
    }
}

Please let me know if this suffices or if more information is needed. I followed the influxdb-client-java page for the data-write example. One difference between the v1 and v2 writes is the try block in the v2 client. I am not sure if that is causing some wait that extends the write time. I have also tried creating 8 separate connections and passing one to each thread. However, I got the same result. I would be happy to try out any changes that you suggest to make the v2 write comparable to or better than the v1 write.

Thanks in advance!

@dhruvgarg, thanks for your detailed information.

The influxDBClient.getWriteApi() is supposed to be used as a long-lived singleton, because it batches writes on a background thread. Creating and closing a new instance for every write adds overhead.

For your purpose it will be better to use WriteApiBlocking.

I am happy to help you achieve the expected performance.
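
As a rough sketch of the suggested change, the task from the question could take a synchronous writer instead of opening a WriteApi on every run. To keep the example self-contained, `RecordSink` below is a hypothetical stand-in for the client, and `BlockingWriteSketch`, `BlockingTask` and `writeAll` are illustrative names. With influxdb-client-java, the sink would presumably wrap a single instance obtained from `influxDBClient.getWriteApiBlocking()`.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BlockingWriteSketch {

    private BlockingWriteSketch() {
    }

    /** Hypothetical stand-in for a synchronous writer such as WriteApiBlocking. */
    public interface RecordSink {
        void write(List<String> records);
    }

    /** Same shape as TaskV2, but nothing is created or closed per run. */
    static final class BlockingTask implements Runnable {
        private final RecordSink sink;
        private final List<String> fileData;

        BlockingTask(RecordSink sink, List<String> fileData) {
            this.sink = sink;
            this.fileData = fileData;
        }

        @Override
        public void run() {
            long start = System.currentTimeMillis();
            sink.write(fileData); // assumed to return only after the write is done
            long end = System.currentTimeMillis();
            System.out.println("Data ingestion time: " + (end - start));
        }
    }

    /**
     * Submits one task per file to a fixed pool and waits for completion.
     * Returns false if the pool did not finish within one minute.
     */
    static boolean writeAll(RecordSink sink, List<List<String>> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> file : files) {
            pool.submit(new BlockingTask(sink, file));
        }
        pool.shutdown();
        try {
            return pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With the real client, the sink could be something like `records -> writeApiBlocking.writeRecords("temp", "org", WritePrecision.MS, records)`, sharing one writer across the 8 threads.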

Hello @bednar ,

Changing from WriteApi (asynchronous, non-blocking) to WriteApiBlocking worked! I am getting write performance similar to v1.X now. Thanks a lot for the help. :smile:
Does this mean that, while the InfluxDB Java client for v1.X did not mention asynchronous or synchronous writes, it was implicitly synchronous?

Also, is there any updated documentation for writing data into InfluxDB in case I want to further optimize the ingestion rate with v2? Optimal batch size for writing a set of data points?

Hello @dhruvgarg,

You could use batch writes.

Currently, the InfluxDB OSS guidelines are also relevant for OSS 2.0.

Hello @bednar ,

I am now writing data to InfluxDB v2 instances running on an amd64 desktop as well as an arm64 Raspberry Pi. As you had suggested earlier, I am using the WriteApiBlocking method for bulk-writes into InfluxDB.

I was exploring the link on batch writes you sent previously, but it seems to be for v1.X. Where do I configure the batching options for the v2 influxdb-client-java? How much performance improvement can I expect?

Also, to remind you of some previous details: I have line protocol data in text files and am passing that data to WriteApiBlocking. However, I wanted to know the optimal way to write data into InfluxDB from a file. It's okay if the file contents are not generic (like a CSV) but rather customized for InfluxDB (like line protocol).

Or, if WriteApiBlocking with batching options enabled is my best bet, I can continue using that. Please do let me know.

If you have a file of line protocol records, the best performance will probably be with WriteApiBlocking. You could prepare batches that fit the guidelines in: Hardware sizing guidelines | InfluxDB OSS v1 Documentation. Something like this can work pretty well:

package example;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.client.WriteApiBlocking;
import com.influxdb.client.domain.WritePrecision;
import com.influxdb.exceptions.InfluxException;

import io.reactivex.Flowable;

public class WriteData{

    private static char[] token = "my-token".toCharArray();
    private static String org = "my-org";
    private static String bucket = "my-bucket";

    public static void main(final String[] args) {

        try (InfluxDBClient client = InfluxDBClientFactory.create("http://localhost:8086", token, org, bucket)) {

            WriteApiBlocking writeApi = client.getWriteApiBlocking();

            File file = new File("/path/to/line_protocols.txt");

            // Create Line-By-Line reader
            Flowable<String> lineProtocols = Flowable.using(
                    () -> new BufferedReader(new FileReader(file)),
                    reader -> Flowable.fromIterable(() -> reader.lines().iterator()),
                    BufferedReader::close
            );

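            lineProtocols
                    // group the lines into batches of 5_000 rows
                    .buffer(5_000)
                    // write each batch synchronously; blockingForEach rethrows a failed
                    // write so that the catch block below can handle it
                    .blockingForEach(batch -> {
                        System.out.println("Writing... " + batch.size());
                        writeApi.writeRecords(WritePrecision.NS, batch);
                    });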

        } catch (InfluxException ie) {
            System.out.println("Exception: " + ie);
        }
    }
}
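
The same batching idea can be sketched without RxJava, using only the standard library. `LineBatcher` and `writeInBatches` are hypothetical names, and the `Consumer` sink is a placeholder for whatever performs the actual write. This is a minimal sketch, not a tuned implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class LineBatcher {

    private LineBatcher() {
    }

    /**
     * Reads line protocol line by line and hands it to the sink in batches
     * of at most batchSize lines. Returns the number of non-blank lines read.
     */
    static int writeInBatches(Reader source, int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                batch.add(line);
                total++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    // start a fresh list in case the sink keeps a reference to the old one
                    batch = new ArrayList<>(batchSize);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final partial batch
        }
        return total;
    }
}
```

Used with the client from the example above, the call might look like `LineBatcher.writeInBatches(new FileReader(file), 5_000, batch -> writeApi.writeRecords(WritePrecision.NS, batch))`.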

Hello @bednar , thank you very much for your assistance and the sample code. I used this code for the data ingestion; however, I got almost the same results. This is understandable because the batch size is 5000 rows and each file itself has about 5,800 rows. I also tried other batch sizes (1000/2000/3000 rows), but again got similar results.

Is there something else that I can try? I also changed the timestamp (in the file) and time precision in the writeRecords() call, but did not notice major differences. I was expecting to get reduced ingestion time when I moved to coarser timestamps e.g. from MS (milliseconds) to S (seconds). Is support for minutes expected too? And will it result in better speed?

Also, in the writeApi, there are methods that can write multiple points. It might be possible to read from a file → build points → ingest points into InfluxDB. I would like to know from you if bulk data ingestion through line protocol is still better than bulk writes from multiple points?

> Is there something else that I can try?

Probably not.

> Is support for minutes expected too?

No, we only support the precisions specified in: Write data to InfluxDB | InfluxDB Cloud (TSM) Documentation

> And will it result in better speed?

The best performance will be with S precision - Optimize writes to InfluxDB | InfluxDB Cloud (TSM) Documentation
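
For the precision question, here is a minimal sketch of how millisecond timestamps in line protocol records could be coarsened to seconds before writing with WritePrecision.S. It assumes every line ends with a timestamp, which line protocol does not strictly require. `PrecisionSketch` and its methods are hypothetical helpers, not part of the client.

```java
import java.util.ArrayList;
import java.util.List;

final class PrecisionSketch {

    private PrecisionSketch() {
    }

    /** Rewrites the trailing millisecond timestamp of one line to whole seconds. */
    static String millisToSeconds(String line) {
        int lastSpace = line.lastIndexOf(' ');
        if (lastSpace < 0) {
            throw new IllegalArgumentException("no timestamp found: " + line);
        }
        // Assumption: the last space-separated token is the timestamp.
        long millis = Long.parseLong(line.substring(lastSpace + 1));
        // floorDiv rounds towards negative infinity, so pre-1970 values stay consistent
        long seconds = Math.floorDiv(millis, 1000L);
        return line.substring(0, lastSpace + 1) + seconds;
    }

    /** Applies the conversion to every line of a batch. */
    static List<String> millisToSeconds(List<String> lines) {
        List<String> converted = new ArrayList<>(lines.size());
        for (String line : lines) {
            converted.add(millisToSeconds(line));
        }
        return converted;
    }
}
```

Note that points in the same series that fall into the same second end up with identical timestamps after this conversion, so later field values overwrite earlier ones.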

> Also, in the writeApi, there are methods that can write multiple points. It might be possible to read from a file → build points → ingest points into InfluxDB. I would like to know from you if bulk data ingestion through line protocol is still better than bulk writes from multiple points?

Yes, bulk writes of line protocol are better, because you avoid the underlying type casting.
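
For data that does not yet exist as line protocol, one option is to serialize it to strings once and then write those strings in bulk. The sketch below shows what such a serializer might look like for common value types. `LineProtocolSketch` is a hypothetical helper that ignores edge cases such as NaN or unsigned integers, and whether it actually outperforms the client's point API is not measured here.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

final class LineProtocolSketch {

    private LineProtocolSketch() {
    }

    /** Escapes commas and spaces in a measurement name. */
    static String escapeMeasurement(String s) {
        return s.replace(",", "\\,").replace(" ", "\\ ");
    }

    /** Escapes commas, equals signs and spaces in tag keys, tag values and field keys. */
    static String escapeKey(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    /** Formats a field value: quoted strings, integers with an "i" suffix, plain floats and booleans. */
    static String formatField(Object value) {
        if (value instanceof String) {
            String s = (String) value;
            return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        if (value instanceof Integer || value instanceof Long) {
            return value + "i";
        }
        if (value instanceof Double || value instanceof Float || value instanceof Boolean) {
            return value.toString();
        }
        throw new IllegalArgumentException("unsupported field type: " + value);
    }

    /** Builds one record; tags and fields are emitted sorted by key. */
    static String toLine(String measurement, Map<String, String> tags,
                         Map<String, Object> fields, long timestamp) {
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("at least one field is required");
        }
        StringBuilder sb = new StringBuilder(escapeMeasurement(measurement));
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            sb.append(',').append(escapeKey(tag.getKey()))
              .append('=').append(escapeKey(tag.getValue()));
        }
        StringJoiner fieldPart = new StringJoiner(",");
        for (Map.Entry<String, Object> field : new TreeMap<>(fields).entrySet()) {
            fieldPart.add(escapeKey(field.getKey()) + "=" + formatField(field.getValue()));
        }
        return sb.append(' ').append(fieldPart).append(' ').append(timestamp).toString();
    }
}
```

The resulting strings can be collected into a List<String> and passed to writeRecords in batches, the same way as lines read from a file.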