High-frequency inserts with small batch sizes


#1

Hello,

Our use case involves many devices, each sending very little data but very frequently. We built a prototype data-collection system using InfluxDB, but this pattern leads to a large number of inserts with very small batch sizes (through the Java client).

Thanks to the technical paper, we know that performance with small batch sizes is very poor compared to larger batch sizes. However, we are able to saturate our InfluxDB node with very low input: only 3,000 field values per second saturates a server with 8 CPUs and 16 GB of RAM.

Is there a specific approach for managing our use case?
Also, when InfluxDB is saturated it only returns a 500 error. Is there no queuing mechanism available?

As a solution, we are considering a queue mechanism at the entry point that would buffer time-series entries before sending them to InfluxDB in larger batches, but we want to be sure that InfluxDB does not already have a feature that could serve us better.


#2

Hi,

An increasing number of people are using InfluxDB for IoT, but I don’t think it is built to handle huge quantities of small HTTP calls.

I think the best approach would be to use InfluxDB with the rest of the TICK stack. Telegraf, along with Kapacitor, is probably the tool you need to increase your throughput.
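To make the Telegraf suggestion concrete: Telegraf can sit in front of InfluxDB, accept many small writes, and forward them in larger batches. A minimal agent configuration sketch (the listener port, database name, and thresholds below are illustrative, not from this thread):

```toml
[agent]
  ## Flush to outputs in batches of up to 5000 metrics,
  ## or every 2 seconds, whichever comes first.
  metric_batch_size = 5000
  flush_interval = "2s"
  ## Buffer up to 100000 metrics in memory while the
  ## output is slow or unavailable, instead of dropping them.
  metric_buffer_limit = 100000

## Accept InfluxDB line-protocol writes from the devices.
[[inputs.influxdb_listener]]
  service_address = ":8186"

## Forward batched metrics to the real InfluxDB node.
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "iot"
```

Devices would then write to Telegraf on port 8186 exactly as they would to InfluxDB, and Telegraf handles the batching and in-memory buffering.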

In any case, I’m not an InfluxData expert, and I would be very interested in a more technical answer.


#3

What may work best for you architecturally is to put a durable pub/sub queue between your IoT devices and InfluxDB. There are many options out there, Apache Kafka probably being the most famous. Then read off that queue in larger batches and load them into InfluxDB. We took a similar approach in a product I work on: we gather batches of 5,000 values, or 2 seconds’ worth of values (whichever comes first), to load InfluxDB more optimally. These queue technologies typically handle the write load very well with minimal performance impact.
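The size-or-time batching policy described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: `PointBatcher`, the thresholds, and the use of line-protocol strings are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer incoming points and flush them downstream
// (e.g. to an InfluxDB writer) once either threshold is reached.
class PointBatcher {
    private final int maxBatchSize;        // e.g. 5000 values
    private final long maxDelayMillis;     // e.g. 2000 ms
    private final Consumer<List<String>> flushTarget;
    private final List<String> buffer = new ArrayList<>();
    private long firstPointTime = 0;

    PointBatcher(int maxBatchSize, long maxDelayMillis,
                 Consumer<List<String>> flushTarget) {
        this.maxBatchSize = maxBatchSize;
        this.maxDelayMillis = maxDelayMillis;
        this.flushTarget = flushTarget;
    }

    // Add one point (line-protocol string here); flush when the batch
    // is full or the oldest buffered point is older than maxDelayMillis.
    synchronized void add(String point) {
        if (buffer.isEmpty()) {
            firstPointTime = System.currentTimeMillis();
        }
        buffer.add(point);
        if (buffer.size() >= maxBatchSize
                || System.currentTimeMillis() - firstPointTime >= maxDelayMillis) {
            flush();
        }
    }

    // Hand the current batch to the flush target and reset the buffer.
    synchronized void flush() {
        if (buffer.isEmpty()) return;
        flushTarget.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

A production version would also run a background timer so that a partially filled batch is flushed when the time window expires even if no new points arrive; this sketch only checks the time threshold when a point is added. If the original poster is on influxdb-java, note that the client itself offers a similar size-or-time policy via its `enableBatch(actions, flushDuration, timeUnit)` method, which may remove the need for hand-rolled buffering.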


#4

My current company already uses Kafka extensively in front of large collection systems.
Performance tests showed it to be a very limited solution here: because Kafka temporarily stores data on disk, and I/O is the bottleneck on writes, it roughly halves throughput before doing anything else, rather than improving it.

I think a broker solution using in-memory queues would be more efficient.