High-frequency inserts with small batch sizes


#1

Hello,

Our use case involves many devices, each sending very little data but very frequently. We built a prototype data-collection system using InfluxDB, but this pattern leads to a large number of inserts with very small batch sizes (through the Java client).

Thanks to the technical paper, we know that performance with small batch sizes is very poor compared to larger batch sizes. However, we are able to saturate our InfluxDB node with very low input: only 3,000 field values per second saturates a server with 8 CPUs and 16 GB of RAM.

Is there a specific approach for managing our use case?
Also, when InfluxDB is saturated it only returns a 500 error. Is there no queuing mechanism available?

As a solution, we are considering a queue mechanism at the entry point that would buffer time-series entries before sending them to InfluxDB in larger batches, but we want to be sure that InfluxDB does not already have a feature that could serve us better.


#2

Hi,

An increasing number of people are using InfluxDB for IoT, but I don’t think it is built to handle huge quantities of small HTTP calls.

I think the best approach would be to use InfluxDB with the rest of the TICK stack. Telegraf, along with Kapacitor, is probably the tool you need to increase your throughput.
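To make the Telegraf suggestion concrete: Telegraf can sit in front of InfluxDB, accept many small writes, and forward them in larger batches. A minimal agent configuration sketch (the listener port, database name, and thresholds below are illustrative, not from this thread):

```toml
[agent]
  ## Flush to outputs in batches of up to 5000 metrics,
  ## or every 2 seconds, whichever comes first.
  metric_batch_size = 5000
  flush_interval = "2s"
  ## Buffer up to 100000 metrics in memory while the
  ## output is slow or unavailable, instead of dropping them.
  metric_buffer_limit = 100000

## Accept InfluxDB line-protocol writes from the devices.
[[inputs.influxdb_listener]]
  service_address = ":8186"

## Forward batched metrics to the real InfluxDB node.
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "iot"
```

Devices would then write to Telegraf on port 8186 exactly as they would to InfluxDB, and Telegraf handles the batching and in-memory buffering.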

In any case, I’m not an InfluxData expert, and I would be very interested in a more technical answer.


#3

What may work best for you architecturally is to put a durable pub/sub queue between your IoT devices and InfluxDB. There are many options out there, Apache Kafka probably being the most famous. Then read off that queue in larger batches and load them into InfluxDB. We took a similar approach in a product I work on: we gather batches of 5,000 values, or 2 seconds’ worth of values (whichever comes first), to load InfluxDB more optimally. These queue technologies typically handle the write load very well with minimal performance impact.
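The size-or-time batching policy described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: `PointBatcher`, the thresholds, and the use of line-protocol strings are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer incoming points and flush them downstream
// (e.g. to an InfluxDB writer) once either threshold is reached.
class PointBatcher {
    private final int maxBatchSize;        // e.g. 5000 values
    private final long maxDelayMillis;     // e.g. 2000 ms
    private final Consumer<List<String>> flushTarget;
    private final List<String> buffer = new ArrayList<>();
    private long firstPointTime = 0;

    PointBatcher(int maxBatchSize, long maxDelayMillis,
                 Consumer<List<String>> flushTarget) {
        this.maxBatchSize = maxBatchSize;
        this.maxDelayMillis = maxDelayMillis;
        this.flushTarget = flushTarget;
    }

    // Add one point (line-protocol string here); flush when the batch
    // is full or the oldest buffered point is older than maxDelayMillis.
    synchronized void add(String point) {
        if (buffer.isEmpty()) {
            firstPointTime = System.currentTimeMillis();
        }
        buffer.add(point);
        if (buffer.size() >= maxBatchSize
                || System.currentTimeMillis() - firstPointTime >= maxDelayMillis) {
            flush();
        }
    }

    // Hand the current batch to the flush target and reset the buffer.
    synchronized void flush() {
        if (buffer.isEmpty()) return;
        flushTarget.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

A production version would also run a background timer so that a partially filled batch is flushed when the time window expires even if no new points arrive; this sketch only checks the time threshold when a point is added. If the original poster is on influxdb-java, note that the client itself offers a similar size-or-time policy via its `enableBatch(actions, flushDuration, timeUnit)` method, which may remove the need for hand-rolled buffering.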


#4

My current company already uses Kafka extensively in front of large collection systems.
Performance tests showed it to be a very limited solution here: because Kafka temporarily stores data on disk, and I/O is the bottleneck on writes, it roughly halves throughput before doing anything else, rather than improving it.

I think a broker solution using in-memory queues would be more efficient.