What are the expected performance penalties with unsorted tag keys in line protocol?

influxdb
#1

There is a performance tip in influx line protocol docs:

Sort tags by key before sending them to the database.

Could you clarify:

  1. What are the expected performance penalties when this recommendation is not followed?

  2. Which operations does it affect, and by how much?

  3. Will it slow down data loading into InfluxDB only, or also affect all future SELECT/SHOW queries?

In my use case I need to merge data lines from a CSV file with some extra lookup info to produce InfluxDB line protocol strings. The final full tag key-value set includes fields from both sources. I suspect that the overhead of sorting this set by key in a Python script may be higher than the performance loss from unsorted tags.

If unsorted tags only affect line protocol load performance and not queries, I’d prefer to keep the loader script logic as simple as possible.
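For what it's worth, sorting a small tag set per line is cheap in Python, since `sorted()` on a handful of items is handled by CPython's C-level Timsort. A minimal sketch of building a line with sorted tags (function and measurement names here are illustrative, not from the original post):

```python
def escape(value: str) -> str:
    """Escape characters that are special in line protocol tag keys/values."""
    return value.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")

def make_line(measurement: str, tags: dict, fields: dict, timestamp: int) -> str:
    # sorted(tags.items()) orders tag pairs by key, as the docs recommend
    tag_str = ",".join(f"{escape(k)}={escape(v)}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_str} {field_str} {timestamp}"

line = make_line("cpu", {"region": "us-west", "host": "server01"},
                 {"usage": 42.5}, 1_600_000_000_000_000_000)
# tags come out as host=...,region=... regardless of input dict order
```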

One more question:
Are there any guidelines or benchmarks available to compare bulk data loading via the HTTP API and “influx -import”? My current input data stream is approx. 500K (will grow to 1.5-3M) line protocol lines every 5 minutes.

#2

@yuyu The overhead in Python would be much higher than in Go (InfluxDB)! Unsorted keys only affect write performance, and only at the margins.

I’ve actually found influx -import a little slower than the HTTP API for use cases like this. When writing, just make sure to break the points up into batches. Shoot for batches of around 10k field values.
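The batching above can be sketched as a simple chunking step before each POST. The batch size here counts lines as a proxy for the ~10k field values mentioned above, and the endpoint URL and database name are placeholders:

```python
def batches(lines, size):
    """Yield successive chunks of at most `size` lines."""
    for i in range(0, len(lines), size):
        yield lines[i:i + size]

# Each batch would then be newline-joined and POSTed to the /write
# endpoint, e.g. (hypothetical values, using the requests library):
#
#   for batch in batches(all_lines, 10_000):
#       requests.post("http://localhost:8086/write?db=mydb",
#                     data="\n".join(batch).encode())
```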

Hope this helps!

#3

Thanks for the prompt reply! That was exactly what I expected. I already do HTTP POSTs in batches; I didn’t notice a big difference in the 4K-10K batch size range (no exact benchmarking, though).

What I observe are some rare sporadic return code 500 {“error”:“timeout”} errors after a batch POST call; usually one more POST retry is enough to push the batch to the db. But that’s another story.

#4

@yuyu You should build your clients to expect backpressure and retry those requests. Telegraf implements this functionality natively. Occasional 500s are expected, but anything more frequent can point to issues.
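A minimal sketch of that retry-on-backpressure loop, assuming a hypothetical `post_batch` callable that returns an HTTP status code:

```python
import time

def write_with_retry(post_batch, batch, retries=3, backoff=1.0):
    """Retry a batch write on 5xx responses with exponential backoff."""
    for attempt in range(retries):
        status = post_batch(batch)
        if status < 500:
            # 2xx means success; 4xx is a client error that a retry won't fix
            return status
        # 5xx (e.g. the 500 "timeout" above) signals backpressure: wait and retry
        time.sleep(backoff * (2 ** attempt))
    return status
```

Jittered backoff and a cap on total wait time would be sensible additions in a real loader, but this shows the shape of the logic.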

You are right, not a big difference between 4k and 10k batches.