There is a performance tip in the InfluxDB line protocol docs:
Sort tags by key before sending them to the database.
Could you clarify:
What are the expected performance penalties when this recommendation is not followed?
Which aspect of performance does it affect, and by how much?
Will it only slow down data loading into InfluxDB, or will it also affect all future SELECT/SHOW queries?
In my use case I need to merge data lines from a CSV file with some extra lookup info to produce InfluxDB line protocol strings. The final tag set includes key-value pairs from both sources. I suspect that the overhead of sorting this set by key in a Python script may be higher than the performance loss from unsorted tags.
If unsorted tags only affect line protocol load performance and not queries, I’d prefer to keep the loader script logic as simple as possible.
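To make the question concrete, here is a minimal sketch of the merge step as I picture it, with the sort included. The names `escape_tag` and `to_line`, the sample tags, and the assumption that all tag values are strings and all field values are floats are mine for illustration only; this is not my actual loader.

```python
def escape_tag(text):
    # Line protocol needs commas, equals signs and spaces escaped
    # with a backslash in tag keys and tag values.
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def to_line(measurement, csv_tags, lookup_tags, fields, timestamp_ns):
    # Merge tags from both sources; on a key clash the lookup value wins.
    tags = {**csv_tags, **lookup_tags}
    # sorted() on the items orders the pairs by tag key (keys are unique).
    tag_part = ",".join(
        f"{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    # Assumption: every field value is a float, so no quoting or "i" suffix.
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {timestamp_ns}"


line = to_line(
    "cpu",
    {"host": "a", "region": "eu west"},  # tags taken from the CSV row
    {"dc": "x"},                         # tags taken from the lookup
    {"value": 1.5},
    1000,
)
print(line)
```

The only extra cost compared with the unsorted version is the `sorted()` call on a handful of tags per line, which is the overhead I am trying to weigh against the penalty on the database side.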
One more question:
Are there any guidelines or benchmarks available to compare bulk data loading via the HTTP API and “influx -import”? My current input data stream is approx. 500K (will grow to 1.5-3M) line protocol lines every 5 minutes.