Hi Jack,
I am looking at moving many millions of measurement data points from PostgreSQL to InfluxDB. The data goes back to early 2014 and comes from roughly 1,500 devices, each reporting about a dozen field values per measurement.
I have worked out my series tags and fields, and I would like some suggestions on how to import the data so that the resulting database is as efficient as possible. You said, “Inserting data in chronological order is also more performant … sorting [tags] into alphabetical order can also increase write throughput …”
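For context, here is what I expect a single point to look like in line protocol (measurement, tag, and field names are made up for illustration; tags already sorted alphabetically, timestamp in the default nanosecond precision):

    reading,device_id=dev0042,site=lab1 humidity=40.1,temp=21.5,voltage=3.29 1389571200000000000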
My original intention was to load InfluxDB series by series, i.e. grab the PostgreSQL measurements corresponding to a particular InfluxDB series, load those into InfluxDB in timestamp order, then move on to the measurements for the next series, and so on. Would it be better instead to simply work through the PostgreSQL table in timestamp order rather than in “series” order?
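To make the comparison concrete, the timestamp-ordered variant of my loader would be something along these lines (table, column, and database names are placeholders rather than my real schema):

    use strict;
    use warnings;
    use DBI;
    use LWP::UserAgent;

    my $dbh = DBI->connect('dbi:Pg:dbname=metrics', 'user', 'pass',
                           { RaiseError => 1 });

    # One pass over the whole table in timestamp order, ignoring series.
    # (For millions of rows I would really declare a server-side cursor
    # so DBD::Pg does not buffer the entire result set in memory.)
    my $sth = $dbh->prepare(q{
        SELECT device_id, site,
               floor(extract(epoch FROM ts))::bigint AS ts_epoch,
               temp, voltage
        FROM   measurements
        ORDER  BY ts
    });
    $sth->execute;

    my $ua = LWP::UserAgent->new;
    my @batch;

    while (my $row = $sth->fetchrow_hashref) {
        # Tags in alphabetical order (device_id before site); real tag
        # values would need line-protocol escaping of spaces and commas.
        push @batch,
            sprintf 'reading,device_id=%s,site=%s temp=%s,voltage=%s %d',
                @{$row}{qw(device_id site temp voltage ts_epoch)};
        flush_batch(\@batch) if @batch >= 5000;
    }
    flush_batch(\@batch) if @batch;

    sub flush_batch {
        my ($batch) = @_;
        # precision=s keeps the timestamps small; the default is ns.
        my $res = $ua->post(
            'http://localhost:8086/write?db=metrics&precision=s',
            Content => join("\n", @$batch),
        );
        die $res->status_line unless $res->is_success;
        @$batch = ();
    }

The series-by-series variant would be the same loop wrapped in an outer loop over distinct (device_id, site) pairs, with a WHERE clause per series.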
Finally, would my loader (a Perl script) be better off writing batches directly to InfluxDB over HTTP, or would it be faster to generate flat files containing InfluxDB line protocol and then import them with the influx -import command?
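If the flat-file route is the better one, I assume each file would look roughly like this (again with made-up names):

    # DML
    # CONTEXT-DATABASE: metrics
    # CONTEXT-RETENTION-POLICY: autogen

    reading,device_id=dev0001,site=lab1 temp=21.5,voltage=3.29 1389571200000000000
    reading,device_id=dev0002,site=lab1 temp=19.8,voltage=3.31 1389571201000000000

and would be loaded with something like influx -import -path=points.txt -precision=ns (adding -compressed if the files are gzipped).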
Thanks!