We work on projects that need a high write rate (writes per second). Our data sources (sensors) are organized as a tree; there can be thousands of sources sharing the same base name, and their long paths cannot easily be translated into compact names. The problem is that the API requires a measurement name to be specified for every single point we write, which is quite a flaw when dealing with high-rate data series on a single measurement.
We currently optimize this by naming measurements with plain integers:
1 value=76.000000 1549492316487504000
1 value=78.000000 1549492316497504000
1 value=80.000000 1549492316507504000
1 value=82.000000 1549492316517504000
1 value=84.000000 1549492316527504000
We then maintain an internal mapping definition that translates our long identification names into these integer numbers. The drawback is the confusion of measurement names when editing a dashboard in Grafana: it is very hard to tell what is what, and we have to manually consult the mapping definitions to set aliases in Grafana.
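For illustration, here is a minimal sketch of what such a client-side mapping could look like (the class and names are hypothetical, not our actual code):

# Hypothetical sketch: assign integer measurement names to long sensor
# paths and build standard line protocol from them.
class MeasurementMapper:
    def __init__(self):
        self.name_to_id = {}  # e.g. "plant/line3/motor7/velocity" -> 1
        self.next_id = 1

    def measurement_for(self, long_name):
        if long_name not in self.name_to_id:
            self.name_to_id[long_name] = self.next_id
            self.next_id += 1
        return str(self.name_to_id[long_name])

    def line(self, long_name, value, ts_ns):
        # <measurement> value=<value> <timestamp>
        return f"{self.measurement_for(long_name)} value={value:f} {ts_ns}"

mapper = MeasurementMapper()
mapper.line("plant/line3/motor7/velocity", 76.0, 1549492316487504000)
# -> "1 value=76.000000 1549492316487504000"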
Do you know of better ways to optimize the payload? What additional strategies are there to improve performance and reduce payload size?
–
We have been thinking about possible ways these bottlenecks could be improved; please correct me if any of these are already supported:
-
Support for views, where we could define custom translations from numbers to aliases, like in SQL databases.
-
Grouping chunks of data under a measurement name and tags defined once in the head of the chunk; tags could also be grouped in this fashion when dealing with high cardinality. Something like:
measurement=velocity1,event=runup
value=76.000000 1549492316487504000
value=78.000000 1549492316497504000
value=80.000000 1549492316507504000
value=82.000000 1549492316517504000
…
measurement=velocity2,event=runup
value=16.000000 1549492316487504000
value=18.000000 1549492316497504000
value=10.000000 1549492316507504000
value=12.000000 1549492316517504000
InfluxDB would then take care of inserting the data into the measurements it belongs to.
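This format does not exist today, but the expansion into standard line protocol would be straightforward; a hypothetical sketch (expand_chunk is our own illustrative name):

# Hypothetical sketch: expand the chunked format above into standard
# line protocol, repeating the header for every body line.
def expand_chunk(header, body_lines):
    pairs = dict(p.strip().split("=", 1) for p in header.split(","))
    measurement = pairs.pop("measurement")
    tags = "".join(f",{k}={v}" for k, v in sorted(pairs.items()))
    return [f"{measurement}{tags} {line}" for line in body_lines]

expand_chunk("measurement=velocity1,event=runup",
             ["value=76.000000 1549492316487504000",
              "value=78.000000 1549492316497504000"])
# -> ["velocity1,event=runup value=76.000000 1549492316487504000", ...]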
-
Use of offsets for time points. The write payload could be shrunk if we could state a base timestamp in the head, and for the following points we would give only time deltas:
measurement=velocity1, numerator=100, [1549492316487504000]
value=76.000000 0
value=78.000000 1000000
value=80.000000 2000000
value=82.000000 3000000
[1549492316487504000] is the time of the first sample, followed by the others as time deltas; the formatting is just an example.
The InfluxDB server would then internally take care of putting the correct timestamps onto the individual points before storing the data to the database.
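Client-side, the reconstruction would look roughly like this (a hypothetical sketch; expand_deltas is our own illustrative name):

# Hypothetical sketch: reconstruct absolute timestamps from the
# base-plus-delta format above (all times in nanoseconds).
def expand_deltas(base_ts_ns, points):
    return [f"value={v:f} {base_ts_ns + d}" for v, d in points]

expand_deltas(1549492316487504000,
              [(76.0, 0), (78.0, 1000000), (80.0, 2000000)])
# -> ["value=76.000000 1549492316487504000",
#     "value=78.000000 1549492316488504000", ...]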
-
Support for synchronous (fixed-rate) data. In our case the time deltas are always constant: our data has a fixed sample rate, so the payload could look like:
measurement=velocity1, [1549492316487504000, 100/1]
value=76.000000
value=78.000000
value=80.000000
value=82.000000
[1549492316487504000, 100/1] means the initial time and the sample rate (numerator/denominator); this is just an example of formatting.
InfluxDB would internally take care of putting the correct timestamps onto the individual points.
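Again a hypothetical sketch of the expansion (with a 100/1 Hz rate the period is 10,000,000 ns, which reproduces the timestamps from our first example):

from fractions import Fraction

# Hypothetical sketch: generate timestamps from a start time and a
# sample rate given as numerator/denominator in Hz.
def expand_fixed_rate(base_ts_ns, num, den, values):
    period_ns = Fraction(den, num) * 1_000_000_000  # 100/1 Hz -> 10_000_000 ns
    return [f"value={v:f} {base_ts_ns + int(i * period_ns)}"
            for i, v in enumerate(values)]

expand_fixed_rate(1549492316487504000, 100, 1, [76.0, 78.0, 80.0, 82.0])
# -> ["value=76.000000 1549492316487504000",
#     "value=78.000000 1549492316497504000", ...]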
-
Better API support. Dealing with data as text is an extremely slow process (serialization, deserialization, large payloads). It would be better if you provided drivers like mature SQL databases do; fast, optimized payloads would then be taken care of entirely by Influx.
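As a partial mitigation for payload size in the meantime, the HTTP write API accepts gzip-compressed bodies as far as we know; a minimal stdlib sketch (URL and database name are placeholders):

import gzip
import urllib.request

# Sketch: gzip-compress a line protocol payload before POSTing it to
# the /write endpoint with Content-Encoding: gzip.
payload = ("1 value=76.000000 1549492316487504000\n"
           "1 value=78.000000 1549492316497504000").encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8086/write?db=mydb&precision=ns",  # placeholder URL
    data=gzip.compress(payload),
    headers={"Content-Encoding": "gzip"},
    method="POST",
)
urllib.request.urlopen(req)  # expect HTTP 204 on success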