Write optimization possibilities

influxdb
#1

We work on projects that need a high rate of writes per second. Our data sources (sensors) are tree-organized: there can be thousands of data sources with the same base name, so the names cannot easily be shortened into something compact. The problem is that the API requires us to specify the measurement name for every single point we write. This is quite a drawback when you deal with high-rate data series on a single measurement.

We currently work around this by naming measurements with integer numbers:
1 value=76.000000 1549492316487504000
1 value=78.000000 1549492316497504000
1 value=80.000000 1549492316507504000
1 value=82.000000 1549492316517504000
1 value=84.000000 1549492316527504000

We then maintain an internal mapping that translates our long identification names to these integers. The drawback is that measurement names become confusing when editing dashboards in Grafana: it’s very hard to tell what is what, and we have to look up the mapping definitions manually to set aliases in Grafana.
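For illustration, the mapping described above can be sketched roughly as follows. This is a minimal in-memory version; the class name `MeasurementIds` and the helper `render_point` are hypothetical names for this example, not part of any InfluxDB client.

```python
class MeasurementIds:
    """Two-way mapping between long source names and compact integer IDs."""

    def __init__(self):
        self._id_by_name = {}
        self._name_by_id = {}

    def id_for(self, name):
        # Assign the next integer on first use (auto-increment starting at 1).
        if name not in self._id_by_name:
            new_id = len(self._id_by_name) + 1
            self._id_by_name[name] = new_id
            self._name_by_id[new_id] = name
        return self._id_by_name[name]

    def name_for(self, measurement_id):
        # Reverse lookup, e.g. when setting aliases in a dashboard.
        return self._name_by_id[measurement_id]


def render_point(ids, name, value, timestamp_ns):
    """Render one line of line protocol using the integer ID as measurement."""
    return f"{ids.id_for(name)} value={value:.6f} {timestamp_ns}"
```

The reverse lookup `name_for` is the part we currently have to do by hand when editing dashboards.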

Do you know of better ways to optimize payloads? What additional strategies are there to improve performance / payload size?

We’ve been thinking about ways you could reduce these bottlenecks; please correct me if any of these is already supported:

  1. Use of views in which we could define custom translations from numbers to aliases, like in SQL databases.

  2. Grouping of data chunks under a measurement name and tags that are defined in the head of the chunk. Tags, when dealing with high cardinality, could also be grouped in this fashion, something like:
    measurement=velocity1,event=runup
    value=76.000000 1549492316487504000
    value=78.000000 1549492316497504000
    value=80.000000 1549492316507504000
    value=82.000000 1549492316517504000

    measurement=velocity2,event=runup
    value=16.000000 1549492316487504000
    value=18.000000 1549492316497504000
    value=10.000000 1549492316507504000
    value=12.000000 1549492316517504000

InfluxDB would then take care of inserting the data into the corresponding measurements.

  3. Use offsets for time points. The write payload could be shrunk if we could state a base time in the head and then give only time deltas on the following points.
    measurement=velocity1, numerator=100, [1549492316487504000]
    value=76.000000 0
    value=78.000000 1000000
    value=80.000000 2000000
    value=82.000000 3000000
    [1549492316487504000] is the time of the first sample, followed by the others as time deltas; the formatting is just an example.
    The InfluxDB server would then internally take care of putting the correct timestamps on the individual points before storing the data in the database.

  4. Support for synchronous data. In our case the time deltas are always constant because our data has a fixed sample rate, so the payload could look like:
    measurement=velocity1, [1549492316487504000, 100/1]
    value=76.000000
    value=78.000000
    value=80.000000
    value=82.000000
    [1549492316487504000, 100/1] means the initial time and the sample rate (numerator/denominator); this is just an example of formatting.
    InfluxDB would internally take care of putting the correct timestamps on the individual points.

  5. Better API support: dealing with data as text is an extremely slow process (serialization, deserialization, large payloads). It would be better if you provided drivers like mature SQL databases do. Fast, optimized payloads would then be handled entirely by Influx.
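To make the fixed-sample-rate idea more concrete, here is a rough sketch of the expansion the server (or a client-side shim) would have to perform. The function name `expand_fixed_rate` and its parameters are hypothetical; the only assumption about the data is that the sample rate is numerator/denominator samples per second and that timestamps are in nanoseconds.

```python
def expand_fixed_rate(series_key, start_ns, numerator, denominator, values):
    """Expand values sampled at numerator/denominator Hz into full line protocol.

    series_key is the measurement name plus any tags, exactly as it would
    appear at the start of a normal line.
    """
    lines = []
    for i, value in enumerate(values):
        # Sample period is denominator/numerator seconds. Integer math on the
        # sample index avoids accumulating rounding error over long batches.
        timestamp_ns = start_ns + (i * denominator * 1_000_000_000) // numerator
        lines.append(f"{series_key} value={value:.6f} {timestamp_ns}")
    return lines
```

With a start time of 1549492316487504000 and a rate of 100/1, consecutive points land 10 ms apart, which matches the integer-named example at the top of this post.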

#2

Hi @sampler,

That’s a great question and I do not know the answer. I’ve reached out to my colleagues and will report back when I know more.

#3

Hi @sampler, can you provide a couple examples of the problematic lines you were sending to InfluxDB when you realized you wanted to compact them?

#4

@Sam it’s rather simple. To insert a series of tens of thousands of points of the same kind of data, we need to include the same repeated keywords in each data point. The result is excessive overhead that slows down building the InfluxDB command, sending it over the network, and probably also parsing/processing it in InfluxDB.

In our case measurement names (source names) can be, along with tags, hundreds of characters long. The reason is that we generate the names by computer to ensure uniqueness across the organization, where we manage tens of thousands of data sources, a number that will continue to grow. Therefore the model is tree-organized. We cannot use end-leaf aliases for measurement names (without the tree path) because aliases don’t identify nodes uniquely.

Example:

  • the way Influx currently requires inserts to be specified:
    “rotationmachineryplugin/channels/setupname/data/Math/module1/power1/7000;0;0;2000011”,numerator=100,denumerator=1,clock=asynchronous value=76.000000 1549492316487504000
    “rotationmachineryplugin/channels/setupname/data/Math/module1/power1/7000;0;0;2000011”,numerator=100,denumerator=1,clock=asynchronous value=86.000000 1549492316487505000
    “rotationmachineryplugin/channels/setupname/data/Math/module1/power1/7000;0;0;2000011”,numerator=100,denumerator=1,clock=asynchronous value=96.000000 1549492316487506000
    “rotationmachineryplugin/channels/setupname/data/Math/module1/power1/7000;0;0;2000011”,numerator=100,denumerator=1,clock=asynchronous value=106.000000 1549492316487507000
    “rotationmachineryplugin/channels/setupname/data/Math/module1/power1/7000;0;0;2000011”,numerator=100,denumerator=1,clock=asynchronous value=116.000000 1549492316487508000

  • a possible optimized way Influx could deal with the same data:
    “myplugin/channels/setupname/data/Math/module1/power1/7000;0;0;2000011”,numerator=100,denumerator=1,clock=asynchronous 1549492316487504000
    value=76.000000 0
    value=86.000000 1000
    value=96.000000 2000
    value=106.000000 3000
    value=116.000000 4000

Imagine inserting hundreds of thousands of points of the same kind in a batch in both cases. By reducing the repeated insert parameters we could reduce the footprint massively. If we used measurement names 255 characters long, this optimization could save us more than 90% of the payload.

Because this is not supported, we use (auto-increment) integers to map between both worlds, but then, as I said, the Grafana user experience suffers. There is even more room for improvement, as I described in my first post. I hope this makes the problem a bit clearer; otherwise feel free to ask. Thanks.
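The size estimate can be checked with a few lines of arithmetic. The sketch below compares the two layouts from the example above; the function names are hypothetical, and the header layout is the proposed (unsupported) format, so this only measures text size and says nothing about server-side cost.

```python
def standard_payload_size(series_key, values, start_ns, step_ns):
    """Characters needed when every line repeats the key and full timestamp."""
    return sum(
        len(f"{series_key} value={v:.6f} {start_ns + i * step_ns}\n")
        for i, v in enumerate(values)
    )


def header_payload_size(series_key, values, start_ns, step_ns):
    """Characters needed with one header line followed by value/offset rows."""
    header = f"{series_key} {start_ns}\n"
    rows = sum(len(f"value={v:.6f} {i * step_ns}\n") for i, v in enumerate(values))
    return len(header) + rows


def saving_fraction(series_key, values, start_ns, step_ns):
    """Fraction of the standard payload that the header layout would save."""
    standard = standard_payload_size(series_key, values, start_ns, step_ns)
    compact = header_payload_size(series_key, values, start_ns, step_ns)
    return 1 - compact / standard


# Assumed inputs: a 255-character series key, 1000 points, 1000 ns apart.
key = "m" * 255
print(saving_fraction(key, [76.0] * 1000, 1549492316487504000, 1000))
```

Under these assumptions the saving comes out above 90%; shorter keys or longer offsets reduce it.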

#5

@sampler Yep, totally understood. That’s a particularly long measurement name and I’m sure you have many of them. What does value represent?

I ask because when I see value= in a line of Line Protocol, it suggests to me that data is encoded in the measurement names. In your case, I can’t presume what that might be. However, if that is the case, you could significantly reduce the payload by taking advantage of how Line Protocol is designed: it is meant to carry many metrics per line/record.

Take the dataset below for example:

First the way that creates the larger payload:

cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_system=15.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_guest=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_guest_nice=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_idle=35.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_iowait=0.2 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_irq=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_nice=1.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_steal=2.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_softirq=2.5 <timestamp>

Now the smaller payload (optimized Line Protocol–the way Telegraf outputs by default):

cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0,usage_system=15.0,usage_guest=0.0,usage_guest_nice=0.0,usage_idle=35.0,usage_iowait=0.2,usage_irq=0.0,usage_nice=1.0,usage_steal=2.0,usage_softirq=2.5 <timestamp>
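As a rough sketch of that consolidation, the helper below merges single-field lines that share the same measurement, tag set and timestamp into one line. The function name is hypothetical, and it assumes simple lines with no escaped spaces in names, tags or field values.

```python
def merge_single_field_lines(lines):
    """Combine lines with identical series key and timestamp into one line."""
    merged = {}
    for line in lines:
        # Simple lines have exactly three space-separated parts:
        # "<measurement>,<tags> <field>=<value> <timestamp>"
        series_key, field, timestamp = line.split(" ")
        merged.setdefault((series_key, timestamp), []).append(field)
    # Dicts keep insertion order, so output follows first appearance.
    return [
        f"{series_key} {','.join(fields)} {timestamp}"
        for (series_key, timestamp), fields in merged.items()
    ]
```

This only helps when the fields really do share a timestamp; points with different timestamps stay on separate lines.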

Btw, InfluxDB also supports Gzip.

Hope that helps!

#6

@Sam, the value is in general a physical quantity (or a derived one), e.g. acceleration, velocity, voltage, current, temperature, pressure, etc. However, the sources are independent: the data is acquired from different sources at different rates, or generated by unrelated events. Timestamps across different sources rarely coincide.
By linking data across sources via timestamps in real time we would unfortunately gain very little, at a fairly high expense (resources).
Additionally, we also need to store arrays, more precisely FFT data. In that case the syntax is more similar to your second example, but the resolution is always fixed per measurement: for a single timestamp we would always store exactly the same number of values (FFT lines).
I hope this clarifies why different sources are given different measurements, and why additional optimization strategies could support more generic solutions.
Thanks.

#7

@sampler, I’m not sure I understand the issue. You said:

possible optimized way Influx could deal with same data:
“myplugin/channels/setupname/data/Math/module1/power1/7000;0;0;2000011”,numerator=100,denumerator=1,clock=asynchronous 1549492316487504000
value=76.000000 0
value=86.000000 1000
value=96.000000 2000
value=106.000000 3000
value=116.000000 4000 

The optimal Line Protocol I posted earlier is pretty much exactly that…with named metrics instead of “value”. You could certainly just have a bunch of “value” fields per record but that, to me, seems meaningless.

#8

@Sam let me try to clarify…
0, 1000… 4000 are time offsets relative to the initial time 1549492316487504000, which is defined in the header. This avoids stating the full epoch time with each value. All the tags and the measurement name are also stated in the header.
This reduces the payload size massively.
Yes, indeed the field name value is redundant, and we could just write the values directly, something like:

“myplugin/channels/setupname/data/Math/module1/power1/7000;0;0;2000011”,numerator=100,denumerator=1,clock=asynchronous [value] [1549492316487504000]
76.000000 0
86.000000 1000
96.000000 2000
106.000000 3000
116.000000 4000 

So a fully expanded CSV of this case (without the tags and measurement name) would basically look like this:

time,value
1549492316487504000,76.000000
1549492316487505000,86.000000
1549492316487506000,96.000000
1549492316487507000,106.000000
1549492316487508000,116.000000
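The expansion from the header-plus-offsets block to full points is mechanical. Here is a minimal sketch of it; the function name `expand_offset_block` is hypothetical, and the input format is only my mock format from above (field name and base time in square brackets at the end of the header), not something InfluxDB accepts.

```python
def expand_offset_block(block):
    """Expand a header line plus 'value offset' rows into full line protocol."""
    header, *rows = block.strip().splitlines()
    # Header: "<measurement and tags> [<field name>] [<base time in ns>]"
    series_key, field_part, base_part = header.rsplit(" ", 2)
    field = field_part.strip("[]")
    base_ns = int(base_part.strip("[]"))
    lines = []
    for row in rows:
        value, offset = row.split()
        # One integer addition per point restores the absolute timestamp.
        lines.append(f"{series_key} {field}={value} {base_ns + int(offset)}")
    return lines
```

Each output line is then an ordinary point, identical to what we have to send today.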

Are we maybe any closer in terms of understanding?

#9

Yes, regarding only the timestamps, offsets would make the payload smaller. It would also, however, add more compute requirements to the storage engine. Design choice probably. It’s a good note to take back to our engineering/product team–thanks!

You can also use gzip to send data to Influx if payload size is really an issue: https://docs.influxdata.com/influxdb/v1.7/guides/writing_data/#configure-gzip-compression
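As a hedged sketch of what that looks like from a client, the snippet below builds (but does not send) a gzip-compressed write request using only the Python standard library. The base URL and database name are placeholders, and the function name is hypothetical; the `/write` endpoint and the `Content-Encoding: gzip` header are the parts described in the linked guide.

```python
import gzip
import urllib.request


def build_gzip_write_request(base_url, database, lines):
    """Build a POST request whose body is gzip-compressed line protocol."""
    body = gzip.compress("\n".join(lines).encode("utf-8"))
    return urllib.request.Request(
        url=f"{base_url}/write?db={database}&precision=ns",
        data=body,
        # Tells the server that the request body is gzip-compressed.
        headers={"Content-Encoding": "gzip"},
        method="POST",
    )


# Placeholder server and database; call urllib.request.urlopen(request) to send.
request = build_gzip_write_request(
    "http://localhost:8086",
    "sensors",
    ["1 value=76.000000 1549492316487504000"],
)
```

Long repeated measurement names compress well, so this mainly reduces network transfer; the text still has to be built on the client and parsed on the server.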

#10

@Sam, when parsing a string stream, converting characters to an integer is far more expensive than adding two integers. One integer addition is negligible compared with the difference between parsing a long string (epoch time) and a shorter one (offset). Gzip is a substantially more expensive operation and also serves a different purpose (it doesn’t make the internal operations any smarter).

Currently we need to repeat the (long) measurement name and the tags over and over for every single measurement point. Imagine how much processing power you could save in the engine if you knew that the next line of data belongs to the same location [tags x measurement] as the previous line, because it was already defined by the batch header. There would simply be no repeated characters to parse on every single data point.

Rendering such a payload on the client side also saves an enormous amount of processing power, so the performance of the entire system can be pushed further.

So, if we could put the data common to a batch (measurement name, tags, starting time, or even sample rate…) in the batch header (see my examples), that would enable massive savings for all parties. Please discuss all the suggestions I posted with your engineering team. Note that these are just pseudo mock-ups, and I’m aware they have to be refined to produce a well-thought-out, orthogonal and generic solution.

If you have any further questions or need any further advice/help, please let me know. Thanks for listening and all your help.