We’re planning to use various field keys, like “chart_linear_temperature[°C]”, “chart_linear_humidity[%]”, “chart_bar_dewpoint[°C]”, etc., for our measurement.
Is there any limitation on the number of field keys (we expect more than 200), or is there a recommended approach for handling a wide range of sensor data that encodes display type, label, and unit?
@h2n
If you’re planning on having a multitude of sensors, I don’t recommend encoding the sensor in the field keys.
That’s because fields aren’t indexed; a field only stores your value.
However, you can use tags:
A tag is roughly what you put in your “where” clause; in your case, that’s the sensor.
There aren’t many limitations on naming; some special characters like commas, equals signs, and spaces need to be escaped (see the docs).
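To illustrate the escaping rules, here is a minimal Python sketch (the helper name is my own) that escapes a tag key or value for InfluxDB line protocol, where commas, equals signs, and spaces must be backslash-escaped:

```python
def escape_tag(value: str) -> str:
    """Escape a tag key or tag value for InfluxDB line protocol.

    Commas, equals signs, and spaces are prefixed with a backslash.
    """
    return (value.replace(",", "\\,")
                 .replace("=", "\\=")
                 .replace(" ", "\\ "))

print(escape_tag("Sensor 1"))    # Sensor\ 1
print(escape_tag("a=b,c"))       # a\=b\,c
```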
Having more than 200 metrics to track should not cause any problems as long as you don’t exceed certain limits (see “General limitations” below), which only comes into play if you have huge strings.
From what I’ve seen so far, there are two main approaches.
1. The Normal Structure
This is probably the one you are already planning: tags are used to provide context for the metrics, and each metric has its own field (column).
Sample:

| Time | Machine  | Sensor  | Temperature[C] | Something |
|------|----------|---------|----------------|-----------|
| x    | Machine1 | Sensor1 | 10             | 12        |
| y    | Machine1 | Sensor1 | 11             | 13        |
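To make the normal structure concrete, here is a hedged sketch (the measurement name `machines`, the field names, and the helper are my own invention, not from the thread) of how one point with two fields would look in line protocol, where all metrics share a single row:

```python
def to_line(measurement, tags, fields, ts):
    """Render one point in InfluxDB line protocol:
    measurement,tag=...,tag=... field=value,... timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in tags.items())
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_str} {field_str} {ts}"

line = to_line(
    "machines",
    {"machine": "Machine1", "sensor": "Sensor1"},
    {"temperature_c": 10, "something": 12},
    1700000000000000000,
)
print(line)
# machines,machine=Machine1,sensor=Sensor1 temperature_c=10,something=12 1700000000000000000
```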
2. A sort of EAV structure (entity-attribute-value)
Here you use tags to track the context of the metric (so far everything is as before), plus a tag “metric” or “counter” that defines what the value in the row represents. In the end you have only one field, “value” (if you need different data types, more fields will be needed).
Sample:

| Time | Machine  | Sensor  | counter        | value |
|------|----------|---------|----------------|-------|
| x    | Machine1 | Sensor1 | Temperature[C] | 10    |
| x    | Machine1 | Sensor1 | Something      | 12    |
| y    | Machine1 | Sensor1 | Temperature[C] | 11    |
| y    | Machine1 | Sensor1 | Something      | 13    |
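For comparison, the EAV shape emits one point per metric instead of one point with many fields. A sketch under the same invented names as before (`machines`, `counter`, `value` as described in the text):

```python
def eav_lines(measurement, tags, metrics, ts):
    """One line-protocol point per metric: the metric name goes into a
    'counter' tag and the reading into a single 'value' field."""
    base = ",".join([measurement] + [f"{k}={v}" for k, v in tags.items()])
    return [f"{base},counter={name} value={val} {ts}"
            for name, val in metrics.items()]

for line in eav_lines(
    "machines",
    {"machine": "Machine1", "sensor": "Sensor1"},
    {"temperature_c": 10, "something": 12},
    1700000000000000000,
):
    print(line)
# machines,machine=Machine1,sensor=Sensor1,counter=temperature_c value=10 1700000000000000000
# machines,machine=Machine1,sensor=Sensor1,counter=something value=12 1700000000000000000
```

Note that each new metric is just another row, which is exactly the flexibility described below.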
Pros:
- Extremely flexible if you need to add more metrics; you will simply have more points (rows)
- Renaming “counters” causes fewer issues, since you manage different rows instead of different columns; the structure stays “clean” (old rows will eventually disappear thanks to the retention policy (RP))
Cons:
- Not always comfortable to query; you might end up needing one query per metric you want to show in a chart
- It is not immediately clear what the measurement contains
General limitations
- Maximum key size: the point key (measurement + tag set) cannot exceed 64 KB
- Maximum body size: the HTTP request size is configurable, and the limit can be disabled (see the docs)
- Encoding: Telegraf and InfluxDB encode strings as UTF-8; before using special characters, check that they are actually printable in UTF-8
Here is a list of my personal suggestions:
- Avoid special characters when possible; they generally make querying harder
- Put the unit of measure (UM) inside the field name, so it is immediately clear what the data represents. (Tools like Grafana can automatically display the data with the best unit once you tell them which unit the field uses)
- If the metrics are of several “types” (cumulative value, instantaneous value, etc.), it might be useful to put this information in the field name itself, so you know how to manage each field correctly without having to inspect its data
- If you have all those metrics (over 200), it might not be easy to navigate/find them; consider using different measurements (tables) where appropriate
- I’m not a fan of the EAV design, since it limits how you can query the data (which may be more or less mitigated by the tool you use). If you can decide the structure of your data, use a field for each metric
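As a sketch of the naming suggestions above (this convention and the helper are my own illustration, not a standard), a field name could pack the metric, its type, and its unit into one lowercase, underscore-separated name, avoiding the special characters that complicate querying:

```python
def field_name(metric: str, kind: str, unit: str) -> str:
    """Build a field name that encodes the metric type and unit,
    e.g. 'temperature_inst_c' for an instantaneous temperature in C."""
    return "_".join(part.lower() for part in (metric, kind, unit))

print(field_name("temperature", "inst", "C"))   # temperature_inst_c
print(field_name("energy", "cum", "kWh"))       # energy_cum_kwh
```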