One vs many fields when mixing telemetry from different customers

spaceandrew · November 6, 2019, 5:30pm

I have a schema design question for you.

Imagine data was flowing from satellites, with records containing satellite name, subsystem name, metric name, metric value, and timestamp.

Basically, telemetry that could be restructured to look something like this:

telemetry,satellite=Hubble,subsystem=gps lat=90,lng=10 1465839830100400200
telemetry,satellite=Hubble,subsystem=camera fov=90 1465839830100400300
telemetry,satellite=ISS,subsystem=gps lat=10,lng=20 1465839830100400400

etc.

Users would want to be able to compare metrics (such as the gps values above) across satellites and subsystems.

My questions are:

Is it better to store single or multiple fields per entry in InfluxDB? That is, we could store multiple metrics per entry by using each metric name as a field key, as in the line protocol example above. Alternatively, we could add a metric tag and a single value field, like this:

telemetry,satellite=Hubble,subsystem=gps,metric=lat value=90 1465839830100400200
telemetry,satellite=Hubble,subsystem=gps,metric=lng value=10 1465839830100400200

Note that it’s sparse: many metrics (fields) will only exist on a single subsystem. Is one better than the other? Are there performance implications?
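For anyone who wants to try both layouts on the same data, here is a minimal Python sketch that rewrites a multi-field line into the metric-tag/value form. The helper name `split_fields` is made up for illustration, and the parsing is deliberately simplified: it assumes no escaped spaces or commas and no string field values, so it is not a general line protocol parser.

```python
def split_fields(line: str) -> list[str]:
    """Rewrite one multi-field line as one 'metric' tag + 'value' field line per field.

    Simplified on purpose: assumes the line has exactly three space-separated
    parts (series key, field set, timestamp) and no escaped characters.
    """
    series, fields, timestamp = line.split(" ")
    out = []
    for pair in fields.split(","):
        key, value = pair.split("=", 1)
        # The field key becomes a tag value; the field itself is always "value".
        out.append(f"{series},metric={key} value={value} {timestamp}")
    return out


multi = "telemetry,satellite=Hubble,subsystem=gps lat=90,lng=10 1465839830100400200"
for converted in split_fields(multi):
    print(converted)
```

Run on the first example line, this should print the two metric=lat / metric=lng lines shown above.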

Would it make sense to replace the measurement name (currently “telemetry”) with the satellite name (e.g., “Hubble”, “ISS”), making many more measurements that each have smaller cardinality? Could users still compare across satellites? E.g.:

hubble,subsystem=gps lat=90,lng=10 1465839830100400200 # with many fields
hubble,subsystem=gps,metric=lat value=90 1465839830100400200 # or with metric as a tag instead of a field

Again, are there performance implications to consider?

Any thoughts are much appreciated!

daniel · November 6, 2019, 10:15pm

In my experience there are minor performance advantages to using multiple fields: the line protocol has less duplication and is slightly faster to parse. There may be some additional performance differences, but I am under the impression that they would be very minor, and the cardinality is the same.
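A rough way to see the duplication point is to compare the size of the same GPS reading in both encodings. This is only a sketch using the example lines from the question; the helper `payload_bytes` is a made-up name, and byte count is just a proxy for write payload size, not a measurement of parse or storage cost.

```python
# The same GPS reading, encoded both ways (lines taken from the question).
multi_field = [
    "telemetry,satellite=Hubble,subsystem=gps lat=90,lng=10 1465839830100400200",
]
tag_value = [
    "telemetry,satellite=Hubble,subsystem=gps,metric=lat value=90 1465839830100400200",
    "telemetry,satellite=Hubble,subsystem=gps,metric=lng value=10 1465839830100400200",
]


def payload_bytes(lines):
    """Size of a newline-joined write payload in bytes."""
    return len("\n".join(lines).encode("utf-8"))


print("multi-field:", payload_bytes(multi_field), "bytes")
print("metric tag: ", payload_bytes(tag_value), "bytes")
```

The metric-tag form repeats the measurement, the tag set, and the timestamp once per metric, so the gap grows with the number of metrics per reading.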

In Telegraf we prefer the first format for style reasons. The measurement name gives you a nice level of namespacing when you have many metrics. Using field names like lat and lng allows you to use the faster line protocol encoding and also avoids a meaningless name like “value”.

When it comes to queries, the first two forms work well and are similar, but I would avoid the third form (one measurement per satellite), since it will be harder to compare across satellites with InfluxQL. When using Flux to query, all forms would work. Depending on the tool you are using to visualize the data, one form may work better than others for querying and comparing. I suggest trying out some example data to see how the forms work.
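To make the query difference concrete, here is a small Python sketch that builds the InfluxQL you might write to compare lat across satellites in each of the three layouts. The function name and the measurement name "iss" are assumptions for illustration (only "hubble" appears in the question), and the queries have not been run against a real server, so treat them as a starting point.

```python
def compare_lat_queries(satellites):
    """Return one illustrative InfluxQL query per schema layout.

    `satellites` is only needed for the per-satellite layout, where every
    measurement has to be listed explicitly in the FROM clause.
    """
    quoted = ", ".join(f'"{name}"' for name in satellites)
    return {
        # Form 1: one "telemetry" measurement, metrics as fields.
        "fields": (
            'SELECT "lat" FROM "telemetry" '
            "WHERE \"subsystem\" = 'gps' GROUP BY \"satellite\""
        ),
        # Form 2: one "telemetry" measurement, metric name as a tag.
        "metric_tag": (
            'SELECT "value" FROM "telemetry" '
            "WHERE \"subsystem\" = 'gps' AND \"metric\" = 'lat' "
            'GROUP BY "satellite"'
        ),
        # Form 3: one measurement per satellite (assumed lowercase names).
        "per_satellite": (
            f'SELECT "lat" FROM {quoted} ' "WHERE \"subsystem\" = 'gps'"
        ),
    }


for layout, query in compare_lat_queries(["hubble", "iss"]).items():
    print(f"{layout}: {query}")
```

In the first two layouts the satellite is a tag, so adding a satellite does not change the query. In the third, the list of measurements in FROM has to be kept up to date (or matched some other way), which is one way the comparison gets harder.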
