Lazy Grafana dashboarding using field vs tag

Hello,

I’m collecting telemetry from devices. The field names follow a convention, like counts of errors start error_. I want Graph the fields over time for a specific device, probably using an aggregateWindow function.
I had the mac address (used to uniquely identify the device) as a tag until I hit cardinality issues.
With mac as a tag, an approximation of my data in line protocol looks like:

telemetry_mac_is_tag,model=a,mac=abc error_counta=10,error_countb=100,error_countc=1000 1616544000
telemetry_mac_is_tag,model=a,mac=def error_counta=1,error_countb=1,error_countc=1 1616544000
telemetry_mac_is_tag,model=a,mac=abc error_counta=20,error_countb=200,error_countc=2000 1616544060
telemetry_mac_is_tag,model=a,mac=def error_counta=2,error_countb=2,error_countc=2 1616544060

With mac as a tag, I can easily plot the errors for a specific mac and critically, if I decide to add more fields like error_countd in the device, I don’t need to modify the query, error_countd just pops out in the graph:

from(bucket: "test-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "telemetry_mac_is_tag")
  |> filter(fn: (r) => r["mac"] == "abc")
  |> filter(fn: (r) => r["_field"] =~ /error_count/)
  |> aggregateWindow(every: 5m, fn: sum, createEmpty: false)
  |> group()

Now, having moved to mac as a field, it seems I can’t be lazy and have to copy/paste the queries changing a few things. My new line protocol approximation looks like:

telemetry_mac_is_field,model=a error_counta=10,error_countb=100,error_countc=1000,mac="abc" 1616544001
telemetry_mac_is_field,model=a error_counta=1,error_countb=1,error_countc=1,mac="def" 1616544002
telemetry_mac_is_field,model=a error_counta=20,error_countb=200,error_countc=2000,mac="abc" 1616544063
telemetry_mac_is_field,model=a error_counta=2,error_countb=2,error_countc=2,mac="def" 1616544064

And my query:

import "influxdata/influxdb/schema"
from(bucket: "test-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "telemetry_mac_is_field")
  |> schema.fieldsAsCols()
  |> filter(fn: (r) => r["mac"] == "abc")
  |> aggregateWindow(every: 5m, column:"error_counta", fn: sum, createEmpty: false)

It seems I have to change the value of “column” in the aggregateWindow function and now when I add error_countd, I need to create a new query just to plot error_countd.
Is there an easier way to do this, am I missing something?

Thanks,
Tom

Hi Tom.

I think that it’s really better for mac address to be a tag, since if you happen to have two devices that create points in the error count fields at the same time, one of them will get overwritten, and you won’t have any way to know which device the point that “won” belongs to.

Out of curiosity, how many different devices are you dealing with? And what version of InfluxDB are you using? Since it seems like mac address is ideally a tag in InfluxDB’s data model, I wonder if there’s other ways to address the resulting cardinality issues.

Maybe you are thinking of the issue of points from different devices colliding is okay because timestamps have nanosecond resolution, so it will be infrequent enough to not be a real problem. To get the same behavior that you had before, you would need to undo the schema.fieldsAsCols() transformation after filtering for the particular mac address you care about. Offhand I’m not sure of a way to do that, but maybe there is.

Cheers,
Chris

Hi Chris,

Thanks for the reply.

I’m on Influx cloud and am anticipating 100k+ devices. I originally had mac as a tag since its always going to be used in the sense “where mac =” so it makes sense to have “mac” indexed.

I anticipate retaining metrics for long enough that a mac will be associated with multiple versions over the retention time of the metrics. I also have another measurement tracking online/offline status and during scale tests with around 70k devices, with mac as a tag, we hit the 1 million cardinality limit and Influx stopped accepting new data.

It seems in Influx 2.x the number of fields counts towards the cardinality, I understand the formula is
#measurements” x “unique tag set” x “# field keys”… so even with a unique tag set of say 2 (1 mac with 2 versions) at 100k devices with 5 fields, we quickly get to 1 million.

I managed to recreate the “collision” issue you describe when putting together this example data!
There are a few other tags I haven’t included but I’m confident its unlikely two devices with the same tag set will try and send telemetry at the same nanosecond.

But I agree, it fundamentally seems better to have mac as a tag. In any case, mac as a field is working and its really just inconvenient that I have to update dashboards looking at the data on an individual device level. Looking at the data across the population is unaffected. I don’t pay more for extra processing scanning the data when mac is a field, but I guess I am paying slightly more as I have to make more queries…

Would be nice if there was a way to reverse the fieldsAsCols!

Thanks,
Tom