Lazy Grafana dashboarding using field vs tag

thopewell · April 8, 2021, 9:33pm

Hello,

I’m collecting telemetry from devices. The field names follow a convention: for example, error counts start with error_. I want to graph the fields over time for a specific device, probably using an aggregateWindow function.
I had the mac address (used to uniquely identify the device) as a tag until I hit cardinality issues.
With mac as a tag, an approximation of my data in line protocol looks like:

telemetry_mac_is_tag,model=a,mac=abc error_counta=10,error_countb=100,error_countc=1000 1616544000
telemetry_mac_is_tag,model=a,mac=def error_counta=1,error_countb=1,error_countc=1 1616544000
telemetry_mac_is_tag,model=a,mac=abc error_counta=20,error_countb=200,error_countc=2000 1616544060
telemetry_mac_is_tag,model=a,mac=def error_counta=2,error_countb=2,error_countc=2 1616544060

With mac as a tag, I can easily plot the errors for a specific mac and critically, if I decide to add more fields like error_countd in the device, I don’t need to modify the query, error_countd just pops out in the graph:

from(bucket: "test-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "telemetry_mac_is_tag")
  |> filter(fn: (r) => r["mac"] == "abc")
  |> filter(fn: (r) => r["_field"] =~ /error_count/)
  |> aggregateWindow(every: 5m, fn: sum, createEmpty: false)
  |> group()

Now, having moved to mac as a field, it seems I can’t be lazy and have to copy/paste the queries changing a few things. My new line protocol approximation looks like:

telemetry_mac_is_field,model=a error_counta=10,error_countb=100,error_countc=1000,mac="abc" 1616544001
telemetry_mac_is_field,model=a error_counta=1,error_countb=1,error_countc=1,mac="def" 1616544002
telemetry_mac_is_field,model=a error_counta=20,error_countb=200,error_countc=2000,mac="abc" 1616544063
telemetry_mac_is_field,model=a error_counta=2,error_countb=2,error_countc=2,mac="def" 1616544064

And my query:

import "influxdata/influxdb/schema"
from(bucket: "test-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "telemetry_mac_is_field")
  |> schema.fieldsAsCols()
  |> filter(fn: (r) => r["mac"] == "abc")
  |> aggregateWindow(every: 5m, column: "error_counta", fn: sum, createEmpty: false)

It seems I have to change the value of “column” in the aggregateWindow function and now when I add error_countd, I need to create a new query just to plot error_countd.
Is there an easier way to do this, am I missing something?

Thanks,
Tom

Chris_Wolff · April 9, 2021, 4:50pm

Hi Tom.

I think it’s really better for mac address to be a tag: if two devices happen to write points to the error count fields with the same timestamp and the same tag set, one of them will get overwritten, and you won’t have any way to know which device the point that “won” belongs to.

Out of curiosity, how many different devices are you dealing with? And what version of InfluxDB are you using? Since it seems like mac address is ideally a tag in InfluxDB’s data model, I wonder if there are other ways to address the resulting cardinality issues.

Maybe you are thinking that points from different devices colliding is okay because timestamps have nanosecond resolution, so it will be infrequent enough not to be a real problem. To get the same behavior that you had before, you would need to undo the schema.fieldsAsCols() transformation after filtering for the particular mac address you care about. Offhand I’m not sure of a way to do that, but maybe there is.

Cheers,
Chris

thopewell · April 9, 2021, 5:30pm

Hi Chris,

Thanks for the reply.

I’m on Influx cloud and am anticipating 100k+ devices. I originally had mac as a tag since it’s always going to be used in the sense “where mac =”, so it makes sense to have “mac” indexed.

I anticipate retaining metrics for long enough that a mac will be associated with multiple versions over the retention time of the metrics. I also have another measurement tracking online/offline status and during scale tests with around 70k devices, with mac as a tag, we hit the 1 million cardinality limit and Influx stopped accepting new data.

It seems in Influx 2.x the number of fields counts towards the cardinality, I understand the formula is
“#measurements” x “unique tag set” x “# field keys”… so even with a unique tag set of say 2 (1 mac with 2 versions) at 100k devices with 5 fields, we quickly get to 1 million.
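
As a quick check of that arithmetic, taking the formula exactly as quoted above and assuming a single measurement (the separate online/offline measurement mentioned earlier would add its own series on top):

$$
\text{series} = \underbrace{1}_{\text{measurements}} \times \underbrace{(100{,}000 \times 2)}_{\text{unique tag sets}} \times \underbrace{5}_{\text{field keys}} = 1{,}000{,}000
$$

So under these assumptions, 100k devices with two versions each and five fields already reach the 1 million limit.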

I managed to recreate the “collision” issue you describe when putting together this example data!
There are a few other tags I haven’t included, but I’m confident it’s unlikely that two devices with the same tag set will try to send telemetry at the same nanosecond.

But I agree, it fundamentally seems better to have mac as a tag. In any case, mac as a field is working and it’s really just inconvenient that I have to update dashboards looking at the data on an individual device level. Looking at the data across the population is unaffected. I don’t pay more for extra processing scanning the data when mac is a field, but I guess I am paying slightly more as I have to make more queries…

Would be nice if there was a way to reverse the fieldsAsCols!

Thanks,
Tom

Chris_Wolff · April 16, 2021, 4:41pm

Hi Tom,

We have an existing issue for reversing fieldsAsCols(), we call it “unpivot”. The issue for it is here:

github.com/influxdata/flux

Add unpivot functionality

opened 06:27PM - 20 Feb 20 UTC

closed 04:24PM - 10 Nov 22 UTC

sanderson

team/query

Every now and then, I run into a use case where it would be really useful to have the ability to unpivot data. For example, I defined this custom `minMaxMean()` function that uses `reduce()` to output the min, max, and mean values for each table. The issue is that, as-is, I can't 1) visualize the results in the UI or 2) write them back to the db because the output schema doesn't meet the requirements for writing back into InfluxDB.

```js
import "experimental"

minMaxMean = (tables=<-) =>
  tables
    |> reduce(
      identity: {count: 0, sum: 0.0, min: 0.0, max: 0.0, mean: 0.0},
      fn: (r, accumulator) => ({ r with
        count: accumulator.count + 1,
        sum: r._value + accumulator.sum,
        min: if accumulator.count == 0 then r._value else if r._value < accumulator.min then r._value else accumulator.min,
        max: if accumulator.count == 0 then r._value else if r._value > accumulator.max then r._value else accumulator.max,
        mean: if accumulator.count == 0 then r._value else (r._value + accumulator.sum) / float(v: accumulator.count + 1)
      })
    )
    |> drop(columns: ["count", "sum"])
```

To accomplish those things, I have to create multiple filtered streams, then union them back together:

```js
data = from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "mem")
  |> filter(fn: (r) => r._field == "used_percent")
  |> window(every: v.windowPeriod)
  |> minMaxMean()
  |> duplicate(column: "_stop", as: "_time")
  |> window(every: inf)

min = data
  |> map(fn: (r) => ({r with _value: r.min, _metricType: "min"}))
  |> drop(columns: ["max", "mean"])

max = data
  |> map(fn: (r) => ({r with _value: r.max, _metricType: "max"}))
  |> drop(columns: ["min", "mean"])

mean = data
  |> map(fn: (r) => ({r with _value: r.mean, _metricType: "mean"}))
  |> drop(columns: ["min", "max"])

union(tables: [min, max, mean])
  |> experimental.group(columns: ["_metricType"], mode: "extend")
```

This could be simplified with an `unpivot()` function.

```
unpivot(
  columns: ["col1", "col2", "col3"],
  columnDst: "_metricType",
  valueDst: "_value"
)
```

So given the following input data:

| _time | col1 | col2 | col3 |
| ----- | ---- | ---- | ---- |
| 0001 | val1.1 | val2.1 | val3.1 |
| 0002 | val1.2 | val2.2 | val3.2 |
| 0003 | val1.3 | val2.3 | val3.3 |

`unpivot()` would output:

| _time | _metricType | _value |
| ----- | ----------- | ------ |
| 0001 | col1 | val1.1 |
| 0001 | col2 | val2.1 |
| 0001 | col3 | val3.1 |
| 0002 | col1 | val1.2 |
| 0002 | col2 | val2.2 |
| 0002 | col3 | val3.2 |
| 0003 | col1 | val1.3 |
| 0003 | col2 | val2.3 |
| 0003 | col3 | val3.3 |

Feel free to comment or +1 on it.
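
For readers arriving later: the issue above is shown as closed (10 Nov 22), and newer Flux releases include an `experimental.unpivot()` function. Assuming your Flux version provides it, a sketch of Tom's second query with the pivot reversed after the mac filter might look like the following. The `experimental.group()` step is an assumption on my part, added so the string `mac` column sits in the group key and is not unpivoted alongside the numeric fields. This has not been run against the data in this thread.

```flux
import "experimental"
import "influxdata/influxdb/schema"

from(bucket: "test-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "telemetry_mac_is_field")
  // pivot so that each row carries its own mac value
  |> schema.fieldsAsCols()
  |> filter(fn: (r) => r["mac"] == "abc")
  // assumption: add mac to the group key so unpivot leaves it alone
  |> experimental.group(columns: ["mac"], mode: "extend")
  // turn the remaining error_count* columns back into _field/_value rows
  |> experimental.unpivot()
  |> aggregateWindow(every: 5m, fn: sum, createEmpty: false)
```

Because the query never names an individual field, a new field such as error_countd should appear without changes, as it did when mac was a tag.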

During a discussion about this today someone came up with a potential solution for your particular problem. If your query splits your data into the “mac” field on one side and the remaining fields on the other side, you can join them, and that would seem to get you what you want:

import "experimental"

rawData = from(bucket: "test-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "telemetry_mac_is_field")

macData = rawData 
  |> filter(fn: (r) => r._field == "mac" and r._value == "abc")
  // remove _field from the group key
  |> group(columns: ["model"])

otherFields = rawData 
  |> filter(fn: (r) => r._field != "mac")
  // remove _field from the group key
  |> group(columns: ["model"])

experimental.join(left: macData, right: otherFields, fn: (left, right) => ({right with mac: left._value}))
  |> group(columns: ["model", "mac", "_field", "_measurement"])
  |> aggregateWindow(every: 5m, fn: sum, createEmpty: false)

The result of this query should look like the output of your first query.

Hope this helps!

Cheers,
Chris

thopewell · April 19, 2021, 11:12pm

Hi Chris,

Just wanted to quickly follow up and thank you for looking into this.
I’m hitting the error:

 runtime error @8:6-8:64: filter: cannot compile @ 8:17-8:63: unsupported binary expression float == string

I’m trying to figure out why, it looks like it should work.

My test data aged out and I might have made a typo recreating it.
I get the same error on the “real data” too, but wouldn’t expect it to work there first time without some further tweaks.
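
A possible reading of the error, offered as a guess: position 8:17-8:63 is the body of the `macData` filter in the query above, and `rawData` still contains the float `error_count*` fields at that point, so `r._value == "abc"` is compiled against float tables as well as the string `mac` table. One workaround to try, sketched here and not verified against this data, is to convert `_value` with `string()` before comparing, so the predicate type-checks for every field:

```flux
rawData = from(bucket: "test-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "telemetry_mac_is_field")

macData = rawData
  // string() makes this a string == string comparison for every table,
  // including the float error_count* tables that the filter discards
  |> filter(fn: (r) => r._field == "mac" and string(v: r._value) == "abc")
  // remove _field from the group key
  |> group(columns: ["model"])
```

This is only the `rawData` and `macData` part; the `otherFields` and join steps would stay as posted.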

Will post back here soon…
Longer term, I think Influx IOx is what we need!

Thanks,
Tom
