Flux pivot creating columns with special characters, issues with map

Hello,

Apologies for the long read, if I could buy you a beer for getting to the bottom, I would.

I’m collecting telemetry from devices. At scale, there will be 10000’s devices reporting many metrics (error and warn in my example data, but many more in practice) along with some tags, like version. My goal is to try and spot regressions, the delta between the average of some_field for version1 and the average of some_field for version2. The ultimate goal would be to alert when “the average of some_field for v2 is x% higher than v1” or basically, it looks like v2 is worse than v1 for some_field.
Unfortunately, I have no control over the verison tag and when I use a pivot, it is creating column names I cannot use.

Here is some sample data that works ok:

insert test,version=v1,macaddress=aa errors=4,warnings=6
insert test,version=v2,macaddress=bb errors=5,warnings=7
insert test,version=v2,macaddress=cc errors=14,warnings=16

I’m using pivot because I read that its more efficient that join and the data is in the same measurement.

from(bucket: "telegraf/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "test" 
  and (r._field == "errors"))
  |> drop(columns: ["macaddress"])
  |> aggregateWindow(
     every: 5m,
     fn: mean,
     createEmpty: false
  )
  |> pivot(rowKey:["_time"], columnKey: ["version"], valueColumn: "_value")  
  |> map(fn: (r) => ({
      _time: r._time,
      _field: "delta",
      _value: r.v2 - r.v1
    })
  ) 

This creates the expected result, apologies for formatting:

table 
_field  
_time  
_value
0       
delta 
2020-08-25T17:10:00Z
5.5

If I use some more realistic data:

insert test,version=v1_2019-a,macaddress=aa errors=4,warnings=6
insert test,version=v2_2020-b,macaddress=bb errors=5,warnings=7
insert test,version=v2_2020-b,macaddress=cc errors=14,warnings=16

With a modified map function:

  |> map(fn: (r) => ({
      _time: r._time,
      _field: "delta",
      _value: r.v2_2020-b - r.v1_2019-a
    })
  ) 

The map throws up an error, which I’m pretty sure is due to the “-”:

Error: type error 15.20-15:21: unidentified identifier "b"

I’ve tried looking at ways of rewriting the tags/column name using regex/replaceto remove the special characters but can’t figure it out.

I can workaround this problem using a join, but at scale this might be very inefficient, especially since I ultimately want to perfom the delta on all fields, which might run into the 100’s.

v1 = from(bucket: "telegraf/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "test" 
  and (r._field == "errors")
  and (r.version=="v1_2019-a"))
  |> truncateTimeColumn(unit: 5m)

  
v2 = from(bucket: "telegraf/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "test" 
  and (r._field == "errors")
  and (r.version=="v2_2020-b"))
  |> truncateTimeColumn(unit: 5m)


join(tables: {v1: v1, v2: v2}, on: ["_time"])
  |> map(fn: (r) => ({
      _time: r._time,
      _field: "delta",
      _value: r._value_v2 - r._value_v1
    })
  )

And finally, is there a clever way I can compare all the fields between versions programatically? Or if I have x fields, do I have to write x flux scritps to make the comparison?

Many thanks for reading!
I’m new to flux, so maybe there is a very simple answer to all of this…
Tom

1 Like

Hello @thopewell,
Thanks for writing such a detailed question!
Have you tried using the rename function with any luck?

Hi Anaisdg,
Thanks for the reply!
Great - it works, once I quote the original column names, so with the above data, this gets me the expected result:

from(bucket: "telegraf/autogen")
  |> range(start: -1d)
  |> filter(fn: (r) => r._measurement == "test" 
  and (r._field == "errors"))
  |> drop(columns: ["macaddress"])
  |> aggregateWindow(
     every: 5m,
     fn: mean,
     createEmpty: false
  )
  |> pivot(rowKey:["_time"], columnKey: ["version"], valueColumn: "_value")  
  |> rename(columns: {"v1_2019-a": "v1","v2_2020-b": "v2"})
  |> map(fn: (r) => ({
      _time: r._time,
      _field: "delta",
      _value: r.v2 - r.v1
    })
  ) 

Thanks for the help!

Another thing you can do in this situation is use the [] syntax to access the data.

  |> map(fn: (r) => ({
      _time: r._time,
      _field: "delta",
      _value: r["v2_2020-b"] - r["v1_2019-a"]
    })
  ) 

Using ["key"] works for non-standard fields and is meant for data like this where the column name is something not representable with an identifier.

This only works with string literals though so you cannot use a variable to access records in this way.

1 Like

Thanks Jonatahan, I double checked and that also works. Good to know, for other use cases too.