Get good series names in Flux

mathieulongtin · November 28, 2018, 9:05pm

I’m trying to build a dashboard where a queue is measured once an hour, but the data doesn’t arrive on the hour, it arrives at random when the measurement is done.

So far I have this:

from(bucket: "zzz/autogen")
  |> range(start: dashboardTime)
  |> filter(fn: (r) => r._measurement == "zzz" and (r._field == "queue_size"))
  |> aggregateWindow(every: 1h, fn: mean)
  |> group(by: ["sysname"])

Issues:

The series are all named “_value[sysname=something]”. Is there any way to have Chronograf or Flux name them by the “sysname” instead?
In aggregateWindow, why can’t I use “fn: max”?
If I have the group item, Chronograf only shows a set of boxes instead of data series names
If I don’t have the group item, the series have super long name with all variables involved

An aside:
Since just about every query will have a filter on measurement and on field, wouldn’t it make sense to have a separate function for those?

nathaniel · November 28, 2018, 9:20pm

Thanks for the detailed question. I can answer a few of those questions for you.

In aggregateWindow, why can’t I use “fn: max”?

This is because max and mean have different function signatures. Specifically the max function just selects the row that is the maximum per table given a single column. As such all columns on the original row are preserved. In contrast the mean function performs an aggregation across a list of columns and only those columns are preserved (in addition to the group key columns).

So that is the reason why max does not work with the aggregateWindow function, but obviously you would like to be able to use it. Perhaps we should create a similar helper function that manages the window for you but, given a selector (like max), makes it behave like an aggregate (like mean) so that you don’t have to manage that yourself.

I created this issue to track that idea: “Should be able to use selector functions with the aggregateWindow function” (Issue #336, influxdata/flux on GitHub).

As a workaround, you can make max behave like an aggregate by wrapping it in a function with the correct signature that uses only the first column.

// Wrap the max selector so it has the signature aggregateWindow expects.
maxAgg = (columns, tables=<-) => tables |> max(column: columns[0])

// ...
  |> aggregateWindow(every: 1h, fn: maxAgg)
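
Managing the window by hand, as mentioned above, could look roughly like the sketch below. It assumes that aggregateWindow is essentially a wrapper around window, the aggregate, and a final un-windowing step; the duplicate call and the every: inf un-window are assumptions about the Flux version in use. The bucket, measurement, and field names are taken from the original question.

```flux
from(bucket: "zzz/autogen")
  |> range(start: dashboardTime)
  |> filter(fn: (r) => r._measurement == "zzz" and r._field == "queue_size")
  // Split each series into one table per hour.
  |> window(every: 1h)
  // max is a selector: it keeps the single row with the largest _value in each window.
  |> max()
  // Stamp each result with the end of its window (assumed to mirror aggregateWindow).
  |> duplicate(column: "_stop", as: "_time")
  // Merge the hourly tables back into one table per series.
  |> window(every: inf)
```

Later Flux releases accept selectors such as max directly in aggregateWindow, so it is worth checking your version before adopting either workaround.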

An aside:
Since just about every query will have a filter on measurement and on field, wouldn’t it make sense to have a separate function for those?

We are planning on it. We have been looking to the community to figure out which combinations of these kinds of functions would be best to provide, so again, thank you for the feedback. This confirms our intuition.

chrishenn · November 28, 2018, 10:45pm

Hey @mathieulongtin ,

In Chronograf, the series names cannot currently be configured. I can certainly understand why you would want to though.

We’re planning to redesign how the legends work in Chronograf soon. I also opened an issue to investigate possible improvements to how the series are named in legends: https://github.com/influxdata/platform/issues/1615

To give you an idea of how we pick the names currently: the response to a Flux query consists of a collection of tables, with multiple columns in each table. A column in a table may be part of that table’s group key, which is what the line

|> group(by: ["sysname"])

specifies. If a column is part of the group key, then the value of that column is the same in every row within a particular table in the Flux response.

In other words, the group key specifies how tables are split up in a Flux response.

In Chronograf, we plot one series for each numeric column in each table. So to disambiguate series, we use the following information:

The name of the column being plotted
The key and value of each column in the group key for the table (you can think of this as a unique identifier for the table)

So in this case, _value is the column being plotted, and sysname = <some sysname> is the key and value of the single column in the group key. We format this as _value[sysname=<some sysname>].
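
Given that naming rule, one possible way to influence the label is to change the two inputs it is built from: the name of the plotted column and the group key. The sketch below is untested against Chronograf. It assumes a Flux version that provides rename and in which group takes a columns parameter (the version used in this thread spells it by). The bucket, measurement, and field come from the original question.

```flux
from(bucket: "zzz/autogen")
  |> range(start: dashboardTime)
  |> filter(fn: (r) => r._measurement == "zzz" and r._field == "queue_size")
  |> aggregateWindow(every: 1h, fn: mean)
  // Reduce the group key to the one tag that should identify each series.
  // In the Flux version used in this thread: group(by: ["sysname"]).
  |> group(columns: ["sysname"])
  // Rename the plotted column so the label starts with the metric name
  // rather than _value.
  |> rename(columns: {_value: "queue_size"})
```

If the rule holds, each series would be labelled along the lines of queue_size[sysname=<some sysname>].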

If I have the group item, Chronograf only shows a set of boxes instead of data series names

I’m not sure what you mean by this. Would you mind posting a screenshot?

mathieulongtin · November 29, 2018, 9:47pm

Isn’t the column name always going to be _value with Flux?

nathaniel · November 29, 2018, 10:09pm

No, the data returned from the from function always has a _value column, but further transformations on the data may rename or create new columns.

A few examples

Using map to create two columns:

from(bucket:"telegraf/autogen")
    |> range(start:-1m)
    |> filter(fn:(r) => r._measurement == "cpu" and r._field == "usage_idle")
    |> map(fn:(r) => ({_value: r._value, _value2: r._value*2}))

Using pivot to rotate the data so that each field is its own column

from(bucket:"telegraf/autogen")
    |> range(start:-1m)
    |> filter(fn:(r) => r._measurement == "cpu" and (r._field == "usage_idle" or r._field == "usage_user"))
    |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
    // Now data has no _value column but a usage_idle and a usage_user column

mathieulongtin · November 29, 2018, 10:34pm

How about this instead:

from(bucket:"telegraf/autogen", measurement: "cpu")
    |> range(start:-1m)
    |> select(columns: ["_time", "usage_idle", "usage_user"])

ps: Look at Spark for a good example of DataFrame processing, which is basically what Flux is trying to do.

mathieulongtin · November 30, 2018, 8:17pm

Ok, so I kind of figured out what the overall problem is here by looking at the raw data. It’s not always clear what outputs multiple tables and what outputs a single table. And sometimes Chronograf sees empty tables, resulting in empty square boxes in the legend.

I’m not sure that’s a feature, it’s hugely confusing. It should be one pipeline == one table, unless some function called groupInTables is called, or something equally explicit.

Chronograf should use only tags that don’t start with _ as labels. That would take care of the problem.

I ended up with this, because sysname is the only tag, no need to group. But I would be screwed if that wasn’t the case.

from(bucket: "zzz/autogen")
  |> range(start: dashboardTime)
  |> filter(fn: (r) => r._measurement == "zzz" and r._field == "queue_size")
  |> aggregateWindow(every: 1h, fn: mean)

Also this _value thing is a pain in the ass. I gave my metrics names, I should be able to use them.

suikast42 · April 3, 2019, 4:59pm

Hi, I am looking for the same possibility, especially in Grafana.

For example, I want to show my used disk space with

from(bucket: "telegraf")
  |> range(start: dashboardTime)
  |> filter(fn: (r) => r._measurement == "disk")

I want to add legend names like {{_tag.host}} {{_tags.device}} {{r._field }}

I hope that is planned. Otherwise this approach is not useful for business reporting.

OhadGal · February 8, 2020, 10:52pm

Try adding

|> keep(columns: ["_value", "_time", "_field"])

This will remove all other tags from the label.
Not optimal, but it will help somewhat.
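
Applied to the query from the top of the thread, the suggestion might look like the sketch below. One assumption is added: sysname is included in the kept columns so the series stay distinguishable. Columns that are not kept are also removed from the group key, which is why the label gets shorter.

```flux
from(bucket: "zzz/autogen")
  |> range(start: dashboardTime)
  |> filter(fn: (r) => r._measurement == "zzz" and r._field == "queue_size")
  |> aggregateWindow(every: 1h, fn: mean)
  // Keep only what the graph needs. Every other column, including _start,
  // _stop, and _measurement, is dropped and so leaves the group key.
  |> keep(columns: ["_time", "_value", "_field", "sysname"])
```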
