Count distinct string values in timeframe

First off: yes, I found some related topics, but none of them solved my problem.

I'm collecting data about the current state of each node in a specific cluster (idle, drain, down, alloc).
With the help of this forum I ended up with this query:

from(bucket: "cluster")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "nodestate")
  |> filter(fn: (r) => r["cluster"] == "m")
  |> filter(fn: (r) => r["_field"] == "state")
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
  |> drop(columns: ["cluster","_measurement"])
  |> group(columns: ["_value"])
  |> rename(columns: {_value: "State"})
  |> set(key: "Count", value: "statecount")
  |> count(column: "Count")

But this is not what I want. In the end I want the count of each distinct value per time window as the table value, so I can generate a graph of how the cluster is used. The script above gives me the count of each value over the whole time range.
So I have a query that filters out all the elements I don't need:

from(bucket: "cluster")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "nodestate")
  |> filter(fn: (r) => r["cluster"] == "m")
  |> filter(fn: (r) => r["_field"] == "state")
  |> drop(columns: ["node","cluster","_measurement"])

Can somebody give me a hint on how to get a table like

start | end | timeframe | _value | count-of-state-in-timeframe

so I can get a nice graphical overview of the cluster-node states over time? E.g. one line for each of the distinct values listed at the beginning.

Thanks

R.

@ra-albrecht This is a case where you want to use aggregateWindow() to calculate the count. It returns a stream of tables with aggregate values per window of time. Because you want a count of each distinct value, you could group by _value, but this will cause aggregateWindow() to fail because you can't aggregate a column that's in the group key. To accomplish the same thing, duplicate the _value column into a new column and add the new column to the group key. This gives you a unique table for each distinct value without losing the rows with those values.

from(bucket: "cluster")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "nodestate")
    |> filter(fn: (r) => r["cluster"] == "m")
    |> filter(fn: (r) => r["_field"] == "state")
    |> drop(columns: ["node","cluster","_measurement"])
    |> duplicate(column: "_value", as: "state")
    |> group(columns: ["_time", "_value"], mode: "except")
    |> aggregateWindow(every: v.windowPeriod, fn: count)
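
For reference, after duplicate() and group() each output table carries the node state in the duplicated state column, and after aggregateWindow(fn: count) the per-window count lands in _value. So, as a sketch, you could pull out a single state for closer inspection like this (assuming the query above):

```flux
from(bucket: "cluster")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "nodestate")
    |> filter(fn: (r) => r["cluster"] == "m")
    |> filter(fn: (r) => r["_field"] == "state")
    |> drop(columns: ["node","cluster","_measurement"])
    |> duplicate(column: "_value", as: "state")
    |> group(columns: ["_time", "_value"], mode: "except")
    |> aggregateWindow(every: v.windowPeriod, fn: count)
    // keep only the table for one state, e.g. "idle"
    |> filter(fn: (r) => r.state == "idle")
```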

Many thanks!

This looks very good. It seems I have to learn much more about the syntax :slight_smile: To get a useful diagram I had to change it:

from(bucket: "cluster")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "nodestate")
    |> filter(fn: (r) => r["cluster"] == "m")
    |> filter(fn: (r) => r["_field"] == "state")
    |> drop(columns: ["node","cluster","_measurement"])
    |> duplicate(column: "_value", as: "state")
    |> group(columns: ["_time", "_value"], mode: "except")
    |> aggregateWindow(every: 1m, fn: count)

E.g. I had to set the aggregation window to one minute (the measurement interval), because using v.windowPeriod resulted in strange behavior in the result data.

And it looks like I have to skip the very first window, because the count there is always 0.
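
One way to hide those empty leading windows is to filter out zero counts after the aggregation. A sketch, assuming the query above (note this also drops any genuine zero-count windows in the middle of the range, which may or may not be what you want):

```flux
from(bucket: "cluster")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "nodestate")
    |> filter(fn: (r) => r["cluster"] == "m")
    |> filter(fn: (r) => r["_field"] == "state")
    |> drop(columns: ["node","cluster","_measurement"])
    |> duplicate(column: "_value", as: "state")
    |> group(columns: ["_time", "_value"], mode: "except")
    |> aggregateWindow(every: 1m, fn: count)
    // after count(), _value holds the per-window count; drop empty windows
    |> filter(fn: (r) => r._value > 0)
```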

Many thanks again!