Must use DROP to calculate SUM?

Hello, I am getting very strange result without using drop in flux for particular string.

  1. When the DROP is not applied (1.case) to msgid field, looks like there is no calculation for SUM. Why do I need to drop msgid field in order to get the sum result (2.case)?
  2. Even when not apply DROP (1.case) and use SUM function for _value field, why the SUM is not calculated ? I am pointing sum up _value fields which are numbers so just calculate those or ?

Thanks for helping out to understand

1. case: Not applying DROP (no SUM result)

from(bucket: "test")
  |> range(start: -15m)
  |> filter(fn: (r) =>
    r._measurement == "quantity" and
    r._field == "countfld")
|> sum(column: "_value")

#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,long
    #group,false,false,true,true,true,true,true,true,true,true,true,true,false
    #default,_result,,,,,,,,,,,,
    ,result,table,_start,_stop,_field,_measurement,deviceid,green,msgid,red,trigger,yellow,_value
    ,,0,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,265,0,cycle,0,1
    ,,1,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,266,0,cycle,0,1
    ,,2,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,267,0,cycle,0,1
    ,,3,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,268,0,cycle,0,1
    ,,4,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,269,0,cycle,0,1
    ,,5,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,270,0,cycle,0,1
    ,,6,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,271,0,cycle,0,1
    ,,7,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,272,0,cycle,0,1
    ,,8,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,273,0,cycle,0,1
    ,,9,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,274,0,cycle,0,1
    ,,10,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,275,0,reject,0,1
    ,,11,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,276,0,cycle,0,1
    ,,12,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,277,0,cycle,0,1
    ,,13,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,278,0,cycle,0,1
    ,,14,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,279,0,cycle,0,1
    ,,15,2020-01-17T12:53:00.29679007Z,2020-01-17T13:08:00.29679007Z,countfld,quantity,test12,0,280,0,cycle,0,1

2. case: Applying DROP (sum is OK)

from(bucket: "test")
  |> range(start: -15m)
  |> filter(fn: (r) =>
    r._measurement == "quantity" and
    r._field == "countfld")
|> drop(columns: ["msgid"])
|> sum(column: "_value")

#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,long
#group,false,false,true,true,true,true,true,true,true,true,true,false
#default,_result,,,,,,,,,,,
,result,table,_start,_stop,_field,_measurement,deviceid,green,red,trigger,yellow,_value
,,0,2020-01-17T12:57:00.699043677Z,2020-01-17T13:12:00.699043677Z,countfld,quantity,test12,0,0,cycle,0,15
,,1,2020-01-17T12:57:00.699043677Z,2020-01-17T13:12:00.699043677Z,countfld,quantity,test12,0,0,reject,0,1

@salvq What version of Flux are you using?

See below, also examples above are triggered via Grafana, have not tried in influxdb yet.

root@189d6517cfa3:/# influx -type=flux -database test -username admin -password XXXX
Connected to http://localhost:8086 version 1.7.9
InfluxDB shell version: 1.7.9
>

Just tried in influxdb directly and same results as in grafana

> from(bucket: "test")
  |> range(start: -15m)
  |> filter(fn: (r) =>
    r._measurement == "quantity" and
    r._field == "countfld")
|> drop(columns: ["msgid"])
|> sum(column: "_value")
Result: _result
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:41:26.876047238Z  2020-01-17T20:56:26.876047238Z                countfld                quantity                 test12                       0                       0                   cycle                       0                          13

> from(bucket: "test")
  |> range(start: -15m)
  |> filter(fn: (r) =>
    r._measurement == "quantity" and
    r._field == "countfld")
|> sum(column: "_value")
Result: _result
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                       1                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     794                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     795                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     796                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     797                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     798                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     799                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     800                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     801                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     802                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     803                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     804                       0                   cycle                       0                           1
Table: keys: [_start, _stop, _field, _measurement, deviceid, green, msgid, red, trigger, yellow]
                   _start:time                      _stop:time           _field:string     _measurement:string         deviceid:string            green:string            msgid:string              red:string          trigger:string           yellow:string                  _value:int
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  ----------------------  --------------------------
2020-01-17T20:42:25.021238475Z  2020-01-17T20:57:25.021238475Z                countfld                quantity                 test12                       0                     805                       0                   cycle                       0                           1
>

Looking a little closer at the data before drop(), I see that each point is actually in its own table. sum() operates on each table in a stream of tables, so running sum() on tables where each only has one row will simply return that row and stream of tables unchanged. The reason your data is structured in one-row tables is because msgid is part of the group key.

Group keys define how tables are grouped. They are a list of columns for which all rows in the table share the same value. The value of msgid is different for each point and, because msgid is part of the group key, the unique value in each row forces the point into a new table. By dropping msgid, you’re removing the data that makes each row unique (other than _value and _time which aren’t in the group key), so everything then gets grouped into a single table and sum() works as you expect.

If you want to preserve the msgid column, but still calculate the sum as you would by dropping the column, you need to redefine the group key with group(). I’d use the except mode for group():

from(bucket: "test")
  |> range(start: -15m)
  |> filter(fn: (r) =>
    r._measurement == "quantity" and
    r._field == "countfld")
  |> group(columns: ["_time", "_value", "msgid"], mode: "except")
  |> sum(column: "_value")

Here’s a video that walks through how this all works in Flux. 9:16 to 17:04 will help I think.

Great, thanks for the link to video. Will definitely look to understand

Appreciate your help…

@scott I just looked at whole video, great and helped me a lot. Many of my questions have been answered but there is still one…

  • How group keys are defined before group function i.e. after range and filter function without group function ? Or making group keys is fundamental step (best practice) to define group after filtering the data ?

Thanks

@salvq Great question! The underlying storage engine defines that group key when it returns the data. So in this case, it’s a combination of the from() and range() functions, which are unique to InfluxDB. By default, InfluxDB returns data grouped by series. A series is a collection of data that shares the same measurement, tagset, and field key (so really, everything except _time and _value). from() is what returns series. range() defines the initial values for _start and _stop. These are also part of the default group key, but since they’re the same for all rows (until you window data), they don’t really affect the grouping.

Got it now sir, clear and thanks :+1: