AggregateWindow with mixed types

Hi,
I’m trying to create an aggregateWindow (for downsampling) that contains data points with fields that have multiple types (string, int & float).

Here’s an incredibly simplified version of what’s happening:

First, we’ll create an empty InfluxDB v2 container:

docker run --rm -p 8086:8086 \
  -e DOCKER_INFLUXDB_INIT_MODE=setup \
  -e DOCKER_INFLUXDB_INIT_USERNAME=admin \
  -e DOCKER_INFLUXDB_INIT_PASSWORD=password12345 \
  -e DOCKER_INFLUXDB_INIT_ORG=scrutiny \
  -e DOCKER_INFLUXDB_INIT_BUCKET=metrics \
  -e DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=my-super-secret-auth-token \
  influxdb:2.0

After that, we’ll populate InfluxDB with 4 data points: 2 points for each “device_wwn”:

curl --request POST \
"http://localhost:8086/api/v2/write?org=scrutiny&bucket=metrics&precision=ns" \
  --header "Authorization: Token my-super-secret-auth-token" \
  --header "Content-Type: text/plain; charset=utf-8" \
  --header "Accept: application/json" \
  --data-binary '
    smart,device_wwn=diskdeviceid01,protocol="NVMe" temperature=70.00,attr.power_cycles.attribute_id="power_cycles",attr.power_cycles.value=100,attr.host_reads.attribute_id="host_reads",attr.host_reads.value=1000 1642291849000000000
    smart,device_wwn=diskdeviceid01,protocol="NVMe" temperature=80.00,attr.power_cycles.attribute_id="power_cycles",attr.power_cycles.value=110,attr.host_reads.attribute_id="host_reads",attr.host_reads.value=2000 1642291909000000000
    smart,device_wwn=diskdeviceid02,protocol="ATA" temperature=70.00,attr.1.attribute_id="1",attr.1.value=100,attr.2.attribute_id="2",attr.2.value=1000 1642291649000000000
    smart,device_wwn=diskdeviceid02,protocol="ATA" temperature=80.00,attr.1.attribute_id="1",attr.1.value=110,attr.2.attribute_id="2",attr.2.value=2000 1642291709000000000
    '

Finally, we’ll attempt to aggregate/downsample the data we just wrote (down to 1 data point per unique “device_wwn”).

Ideally, the two aggregated data points would be:

smart,device_wwn=diskdeviceid01,protocol="NVMe" temperature=75.00,attr.power_cycles.attribute_id="power_cycles",attr.power_cycles.value=105,attr.host_reads.attribute_id="host_reads",attr.host_reads.value=1500 1642291909000000000

smart,device_wwn=diskdeviceid02,protocol="ATA" temperature=75.00,attr.1.attribute_id="1",attr.1.value=105,attr.2.attribute_id="2",attr.2.value=1500 1642291709000000000

This aggregateWindow query fails:

curl -vvv --request POST "http://localhost:8086/api/v2/query?org=scrutiny" \
  --header 'Authorization: Token my-super-secret-auth-token' \
  --header 'Accept: application/csv' \
  --header 'Content-type: application/vnd.flux' \
  --data 'import "influxdata/influxdb/schema"

smart_data = from(bucket: "metrics")
|> range(start: -2y, stop: now())
|> filter(fn: (r) => r["_measurement"] == "smart" )
|> filter(fn: (r) => r["_field"] !~ /(_measurement|protocol|device_wwn)/)

smart_data
|> aggregateWindow(fn: mean, every: 1w)
|> group(columns: ["device_wwn"])
|> schema.fieldsAsCols()'


{"code":"invalid","message":"unsupported input type for mean aggregate: string"}

But if we filter out the “attribute_id” fields (which are of type string), everything works:

curl -vvv --request POST "http://localhost:8086/api/v2/query?org=scrutiny" \
  --header 'Authorization: Token my-super-secret-auth-token' \
  --header 'Accept: application/csv' \
  --header 'Content-type: application/vnd.flux' \
  --data 'import "influxdata/influxdb/schema"

smart_data = from(bucket: "metrics")
|> range(start: -2y, stop: now())
|> filter(fn: (r) => r["_measurement"] == "smart" )
|> filter(fn: (r) => r["_field"] !~ /(_measurement|protocol|device_wwn|attribute_id)/)

smart_data
|> aggregateWindow(fn: mean, every: 1w)
|> group(columns: ["device_wwn"])
|> schema.fieldsAsCols()'

As I mentioned above, this is an incredibly simplified version of my dataset: each point has dozens of fields, about a third of which are string values (and those strings are constants). I need to find a way to have them copied into the aggregated data.

Hello @AnalogJ,
You can use this package:

But I’m having trouble thinking of a way to programmatically filter out specific string fields and match them back to their aggregated values/series.
You could perhaps filter the string fields out of the aggregation and convert them to tags with a map() function before performing the aggregateWindow().
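Another rough sketch, using the simplified dataset from the original post (untested; it assumes the string fields all match /attribute_id/ and are constant within each window, so carrying the last value is lossless): aggregate the numeric fields with mean and the string fields with last (which does accept strings), then union the two streams before pivoting:

```flux
import "influxdata/influxdb/schema"

data = from(bucket: "metrics")
    |> range(start: -2y, stop: now())
    |> filter(fn: (r) => r["_measurement"] == "smart")

// Numeric fields: safe to average.
numeric = data
    |> filter(fn: (r) => r["_field"] !~ /attribute_id/)
    |> aggregateWindow(fn: mean, every: 1w, createEmpty: false)

// String fields: carry the last value through each window instead of averaging.
strings = data
    |> filter(fn: (r) => r["_field"] =~ /attribute_id/)
    |> aggregateWindow(fn: last, every: 1w, createEmpty: false)

union(tables: [numeric, strings])
    |> group(columns: ["device_wwn"])
    |> schema.fieldsAsCols()
```

With matching window boundaries and createEmpty: false on both branches, each string field should line up with the numeric aggregates at the same _time before fieldsAsCols() pivots them into one row per window.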