Issue with min/max/mean tasks

Hi all.
I have a bucket (homeassistant) that collects data from a temperature/humidity sensor.

Of course, I want to downsample its data to the min/max/mean of the previous day.

Inspired by this post, I created a new downsample bucket and, for the moment, two tasks:

option task = {name: "BALCONE_MAX", cron: "0 0 * * *"}
option v = {timeRangeStart: -1d, timeRangeStop: now()}

from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> yield(name: "max")
    |> to(bucket: "downsample", org: "sineverba")

And

option task = {name: "BALCONE_MIN", cron: "0 0 * * *"}
option v = {timeRangeStart: -1d, timeRangeStop: now()}

from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> yield(name: "min")
    |> to(bucket: "downsample", org: "sineverba")

They both run at 00:00 and save the min and max in the same bucket.

But today I checked and… I have only a single point (only the min value, in fact), not the max.

Is it possible to save both values in the same bucket?

Hello @sineverba,
Yes, there are multiple ways to do this:

option task = {name: "BALCONE_MAX", cron: "0 0 * * *"}
option v = {timeRangeStart: -1d, timeRangeStop: now()}

data = from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> set(key: "_field", value: "max_value")
    |> yield(name: "max")
    |> to(bucket: "downsample", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field", value: "min_value")
    |> yield(name: "min")
    |> to(bucket: "downsample", org: "sineverba")

Or you could use reduce and get two columns:

option task = {name: "BALCONE_MAX", cron: "0 0 * * *"}
option v = {timeRangeStart: -1d, timeRangeStop: now()}

data = from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")

data
|> reduce(
      identity: {count: 0.0, sum: 0.0, min: 0.0, max: 0.0, mean: 0.0},
      fn: (r, accumulator) => ({
        count: accumulator.count + 1.0,
        sum: r._value + accumulator.sum,
        min: if accumulator.count == 0.0 then r._value else if r._value < accumulator.min then r._value else accumulator.min,
        max: if accumulator.count == 0.0 then r._value else if r._value > accumulator.max then r._value else accumulator.max,
        mean: (r._value + accumulator.sum) / (accumulator.count + 1.0)
      })
    )
|> drop(columns: ["mean", "count", "sum"])
|> to(bucket: "downsample", org: "sineverba")

I included the sum and mean just because I think it’s cool, and then dropped them so you could learn about drop() too.

If you go with the first approach, it’s worth mentioning that the aggregateWindow() function is redundant, since you’re already aggregating over the entire range you’re querying in that task. It could also be written as:

option task = {name: "BALCONE_MAX", cron: "0 0 * * *"}
option v = {timeRangeStart: -1d, timeRangeStop: now()}

data = from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> max() 
    |> set(key: "_field", value: "max_value")
    |> yield(name: "max")
    |> to(bucket: "downsample", org: "sineverba")

data
    |> min()
    |> set(key: "_field", value: "min_value")
    |> yield(name: "min")
    |> to(bucket: "downsample", org: "sineverba")

However, bare selectors drop columns that aren’t in the group key, so I understand why you wanted to use aggregateWindow(). Just worth keeping in mind.
I like to test my tasks by submitting the query in the Data Explorer while commenting out the to() function (with cmd + /) to make sure the results are what I expect before creating the task.

I hope this helps. Let me know what I can clarify.
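As a sketch of that testing workflow (bucket, org, and entity names taken from the tasks above), you can run the query in the Data Explorer with to() commented out, then uncomment it once the output looks right:

```flux
option v = {timeRangeStart: -1d, timeRangeStop: now()}

from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")
    |> max()
    |> set(key: "_field", value: "max_value")
    // Inspect the yielded result first; only then enable the write.
    |> yield(name: "max")
    // |> to(bucket: "downsample", org: "sineverba")
```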

I want to say thank you; you solved my question.

I’d like to ask you another one:

I went with your first approach. If I have two or more entity_ids, how can I place all of them in the same bucket?

By simply duplicating the task for as many entity_ids as I have?

E.g.:

option task = {
    name: "BALCONE DOWNSAMPLE",
    cron: "0 0 * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature" || r.entity_id == "another_one")
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> set(key: "_field", value: "balcone_max_value")
    |> yield(name: "balcone_max")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field", value: "balcone_min_value")
    |> yield(name: "balcone_min")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> set(key: "_field", value: "balcone_mean_value")
    |> yield(name: "balcone_mean")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

@Anaisdg a small bump from me…

Hello @sineverba,
I’m not quite sure I understand your question.
You want to find the min and max across entity_ids?
You can group() on entity_id.

Or just filter for multiple entity_ids.

|> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature" or r.entity_id == "another_one")
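As a sketch of the group() suggestion (entity IDs reused from earlier in the thread), grouping by entity_id keeps each sensor in its own table, so a selector or aggregate is computed per sensor:

```flux
from(bucket: "homeassistant")
    |> range(start: -1d)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature" or r.entity_id == "another_one")
    |> filter(fn: (r) => r["_field"] == "value")
    // One table per entity_id; max() then returns one row per sensor.
    |> group(columns: ["entity_id"], mode: "by")
    |> max()
```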

At the end I wrote three different tasks.

option task = { 
  name: "CAMERA",
  cron: "0 0 * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}
option entity_id = "0x00camera_temperature"
option prefix = "camera_"

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == entity_id)
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> set(key: "_field", value: prefix+"max_value")
    |> yield(name: prefix+"max")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field", value: prefix+"min_value")
    |> yield(name: prefix+"min")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> set(key: "_field", value: prefix+"mean_value")
    |> yield(name: prefix+"mean")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

option task = { 
  name: "BALCONE",
  cron: "0 0 * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}
option entity_id = "0x00158d00067beedf_temperature"
option prefix = "balcone_"

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == entity_id)
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> set(key: "_field", value: prefix+"max_value")
    |> yield(name: prefix+"max")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field", value: prefix+"min_value")
    |> yield(name: prefix+"min")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> set(key: "_field", value: prefix+"mean_value")
    |> yield(name: prefix+"mean")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

option task = { 
  name: "SALONE",
  cron: "0 0 * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}
option entity_id = "0x00158d0aaaaaa_temperature"
option prefix = "salone_"

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == entity_id)
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> set(key: "_field", value: prefix+"max_value")
    |> yield(name: prefix+"max")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field", value: prefix+"min_value")
    |> yield(name: prefix+"min")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> set(key: "_field", value: prefix+"mean_value")
    |> yield(name: prefix+"mean")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

As you can see, it’s the same task duplicated three times; the only difference is the entity_id.

Is it possible to merge them into a single task?

@Anaisdg a bump from me.

“Is it possible to merge them into a single task?”

Thank you

Hello @sineverba,
You can do:

...
data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == entity_id1 or r.entity_id == entity_id2 or r.entity_id == entity_id3)
    |> filter(fn: (r) => r["_field"] == "value")
...

And… what about the set key or yield section?

Does every entity need its own custom name, or not?

The data should be grouped by each tag or entity_id. So each aggregate window will be applied to each entity_id.
How you display and name them, and any additional transformations, is up to you.
I recommend trying it out like that and using a yield() statement before running the task to make sure that the output is what you want.

I don’t know if you’re an InfluxQL user, but these might be helpful:

I tried the following code:

option task = {
    name: "DOWNSAMPLE APRILIA TEMPERATURE",
    cron: "0 * * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}
option balcone_entity_id = "0x00158d00067beedf_temperature"
option salone_entity_id = "0x00158d000484fb9a_temperature"
option camera_entity_id = "0x00158d0004a01639_temperature"

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == balcone_entity_id or r.entity_id == salone_entity_id or r.entity_id == camera_entity_id)
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> group(columns: ["entity_id"], mode: "by")
    |> yield(name: "max")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> group(columns: ["entity_id"], mode: "by")
    |> yield(name: "min")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> group(columns: ["entity_id"], mode: "by")
    |> yield(name: "mean")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba")

But I imagine that the last yield(name) overwrites the 2 previous values, because I get only 3 values instead of the 9 expected.

If I try the code in the script editor WITHOUT the to() section, I get all 9 values.

So I imagine that I need to customize the yield name for every entity.
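Building on the set() approach from earlier in the thread, one way to keep the points distinct would be to give each aggregate its own _field name, so the three writes don’t land on the same series and timestamp. A sketch for the max and min branches, reusing the data variable from the task above (not tested against your data):

```flux
data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    // Distinct field name, so this write cannot collide with the min/mean writes.
    |> set(key: "_field", value: "max_value")
    |> yield(name: "max")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field", value: "min_value")
    |> yield(name: "min")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba")
```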

Please see the following images to demonstrate my words. The first two are tests in the bucket script editor (without the to() section).

The last one is the result from a task (so, with a to() section).



@Anaisdg A bump from me :slight_smile:

Hello @sineverba,
Thanks for mentioning me.
I’d try explicitly listing your entity tag in the to() function.

You also don’t need to group by entity_id, since tables are grouped by tags by default, and by regrouping you lose the rest of your default group key.

to(
    bucket: "my-bucket",
    org: "my-org",
    host: "https://us-west-2-1.aws.cloud2.influxdata.com",
    token: "mY5uP3rS3cRe7t0k3n",
    timeColumn: "_time",
    tagColumns: ["tag1", "tag2", "tag3"],
)

Hmm… I don’t understand.

In a previous post, you told me to use group by.

Now I don’t need to use group by and should instead use the tag (I don’t have tags… or so it seems to me; you can see my data in the previous screenshots)…

Can I get paid support for 30 minutes / 1 hour?

I also contacted an Italian company, your partner. They never answered after my quote request (probably they don’t want to do consulting for a private individual).

I would like to solve my question once and for all…

Thank you

Hello @sineverba,
I’m sorry, I see how my wording is confusing. entity_id is a tag.
When I said

The data should be grouped by each tag or entity_id.

I meant that it already is tagged. But fret not, adding a group() shouldn’t really hurt. I’m sorry for confusing you.

This should work:

option task = {
    name: "DOWNSAMPLE APRILIA TEMPERATURE",
    cron: "0 * * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}
option balcone_entity_id = "0x00158d00067beedf_temperature"
option salone_entity_id = "0x00158d000484fb9a_temperature"
option camera_entity_id = "0x00158d0004a01639_temperature"

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == balcone_entity_id or r.entity_id == salone_entity_id or r.entity_id == camera_entity_id)
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> yield(name: "max")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba", tagColumns: ["entity_id"])

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> yield(name: "min")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba", tagColumns: ["entity_id"])

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> yield(name: "mean")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba", tagColumns: ["entity_id"])

Please give that a try and lmk if it works. Thanks for bearing with me. We’ll get this.

No problem about the confusion :slight_smile:
Your solution, by the way, seems exactly the same as what I tried in this post (#11):

Or am I wrong?

In other words, it seems that a yield that isn’t customized overwrites the other 2 (of 3 total) values when used with to().

See my post #11.

It’s not the same, because we added the tagColumns. However, you have a point, because according to the documentation, to() requires these columns:

* `_time`
* `_measurement`
* `_field`
* `_value`

*All other columns are written to InfluxDB as [tags](https://docs.influxdata.com/influxdb/v2.2/reference/key-concepts/data-elements/#tags).*

So adding that should be redundant. Can you query for your entity tag in your second screenshot?
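As a sketch of that check (bucket name taken from the task above), a query like this against the downsample bucket should show how many points exist per entity_id, confirming whether the tag survived the write:

```flux
from(bucket: "downsample-temperature-aprilia")
    |> range(start: -2d)
    // Keep only rows where the entity_id tag was actually written.
    |> filter(fn: (r) => exists r.entity_id)
    |> group(columns: ["entity_id"])
    |> count()
```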

Even after adding the entity_id, I got only 3 of the 9 expected values.

I’m still trying to solve this issue.
I also lost @Anaisdg in PM; no more answers from her :frowning:

Is there any staff support available? Thank you