Issue with min - max - mean tasks

Hi to all.
I have a bucket (homeassistant) that collects info from a temperature / humidity sensor.

Of course, I want to downsample his data to min/max/mean of previous day.

Inspired by this post, I created a new bucket downsample and two tasks for the moment:

option task = {name: "BALCONE_MAX", cron: "0 0 * * *"}
option v = {timeRangeStart: -1d, timeRangeStop: now()}

from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> yield(name: "max")
    |> to(bucket: "downsample", org: "sineverba")

And

option task = {name: "BALCONE_MIN", cron: "0 0 * * *"}
option v = {timeRangeStart: -1d, timeRangeStop: now()}

from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> yield(name: "min")
    |> to(bucket: "downsample", org: "sineverba")

They run both at 00:00 and save min and max in same bucket.

But today I checked and… I have only a single point (only the min value, in reality), not the max.

Is it possible to save both value inside same bucket?

Hello @sineverba,
yes there are multiple ways to do this:

option task = {name: "BALCONE_MAX", cron: "0 0 * * *"}
option v = {timeRangeStart: -1d, timeRangeStop: now()}

data = from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> set(key: "_field",value: "max_value")
    |> yield(name: "max")
    |> to(bucket: "downsample", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field",value: "min_value")
    |> yield(name: "min")
    |> to(bucket: "downsample", org: "sineverba")

Or you could use reduce and get two columns:

option task = {name: "BALCONE_MAX", cron: "0 0 * * *"}
option v = {timeRangeStart: -1d, timeRangeStop: now()}

data = from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")

data
|> reduce(
      identity: {count: 0.0, sum: 0.0, min: 0.0, max: 0.0, mean: 0.0},
      fn: (r, accumulator) => ({
        count: accumulator.count + 1.0,
        sum: r._value + accumulator.sum,
        min: if accumulator.count == 0.0 then r._value else if r._value < accumulator.min then r._value else accumulator.min,
        max: if accumulator.count == 0.0 then r._value else if r._value > accumulator.max then r._value else accumulator.max,
        mean: (r._value + accumulator.sum) / (accumulator.count + 1.0)
      })
    )
|> drop(columns: ["mean", "count", "sum"])
|> to(bucket: "downsample", org: "sineverba")

I included the sum and mean just because I think it’s cool and then dropped them so you could learn about that too.

If you go with the first approach its worth mentioning that the aggregatewindow function is redundant since you’re aggregating over the entire range you’re querying for in that task. It could be also written as:

option task = {name: "BALCONE_MAX", cron: "0 0 * * *"}
option v = {timeRangeStart: -1d, timeRangeStop: now()}

data = from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature")
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> max() 
    |> set(key: "_field",value: "max_value")
    |> yield(name: "max")
    |> to(bucket: "downsample", org: "sineverba")

data
    |> min()
    |> set(key: "_field",value: "min_value")
    |> yield(name: "min")
    |> to(bucket: "downsample", org: "sineverba")

However bare selectors drop columns that aren’t in the group key so I understand why you wanted to use aggregateWindow(). Just worth keeping in mind.
I like to test my tasks by submitting the query in the data explorer while commenting out the to() function (with cmd + / ) to make sure that the results are what I expect before creating the task.

I hope this helps let me know what I can clarify.

1 Like

I want say thank you, you solved my question.

I want to do you another one:

I went with your first approach. If I have two or more entity_id, how can I place all of them in same bucket?

Simply duplicating so many task so many entity id I have?

E.g:

option task = {
    name: "BALCONE DOWNSAMPLE",
    cron: "0 0 * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == "0x00158d00067beedf_temperature" || r.entity_id == "another_one")
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> set(key: "_field", value: "balcone_max_value")
    |> yield(name: "balcone_max")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field", value: "balcone_min_value")
    |> yield(name: "balcone_min")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> set(key: "_field", value: "balcone_mean_value")
    |> yield(name: "balcone_mean")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

@Anaisdg a small up for me…

Hello @sineverba,
I’m not quite sure I understand your question.
You want to find min and max across entity_id?
You can group() on entity_id.

Or just filter for multiple entity_ids.

|> filter(fn: (r) => r.entity_id == “0x00158d00067beedf_temperature” or r.entity_id == “another_one”)

At the end I wrote three different tasks.

option task = { 
  name: "CAMERA",
  cron: "0 0 * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}
option entity_id = "0x00camera_temperature"
option prefix = "camera_"

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == entity_id)
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> set(key: "_field", value: prefix+"max_value")
    |> yield(name: prefix+"max")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field", value: prefix+"min_value")
    |> yield(name: prefix+"min")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> set(key: "_field", value: prefix+"mean_value")
    |> yield(name: prefix+"mean")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

option task = { 
  name: "BALCONE",
  cron: "0 0 * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}
option entity_id = "0x00158d00067beedf_temperature"
option prefix = "balcone_"

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == entity_id)
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> set(key: "_field", value: prefix+"max_value")
    |> yield(name: prefix+"max")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field", value: prefix+"min_value")
    |> yield(name: prefix+"min")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> set(key: "_field", value: prefix+"mean_value")
    |> yield(name: prefix+"mean")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

option task = { 
  name: "SALONE",
  cron: "0 0 * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}
option entity_id = "0x00158d0aaaaaa_temperature"
option prefix = "salone_"

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == entity_id)
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> set(key: "_field", value: prefix+"max_value")
    |> yield(name: prefix+"max")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> set(key: "_field", value: prefix+"min_value")
    |> yield(name: prefix+"min")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> set(key: "_field", value: prefix+"mean_value")
    |> yield(name: prefix+"mean")
    |> to(bucket: "downsample-aprilia", org: "sineverba")

You can see, same task duplicated x 3 times, difference is only the starting ENTITY.

IS it possible to merge them in only one task?

@Anaisdg an up for me.

“IS it possible to merge them in only one task?”

Thank you

Hello @sineverba,
You can do:

...
data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == entity_id1 or  r.entity_id == entity_id2 or  r.entity_id == entity_id3)
    |> filter(fn: (r) => r["_field"] == "value")
...

And… set key or yeld section?

Every entity need to have his custom name… or no?

The data should be grouped by each tag or entity_id. So each aggregate window will be applied to each entity_id.
How you want to display and name and any additional transformations is up to you.
I recommend trying it out like that and using a yield() statement before running the task to make sure that the output is what you want.

Idk if you’re a InfluxQL user but these might helpful:

I tried to do like following code:

option task = {
    name: "DOWNSAMPLE APRILIA TEMPERATURE",
    cron: "0 * * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}
option balcone_entity_id = "0x00158d00067beedf_temperature"
option salone_entity_id = "0x00158d000484fb9a_temperature"
option camera_entity_id = "0x00158d0004a01639_temperature"

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == balcone_entity_id or r.entity_id == salone_entity_id or r.entity_id == camera_entity_id)
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> group(columns: ["entity_id"], mode: "by")
    |> yield(name: "max")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> group(columns: ["entity_id"], mode: "by")
    |> yield(name: "min")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba")

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> group(columns: ["entity_id"], mode: "by")
    |> yield(name: "mean")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba")

But, I can imagine that last yield(name) overwrite 2 previous value, cause I get only 3 values instead of 9 values attended.

If I try the code in script editor, WITHOUT the to section, I get all 9 values.

So, I can imagine that I need to customize the yield name for every entity.

Please, see following image to demonstrate my words. First two, are tests in bucket script editor (without the to section).

Last one, is result from a task (so, with a to section)



@Anaisdg An up for me :slight_smile:

Hello @sineverba,
Thanks for mentioning me.
I’d try to explicitly call your entity tag in the to() funciton.

you also don’t need to group by entitiy id as tables are grouped by tags by default and you loose the rest of your default grouping.

to(
    bucket: "my-bucket",
    org: "my-org",
    host: "https://us-west-2-1.aws.cloud2.influxdata.com",
    token: "mY5uP3rS3cRe7t0k3n",
    timeColumn: "_time",
    tagColumns: ["tag1", "tag2", "tag3"],
)

Mmmm… I cannot understand you.

In a previous post, you told me to use the group by.

Now I don’t need to use the group by and instead use the tag (I don’t have tags…, or it seems to me, you can see my data in previous screenshots)…

Can I get a paid support of 30 minutes / 1 hour?

I called also an Italian company our partner. They don’t answered to me after quote request (probably they don’t want do a consulting for a private).

I would solve my question one time at all…

Thank you

Hello @sineverba,
I’m sorry I see how my wording is confusing. entity_id is a tag.
When I said

The data should be grouped by each tag or entity_id.

I meant that it already is tagged. But fret not. Adding a group shouldn’t really hurt. Im sorry for confusing you.

This should work:

option task = {
    name: "DOWNSAMPLE APRILIA TEMPERATURE",
    cron: "0 * * * *",
}

option v = {timeRangeStart: -1d, timeRangeStop: now()}
option balcone_entity_id = "0x00158d00067beedf_temperature"
option salone_entity_id = "0x00158d000484fb9a_temperature"
option camera_entity_id = "0x00158d0004a01639_temperature"

data = from(bucket: "homeassistant-aprilia")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r.entity_id == balcone_entity_id or r.entity_id == salone_entity_id or r.entity_id == camera_entity_id)
    |> filter(fn: (r) => r["_field"] == "value")

data
    |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    |> yield(name: "max")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba", tagColumns: ["entity_id"])

data
    |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    |> yield(name: "min")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba", tagColumns: ["entity_id"])

data
    |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    |> yield(name: "mean")
    |> to(bucket: "downsample-temperature-aprilia", org: "sineverba", tagColumns: ["entity_id"])

Please give that a try and lmk if it works. Thanks for bearing with me. We’ll get this.

No problem for confusing :slight_smile:
Your solution by the way seems exactly the same that I tried at this post (#11):

Or am I wrong?

In other words, seems that yield not customized overwrite other 2 (of three total) values, when used with “to”.

See my #11 post.

It’s not the same because we added the tagColumns. However you have a point because according the documentation,

* `_time`
* `_measurement`
* `_field`
* `_value`

*All other columns are written to InfluxDB as [tags](https://docs.influxdata.com/influxdb/v2.2/reference/key-concepts/data-elements/#tags).*

So adding that should be redundant. Can you query for your entity tag? in your second screenshot?

Also adding the entity_id, I got only 3 of 9 expected values.

I’m looking for solve this issue.
Lost also @Anaisdg in PM, no more answer by her :frowning:

Is there some staff support available? Thank you