Best practices for storing the same data at different frequencies

I have data that comes in twice per hour.
I use aggregateWindow() to keep the last value for each hour.
Now I want to store the same data at daily and monthly resolution as well.

Should I just use tags and store everything within the same measurement, or use a new measurement instead?

For example:

add a tag frequency = daily

or, better, create a new measurement like “Temperature_daily”?

For now I have decided to use a tag.

from(bucket: "Temp")
    |> range(start: -48h)
    |> filter(fn: (r) => r["_measurement"] == "Gaszähler(aktueller Zählerstand)")
    |> aggregateWindow(every: 1h, fn: last, createEmpty: false)
    |> rename(columns: {source: "physicaladdress"})
    |> set(key: "manufacturer", value: "arcus-eds")
    |> set(key: "device", value: "KNX-IMPZ1-SK01")
    |> set(key: "source", value: "KNX")
    |> set(key: "agg_type", value: "hourly")
    |> drop(columns: ["host"])
    |> to(bucket: "Gas Langzeit")

Is this the way to go?

I am adding some tags, renaming the tag “source” to “physicaladdress”, and reusing the old tag name “source” for the static value “KNX”.

What happens if the tag “source” doesn't exist? Will the script fail or just skip this line?

@user642 The recommended path here would be to store the “downsampled” data in a separate bucket with a different retention period. I assume you want to keep the lower-frequency data around for longer and “expire” the high-frequency data after a given period of time. This is why buckets have a retention period: so they can automatically evict data that is no longer needed.

For example, let’s say you have three buckets:

  • raw (retention period: 1d)
  • daily (retention period: 30d)
  • monthly (retention period: 365d)

You would then configure two tasks:

  • One that queries data from the raw bucket, downsamples it into daily summaries, and stores the daily summaries in the daily bucket.
  • One that queries data from the daily bucket, downsamples it into monthly summaries, and stores the monthly summaries in the monthly bucket.
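
As a rough sketch, the first of these tasks might look something like the following. The task name, the schedule, and the measurement filter are assumptions for illustration (the measurement name is borrowed from the query in the question); adjust them to your own data.

```flux
// Hypothetical task definition: the name and the schedule are assumptions.
option task = {name: "downsample-daily", every: 1d}

from(bucket: "raw")
    // Only look at the data written since the previous run.
    |> range(start: -task.every)
    // Assumed filter, taken from the query in the question.
    |> filter(fn: (r) => r._measurement == "Gaszähler(aktueller Zählerstand)")
    // Keep the last reading of each day.
    |> aggregateWindow(every: 1d, fn: last, createEmpty: false)
    |> to(bucket: "daily")
```

The monthly task would follow the same pattern, reading from the daily bucket, using a longer window such as aggregateWindow(every: 1mo, ...), and writing to the monthly bucket.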

That way you don’t have to modify the data at all, other than downsampling in some way. And if you wanted to query data from all the buckets at the same time, you would do this:

startTime = -1y
stopTime = now()

getData = (bucket) =>
    from(bucket: bucket)
        |> range(start: startTime, stop: stopTime)
        |> filter(fn: (r) => ...)

union(tables: [getData(bucket: "raw"), getData(bucket: "daily"), getData(bucket: "monthly")])