Hello,
I am currently starting to learn how I can downsample / aggregate historical data to reduce the amount of storage required.
As a start I am using some data from my energy meter. The issue I am facing now is that the result of a query against both the raw and the aggregated data is not the same (not even close enough).
The query for my raw data looks like this:
from(bucket: "Smarthome")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "Energie")
|> filter(fn: (r) => r._field == "exportedWh" or r._field == "generatedWh" or r._field == "importedWh")
|> difference()
|> sum()
|> pivot(rowKey: ["_start"], columnKey: ["_field"], valueColumn: "_value")
|> map(fn: (r) => ({r with _value: r.generatedWh - r.exportedWh + r.importedWh}))
|> keep(columns: ["_value"])
Now I have created two different tasks for downsampling. Both eventually lead to the same result later on, but just for completeness:
Option 1:
import "date"
option task = {name: "generatedWh", every: 1h, offset: 5m}
fullHourTime = date.truncate(t: now(), unit: 1h)
startTime = date.sub(from: fullHourTime, d: task.every)
from(bucket: "Smarthome")
|> range(start: startTime, stop: fullHourTime)
|> filter(fn: (r) => r._measurement == "Energie" and r._field == "generatedWh")
|> filter(fn: (r) => r._value > 0)
|> difference()
|> aggregateWindow(every: 15m, fn: sum, createEmpty: true)
|> to(bucket: "Smarthome-history", fieldFn: (r) => ({"generatedWh": r._value}))
Option 2:
option task = {name: "Test_Aggregation_generatedWh", every: 1h, offset: 5m}
from(bucket: "Smarthome")
|> range(start: -task.every)
|> filter(fn: (r) => r["_measurement"] == "Energie")
|> filter(fn: (r) => r["_field"] == "generatedWh")
|> filter(fn: (r) => r._value > 0)
|> difference()
|> aggregateWindow(every: 15m, fn: sum, createEmpty: true)
|> to(bucket: "Smarthome_aggregation_test", fieldFn: (r) => ({"generatedWh": r._value}))
I then use the following type of query to get results from the aggregation and compare it with the raw results:
from(bucket: "Smarthome-history")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "Energie" and r._field == "generatedWh")
|> sum()
I have no clue where those differences are actually coming from. Can you have a look and help me to understand this.
Thanks and regards,
Matthias