Topic: Downsampling is not foolproof.
I enthusiastically started with Flux and use it, with custom functions, in a project that includes:
- Energy meter readings (smart meter, 1-second sample)
- Weather data (10-minute sample)
- Solar panel data (real-time, 10-minute sample)
- Central heating installation temperature (water-side adjustment temperature control, 30-second sample)
- Degree-day calculations (KNMI, custom Flux function)
- Variable hourly rate calculation (custom Flux function; source: the European ENTSO-E database)
All six of these sources come in through MQTT and are written to a raw bucket for downsampling. So far this works perfectly.
The downsampling sum-up issue
The following downsampling issue is theoretical. When downsampling, the last value always goes to the next sample: aggregateWindow() stamps each result at the window stop (its default timeSrc), so that point falls into the next window when it is aggregated again. This is not a problem for long runs, graphs, or average calculations (as long as my salary doesn't depend on it), but it has a great effect in a downsampling chain:
- You always miss the last sample.
- If you downsample in a chain from an already downsampled source, the error becomes much larger.
Worked example
Downsampling chain: 10m => 1h => 1d. There are 1440 minutes in a day.
- If I have a value of 1 every minute, I expect 1440 units in a day, but I get 1439; the last unit goes to the next day.
- If I first sum per hour into a new bucket and then sum up to a day from that bucket, I miss 60 minutes and get 1380 units; 60 go to the next day.
- If I downsample from 10 minutes to hours and then from hours to days, I miss 10 minutes plus 60 minutes = 70 minutes and get 1370 units.
Here’s a Flux script that demonstrates this. Experiment with it by adding or removing the // comments and see what happens.
import "generate"

start_proc = 2023-01-01T00:00:00Z
stop_proc = 2023-01-02T00:00:00Z
// for a 2-day window
//stop_proc = 2023-01-03T00:00:00Z
// for testing the day range
//stop_proc = 2023-01-01T23:59:00Z

minuut_data =
    generate.from(
        // 24 * 60 = 1440 minutes
        count: 1440,
        // 2 * 24 * 60 = 2880 for a 2-day window
        //count: 2880,
        //fn: (n) => n / 60,
        fn: (n) => 1,
        //fn: (n) => n,
        start: start_proc,
        stop: stop_proc,
    )
m10_data =
    minuut_data
        |> range(start: start_proc, stop: stop_proc)
        |> aggregateWindow(every: 10m, fn: sum, createEmpty: false)
        |> yield(name: "10 Minuten info")

uur_data =
    m10_data
        |> range(start: start_proc, stop: stop_proc)
        |> aggregateWindow(every: 1h, fn: sum, createEmpty: false)
        |> yield(name: "hour info")

uur_data
    |> range(start: start_proc, stop: stop_proc)
    |> aggregateWindow(every: 1d, fn: sum, createEmpty: false)
    |> keep(columns: ["_time", "_value"])
    |> yield(name: "test_output")
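The chain losses in the script above can be reproduced with a small model of aggregateWindow(). The key mechanism: aggregateWindow() stamps each result at the window stop (its default timeSrc: "_stop"), so re-aggregation pushes every boundary-stamped point into the next window, and the exclusive stop of range() drops the point stamped exactly at the day boundary. A minimal Python sketch (the helper names are mine; it assumes minute points timestamped at the start of each minute):

```python
def aggregate_sum(points, every):
    """Model aggregateWindow(every, fn: sum): group (t, v) points into
    [start, stop) windows of width `every` (t in minutes) and stamp the
    sum at the window *stop*, like the default timeSrc: "_stop"."""
    windows = {}
    for t, v in points:
        stop = (t // every + 1) * every
        windows[stop] = windows.get(stop, 0) + v
    return sorted(windows.items())

def in_range(points, start=0, stop=1440):
    """Model range(): start is inclusive, stop is exclusive."""
    return [(t, v) for t, v in points if start <= t < stop]

minute = [(t, 1) for t in range(1440)]  # one unit per minute, t = 0..1439

# 1m -> 1d with the range stopped at 23:59 (the script's "testing day
# range" variant): the last minute falls outside, 1439 units remain
direct = aggregate_sum(in_range(minute, stop=1439), 1440)

# 1m -> 1h -> 1d: hourly sums are stamped at 60, 120, ..., 1440;
# range() drops the one stamped at the day boundary -> 23 * 60 = 1380
chained = aggregate_sum(in_range(aggregate_sum(minute, 60)), 1440)

# 10m -> 1h -> 1d: a boundary-stamped point is lost at both stages -> 1370
chained10 = aggregate_sum(
    in_range(aggregate_sum(in_range(aggregate_sum(minute, 10)), 60)), 1440)

print(direct, chained, chained10)
# [(1440, 1439)] [(1440, 1380)] [(1440, 1370)]
```

Running this reproduces the 1439, 1380, and 1370 units from the worked example.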
Meter readings with spread: demo
The downsampling error increases further if you use spread() (or difference()) to turn meter readings into consumption per interval.
Example: the meter increments 1 unit every minute.
- Over one day that gives a spread of 1439 units; 1 unit goes to the next day.
- If you take the spread per hour and sum it over the whole day, you get 1357 units instead of 1440, missing 83 (23 + 60).
- If you take the spread per 10 minutes, sum it per hour, and sum the hours over the whole day, you get 1233 units: you miss about 6 units every hour, plus the last hour.
Demo Flux script:
import "generate"

start_proc = 2023-01-01T00:00:00Z
stop_proc = 2023-01-02T00:00:00Z

minuut_data =
    generate.from(
        count: 1440,
        fn: (n) => n,
        start: start_proc,
        stop: stop_proc,
    )
m10_data =
    minuut_data
        |> range(start: start_proc, stop: stop_proc)
        |> aggregateWindow(every: 10m, fn: spread, createEmpty: false)
        |> yield(name: "10 Minuten info")

uur_data =
    m10_data
        |> range(start: start_proc, stop: stop_proc)
        |> aggregateWindow(every: 1h, fn: sum, createEmpty: false)
        |> yield(name: "hour info")

uur_data
    |> range(start: start_proc, stop: stop_proc)
    |> aggregateWindow(every: 1d, fn: sum, createEmpty: false)
    |> keep(columns: ["_time", "_value"])
    |> yield(name: "test_output")
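The same kind of model covers the spread chain: because spread() is max − min within a window, every window boundary also swallows the one-unit increment between the last reading of one window and the first reading of the next, on top of the boundary-stamped point that the next range() drops. A self-contained Python sketch (helper names are mine; meter readings 0..1439, one per minute at the start of each minute):

```python
def window_agg(points, every, fn):
    """Model aggregateWindow(every, fn): group (t, v) points into
    [start, stop) windows of width `every` (t in minutes) and stamp
    fn(values) at the window stop (default timeSrc: "_stop")."""
    buckets = {}
    for t, v in points:
        buckets.setdefault((t // every + 1) * every, []).append(v)
    return sorted((stop, fn(vals)) for stop, vals in buckets.items())

def in_day(points):
    """Model range() over one day: the stop (minute 1440) is exclusive."""
    return [(t, v) for t, v in points if 0 <= t < 1440]

spread = lambda vals: max(vals) - min(vals)  # like Flux spread()

meter = [(t, t) for t in range(1440)]  # meter increments 1 unit/minute

# spread over the whole day: 1439 (the step into the next day is lost)
day = window_agg(in_day(meter), 1440, spread)

# spread per hour, summed per day: 23 hours of 59 -> 1357
# (23 inter-hour units lost, plus the hour stamped at the day boundary)
hourly = window_agg(in_day(window_agg(meter, 60, spread)), 1440, sum)

# spread per 10 min -> sum per hour -> sum per day -> 1233
per_day = window_agg(
    in_day(window_agg(in_day(window_agg(meter, 10, spread)), 60, sum)),
    1440, sum)

print(day, hourly, per_day)
# [(1440, 1439)] [(1440, 1357)] [(1440, 1233)]
```

This reproduces the 1439, 1357, and 1233 units from the bullets above.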
This was a learning moment for me. Conclusion: downsample as close to the source as possible.
I wish you lots of ‘Flux’ fun.