Data Aggregation/Reduction and Timestamps?

Hi,
as written before, I'm very new and unskilled in programming, and I'm therefore trying to learn and find the right concept for my home automation system, where InfluxDB is used for data like the current power of the freezer or dishwasher etc. This data is stored in InfluxDB based on changes of the value: no change = no input to InfluxDB.
Based on this information I'm calculating the total power consumption per specified time range, e.g. 5 kWh per day.
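For illustration, the kind of query I'm running looks roughly like this (just a sketch; the bucket, measurement, and field names are placeholders for my setup):

// integrate power (W) over the last day -> watt-hours, then convert to kWh
from(bucket: "iobroker")
|> range(start: -1d)
|> filter(fn: (r) => r["_measurement"] == "freezer")
|> filter(fn: (r) => r["_field"] == "power_W")
|> integral(unit: 1h)
|> map(fn: (r) => ({ r with _value: r._value / 1000.0 }))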

Doing such queries over bigger time ranges will also mean a lot of data. Therefore I thought of a regular data reduction: calculating the integral over 1 hour and deleting all data points within that hour. But what should I do with the timestamp? As I understand it, the timestamp is set automatically by InfluxDB. So if I do this data reduction once a day (at the end), each hourly value will get the timestamp of the calculation.

What do you recommend? Isn't there another possibility instead of doing this calculation every hour? The delay between the calculation and the database write will in any case produce a “wrong” timestamp.

Are there any templates/standard procedures for data aggregation/reduction?

Thank you very much!

Hello @Pete0815,
Are you using 2.x or 1.x?
Yes, we generally recommend that you run tasks more frequently to reduce the load on the query engine.

your task would look something like:

option task = {name: "mytask", every: 1h}

from(bucket: "solar")
|> range(start: -task.every)
|> filter(fn: (r) => r["_measurement"] == "battery")
|> filter(fn: (r) => r["_field"] == "kWh")
|> integral(unit: 1h)
// integral() drops the _time column, but to() needs one;
// use each window's stop time as the output timestamp
|> duplicate(column: "_stop", as: "_time")
|> to(bucket: "downsampled")

For this example, my source bucket (solar) with the raw data would have a short retention policy (maybe 1d if I'm downsampling every hour; it depends on how conservative you want to be), so the raw data deletes after a day. Then perform the integral downsampling and write the data to your new bucket.

Here are the docs for integral() and to().

You can also use the Starlark Telegraf plugin

Or the execd plugin

For custom aggregations, calculations, etc.

These blogs on tasks might be useful to you:

Hello @Anaisdg,

Thank you very much for your work in answering with such a huge amount of information.
I have to check it carefully to understand.

I'm using InfluxDB 1.8.3 on a Raspberry Pi (32-bit OS), so I'm not 100% sure whether switching to a 64-bit OS for InfluxDB 2.0 is possible because of other installed packages; I will check further.

Using Flux is also possible in v1.8.3, which I already tried with a first success, but aren't tasks only possible in v2.0? I think in v1.8 I have to use continuous queries for automatic downsampling, right?

And these continuous queries limit me to InfluxQL, right?

Currently I'm also checking via manual queries whether my problem with the timestamp can be solved by using Flux. Using InfluxQL/continuous queries has the effect that for integral the first timestamp of the defined period is used/transferred with the result of the query. This is a big problem for me and does not correspond to the result shown by an aggregation like integral or sum.

When you calculate an integral of electric power, you want to see the total electrical energy that has been used up to the status/timestamp of today, looking into the past.
The same goes for water flow: you use sum(), and the total amount of water used in the past is represented by the result of sum(), but the timestamp should be today/the end of the grouped-by period, not the beginning.
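A small made-up example to illustrate what I mean: if a device draws a constant 600 W from 09:50 to 10:00, the energy of that window is

E = 0.6 kW × (10/60) h = 0.1 kWh

and this 0.1 kWh has only been fully consumed at 10:00, so the window end (10:00) is the meaningful timestamp, not 09:50.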

That's why I'm struggling so much with the handling of InfluxDB, because it looks quite obvious that this is problematic for these use cases. Or have I understood something wrong?
Big Thx

Hello @Pete0815,
Unfortunately I don't think that's possible with CQs. I think you'd have to use Kapacitor and the shiftNode (ShiftNode | Kapacitor 1.5 Documentation) to get the output to include the timestamp at the end of the integral window.

@scott Do you know if there’s any other solution for CQ?

If you use the CQ advanced syntax and the FOR clause, you can calculate multiple “windows” per CQ execution. The output includes a single point for each window, and the timestamp should be the right time bound of the window. For example:

CREATE CONTINUOUS QUERY "cq_integral_downsample" ON "example-db"
RESAMPLE FOR 3h
BEGIN
  SELECT mean("kWh") INTO "avg_kWh" FROM "example" GROUP BY time(1h)
END

Hello @Anaisdg and @scott,

Thank you very much for the great support.
I tried it according to your solution and, to be sure, I verified the results, which seem to be no different from before. Please have a look:

Created this CQ on longterm:

CREATE CONTINUOUS QUERY "15mPWRACGes" ON "longterm"
RESAMPLE FOR 1h
BEGIN
SELECT INTEGRAL("value") / 3600000 AS value
INTO longterm.autogen."javascript.0.scriptEnabled.PV.WRPACges"
FROM shortterm.autogen."javascript.0.scriptEnabled.PV.WRPACges"
GROUP BY time(10m), *
END

For verification I used a normal query on shortterm, where I know from before that the result will have (for me) a wrong timestamp:

SELECT integral("value") / 3600000 AS "value"
FROM "iobroker"."autogen"."javascript.0.scriptEnabled.PV.WRPACges"
WHERE time > now() - 2h GROUP BY time(10m)

Comparing the results via CSV export or the direct Chronograf view, the timestamp still seems to be wrong, because the results are equal:

timestamp                    shortterm              longterm
2020-12-05T09:20:00.000Z     0.7608138420222742     0.7606152186433977
2020-12-05T09:30:00.000Z     0.7950924374295        0.7948884269062314
2020-12-05T09:40:00.000Z     0.8231317022117439     0.8231317022117439
2020-12-05T09:50:00.000Z     0.8521580706754914     0.8521580706754914
2020-12-05T10:00:00.000Z     0.8729888967670355     0.8729888967670355

Am I using the advanced syntax improperly?

Thx!

@Pete0815 Those results are what I would expect from this query. You’re querying data from the last 2 hours and grouping that data into 10 minute “windows.” Each point returned represents the integral of a window and has the timestamp of the right bound of the window.
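In case you move to Flux at some point: aggregateWindow() lets you pick which window bound becomes the output timestamp via its timeSrc parameter (the default is the window stop). A rough sketch, assuming your shortterm bucket, the measurement name from your CQ, and a reasonably recent Flux version:

from(bucket: "shortterm")
|> range(start: -2h)
|> filter(fn: (r) => r["_measurement"] == "javascript.0.scriptEnabled.PV.WRPACges")
|> filter(fn: (r) => r["_field"] == "value")
// wrap integral() so its unit parameter can be passed through aggregateWindow()
|> aggregateWindow(every: 10m, fn: (tables=<-, column) => tables |> integral(unit: 1h, column: column), timeSrc: "_stop")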


Hello @scott,

Please have a look at the attached data export (Excel file). I tried to make it more visible.
Within that export you can see the raw data from the shortterm database.

Then a manual integral calculation for the total electrical energy in kWh is included, in accordance with the timestamps, and also manually calculated per 10 minutes.
(For understanding you can also open the hidden columns/rows.)

In addition I copied in the data from the longterm database, which is created by the continuous query, with the original timestamps from the longterm DB.

Here you can see that the value is always 1 period too early / 1 step in advance.

This is my problem with the timestamp when using an integral function. I have also noticed this with other aggregate functions, but let's restrict our discussion to integral() to make it easier to convey my understanding and clarify what's wrong, or better, find a solution :)

My guess is still that the timestamp set by the CQ is taken from the BEGINNING of the aggregation window. This produces a faulty result, because for integral() it can only be the END of the aggregation window, right?

Short_Manual_Lontterm.xls.gz (97.9 KB)

Hello @Anaisdg & @scott,

even after several attempts of my own and further research to find a solution, the “problem” still exists for me. Even though InfluxDB works very well in terms of speed, I cannot make practical use of it without aggregation functions including correct timestamp handling.

Do you see a solution for this?

I'm not experienced in using GitHub, and maybe you can help by explaining how exactly this issue is solved? I cannot work it out from the link to the InfluxDB documentation:

Not a real solution, but maybe a workaround: you could shift the timestamps with the timeShift() function (Flux). As long as you have fixed intervals, this should work.
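A minimal sketch of that workaround, assuming the 10-minute windows and the measurement/field names from the CQ above (adjust the bucket and names to your setup):

from(bucket: "longterm")
|> range(start: -1d)
|> filter(fn: (r) => r["_measurement"] == "javascript.0.scriptEnabled.PV.WRPACges")
|> filter(fn: (r) => r["_field"] == "value")
// move each aggregated point from the start of its 10-minute window to the end
|> timeShift(duration: 10m)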