I needed to perform an integral on power and availability data in a continuous query, typically used for energy monitoring where you perform an integral on power state information (ie 1 or 0) or power data (consumption/production in W).
In my case, the source is emitting data at 1min intervals, time is not aligned with the server: timestamps are taken from the source.
Take for instance the power state information - as it is easier to calculate. The goal is to count the number of seconds that the state was ‘1’ (numeric).
Example data: (Time > state)
10:00:51 > 1
10:01:51 > 1
10:02:51 > 1
10:03:52 > 1
10:04:52 > 1
10:05:52 > 1
10:06:51 > 1
10:07:51 > 1
10:08:51 > 1
10:09:51 > 1
10:10:51 > 1
10:11:51 > 1
10:12:52 > 1
10:13:52 > 1
The idea is to count the number of seconds that the state is 1 and aggregate this into 5min intervals.
Let’s run the CQ at 10:15.
The obvious thing to do is:
CREATE CONTINUOUS QUERY my_query ON db
BEGIN
SELECT integral(“state”) as “state_on”
INTO db.rp.count
FROM “db.rp.raw”
GROUP BY time(5m)
END
Unfortunately, when running at 10:15, due to lag/unsynchronized clocks, the datapoint which was supposed to come in at about 10:14:52 is not there yet so the computed value is incomplete.
That’s where the advanced CQ syntax comes into play. Right?
CREATE CONTINUOUS QUERY my_query ON db
RESAMPLE FOR 10m
BEGIN
SELECT integral(“state”) as “state_on”
INTO db.rp.count
FROM “db.rp.raw”
GROUP BY time(5m)
END
So now we’re still running the CQ every 5min’s but each time, we’re providing it with a double amount of data. That shoud fix it, right?
Unfortunately, the CQ run at 10:15 is now showing a new problem at the start of the bucket for 10:05. Since the query input is limited to the dataset after 10:05. So for the first 52 seconds of that bucket, there’s no value present and the integral will not count in that interval.
The final solution?
CREATE CONTINUOUS QUERY my_query ON db
RESAMPLE FOR 14m
BEGIN
SELECT integral(“state”) as “state_on”
INTO db.rp.count
FROM “db.rp.raw”
GROUP BY time(5m)
END
By using a RESAMPLE FOR 14m (3*(group by time)-1), you ensure that when the query reruns for the n-1’th bucket, the integral function has enough data to know the initial value but avoid running the query again for the n-2’th bucket, as that would yield the same problem as before.
I hope it makes sense. It took me a little while to get my head around this behaviour and I think/hope it will help others in similar cases.
The question is however: is this recommended practice? Do you have an alternative way to handle this?
Cheers,
Maarten