Continuous Query, Fill, last point missing - race?

Linwood · March 21, 2020, 3:16am

I have temperatures that arrive (from Home Assistant) at random times (whenever there is a change).

I have a continuous query that rolls these up into hourly temperatures; it looks like this (spacing mine for clarity):

CREATE CONTINUOUS QUERY Temp_Hourly ON HA 
    RESAMPLE EVERY 1h FOR 2d 
    BEGIN 
        SELECT mean(value) AS Temp INTO HA.autogen.HourlyTemps 
        FROM HA.autogen."°F" 
        GROUP BY *, time(1h) fill(previous) 
    END
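To make the window concrete, here is a minimal Python sketch of what one execution of this CQ would cover. It assumes the RESAMPLE FOR range is [now() - 2d, now()), with now() itself excluded, that the run happens exactly on an hour boundary, and that each output row is labelled by the start of its bucket. The function name `cq_bucket_starts` and the example date are hypothetical.

```python
from datetime import datetime, timedelta

def cq_bucket_starts(run_at, for_interval, group_by):
    """Hypothetical model of one RESAMPLE ... FOR execution.

    Assumes the queried range is [run_at - for_interval, run_at): the lower
    bound is included, run_at (now()) is excluded, and run_at sits on a
    group_by boundary. Returns the start time of every bucket written.
    """
    t = run_at - for_interval
    starts = []
    while t < run_at:  # strict '<': no bucket starting at run_at is produced
        starts.append(t)
        t += group_by
    return starts

# Example: a run at 22:00 with FOR 2d and GROUP BY time(1h)
run_at = datetime(2020, 3, 20, 22, 0)
starts = cq_bucket_starts(run_at, timedelta(days=2), timedelta(hours=1))
print(len(starts), starts[0], starts[-1])
# 48 2020-03-18 22:00:00 2020-03-20 21:00:00
```

Under these assumptions, the newest row a 22:00 run writes is labelled 21:00, and that row covers 21:00 up to (but not including) 22:00.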

This works fine, but it is always an hour behind: the 10pm run, for example, only fills through 9pm.

When I ran it manually (between 10pm and 11pm, as I just did), it worked fine.

There is plenty of data between 2d ago and 10pm. Indeed, for one series there was already a data point at 21:58:29 when the CQ ran at 22:00, but it only created a data point for 21:00.
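The 21:58:29 point ending up under 21:00 is consistent with GROUP BY time() labelling each bucket by its start time. A minimal sketch, assuming 1h buckets aligned to the Unix epoch with no offset and timestamps treated as UTC; `bucket_start` is a hypothetical helper, not an InfluxDB function:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetimes, treated as UTC

def bucket_start(ts, interval=timedelta(hours=1)):
    """Return the start of the GROUP BY time(interval) bucket holding ts."""
    # timedelta % timedelta gives how far ts sits past the last boundary
    return ts - ((ts - EPOCH) % interval)

point = datetime(2020, 3, 20, 21, 58, 29)
print(bucket_start(point))  # 2020-03-20 21:00:00
```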

I tried the offset argument in time(), but it produced offset timestamps, which is not what I wanted. Running it every 30 minutes does work, I think, which is a viable workaround I guess, but…
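The shifted timestamps fit the idea that the offset moves the bucket boundaries themselves, not just when the query runs. A sketch assuming time(1h, 5m) shifts every boundary 5 minutes past the hour; `bucket_start_with_offset` is a hypothetical helper:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetimes, treated as UTC

def bucket_start_with_offset(ts, interval, offset=timedelta(0)):
    """Start of the bucket holding ts when boundaries are shifted by offset."""
    shifted_epoch = EPOCH + offset
    return ts - ((ts - shifted_epoch) % interval)

point = datetime(2020, 3, 20, 21, 58, 29)
hour = timedelta(hours=1)
print(bucket_start_with_offset(point, hour))                        # 2020-03-20 21:00:00
print(bucket_start_with_offset(point, hour, timedelta(minutes=5)))  # 2020-03-20 21:05:00
```

With the offset, the same reading would be stored under 21:05 rather than 21:00, which matches the "offset timestamps" described above.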

Should this store a time for 22:00 when run at 22:00?

Is it some kind of race condition, running at the hour, but thinking it’s not quite at the hour? I can’t find a way to run at (say) 22:05 but aggregate on whole hour boundaries. It is 100% reliable at producing data, just an hour behind.

If not, is there a more appropriate way to do this?

I’m using InfluxDB 1.7.9 in a Docker container as part of Home Assistant on Linux, and I set the queries in the CLI, not a UI.

PS. The reason for the 2d is just in case the sending system is down for a while; two days is probably 10 times too long, but at the moment I seem to have plenty of horsepower.

Linwood · March 21, 2020, 4:01pm

So I’ve been experimenting and looking at the documentation and I think I understand – maybe – the reason it is not working.

The documentation says the following:

When the CQ executes, it runs a single query for the time range between now() and now() minus the GROUP BY time() interval.

OK, that makes sense; the problem is that it is not precise. I found a more complete definition in an example elsewhere in the documentation:

If the GROUP BY time() interval is one hour and the current time is 17:00, the query’s time range is between 16:00 and 16:59.999999999.
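The quoted example describes a half-open interval: 16:00 is included, 17:00 is not. A minimal sketch of that rule, assuming a CQ without RESAMPLE FOR; the helper names are hypothetical, and 16:59:59.999999 stands in for the last instant because Python's datetime stops at microseconds while InfluxDB uses nanoseconds:

```python
from datetime import datetime, timedelta

def cq_time_range(now, group_by):
    """Range queried by a CQ without RESAMPLE FOR, per the quoted docs:
    lower bound now - group_by (inclusive), upper bound now (exclusive)."""
    return now - group_by, now

def in_range(ts, lower, upper):
    # Half-open interval [lower, upper)
    return lower <= ts < upper

lower, upper = cq_time_range(datetime(2020, 3, 21, 17, 0), timedelta(hours=1))
print(in_range(datetime(2020, 3, 21, 16, 59, 59, 999999), lower, upper))  # True
print(in_range(datetime(2020, 3, 21, 17, 0), lower, upper))               # False
```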

So I think what happens is that the implied interval for continuous queries does NOT include the time at which it runs, e.g. 17:00 in the above example. To try to confirm this, I did two manual runs whose queries ended like this:

and time <= '2020-03-21T11:00:00-04:00'  GROUP BY *, time(1h) fill(previous)
and time < '2020-03-21T11:00:00-04:00'  GROUP BY *, time(1h) fill(previous)

The former of these includes the 11:00 data point, the latter does not.
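The two runs differ only in whether the upper bound is closed (`<=`) or half-open (`<`). A sketch of the bucket labels each would produce, assuming the lower bound sits on an hour boundary (09:00 is an arbitrary example, and the -04:00 offset is ignored for simplicity); `bucket_starts` is a hypothetical helper:

```python
from datetime import datetime, timedelta

def bucket_starts(lower, upper, interval, inclusive_upper):
    """Bucket start times a GROUP BY time(interval) query would emit over
    [lower, upper] (inclusive_upper=True) or [lower, upper) (False).
    Assumes lower is already aligned to interval."""
    starts = []
    t = lower
    while t < upper or (inclusive_upper and t == upper):
        starts.append(t)
        t += interval
    return starts

lower = datetime(2020, 3, 21, 9, 0)
upper = datetime(2020, 3, 21, 11, 0)
hour = timedelta(hours=1)
print([t.strftime("%H:%M") for t in bucket_starts(lower, upper, hour, True)])   # ['09:00', '10:00', '11:00']
print([t.strftime("%H:%M") for t in bucket_starts(lower, upper, hour, False)])  # ['09:00', '10:00']
```

With `<=`, the 11:00 bucket only touches the single instant 11:00:00; presumably fill(previous) is what gives it a value when no raw point lands exactly there.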

I cannot see a way to override this in the continuous query, as it seems to ignore any time frames (I tried, for example, using ‘now() + 3m’).

I think this makes a certain amount of sense; if you were doing accumulations like sum(), you might want whole intervals only. But if you are doing selections like max, min, mean – then partial intervals make more sense. Even with accumulations, having a “to date” value might make sense in some cases.
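A quick illustration of that distinction with made-up readings (not data from this thread): the mean of a half hour is still a plausible hourly temperature, while the sum of a half hour is only about half of the full hour's total.

```python
from statistics import mean

# Hypothetical readings: one full hour vs. only its first half
full_hour = [70.0, 71.0, 72.0, 73.0]
partial_hour = full_hour[:2]

print(mean(full_hour), mean(partial_hour))  # 71.5 70.5
print(sum(full_hour), sum(partial_hour))    # 286.0 141.0
```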

So back to the original question: is there a workaround that makes a continuous query include the end point? Right now I’m running it every 30 minutes to fill the gap, which doubles the work and still leaves a 30-minute delay; doing this to get a 5-minute delay would mean 12 times the work. Not ideal…
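The cost arithmetic, as a small sketch: assuming every execution rescans the same 2d RESAMPLE FOR window, the work scales with the number of runs per hour, and the delay is roughly one EVERY interval. `relative_work` is a hypothetical helper:

```python
def relative_work(every_minutes, baseline_minutes=60):
    """Executions per baseline period; each execution is assumed to rescan
    the whole RESAMPLE FOR window, so work scales with the run count."""
    return baseline_minutes / every_minutes

for every in (60, 30, 5):
    # the delay before an hour's partial value appears is roughly one EVERY interval
    print(f"EVERY {every}m: {relative_work(every):.0f}x work, up to {every} min delay")
```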

Another approach?
