Understanding how influx deals with time for GROUP BY and INTEGRAL?

#1

I’ve been struggling with two issues for a while, and hoping someone can shed some light.

One: The following query spans 10 days and uses a 1-day GROUP BY TIME. I expected 10 records but received 11, where one of my expected records was split between the first and last records. Why does Influx do this, and can the query be modified to return 10 records? How does Influx determine how many samples go in these two boundary records (178 in one and 302 in the other, for a total of 480)? And how can my integralCalc calculation be modified to return accurate results (the first record should be multiplied by 178/(178+302) and the last by 302/(178+302))?

> SELECT MEAN(P49), COUNT(P49), INTEGRAL(P49), MEAN(P49)*1*24*60*60 AS integralCalc, (INTEGRAL(P49)-MEAN(P49)*1*24*60*60)/INTEGRAL(P49) AS integralError FROM L2 WHERE time >= NOW() - 10d AND time<NOW() GROUP BY TIME(1d)
name: L2
time                mean                count integral           integralCalc       integralError
----                ----                ----- --------           ------------       -------------
1542067200000000000 3.022019691011238   178   96852.55968955555  261102.50130337093 -1.6958761042587898
1542153600000000000 1.766747622916667   480   152624.3676301944  152646.99462       -0.00014825279971301814
1542240000000000000 2.2783763           480   196850.57924300004 196851.71232000002 -0.000005756025734514354
1542326400000000000 1.9705699333333326  480   170071.0132481389  170257.24223999993 -0.0010950072461162667
1542412800000000000 0.7606399062499993  480   65997.40708773605  65719.28789999994  0.00421409264406986
1542499200000000000 0.5598519185803762  479   48357.86496567498  48371.2057653445   -0.000275876523477486
1542585600000000000 1.0156483624999997  480   87860.30244070003  87752.01851999998  0.0012324555879276108
1542672000000000000 1.1093418062499987  480   95763.26122774168  95847.13205999989  -0.0008758142860104804
1542758400000000000 0.8864024520833331  480   76589.37853240839  76585.17185999999  0.0000549250103474821
1542844800000000000 0.49902364791666626 480   43120.054512824994 43115.64317999996  0.00010230350761081205
1542931200000000000 0.8652338443708607  302   47021.54529502501  74756.20415364236  -0.5898287409442442
>
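A quick sanity check of that proration idea, as a minimal Python sketch using the first and last rows of the result above: scaling the naive mean*86400 estimate by each boundary bucket's share of the 480 daily samples brings it very close to the INTEGRAL that Influx reports.

```python
# The split boundary bucket: the first and last rows together hold one full
# day's worth of samples (178 + 302 = 480), so the naive mean*86400 estimate
# can be prorated by each row's share of the samples.
first = {"mean": 3.022019691011238, "count": 178, "integral": 96852.55968955555}
last = {"mean": 0.8652338443708607, "count": 302, "integral": 47021.54529502501}

total = first["count"] + last["count"]  # 480 samples in a full day
assert total == 480

for row in (first, last):
    naive = row["mean"] * 86400               # integralCalc over a full day
    prorated = naive * row["count"] / total   # scale by the bucket's share
    error = (row["integral"] - prorated) / row["integral"]
    print(f"count={row['count']}: prorated={prorated:.1f}, error={error:.5f}")
```

The residual error stays small only because the samples are evenly spaced; with irregular sampling the proration would have to use the covered time span rather than the sample count.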

Two: When performing a query without a GROUP BY TIME clause, why does INTEGRAL return a time of zero while all the other functions return the time of 10 days ago?

> SELECT COUNT(P49), MEAN(P49), MEDIAN(P49), MODE(P49), SPREAD(P49), STDDEV(P49), SUM(P49), FIRST(P49), LAST(P49), MAX(P49), MIN(P49),INTEGRAL(P49) FROM L2 WHERE time >= NOW() - 10d AND time<NOW()
name: L2
time                count mean              median   mode    spread   stddev             sum               first   last     max     min      integral
----                ----- ----              ------   ----    ------   ------             ---               -----   ----     ---     ---      --------
0                                                                                                                                            1082157.106009501
1542119858119162885 4798  1.252782246769487 0.853222 0.39226 8.688586 1.1931054204209883 6010.849219999999 0.83357 0.284867 8.89962 0.211034
>

Thank you!


#2

For the first question, it has to do with the alignment of the timestamps relative to the Unix epoch (where time is 0). The raw timestamps within the WHERE range are truncated to the previous 1d boundary relative to the Unix epoch in order to determine which GROUP BY bucket they belong to. Perhaps you are expecting the truncation to occur relative to the beginning of the WHERE range.
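The truncation can be sketched as snapping each raw timestamp to the previous multiple of the interval since the Unix epoch; a minimal illustration using the overall query time from the second question:

```python
from datetime import datetime, timezone

DAY_NS = 86_400 * 10**9  # the 1d GROUP BY interval, in nanoseconds

def bucket(ts_ns: int, interval_ns: int = DAY_NS) -> int:
    """Truncate a nanosecond timestamp to the previous interval
    boundary relative to the Unix epoch (time 0)."""
    return ts_ns - ts_ns % interval_ns

# Raw timestamp from the un-grouped query result in question two:
raw = 1542119858119162885
print(bucket(raw))  # → 1542067200000000000, the first bucket in the table
print(datetime.fromtimestamp(bucket(raw) / 1e9, tz=timezone.utc))
# → 2018-11-13 00:00:00+00:00
```

Because the WHERE range starts mid-day relative to these epoch-aligned boundaries, the first and last buckets each cover only part of a day, which is why the 10-day range yields 11 rows.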


#3

@mhfrantz, thank you. I agree that grouping by calendar days when the GROUP BY TIME unit is explicitly set to days makes the most sense. Originally I was working in seconds, and my query was SELECT MEAN(P49), COUNT(P49), INTEGRAL(P49) FROM L2 WHERE time >= NOW() - 864000s AND time<NOW() GROUP BY TIME(86400s), and yes, I was expecting the truncation to occur relative to the beginning of the WHERE range (or at least to the nearest second, not the nearest day). I need to read up on the advanced GROUP BY time syntax; I think it will help.
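For what it's worth, the advanced syntax, GROUP BY time(time_interval, offset_interval), lets you shift the epoch-aligned boundaries by a fixed offset. A sketch of computing such an offset from the start of the WHERE range (the range start here is a hypothetical value, evaluated from the timestamp seen in the thread):

```python
DAY_S = 86_400

def groupby_offset(range_start_s: int, interval_s: int = DAY_S) -> int:
    """Seconds to pass as offset_interval in GROUP BY time(1d, <offset>s)
    so that bucket boundaries fall on range_start_s instead of on the
    epoch-aligned day boundary."""
    return range_start_s % interval_s

# Hypothetical range start: NOW() - 10d, with NOW() taken as the raw
# timestamp from the thread (1542119858 seconds since the epoch).
start = 1542119858 - 10 * DAY_S
offset = groupby_offset(start)
print(f"GROUP BY TIME(1d, {offset}s)")  # → GROUP BY TIME(1d, 52658s)
```

Note that NOW() moves, so a static offset only lines up exactly when the WHERE range uses fixed timestamps rather than NOW()-relative ones.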


#4

Regarding the second question: the time is always set to 0 (Unix Epoch) when a single value is returned from an aggregator function.

To properly handle time zones (which seems to be the underlying goal here), I think it’s best to log your data in UTC, organize your integrated values into half-hour buckets using a CQ (continuous query), then use a SUM aggregator over the required range and do the time-zone alignment outside of Influx.
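The "alignment outside of Influx" step could look something like this minimal Python sketch, assuming the half-hour integrals have already been fetched as (UTC epoch seconds, value) pairs; the bucket data and the UTC-5 zone here are made up for illustration:

```python
from collections import defaultdict
from datetime import datetime, timezone, timedelta

# Hypothetical half-hour integral buckets: (UTC epoch seconds, value).
# 96 buckets of value 1.0 spanning two whole UTC days.
half_hours = [(1542067200 + i * 1800, 1.0) for i in range(96)]

# Re-group the half-hour sums into local calendar days. A fixed offset is
# used here for simplicity; a real zone (e.g. via zoneinfo) would also
# handle DST transitions, which is exactly when UTC days and local days
# diverge by an extra half hour or two.
local_tz = timezone(timedelta(hours=-5))  # example: UTC-5
daily = defaultdict(float)
for ts, value in half_hours:
    local_day = datetime.fromtimestamp(ts, tz=local_tz).date()
    daily[local_day] += value

for day, total in sorted(daily.items()):
    print(day, total)
```

Half-hour buckets are a convenient granularity because every real-world UTC offset is a whole multiple of 30 minutes, so each bucket falls entirely within one local day.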

You might be interested in this as well: https://github.com/influxdata/influxdb/issues/13164
