Trying to get an average over n days and I must be doing it wrong

Hello community,

I am fairly new to InfluxDB and have set it up with Grafana & Chronograf as clients. Data is coming in nicely (through a Smartthings app mostly) and I am able to create all sorts of useful graphs in Grafana.

I am attempting to do something a bit more complex and am stumped a bit.

The gist of it is that IoT devices are reporting Wattage usage regularly. If I look at the data, there are times where the device is on, reporting between 20W & 120W, roughly and times where it is off and it reports 0. However for some reasons, there are some times where no value is reported at a certain timestamp.

For the following query:
SELECT mean("value") AS "mean_value" FROM "db0"."autogen"."power" WHERE time > now() - 3d AND "deviceName"='3D Printer' GROUP BY time(:interval:) FILL(null)

It looks like this as a graph, which is expected:

And as a table (can’t include more than one image as a new user) I see data, a lot of 0 (expected, it wasn’t on for more than 18 hours over 72 hours) and a few empty fields for a given timestamp.

But a FILL(0) solves that problem

Now, what I am trying to figure out is how to compute an average (mean) for that 3 days period. If you look at the graph, you can estimate that out of those three days, it was on for about 18 hours and let’s say that during those 18 hours, it was most of the time around 100 W (it was less but indulge me).

If it’s for 18 hours at 100W and for 54 hours at 0W, the average should be around 25W or so.

But if I run this query:

SELECT mean("value") AS "mean_value" FROM "db0"."autogen"."power" WHERE time > now() - 3d AND "deviceName"='3D Printer'

The result is 94W or so and in table mode, I see a timestamp from 3 days ago (where power was 0 BTW)

Can anyone help me craft the right query? I am obviously not querying what I think I am querying.

Thanks in advance,

ferik

Moved to another category, was getting no response, I might have put it in the wrong place?

I’m guessing that when you’re calculating the mean for those 3 days, it’s not including the hours where no value came in. It’s only calculating based on the 18 hours at 100W or so.

Have you tried:

SELECT mean("value") AS "mean_value" FROM "db0"."autogen"."power" WHERE time > now() - 3d AND "deviceName"='3D Printer' GROUP BY time(:interval:) FILL(0)

Would you be able to share your data for that three day period?

This query will give you the mean of the field value for the past three days for the measurement db0.autogen.power and the tag deviceName with the value 3D Printer.

If your data if collected at regular intervals (uniformly distributed in time), then calculating the mean of the values will also give you the average power usage over time.

If your data is collected at irregular intervals, then calculating the mean of the values will not give you the expected results. As you mentioned that you had dropped values and were using fill(0), I suspect this could be a possible explanation for the results you’re seeing.

Let’s look at an example. I’m using the Influx CLI. I’ve inserted a number of values into a measurement called m1:

> select * from m1 where time > '2018-08-14T17:22:14Z' and time < '2018-08-14T17:23:20Z'
name: m1
time                         value
----                         -----
2018-08-14T17:22:14.159637Z  10
2018-08-14T17:22:16.3561521Z 20
2018-08-14T17:22:18.2251241Z 30
2018-08-14T17:22:20.18086Z   40
2018-08-14T17:23:19.8976057Z 50

Since I inserted the data manually, the values are not evenly spaced. The first four occur during the same minute, while the fifth occurs nearly a minute later. I can calculate the mean of these values:

> select mean(*) From m1
name: m1
time                 mean_value
----                 ----------
1970-01-01T00:00:00Z 30

And we get 30, as expected. The timestamp value is the beginning of the time window for which you’re calculating the mean. But, as you pointed out before, the duration of the measurement matters to the final result. This is what those points look like on a graph:

The majority of the time is spent between the value of 40 and the value of 50; we’re interpolating linearly between those points to draw a graph. The mean over time, therefore, is probably closer to 40 than it is to 30.

We can’t take the mean of the values for a series with irregularly spaced values, but instead what we can do is break up our time range into discreet units using the GROUP BY clause. This is what our data points look like grouped into ten second intervals:

> select mean(*) from m1 where time > '2018-08-14T17:22:14Z' and time < '2018-08-14T17:23:20Z' group by time(10s)
name: m1
time                 mean_value
----                 ----------
2018-08-14T17:22:10Z 20
2018-08-14T17:22:20Z 40
2018-08-14T17:22:30Z 
2018-08-14T17:22:40Z 
2018-08-14T17:22:50Z 
2018-08-14T17:23:00Z 
2018-08-14T17:23:10Z 50

Because we’re potentially grouping values together, we need to provide a function for the database to apply when there are multiple values within the same window—in this case, we’re using mean(). So for the 10-second period beginning 2018-08-14T17:22:10Z, we have three values, 10, 20, and 30, and the mean of those is 20. For the next window, we have a single value, and then we have a number of windows with no value before we see the final value of 50.

We can fill in those empty windows using fill() option of GROUP BY. This is what our data looks like using fill(previous):

select mean(*) from m1 where time > '2018-08-14T17:22:14Z' and time < '2018-08-14T17:23:20Z' group by time(10s) fill(previous)
name: m1
time                mean_value
----                ----------
1534267330000000000 20
1534267340000000000 40
1534267350000000000 40
1534267360000000000 40
1534267370000000000 40
1534267380000000000 40
1534267390000000000 50

We can calculate the mean of this data using subqueries:

select mean(*) from (select mean(*) from m1 where time > '2018-08-14T17:22:14Z' and time < '2018-08-14T17:23:20Z' group by time(10s) fill(previous))
name: m1
time mean_mean_value
---- ---------------
0    38.57142857142857

There’s another caveat: the window that we choose will have an impact on our final results. What if we use a 1s window instead of a 10s one?

> select mean(*) from (select mean(*) from m1 where time > '2018-08-14T17:22:14Z' and time < '2018-08-14T17:23:20Z' group by time(1s) fill(previous))
name: m1
time mean_mean_value
---- ---------------
0    38.333333333333336

Assuming that this is related to the issue you’re having, a query like this one might provide more accurate results:

SELECT mean("mean_value") FROM 
  (SELECT mean("value") AS "mean_value" FROM "db0"."autogen"."power" WHERE time > now() - 3d AND "deviceName"='3D Printer' GROUP BY time(10s) FILL(previous))

You will have to decide which time window and fill mode are appropriate for you.

2 Likes

Thank you for the thorough answer. That query at the end seems to do the work. I’ll ready through your answer so I can really understand what lead you to that answer. Thank you!

2 Likes