Spread() function - How to deal with an anomaly inside a time series

Ehninchr · April 7, 2023, 9:45am

Hello all,

I’m recording values from an energy meter and want to calculate the consumption with the spread() function, using the following query:

```flux
from(bucket: "EnergyMetering")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "HeatMeter")
  |> filter(fn: (r) => r["User"] == "001_1")
  |> filter(fn: (r) => r["_field"] == "Energy_kWh")
  |> spread()
```

I’ve been using this for months, but I’ve now noticed that the time series occasionally contains a wrong value. The energy meter should always accumulate: if no energy was used, the new entry should simply repeat the previous value. But as the picture below illustrates, there is a single negative spike down to “0”.

This makes the spread() result hugely wrong: spread() returns the maximum minus the minimum, so a single reading of 0 turns the result into the highest meter reading in the range instead of the consumption.
Does anyone know how a Flux script could exclude such irregular values? My first thought was to detect these cases with difference(), where they show up as negative values, but I’m struggling with how to use that information in a useful way.
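If the faulty readings are always exactly 0, as in the graph, the simplest workaround is to drop those rows before spread() so the bad point can never become the minimum. A minimal sketch, assuming a real reading is never 0 and that Energy_kWh is stored as a float:

```flux
from(bucket: "EnergyMetering")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "HeatMeter")
  |> filter(fn: (r) => r["User"] == "001_1")
  |> filter(fn: (r) => r["_field"] == "Energy_kWh")
  // Assumption: a genuine meter reading is never 0, so 0 marks a faulty sample.
  // Compare with 0 instead of 0.0 if the field is stored as an integer.
  |> filter(fn: (r) => r._value != 0.0)
  // spread() = max - min over the remaining readings
  |> spread()
```

This only covers drops to exactly 0; a wrong non-zero value or a genuine counter reset still distorts the result.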

gazpachoking · April 10, 2023, 5:10pm

I have this exact issue with some of our energy meters. They periodically report a wrong value that I’d like to filter out. Since the counter sometimes resets, it can’t be as simple as filtering out values that are lower than the previous one. The best solution I can think of would be a Flux function that applies a moving median over a fixed number of points (3 would be enough if there is never more than one spurious point in a row). I put in an enhancement request here:

GitHub issue in influxdata/influxdb: “movingMedian function”, opened by gazpachoking on 17 Mar 2023, 09:26 PM UTC (labels: kind/feature-request, area/flux)

__Proposal:__ Similar to movingAverage, there should be a movingMedian function to filter out anomalous data points. There have been requests for such a function in multiple issues, but I still see no way to accomplish this.

EDIT: Or, even more flexible, a function like aggregateWindow, but where you specify a number of points rather than a time range, such that `movingWindow(fn: median, n: 3)` would be the same as my proposed `movingMedian(n: 3)`.

__Current behavior:__ You must window the data by time, rather than by number of data points, to get a moving median.

__Desired behavior:__ A way to window data by number of points, or a movingMedian function to do the desired filtering directly.

__Alternatives considered:__ Use `aggregateWindow(every: 15m, period: 45m, fn: median)` or something similar. This works depending on how regular your data points are, but it also leaves several windows at the beginning and end of the range that had fewer than 3 points when the median function was applied.

__Use case:__ We have some sensors that occasionally emit a wrong data point which isn't in line with the rest of the values emitted by the sensor. Being able to take the moving median of 3 points ensures we never do any calculations with the spurious points.

michael2 · April 11, 2023, 8:47am

In my opinion, the best solution for erroneous data is to remove it from the database, either manually or, if the errors follow a known pattern, automatically. The best place to remove erroneous data automatically is during import, so it never reaches the database.

In the long run, it is not worth handling erroneous data in every query, especially since new or replaced data sources may produce different errors in the future.
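To find the points to remove, a query along these lines lists every timestamp at which a meter reading went down. It is a sketch assuming Energy_kWh is a float; the User filter is left out so all meters are scanned at once. With default settings, each row from difference() carries the _time of the later of the two readings, which for a dip is the suspicious point itself. Counter resets will show up here too, so the list still needs a manual check.

```flux
from(bucket: "EnergyMetering")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "HeatMeter")
  |> filter(fn: (r) => r["_field"] == "Energy_kWh")
  // Difference to the previous reading, computed per series (i.e. per User tag)
  |> difference()
  // A cumulative meter should never go down: keep only the drops
  |> filter(fn: (r) => r._value < 0.0)
  |> keep(columns: ["_time", "User", "_value"])
```

The listed points could then be deleted, for example with `influx delete` using a narrow `--start`/`--stop` window and a `--predicate` on the measurement and User tag.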

Ehninchr · April 12, 2023, 5:33am

@michael2: I followed your proposal and removed the wrong data.

Does anyone know how to put an “if” statement in front of the spread() function? The idea would be to check the output of difference() and only run spread() if none of the differences is negative. With such a conditional check, I would know whether there is erroneous data in the range.
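Flux has no if statement, but it does have if/then/else expressions, which can turn the difference() check into a flag on the spread() result. A minimal sketch, assuming Energy_kWh is a float; the `status` column and its text are hypothetical names chosen for illustration:

```flux
data = from(bucket: "EnergyMetering")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "HeatMeter")
  |> filter(fn: (r) => r["User"] == "001_1")
  |> filter(fn: (r) => r["_field"] == "Energy_kWh")

// Count how often the reading went down. filter() drops empty tables by
// default, so the array is empty when there are no drops at all.
drops = data
  |> difference()
  |> filter(fn: (r) => r._value < 0.0)
  |> count()
  |> findColumn(fn: (key) => true, column: "_value")

// Conditional expression evaluated once for the whole range
status = if length(arr: drops) > 0 then "check data: reading decreased" else "ok"

data
  |> spread()
  // Hypothetical column that tells the reader whether spread() can be trusted
  |> set(key: "status", value: status)
```

Flagging rather than suppressing the result keeps the dashboard panel populated while still making the anomaly visible.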

criticalboot · April 13, 2023, 5:39am

I am also suffering from the same issue with our energy meters. Instead of the spread() function, I fetch the first and last data points and calculate the difference between them.
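Unlike spread(), which returns max minus min and therefore picks up a single bad 0 as the minimum, last minus first depends only on the two endpoint readings. A sketch of that approach, assuming the first and last readings in the range are themselves valid and that there is no counter reset in between; union() plus difference() is one way to subtract them without a join:

```flux
data = from(bucket: "EnergyMetering")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "HeatMeter")
  |> filter(fn: (r) => r["User"] == "001_1")
  |> filter(fn: (r) => r["_field"] == "Energy_kWh")

// One row each: the earliest and the latest reading per series
union(tables: [data |> first(), data |> last()])
  // union() does not preserve row order, so restore time order
  |> sort(columns: ["_time"])
  // last - first; difference() drops the first row of each table
  |> difference()
```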

@michael2’s suggestion is good, but in my case, with a set of 500 MFMs, picking out and removing erroneous data is not practical.

Looking forward to a better solution for this problem.
