spread() function: how to deal with anomalies in a time series

Hello all,

I’m recording values from an energy meter and want to calculate the consumption with the spread() function. I’m using the following query:

from(bucket: "EnergyMetering")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "HeatMeter")
  |> filter(fn: (r) => r["User"] == "001_1")
  |> filter(fn: (r) => r["_field"] == "Energy_kWh")
  |> spread()

I’ve been using this for months, but I’ve now noticed that the time series occasionally contains a wrong value. The energy meter should always accumulate: if no energy was used, the new entry should simply repeat the previous value. But as the picture below illustrates, there is one negative spike down to “0”.

Because spread() returns the difference between the maximum and the minimum value in the range, that single zero produces a huge error in the calculated consumption.
Does anyone know what a Flux script that excludes such irregular values could look like? My first thought was to detect these cases with difference(), where they show up as negative values, but I’m struggling to use that information in a useful way.
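A minimal sketch of one workaround, assuming the bad readings are always 0 (as in the picture) and never a low but positive value, and that Energy_kWh is stored as a float: drop those rows with filter() before spread(). The array.from() rows are hypothetical sample data standing in for the from(bucket: "EnergyMetering") pipeline above; swap that pipeline back in for real data.

```flux
import "array"

// Hypothetical sample readings (kWh) standing in for the
// from(bucket: "EnergyMetering") |> range() |> filter() pipeline:
// an accumulating counter with one spurious drop to 0.
data = array.from(
    rows: [
        {_time: 2021-01-01T00:00:00Z, _value: 100.0},
        {_time: 2021-01-01T01:00:00Z, _value: 101.0},
        {_time: 2021-01-01T02:00:00Z, _value: 0.0},
        {_time: 2021-01-01T03:00:00Z, _value: 102.0},
        {_time: 2021-01-01T04:00:00Z, _value: 103.0}
    ]
)

consumption = data
    // Assumption: a valid counter reading is never 0 or negative,
    // so such rows are treated as glitches and dropped.
    |> filter(fn: (r) => r._value > 0.0)
    // spread() = max - min of the remaining readings.
    |> spread()

// _value: 3.0 (103.0 - 100.0); without the filter it would be 103.0.
consumption |> yield(name: "consumption")
```

This keeps spread(), but it does not help if the glitch is a low non-zero value or if the counter resets inside the range, because spread() still reports max minus min.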

I have this exact issue with some of our energy meters: they periodically report a wrong value that I’d like to filter out. Since the counter sometimes resets, it can’t be as simple as filtering out values that are lower than the previous one. The best solution I can think of would be a Flux function that applies a moving median over some number of points (3 would be enough if there is never more than 1 spurious point in a row). I put in an enhancement request here.
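Not the moving median requested above, but a different technique that may cover both glitches and resets: filter the per-interval increments from difference() instead of the raw readings. A genuine reset produces one negative step followed by small positive steps, whereas a glitch to 0 produces a negative step followed by a large positive rebound. Dropping negative steps and steps above a plausibility limit, then summing, therefore tolerates resets. The maxStep value and the sample rows are hypothetical assumptions; the limit would have to match each meter's maximum realistic consumption per reporting interval.

```flux
import "array"

// Hypothetical plausibility limit (kWh): the largest increase this meter
// can realistically record between two consecutive readings.
maxStep = 10.0

// Hypothetical sample readings: a glitch to 0 at 02:00 and a genuine
// counter reset at 05:00, after which the meter counts up from 0 again.
data = array.from(
    rows: [
        {_time: 2021-01-01T00:00:00Z, _value: 100.0},
        {_time: 2021-01-01T01:00:00Z, _value: 101.0},
        {_time: 2021-01-01T02:00:00Z, _value: 0.0},
        {_time: 2021-01-01T03:00:00Z, _value: 102.0},
        {_time: 2021-01-01T04:00:00Z, _value: 103.0},
        {_time: 2021-01-01T05:00:00Z, _value: 0.0},
        {_time: 2021-01-01T06:00:00Z, _value: 1.0},
        {_time: 2021-01-01T07:00:00Z, _value: 2.0}
    ]
)

consumption = data
    // Increment between each reading and the one before it.
    |> difference()
    // Drop negative steps (glitch drops and resets) and implausibly large
    // steps (the rebound right after a glitch).
    |> filter(fn: (r) => r._value >= 0.0 and r._value <= maxStep)
    // Total of the remaining increments.
    |> sum()

// _value: 4.0 (steps kept: 1.0, 1.0, 1.0, 1.0)
consumption |> yield(name: "consumption")
```

The trade-off: the real increment across a glitch (101 to 102 in the sample) is discarded along with the rebound, so the total slightly undercounts. Legitimate steps larger than maxStep, for example after a gap in reporting, would also be dropped.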

In my opinion, the best solution for erroneous data is to remove it from the database, either manually or, if the errors follow a known pattern, automatically. The best place to remove them automatically is during import, so they never reach the database in the first place.

In the long run, it isn’t worth working around erroneous data in every query, especially since new or replacement data sources may produce different errors in the future.

@michael2: I followed your proposal and removed the wrong data.

Does anyone know how to put an “if” statement in front of the spread() function? The idea is to check the result of difference() and run spread() only if none of the differences is negative. With such a conditional check, I would know whether the data contains an erroneous value.
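A sketch using Flux's conditional expression (if ... then ... else ...). The condition needs a scalar, so the negative steps are first counted with difference() and filter(), then pulled out of the stream with findRecord(). Instead of skipping spread(), this labels its result, so the check is visible next to the value. The status column, its messages, and the sample rows are hypothetical; findRecord() with fn: (key) => true assumes the query returns a single series.

```flux
import "array"

// Hypothetical sample readings with one spurious drop to 0.
data = array.from(
    rows: [
        {_time: 2021-01-01T00:00:00Z, _value: 100.0},
        {_time: 2021-01-01T01:00:00Z, _value: 101.0},
        {_time: 2021-01-01T02:00:00Z, _value: 0.0},
        {_time: 2021-01-01T03:00:00Z, _value: 102.0},
        {_time: 2021-01-01T04:00:00Z, _value: 103.0}
    ]
)

// Count the steps where the counter went down. onEmpty: "keep" keeps the
// (empty) table so that count() returns 0 instead of no result at all.
negativeSteps = data
    |> difference()
    |> filter(fn: (r) => r._value < 0.0, onEmpty: "keep")
    |> count()

// Pull the count out of the stream as a plain integer.
negativeRecord = negativeSteps |> findRecord(fn: (key) => true, idx: 0)
negativeCount = negativeRecord._value

// spread() as before, labelled with the outcome of the check.
consumption = data
    |> spread()
    |> map(fn: (r) => ({r with status: if negativeCount == 0 then "ok" else "check data: counter went down"}))

// For the sample rows: _value 103.0, status "check data: counter went down"
consumption |> yield(name: "consumption")
```

Note that a genuine counter reset also produces a negative step, so it would trigger the same warning.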

I have the same issue with our energy meters. Instead of spread(), I fetch the first and last data points and calculate the difference between them.
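The first/last approach could be written in Flux roughly as follows; this is a sketch, and the array.from() rows are again hypothetical stand-ins for the bucket query. first() and last() select the earliest and latest rows, findRecord() extracts them as records, and the consumption is their difference. As above, findRecord() with fn: (key) => true assumes a single series.

```flux
import "array"

// Hypothetical sample readings with one spurious drop to 0 in the middle.
data = array.from(
    rows: [
        {_time: 2021-01-01T00:00:00Z, _value: 100.0},
        {_time: 2021-01-01T01:00:00Z, _value: 101.0},
        {_time: 2021-01-01T02:00:00Z, _value: 0.0},
        {_time: 2021-01-01T03:00:00Z, _value: 102.0},
        {_time: 2021-01-01T04:00:00Z, _value: 103.0}
    ]
)

// Earliest and latest reading in the range, extracted as records.
firstRecord = data |> first() |> findRecord(fn: (key) => true, idx: 0)
lastRecord = data |> last() |> findRecord(fn: (key) => true, idx: 0)

// Consumption = last reading - first reading; readings in between
// (including the spurious 0) are ignored.
consumption = lastRecord._value - firstRecord._value

// 3.0 (103.0 - 100.0)
array.from(rows: [{_value: consumption}]) |> yield(name: "consumption")
```

Compared with spread(), a glitch in the middle of the range no longer matters. The result is still wrong if the first or last reading is itself a glitch, or if the counter resets inside the range.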

@michael2’s suggestion is good, but in my case, with a set of 500 MFMs, picking out and removing erroneous data by hand is not a practical solution.

Looking forward to a better solution for this problem.