Filter data glitches

Hi experts,

I started working with influxdb (and grafana) some weeks ago and I already fell in love :slight_smile:

Currently I collect 8 integer values (load cell data, A/D-converted by 8 HX711s) every 25 ms. This works fine, including the nice dashboard possibilities in Grafana. But a strange effect in the data source results in data glitches every now and then. Normal values are below 1000 without load and go up to 50000 with load on the load cells. Single values in between are simply measured wrong. The wrong values differ between the 8 cells but are more or less static for each one of them (cell1: 34618-34621, cell2: 152776 or 152778, …). In other words, whenever a wrong value appears, it is about the same wrong value for that load cell. Strange, and unfortunately hard to avoid.

Logically it is not too complex to handle these records:

Does a value differ by more than GLITCH_MIN_DIFF from both its direct predecessor and its direct successor? Delete it (or change it to the average of predecessor and successor). This would not even need to consider the fact that the wrong values are quite predictable per load cell.
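To make the rule concrete, here is a minimal Python sketch of it, applied to the values of a single load cell after they have been read out of the database. It is not an InfluxDB or Grafana feature. The function name `replace_glitches` and the threshold value of 10000 are assumptions for illustration only; GLITCH_MIN_DIFF would need to be tuned to the real data.

```python
from typing import List

# Assumed threshold for illustration; tune it to the real load cell data.
GLITCH_MIN_DIFF = 10_000


def replace_glitches(values: List[int], min_diff: int = GLITCH_MIN_DIFF) -> List[float]:
    """Replace isolated single-sample spikes with the mean of their neighbours.

    A point counts as a glitch when it differs by more than `min_diff`
    from BOTH its direct predecessor and its direct successor.
    The first and last points have only one neighbour and are kept as-is.
    """
    cleaned = [float(v) for v in values]
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if abs(cur - prev) > min_diff and abs(cur - nxt) > min_diff:
            # Compare against the original neighbours, not already-cleaned ones.
            cleaned[i] = (prev + nxt) / 2
    return cleaned


if __name__ == "__main__":
    # Idle readings for one cell with a single bad sample in the middle.
    print(replace_glitches([100, 120, 34619, 110, 130]))
```

To delete the point instead of averaging, the same condition could be used to skip the value when building the output list.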

Can you help me implement such a filter in InfluxDB (it could run periodically to clean up new data) or in Grafana (filtering while reporting)?

Thanks and best regards

Hello @sauvant,
Thanks for your question.
It sounds like you want to use a Continuous Query, but I’m having trouble understanding your problem. Could you please share some of your data in line protocol so I can get a better understanding of your schema and what you’re trying to accomplish?

Hi, thanks for your help!

This is what my data looks like on the grafana dashboard:

Every color is one of the load cells. The data is simply 8 integers with a timestamp. The diagram shows an idle situation: all real measurements are near 0, and everything else is a false measurement result. I want to filter those out.

The real data CAN reach the values of the wrong spikes, but real data never has this gradient. The bad spikes are only ONE record surrounded by normal values. I need some kind of low-pass filter for the data, or a query that considers, let’s say, 3 consecutive values and drops the middle one if it exceeds MAX_DIFF from the other two…

Does that make sense? :wink:

Hello @sauvant,
Yes. Thank you for the detail. Are you using 1.x or 2.0? Now that I understand what you’re trying to do, can you help me understand the bigger picture of what you’re trying to accomplish?


Thanks again for your attention :slight_smile:
I am using the latest 1.x. Meanwhile I found out that “median” nearly does what I need. Currently I group by time and aggregate the values using median(). That eliminates the “bad” values, as they are way off, but it decreases the resolution. Is there a way to group the “latest 3” points, with that window of three moving through all the data? Some kind of “moving median”? Is that possible with the advanced grouping possibilities?

Best regards
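
For reference, the “moving median” idea described above can be sketched in plain Python on values that have already been fetched for one load cell. This is only an illustration of the technique, not an InfluxQL query; the function name `moving_median` and the sample values are made up for the example.

```python
from statistics import median
from typing import List


def moving_median(values: List[int], window: int = 3) -> List[float]:
    """Median over a sliding window centred on each point.

    The output has the same length as the input, so no resolution is lost.
    Points too close to either edge to have a full window are left unchanged.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    out = [float(v) for v in values]
    for i in range(half, len(values) - half):
        out[i] = float(median(values[i - half:i + half + 1]))
    return out


if __name__ == "__main__":
    # One bad sample surrounded by normal idle readings.
    print(moving_median([100, 120, 34619, 110, 130]))
```

With a window of 3, a single bad record can never be the median of its window, so it is dropped while every timestamp keeps a value. Two bad records in a row would need a wider window.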