Continuous Query that rejects Outliers with count

I currently have a continuous query that aggregates my raw data (arriving at about six samples per second) into a 1-minute mean. What I’d like to do, though, is compute not just a simple mean of all the measurements within the window, but something like:

  1. Calculate the three-sigma bounds of all the measurements
  2. Reject outliers outside the three-sigma bounds
  3. Count the number of outliers removed
  4. Recalculate the mean and stddev using only the remaining (un-rejected) measurements

Is this possible with a continuous Flux query, and can you point me to resources that cover this kind of more complex aggregation?

Hi @Dave_Sprague

That’s a brain teaser, and my Flux knowledge is still a work in progress, but borrowing heavily from this well-explained blog post, I think we can restate your objectives 1 & 2 as “filter out any entries whose Z-score has an absolute value greater than 3.0”, since the Z-score tells you how many standard deviations from the mean an entry sits. For example, with a mean of 50 and a standard deviation of 2, a reading of 57 has a Z-score of (57 − 50) / 2 = 3.5, so it would be rejected.

I am sure your objective #4 is possible, but I need to mull it over. Meanwhile, I think this works for #1, 2, and 3 (I tested it on some sample temperature data that I had access to).

sdev = from(bucket: "HyperEncabulator")
  |> range(start: -1m)
  |> filter(fn: (r) => r["_measurement"] == "TemperatureData")
  |> filter(fn: (r) => r["_field"] == "Temperature")
  |> stddev()
  |> findColumn(fn: (key) => key._measurement == "TemperatureData", column: "_value")

avg = from(bucket: "HyperEncabulator")
  |> range(start: -1m)
  |> filter(fn: (r) => r["_measurement"] == "TemperatureData")
  |> filter(fn: (r) => r["_field"] == "Temperature")
  |> mean()
  |> findColumn(fn: (key) => key._measurement == "TemperatureData", column: "_value")

from(bucket: "HyperEncabulator")
  |> range(start: -1m)
  |> filter(fn: (r) => r["_measurement"] == "TemperatureData")
  |> filter(fn: (r) => r["_field"] == "Temperature")
  |> map(fn: (r) => ({ r with StandardDev: sdev[0] }))
  |> map(fn: (r) => ({ r with Average: avg[0] }))
  |> map(fn: (r) => ({ r with ZScore: (r._value - avg[0]) / sdev[0] }))
  // outliers can fall on either side of the mean, so test both tails
  |> filter(fn: (r) => r["ZScore"] > 3.0 or r["ZScore"] < -3.0)
  |> count()
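Outside of Flux, the same 3-sigma rejection can be sanity-checked in plain Python. This is only an illustration of objectives 1–3 on made-up numbers, not InfluxDB code:

```python
# Sanity check of the 3-sigma outlier rejection on synthetic data.
# The readings list is made up: sixty "normal" samples plus one spike.
import statistics

readings = [53.0] * 30 + [54.0] * 30 + [80.0]

mu = statistics.mean(readings)
sigma = statistics.stdev(readings)  # sample stddev, like Flux's stddev()

# Objectives 1 & 2: compute Z-scores and reject |z| > 3
outliers = [x for x in readings if abs((x - mu) / sigma) > 3.0]
kept = [x for x in readings if abs((x - mu) / sigma) <= 3.0]

# Objective 3: how many readings were rejected
print(len(outliers))  # the single 80.0 spike is rejected
```

Note that with only a handful of points per window a single outlier may never exceed three sigma (it inflates the stddev it is measured against), so the 3.0 threshold works best when each window holds many samples, as yours does at ~360 per minute.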

Hi Grant, thank you very much for your response and the blog link. This is extremely helpful. So perhaps for the fourth objective, I would need to create a new column that holds the Z-score for each measurement and then do another “pass” where I recompute avg and stddev using the Z-score column as a filter? I’ll work on it some and let you know how it goes.

Hi @Dave_Sprague

Maybe get rid of the count() function (unless you really want to know how many values were rejected), then replace the last filter function with all of this. We keep only the “good” measurements (ZScore between -3.0 and 3.0), write them to a new measurement, and take their mean:

  |> filter(fn: (r) => r["ZScore"] < 3.0 and r["ZScore"] > -3.0)
  |> map(fn: (r) => ({
      _value: r._value,
      _time: r._time,
      _measurement: "unrejected_dataset",
  }))
  |> to(bucket: "HyperEncabulator", fieldFn: (r) => ({"_value": r._value}))
  |> mean()
  |> yield(name: "Mean_of_unrejected_dataset")

Running that against my sample temperature data, the mean of the unrejected dataset comes out to 53.375.

By contrast, the original unfiltered dataset had a mean of 54.125, but you’ll want to double-check against your own data.
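For what it’s worth, the whole reject-then-recompute pipeline (objective 4 included) can be cross-checked outside Flux in a few lines of Python, again on synthetic numbers rather than real InfluxDB data:

```python
# Reject 3-sigma outliers, then recompute mean and stddev on the
# survivors (objective 4). Synthetic data: sixty normal samples, one spike.
import statistics

readings = [53.0] * 30 + [54.0] * 30 + [80.0]

mu = statistics.mean(readings)
sigma = statistics.stdev(readings)

kept = [x for x in readings if abs((x - mu) / sigma) <= 3.0]
rejected_count = len(readings) - len(kept)

clean_mean = statistics.mean(kept)    # mean without the spike: 53.5
clean_stdev = statistics.stdev(kept)  # much smaller than the raw stddev

print(rejected_count, clean_mean, round(clean_stdev, 3))
```

The recomputed stddev shrinks dramatically once the spike is gone, which is exactly why recomputing both statistics on the survivors (rather than reusing the originals) matters.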

Thanks, I’ll give this a try.

Dave