Has anyone encountered the limitation described in the enhancement request "Sigma Stateful function should have rolling window period"?
Hello @Ashish_Sikarwar,
Thanks for your question. I'm going to start by adding some info about the sigma function for reference:
The sigma function
sigma("value") > 3.0
Each time the expression is evaluated, it updates the running statistics and then returns the number of standard deviations the current value is from the running mean. The simple expression above evaluates to false while the stream of data points it has received remains within 3.0 standard deviations of the running mean.
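To make "running statistics" concrete, here is a minimal Python sketch of an unbounded running sigma. `RunningSigma` is a hypothetical name, and the details are assumptions rather than Kapacitor's actual code: a Welford-style online mean/variance, sample variance, and the current value folded into the statistics before its score is computed.

```python
import math


class RunningSigma:
    """Hypothetical unbounded running sigma: every value ever seen stays in
    the statistics (nothing is ever forgotten).

    Assumptions: Welford-style online mean/variance, sample variance, and the
    current value is folded in before its score is computed.
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> float:
        """Fold x into the running stats, then return |x - mean| / stddev."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        if self.n < 2:
            return 0.0  # no spread yet with a single point
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0


if __name__ == "__main__":
    s = RunningSigma()
    for value in [9.0, 11.0] * 20:  # a steady baseline
        s.update(value)
    score = s.update(40.0)  # a spike
    print(round(score, 2), score > 3.0)  # same test as sigma("value") > 3.0
```

Because `n`, `mean` and `m2` only ever grow with new points, the score always reflects the entire history of the stream.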
If your data is nonstationary or has seasonality, the deviation will be affected by the trend and the amplitude. This means you might have outliers that won't be identified by sigma, because it computes the mean and deviation over every point it has seen rather than over a window you choose. For example, suppose your amplitude is really high for 23 hours, and then the system changes and the amplitude falls dramatically at hour 23. Peaks that would have been within 3 standard deviations of the data before hour 23 are now actually anomalous. However, the running mean and deviation still reflect the high-amplitude history and haven't adjusted yet, so those peaks won't be identified. If you know that your amplitude can change on an hourly basis, you would benefit greatly from being able to specify the window that the sigma function applies to.
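The 23-hour scenario can be reproduced numerically. In this Python sketch, the series (1-minute points, 23 hours of large swings, one hour of small swings, then a single peak) and the 60-point trailing window are made up for illustration. The same score is computed once over all history, which, under a sample-variance assumption, is what an unbounded running sigma returns at that point, and once over only the last hour.

```python
import statistics


def sigma_of_last(window):
    """|last - mean| / sample stddev, with the last value included in the stats."""
    mean = statistics.fmean(window)
    std = statistics.stdev(window)
    return abs(window[-1] - mean) / std


# Hypothetical 1-minute series: 23 hours of large swings around 100,
# then the amplitude drops for an hour, then one peak that is unusual
# for the new, quieter regime.
high = [50.0, 150.0] * (23 * 60 // 2)
low = [98.0, 102.0] * 30
series = high + low + [115.0]

all_history = sigma_of_last(series)      # unbounded, like today's sigma()
last_hour = sigma_of_last(series[-60:])  # only the trailing 60 points

print(f"all history: {all_history:.2f}  flagged: {all_history > 3.0}")
print(f"last hour:   {last_hour:.2f}  flagged: {last_hour > 3.0}")
```

The peak is well within the spread of the full history and is not flagged, but it stands out clearly against the last hour alone.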
You can calculate sigma with Flux and specify the window quite easily. Take a look at this blog for an example.
Thank you @Anaisdg for your explanation.
Let me give a little background.
We collect data at a 1-minute interval. One of our tasks checks CPU usage and should trigger an alert if CPU usage deviates by 3 or 3.5 standard deviations.
Follow-up question:
Will the following TICKscript compute the deviation only on data points that fall within the defined time window, .period(2m), and ignore values older than 2 minutes? Or will it compute the deviation on all the values it has ever seen?
|window()
.period(2m)
.every(1m)
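The question depends on how `.period(2m)` and `.every(1m)` slice a 1-minute stream. This Python sketch lists which points land in each emitted window. `sliding_windows` is a hypothetical helper, and the half-open `[end - period, end)` boundary is an assumption, not a description of Kapacitor's window node. It only illustrates window membership and says nothing about whether `sigma` state is reset between windows.

```python
from collections import Counter


def sliding_windows(timestamps, period, every, end):
    """Return (window_end, members) for windows emitted every `every` seconds.

    Hypothetical helper. Each window is assumed to cover the half-open
    interval [window_end - period, window_end); that boundary convention is
    an assumption, not a description of Kapacitor's window node.
    """
    windows = []
    window_end = every
    while window_end <= end:
        members = [t for t in timestamps if window_end - period <= t < window_end]
        windows.append((window_end, members))
        window_end += every
    return windows


if __name__ == "__main__":
    points = list(range(0, 600, 60))  # one point per minute for 10 minutes
    windows = sliding_windows(points, period=120, every=60, end=600)  # 2m / 1m
    for window_end, members in windows:
        print(window_end, members)
    # Count how many windows each point lands in.
    print(Counter(t for _, members in windows for t in members))
```

With these settings the windows overlap, so apart from edge effects at the end of the sample every point appears in two consecutive windows. Any per-point computation that is not reset between windows would therefore see each point twice.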
I ask because the enhancement (Sigma Stateful function should have rolling window period) says:
… but currently it doesn't support limiting the window that it tracks. Meaning it computes the moving mean and stddev for all points seen forever. For it to be more useful, older values need to be forgotten via a moving window
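The moving window the enhancement asks for can be sketched as a time-based buffer that evicts points older than the period. This is a hypothetical Python `RollingSigma`, not Kapacitor code. It assumes sample standard deviation and includes the current point in its own window, and it recomputes the statistics from the buffer on each update, which is simple but costs O(window) per point.

```python
import statistics
from collections import deque


class RollingSigma:
    """Hypothetical sigma over a time-based moving window (not Kapacitor code).

    Points older than `period_s` seconds are forgotten. Assumptions: sample
    standard deviation, and the current point is part of its own window.
    """

    def __init__(self, period_s: float) -> None:
        self.period_s = period_s
        self.points = deque()  # (timestamp, value), oldest first

    def update(self, ts: float, x: float) -> float:
        """Add (ts, x), drop points outside (ts - period_s, ts], return the score."""
        self.points.append((ts, x))
        while self.points and self.points[0][0] <= ts - self.period_s:
            self.points.popleft()  # older values are forgotten
        values = [v for _, v in self.points]
        if len(values) < 2:
            return 0.0  # no spread with a single point
        std = statistics.stdev(values)
        if std == 0:
            return 0.0
        return abs(x - statistics.fmean(values)) / std


if __name__ == "__main__":
    rs = RollingSigma(period_s=120)  # like a 2-minute window
    for minute, value in enumerate([20.0, 22.0, 21.0, 80.0, 20.0]):
        score = rs.update(minute * 60, value)
        print(minute, value, round(score, 3), len(rs.points))
```

One consequence of including the current point: with n points in the window, the score can never exceed (n - 1)/sqrt(n) (Samuelson's inequality). A 2-minute window over 1-minute data holds two points, which caps the score at about 0.71 (so the 80.0 spike in the demo scores only that much), and a window needs at least 11 points before a score above 3 is even possible.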
I tested it and cannot recreate the issue. Am I missing anything?
var data = stream
|from()
.measurement('Processor')
.groupBy('host')
|window()
.period(2m)
.every(1m)
|where(lambda: isPresent("Percent_Processor_Time"))
|eval(lambda: sigma("Percent_Processor_Time"))
.as('Percent_Processor_Time_Sigma')
.keep()
var data1 = data
|last('Percent_Processor_Time_Sigma')
.as('last')
|eval()
.keep()
|stateDuration(lambda: "last" >= 3.5)
.unit(1m)
|alert()
.details('N/A')
.crit(lambda: "state_duration" >= 10)
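The tail of the script raises a critical alert only after the sigma value has stayed at or above 3.5 for 10 minutes. This Python sketch mimics that logic, assuming `stateDuration` reports -1 while its condition is false and otherwise the time elapsed since the condition became true, in units of `.unit` (here 1m). `state_durations` is a hypothetical helper and the sigma values are made up.

```python
def state_durations(points, predicate, unit=60):
    """Hypothetical mimic of stateDuration for (ts_seconds, value) points.

    Assumed semantics: -1 while predicate(value) is false; otherwise the
    time since the current run of true values began, in `unit` seconds.
    """
    durations = []
    run_start = None
    for ts, value in points:
        if predicate(value):
            if run_start is None:
                run_start = ts  # a new run of "true" starts here
            durations.append((ts - run_start) / unit)
        else:
            run_start = None  # state broken: reset
            durations.append(-1.0)
    return durations


if __name__ == "__main__":
    # Made-up 1-minute sigma values: a short excursion, then 12 minutes above 3.5.
    sigmas = [1.0, 4.0, 4.0, 2.0] + [3.6] * 12
    points = [(i * 60, s) for i, s in enumerate(sigmas)]
    durations = state_durations(points, lambda s: s >= 3.5)
    for (ts, s), d in zip(points, durations):
        # Equivalent of .crit(lambda: "state_duration" >= 10)
        print(ts // 60, s, d, "CRIT" if d >= 10 else "")
```

Under these assumptions, with 1-minute points the crit condition first holds on the 11th consecutive point at or above 3.5 (duration 10), and any single point below the threshold resets the count.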
Basically, if it honors .period(2m), there is no issue and we are all set.
We have an issue if it does not.
I found an alternative: you can use stddev instead of sigma. It honors the given time window when calculating the standard deviation.
|window()
.period(5m)
.every(5m)
|where(lambda: isPresent("Percent_Processor_Time"))
|eval(lambda: "Percent_Processor_Time")
.as('Percent_Processor_Time')
|stddev('Percent_Processor_Time')
.as('StdDev_1_Percent_Processor_Time')
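For comparison, this Python sketch shows what the stddev approach computes: tumbling 5-minute windows (`.period(5m)` with `.every(5m)`), each summarized independently of the others. `stddev` returns a standard deviation, not a number of deviations, so comparing against 3 or 3.5 needs a score such as |value - mean| / stddev within the window. The sketch computes that for the last point of each window. `tumbling_window_scores` is hypothetical, and sample standard deviation is an assumption about what the stddev node uses.

```python
import statistics


def tumbling_window_scores(points, period=300):
    """Split (ts_seconds, value) points into non-overlapping windows of
    `period` seconds and return (window_start, stddev, score_of_last_point).

    Hypothetical helper. Assumptions: sample standard deviation; windows are
    keyed by ts // period; the score uses only that window's own points.
    """
    windows = {}
    for ts, value in points:
        windows.setdefault(ts // period * period, []).append(value)
    result = []
    for start in sorted(windows):
        values = windows[start]
        if len(values) < 2:
            continue  # stddev is undefined for a single point
        sd = statistics.stdev(values)
        mean = statistics.fmean(values)
        score = abs(values[-1] - mean) / sd if sd > 0 else 0.0
        result.append((start, sd, score))
    return result


if __name__ == "__main__":
    # Made-up 1-minute CPU values: a noisy first window, a flat second one.
    cpu = [10.0, 12.0, 10.0, 12.0, 10.0] + [50.0] * 5
    points = [(i * 60, v) for i, v in enumerate(cpu)]
    for start, sd, score in tumbling_window_scores(points):
        print(start, round(sd, 3), round(score, 3))
```

The jump to 50 in the second window does not inflate the first window's statistics, which is the behaviour you wanted from a windowed sigma. Keep in mind that when the scored point is part of a window of n points, the score cannot exceed (n - 1)/sqrt(n) (Samuelson's inequality): with 1-minute data a 5-minute window holds 5 points, capping it near 1.79, so thresholds of 3 or 3.5 would need a longer period.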
Thank you for all your help @Anaisdg you gave me more material to ponder about anomaly detection!