Wrong prediction in Flux query with HoltWinters()

fluxator · November 24, 2021, 10:53am

The prediction from my Flux query is wrong, and I don’t know why.
All I know is that the second calculated “_value” in the holtWinters() output looks weird (or is calculated incorrectly).

→ Because of this second calculated “_value”, the whole prediction is wrong!

Here is my Flux query:

//define variables
window = 1d

data = from(bucket: "telegraf_365d")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "cluster_csv" and r._field == "SizeUsed" and r.Label == "CSV01")
  |> group(columns: ["FileSystemLabel"])
  |> keep(columns: ["_time", "_value", "FileSystemLabel"])
  |> aggregateWindow(every: window, fn: first, createEmpty: false)

data
  |> holtWinters(n: 20, interval: window, withFit: true)
  |> yield(name: "predictive")

data
  |> yield(name: "raw_data")

My environment is InfluxDB v2.0.5 OSS, but the same issue occurs with InfluxDB v2.1.1 OSS.

My use case is very simple: I only want to predict future values the way a “trend” line does.
I hope someone can help me.
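
For comparison, the kind of “trend” prediction described above can be sketched outside of Flux as an ordinary least-squares line extrapolated forward. This is only a minimal illustration in plain Python: the function name `linear_trend_forecast` is hypothetical, and it assumes evenly spaced samples (for example one value per day, as produced by the aggregateWindow() call in the query). It is not what holtWinters() does internally.

```python
def linear_trend_forecast(values, n):
    """Fit y = intercept + slope * x by least squares (x = 0, 1, 2, ...)
    and return the next n extrapolated values.

    Assumes the samples are evenly spaced in time (hypothetical helper)."""
    count = len(values)
    if count < 2:
        raise ValueError("need at least two observations to fit a line")

    mean_x = (count - 1) / 2
    mean_y = sum(values) / count

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in range(count))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # The last observed index is count - 1, so step h lands on count - 1 + h.
    return [intercept + slope * (count - 1 + h) for h in range(1, n + 1)]


# Example: four daily samples growing by one unit per day.
print(linear_trend_forecast([1.0, 2.0, 3.0, 4.0], 2))
```

A straight-line fit like this cannot follow seasonality or a changing slope, but it gives a baseline against which a forecast can be sanity-checked.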

fluxator · November 24, 2021, 10:56am

Here is an additional output of the raw data and the data calculated by the holtWinters() function:

Anaisdg · November 29, 2021, 5:12pm

Hello @fluxator,
I’ve only used Holt-Winters with InfluxQL. I agree that this looks very wrong.
I’ve done some digging and created an issue:

github.com/influxdata/flux

Holtwinters() function not performing as expected

opened 05:09PM - 29 Nov 21 UTC

closed 08:34PM - 12 Jul 22 UTC

Anaisdg

kind/bug

I used it for double exponential smoothing. That data barely has a negative slope. I would expect a much flatter line.

<img width="1432" alt="Screen Shot 2021-11-29 at 11 09 56 AM" src="https://user-images.githubusercontent.com/30506042/143912198-4c54ed8d-b35b-41c5-bf51-a375b6e2adac.png">

```
from(bucket: "system")
  |> range(start: 2021-11-17T21:07:40.000Z, stop: 2021-11-17T21:18:40.000Z)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_system")
  |> filter(fn: (r) => r["cpu"] == "cpu-total")
  |> limit(n: 20)
  |> yield(name: "raw")
  |> holtWinters(n: 10, interval: 20s, seasonality: 0)
```

Here is the raw data with the holtwinters forecast output using seasonality=0 or double exponential smoothing: [2021-11-29_10_11_influxdb_data.csv](https://github.com/influxdata/flux/files/7619454/2021-11-29_10_11_influxdb_data.csv)

The forecasted values are:

0.15788753424689228
-0.7799915747417391
-2.0884170385340113
-3.9138518779427898
-6.460646815371196
-10.013919881448576
-14.971478972807773
-21.888361504251364
-31.53897794557487
-45.0038170160562

statsmodels confirms that there should be a forecast with much less slope:

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt

data = [2.4750000000057724, 3.5041666666582914, 3.5041666666582914, 1.9586597766247285,
        2.1250000000035167, 1.5291029540443428, 3.204300179169086, 3.88317153452289,
        2.1294328457749447, 2.212131311446779, 2.6789434213814647, 1.9212335903300701,
        2.1872265966730393, 1.7298874531100965, 2.5666666666631417, 2.9506147113993393,
        1.8538576903864667, 1.1666666666648478, 1.3835062716186677, 0.8625000000012369]
index = pd.date_range(start="2021-11-17T21:07:40", end="2021-11-17T21:14:00", freq="20s")
data = pd.Series(data, index)

fit = Holt(data, exponential=True, initialization_method="estimated").fit()
fcast = fit.forecast(10).rename("Exponential trend")
fcast
#notice how the forecast values are very different from what the flux holtwinters() is producing
```

2021-11-17 21:14:20 1.615233
2021-11-17 21:14:40 1.564648
2021-11-17 21:15:00 1.515647
2021-11-17 21:15:20 1.468181
2021-11-17 21:15:40 1.422201
2021-11-17 21:16:00 1.377662
2021-11-17 21:16:20 1.334517
2021-11-17 21:16:40 1.292723
2021-11-17 21:17:00 1.252238
2021-11-17 21:17:20 1.213021
Freq: 20S, Name: Exponential trend, dtype: float64

```
plt.figure(figsize=(12, 8))
plt.plot(data, marker="o", color="black")
plt.plot(fit.fittedvalues, color="blue")
(line1,) = plt.plot(fcast, marker="o", color="blue")
plt.show()
```

<img width="987" alt="Screen Shot 2021-11-29 at 11 01 02 AM" src="https://user-images.githubusercontent.com/30506042/143910935-7f6c6abe-d34a-4bc9-8efa-e8b8ec10caeb.png">

Also the interval shouldn't affect the output value, at the same timestamps. i.e. the value of the forecast for n=1 at interval = 40s should equal the value at n=2 at interval of 20s. The values are changing when the interval is being changed. I'd expect the interval just to change the forecast timestamps, but shouldn't affect the values of the forecast. Unless I'm missing something.

<img width="1492" alt="Screen Shot 2021-11-29 at 11 06 40 AM" src="https://user-images.githubusercontent.com/30506042/143912014-3483c8d9-77cd-443a-93fb-59e4c6a3861d.png">

<img width="1499" alt="Screen Shot 2021-11-29 at 11 06 27 AM" src="https://user-images.githubusercontent.com/30506042/143912040-58575692-631b-4987-a3d1-96538622af1e.png">

Also please see this community question: https://community.influxdata.com/t/wrong-prediction-in-flux-query-with-holtwinters/22652
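
To make the expectation in the issue above concrete, here is a minimal sketch of additive double exponential smoothing (Holt's linear trend method) in plain Python. The function names and the fixed smoothing parameters `alpha` and `beta` are assumptions chosen for illustration; this is not Flux's implementation, and it differs from the statsmodels call in the issue, which estimates its parameters in `fit()` and uses an exponential trend. The only point is that with an additive trend the h-step forecast is `level + h * trend`, so successive forecast values change by a constant amount rather than accelerating.

```python
def holt_linear(values, alpha=0.5, beta=0.3):
    """Additive double exponential smoothing (Holt's linear trend method).

    alpha and beta are fixed, hypothetical smoothing parameters; a real
    implementation would normally estimate them from the data.
    Returns the final (level, trend) pair."""
    if len(values) < 2:
        raise ValueError("need at least two observations")

    # Simple initialization: first value as level, first difference as trend.
    level = values[0]
    trend = values[1] - values[0]

    for y in values[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step-ahead forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - prev_level) + (1 - beta) * trend

    return level, trend


def holt_forecast(values, n, alpha=0.5, beta=0.3):
    """Forecast n steps ahead: step h is level + h * trend."""
    level, trend = holt_linear(values, alpha, beta)
    return [level + h * trend for h in range(1, n + 1)]


# The 20 raw values quoted in the issue above.
raw = [2.4750000000057724, 3.5041666666582914, 3.5041666666582914, 1.9586597766247285,
       2.1250000000035167, 1.5291029540443428, 3.204300179169086, 3.88317153452289,
       2.1294328457749447, 2.212131311446779, 2.6789434213814647, 1.9212335903300701,
       2.1872265966730393, 1.7298874531100965, 2.5666666666631417, 2.9506147113993393,
       1.8538576903864667, 1.1666666666648478, 1.3835062716186677, 0.8625000000012369]

for value in holt_forecast(raw, 10):
    print(value)
```

Note that the sketch only ever sees the sequence of values, never their timestamps; any dependence on the interval would have to come from how the input is sampled before smoothing.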

Out of curiosity, can you set seasonality: 0 explicitly in your holtWinters() function? Does that change anything?

fluxator · November 30, 2021, 7:20am

Hi @Anaisdg
First of all, I am a little relieved; I thought I was too stupid for this.
Thanks a lot for creating the GitHub issue!
I hope this will be fixed soon.

I tried with seasonality: 0, but unfortunately it made no difference.
