The prediction from my Flux query is wrong and I don’t know why.
I only know that the second calculated “_value” in the holtWinters() output looks wrong (or is calculated incorrectly):
→ Because of this second calculated “_value”, the whole prediction is wrong!
Here is my Flux query:
// define variables
window = 1d

data = from(bucket: "telegraf_365d")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "cluster_csv" and r._field == "SizeUsed" and r.Label == "CSV01")
    |> group(columns: ["FileSystemLabel"])
    |> keep(columns: ["_time", "_value", "FileSystemLabel"])
    |> aggregateWindow(every: window, fn: first, createEmpty: false)

data
    |> holtWinters(n: 20, interval: window, withFit: true)
    |> yield(name: "predictive")

data
    |> yield(name: "raw_data")
My environment is InfluxDB v2.0.5 OSS, but the same issue occurs with InfluxDB v2.1.1 OSS.
My use case is very simple: I only want to predict the future like a “trend” does.
I hope someone can help me.
Here is an additional output of the raw data and the data calculated by the holtWinters() function:
Anaisdg
November 29, 2021, 5:12pm
3
Hello @fluxator ,
I’ve only used Holt-Winters with InfluxQL. I agree that this looks very wrong.
I’ve done some digging and created an issue:
opened 05:09PM - 29 Nov 21 UTC
closed 08:34PM - 12 Jul 22 UTC
kind/bug
I used it for double exponential smoothing. That data barely has a negative slope. I would expect a much flatter line.
<img width="1432" alt="Screen Shot 2021-11-29 at 11 09 56 AM" src="https://user-images.githubusercontent.com/30506042/143912198-4c54ed8d-b35b-41c5-bf51-a375b6e2adac.png">
```
from(bucket: "system")
|> range(start: 2021-11-17T21:07:40.000Z, stop: 2021-11-17T21:18:40.000Z)
|> filter(fn: (r) => r["_measurement"] == "cpu")
|> filter(fn: (r) => r["_field"] == "usage_system")
|> filter(fn: (r) => r["cpu"] == "cpu-total")
|> limit(n: 20)
|> yield(name: "raw")
|> holtWinters(n: 10, interval: 20s, seasonality: 0)
```
Here is the raw data with the holtWinters() forecast output using seasonality: 0, i.e. double exponential smoothing:
[2021-11-29_10_11_influxdb_data.csv](https://github.com/influxdata/flux/files/7619454/2021-11-29_10_11_influxdb_data.csv)
The forecasted values are:
0.15788753424689228
-0.7799915747417391
-2.0884170385340113
-3.9138518779427898
-6.460646815371196
-10.013919881448576
-14.971478972807773
-21.888361504251364
-31.53897794557487
-45.0038170160562
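One quick way to see how far this is from a trend line is to look at the change between consecutive forecast values. The sketch below uses only the ten numbers listed above; the variable names are illustrative. For a linear double exponential smoothing forecast the change per step would be constant, so the growth factor between consecutive changes would be 1.

```python
# Forecast values copied from the holtWinters() output listed above.
flux_forecast = [
    0.15788753424689228,
    -0.7799915747417391,
    -2.0884170385340113,
    -3.9138518779427898,
    -6.460646815371196,
    -10.013919881448576,
    -14.971478972807773,
    -21.888361504251364,
    -31.53897794557487,
    -45.0038170160562,
]

# Change from one forecast value to the next.
steps = [b - a for a, b in zip(flux_forecast, flux_forecast[1:])]

# Growth factor between consecutive changes; 1.0 would mean a straight line.
ratios = [b / a for a, b in zip(steps, steps[1:])]

for i, (step, ratio) in enumerate(zip(steps[1:], ratios), start=2):
    print(f"step {i}: change={step:.4f} growth={ratio:.4f}")
```

For these values every change is larger than the previous one by a factor of roughly 1.395, so the forecast accelerates downward instead of following a straight line. This is only a diagnostic on the output, not a statement about the cause inside holtWinters().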
statsmodels confirms that there should be a forecast with much less slope:
```
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.api import Holt
data = [2.4750000000057724,3.5041666666582914,3.5041666666582914,1.9586597766247285,2.1250000000035167,1.5291029540443428,3.204300179169086,3.88317153452289,2.1294328457749447,2.212131311446779,2.6789434213814647,1.9212335903300701,2.1872265966730393, 1.7298874531100965,2.5666666666631417, 2.9506147113993393, 1.8538576903864667, 1.1666666666648478, 1.3835062716186677, 0.8625000000012369]
index = pd.date_range(start="2021-11-17T21:07:40", end="2021-11-17T21:14:00", freq="20s")
data = pd.Series(data, index)
fit = Holt(data, exponential=True, initialization_method="estimated").fit()
fcast = fit.forecast(10).rename("Exponential trend")
fcast
# Notice how different these forecast values are from what the Flux holtWinters() produces.
```
2021-11-17 21:14:20 1.615233
2021-11-17 21:14:40 1.564648
2021-11-17 21:15:00 1.515647
2021-11-17 21:15:20 1.468181
2021-11-17 21:15:40 1.422201
2021-11-17 21:16:00 1.377662
2021-11-17 21:16:20 1.334517
2021-11-17 21:16:40 1.292723
2021-11-17 21:17:00 1.252238
2021-11-17 21:17:20 1.213021
Freq: 20S, Name: Exponential trend, dtype: float64
```
plt.figure(figsize=(12, 8))
plt.plot(data, marker="o", color="black")
plt.plot(fit.fittedvalues, color="blue")
(line1,) = plt.plot(fcast, marker="o", color="blue")
plt.show()
```
<img width="987" alt="Screen Shot 2021-11-29 at 11 01 02 AM" src="https://user-images.githubusercontent.com/30506042/143910935-7f6c6abe-d34a-4bc9-8efa-e8b8ec10caeb.png">
Also, the interval shouldn't affect the forecast value at a given timestamp.
That is, the forecast value for n=1 at interval = 40s should equal the value for n=2 at interval = 20s. Instead, the values change when the interval changes. I'd expect the interval to change only the forecast timestamps, not the forecast values, unless I'm missing something.
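The expectation above can be sketched with a hand-rolled Holt linear (double exponential) smoother. This is a minimal illustration, assuming the interval only sets the spacing of the forecast timestamps; `holt_linear`, `forecast`, the simple initialization, and the `alpha`/`beta` values are hypothetical and not taken from Flux or statsmodels.

```python
import math

# The 20 raw samples from the issue, spaced 20 seconds apart.
samples = [
    2.4750000000057724, 3.5041666666582914, 3.5041666666582914,
    1.9586597766247285, 2.1250000000035167, 1.5291029540443428,
    3.204300179169086, 3.88317153452289, 2.1294328457749447,
    2.212131311446779, 2.6789434213814647, 1.9212335903300701,
    2.1872265966730393, 1.7298874531100965, 2.5666666666631417,
    2.9506147113993393, 1.8538576903864667, 1.1666666666648478,
    1.3835062716186677, 0.8625000000012369,
]
SAMPLE_SPACING_S = 20


def holt_linear(values, alpha, beta):
    """Return the final level and per-sample trend of Holt's linear method."""
    level = values[0]
    trend = values[1] - values[0]  # simple initialization (an assumption)
    for y in values[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


# Hypothetical smoothing parameters, chosen only for illustration.
level, trend_per_sample = holt_linear(samples, alpha=0.5, beta=0.3)
trend_per_second = trend_per_sample / SAMPLE_SPACING_S


def forecast(n, interval_s):
    """Forecast n steps ahead when each forecast step is interval_s seconds."""
    return level + trend_per_second * n * interval_s


two_steps_of_20s = forecast(n=2, interval_s=20)
one_step_of_40s = forecast(n=1, interval_s=40)

# Both land 40 seconds after the last sample, so the values should agree.
print(math.isclose(two_steps_of_20s, one_step_of_40s))
```

Under this assumption the forecast depends only on how far ahead in time it looks, which is why the two settings would be expected to produce the same value at the same timestamp.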
<img width="1492" alt="Screen Shot 2021-11-29 at 11 06 40 AM" src="https://user-images.githubusercontent.com/30506042/143912014-3483c8d9-77cd-443a-93fb-59e4c6a3861d.png">
<img width="1499" alt="Screen Shot 2021-11-29 at 11 06 27 AM" src="https://user-images.githubusercontent.com/30506042/143912040-58575692-631b-4987-a3d1-96538622af1e.png">
Also please see this community question:
https://community.influxdata.com/t/wrong-prediction-in-flux-query-with-holtwinters/22652
Out of curiosity, can you set seasonality: 0 explicitly in your holtWinters() function? Does that change anything?
Hi @Anaisdg
First, I am a little relieved; I thought I was too stupid for this.
Thanks a lot for creating the GitHub issue!
I hope this will be fixed soon.
I tried with seasonality: 0, but unfortunately it made no difference.