Sina
March 21, 2023, 5:42pm
1
I’m trying to use the statsmodels.linearRegression() function.
Whenever there is a gap in my data, i.e. the otherwise regular five-minute interval between values is larger, the function appears either to treat the values between the gaps as separate data sets or to make up values where there are none.
I’ve tried rebuilding the function from the source code I found here (flux/linearreg.flux at master · influxdata/flux · GitHub), but the problem persists.
I’ve tried adding an if condition, but maybe I’m using it in the wrong spot.
Sorry in advance for the flood of screenshots coming up…
Can anybody help with this?
And sorry again for the flood of screenshots; I didn’t know how else to illustrate my problem.
Hello @Sina,
Yes, unfortunately the function was designed under the assumption that you have a regular time series. I’d consider maybe using the following function:
Can you please include the script you created?
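For reference, here is a sketch of the ordinary least-squares fit that this kind of function evaluates. The sums correspond to `sx`, `suy`, `sxy`, `sxx` and `N` in the script posted further down in this thread. The remark about gaps is an assumption based on that script, where x is a running row index rather than a timestamp.

```latex
\hat{y}_i = m\,x_i + b, \qquad
m = \frac{N \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N \sum_i x_i^2 - \left(\sum_i x_i\right)^2}, \qquad
b = \frac{\sum_i y_i \sum_i x_i^2 - \sum_i x_i \sum_i x_i y_i}{N \sum_i x_i^2 - \left(\sum_i x_i\right)^2}
```

If x_i = i (row 1, row 2, ...), two neighbouring rows are always one unit apart in x, whether they are five minutes or five hours apart in time. The fit then only describes the data correctly when the spacing is regular.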
Sina
March 22, 2023, 8:02am
3
This is the script I wrote, very closely based on the function (I think it’s yours, right?):
import "influxdata/influxdb/monitor"
import "influxdata/influxdb/v1"

data =
    from(bucket: "test_sis")
        |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
        |> filter(fn: (r) => r["_measurement"] == "20_3_Tiergarten_Teststand_3S")
        |> filter(fn: (r) => r["_field"] == "PRdc_opt9328")
        |> drop(columns: ["_measurement", "host", "_start", "_stop"])

Regress = (tables=<-) => {
    // x is a running row index (1.0, 2.0, 3.0, ...), not a timestamp
    renameAndSum =
        tables
            //|> rename(columns: {_value: "z"})
            |> map(fn: (r) => ({r with x: 1.0}))
            |> cumulativeSum(columns: ["x"])

    // Accumulate the sums needed for the least-squares fit
    t =
        renameAndSum
            |> reduce(
                fn: (r, accumulator) =>
                    ({
                        sx: r.x + accumulator.sx,
                        suy: r._value + accumulator.suy,
                        N: accumulator.N + 1.0,
                        sxy: r.x * r._value + accumulator.sxy,
                        sxx: r.x * r.x + accumulator.sxx,
                    }),
                identity: {
                    sxy: 0.0,
                    sx: 0.0,
                    suy: 0.0,
                    sxx: 0.0,
                    N: 0.0,
                },
            )
            |> tableFind(fn: (key) => true)
            |> getRecord(idx: 0)

    // divident is the shared denominator of slope and intercept
    divident = (t.N * t.sxx - (t.sx * t.sx))
    slopesubt = (t.sx * t.suy)
    intersubt = (t.sx * t.sxy)
    slopetop = (t.N * t.sxy - slopesubt)
    intertop = (t.suy * t.sxx - intersubt)
    slope = (slopetop / divident)
    intercept = (intertop / divident)

    y_hat = (r) =>
        ({r with
            y_hat: if exists r._value then slope * r.x + intercept else 0.0,
            slope: slope,
            // the record t has no intercept field; use the local variable
            intercept: intercept,
            sx: t.sx,
            sxy: t.sxy,
            sxx: t.sxx,
            N: t.N,
            suy: t.suy,
        })

    output =
        renameAndSum
            |> map(fn: y_hat)

    return output
}

data
    |> Regress()
    |> map(fn: (r) => ({r with _value: r.y_hat}))
    //|> map(fn: y_hat)
    |> yield()
    |> set(key: "_measurement", value: "20_3_Tiergarten_Teststand_3S")
    |> set(key: "_field", value: "PRdc_linreg_opt9328")
    |> to(bucket: "test_sis", org: "pvlab_vanguard_service")
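One thing that stands out in the script above is that x is a running row count, so a gap in the data does not show up in x at all. Below is a minimal sketch of a variant that uses elapsed seconds since the first point as x instead. It assumes a single input table with no null `_value` rows; `regressOnTime`, `firstRow`, `t0` and `withX` are names made up for this example, and it has not been checked against the data in this thread.

```flux
// Sketch only: same least-squares sums as the script above, but x is the
// elapsed time in seconds since the first row instead of a row index.
regressOnTime = (tables=<-) => {
    // Timestamp of the first row, as nanoseconds since the epoch.
    firstRow = tables |> first() |> findRecord(fn: (key) => true, idx: 0)
    t0 = int(v: firstRow._time)

    // A gap in the data now moves x forward by the real duration of the gap.
    withX =
        tables
            |> map(fn: (r) => ({r with x: float(v: int(v: r._time) - t0) / 1000000000.0}))

    t =
        withX
            |> reduce(
                fn: (r, accumulator) =>
                    ({
                        sx: r.x + accumulator.sx,
                        sy: r._value + accumulator.sy,
                        n: accumulator.n + 1.0,
                        sxy: r.x * r._value + accumulator.sxy,
                        sxx: r.x * r.x + accumulator.sxx,
                    }),
                identity: {sx: 0.0, sy: 0.0, n: 0.0, sxy: 0.0, sxx: 0.0},
            )
            |> findRecord(fn: (key) => true, idx: 0)

    denominator = t.n * t.sxx - t.sx * t.sx
    slope = (t.n * t.sxy - t.sx * t.sy) / denominator
    intercept = (t.sy * t.sxx - t.sx * t.sxy) / denominator

    return
        withX
            |> map(fn: (r) => ({r with y_hat: slope * r.x + intercept, slope: slope, intercept: intercept}))
}
```

It would be called the same way as `Regress()`, e.g. `data |> regressOnTime()`. Note that the slope is then per second rather than per row, and that the function only emits a `y_hat` for rows that exist, so it does not fill the gap with values.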