Issue using statsmodels.linearRegression() function; nonexistent values are being made up

Sina · March 21, 2023, 5:42pm

I’m trying to use the statsmodels.linearRegression() function.
Whenever there is a gap in my data, i.e. the spacing between values is larger than the otherwise regular five-minute increment, the function appears either to treat the values on each side of the gap as separate data sets or to make up values where there are none.

I’ve tried rebuilding the function from the source code I found here, flux/linearreg.flux at master · influxdata/flux · GitHub, but the problem persists.
I’ve tried adding an if expression, but maybe I’m using it in the wrong spot.

Sorry in advance for the flood of screenshots coming up…

Can anybody help with this?

And sorry again for the flood of screenshots; I didn’t know how else to illustrate my problem.

Anaisdg · March 21, 2023, 6:06pm

Hello @Sina,
Yes, unfortunately the function was designed under the assumption that you have a regular time series. I’d consider using the following function:

Can you please include the script you created?
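A short derivation of why a gap distorts the result, assuming the x construction used in the script below (x grows by 1.0 per row via cumulativeSum, so it is the row index, not the time), a regular spacing Δ (five minutes here) and a single gap of length mΔ, m > 1, between rows k and k + 1. S_x, S_y, S_xy, S_xx and N stand for the script's sx, suy, sxy, sxx and N; Δ, m and k are introduced here only for illustration.

```latex
\hat{y}_i = a + b\,x_i,\qquad
b = \frac{N S_{xy} - S_x S_y}{N S_{xx} - S_x^2},\qquad
a = \frac{S_y S_{xx} - S_x S_{xy}}{N S_{xx} - S_x^2}

x_i = i,\quad t_i = t_1 + (i-1)\Delta
\;\Longrightarrow\;
\hat{y}_i = (a + b) + \frac{b}{\Delta}\,(t_i - t_1)

t_{k+1} - t_k = m\Delta \;(m > 1),\quad t_i = t_1 + (i + m - 2)\Delta \;\;(i > k)
\;\Longrightarrow\;
\hat{y}_i = (a + b) + \frac{b}{\Delta}\,(t_i - t_1) - b\,(m - 1)
```

After the gap the fitted points keep the same slope b/Δ in time but are shifted by −b(m − 1), so when plotted against _time the fit can look like two separate parallel lines. Using the elapsed time x_i = t_i − t_1 instead of the row index gives ŷ_i = a + b(t_i − t_1) on both sides of the gap.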

Sina · March 22, 2023, 8:02am

This is the script I wrote, very closely based on your function (I think it’s yours, right?):

data =
    from(bucket: "test_sis")
        |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
        |> filter(fn: (r) => r["_measurement"] == "20_3_Tiergarten_Teststand_3S")
        |> filter(fn: (r) => r["_field"] == "PRdc_opt9328")
        |> drop(columns: ["_measurement", "host", "_start", "_stop"])

Regress = (tables=<-) => {
    // x is the row index (1.0, 2.0, 3.0, ...), not the timestamp,
    // so a gap between two rows still counts as a single step
    renameAndSum =
        tables
            // drop rows with a null _value so they do not enter the sums
            |> filter(fn: (r) => exists r._value)
            |> map(fn: (r) => ({r with x: 1.0}))
            |> cumulativeSum(columns: ["x"])

    // Running sums for the closed-form least-squares fit
    t =
        renameAndSum
            |> reduce(
                fn: (r, accumulator) => ({
                    sx: r.x + accumulator.sx,
                    suy: r._value + accumulator.suy,
                    N: accumulator.N + 1.0,
                    sxy: r.x * r._value + accumulator.sxy,
                    sxx: r.x * r.x + accumulator.sxx,
                }),
                identity: {sxy: 0.0, sx: 0.0, suy: 0.0, sxx: 0.0, N: 0.0},
            )
            |> tableFind(fn: (key) => true)
            |> getRecord(idx: 0)

    denominator = t.N * t.sxx - t.sx * t.sx
    slope = (t.N * t.sxy - t.sx * t.suy) / denominator
    intercept = (t.suy * t.sxx - t.sx * t.sxy) / denominator

    y_hat = (r) =>
        ({r with
            y_hat: slope * r.x + intercept,
            slope: slope,
            // t has no intercept field, so use the computed value
            intercept: intercept,
            sx: t.sx,
            sxy: t.sxy,
            sxx: t.sxx,
            N: t.N,
            suy: t.suy,
        })

    return renameAndSum |> map(fn: y_hat)
}

data
    |> Regress()
    |> map(fn: (r) => ({r with _value: r.y_hat}))
    |> yield()
    |> set(key: "_measurement", value: "20_3_Tiergarten_Teststand_3S")
    |> set(key: "_field", value: "PRdc_linreg_opt9328")
    |> to(bucket: "test_sis", org: "pvlab_vanguard_service")
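One possible workaround, sketched under assumptions: replace the row index with the elapsed time since the first record, so that an hour-long gap counts as an hour rather than as one step. The function name `regressOnTime`, the choice of minutes as the unit, the sample rows, and the assumption that the input is a single table with a float `_value` are illustrative and not part of statsmodels. It keeps the same closed-form sums as the script above.

```flux
import "array"

// Hypothetical helper (not part of statsmodels): ordinary least squares of
// _value against elapsed time. Assumes a single input table with float _value.
regressOnTime = (tables=<-) => {
    // Origin for x: the timestamp of the first record
    firstRow = tables |> findRecord(fn: (key) => true, idx: 0)
    t0 = int(v: firstRow._time)

    withX =
        tables
            |> filter(fn: (r) => exists r._value)
            // x = minutes since the first record (1m = 60000000000 ns)
            |> map(fn: (r) => ({r with x: float(v: int(v: r._time) - t0) / 60000000000.0}))

    // Same running sums as the original script
    t =
        withX
            |> reduce(
                fn: (r, accumulator) => ({
                    sx: accumulator.sx + r.x,
                    sy: accumulator.sy + r._value,
                    sxy: accumulator.sxy + r.x * r._value,
                    sxx: accumulator.sxx + r.x * r.x,
                    n: accumulator.n + 1.0,
                }),
                identity: {sx: 0.0, sy: 0.0, sxy: 0.0, sxx: 0.0, n: 0.0},
            )
            |> findRecord(fn: (key) => true, idx: 0)

    denominator = t.n * t.sxx - t.sx * t.sx
    slope = (t.n * t.sxy - t.sx * t.sy) / denominator
    intercept = (t.sy * t.sxx - t.sx * t.sxy) / denominator

    return
        withX
            |> map(fn: (r) => ({r with y_hat: slope * r.x + intercept, slope: slope, intercept: intercept}))
}

// Hypothetical sample: 5-minute spacing, then a 55-minute gap
sample =
    array.from(
        rows: [
            {_time: 2023-03-21T00:00:00Z, _value: 1.0},
            {_time: 2023-03-21T00:05:00Z, _value: 2.0},
            {_time: 2023-03-21T01:00:00Z, _value: 13.0},
        ],
    )

sample |> regressOnTime() |> yield(name: "fit")
```

On the dashboard data, `data |> regressOnTime()` would take the place of `data |> Regress()`, with the `map()`, `set()` and `to()` steps unchanged. Subtracting the first timestamp keeps x small: raw nanosecond timestamps in 2023 are around 1.7e18, and the N·sxx − sx² term would lose most of its precision if they were squared directly. Like the original, it reads the sums from the first table only, so multiple series would need to be handled separately.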

Related topics

Topic | Category and tags | Replies | Views | Activity
Troubles with linear regression | Fluxlang; flux | 7 | 919 | August 17, 2022
Interpolation In Influx 1.8 | InfluxDB 1; influxdb, influxql, query, flux | 6 | 997 | April 8, 2022
Internal error when using statsmodels.linearRegression() | InfluxDB 2; influxdb, flux | 2 | 437 | May 23, 2022
Influxdb2 linear interpolation | InfluxDB 2 | 2 | 1116 | May 20, 2020
Trend line slope calculation / linear regression | Fluxlang | 4 | 4035 | March 16, 2022