Issue using statsmodels.linearRegression() function; nonexistent values are being made up

I’m trying to use the statsmodels.linearRegression() function.
Whenever there is a gap in my data, i.e. the interval between two values is longer than the usual five minutes, the function appears either to treat the stretches of data between gaps as separate data sets or to make up values where there are none.

I’ve tried rebuilding the function from the source code I found here (flux/linearreg.flux at master · influxdata/flux · GitHub), but the problem persists.
I’ve also tried adding an if condition, but maybe I’m using it in the wrong spot.

Sorry in advance for the flood of screenshots coming up…

Can anybody help with this?

And sorry again for the flood of screenshots, I didn’t know how else to illustrate my problem.

Hello @Sina,
Yes, unfortunately the function was designed under the assumption that you have a regular time series. I’d consider maybe using the following function:

Can you please include the script you created?

This is the script I wrote, very closely based on the function (I think it’s yours, right?):

import "influxdata/influxdb/monitor"
import "influxdata/influxdb/v1"

data =
    from(bucket: "test_sis")
        |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
        |> filter(fn: (r) => r["_measurement"] == "20_3_Tiergarten_Teststand_3S")
        |> filter(fn: (r) => r["_field"] == "PRdc_opt9328")
        |> drop(columns: ["_measurement", "host", "_start", "_stop"])

Regress = (tables=<-) => {
    // x is a running row index (1.0, 2.0, 3.0, ...), not the timestamp
    renameAndSum =
        tables
            //|> rename(columns: {_value: "z"})
            |> map(fn: (r) => ({r with x: 1.0}))
            |> cumulativeSum(columns: ["x"])

    // accumulate the sums needed for the least-squares fit
    t =
        renameAndSum
            |> reduce(
                fn: (r, accumulator) =>
                    ({
                        sx: r.x + accumulator.sx,
                        suy: r._value + accumulator.suy,
                        N: accumulator.N + 1.0,
                        sxy: r.x * r._value + accumulator.sxy,
                        sxx: r.x * r.x + accumulator.sxx,
                    }),
                identity: {
                    sxy: 0.0,
                    sx: 0.0,
                    suy: 0.0,
                    sxx: 0.0,
                    N: 0.0,
                },
            )
            |> tableFind(fn: (key) => true)
            |> getRecord(idx: 0)

    // shared denominator: N * sum(x^2) - (sum(x))^2
    divident = t.N * t.sxx - t.sx * t.sx
    slopesubt = t.sx * t.suy
    intersubt = t.sx * t.sxy
    slopetop = t.N * t.sxy - slopesubt
    intertop = t.suy * t.sxx - intersubt
    slope = slopetop / divident
    intercept = intertop / divident

    // add the fitted value and the fit parameters to every row
    y_hat = (r) =>
        ({r with
            y_hat: if exists r._value then slope * r.x + intercept else 0.0,
            slope: slope,
            intercept: intercept,
            sx: t.sx,
            sxy: t.sxy,
            sxx: t.sxx,
            N: t.N,
            suy: t.suy,
        })
    output =
        renameAndSum
            |> map(fn: y_hat)

    return output
}

data
    |> Regress()
    |> map(fn: (r) => ({r with _value: r.y_hat}))
    //|> map(fn: y_hat)
    |> yield()
    |> set(key: "_measurement", value: "20_3_Tiergarten_Teststand_3S")
    |> set(key: "_field", value: "PRdc_linreg_opt9328")
    |> to(bucket: "test_sis", org: "pvlab_vanguard_service")
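
Since the reply above says the function assumes a regular time series, one thing that might be worth trying is to regress on elapsed time instead of a row counter. In the script above, x comes from map(x: 1.0) followed by cumulativeSum(), so every row is one step apart no matter how long the gap before it was. The snippet below is only a sketch of that idea, not a confirmed fix: withTimeAsX is a made-up name, and it assumes "_start" is still in the table, which means removing it from the drop() call in the script.

```flux
// Sketch only: derive x from the timestamp so that a gap in the data
// becomes a proportionally larger step in x.
// Assumption: `_start` has NOT been dropped upstream.
// `withTimeAsX` is a hypothetical helper name, not part of any package.
withTimeAsX = (tables=<-) =>
    tables
        |> map(
            fn: (r) =>
                ({r with
                    // seconds elapsed since the start of the queried range;
                    // int(v: <time>) yields nanoseconds since the Unix epoch
                    x: float(v: int(v: r._time) - int(v: r._start)) / 1000000000.0,
                }),
        )
```

Inside Regress, this would take the place of the map() and cumulativeSum() pair, e.g. renameAndSum = tables |> withTimeAsX(). Subtracting the range start keeps x small, so the sums of x * x in reduce() stay well within float precision.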
