There is an existing topic on how to do as-of operations on irregularly spaced time series when some windows contain no points.
I am trying to do a similar as-of interpolation of irregularly spaced data in the Flux language, i.e. on a regular time grid, fill in the most recent value from the irregular stream of data.
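To pin down the target semantics, here is a minimal Python reference sketch (not Flux) of as-of sampling. For each point of a regular grid, it takes the most recent observation at or before that time. The function name `asof_on_grid`, the integer timestamps (e.g. hours), and returning `None` when nothing precedes a grid point are illustrative assumptions. The input is assumed sorted by time.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def asof_on_grid(
    points: Sequence[Tuple[int, float]],
    start: int,
    stop: int,
    step: int,
) -> List[Tuple[int, Optional[float]]]:
    """Sample an irregular series on the regular grid start, start+step, ... (< stop).

    Each grid time t gets the value of the most recent point with timestamp <= t,
    or None when no such point exists. `points` must be sorted by timestamp.
    """
    times = [t for t, _ in points]
    result: List[Tuple[int, Optional[float]]] = []
    for t in range(start, stop, step):
        i = bisect_right(times, t)  # number of points with timestamp <= t
        result.append((t, points[i - 1][1] if i > 0 else None))
    return result
```

For example, with closes at hours 5, 29 and 53, `asof_on_grid([(5, 100.0), (29, 101.5), (53, 99.0)], 0, 36, 6)` gives `[(0, None), (6, 100.0), (12, 100.0), (18, 100.0), (24, 100.0), (30, 101.5)]`. The leading `None` is the "no earlier data point" case. Because each lookup is a binary search, long empty stretches in the input cost nothing extra.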
When the irregular data is dense, the following aggregate approach works:
- window the data
- select the last data point in each window
- de-window
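A rough Python model of these three steps shows why this only suits dense data: a window with no points produces no output at all. This is a sketch of the semantics, not Flux's implementation; `last_per_window` is a hypothetical name, windows are assumed aligned to timestamp 0, and points are assumed sorted by time.

```python
from typing import Dict, List, Sequence, Tuple


def last_per_window(
    points: Sequence[Tuple[int, float]], every: int
) -> List[Tuple[int, float]]:
    """Model of window(every) |> last() |> window(every: inf).

    Windows are [k*every, (k+1)*every). Only windows that contain at least one
    point produce a row, so empty windows silently disappear.
    `points` must be sorted by timestamp.
    """
    last: Dict[int, Tuple[int, float]] = {}
    for t, v in points:
        # Later points in the same window overwrite earlier ones -> last().
        last[t // every] = (t, v)
    # "De-window": merge the per-window results back into one ordered series.
    return [last[k] for k in sorted(last)]
```

On hourly points `[(t, float(t)) for t in range(12)]` with `every=4`, it returns `[(3, 3.0), (7, 7.0), (11, 11.0)]`, one row per window. On sparse data such as `[(5, 1.0), (29, 2.0)]` with `every=3`, only 2 of the 10 windows covering hours 0 to 29 yield a row.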
For example, we can derive a weekly price series (using 5-day windows) from a dense, higher-frequency series as follows:
```
t1 = from(bucket: "quandl")
    |> range(start: -2520d, stop: 0m)
    |> filter(fn: (r) => r["_measurement"] == "trade")
    |> filter(fn: (r) => r["sym"] == "XLA")
    |> window(every: 5d)    // window the data
    |> last()               // keep the last point in each window
    |> window(every: inf)   // de-window
```
The problem arises when there is no data point before the first window, or when some windows contain no data point at all. For example, given daily closing prices, suppose we want an intraday series sampled every 3 hours that carries the last close forward. Most 3-hour windows would be empty, and no output would be generated for them.
Logically, it seems the way to do this is to:
- window the data, generating empty windows as well
- select the last point in each window, with null if missing
- de-window
- fill the nulls with the previous value
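For comparison, the intended four-step pipeline behaves like the following Python sketch. Again, this models the semantics rather than being Flux code; the name `window_last_fill`, integer timestamps, and labelling each row with its window's stop time are assumptions. Empty windows are kept as `None` rows so that the forward fill has something to fill.

```python
from typing import List, Optional, Sequence, Tuple


def window_last_fill(
    points: Sequence[Tuple[int, float]], start: int, stop: int, every: int
) -> List[Tuple[int, Optional[float]]]:
    """Window (keeping empty windows), take last-or-None, de-window, fill forward.

    Windows are [start + k*every, start + (k+1)*every); each output row is
    labelled with its window's stop time. `points` must be sorted by timestamp.
    """
    n = (stop - start + every - 1) // every  # number of windows (ceiling division)
    lasts: List[Optional[float]] = [None] * n  # None marks an empty window
    for t, v in points:
        if start <= t < stop:
            lasts[(t - start) // every] = v  # later points overwrite -> last()
    rows: List[Tuple[int, Optional[float]]] = []
    prev: Optional[float] = None
    for k, v in enumerate(lasts):
        if v is not None:
            prev = v  # like fill(usePrevious: true): remember the latest non-null value
        rows.append((start + (k + 1) * every, prev))
    return rows
```

With `window_last_fill([(5, 100.0), (29, 101.5)], 0, 36, 6)` the result is `[(6, 100.0), (12, 100.0), (18, 100.0), (24, 100.0), (30, 101.5), (36, 101.5)]`. If the first point arrived after the first window, that leading row would stay `None` even after the fill. One common remedy is to start the query range earlier than the output grid so that a prior value exists, then drop the extra rows.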
This fails because:
- de-windowing drops empty tables
- empty windows produce tables with no rows, and even setting non-null dummy values does not work
Here is the code:
```
t1 = from(bucket: "quandl")
    |> range(start: -2520d, stop: 0m)
    |> filter(fn: (r) => r["_measurement"] == "trade")
    |> filter(fn: (r) => r["_field"] == "close")
    |> filter(fn: (r) => r["sym"] == "IBM")
    |> window(every: 3h, createEmpty: true)        // window the data, keeping empty windows
    |> window(every: inf)                          // de-window (empty tables are dropped here)
    |> fill(column: "_value", usePrevious: true)   // carry the last close forward
```
I note that empty tables have good default values; would it be possible to insert these as a row? It would also be great if a developer could update the post linked above with Flux examples.
Examples showing how to do the following would also be extremely helpful, since both are critical for a strong time-series language:
- implement an as-of join on irregular time series
- align one series, using an as-of operation, onto the irregular time grid of a different series
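For reference, the behaviour of `pandas.merge_asof` with its default backward direction (match each left row with the last right row whose key is less than or equal to it, optionally within a `tolerance`) can be sketched in plain Python as below. `asof_join` is a hypothetical helper, both inputs are assumed sorted by time, and `tolerance` is measured in the same units as the timestamps. Passing one irregular series as `left` also covers the second case, aligning another series onto that series' time grid.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def asof_join(
    left: Sequence[Tuple[int, float]],
    right: Sequence[Tuple[int, float]],
    tolerance: Optional[int] = None,
) -> List[Tuple[int, float, Optional[float]]]:
    """Backward as-of join: for each left row, attach the value of the last
    right row whose timestamp is <= the left timestamp.

    The match is None if no such right row exists, or if it is more than
    `tolerance` time units older than the left row. Both inputs must be
    sorted by timestamp.
    """
    right_times = [t for t, _ in right]
    joined: List[Tuple[int, float, Optional[float]]] = []
    for t, left_value in left:
        i = bisect_right(right_times, t)  # right rows with timestamp <= t
        match: Optional[float] = None
        if i > 0 and (tolerance is None or t - right_times[i - 1] <= tolerance):
            match = right[i - 1][1]
        joined.append((t, left_value, match))
    return joined
```

For example, `asof_join([(2, 10.0), (5, 11.0), (12, 12.0)], [(1, 0.5), (5, 0.7), (8, 0.9)])` returns `[(2, 10.0, 0.5), (5, 11.0, 0.7), (12, 12.0, 0.9)]`. With `tolerance=3` the last match becomes `None`, because the nearest earlier right point is 4 units old.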
For reference, see pandas `merge_asof`:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html