Advice: how to carry forward data from the previous day

I’m working on a project for visualizing data that doesn’t change at regular intervals (eg, bug state). Consider this simplified line protocol (actual data has more metadata as tags as well as additional fields):

bugs,status=<open|closed> bugid=N,software=<software> <timestamp>

Visualizations might include total open bugs per day, open bugs by software per day, etc. With this, if I send in data at least daily, the visualization looks good (bug status changes throughout the day are not particularly interesting so I figured I’d perhaps use a downsampling task or just look at the last item per day; this is not relevant to this topic).

The problem is, if I don’t send in data one day, I have graphs like this:

 |          -^-
 |--^--  --/   \
0|     \/       \    (these dips represent missing data)

AIUI, I can fill() null values but not missing data. What I’d like is for the above graph to look like (where f shows where missing data was filled in with whatever came before):

 |          -^-ff
 |--^--ff--/
0|

TIL about the interpolate.linear() function, but it doesn’t work on string data.

A straightforward way to solve this would be to ensure that I send in data at least daily. For reasons not relevant to this discussion, I’m trying to avoid that.

Thoughts I had (in order of preference):

  1. tell the visualization to fill in missing data (perhaps I missed something obvious?)
  2. use flux in some manner to fill in the data as part of the query, so the visualization looks correct
  3. use flux in a task in some manner to duplicate yesterday’s points into today (similar to downsampling in some ways)
  4. something else?

Since I’m new to flux, I’m not sure which approach is best (or how to do them! :slight_smile: ).

Thanks in advance for any advice and tips!

The UI Query Builder has a ‘Fill missing values’ that says ‘for windows without data, create an empty window and fill it with a null aggregate value’. It wasn’t clear how to do this in my flux, but I recalled createEmpty: true with both window() and aggregateWindow(), but that didn’t seem to work. AIUI, since those set to ‘null aggregate value’ then I should be able to use fill() strategically after…

1 Like

Using aggregateWindow with createEmpty: true is the strategy I would recommend. That way you create a window for each day and if there is no data for that day a null value will be inserted and you can use fill() afterwards to make it the data you want.

You mention you tried this and it didn’t work. Can you share what you tried and what issue you ran into?

1 Like

With flux like this (explicit createEmpty: false):

data
  |> aggregateWindow(every: 1d, createEmpty: false, fn: count)

I end up with data like this (raw csv, fine):

table _value  _start                    _stop                       _time                       status
0	  1	      2021-09-28T15:44:06.783Z	2021-09-29T15:44:06.783Z	2021-09-29T00:00:00.000Z	open
1	  7	      2021-09-28T15:44:06.783Z	2021-09-29T15:44:06.783Z	2021-09-29T00:00:00.000Z	closed

I change that to:

  |> aggregateWindow(every: 1d, createEmpty: true, fn: count)
  |> fill(column: "_value", usePrevious: true)

and end up with (raw csv):

table _value  _start                    _stop                       _time                       status
0	  1	      2021-09-28T15:44:06.783Z	2021-09-29T15:44:06.783Z	2021-09-29T00:00:00.000Z	open
0	  0	      2021-09-28T15:44:06.783Z	2021-09-29T15:44:06.783Z	2021-09-29T15:44:06.783Z	open
1	  7	      2021-09-28T15:44:06.783Z	2021-09-29T15:44:06.783Z	2021-09-29T00:00:00.000Z	closed
1	  0	      2021-09-28T15:44:06.783Z	2021-09-29T15:44:06.783Z	2021-09-29T15:44:06.783Z	closed

I was expecting the two added rows to have their _values as non-zero (ie, copied forward). I feel like I am close but missing something…

You are right this is very close, a subtle behavior is mixing this up.

What is happening is that count can count the rows even in empty tables creating 0 values.

So the window operation adds the empty windows for each day and then the count operation adds 0 to the _value column for those empty tables.

When fill then runs its doesn’t see any null values since they are 0 and therefore doesn’t fill any previous values.

Here is a bit of a workaround for this behavior. Normally you could just use map to fill the 0s with desired value. But in this case we want the previous value and map doesn’t have access to that value.

We can use map instead to convert the 0s to nulls and then use fill like normal.

  |> aggregateWindow(every: 1d, createEmpty: true, fn: count)
  |> map(fn: (r) => ({r with _value: if r._value == 0 then r.__null else r._value
  |> fill(column: "_value", usePrevious: true)

Note using r.__null will produce a null value since __null doesn’t exist on r.

It worked! Your explanation makes sense too. Thanks!

AIUI, the map() will work on all data coming in, not just the things that were filled with 0s by count() so isn’t there the problem where if two days ago I had ‘1’ in _value (eg, 1 open bug), then today I wrote in ‘0’ (no open bugs), this map() function will undo today’s ‘0’ and put in the previous day (ie, writing a ‘1’), no?

If that is true, is there a function I can provide to aggregateWindow() other than count() that doesn’t have this side effect? Maybe I can give aggregateWindow() a (non-written) function that does the fill() before the count()?

@nathaniel - I’ve tried a bunch of stuff trying to fill() before the count() but unable to get this to work where we don’t lose the meaning of ‘0’ (the map() workaround). Among the things I tried, I looked at the source for aggregateWindow() and tried to fill() before count and also tried to implement a pure-flux myCount() function to use that wouldn’t set null to 0. I also tried to use various techniques with window() but not seeing the empty tables/rows in Data Explorer raw csv (maybe I should be looking at something else?).

This is somewhat better than aggregateWindow(every: 1d, createEmpty: true, fn: count):

  |> window(every: 1d)
  |> count()
  |> duplicate(column: "_stop", as: "_time")
  |> window(every: inf)

since this will smooth out missing pieces in between points, but the visualization stops at the last point. Ie, what we had before (original):

 |          -^-   |
 |--^--  --/   \  |
0|     \/       \ |   (these dips represent missing data)

is visualized as:

 |          -^-   |
 |--^--ff--/      |
0|                |

(ie, no trail off at the end, but also no copied forward data). I suppose that I could combine this with a flux task to write points from whatever we wrote yesterday, but I feel like what I’m trying to achieve should be possible…

Ok, I figured out something that works and gives the desired result:

// something to be given to aggregateWindow() that doesn't
// have side-effects
noop = (tables=<-, column="_value") =>
  tables

data
  // get all the counts, but don't create any empty data since count() has the
  // side-effect of turning nulls to 0s
  |> aggregateWindow(every: 1d, createEmpty: false, fn: count)
  // create the empty data with nulls intact
  |> aggregateWindow(every: 1d, fn: noop)
  // convert nulls in "_value" to the previous row
  |> fill(usePrevious: true)

UPDATE: even more succinct (no explicit noop function):

data
  // get all the counts, but don't create any empty data since count() has the 
  // side-effect of turning nulls to 0s
  |> aggregateWindow(every: 1d, createEmpty: false, fn: count)
  // use aggregateWindow with noop function to create the empty data with nulls intact
  |> aggregateWindow(every: 1d, fn: (tables=<-, column="_value") => tables)
  // convert nulls in "_value" to the previous row
  |> fill(usePrevious: true)