Merge values with different stored types

So recently I changed the format of my data: I was collecting a state as a double (0 and 1) and changed it to track the same measurement as a boolean (true/false). The measurement is the same, except the new (boolean-typed) one has a few additional tags; the common tags (_measurement, _field, device, etc.) are identical.

The problem is I now want to query all my data (double + bool) and merge it into a single result set, but there seems to be a break between the old and new values in my query results.

Here’s my simple Flux query (toFloat() is there because my idea was to convert all the data to float before merging):

from(bucket: "log/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "state" and r["device"] == "nest")
  |> toFloat()

The problem is that this query changes its output based on the time range. Let’s say I switched from double to bool 24 hours ago: if my range spans more than 24 hours I get only the results stored as double (i.e. those reported before the switch); if however I reduce the time range to within the last 24 hours I get the results stored as bool, as expected (i.e. the ‘new’ data). I’m puzzled by this behavior because I expected at least two different tables as a result, one with doubles and one with bools, but it instead gives me back a single table with missing data depending on the selected range…

Below are a couple of screenshots of the extracted data: the first one spans 7 days and returns a few measurements (double); the second one is restricted to the last 24 hours and has a lot of measurements that weren’t listed in the first query’s results (the stop range is the same for both queries). Any ideas on how I can merge these two datasets?

7 days range

Columns (name / group / type): table, _measurement (group, string), _field (group, string), _value (no group, boolean), _start (group, dateTime:RFC3339), _stop (group, dateTime:RFC3339), _time (no group, dateTime:RFC3339), device (group, string), group (group, string)

| table | _measurement | _field | _value | _start | _stop | _time | device | group |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | state | value | false | 2022-02-08T12:01:00.747Z | 2022-02-15T12:01:00.747Z | 2022-02-08T20:43:45.071Z | nest | heater |
| 0 | state | value | true | 2022-02-08T12:01:00.747Z | 2022-02-15T12:01:00.747Z | 2022-02-13T06:37:47.917Z | nest | heater |
| 0 | state | value | false | 2022-02-08T12:01:00.747Z | 2022-02-15T12:01:00.747Z | 2022-02-13T07:19:49.057Z | nest | heater |

24 hours range

Columns (name / group / type): table, _measurement (group, string), _field (group, string), _value (no group, boolean), _start (group, dateTime:RFC3339), _stop (group, dateTime:RFC3339), _time (no group, dateTime:RFC3339), device (group, string), group (group, string), plant_key (group, string), sid (group, string)

| table | _measurement | _field | _value | _start | _stop | _time | device | group | plant_key | sid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | state | value | false | 2022-02-14T11:43:56.031Z | 2022-02-15T11:43:56.031Z | 2022-02-14T21:04:45.000Z | nest | heater | securelan | nest_state |
| 0 | state | value | false | 2022-02-14T11:43:56.031Z | 2022-02-15T11:43:56.031Z | 2022-02-14T21:11:22.000Z | nest | heater | securelan | nest_state |
| 0 | state | value | false | 2022-02-14T11:43:56.031Z | 2022-02-15T11:43:56.031Z | 2022-02-14T21:14:10.000Z | nest | heater | securelan | nest_state |
| 0 | state | value | true | 2022-02-14T11:43:56.031Z | 2022-02-15T11:43:56.031Z | 2022-02-15T05:00:00.000Z | nest | heater | securelan | nest_state |
| 0 | state | value | false | 2022-02-14T11:43:56.031Z | 2022-02-15T11:43:56.031Z | 2022-02-15T06:17:05.000Z | nest | heater | securelan | nest_state |

Thanks!

Luca

Hello @lucamot,

You can use map() with conditional logic to convert values in other columns.
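For instance, a minimal sketch of what I mean (float() accepts both bools and floats, so the conditional works on either type):

from(bucket: "log/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "state" and r["device"] == "nest")
  // float() turns true/false into 1.0/0.0 and leaves floats unchanged,
  // so this normalizes every row's _value to 1.0 or 0.0
  |> map(fn: (r) => ({r with _value: if float(v: r._value) > 0.0 then 1.0 else 0.0}))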

Also

> I’m puzzled by this behavior because I expected at least two different tables as a result, one with doubles and one with bools

I’d expect the same thing.

> spans 7 days and returns a few measurements (double)

I’m confused: I only see the bool type here, not double.

> the second one is restricted to the last 24 hours and has a lot of measurements that weren’t listed in the first query’s results (the stop range is the same for both queries).

Confused as well: I only see one measurement, “state”, here.

How are you querying your data? Are you using the query builder in the UI? Perhaps an aggregateWindow() function is being applied?
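For reference, the query builder typically appends something like this to generated queries, which aggregates away the raw points:

  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)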

Hi, thanks for your reply!

I tried mapping values with map() and conditional logic as you suggested, but with the same results: if I run the query with a reduced time range that includes only the values stored as boolean, I get back a subset of the data; if I extend the range to include those stored as float, the result set includes only the data stored as float and not the more recent data stored as bool.
This is the query I tried, per your suggestions:

from(bucket: "log/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "state" and r["device"] == "nest")
  |> map(fn: (r) => ({r with _value: if float(v: r._value) > 0.0 then 1.0 else 0.0}))

This is the original query I was using:

from(bucket: "log/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "state" and r["device"] == "nest")
  |> toFloat()

If the range includes both float and bool data, the result includes only the records stored as float and not those stored as boolean. Same query, smaller range, and I can see the missing records…

Hello @lucamot,
I’m confused. It looks like all of your data is of type bool instead of float, so I’m having trouble understanding how your original query yields the results you shared. I would expect to see floats instead of bools.

Is all your data of mixed types in the _value column? I suggested conditional mapping because I thought maybe some of your bool values were in a different column.
The toFloat() function converts a bool to a float, but only for values in the _value column.
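A quick sketch of the difference (the status column is hypothetical, and array.from is only there to make the example self-contained):

import "array"

// toFloat() only touches _value; any other column needs map() + float()
array.from(rows: [{_value: true, status: false}])
  |> toFloat()                                              // _value: true -> 1.0
  |> map(fn: (r) => ({r with status: float(v: r.status)})) // status: false -> 0.0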

Can you maybe export some of your data to annotated CSV through the UI and share it with me? Try to limit the response to as few rows as possible while still covering the problem. Then I can try to work with the data on my end.

Otherwise I suggest creating an issue here:

Hi,
thank you again for your response.

Yes, the data is in a mixed format: _value is of type float until a certain date (13th of February); after that I changed the way it’s collected and _value is now stored as a boolean. The toFloat() calls were in the Flux queries because I wanted to convert all the data into the same format before merging it.
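To illustrate with hypothetical line protocol (not my actual ingest, timestamps made up), the same field was first written as a float and is now written as a boolean:

state,device=nest,group=heater value=1 1644735600000000000
state,device=nest,group=heater,plant_key=securelan,sid=nest_state value=true 1644908400000000000

(In line protocol, value=1 without an i suffix is a float, while value=true is a boolean.)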

I included two gzipped CSVs with a few lines exported from the UI: as you can see before the 14th of February the data was collected as 0s and 1s, after that it’s collected as true and false.

With the data of the included CSV, try these queries:

current = from(bucket: "log/autogen")
  |> range(start: 2022-02-14T04:00:00Z, stop: 2022-02-22T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> toFloat()

previous = from(bucket: "log/autogen")
  |> range(start: 2022-02-01T00:00:00Z, stop: 2022-02-13T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")

all = from(bucket: "log/autogen")
  |> range(start: 2022-02-01T00:00:00Z, stop: 2022-02-22T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> toFloat()

If you look at the results, all is equal to previous; I would expect all to be the union of previous and current.

I also tried a few combinations of join() and union() to merge current and previous, but without success: the two sets stay separate even though they have the same number and types of columns.

sample_data.gz (1.8 KB)

A quick update: after writing about join() and union() I tried again merging the two datasets with

union(tables: [previous, current])
 |> group()

and it did work this way.
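Putting it together, this is the full pattern that worked for me, with toFloat() applied to both inputs so the merged _value column has a single type:

previous = from(bucket: "log/autogen")
  |> range(start: 2022-02-01T00:00:00Z, stop: 2022-02-13T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> toFloat()

current = from(bucket: "log/autogen")
  |> range(start: 2022-02-14T04:00:00Z, stop: 2022-02-22T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> toFloat()

// union() keeps each input table's group key, so without the final group()
// the two sets remain separate tables even with identical columns
union(tables: [previous, current])
  |> group()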

I still don’t get why the all query doesn’t return the full dataset, but at this point I’m not sure if it’s a misbehaviour or something else…

@lucamot,
Ah I think I understand.
It would be most helpful to see what all yields without the toFloat() so I can see the default grouping. But I would expect this to work for you:

all = from(bucket: "log/autogen")
  |> range(start: 2022-02-01T00:00:00Z, stop: 2022-02-22T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> toFloat()
  |> group() 

No luck even with the query you suggested: with or without the toFloat(), the result is the same, with no data after the 13th of February. The attached CSV is the result of the query you suggested.

data.tar.gz (521 Bytes)

Hey, so I checked all your outputs with:

import "csv"

csvDataAll = "#group,false,false,false,false,false,false,false,false,false,false
#datatype,string,long,string,string,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string
#default,_result,,,,,,,,,
,result,table,_field,_measurement,_start,_stop,_time,_value,device,group
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-01T05:00:00.072312457Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-01T06:11:45.453103096Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-02T05:36:21.238081506Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-02T06:21:22.339241473Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-05T06:01:36.488892424Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-05T06:37:37.603137977Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-06T05:00:00.089555454Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-06T06:24:17.562319614Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-06T07:54:20.197190957Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-06T08:00:00.071937642Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-08T05:56:34.251154467Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-08T06:31:35.273882642Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-08T20:43:45.071382178Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-13T06:37:47.917563774Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-13T07:19:49.057173718Z,0,nest,heater
"
csvDataPrevious = "#group,false,false,true,true,false,false,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
#default,last,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,device,group
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-01T05:41:30Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-01T06:33:20Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-02T05:52:50Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-02T06:44:40Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-05T06:26:50Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-05T07:18:40Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-06T05:46:20Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-06T06:38:10Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-06T08:21:50Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-08T06:09:00Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-08T07:00:50Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-08T20:50:10Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-13T07:05:40Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-13T07:57:30Z,0,value,state,nest,heater
"

csvDataCurrent = "#group,false,false,true,true,false,false,true,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,boolean,string,string,string,string,string
#default,last,,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,device,group,sid
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T14:09:59.28Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T14:44:51.77Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T15:19:44.26Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T15:54:36.75Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T16:29:29.24Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T17:04:21.73Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T17:39:14.22Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T18:14:06.71Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T18:48:59.2Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T19:23:51.69Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T19:58:44.18Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T20:33:36.67Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T21:08:29.16Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T21:43:21.65Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T05:16:44.02Z,true,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T06:26:29Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T12:50:06.39Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T13:24:58.88Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T17:29:06.31Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T21:33:13.74Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-16T05:06:36.11Z,true,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-16T06:16:21.09Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-16T10:55:21.01Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-17T11:54:58.08Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-19T07:30:34.83Z,true,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-19T08:05:27.32Z,false,value,state,nest,heater,nest_state
"

all = csv.from(csv: csvDataAll)
  |> count()
  |> yield(name: "all")

current = csv.from(csv: csvDataCurrent)
  |> count()
  |> toInt()
  |> yield(name: "current")

previous = csv.from(csv: csvDataPrevious)
  |> count()
  |> yield(name: "previous")

// union() the per-export record counts, merge them into one table, and add them up
checkAll = union(tables: [current, previous])
  |> group()
  |> sum()
  |> yield(name: "check all")

You can see the result of checkAll has the same number of values as previous and current combined.

So you should be able to do:

from(bucket: "log/autogen")
  |> range(start: 2022-02-01T00:00:00Z, stop: 2022-02-22T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> group()  // merge all tables into one, regardless of group key
  |> toInt()  // converts both floats and bools in _value to integers

To get all the values as shown above.
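And if you want one series per tag set again after the merge, a rough sketch: regroup by the common tags once the types are normalized:

  |> group(columns: ["_measurement", "_field", "device", "group"])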

Thank you very much, the checkAll query worked! The group() did the trick for me: I wasn’t using it, and I was having problems when current or previous had no data…