Merge values with different stored types

So recently I changed the format of my data: I was collecting a state as a double (0 and 1) and changed it to track the same measurement as a boolean (true/false). The measurement is the same, except the new (boolean-typed) one has a few additional tags; the common tags (_measurement, _field, device, etc.) are identical.

The problem is I now want to query all my data (double + bool) and merge it into a single result set, but there seems to be a break between the old and new values in my query results.

Here’s my simple Flux query (toFloat() is there because my idea was to convert all the data to float before merging):

from(bucket: "log/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "state" and r["device"] == "nest")
  |> toFloat()

The problem is that this query changes its output based on the time range. Let’s say I switched from double to bool 24 hours ago: if my range spans more than 24 hours I get only the results stored as double (i.e. those reported before the switch); if however I reduce the time range to within the last 24 hours I get the results stored as bool, as expected (i.e. the ‘new’ data). I’m puzzled by this behavior because I expected at least two different tables as a result, one with doubles and one with bools, but it instead gives me back a single table with missing data depending on the selected range…

Below are a couple of screenshots of the extracted data: the first one spans 7 days and returns a few measurements (double); the second one is restricted to the last 24 hours and has a lot of measurements that weren’t listed in the first query’s results (the stop range is the same for both queries). Any ideas on how I can merge these two datasets?

7 days range

Columns (name / group / type): table, _measurement (group, string), _field (group, string), _value (no group, boolean), _start (group, dateTime:RFC3339), _stop (group, dateTime:RFC3339), _time (no group, dateTime:RFC3339), device (group, string), group (group, string)

| table | _measurement | _field | _value | _start | _stop | _time | device | group |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | state | value | false | 2022-02-08T12:01:00.747Z | 2022-02-15T12:01:00.747Z | 2022-02-08T20:43:45.071Z | nest | heater |
| 0 | state | value | true | 2022-02-08T12:01:00.747Z | 2022-02-15T12:01:00.747Z | 2022-02-13T06:37:47.917Z | nest | heater |
| 0 | state | value | false | 2022-02-08T12:01:00.747Z | 2022-02-15T12:01:00.747Z | 2022-02-13T07:19:49.057Z | nest | heater |

24 hours range

Columns (name / group / type): table, _measurement (group, string), _field (group, string), _value (no group, boolean), _start (group, dateTime:RFC3339), _stop (group, dateTime:RFC3339), _time (no group, dateTime:RFC3339), device (group, string), group (group, string), plant_key (group, string), sid (group, string)

| table | _measurement | _field | _value | _start | _stop | _time | device | group | plant_key | sid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | state | value | false | 2022-02-14T11:43:56.031Z | 2022-02-15T11:43:56.031Z | 2022-02-14T21:04:45.000Z | nest | heater | securelan | nest_state |
| 0 | state | value | false | 2022-02-14T11:43:56.031Z | 2022-02-15T11:43:56.031Z | 2022-02-14T21:11:22.000Z | nest | heater | securelan | nest_state |
| 0 | state | value | false | 2022-02-14T11:43:56.031Z | 2022-02-15T11:43:56.031Z | 2022-02-14T21:14:10.000Z | nest | heater | securelan | nest_state |
| 0 | state | value | true | 2022-02-14T11:43:56.031Z | 2022-02-15T11:43:56.031Z | 2022-02-15T05:00:00.000Z | nest | heater | securelan | nest_state |
| 0 | state | value | false | 2022-02-14T11:43:56.031Z | 2022-02-15T11:43:56.031Z | 2022-02-15T06:17:05.000Z | nest | heater | securelan | nest_state |

Thanks!

Luca

Hello @lucamot,

You can use map() with conditional logic to convert values in other columns.
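For instance, a minimal sketch of what I mean (float() accepts both bools and floats, so the conditional works on either type):

from(bucket: "log/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "state" and r["device"] == "nest")
  // float() turns true/false into 1.0/0.0 and leaves floats unchanged,
  // so this normalizes every row's _value to 1.0 or 0.0
  |> map(fn: (r) => ({r with _value: if float(v: r._value) > 0.0 then 1.0 else 0.0}))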

Also

> I’m puzzled by this behavior because I expected at least two different tables as a result, one with doubles and one with bools

I’d expect the same thing.

> spans 7 days and returns a few measurements (double)

I’m confused: I only see the bool type here, not double.

> the second one is restricted to the last 24 hours and has a lot of measurements that weren’t listed in the first query’s results (the stop range is the same for both queries).

Confused as well: I only see one measurement, “state”, here.

How are you querying your data? Are you using the query builder in the UI? Perhaps an aggregateWindow() function is being applied?
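For reference, the query builder typically appends something like this to generated queries, which aggregates away the raw points:

  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)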

Hi, thanks for your reply!

I tried mapping values with map() and conditional logic as you suggested, but with the same results: if I run the query with a reduced time range that includes only the values stored as boolean, I get back a subset of the data; if I extend the range to include those stored as float, the result set includes only the data stored as float and not the more recent data stored as bool.
This is the query I tried, per your suggestions:

from(bucket: "log/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "state" and r["device"] == "nest")
  |> map(fn: (r) => ({r with _value: if float(v: r._value) > 0.0 then 1.0 else 0.0}))

This is the original query I was using:

from(bucket: "log/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "state" and r["device"] == "nest")
  |> toFloat()

If the range includes both float and bool data, the result includes only the records stored as float and not those stored as boolean. Same query, smaller range, and I can see the missing records…

Hello @lucamot,
I’m confused. It looks like all of your data is of type bool instead of float, so I’m having trouble understanding how your original query yields the results you shared. I would expect to see floats instead of bools.

Is all your data of mixed types in the _value column? I suggested conditional mapping because I thought maybe some of your bool values were in a different column.
The toFloat() function converts a bool to a float, but only for values in the _value column.
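A quick sketch of the difference (the status column is hypothetical, and array.from is only there to make the example self-contained):

import "array"

// toFloat() only touches _value; any other column needs map() + float()
array.from(rows: [{_value: true, status: false}])
  |> toFloat()                                              // _value: true -> 1.0
  |> map(fn: (r) => ({r with status: float(v: r.status)})) // status: false -> 0.0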

Can you maybe export some of your data to annotated CSV through the UI and share it with me? Try to limit the response to as few rows as possible while still covering the problem. Then I can try to work with the data on my end.

Otherwise I suggest creating an issue here:

Hi,
thank you again for your response.

Yes, the data is in a mixed format: _value is of type float until a certain date (13th of February); after that I changed the way it’s collected and _value is now stored as a boolean. The toFloat() calls were in the Flux queries because I wanted to convert all the data into the same format before merging it.
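To illustrate with hypothetical line protocol (not my actual ingest, timestamps made up), the same field was first written as a float and is now written as a boolean:

state,device=nest,group=heater value=1 1644735600000000000
state,device=nest,group=heater,plant_key=securelan,sid=nest_state value=true 1644908400000000000

(In line protocol, value=1 without an i suffix is a float, while value=true is a boolean.)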

I included two gzipped CSVs with a few lines exported from the UI: as you can see before the 14th of February the data was collected as 0s and 1s, after that it’s collected as true and false.

With the data of the included CSV, try these queries:

current = from(bucket: "log/autogen")
  |> range(start: 2022-02-14T04:00:00Z, stop: 2022-02-22T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> toFloat()

previous = from(bucket: "log/autogen")
  |> range(start: 2022-02-01T00:00:00Z, stop: 2022-02-13T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")

all = from(bucket: "log/autogen")
  |> range(start: 2022-02-01T00:00:00Z, stop: 2022-02-22T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> toFloat()

If you look at the results, all is equal to previous; I would expect all to be the union of previous and current.

I also tried a few combinations of join() and union() to merge current and previous, but without success: the two sets stay separate even though they have the same number and types of columns.

sample_data.gz (1.8 KB)

A quick update: after writing about join() and union() I tried again merging the two datasets with

union(tables: [previous, current])
 |> group()

and it did work this way.
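Putting it together, this is the full pattern that worked for me, with toFloat() applied to both inputs so the merged _value column has a single type:

previous = from(bucket: "log/autogen")
  |> range(start: 2022-02-01T00:00:00Z, stop: 2022-02-13T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> toFloat()

current = from(bucket: "log/autogen")
  |> range(start: 2022-02-14T04:00:00Z, stop: 2022-02-22T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> toFloat()

// union() keeps each input table's group key, so without the final group()
// the two sets remain separate tables even with identical columns
union(tables: [previous, current])
  |> group()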

I still don’t get why the all query doesn’t return the full dataset, but at this point I’m not sure if it’s a misbehaviour or something else…

@lucamot,
Ah I think I understand.
It would be most helpful to see what all yields without the toFloat() so I can see the default grouping. But I would expect this to work for you:

all = from(bucket: "log/autogen")
  |> range(start: 2022-02-01T00:00:00Z, stop: 2022-02-22T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> toFloat()
  |> group() 

No luck even with the query you suggested: with or without the toFloat(), the result is the same, with no data after the 13th of February. The attached CSV is the result of the query you suggested.

data.tar.gz (521 Bytes)

Hey, so I checked all your outputs with:

import "csv"

csvDataAll = "#group,false,false,false,false,false,false,false,false,false,false
#datatype,string,long,string,string,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string
#default,_result,,,,,,,,,
,result,table,_field,_measurement,_start,_stop,_time,_value,device,group
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-01T05:00:00.072312457Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-01T06:11:45.453103096Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-02T05:36:21.238081506Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-02T06:21:22.339241473Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-05T06:01:36.488892424Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-05T06:37:37.603137977Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-06T05:00:00.089555454Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-06T06:24:17.562319614Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-06T07:54:20.197190957Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-06T08:00:00.071937642Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-08T05:56:34.251154467Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-08T06:31:35.273882642Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-08T20:43:45.071382178Z,0,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-13T06:37:47.917563774Z,1,nest,heater
,,0,value,state,2022-02-01T00:00:00Z,2022-02-22T23:59:00Z,2022-02-13T07:19:49.057173718Z,0,nest,heater
"
csvDataPrevious = "#group,false,false,true,true,false,false,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
#default,last,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,device,group
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-01T05:41:30Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-01T06:33:20Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-02T05:52:50Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-02T06:44:40Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-05T06:26:50Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-05T07:18:40Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-06T05:46:20Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-06T06:38:10Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-06T08:21:50Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-08T06:09:00Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-08T07:00:50Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-08T20:50:10Z,0,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-13T07:05:40Z,1,value,state,nest,heater
,,0,2022-02-01T04:00:00Z,2022-02-14T03:00:00Z,2022-02-13T07:57:30Z,0,value,state,nest,heater
"

csvDataCurrent = "#group,false,false,true,true,false,false,true,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,boolean,string,string,string,string,string
#default,last,,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,device,group,sid
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T14:09:59.28Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T14:44:51.77Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T15:19:44.26Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T15:54:36.75Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T16:29:29.24Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T17:04:21.73Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T17:39:14.22Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T18:14:06.71Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T18:48:59.2Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T19:23:51.69Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T19:58:44.18Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T20:33:36.67Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T21:08:29.16Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-14T21:43:21.65Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T05:16:44.02Z,true,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T06:26:29Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T12:50:06.39Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T13:24:58.88Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T17:29:06.31Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-15T21:33:13.74Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-16T05:06:36.11Z,true,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-16T06:16:21.09Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-16T10:55:21.01Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-17T11:54:58.08Z,false,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-19T07:30:34.83Z,true,value,state,nest,heater,nest_state
,,0,2022-02-14T04:00:00Z,2022-02-22T21:14:56.461Z,2022-02-19T08:05:27.32Z,false,value,state,nest,heater,nest_state
"

all = csv.from(csv: csvDataAll)
  |> count()
  |> yield(name: "all")

current = csv.from(csv: csvDataCurrent)
  |> count()
  |> toInt()
  |> yield(name: "current")

previous = csv.from(csv: csvDataPrevious)
  |> count()
  |> yield(name: "previous")

// union() the per-export record counts, merge them into one table, and add them up
checkAll = union(tables: [current, previous])
  |> group()
  |> sum()
  |> yield(name: "check all")

You can see the result of checkAll has the same number of values as previous and current combined.

So you should be able to do:

from(bucket: "log/autogen")
  |> range(start: 2022-02-01T00:00:00Z, stop: 2022-02-22T23:59:00Z)
  |> filter(fn: (r) => r["device"] == "nest")
  |> group()  // merge all tables into one, regardless of group key
  |> toInt()  // converts both floats and bools in _value to integers

To get all the values as shown above.
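And if you want one series per tag set again after the merge, a rough sketch: regroup by the common tags once the types are normalized:

  |> group(columns: ["_measurement", "_field", "device", "group"])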

Thank you very much, the checkAll query worked! The group() did the trick for me: I wasn’t using it, and I was having problems when current or previous had no data…