Null values pulled when using Python API

I’m using the Python API to query a database using the query_dataframe() function. The end result is something like 1M rows of data, but I get this is smaller chunks, splitting my queries by time and fields (5-10 fields are queried at once). For a very small subset of these queries I get null values that aren’t present in the database (when checking through the UI). Every time I run the query, the same values turn out as null.

It’s not a massive issue as the percentage is very small, but would be good to know what is happening. My query is below:

from(bucket: {bucket})
      |> range(start: {start}, stop: {stop})
      |> filter(fn: (r) => r._measurement == {measurement})
      |> filter(fn: (r) => r._field == "CYCLE" or {sensor})
      |> filter(fn: (r) => r.SHORTCAN == "0")
      |> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
      |> drop(columns:["_start", "_stop"])

The query is set up within a loop that adds all values within curly brackets from a data frame. Each query result is concatenated together to a big data frame, however, the null values are present before concatenating is done.

I tried an alternative approach using just the query() function with single fields and iterating over tables and records to make a list of all values per field. However, this sometimes returns fewer values for some fields even when all other parts of the query are constant (e.g. start/stop times). It seems sometimes influx just does not return all values present in the db. This query is below:

from(bucket: {bucket})
          |> range(start: {start}, stop: {stop})
          |> filter(fn: (r) => r._measurement == {measurement})
          |> filter(fn: (r) => r._field == "{sensor}")
          |> filter(fn: (r) => r.SHORTCAN == "0")
          |> drop(columns:["_start", "_stop"])

Any clarity or help on this would be great!

Hi @tama-biro,

thanks for using our client.

Every time I run the query, the same values turn out as null.

Can you share a debug output of the client for your query?

You can enable debug by:

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org", debug=True)

or you can use query_raw.

Regards

Hi, sorry I haven’t responded to this earlier. Here’s the output from the query_data_frame() and query() functions respectively:

query_data_frame()
send: b’{“extern”: {“imports”: , “body”: }, “query”: “\nfrom(bucket: \“live-metrics\”)\n |> range(start: 2021-11-18T09:22:23.00Z, stop: 2021-11-18T09:22:24.00Z)\n |> filter(fn: (r) => r._measurement == \“def-metrics\”)\n |> filter(fn: (r) => r._field == \“CYCLE\” or r._field == \“EDDY_BOTTOM_KURTOSIS\”\n or r._field == \“EDDY_TOP_KURTOSIS\”)\n |> filter(fn: (r) => r.SHORTCAN == \“0\”)\n |> pivot(rowKey:[\”_time\"], columnKey:[\"_field\"], valueColumn:\"_value\")\n |> drop(columns:[\"_start\", \"_stop\", \“CANALIGN_BOX_SERIAL_NUMBER\”, \“SHORTCAN\”, \"_measurement\"])\n ", “dialect”: {“header”: true, “delimiter”: “,”, “annotations”: [“datatype”, “group”, “default”], “commentPrefix”: “#”, “dateTimeFormat”: “RFC3339”}}’
reply: ‘HTTP/1.1 200 OK\r\n’
header: Content-Type: text/csv; charset=utf-8
header: Vary: Accept-Encoding
header: X-Influxdb-Build: OSS
header: X-Influxdb-Version: 2.1.1
header: Date: Tue, 08 Mar 2022 13:16:42 GMT
header: Transfer-Encoding: chunked

query()
send: b’{“extern”: {“imports”: , “body”: }, “query”: “\nfrom(bucket: \“live-metrics\”)\n |> range(start: 2021-11-18T09:22:23.00Z, stop: 2021-11-18T09:22:24.00Z)\n |> filter(fn: (r) => r._measurement == \“def-metrics\”)\n |> filter(fn: (r) => r._field == \“EDDY_TOP_KURTOSIS\”)\n |> filter(fn: (r) => r.SHORTCAN == \“0\”)\n |> drop(columns:[\”_start\", \"_stop\", \“CANALIGN_BOX_SERIAL_NUMBER\”, \“SHORTCAN\”, \"_measurement\"])\n", “dialect”: {“header”: true, “delimiter”: “,”, “annotations”: [“datatype”, “group”, “default”], “commentPrefix”: “#”, “dateTimeFormat”: “RFC3339”}}’
reply: ‘HTTP/1.1 200 OK\r\n’
header: Content-Type: text/csv; charset=utf-8
header: Vary: Accept-Encoding
header: X-Influxdb-Build: OSS
header: X-Influxdb-Version: 2.1.1
header: Date: Tue, 08 Mar 2022 13:18:57 GMT
header: Transfer-Encoding: chunked

For query_data_frame(), the first record of the ones returned is missing. For query(), only 3 values are returned, those being the ones that are not missing from query_data_frame().

query_raw returns the following, with the output iterated through and printed at the bottom:

send: b’{“extern”: {“imports”: , “body”: }, “query”: “\nfrom(bucket: \“live-metrics\”)\n |> range(start: 2021-11-18T09:22:23.00Z, stop: 2021-11-18T09:22:24.00Z)\n |> filter(fn: (r) => r._measurement == \“def-metrics\”)\n |> filter(fn: (r) => r._field == \“EDDY_TOP_KURTOSIS\”)\n |> filter(fn: (r) => r.SHORTCAN == \“0\”)\n |> drop(columns:[\”_start\", \"_stop\", \“CANALIGN_BOX_SERIAL_NUMBER\”, \“SHORTCAN\”, \"_measurement\"])\n", “dialect”: {“header”: true, “delimiter”: “,”, “annotations”: [“datatype”, “group”, “default”], “commentPrefix”: “#”, “dateTimeFormat”: “RFC3339”}}’
reply: ‘HTTP/1.1 200 OK\r\n’
header: Content-Type: text/csv; charset=utf-8
header: Vary: Accept-Encoding
header: X-Influxdb-Build: OSS
header: X-Influxdb-Version: 2.1.1
header: Date: Tue, 08 Mar 2022 14:13:35 GMT
header: Transfer-Encoding: chunked
b’#datatype,string,long,dateTime:RFC3339,double,string\r\n’
b’#group,false,false,false,false,true\r\n’
b’#default,_result,\r\n’
b’,result,table,_time,_value,_field\r\n’
b’,0,2021-11-18T09:22:23.795856Z,4.991582250729799,EDDY_TOP_KURTOSIS\r\n’
b’,0,2021-11-18T09:22:23.79631Z,6.584884125450223,EDDY_TOP_KURTOSIS\r\n’
b’,0,2021-11-18T09:22:23.796391Z,8.456383996641687,EDDY_TOP_KURTOSIS\r\n’
b’\r\n’