I’m using the Python API to query a database with the query_dataframe() function. The end result is something like 1M rows of data, but I fetch it in smaller chunks, splitting my queries by time and by fields (5-10 fields are queried at once). For a very small subset of these queries I get null values that aren’t present in the database (when I check through the UI). Every time I run the query, the same values come back as null.
It’s not a massive issue as the percentage is very small, but it would be good to know what is happening. My query is below:
from(bucket: {bucket})
|> range(start: {start}, stop: {stop})
|> filter(fn: (r) => r._measurement == {measurement})
|> filter(fn: (r) => r._field == "CYCLE" or {sensor})
|> filter(fn: (r) => r.SHORTCAN == "0")
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
|> drop(columns:["_start", "_stop"])
The query is set up within a loop that fills in the values in curly brackets from a data frame. Each query result is concatenated into one big data frame; however, the null values are already present before the concatenation is done. A simplified sketch of the loop is shown below.
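Roughly, the loop looks like this. This is a minimal sketch assuming the influxdb-client package and its query_data_frame() method; the connection details and the time_windows / sensor_groups variables are placeholders standing in for my real setup, and the quoting of the substituted values is simplified:

import pandas as pd
from influxdb_client import InfluxDBClient

# Connection details are placeholders
client = InfluxDBClient(url="http://localhost:8086", token="MY_TOKEN", org="MY_ORG")
query_api = client.query_api()

flux_template = '''
from(bucket: "{bucket}")
|> range(start: {start}, stop: {stop})
|> filter(fn: (r) => r._measurement == "{measurement}")
|> filter(fn: (r) => r._field == "CYCLE" or {sensor})
|> filter(fn: (r) => r.SHORTCAN == "0")
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
|> drop(columns:["_start", "_stop"])
'''

frames = []
for start, stop in time_windows:         # placeholder: list of (start, stop) RFC3339 strings
    for sensor_filter in sensor_groups:  # placeholder: e.g. 'r._field == "A" or r._field == "B"'
        flux = flux_template.format(
            bucket="my-bucket",
            measurement="my-measurement",
            start=start,
            stop=stop,
            sensor=sensor_filter,
        )
        chunk = query_api.query_data_frame(flux)
        # query_data_frame() can return a list of DataFrames when the result
        # spans multiple table schemas, so normalise before collecting
        if isinstance(chunk, list):
            chunk = pd.concat(chunk, ignore_index=True)
        frames.append(chunk)

result = pd.concat(frames, ignore_index=True)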
I tried an alternative approach using just the query() function with single fields, iterating over the tables and records to build a list of all values per field. However, this sometimes returns fewer values for some fields even when all other parts of the query are held constant (e.g. start/stop times). It seems InfluxDB sometimes just does not return all of the values present in the database. This query is below, followed by a sketch of the iteration:
from(bucket: {bucket})
|> range(start: {start}, stop: {stop})
|> filter(fn: (r) => r._measurement == {measurement})
|> filter(fn: (r) => r._field == "{sensor}")
|> filter(fn: (r) => r.SHORTCAN == "0")
|> drop(columns:["_start", "_stop"])
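The iteration over tables and records looks roughly like this. Again a sketch: it reuses the query_api object from the first sketch, and the sensors list, bucket/measurement names, and start/stop values are placeholders:

single_field_template = '''
from(bucket: "{bucket}")
|> range(start: {start}, stop: {stop})
|> filter(fn: (r) => r._measurement == "{measurement}")
|> filter(fn: (r) => r._field == "{sensor}")
|> filter(fn: (r) => r.SHORTCAN == "0")
|> drop(columns:["_start", "_stop"])
'''

values_per_field = {}
for sensor in sensors:  # placeholder: list of field names
    flux = single_field_template.format(
        bucket="my-bucket",
        measurement="my-measurement",
        start=start,
        stop=stop,
        sensor=sensor,
    )
    tables = query_api.query(flux)
    values = []
    for table in tables:
        for record in table.records:
            values.append(record.get_value())
    values_per_field[sensor] = values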
Any clarity or help on this would be great!