Filter Based on Last Known Value of Another Point

I have a number of users (organizations), each with their own devices (buckets), each with a number of components (points) writing into InfluxDB. In principle, I’d like to ask the question “Give me all readings of type ____ within geographic bounding box _____ from within the last ______ hours that I have permission to view.” I’m trying to learn whether this is possible within Flux, or whether I’ll need to implement a Postgres cache that periodically updates.

Suppose for example I had data that looks like:

{"point": "GPS", "field": "latitude", "value": 1, "time": "2022-01-01T01:00:00"}
{"point": "GPS", "field": "longitude", "value": 2, "time": "2022-01-01T01:00:00"}
<...snip...>
{"point": "THERMOMETER", "field": "temperature_C", "value": 3, "time": "2022-01-01T02:00:00"}

Is it possible to write a Flux query that would allow me to retrieve temperature_C where the last-known latitude and longitude was within some range?

Hello @balsa,
Welcome!
Yes that should be possible.
You would use a combination of the flux geo package:

With conditional filters:

And the range function:

Here’s the Flux, the query and scripting language for InfluxDB, docs:

For example I might use a query that looks something like this:

import "experimental/geo"

from(bucket: "example-bucket")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "example-measurement")
    |> geo.gridFilter(region: {lat: 40.69335938, lon: -73.30078125, radius: 20.0})
    |> filter(fn: (r) => types.isType(v: r._value, type: "string") or types.isType(v: r._value, type: "bool"))
    |> yield(name: "_results")

I hope that helps! Let me know if you need some more clarity.

I used the geo.gridFilter instead of conditional filtering, but you could also use conditional flitering to specify lat and lons in between specified ranges.

Thank you! I will get some sample data loaded so that I can confirm that this works, and will let you know the results.

I tried the geo package, and what’s not clear is how I might use this result to get other measurements with the last-known latitude and longitude being within the bounding box. Here’s what I tried:

First, I used this sample data:

gps,type=gps lat=1.0,lon=1.0 1660600000000000000
measurement,type=measurement value=1 1660600000000010000
measurement,type=measurement value=2 1660600000000020000
measurement,type=measurement value=3 1660600000000030000
gps,type=gps lat=4.0,lon=4.0 1660600000000040000
measurement,type=measurement value=5 1660600000000050000
measurement,type=measurement value=6 1660600000000060000
measurement,type=measurement value=7 1660600000000070000
gps,type=gps lat=8.0,lon=8.0 1660600000000080000
measurement,type=measurement value=9 1660600000000090000

My goal is to retrieve the measurements with value=5, 6 and 7 by searching for points with the last-known GPS coordinate near 4,4: I can’t rely on the GPS coordinates being always available, nor can I rely on the GPS reading occurring at the same time as my measurements.

I tried, e.g.:

import "experimental/geo"

from(bucket: "Test Bucket")
    |> range(start: -30d)
    |> filter(fn: (r) => r._measurement == "gps")
    |> geo.shapeData(latField: "lat", lonField: "lon", level: 10)
    |> geo.gridFilter(region: {lat: 4.0, lon: 4.0, radius: 0.5})
    |> yield(name: "_results")

But, this only gives me the GPS coordinate itself, not the measurements taken when the last-known GPS coordinate was 4,4.

Maybe the thing to do is to nest queries, find the GPS coordinate’s timestamp as well as the next one following it, then return measurements within that window? Or is there a way to ask this question directly within a single Flux query?

Hello @balsa,

For that I would do something like:

import "experimental/geo"

time = (from(bucket: "test")
    |> range(start: -30d)
    |> filter(fn: (r) => r._measurement == "gps")
    |> geo.shapeData(latField: "lat", lonField: "lon", level: 10)
    |> geo.gridFilter(region: {lat: 4.0, lon: 4.0, radius: 0.5})
    |> findRecord(idx: 0 , fn: (key) => true))._time

from(bucket: "test")
    |> range(start: time)
    |> filter(fn: (r) => r._measurement == "measurement")

Or maybe

from(bucket: "test")
    |> range(start: -30d)
    |> group()
    |> sort(columns: ["_time"], desc: false)    
    |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")    
    |> fill(column: "lat", usePrevious: true)   
    |> fill(column: "lon", usePrevious: true)    
    |> filter(fn: (r) =>  r.lat >= 3.0 and  r.lat <= 5.0 or r.lon >= 3.0 and  r.lon <= 5.0)     

Thank you, @Anaisdg! That is exactly what I needed. I appreciate the help.

One final question: do you have an idea of whether there would be a substantial scaling or performance difference between these two approaches?

Hello @balsa,
Hmm to be honest I rally don’t know which would be more performant. I don’t have much experience with the geopackage to have an intuitive feel.
I can suggest checking out the flux profiler though: