Actually, there's a much simpler way to do this. You can use the `geo` package and `geo.ST_Distance()` to calculate the geographic distance between two points, then combine that with `reduce()` to build a custom aggregate that returns the total of these distances for each input table.
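Here's a minimal sketch of the building block on its own: the distance between two points. The coordinates are copied from the 02:00 and 03:00 `ABC1` rows of the sample data below. `array.from()` only wraps the scalar result in a one-row table so the script returns something visible. With the default `geo` options, the result should be in kilometers.

```flux
import "array"
import "experimental/geo"

// Two points copied from the ABC1 sample rows (02:00 and 03:00 UTC).
pointA = {lat: 63.1, lon: 62.3}
pointB = {lat: 50.6, lon: 74.9}

// geo.ST_Distance() measures from a region (here a single point) to a
// geometry (here another point); km by default.
distance = geo.ST_Distance(region: pointA, geometry: pointB)

// Wrap the scalar in a one-row table so the script has a result to return.
array.from(rows: [{distance: distance}])
    |> yield(name: "distance")
```

The custom aggregate below applies this same call to every pair of consecutive points in a table.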
For example, let's say you have the following geotemporal data, with latitude and longitude stored as fields:
| _time | id | _field | _value |
| --- | --- | --- | --- |
| 2022-01-01T00:00:00Z | ABC1 | lat | 112.1 |
| 2022-01-01T01:00:00Z | ABC1 | lat | 96.3 |
| 2022-01-01T02:00:00Z | ABC1 | lat | 63.1 |
| 2022-01-01T03:00:00Z | ABC1 | lat | 50.6 |

| _time | id | _field | _value |
| --- | --- | --- | --- |
| 2022-01-01T00:00:00Z | ABC1 | lon | 42.2 |
| 2022-01-01T01:00:00Z | ABC1 | lon | 50.8 |
| 2022-01-01T02:00:00Z | ABC1 | lon | 62.3 |
| 2022-01-01T03:00:00Z | ABC1 | lon | 74.9 |

| _time | id | _field | _value |
| --- | --- | --- | --- |
| 2022-01-01T00:00:00Z | DEF2 | lat | -10.8 |
| 2022-01-01T01:00:00Z | DEF2 | lat | -16.3 |
| 2022-01-01T02:00:00Z | DEF2 | lat | -23.2 |
| 2022-01-01T03:00:00Z | DEF2 | lat | -30.4 |

| _time | id | _field | _value |
| --- | --- | --- | --- |
| 2022-01-01T00:00:00Z | DEF2 | lon | -12.2 |
| 2022-01-01T01:00:00Z | DEF2 | lon | -0.8 |
| 2022-01-01T02:00:00Z | DEF2 | lon | 12.3 |
| 2022-01-01T03:00:00Z | DEF2 | lon | 24.9 |
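If you want to try this without writing anything to a bucket, here's a sketch that builds the same sample data inline with `array.from()`. This is my own stand-in, not part of the original query. Grouping by `id` and `_field` is an assumption meant to reproduce the four tables above. You could use this `data` in place of the `from()`/`range()`/`filter()` query in the full example.

```flux
import "array"

// Hypothetical inline copy of the sample data (stands in for a bucket query).
data =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, id: "ABC1", _field: "lat", _value: 112.1},
            {_time: 2022-01-01T01:00:00Z, id: "ABC1", _field: "lat", _value: 96.3},
            {_time: 2022-01-01T02:00:00Z, id: "ABC1", _field: "lat", _value: 63.1},
            {_time: 2022-01-01T03:00:00Z, id: "ABC1", _field: "lat", _value: 50.6},
            {_time: 2022-01-01T00:00:00Z, id: "ABC1", _field: "lon", _value: 42.2},
            {_time: 2022-01-01T01:00:00Z, id: "ABC1", _field: "lon", _value: 50.8},
            {_time: 2022-01-01T02:00:00Z, id: "ABC1", _field: "lon", _value: 62.3},
            {_time: 2022-01-01T03:00:00Z, id: "ABC1", _field: "lon", _value: 74.9},
            {_time: 2022-01-01T00:00:00Z, id: "DEF2", _field: "lat", _value: -10.8},
            {_time: 2022-01-01T01:00:00Z, id: "DEF2", _field: "lat", _value: -16.3},
            {_time: 2022-01-01T02:00:00Z, id: "DEF2", _field: "lat", _value: -23.2},
            {_time: 2022-01-01T03:00:00Z, id: "DEF2", _field: "lat", _value: -30.4},
            {_time: 2022-01-01T00:00:00Z, id: "DEF2", _field: "lon", _value: -12.2},
            {_time: 2022-01-01T01:00:00Z, id: "DEF2", _field: "lon", _value: -0.8},
            {_time: 2022-01-01T02:00:00Z, id: "DEF2", _field: "lon", _value: 12.3},
            {_time: 2022-01-01T03:00:00Z, id: "DEF2", _field: "lon", _value: 24.9},
        ],
    )
        // One table per id/_field pair, like series read from a bucket.
        |> group(columns: ["id", "_field"])

data
    |> yield(name: "sample_data")
```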
You can use `geo.shapeData()` to reshape your data to meet the requirements of the `geo` package. It pivots the latitude and longitude fields into `lat` and `lon` columns and assigns an S2 cell ID to each point (also required by the `geo` package). In this particular case, I'd then group the data by `id` to remove the grouping by `s2_cell_id` that `geo.shapeData()` applies, so that each table holds the complete path for one `id`.
Using the sample data above and an S2 cell level of 10:
```flux
import "experimental/geo"

data
    |> geo.shapeData(latField: "lat", lonField: "lon", level: 10)
    |> group(columns: ["id"])
```
This would output the following:
| _time | id | lat | lon | s2_cell_id |
| --- | --- | --- | --- | --- |
| 2022-01-01T03:00:00Z | ABC1 | 50.6 | 74.9 | 425be3 |
| 2022-01-01T02:00:00Z | ABC1 | 63.1 | 62.3 | 4385e3 |
| 2022-01-01T01:00:00Z | ABC1 | 96.3 | 50.8 | 5015ed |
| 2022-01-01T00:00:00Z | ABC1 | 112.1 | 42.2 | 513e11 |

| _time | id | lat | lon | s2_cell_id |
| --- | --- | --- | --- | --- |
| 2022-01-01T00:00:00Z | DEF2 | -10.8 | -12.2 | 045367 |
| 2022-01-01T01:00:00Z | DEF2 | -16.3 | -0.8 | 04ca6f |
| 2022-01-01T02:00:00Z | DEF2 | -23.2 | 12.3 | 1c7939 |
| 2022-01-01T03:00:00Z | DEF2 | -30.4 | 24.9 | 1e8433 |
OK, this is where you define your own custom function that uses `reduce()` to return the aggregate sum of the geographic distances along each path. I won't walk through the logic in detail here, but I've tested this and know that it works:
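```flux
// Custom aggregate: walks the rows of each table in order and adds up the
// distance between each point and the point before it.
totalDistance = (tables=<-) =>
    tables
        |> reduce(
            identity: {
                index: 0,
                lat: 0.0,
                lon: 0.0,
                totalDistance: 0.0,
            },
            fn: (r, accumulator) => {
                // The first row has no previous point, so compare it with itself
                // (distance 0); afterwards use the point carried in the accumulator.
                lastPoint =
                    if accumulator.index == 0 then
                        {lat: r.lat, lon: r.lon}
                    else
                        {lat: accumulator.lat, lon: accumulator.lon}
                currentPoint = {lat: r.lat, lon: r.lon}

                return {
                    index: accumulator.index + 1,
                    // Carry the current point forward as the next row's previous point.
                    lat: r.lat,
                    lon: r.lon,
                    totalDistance:
                        accumulator.totalDistance + geo.ST_Distance(region: lastPoint, geometry: currentPoint),
                }
            },
        )
        // Keep only the group key columns and the running total.
        |> drop(columns: ["index", "lat", "lon"])
```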
So include that in your query:
```flux
import "experimental/geo"

data =
    from(bucket: "example-bucket")
        |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
        |> filter(fn: (r) => r._measurement == "example-measurement")
        |> filter(fn: (r) => r._field == "lat" or r._field == "lon")

totalDistance = (tables=<-) =>
    tables
        |> reduce(
            identity: {
                index: 0,
                lat: 0.0,
                lon: 0.0,
                totalDistance: 0.0,
            },
            fn: (r, accumulator) => {
                lastPoint =
                    if accumulator.index == 0 then
                        {lat: r.lat, lon: r.lon}
                    else
                        {lat: accumulator.lat, lon: accumulator.lon}
                currentPoint = {lat: r.lat, lon: r.lon}

                return {
                    index: accumulator.index + 1,
                    lat: r.lat,
                    lon: r.lon,
                    totalDistance:
                        accumulator.totalDistance + geo.ST_Distance(region: lastPoint, geometry: currentPoint),
                }
            },
        )
        |> drop(columns: ["index", "lat", "lon"])

data
    |> geo.shapeData(latField: "lat", lonField: "lon", level: 10)
    |> group(columns: ["id"])
    // group() doesn't guarantee row order, so sort each path by time first.
    |> sort(columns: ["_time"])
    |> totalDistance()
```
With the sample data above, this query would return the following:
| id | totalDistance |
| --- | --- |
| ABC1 | 7028.44474458754 |
| DEF2 | 4428.129653320098 |
Note: By default, the `geo` package uses km as the unit of distance. To use miles instead, set the `geo.units` option to `{distance: "mile"}`. This goes at the top of your query, just after your `import` statements:

```flux
option geo.units = {distance: "mile"}
```
The downside of this approach is that it doesn’t tell you the distance of each leg, just the total distance of the entire sequence of coordinates. Flux doesn’t currently provide a way to do that, but there is a proposal that would make it possible: EPIC: scan function · Issue #4671 · influxdata/flux · GitHub