I use the readsb Docker container to collect data about aircraft flying nearby. The container runs an instance of Telegraf, which sends data to InfluxDB with the following schema (full schema can be seen here):
_measurement: aircraft
Tag: Icao
Tag: Call
...
Tag: _field
Field Key: Altitude
Field Key: Latitude
Field Key: Longitude
Field Key: Airspeed
....
This gives me tens of thousands of rows in the result, showing the ICAO codes of aircraft seen in the past 24 hours. Note that a unique ICAO code appears hundreds of times in the results, since each aircraft sends hundreds of packets with its position and so on:
_stop                     Icao
2022-08-17 18:23:36.878   0101DB
2022-08-17 18:23:36.878   0101DB
2022-08-17 18:23:36.878   0101DB
…
2022-08-17 18:23:36.878   06A07A
2022-08-17 18:23:36.878   06A07A
2022-08-17 18:23:36.878   06A07A
You get the idea.
Now to my question.
I want to filter out rows where the same ICAO code appears several times within a few seconds, keeping only one row.
BUT if an ICAO code re-appears after an hour, I want to keep that row, not filter it out. (This is why the unique() function won't work for me.) This is the scenario where an aircraft with the same ICAO code is flying overhead again, possibly on another flight.
I did think about using the callsign instead of, or in combination with, the ICAO code. The issue is that the same callsign can be used by the aircraft again the next day, for example, so I can't use unique() there either.
So far I have used unique() on these queries, which shows me the unique ICAO codes (aircraft) seen over the selected period; but that misses the scenario where the same aircraft, with the same ICAO code, flies over me several times per day.
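To make the requirement concrete, here is a sketch of the filtering logic in plain Python rather than Flux (the function name, the row shape, and the one-hour threshold are all assumptions for illustration): a row is kept only if its ICAO hasn't been sighted at all within the threshold, so a continuous stream of packets collapses to one row, while a re-appearance after a gap starts a new "visit".

```python
from datetime import datetime, timedelta

def dedupe_sightings(rows, gap=timedelta(hours=1)):
    """Keep one row per ICAO 'visit': drop a row if the same ICAO
    was sighted less than `gap` ago; keep it if the aircraft
    reappears after the gap (a new flight overhead).
    `rows` is an iterable of (timestamp, icao) tuples."""
    last_seen = {}   # icao -> timestamp of the most recent sighting
    kept = []
    for ts, icao in sorted(rows):  # process in time order
        if icao not in last_seen or ts - last_seen[icao] >= gap:
            kept.append((ts, icao))
        last_seen[icao] = ts       # update on EVERY row, not just kept ones,
                                   # so a plane circling for hours stays one visit
    return kept
```

Updating `last_seen` on every row (not only on kept rows) is deliberate: it means "re-appears after an hour" is measured from the last packet received, so an aircraft loitering overhead continuously is still counted once. In Flux, the equivalent idea is grouping by Icao, sorting by _time, and filtering on the gap between consecutive records.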
Hello @tejasitraj,
I’m a little confused about why you want to group by time…
however to answer:
> I want to filter out the rows where the same ICAO code appears several times within seconds, keeping only one row. BUT if an ICAO code re-appears after an hour, I want to keep the row, not filter it out.
But I recognize that you're looking for something more complex. You probably don't want to just aggregate the data into 1h intervals, because aggregateWindow() aligns its windows relative to the Unix epoch by default, not to your data. Is your data written at regular intervals? Maybe you can make the window start the exact timestamp of your first point.