When doing a query like this:
from(bucket: "mydata")
  |> range(start: 2023-06-15T18:00:00.000Z, stop: 2023-07-01T21:00:00.000Z)
  |> filter(fn: (r) => r["entity_id"] == "external_temperature")
  |> filter(fn: (r) => r["_field"] == "value")
I get a chart with those unwanted lines:
My bucket had a hole in the data that I later backfilled from another source.
But if I “sort” the data by dropping the _start and _stop columns and also sorting by _time, like so:
from(bucket: "mydata")
  |> range(start: 2023-06-15T18:00:00.000Z, stop: 2023-07-01T21:00:00.000Z)
  |> filter(fn: (r) => r["entity_id"] == "external_temperature")
  |> filter(fn: (r) => r["_field"] == "value")
  |> drop(columns: ["_start", "_stop"]) // <-- THIS FIXES THE ISSUE
  |> sort(columns: ["_time"]) // <-- THIS FIXES THE ISSUE
then the wrong data disappears:
Note that just dropping _start and _stop is not enough: I must also sort by _time.
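As far as I understand, this works because drop() also removes _start and _stop from the group key, so the two separate result tables get merged into one, and sort() then puts the interleaved rows back into chronological order. If that is right, an explicit ungroup should merge the tables too while keeping the columns; a sketch I have not verified (group() with no arguments collapses everything into a single table, which is fine here since the filters leave a single series):

from(bucket: "mydata")
  |> range(start: 2023-06-15T18:00:00.000Z, stop: 2023-07-01T21:00:00.000Z)
  |> filter(fn: (r) => r["entity_id"] == "external_temperature")
  |> filter(fn: (r) => r["_field"] == "value")
  |> group() // merge the tables but keep _start and _stop as regular columns
  |> sort(columns: ["_time"])

But this is still only a query-time workaround.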
So my question is: how can I delete/remove the wrong data, so that I can avoid dropping the _start and _stop columns in my queries? I’m unable to identify the wrong data, apart from seeing in the raw results that near the end the records’ _time values restart from the beginning:
This is a sample of a single value, but I have other values with the same issue (and a different overlapping time range), so I’m looking for a way to “consolidate” the data just like the sorting does.
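If the duplicated points really share the same _time values, maybe deduplicating after the sort is the “consolidation” I am after. A sketch covering all affected series at once (not verified; as far as I know, unique() keeps the first row for each duplicated value, per table):

from(bucket: "mydata")
  |> range(start: 2023-06-15T18:00:00.000Z, stop: 2023-07-01T21:00:00.000Z)
  |> filter(fn: (r) => r["_field"] == "value")
  |> group(columns: ["entity_id", "_field"]) // one table per series; _start/_stop stay as regular columns
  |> sort(columns: ["_time"])
  |> unique(column: "_time") // keep only the first point per timestamp within each series

For permanently removing the bad points there is, as far as I know, only the delete API (influx delete with a time range and a tag predicate), and since I cannot isolate the wrong points by time range or tags, that does not seem to help here.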
EDIT: Curiously, in the results the first column, table, changes from 0 to 1, and I would basically like to trash the values with table 1 … but how can I “select” them?