Get only the newest entries of duplicate data points

I have seen this guide on how to handle duplicate data points: https://docs.influxdata.com/influxdb/v2.0/write-data/best-practices/duplicate-points/

I would like to use the suggested approach of adding a uniq tag. My main question is: is there a clever way to request just the newest data points (highest uniq tag value)? Let's assume I initially store a time series with the tag uniq = 1. Later I write new data points that have the same timestamps as existing ones, but with a uniq tag value of 2. Now I would like to send a request that delivers the uniq = 2 data points where present and the uniq = 1 data points otherwise. For example, if the initial time series contained 10,000 points with uniq = 1 and I later store 1,000 new data points with uniq = 2 (at existing timestamps), I would like a single request that returns 10,000 data points, of which 9,000 are old (uniq = 1) and 1,000 are new (uniq = 2). Is there a simple way to do that?
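For reference, with that approach the stored points would look roughly like this in line protocol, with two uniq values sharing one timestamp (measurement and field names here are just placeholders):

mymeas,uniq=1 myfield=4.0 1625356800000000000
mymeas,uniq=2 myfield=2.0 1625356800000000000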

Remark: before you ask why I don't simply overwrite the data points if I only want the newest ones: while getting the newest points will be my most frequent request, I would like to keep the option to request the original time series as well.

If possible I would prefer an answer in InfluxQL, since I have no experience with Flux yet. But if it is much easier to do with Flux, I may take this as a starting point to get into Flux.

Thanks!

Hello @neuger,
Welcome and thanks for your question.
What do you mean by return? Just which query will return the old and new points?
First, try writing some dummy data to your instance with array.from():

import "influxdata/influxdb/schema"
import "array"
array.from(rows: [{_time: 2021-07-01T00:00:00Z, _value: 1.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-02T00:00:00Z, _value: 2.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-03T00:00:00Z, _value: 3.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-04T00:00:00Z, _value: 4.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-05T00:00:00Z, _value: 5.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-06T00:00:00Z, _value: 6.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-07T00:00:00Z, _value: 7.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-08T00:00:00Z, _value: 8.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-09T00:00:00Z, _value: 9.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-10T00:00:00Z, _value: 10.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-10T00:00:00Z, _value: 10.0, _field: "myfield", tag: "uniq2", _measurement: "test"}])
  |> to(bucket: "question")

Now you can get the timestamp of the first uniq2 point and use that as the stop bound when querying the old points:

first_point = from(bucket: "question")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "test")
  |> filter(fn: (r) => r["_field"] == "myfield")
  |> filter(fn: (r) => r["tag"] == "uniq2")
  |> first()
  // extract the single remaining row as a record so its _time can be used as a scalar below
  |> findRecord(fn: (key) => true, idx: 0)

from(bucket: "question")
  // the stop bound is exclusive, so this returns uniq1 points strictly before the first uniq2 point
  |> range(start: v.timeRangeStart, stop: first_point._time)
  |> filter(fn: (r) => r["_measurement"] == "test")
  |> filter(fn: (r) => r["_field"] == "myfield")
  |> filter(fn: (r) => r["tag"] == "uniq1")
  |> yield(name: "old data minus old duplicates")

from(bucket: "question")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "test")
  |> filter(fn: (r) => r["_field"] == "myfield")
  |> filter(fn: (r) => r["tag"] == "uniq2")
  |> yield(name: "new duplicates")
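
If you would rather get one merged series instead of two separate yields, you could also union the two queries. This is a minimal sketch reusing the same filters and the first_point record from above (the variable names are just for illustration):

old_data = from(bucket: "question")
  |> range(start: v.timeRangeStart, stop: first_point._time)
  |> filter(fn: (r) => r["_measurement"] == "test")
  |> filter(fn: (r) => r["_field"] == "myfield")
  |> filter(fn: (r) => r["tag"] == "uniq1")

new_data = from(bucket: "question")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "test")
  |> filter(fn: (r) => r["_field"] == "myfield")
  |> filter(fn: (r) => r["tag"] == "uniq2")

union(tables: [old_data, new_data])
  |> group(columns: ["_measurement", "_field"])  // drop the uniq tag from the group key so the series merge
  |> sort(columns: ["_time"])
  |> yield(name: "merged")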

I hope this helps and please let me know if I can assist further.

Hello @Anaisdg,

thanks for your answer! In principle you understood my question correctly, so your solution gives the right time series for the demo case. Unfortunately, in my real case there will not be a single point in time where the new values start. My data may look more like this:

import "influxdata/influxdb/schema"
import "array"
array.from(rows: [{_time: 2021-07-01T00:00:00Z, _value: 1.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-02T00:00:00Z, _value: 2.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-03T00:00:00Z, _value: 3.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-04T00:00:00Z, _value: 4.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-04T00:00:00Z, _value: 2.0, _field: "myfield", tag: "uniq2", _measurement: "test"},
                  {_time: 2021-07-05T00:00:00Z, _value: 5.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-06T00:00:00Z, _value: 6.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-07T00:00:00Z, _value: 7.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-07T00:00:00Z, _value: 5.0, _field: "myfield", tag: "uniq2", _measurement: "test"},
                  {_time: 2021-07-08T00:00:00Z, _value: 8.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-09T00:00:00Z, _value: 9.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-10T00:00:00Z, _value: 10.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
                  {_time: 2021-07-10T00:00:00Z, _value: 12.0, _field: "myfield", tag: "uniq2", _measurement: "test"}])
  |> to(bucket: "question")

The result I would expect would look like this:

[{_time: 2021-07-01T00:00:00Z, _value: 1.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
 {_time: 2021-07-02T00:00:00Z, _value: 2.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
 {_time: 2021-07-03T00:00:00Z, _value: 3.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
 {_time: 2021-07-04T00:00:00Z, _value: 2.0, _field: "myfield", tag: "uniq2", _measurement: "test"},
 {_time: 2021-07-05T00:00:00Z, _value: 5.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
 {_time: 2021-07-06T00:00:00Z, _value: 6.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
 {_time: 2021-07-07T00:00:00Z, _value: 5.0, _field: "myfield", tag: "uniq2", _measurement: "test"},
 {_time: 2021-07-08T00:00:00Z, _value: 8.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
 {_time: 2021-07-09T00:00:00Z, _value: 9.0, _field: "myfield", tag: "uniq1", _measurement: "test"},
 {_time: 2021-07-10T00:00:00Z, _value: 12.0, _field: "myfield", tag: "uniq2", _measurement: "test"}]

So I guess there is no easy way to get this result, and I will probably have to request all points and handle the duplicate points in my own code.

Hello @neuger,
I'm confused: do you want an automatic delete?
Because I believe the Flux I gave you would provide your expected results? Or am I misunderstanding something?
I think Flux might be getting a delete function. If so, you could extend the logic above and execute it in a task.

Hello @Anaisdg,

sorry if I was not clear enough. No, it's not about deleting. I want to keep all values in the database, but I want to send a request that does not return all stored values. For every point in time I would like to get only a single result. If there are two entries in the database for the same point in time, I only want the entry with the "higher" tag value.
For that it would probably be better to use tag values like "1" and "2" instead of "uniq1" and "uniq2".

Hello @neuger,
Then I believe the Flux I shared above will return the data you’re looking for (it works for me). Did it not work for you?
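
If the problem is that the new points are interleaved with the old ones (as in your second example), one option might be to regroup by timestamp and keep the row with the highest tag per group. This is a minimal, untested sketch assuming the demo bucket and tag values from above; note the comparison is lexicographic, so the tag values would need a consistent width (for example zero-padded numbers) once you go past single digits:

from(bucket: "question")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "test")
  |> filter(fn: (r) => r["_field"] == "myfield")
  |> group(columns: ["_time"])           // one table per timestamp
  |> sort(columns: ["tag"], desc: true)  // highest uniq tag first within each timestamp
  |> limit(n: 1)                         // keep only that row
  |> group()                             // merge everything back into a single table
  |> sort(columns: ["_time"])
  |> yield(name: "newest per timestamp")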