Calculate difference in an InfluxDB task and avoid doubling of data

cortlieb · April 17, 2025, 9:17am

I have a task that sounded easy in the beginning and turned out to be difficult.

I have some rain sensors. The operating principle is that a small bucket fills with water when it rains. When the bucket is full, it tips over and pours the water out, a switch generates a pulse, and everything starts from the beginning. The pulses are counted and sent every 10 minutes via a wireless connection to a gateway. It may be important to note that, due to the wireless connection, it is not guaranteed that each measurement is actually received.

I use Telegraf to write the number of pulses to InfluxDB (V2).
Now the task is to:

- compute the difference between two consecutive measurements to know how many pulses were counted (how much it “has rained”) between these two measurements
- convert the bucket tip pulses to mm of rain (1 mm rain = 1 l/m²); 1 pulse is 0.2 mm of rain
- handle the pulse counter in the sensor, which overflows at 255 and starts again at 0
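
For illustration, the three steps above can be sketched outside of Flux. This is a minimal Python sketch, not the task itself; the function name `rain_mm` and the sample readings are made up, and it assumes the counter wraps at most once between two received readings (if several readings are lost during heavy rain, more than one wrap cannot be detected).

```python
COUNTER_MODULUS = 256   # 8-bit counter: 255 is followed by 0
MM_PER_PULSE = 0.2      # from the post: 1 pulse = 0.2 mm of rain


def rain_mm(previous_count: int, current_count: int) -> float:
    """Rain in mm between two consecutive counter readings."""
    # The modulo maps a negative difference (counter wrapped) back into 0..255.
    pulses = (current_count - previous_count) % COUNTER_MODULUS
    return pulses * MM_PER_PULSE


readings = [250, 253, 2, 2, 7]  # made-up sample values, including one wrap
diffs = [rain_mm(a, b) for a, b in zip(readings, readings[1:])]
print(diffs)
```

The step from 253 to 2 is treated as 5 pulses (253 → 255 → 0 → 2), which is the same correction the Flux script applies by adding 256 to a negative difference.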

I’m using a “task” in InfluxDB to do these calculations.
The task runs every 10 minutes. The results are written to another measurement (coop_garden_calc) in the same bucket. I have to admit that I’m really not a Flux expert, but as far as I know, tasks in V2 can only be defined in Flux. With the documentation, some tests and ChatGPT I arrived at this setup:

option task = {name: "calcRainDiff", every: 10m}

startRange = -60m

from(bucket: "hapysc-garden")
    |> range(start: startRange)
    |> filter(fn: (r) => r["_measurement"] == "coop_garden")
    |> filter(fn: (r) => r["_field"] == "Rain")
    |> group(columns: ["devEUI"])
    |> sort(columns: ["_time"])
    |> difference()
    |> map(
        fn: (r) =>
            ({
                _time: r._time,
                deviceName: r.deviceName,
                devEUI: r.devEUI,
                location: r.location,
                _value: (if r._value < 0 then r._value + 256.0 else r._value) * 0.2,
                _measurement: "coop_garden_calc",
                _field: "RainCorrDiff",
            }),
    )
    |> to(
        bucket: "hapysc-garden",
        tagColumns: [
            "deviceName",
            "devEUI",
            "location",
        ],
    )

Two results of this script bother me.

The graph in the InfluxDB “Data Explorer” looks, uh, strange.

[Screenshot: graph in the InfluxDB Data Explorer]

I can also see in the table view that the data sets are no longer ordered chronologically.
On the other hand, Grafana, which I use to display the data, doesn’t seem to care much; the graphs look OK. Hence, I’m not sure whether this is an issue, but it doesn’t look good to me.
The main issue currently is that each calculated value appears 5 times in the measurement with the calculated values (coop_garden_calc).
I guess the reason is obvious: since I do the calculation over a range of 60 minutes and the task runs every 10 minutes, each Rain value in the raw measurement (coop_garden) is usually touched 5 times by the script.
But I was under the impression that when
- time stamp
- tag set
- field value
are the same, data sets in the database are overwritten. Since the calculations are based on the same raw data all 5 times, I would assume that the result should be exactly the same each time.
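
The count of 5 can be reproduced with a small simulation. This is a minimal Python sketch, assuming readings arrive every 10 minutes at a fixed 3-minute offset from the task schedule (the offset is an assumption, not taken from the real data) and that the query window includes its start but excludes its end, as Flux `range()` does.

```python
from collections import Counter

STEP = 10      # minutes between task runs and between sensor readings
WINDOW = 60    # minutes covered by range(start: -60m)
OFFSET = 3     # assumed: readings arrive 3 minutes after a task tick

reading_times = [OFFSET + STEP * i for i in range(30)]
run_times = [STEP * i for i in range(1, 31)]

touched = Counter()
for run in run_times:
    in_window = [t for t in reading_times if run - WINDOW <= t < run]
    # difference() emits one row per consecutive pair, stamped with the later time,
    # so the first reading in each window produces no output row.
    for later in in_window[1:]:
        touched[later] += 1

print(touched[reading_times[10]])
```

Each window holds six readings and therefore five pairs, and each pair stays inside the window for five consecutive runs, so every difference is written five times.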

This leads to 2 questions:

What is the point where my reasoning is wrong, regarding the overwriting of the same data in the database?
What can I do to fix the problem?
One way would obviously be to do the calculation on each pair of data sets only once. But I don’t know how to achieve that.
The function tasks.lastSuccess() seems to have an issue (see this discussion).
The execution of the task in InfluxDB and the reception of new data is asynchronous.
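
As a language-neutral illustration of "each pair only once", here is a minimal Python sketch that remembers the newest reading already processed per device. The names `last_seen` and `process` are hypothetical, and how such state would be kept in a Flux task is left open here.

```python
COUNTER_MODULUS = 256   # counter wraps from 255 to 0
MM_PER_PULSE = 0.2      # 1 pulse = 0.2 mm of rain

last_seen = {}   # devEUI -> (time, counter) of the newest reading already processed


def process(readings):
    """readings: iterable of (devEUI, time, counter); returns new (devEUI, time, mm) rows."""
    out = []
    for dev, time, counter in sorted(readings, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(dev)
        if prev is not None and time <= prev[0]:
            continue  # already handled in an earlier run
        if prev is not None:
            pulses = (counter - prev[1]) % COUNTER_MODULUS
            out.append((dev, time, pulses * MM_PER_PULSE))
        last_seen[dev] = (time, counter)
    return out


first_run = process([("dev-1", 0, 10), ("dev-1", 10, 12)])
second_run = process([("dev-1", 0, 10), ("dev-1", 10, 12), ("dev-1", 20, 15)])
print(len(first_run), len(second_run))
```

Because the cut-off is the time stamp of the last processed reading rather than the time of the last task run, this approach does not depend on the task and the incoming data being synchronized.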

I would be very thankful for some ideas!

Anaisdg · April 18, 2025, 7:17pm

Hello @cortlieb,
I’m confused too. I believe you’re right: if all those values are the same, it should overwrite. I’m asking for a sanity check.

cortlieb · April 19, 2025, 11:34am

Hi @Anaisdg,

thank you for your reply.

ChatGPT suggested that, although the time stamps might appear the same in the InfluxDB UI (I see a resolution of µs), they might still differ at the ns level.
But since the time stamps are taken from the raw data
_time: r._time,
they should be exactly the same for each calculation result, even if I did the calculation a million times. Am I correct?
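
To make the ns-versus-µs suggestion concrete, here is a minimal Python sketch showing two different nanosecond time stamps that look identical at microsecond resolution. The values and the helper `format_us` are made up for illustration; this only shows what such a difference would look like, not that it occurs in this data.

```python
from datetime import datetime, timezone


def format_us(ts_ns: int) -> str:
    """Render a nanosecond epoch time stamp at microsecond resolution."""
    seconds, remainder_ns = divmod(ts_ns, 1_000_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{base:%Y-%m-%dT%H:%M:%S}.{remainder_ns // 1000:06d}Z"


a = 1_745_000_000_123_456_000   # made-up time stamp in ns
b = a + 789                     # differs only below 1 µs

print(format_us(a) == format_us(b), a == b)
```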

The only thing that is really calculated, and not simply copied from the raw data, is the field RainCorrDiff. But it is a simple calculation from the raw Rain data. Is it conceivable that there are (maybe very small) differences in the result of the calculation? I can’t see why that should be the case.

Anaisdg · April 21, 2025, 7:30pm

Hello @cortlieb,
The field value shouldn’t impact overwrites, though. Only tag key-value pairs, measurements, field keys, and timestamps do. If any of those have different names or values and aren’t being copied exactly, then you won’t have an overwrite.
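
The rule above can be modeled in a few lines. This is a toy Python sketch, not InfluxDB's actual storage engine; `point_key`, `write` and the tag values are hypothetical.

```python
def point_key(measurement, tags, field, time_ns):
    """Identity of a stored value: everything except the field value."""
    return (measurement, tuple(sorted(tags.items())), field, time_ns)


store = {}


def write(measurement, tags, field, time_ns, value):
    store[point_key(measurement, tags, field, time_ns)] = value


tags = {"devEUI": "dev-1", "location": "garden"}   # hypothetical tag values

write("coop_garden_calc", tags, "RainCorrDiff", 1_000, 0.4)
write("coop_garden_calc", tags, "RainCorrDiff", 1_000, 0.6)   # same key: overwrites
# A tag value that differs only in case yields a different key, so a second point.
write("coop_garden_calc", {**tags, "location": "Garden"}, "RainCorrDiff", 1_000, 0.6)

print(len(store))
```

In this model, a changed field value replaces the stored one, while any change in the measurement, a tag key or value, the field key, or the time stamp creates an additional point.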

cortlieb · April 22, 2025, 11:24am

@Anaisdg OK, I just double-checked it.
In the exported CSV file of the measurement with the calculated values (coop_garden_calc), I can find each data set (I think “point” is the correct term) 5 times!
For each of the 5 identical data points, the measurement, time stamp, tag keys and values, and field key (there is only one) are the same.
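
A check like this can also be scripted. Below is a minimal Python sketch that groups rows of a CSV export by an assumed "same point" key and lists the columns that still differ within a group. The sample rows and the trailing space in one tag value are invented for the demonstration; the column names follow the script above, and the real export may contain additional columns.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the shape of a query CSV export; replace with the real file.
SAMPLE = """table,_time,_value,_field,_measurement,devEUI,deviceName,location
0,2025-04-22T10:00:00Z,0.4,RainCorrDiff,coop_garden_calc,dev-1,rain1,garden
1,2025-04-22T10:00:00Z,0.4,RainCorrDiff,coop_garden_calc,dev-1,rain1 ,garden
"""

KEY = ("_measurement", "_field", "_time", "devEUI")   # assumed "same point" key

groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    groups[tuple(row[k] for k in KEY)].append(row)

# For every key that occurs more than once, list the columns whose values differ.
differing = {
    key: sorted(col for col in rows[0] if len({r[col] for r in rows}) > 1)
    for key, rows in groups.items()
    if len(rows) > 1
}
print(differing)
```

An empty list for a repeated key would mean the rows are identical in every exported column; any listed column is a candidate for the difference that prevents the overwrite.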

Anaisdg · April 23, 2025, 8:33pm

@cortlieb,
That’s very odd. Can you please share those CSV files? Logically there has to be a difference somewhere, even if it’s only a nanosecond, because that’s just how InfluxDB works.

cortlieb · April 24, 2025, 11:03am

@Anaisdg, sure, I can do that.
I renamed the CSV file to *.txt, since CSV files apparently cannot be attached to posts.

query(11).txt (802.0 KB)

cortlieb · April 30, 2025, 10:12am

@Anaisdg, any news on this matter?

I just noticed that a small piece of cake is displayed beside my user name, since I joined the community a year ago. If I could solve the described issue, it would be the cherry on the cake.

cortlieb · May 12, 2025, 9:05am

@Anaisdg, have you had a chance to look at the CSV file I provided?

eriktar · June 30, 2025, 10:38am

This may not fix the unsorted input to your hapysc-garden bucket, but on the viewing end, when you retrieve the data, does a sort help?

|> sort(columns: ["_time"], desc: false)
