Calculate difference in an InfluxDB task and avoid duplicate data

I have a task that sounded easy in the beginning and turned out to be difficult.

I have some rain sensors. The operating principle is that a small bucket fills with water when it rains. When the bucket is full, it tips over and pours the water out, a switch generates a pulse, and everything starts again from the beginning. The pulses are counted and sent every 10 minutes via a wireless connection to a gateway. It may be important to note that, because of the wireless connection, it is not guaranteed that each measurement is actually received.

I use Telegraf to write the number of pulses to InfluxDB (v2).
Now the task is to:

  • build the difference between two subsequent measurements to know how many pulses were counted (that is, how much it rained) between these two measurements
  • convert the bucket tip pulses to mm of rain (1 mm rain = 1 l / m²); 1 pulse is 0.2 mm of rain
  • handle the counter overflow: the pulse counter in the sensor overflows at 255 and starts again at 0
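The arithmetic behind these three steps can be checked in isolation. The following is a minimal sketch in Python (not Flux, and not part of the task itself); the names `rain_mm`, `PULSE_MM` and `COUNTER_SIZE` are made up for illustration. It assumes that fewer than 256 pulses occur between two received readings. If packets are lost for so long that the counter wraps more than once, or if the sensor resets, the result will be wrong.

```python
PULSE_MM = 0.2      # mm of rain per bucket tip, as stated above
COUNTER_SIZE = 256  # the counter runs 0..255 and then wraps to 0


def rain_mm(previous: int, current: int) -> float:
    """Rain in mm between two consecutive received counter readings.

    The modulo maps a negative difference (counter wrapped) back into
    the range 0..255, which is equivalent to adding 256 to it.
    """
    pulses = (current - previous) % COUNTER_SIZE
    return pulses * PULSE_MM


print(rain_mm(250, 254))  # no wrap: 4 pulses
print(rain_mm(254, 3))    # wrap: 3 - 254 = -251, plus 256 gives 5 pulses
```

A single lost packet does not break this calculation: the difference then simply covers 20 minutes instead of 10.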

I’m using a “task” in InfluxDB to do these calculations.
The task runs every 10 minutes. The results are written to another measurement (coop_garden_calc) in the same bucket. I have to admit that I’m really not a Flux expert, but as far as I know, tasks in v2 can only be defined in Flux. With the documentation, some tests and ChatGPT I arrived at this setup:

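option task = {name: "calcRainDiff", every: 10m}

// Look back 60 minutes on every run, so consecutive runs overlap.
startRange = -60m

from(bucket: "hapysc-garden")
    |> range(start: startRange)
    |> filter(fn: (r) => r["_measurement"] == "coop_garden")
    |> filter(fn: (r) => r["_field"] == "Rain")
    // One table per sensor, sorted by time, so that difference()
    // compares consecutive readings of the same device.
    |> group(columns: ["devEUI"])
    |> sort(columns: ["_time"])
    |> difference()
    |> map(
        fn: (r) =>
            ({
                _time: r._time,
                deviceName: r.deviceName,
                devEUI: r.devEUI,
                location: r.location,
                // A negative difference means the counter wrapped from 255 to 0.
                // Correct it by adding 256, then convert pulses to mm (0.2 mm per pulse).
                _value: (if r._value < 0 then r._value + 256.0 else r._value) * 0.2,
                _measurement: "coop_garden_calc",
                _field: "RainCorrDiff",
            }),
    )
    |> to(
        bucket: "hapysc-garden",
        tagColumns: [
            "deviceName",
            "devEUI",
            "location",
        ],
    )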

Two results of this script bother me.

  • The graph in the InfluxDB “Data Explorer” looks, uh, strange :flushed_face:

    I can also see in the table view that the data sets are no longer ordered chronologically.
    On the other hand, Grafana, which I use to show the data, doesn’t seem to care too much; the graphs look OK. Hence, I’m not sure whether that is an issue, but it doesn’t look good to me.
  • The main issue currently is that each calculated value appears 5 times in the measurement with the calculated values (coop_garden_calc).
    I guess the reason is obvious: since I do the calculation over a range of 60 minutes, each Rain value in the raw value measurement (coop_garden) is usually touched 5 times by the script.
    But I was under the impression that data sets in the database are overwritten when all of the following are the same:
    • time stamp
    • tag set
    • field value
    Since the calculations are always based on the same raw data all 5 times, I would assume that the result should be exactly the same.

This leads to 2 questions:

  • Where is my reasoning wrong regarding the overwriting of the same data in the database?
  • What can I do to fix the problem?
    One way would obviously be to do the calculation on each pair of data sets only once. But I don’t know how to achieve that.
    The function tasks.lastSuccess() seems to have an issue (see this discussion).
    The execution of the task in InfluxDB and the reception of new data is asynchronous.

I would be very thankful for some ideas!

Hello @cortlieb,
I’m confused too. I believe you’re right: if all those values are the same, it should overwrite. I’m asking for a sanity check.

Hi @Anaisdg,

thank you for your reply.

ChatGPT suggested that, although the time stamps might appear the same in the InfluxDB UI (I see a resolution of µs), they might still differ at the ns level.
But since the time stamps are taken from the raw data
_time: r._time,
they should be exactly the same for each calculation result, even if I did the calculation a million times. Am I correct? :thinking:

The only thing that is really calculated and not simply copied from the raw data is the field RainCorrDiff. But it is a simple calculation from the raw data Rain. Is it conceivable that there are (maybe very small) differences in the calculation’s result? I can’t see why that should be the case.

Hello @cortlieb,
The field value shouldn’t impact overwrites, though. Only the measurement, tag key-value pairs, field keys, and timestamps do. If any of those have different names or values and aren’t being copied exactly, then you won’t have an overwrite.
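To make that rule concrete, here is a toy model in Python (not InfluxDB code, only an illustration of the rule as described above). A dictionary stands in for the database, and the helper names `point_key` and `write` as well as the tag values are made up. The key consists of measurement, tag set, field key and timestamp; the field value is deliberately not part of it.

```python
def point_key(measurement, tags, field, timestamp_ns):
    """Identity of a point in this toy model (the field value is not included)."""
    return (measurement, tuple(sorted(tags.items())), field, timestamp_ns)


def write(store, measurement, tags, field, timestamp_ns, value):
    """Write a point; a point with the same identity replaces the stored value."""
    store[point_key(measurement, tags, field, timestamp_ns)] = value


store = {}
tags = {"devEUI": "0001", "deviceName": "rain-1", "location": "garden"}  # hypothetical values
ts = 1_700_000_000_000_000_000  # timestamp in nanoseconds

write(store, "coop_garden_calc", tags, "RainCorrDiff", ts, 0.8)
write(store, "coop_garden_calc", tags, "RainCorrDiff", ts, 0.8)      # same identity: overwrite
write(store, "coop_garden_calc", tags, "RainCorrDiff", ts + 1, 0.8)  # 1 ns later: new point
write(store, "coop_garden_calc", {**tags, "location": "Garden"}, "RainCorrDiff", ts, 0.8)  # tag value differs: new point

print(len(store))  # 3
```

In this model, five identical-looking rows can only coexist if at least one part of the key differs, for example a tag value or a timestamp at nanosecond resolution.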

@Anaisdg OK, just double-checked it.
In the exported CSV file of the measurement with the calculated values (coop_garden_calc) I can find each data set (I think “point” is the correct term) 5 times!
For each of the 5 identical data points, the measurement, time stamp, tag keys and values, and field keys (it’s only one) are the same.
:thinking:

@cortlieb,
That’s very odd. Can you please share those CSV files? Logically there has to be a difference somewhere, even if it’s only a nanosecond, because that’s just how InfluxDB works.

@Anaisdg , sure, I can do that.
I renamed the CSV file to *.txt, since CSV files don’t seem to be allowed as attachments to posts.

query(11).txt (802.0 KB)