Deduplication for repetitive data?

trevelyanuk · August 6, 2019, 5:46pm

I’m monitoring some data and, although some fields continually change over time, other fields don’t (usually). In fact, they very rarely change; but I want to monitor them nevertheless for if and when they do. Data is being recorded with Telegraf and sent to InfluxDB.

Since several thousand points could be recorded with the same value, is there functionality that de-duplicates this at the InfluxDB level? Or is it possible to include a conditional statement when collecting the input data in Telegraf (i.e. get the last value and only store a new point if the new value is different)?

Many thanks!

Anaisdg · August 6, 2019, 8:25pm

Hello @trevelyanuk,

Thanks for your question. I don’t think there’s a way to handle conditional ingest with Telegraf or Kapacitor. I’m looking more into Kapacitor in case there’s something I’m missing, but right now I can suggest ingesting all of the data and setting alerts for when the data changes (assuming that you’re primarily interested in tracking the change):

trevelyanuk · August 8, 2019, 6:07pm

Pretty much - but also sparing the SD card of my Pi a bit, since that is where this is running…

To be fair, it is only 96 points a minute extra - but, due to the infrequency of the data changing, I think I’ll write a separate script to simply check the data and then send a metric when something happens.
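A minimal sketch of what such a change-only script might look like, in Python. Nothing here comes from the thread: the names `ChangeFilter`, `to_line` and `filter_readings` are hypothetical, the fields are assumed to be numeric, and the sample measurement `door` is made up. It only builds InfluxDB line-protocol strings for readings whose value differs from the previous one; actually sending them is left out.

```python
class ChangeFilter:
    """Remembers the last value seen per series and reports only changes."""

    def __init__(self):
        self._last = {}

    def changed(self, key, value):
        """Return True if value differs from the last one stored for key."""
        if key in self._last and self._last[key] == value:
            return False
        self._last[key] = value
        return True


def to_line(measurement, field, value, ts_ns):
    # Line protocol: <measurement> <field>=<value> <timestamp>
    # Assumption: numeric field, written as a float; no tags, no escaping.
    return f"{measurement} {field}={float(value)} {ts_ns}"


def filter_readings(readings):
    """readings: iterable of (measurement, field, value, ts_ns) tuples.

    Returns line-protocol strings only for readings whose value changed.
    """
    change_filter = ChangeFilter()
    lines = []
    for measurement, field, value, ts_ns in readings:
        if change_filter.changed((measurement, field), value):
            lines.append(to_line(measurement, field, value, ts_ns))
    return lines


if __name__ == "__main__":
    sample = [
        ("door", "state", 0, 1),
        ("door", "state", 0, 2),  # same value, skipped
        ("door", "state", 1, 3),
        ("door", "state", 1, 4),  # same value, skipped
        ("door", "state", 0, 5),
    ]
    for line in filter_readings(sample):
        print(line)
```

The first reading of each series is always kept, so there is a baseline point to compare against. One trade-off to be aware of: with only change points stored, queries over a window with no change return nothing unless they carry the previous value forward.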

Thanks anyhoo!

ruairinewman · August 9, 2019, 10:53am

Hi,

Further to the OP’s original question: is any form of data deduplication possible with InfluxDB? We are using a v1.6.5 enterprise cluster.

Thanks

Anaisdg · August 13, 2019, 7:50pm

@ruairinewman @trevelyanuk,

You could try writing a UDF, or just use a client library or the API to write your own custom scripts that either prevent or eliminate duplicates.

Also FYI, data deduplication is usually a feature provided by a database or filesystem, and InfluxDB does not provide it. The goal of deduplication is to reduce storage usage, but InfluxDB already has extremely efficient compression, so you don’t have to worry about data deduplication.
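For the "eliminate" side of a custom script, here is a rough Python sketch, under the assumption that the points for one series have already been fetched as a time-ordered list of `(timestamp, value)` pairs. The function name `collapse_runs` is hypothetical, and fetching the points and deleting the redundant ones through a client library or the API is not shown.

```python
from itertools import groupby


def collapse_runs(points):
    """Split time-ordered (timestamp, value) pairs into kept and redundant.

    The first point of every run of equal consecutive values is kept;
    the remaining points of that run are reported as redundant.
    """
    kept, redundant = [], []
    # groupby only merges *adjacent* equal values, which is what we want:
    # a value that comes back later starts a new run and is kept again.
    for _, run in groupby(points, key=lambda p: p[1]):
        run = list(run)
        kept.append(run[0])
        redundant.extend(run[1:])
    return kept, redundant


if __name__ == "__main__":
    series = [(1, "a"), (2, "a"), (3, "b"), (4, "b"), (5, "a")]
    kept, redundant = collapse_runs(series)
    print("keep:", kept)
    print("candidates for deletion:", redundant)
```

Keeping the first point of each run preserves the time at which every change happened, which is the information the original question cares about.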
