Deduplication for repetitive data?

I’m monitoring some data in which some fields change continually over time, while other fields very rarely change. I still want to monitor the rarely changing fields so that I can see if and when they do change. The data is recorded with Telegraf and sent to InfluxDB.

Since several thousand consecutive points may be recorded with the same value, is there functionality that de-duplicates this at the InfluxDB level? Or is it possible to add a conditional check when Telegraf collects the input data (i.e. get the last value and only store a new point if the new value is different)?

Many thanks!

Hello @trevelyanuk,

Thanks for your question. I don’t think there’s a way to handle conditional ingest with Telegraf or Kapacitor. I’m looking further into Kapacitor in case there’s something I’m missing, but right now I can suggest ingesting all of the data and setting alerts for when the data changes (assuming that you’re primarily interested in tracking the change).

Pretty much - but I’d also like to spare the SD card of the Pi this is running on a bit…

To be fair, it is only 96 extra points a minute - but, since the data changes so infrequently, I think I’ll write a separate script that simply checks the data and sends a metric when something changes.

Thanks anyhoo!

Hi,

Further to the OP’s original question: is any form of data deduplication possible with InfluxDB? We are using a v1.6.5 enterprise cluster.

Thanks

@ruairinewman @trevelyanuk,

You could try writing a UDF, or use a client library or the API to write your own custom scripts that either prevent duplicates from being written or eliminate them afterwards.
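As a rough illustration of the "prevent" approach, here is a minimal Python sketch of a change filter that only passes a point on when its value differs from the last value seen for the same series. The names `ChangeFilter`, `should_write`, `dedupe` and the sample series key `pi.firmware` are hypothetical, made up for this example; the actual write to InfluxDB is deliberately left out, so plug in whichever client or API call you already use.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# A point is (series key, timestamp, value). This shape is an assumption
# made for the sketch, not an InfluxDB or Telegraf data structure.
Point = Tuple[str, int, Any]


class ChangeFilter:
    """Remembers the last value per series and reports whether a new value differs."""

    def __init__(self) -> None:
        self._last: Dict[str, Any] = {}

    def should_write(self, series: str, value: Any) -> bool:
        # Skip the point if this series has been seen and the value is unchanged.
        if series in self._last and self._last[series] == value:
            return False
        self._last[series] = value
        return True


def dedupe(points: Iterable[Point]) -> Iterator[Point]:
    """Yield only the points whose value changed since the previous point of that series."""
    change_filter = ChangeFilter()
    for series, timestamp, value in points:
        if change_filter.should_write(series, value):
            yield (series, timestamp, value)


# Hypothetical readings: the value changes at t=3 and changes back at t=5.
readings = [
    ("pi.firmware", 1, "1.0"),
    ("pi.firmware", 2, "1.0"),
    ("pi.firmware", 3, "1.1"),
    ("pi.firmware", 4, "1.1"),
    ("pi.firmware", 5, "1.0"),
]

kept = list(dedupe(readings))
print(kept)
```

Note that the last-seen values live only in memory here, so a restart of the script would write the first point of every series again. Storing only changes also leaves long gaps between points, which queries and dashboards then need to account for.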

Also FYI, data deduplication is a feature that some databases and filesystems provide, but InfluxDB does not. The goal of deduplication is to reduce storage usage, and InfluxDB already compresses data very efficiently, so repeated values should take up little extra space and you don’t have to worry about data deduplication.
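To give a feel for why long runs of identical values tend to compress well, here is a small Python sketch of run-length encoding. This is only an illustration of the general idea and not InfluxDB's actual encoder, which uses its own per-type encodings; the function name `run_length_encode` and the sample numbers are made up for this example.

```python
from itertools import groupby
from typing import Any, Iterable, List, Tuple


def run_length_encode(values: Iterable[Any]) -> List[Tuple[Any, int]]:
    """Collapse each run of consecutive equal values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# 5000 identical samples followed by 3 samples of a new value.
samples = [42] * 5000 + [43] * 3

encoded = run_length_encode(samples)
print(encoded)  # [(42, 5000), (43, 3)]
```

In this toy case, 5003 stored values shrink to two pairs, which is the kind of saving that makes explicit deduplication less important for rarely changing fields.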