Delete specific data older than 30 days

Hi @mouthpiec, sure. Bear in mind, though, that I’m using an older InfluxDB version (currently 1.5.2).

I’m making one main assumption here: that you are running the stack on a Linux variant.

There are really two ways you can do this: with CQs or with Kapacitor batch tasks. Personally, I prefer the latter (there is less overhead with Kapacitor scripts than with CQs), but if you are running small processing tasks then CQs should work fine.

This article might help you decide which is best suited to you.

I’ve also found a topic I responded to a while back which might help:

CQ Docs
Kapacitor as a CQ engine

Now, personally I prefer to use Kapacitor to do this. CQs are good but can be intensive on your InfluxDB instance. We process a lot of measurements on our instances, and CQs were causing big memory-usage problems while they were running; using Kapacitor helped alleviate this. Kapacitor also offers more functions to work with. For this to work you will need to install Kapacitor as well.

First of all, we need a new RP if you haven’t created one already:

CREATE RETENTION POLICY "2yearhistorical" ON "yourdatabase" DURATION 730d REPLICATION 1

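Replace the database name with your own and name the RP whatever you want (InfluxQL durations go up to weeks, so two years is written as 730d rather than 2y).
Check that the new RP exists:
SHOW RETENTION POLICIES ON "yourdatabase"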

Once that’s done, create a batch TICKscript to downsample the data. (This is from a script I currently use, but you should be able to swap in your own measurements and fields and adjust it to get started.)

// Downsample all metrics from the win_disk measurement.
batch
    |query('SELECT mean("Free_Megabytes") AS "mean_Free_Megabytes", mean("Percent_Free_Space") AS "mean_Percent_Free_Space" FROM "mydatabase"."30days"."win_disk"')
        .period(5m)
        .every(5m)
        .groupBy(time(5m), *)
    |influxDBOut()
        .database('mydatabase')
        .retentionPolicy('2yearhistorical')
        .measurement('win_disk')
        .precision('s')

The script above runs every 5 minutes, queries the raw data and applies the mean function, grouped into 5-minute windows per tag set. The mean values are written to the same database but into a different retention policy (you could specify a separate output database if you wanted to keep them apart).
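To make the grouping a bit more concrete, here is a rough in-memory Python sketch of what `mean(...)` with `GROUP BY time(5m), *` produces: points are bucketed into 5-minute windows aligned to the epoch (InfluxDB 1.x's default alignment) and averaged separately for every tag set. It's only an illustration, not how Kapacitor or InfluxDB actually compute it, and the tag names (`host`, `instance`) and values are made up.

```python
from collections import defaultdict
from statistics import mean

WINDOW_S = 5 * 60  # 5-minute buckets, like .groupBy(time(5m), *)


def downsample(points, window_s=WINDOW_S):
    """Average each field per (time bucket, tag set).

    points: iterable of (timestamp_s, tags_dict, fields_dict).
    Returns {(bucket_start_s, frozenset(tag items)): {"mean_<field>": value}},
    mirroring the AS "mean_..." aliases in the TICKscript.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, tags, fields in points:
        bucket = ts - ts % window_s  # floor the timestamp to its window start
        key = (bucket, frozenset(tags.items()))  # the full tag set is part of the key
        for name, value in fields.items():
            buckets[key][name].append(value)
    return {
        key: {f"mean_{name}": mean(values) for name, values in per_field.items()}
        for key, per_field in buckets.items()
    }


points = [
    (0, {"host": "a", "instance": "C:"}, {"Free_Megabytes": 100.0}),
    (60, {"host": "a", "instance": "C:"}, {"Free_Megabytes": 200.0}),
    (60, {"host": "b", "instance": "C:"}, {"Free_Megabytes": 50.0}),
    (300, {"host": "a", "instance": "C:"}, {"Free_Megabytes": 400.0}),
]
for (bucket, tags), values in sorted(downsample(points).items(), key=lambda kv: kv[0][0]):
    print(bucket, dict(tags), values)
```

If the tag set were left out of the key, readings from different hosts would be averaged together, which is why the tags need to be in the group by.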

To define the task, run:
sudo kapacitor define downsample_task_name -type batch -tick /path_to_script -dbrp database.rawdataRP

Then enable it:
sudo kapacitor enable downsample_task_name
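If you’d rather script the setup, here’s a small Python sketch that builds the same define and enable commands and adds `kapacitor show` so you can check the task’s status afterwards. The task name and TICKscript path are hypothetical. It only prints the commands unless you pass `dry_run=False`, and it leaves out `sudo`, so add that yourself if your setup needs it.

```python
import shlex
import subprocess


def kapacitor_commands(task, tick_path, db, rp):
    """Return argv lists for defining, enabling and inspecting a batch task."""
    return [
        ["kapacitor", "define", task, "-type", "batch",
         "-tick", tick_path, "-dbrp", f"{db}.{rp}"],
        ["kapacitor", "enable", task],
        ["kapacitor", "show", task],  # shows status and stats for the task
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Hypothetical task name and path; the RP matches the raw "30days" RP in the script.
cmds = kapacitor_commands(
    "downsample_win_disk", "/etc/kapacitor/downsample_win_disk.tick",
    "mydatabase", "30days",
)
run(cmds)
```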

Once it has run, you should be able to use the data in Chronograf to graph your “historical” data.

The best advice really is to read those articles about when to use CQs or Kapacitor and work out which best suits your needs. If you only have a small amount of data to process, CQs might be the better option (one less service to run).
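For comparison, here’s a sketch of roughly the equivalent CQ for the win_disk example, built as a string by a small Python helper (the CQ name `cq_win_disk_5m` is made up). It uses the InfluxDB 1.x `CREATE CONTINUOUS QUERY ... BEGIN ... END` syntax, with `INTO` pointing at the long-term RP and `GROUP BY time(5m), *` to keep the tags; you would paste the printed statement into the influx CLI.

```python
def downsample_cq(name, db, src_rp, dst_rp, measurement, select_cols, interval="5m"):
    """Build an InfluxQL continuous query that downsamples src_rp into dst_rp."""
    return (
        f'CREATE CONTINUOUS QUERY "{name}" ON "{db}" BEGIN '
        f"SELECT {select_cols} "
        f'INTO "{db}"."{dst_rp}"."{measurement}" '
        f'FROM "{db}"."{src_rp}"."{measurement}" '
        f"GROUP BY time({interval}), * END"
    )


cq = downsample_cq(
    "cq_win_disk_5m", "mydatabase", "30days", "2yearhistorical", "win_disk",
    'mean("Free_Megabytes") AS "mean_Free_Megabytes", '
    'mean("Percent_Free_Space") AS "mean_Percent_Free_Space"',
)
print(cq)
```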

As mentioned, though, I am using an older version of InfluxDB. You have this tagged as influxdb2, so there may be a better way of doing this in InfluxDB 2. I don’t know; it’s still in development, so we haven’t tested it at work.

Hope that helps, let me know if there’s anything you’re unsure of.

Edit: things to consider:

  1. Tag values: if you want to keep all tags in the measurement, you must group by them, otherwise they won’t be written back into the database. You can group by all of them with (*).
  2. If your default RP (raw data) isn’t already set to 30 days, you will need to amend it.

Caution: changing this to 30 days will cause any data older than 30 days to be dropped.
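Assuming your raw data is in the default `autogen` RP (adjust the name if not), the change is an `ALTER RETENTION POLICY` statement. The small Python helper below just builds that InfluxQL string; the optional shard duration is an extra you may not need. Keep in mind that InfluxDB removes expired data a whole shard group at a time, so points can hang around a little past 30 days until their shard group expires.

```python
def alter_rp_statement(db, rp, duration="30d", shard_duration=None, default=True):
    """Build an InfluxQL (1.x) ALTER RETENTION POLICY statement."""
    parts = [f'ALTER RETENTION POLICY "{rp}" ON "{db}" DURATION {duration}']
    if shard_duration:
        parts.append(f"SHARD DURATION {shard_duration}")  # optional, assumption
    if default:
        parts.append("DEFAULT")  # keep it as the database's default RP
    return " ".join(parts)


print(alter_rp_statement("mydatabase", "autogen"))
print(alter_rp_statement("mydatabase", "autogen", shard_duration="1d"))
```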

Another point: my batch script above renames the fields to mean_field_name. If you want to preserve dashboards and field names, you can change AS mean_field_name to just AS field_name. That way you can simply duplicate your dashboards and update the retention policy in the queries.
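A tiny helper makes the difference between the two naming styles easy to see. `mean_select` is a hypothetical name, and it only builds the query string you would put in the `|query(...)` node.

```python
def mean_select(fields, source, keep_names=False):
    """Build a SELECT of mean() per field.

    keep_names=False -> mean("X") AS "mean_X"  (as in the script above)
    keep_names=True  -> mean("X") AS "X"       (existing dashboards keep working)
    """
    cols = []
    for field in fields:
        alias = field if keep_names else f"mean_{field}"
        cols.append(f'mean("{field}") AS "{alias}"')
    return f"SELECT {', '.join(cols)} FROM {source}"


src = '"mydatabase"."30days"."win_disk"'
print(mean_select(["Free_Megabytes", "Percent_Free_Space"], src))
print(mean_select(["Free_Megabytes", "Percent_Free_Space"], src, keep_names=True))
```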

Philb