Retention policies not dropping old data

First off, let me thank you for this awesome product; it’s almost perfect for our needs. There is one issue I haven’t been able to figure out, and maybe I’m missing something obvious: none of the defined retention policies seem to remove old data.

As an example, we have the liverates policy which is supposed to hold the last hour’s worth of data:
> show retention policies
name      duration shardGroupDuration replicaN default
----      -------- ------------------ -------- -------
autogen   0s       168h0m0s           1        true
liverates 1h0m0s   1h0m0s             1        false

However the data from 3 months ago is still lingering around:
> select * from liverates ORDER BY time LIMIT 1
name: liverates
time                 ask       bid       exchange  instrument
----                 ---       ---       --------  ----------
2017-12-28T21:13:24Z 59.911035 59.911035 investing OILUSD

And the policy has grown to almost 100 million records at this point… Actually hats off for the query performance, although obviously it has slowed down a bit.
> select count(*) from liverates
name: liverates
time count_ask count_bid
---- --------- ---------
1970-01-01T00:00:00Z 92729848 91867303

The server used to run InfluxDB 1.4, but it has just been updated to 1.5.1 (on Ubuntu 17.10).

Is there anything I might have missed when setting up the retention policies? Maybe a possible issue with the installation which prevents some scheduled script from running? Is there a way to force pruning the old data?

It seems like there are no shards created for my retention policies, only for autogen.

name: fxdata
id  database retention_policy shard_group start_time           end_time             expiry_time          owners
750 fxdata   autogen          750         2010-12-27T00:00:00Z 2011-01-03T00:00:00Z 2011-01-03T00:00:00Z
854 fxdata   autogen          854         2018-03-26T00:00:00Z 2018-04-02T00:00:00Z 2018-04-02T00:00:00Z

Also it seems that those shards are not dropped either (expiry_time being year 2011). Can this offer any clue?

What is the shard group duration?

You are most likely writing to, and querying from, the default autogen RP.

As per my first post, the shardGroupDuration of the liverates RP is 1h0m0s. There are several more RPs, all having the same issue.

The data is written to autogen, and then copied into the RPs via continuous queries (basically one CQ per RP). For reading, I don’t query autogen, but even if I did, I don’t think that should affect whether the old data is dropped?

InfluxDB is running with the default configuration. Is there some other information I could provide to help diagnose the issue?

You’ve conflated measurements and retention policies.

Your query, select * from liverates, reads from a measurement named liverates. It’s probably selecting from the default RP (autogen) unless you’ve configured otherwise. Having a measurement with the same name as an RP will lead to confusion.
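
To make the difference concrete, here is a minimal Python sketch that builds the two fully qualified names involved. The database name fxdata comes from the SHOW SHARDS output earlier in the thread. The helper qualify() is hypothetical, and so is the assumption that the measurement stored under the liverates RP is also called liverates. The helper does no escaping, so it assumes the names contain no double quotes.

```python
def qualify(database: str, retention_policy: str, measurement: str) -> str:
    """Return a fully qualified InfluxQL measurement: "db"."rp"."measurement".

    Hypothetical helper; assumes none of the names contain a double quote.
    """
    return f'"{database}"."{retention_policy}"."{measurement}"'


# Where an unqualified "FROM liverates" most likely resolves when autogen
# is the default RP: a measurement named liverates inside autogen.
in_default_rp = qualify("fxdata", "autogen", "liverates")

# What was probably intended: a measurement stored under the liverates RP.
# The measurement name here is an assumption, not taken from the thread.
in_liverates_rp = qualify("fxdata", "liverates", "liverates")

query = f"SELECT * FROM {in_liverates_rp} ORDER BY time LIMIT 1"
print(in_default_rp)
print(query)
```

If the qualified query against the liverates RP returns nothing while the autogen one returns data, the points were never written to the RP in the first place.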

Make sure all your writes are specifying the rp query parameter. Make sure all your queries and CQs are using fully qualified measurements: SELECT x FROM "my_db"."my_rp"."my_measurement" WHERE ...

Fully qualified CQ example

/write API reference
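
As a rough sketch of both points, the Python below builds a /write URL that carries the rp parameter and a CQ statement whose INTO and FROM clauses are fully qualified. Several names are assumptions for illustration only: the host http://localhost:8086, the source measurement rates, the CQ name cq_liverates, the 1m interval, and the use of last() on the ask and bid fields. The helpers write_url() and continuous_query() are hypothetical, not part of any InfluxDB client.

```python
from urllib.parse import urlencode


def write_url(base_url: str, database: str, retention_policy: str) -> str:
    """Build a /write URL that targets a specific retention policy via rp."""
    params = urlencode({"db": database, "rp": retention_policy})
    return f"{base_url}/write?{params}"


def continuous_query(name: str, database: str, source_rp: str,
                     target_rp: str, measurement: str, interval: str) -> str:
    """Build a CQ statement with fully qualified INTO and FROM clauses.

    The aggregation (last of ask and bid) is an assumption for illustration.
    """
    return (
        f'CREATE CONTINUOUS QUERY "{name}" ON "{database}" BEGIN '
        f'SELECT last("ask") AS "ask", last("bid") AS "bid" '
        f'INTO "{database}"."{target_rp}"."{measurement}" '
        f'FROM "{database}"."{source_rp}"."{measurement}" '
        f'GROUP BY time({interval}), * END'
    )


# Assumed host and port; adjust to your installation.
url = write_url("http://localhost:8086", "fxdata", "liverates")
# Hypothetical CQ name, measurement name, and interval.
cq = continuous_query("cq_liverates", "fxdata", "autogen", "liverates", "rates", "1m")
print(url)
print(cq)
```

The important part is the INTO clause: without the database and RP in front of the measurement, the CQ writes its results into the default RP, which matches the symptom described above.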

I finally got back to this and indeed Mark was right about the cause of the issue. Apparently all my “retention policies” were actually measurements created under autogen. I recreated the CQs, kept only the data we need (data directory size went from 8 GB to 200 MB) and everything works smoothly now. Thank you very much for the assistance!