Performance of copying a measurement into another retention policy

I have a 200 GB hosted InfluxDB instance, and one of its measurements needs to be retained longer, so I need to copy it to another retention policy. I have already changed my application to do two HTTP POSTs, the second one containing only the measurement destined for the longer retention policy. Now I need to copy the existing data.

After some Googling, I found this:

select * into "fouryears"."51" from "autogen"."51" group by *;

I tested this on a dev DB and it seems to work.

The live database contains about 1.5 billion (1,529,039,108) points in “51”. I’m a bit reluctant to run that query, also because I never seem to be able to abort running queries.

Is this indeed the best way to go about it?

Hi @wiebeytec,

When you say “200GB hosted InfluxDB”, do you mean on InfluxCloud, or hosted on your own infrastructure?

Did you know you can also change the retention policy, instead of copying into another? So this could be a better option for you, if you don’t want to duplicate a whole host of data.
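For reference, the retention-policy management commands in InfluxQL 1.x look like the following sketch; the database name "mydb" and the durations are assumptions, so adjust them to your setup ("fouryears" matches the policy name from the question):

```sql
-- Create a longer retention policy alongside the default one
-- (REPLICATION is required in CREATE RETENTION POLICY)
CREATE RETENTION POLICY "fouryears" ON "mydb" DURATION 208w REPLICATION 1

-- Or change the duration of an existing policy in place
ALTER RETENTION POLICY "autogen" ON "mydb" DURATION 26w
```

Note that altering a policy changes the duration for everything stored under it, which is why it doesn't help when only a single measurement needs the longer retention.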

Thanks,
David

Hi @rawkode,

I meant an InfluxCloud-hosted instance, with 200 GB of 512 GB used. Our default retention on the autogen policy is 6 months.

My issue is that I want one out of hundreds of measurements to be retained longer. As far as I know, this can’t be easily done in Influx, because a retention policy is a namespace / schema of sorts. Hence the proposed solution.

I’m just worried it will take a very long time and cause excessive load or disk usage. Because I can’t easily clone the DB elsewhere to test, I was hoping someone could put my worries to rest.

You could use Kapacitor to run your data in batches and move it to a secondary retention policy.

We tend to collect 30 days of raw data and roll this up every 15 minutes into 5 minute chunks and store the aggregated data for 90 days.

Kapacitor as a CQ Engine

You could use InfluxDB to run the CQ, but this adds extra strain to the Influx instance. Kapacitor will do the same job while putting less strain on it; it would only impact the instance slightly when the batch task runs, depending on how many data points you query with each batch. Kapacitor will also let you run more complex math functions on your data, which CQs cannot.

This way you could just define a TICK script, which can be edited as and when you need to. That is another advantage over CQs, as those cannot be modified once they are active; they need to be deleted and recreated. Plus, the syntax is a git.
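As a sketch (not a drop-in script), a batch TICKscript for the kind of rollup described above might look like this; the database name "mydb", the field "value", and the task timings are assumptions:

```tickscript
// Hedged sketch: roll raw points up into 5-minute means
// and write them to a longer retention policy.
batch
    |query('''
        SELECT mean("value") AS "value"
        FROM "mydb"."autogen"."51"
    ''')
        .period(15m)
        .every(15m)
        .groupBy(time(5m), *)
    |influxDBOut()
        .database('mydb')
        .retentionPolicy('fouryears')
        .measurement('51')
```

Because it is a separate file defined on the Kapacitor side, you can stop, edit, and re-enable the task without touching the Influx instance itself.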

I’m not sure you can abort the query once you run it; it will either complete or cause your InfluxDB to OOM and fall over.

I decided to split up the back-filling into multiple queries, each spanning one week; I simply generated them with a bash script. Each query takes about 10-15 minutes, for roughly 65 million points per week. Looking at the monitoring, it doesn’t impact the instance much.
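A minimal sketch of such a generator script, assuming GNU date; the start date and chunk count are placeholders to adjust to the actual time range of the data:

```shell
#!/usr/bin/env bash
# Hedged sketch: emit one week-sized SELECT ... INTO query per chunk,
# so the backfill runs as many small queries instead of one huge one.
gen_backfill_queries() {
  local start="$1" weeks="$2" i from to
  for ((i = 0; i < weeks; i++)); do
    # Compute the week boundaries (requires GNU date's -d relative syntax)
    from=$(date -u -d "$start + $((i * 7)) days" +%Y-%m-%d)
    to=$(date -u -d "$start + $(((i + 1) * 7)) days" +%Y-%m-%d)
    echo "SELECT * INTO \"fouryears\".\"51\" FROM \"autogen\".\"51\" WHERE time >= '$from' AND time < '$to' GROUP BY *;"
  done
}

# Print the queries; pipe them to the influx CLI to actually run them.
gen_backfill_queries "2017-01-01" 4
```

Bounding each query with `time >= ... AND time < ...` keeps the chunks non-overlapping, and printing the statements first lets you inspect them before feeding them to the CLI.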