Downsampling older data into same measurement

Hello all!

I’m writing data to my InfluxDB database every second. I have a retention policy in place so that I don’t store data indefinitely and run out of disk space. However, I would also like to keep a longer history at a coarser level of granularity. So what I had in mind was this:

  1. Sample every few seconds and store to the database.
  2. After a set time, only keep 1 record for each time window.

I was able to do this using a continuous query that saves the data in another measurement. However, I want to keep the same measurement and just thin out the old data instead. Is this possible?

Maybe the Telegraf aggregator plugin will help you.
It also lets you choose whether to keep or drop the “original” (non-aggregated) points.

But to respect your requirements:

“I have a retention policy in place to ensure that I don’t continuously store data and run out of disk space”

“I would also like to have a longer history period at a coarser level of granularity”

I think the right solution for this kind of problem is a continuous query. You just need to store the aggregated data under a different retention policy (keeping the same measurement structure), so that it is kept for a longer period of time.
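As a rough sketch of that setup (not a tested configuration): the database name "telegraf", the retention policy name "long_term", the measurement "cpu", the field "usage_user", the source policy "autogen", and the 52w and 1m durations are all assumptions for illustration, so adjust them to your own schema.

```sql
-- Assumed names: database "telegraf", measurement "cpu", field "usage_user".
-- A second retention policy that keeps the downsampled data for about a year.
CREATE RETENTION POLICY "long_term" ON "telegraf" DURATION 52w REPLICATION 1

-- Basic continuous query: once per minute, write the 1-minute mean
-- into a measurement with the same name under the "long_term" policy.
CREATE CONTINUOUS QUERY "cq_cpu_1m" ON "telegraf"
BEGIN
  SELECT mean("usage_user") AS "usage_user"
  INTO "telegraf"."long_term"."cpu"
  FROM "telegraf"."autogen"."cpu"
  GROUP BY time(1m), *
END
```

The `GROUP BY time(1m), *` part keeps all tags, and the `AS "usage_user"` alias keeps the original field name, so the downsampled series has the same structure as the raw one and differs only in its retention policy.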

Since I want to graph it in Grafana, I’d like to have all the data in the same measurement; otherwise I’d need 2 separate graphs. This is why I was looking into storing the data in the same measurement.

IMHO this will create more problems. Here is a list of my doubts:

  • Your data won’t be kept for a longer time, since the RP is the same as the original one.
  • You won’t be able to aggregate, insert the new points and then delete the “non-aggregated” ones. (I have no clue how to do that, and operations like updates and deletes are discouraged and should be avoided.)
  • If you insert the aggregated points into the same measurement as the old ones, you won’t be able to query only the correct points unless you can tell the two sets apart (using a tag). If the points get mixed, the data won’t be usable. (This follows from the second point, the delete problem.)

Can you tolerate some delay in Grafana? (i.e., data updated every 1 or 2 minutes instead of every second)

If the answer is “yes”, a continuous query with the advanced syntax will allow you to compute an aggregation and update its result in an “incremental” way.

As an example, you will be able to compute the “Average CPU Use %” of the last 5min every 1min.
The final result will be:

  • A series with a data point every 5min
  • The last point will be recalculated 5 times and its value updated, based on the values in the 5min range, even if you don’t have 5min of data yet. (The first time, the average is computed on 1min of data, the second time on 2min, and so forth until the interval ends and the value is no longer overwritten.)
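A minimal sketch of such a continuous query, using the advanced `RESAMPLE` syntax. The names here are assumptions rather than anything from your setup: a database "telegraf", a field "usage_user" in a measurement "cpu", and an already existing retention policy called "long_term".

```sql
-- Assumed names: database "telegraf", retention policies "autogen" and
-- "long_term", measurement "cpu", field "usage_user".
-- RESAMPLE EVERY 1m makes the query run once per minute, while
-- GROUP BY time(5m) keeps one point per 5-minute bucket, so the point
-- for the current bucket is rewritten on each run until the bucket closes.
CREATE CONTINUOUS QUERY "cq_cpu_5m" ON "telegraf"
RESAMPLE EVERY 1m
BEGIN
  SELECT mean("usage_user") AS "usage_user"
  INTO "telegraf"."long_term"."cpu"
  FROM "telegraf"."autogen"."cpu"
  GROUP BY time(5m), *
END
```

Each run writes a point with the same timestamp and tag set as the previous run for that bucket, so the earlier value is simply overwritten and no explicit delete is needed.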

Have a look at the link above for a more practical example.

Hope this helps

Appreciate your insights. I think that confirms what I’d been thinking; I just wanted to make sure I wasn’t overlooking something :)

Can you share how you are doing this with Telegraf + InfluxDB? How did you set up your RPs and CQs?

I think this guy solved our problem