Design for forecast storage with high number of tags and series

Theo_Masson · April 14, 2020, 3:10pm

Dear InfluxDB users,

For a few weeks I have been trying to design, for my company, a simple and functional database for time series that are simple in themselves but very numerous:
We produce forecasts with different algorithms and different horizons. Typically we have a forecast every 10 to 15 minutes, with 120 values, for a certain number of sites (currently around 100, but growing fast). We want to keep every individual forecast so that we can analyse them later. Each forecast is therefore identified by the time of its run and by its site.
Currently my database has this form:
Database: Prod
Measurement: Forecast
Tags:
- run (string of the run timestamp, growing at a rate of 144/day)
- site
Fields: ~15 forecast fields
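
To make the layout above concrete, here is a minimal sketch of how one forecast value might be serialized to InfluxDB line protocol under this schema. The field names (`p50`, `p90`), the site name and the tag timestamp format are assumptions for illustration only; the post does not give them.

```python
from datetime import datetime, timezone


def escape_tag(value: str) -> str:
    """Escape the characters that line protocol treats specially in tag values."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def forecast_line(run: datetime, site: str, target: datetime, fields: dict) -> str:
    """Build one line-protocol record: tags are run and site, time is the forecast target."""
    # The run time is stored as a string tag (format is an assumption).
    run_tag = run.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Field names here are hypothetical; the real schema has ~15 forecast fields.
    field_str = ",".join(f"{key}={float(val)}" for key, val in sorted(fields.items()))
    # Line protocol timestamps default to nanoseconds since the Unix epoch.
    ts_ns = int(target.timestamp()) * 1_000_000_000
    return f"Forecast,run={escape_tag(run_tag)},site={escape_tag(site)} {field_str} {ts_ns}"


run = datetime(2020, 4, 14, 15, 0, tzinfo=timezone.utc)
target = datetime(2020, 4, 14, 15, 10, tzinfo=timezone.utc)
line = forecast_line(run, "site_001", target, {"p50": 12.5, "p90": 20})
print(line)
```

Every distinct `run` value produces a new tag value, which is what drives the series count discussed in this thread.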

I have other measurements, but they are not relevant to this post, as they hold simple sensor data from a classical data stream.

With this architecture it is easy to query the data and to group by run to separate the forecasts. However, it implies a large number of series (typically 144 per site per day, so around 5 million per year for 100 sites). If I understand the hardware sizing guidelines correctly, 10 million series is a limit for a simple InfluxDB database, and it seems that we could reach this limit really fast.
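
The arithmetic behind these figures can be checked with a short sketch. It follows the post's own counting, one new series per (run, site) pair, and takes the 10 million figure from the post as given; the function name is made up for illustration.

```python
def series_per_year(runs_per_day: int, sites: int, days: int = 365) -> int:
    """New series created per year when every run adds one tag value per site."""
    return runs_per_day * sites * days


# One run every 10 minutes is 6 * 24 = 144 runs per day.
ten_minute_100_sites = series_per_year(runs_per_day=6 * 24, sites=100)
print(ten_minute_100_sites)  # 5256000

# One run every minute is 60 * 24 = 1440 runs per day, here for a single site.
per_minute_one_site = series_per_year(runs_per_day=60 * 24, sites=1)
print(per_minute_one_site)  # 525600

# Years until the 10 million series mentioned in the post would be reached.
years_to_limit = 10_000_000 / ten_minute_100_sites
print(round(years_to_limit, 1))  # 1.9
```

With a fixed number of sites the count grows linearly with time, because the `run` tag never repeats.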

I would like to know whether you see a design error in my database. Should I create one database per year, one database per site, or just upgrade the server? (We currently have no issues with a small instance and 800 000 series.)
This is important for us to understand, because we have another forecast product that should go into the database, but with a forecast every minute, so about 525 000 independent series per year per site.

I hope this is clear enough; if you need more details, I can of course provide them.

Thank you in advance

Anaisdg · April 15, 2020, 12:37am

Hello @Theo_Masson,
Ingest depends largely on your hardware and retention policies, but yes, you should scale vertically and enable TSI (the Time Series Index).

Theo_Masson · April 15, 2020, 6:36am

Thank you for the tip on TSI, I will enable it!
What do you mean by “scale vertically”? Should I change how the database is organized?
Thank you!

Anaisdg · April 15, 2020, 2:44pm

Hello @Theo_Masson,
I mean scale your server size up, i.e. use a larger InfluxDB instance.

Theo_Masson · April 15, 2020, 2:55pm

Perfect, thank you for your time!
