Dear InfluxDB users,
For a few weeks I have been trying to design a functional and simple database for my company. It holds simple time series, but a lot of different series:
We produce forecasts with different algorithms and different horizons. Typically we have a forecast every 10 to 15 minutes, with 120 values, for a certain number of sites (currently around 100, but growing fast). We want to keep every individual forecast so we can analyze them later. Each forecast is therefore identified by the time of its run and by the site it covers.
Currently my database has this form:
Database (Prod)
Measurement (Forecast)
Tags: - run (the run timestamp stored as a string, growing at a rate of 144 new values per day)
- site
Fields: ~15 forecast fields.
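To make the schema above concrete, here is a minimal sketch of how one forecast value could be written in InfluxDB line protocol, with `run` and `site` as tags. The field name `power`, the site name `site_42`, the 10-minute spacing between the 120 values and the choice of the forecast target time as the point timestamp are all hypothetical assumptions for illustration, not details taken from the setup described here.

```python
from datetime import datetime, timedelta, timezone


def escape_tag(value: str) -> str:
    # Line protocol requires commas, equals signs and spaces in tag values to be backslash-escaped
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def forecast_line(site: str, run: datetime, target_time: datetime, fields: dict) -> str:
    """Format one forecast value as a line-protocol point for the Forecast measurement.

    Hypothetical assumptions: the point timestamp is the time being forecast,
    and every field value is written as a float.
    """
    # Tags in key order (run < site); run is the run time rendered as a string tag
    run_tag = run.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    tags = f"run={escape_tag(run_tag)},site={escape_tag(site)}"
    # Field keys are assumed to be simple identifiers that need no escaping
    field_set = ",".join(f"{key}={float(val)}" for key, val in fields.items())
    # Line protocol defaults to nanosecond precision; sub-second parts are dropped here
    ts_ns = int(target_time.timestamp()) * 1_000_000_000
    return f"Forecast,{tags} {field_set} {ts_ns}"


run = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)
# One run yields 120 points; the 10-minute step between them is a hypothetical choice
lines = [
    forecast_line("site_42", run, run + timedelta(minutes=10 * (i + 1)), {"power": 12.5})
    for i in range(120)
]
print(lines[0])  # Forecast,run=2021-03-01T12:00:00Z,site=site_42 power=12.5 1614600600000000000
```

All 120 points of a run share the same `run` and `site` tag values, so they land in a single series; it is the new `run` value every 10 minutes that creates new series.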
I have other measurements, but they are not relevant to this post since they hold simple sensor data in a classical data stream.
With this schema it's easy to query the data and to group by run to separate the forecasts. However, it implies a large number of series (144 per site per day, so around 5 million per year for 100 sites). If I understand the hardware sizing guidelines correctly, 10 million series is the limit for a single InfluxDB database, and it seems we could reach this limit very quickly.
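The series estimate above can be checked with a short sketch. It counts a series as one distinct measurement plus tag-set combination, which is how the figures in this post are computed; the site names and start date are hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone

RUNS_PER_DAY = 144  # one run every 10 minutes, as described above


def count_series(sites, days, runs_per_day=RUNS_PER_DAY):
    """Count distinct (measurement, run, site) combinations written over `days` days."""
    start = datetime(2021, 1, 1, tzinfo=timezone.utc)  # arbitrary, hypothetical start date
    step = timedelta(days=1) / runs_per_day
    series = set()
    for site in sites:
        for i in range(days * runs_per_day):
            run_tag = (start + i * step).isoformat()
            series.add(("Forecast", run_tag, site))  # one series per distinct tag set
    return len(series)


def yearly_series(n_sites, runs_per_day=RUNS_PER_DAY):
    """Closed form: every run opens one new series per site."""
    return runs_per_day * 365 * n_sites


print(count_series(["site_a", "site_b", "site_c"], days=2))  # 864
print(yearly_series(100))  # 5256000
```

Simulating the tag sets and using the closed form give the same growth rate: 144 × 365 × 100 = 5,256,000 new series per year.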
I would like to know whether you see a design error in my database. Should I create one database per year, one per site, or just upgrade the server? (We currently have no issues with a small instance and 800,000 series.)
This is important for us to understand, as we have another forecast product that should go into the database, but with a forecast every minute, which means about 525,600 independent series per year per site.
I hope this is clear enough; if you need more detail, I can of course provide it.
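To put the per-minute product in perspective, this sketch estimates how many days it would take to reach the 10 million series figure quoted above. It assumes, hypothetically, that the per-minute product also covers 100 sites and that every run creates one new series per site, as in the current schema.

```python
import math

SERIES_LIMIT = 10_000_000  # sizing figure quoted above


def days_to_limit(runs_per_day, n_sites, existing=0, limit=SERIES_LIMIT):
    """Days until the series count reaches `limit` if each run adds one series per site."""
    new_series_per_day = runs_per_day * n_sites
    return math.ceil((limit - existing) / new_series_per_day)


per_minute_yearly = 1440 * 365  # series per site per year with one run every minute
print(per_minute_yearly)               # 525600
print(days_to_limit(144, 100))         # 695 days for the current product alone
print(days_to_limit(1440, 100))        # 70 days for the per-minute product alone
print(days_to_limit(144 + 1440, 100))  # 64 days with both products together
```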
Thank you in advance