Pardon the uninformed question, but I’m new to time-series databases in general and to InfluxDB in particular, and I’m looking for advice.
I’m in the process of implementing CHORDS, a meteorology application that uses InfluxDB to store measurements. If you’re interested, you can see the demo at http://ncar.github.io/chords/ or the wiki at https://github.com/NCAR/chords/wiki.
In my particular case, I have 8 active towers (6 collect every 15 minutes, and all 8 collect every 24 hours). The incoming data ranges from 2 values per reading (day of year and precipitation) to 50 (temperature at multiple levels, etc.). In addition to the active towers, I have 7 inactive towers with historical 15-minute and 24-hour data that I’d like to keep in the same database.
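For concreteness, here’s roughly how I picture one 15-minute reading landing in InfluxDB; the measurement, tag, and field names below are just illustrations I made up, not CHORDS’ actual schema:

```
# One hypothetical 15-minute reading from an active tower
# (measurement/tag/field names are illustrative, not CHORDS' real schema)
tower_data,site=tower03,interval=15m air_temp_2m=21.4,air_temp_10m=20.1,precip_mm=0.2 1465839830000000000
```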
I would guesstimate that, if each individual measurement from a tower/time period becomes its own row (rather than one row per tower per reading), I’ll have around 1-2 billion rows. Hardware isn’t an issue, as we’re running on a “cloud” of virtual machines and I can just ask for more memory, drive space, processors, etc.
Once the data is loaded, the write load will be minimal, but queries could be reasonably heavy depending on the time period and/or values requested. As an example, one of my monthly reports builds a list of precipitation during the month at each of the active towers, for only those days that had rain.
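To make that concrete, here’s a sketch of how I imagine that report as an InfluxQL query, using the made-up schema from above (this assumes the subquery syntax available in InfluxDB 1.2+; I haven’t settled on the real query yet):

```sql
-- Sketch of the monthly rain report: daily precipitation per active tower,
-- keeping only days with measurable rain. Schema names are illustrative.
SELECT * FROM (
  SELECT SUM("precip_mm") AS "daily_precip"
  FROM "tower_data"
  WHERE time >= '2016-06-01' AND time < '2016-07-01'
  GROUP BY "site", time(1d)
) WHERE "daily_precip" > 0
```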
As it stands now, I have data retention set to infinite (users need to be able to access the full history at any time). Given that, what can I do to optimize, and/or what can the primary developers at NCAR do, to let a single setup handle both large and small data sets flexibly?
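For reference, my current retention setup amounts to something like the statement below. I’ve read that even an infinite retention policy can specify a longer shard group duration to cut down on shard overhead for long-lived data, but I’m not sure whether that’s the right knob here (the database name is made up):

```sql
-- Roughly my current setup: keep everything forever. The SHARD DURATION
-- clause is something I've seen suggested for long retention periods;
-- "chords_demo" is an illustrative database name.
CREATE RETENTION POLICY "forever" ON "chords_demo"
  DURATION INF REPLICATION 1 SHARD DURATION 52w DEFAULT
```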