We are building a sensor monitoring system around InfluxDB. At this early stage we're simply running a local InfluxDB instance as a Docker container.
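For concreteness, the current setup is nothing more elaborate than the official image with a named volume for persistence (the 1.8 tag and the volume name are illustrative, not a statement of our exact versions):

```sh
# Single-node InfluxDB 1.x in Docker; all durability rests on this one volume.
docker run -d --name influxdb \
  -p 8086:8086 \
  -v influxdb-data:/var/lib/influxdb \
  influxdb:1.8
```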
Looking down the road, we are weighing options and strategy to maximize availability and recovery. Monitoring certain readings is fairly critical for us: losing the alarm systems built around InfluxDB for even a handful of hours could, under the wrong circumstances, lead to a costly failure.
On the one hand, relying on Influx Cloud eliminates much of our systems-management overhead, but it makes us dependent on Internet service at our facility (and, to a lesser extent, on Influx Cloud itself; Amazon's recent outage is a reminder that such events will recur). On the other hand, a local instance (likely built around Influx's open source Relay project) removes the outside-outage problem but significantly increases our systems-management overhead and overall risk.
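If we went the Relay route, my understanding from the project README is that clients write to the relay, which fans each write out to multiple InfluxDB instances. A minimal sketch of the idea, assuming one on-premises instance plus a second output (the names, ports, and cloud URL are placeholders, and I have not verified how Relay handles authentication against Influx Cloud):

```toml
[[http]]
name = "sensor-relay"
bind-addr = "0.0.0.0:9096"   # clients write here instead of directly to 8086
output = [
    # on-premises instance: keeps working through an Internet outage
    { name = "local", location = "http://127.0.0.1:8086/write" },
    # hypothetical second output, e.g. a second local node or a cloud endpoint
    { name = "remote", location = "http://influx-cloud.example.com:8086/write" },
]
```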
We recognize that no architecture is bulletproof. Further, even if we kept everything on-premises but lost Internet service, we would still need some way to get actual alarm conditions out of the facility.
What’s a good way to think about this? Is there a hybrid approach that isn’t entirely unwieldy? How does one evaluate and plan for Internet service outages when relying on a cloud-based system? Concentrate on redundant Internet service? Engineer a hybrid local/cloud data system? Skip technical solutions entirely and statistically model downtime so the business can be engineered around the consequences?
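On that last option, the back-of-envelope math is simple enough to sketch. Assuming (optimistically) that two ISP links fail independently and that each offers roughly 99.5% availability, where both numbers are illustrative assumptions rather than measurements:

```python
# Back-of-envelope downtime model. Assumes link outages are independent,
# which is optimistic: two ISPs can share a conduit, a pole, or grid power.
HOURS_PER_YEAR = 8760

def downtime_hours(availability: float) -> float:
    """Expected hours per year the link is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR

single_isp = 0.995                        # assumed per-link availability
dual_isp = 1.0 - (1.0 - single_isp) ** 2  # down only if both links are down

print(f"one ISP:  {downtime_hours(single_isp):.1f} h/year")  # ~43.8
print(f"two ISPs: {downtime_hours(dual_isp):.2f} h/year")    # ~0.22 (~13 min)
```

Even this toy model makes the trade-off concrete: a second independent link buys orders of magnitude, but once the independent math looks good, correlated failures (regional power, a shared trench) become the dominant term.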
Perspectives and recommendations appreciated.