We are considering changing one of the default configuration options starting with InfluxDB 1.3.0. Currently, storing database statistics in the _internal database is enabled by default. However, many customers deploy to production with this still enabled. Our recommendation is to turn off internal storage of these statistics on production systems.
Therefore, the proposed change is to ship with store-enabled = false as the default. In pre-production environments where you want these statistics, you would need to turn the option on explicitly.
Example, current default:
[monitor]
# Whether to record statistics internally.
# store-enabled = true
Proposed default:
[monitor]
# Whether to record statistics internally.
# store-enabled = false
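Under the proposed default, a pre-production node that still wants its statistics in _internal would opt back in explicitly. A minimal sketch of that override, assuming the standard InfluxDB 1.x [monitor] section. store-database is shown at its default value only for clarity and does not need to be set.

```toml
[monitor]
  # Whether to record statistics internally.
  # Uncommented and set explicitly, because the proposed default would be false.
  store-enabled = true
  # Database the statistics are written to (the 1.x default).
  store-database = "_internal"
```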
We’re looking for feedback on whether this change would cause issues or concerns for the community. Please share your thoughts.
Could you clarify the reasoning for this recommendation? Does store-enabled = true cause a noticeable performance hit on production systems? What, then, are the alternatives for database health and performance monitoring in production?
Disk space usage by the _internal statistics is small, and the monitor RP can be adjusted as needed, right?
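On nodes where store-enabled stays on, the other lever besides the monitor RP is how often statistics are written. A minimal sketch, assuming the InfluxDB 1.x [monitor] options; the 60s value is an arbitrary illustration, not a recommendation (the shipped default is 10s).

```toml
[monitor]
  # Keep writing statistics to _internal on this node.
  store-enabled = true
  # Write statistics every 60s instead of the 10s default, which cuts the
  # number of points stored in _internal roughly sixfold.
  store-interval = "60s"
```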
The alternative to having the database capture and store information about itself is described here:
The recommendation is to set up a separate open source InfluxDB instance, along with Telegraf, to monitor InfluxDB Enterprise Edition. Capturing system stats along with /debug/vars (as outlined in the blog post above) should be sufficient, and it removes the overhead (CPU, storage, memory, etc.) of storing these statistics on the production instance.
There are situations where your Enterprise Edition cluster may hit its system resource limits. If you are also relying on that same instance for monitoring to triage what is causing the issue (and when…), you simply will be unable to do so. This is not advisable for a production setup.
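A minimal Telegraf sketch of that setup, assuming Telegraf runs on each production node (host-level stats such as CPU and memory have to be collected locally) and writes to a separate OSS instance. The hostname monitoring-host and the database name telegraf are placeholders, not values from this thread. Telegraf itself still uses a small amount of resources on the node, but nothing is written back into the production database.

```toml
# Telegraf agent running on each production data node (sketch).
[agent]
  interval = "10s"

# Host-level system stats.
[[inputs.cpu]]
[[inputs.mem]]
[[inputs.disk]]
[[inputs.diskio]]

# InfluxDB's own runtime statistics, scraped from the local /debug/vars endpoint.
[[inputs.influxdb]]
  urls = ["http://localhost:8086/debug/vars"]

# Ship everything to a separate OSS InfluxDB instance used only for monitoring.
# "monitoring-host" is a hypothetical hostname.
[[outputs.influxdb]]
  urls = ["http://monitoring-host:8086"]
  database = "telegraf"
```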
It’s also best practice to monitor your production infrastructure with systems outside of it. It’s like running MySQL with a monitoring system that relies on that same MySQL instance to monitor itself. The time you most need your monitoring is when your production instance isn’t working properly. People get caught up on the idea that InfluxDB is the monitoring stack, so it shouldn’t need a separate monitor. Not true: you always need to think about who’s watching the watchers.
We’re actually doing both at the moment: collecting _internal and also using the inputs.influxdb Telegraf plugin. To be honest, I haven’t done an analysis, but if the data is indeed the same, I think it’s fine to drop _internal for prod environments.
The only possible angle I can see is someone who isn’t using Telegraf at all (shocker, I know!). They would need to enable the feature to get any sense of local measurements, but then they might have their own /debug/vars scraper anyway.
Appreciate you all bringing this up again. I remember we had the impact discussion when we dropped the _internal interval to 1s way back when, and it just seemed to fall to the back burner.
I wasn’t aware of the recommendation to turn it off in production, and since we will be moving to Enterprise in Q3 (five nodes: three in prod, two in dev), this is worth knowing about.
Currently we have OSS deployed in two environments (prod & dev), but dev is a misnomer, since we require it to be as performant as the prod environment.
Standing up a small OSS influx instance to provide external monitoring is an easy thing to do, given the scale of our infrastructure.
Is @sebito91 correct? Are the inputs.influxdb and _internal stats the same?
Perhaps a set of best practices or other advice based on existing data from OSS _internal stats would be helpful.