Multiple Collectd instances


I am looking to monitor 30+ servers with Collectd, InfluxDB and Grafana.

I have Collectd set up on all servers, reporting metrics to an InfluxDB instance on a separate server. The Interval setting in Collectd is 1 (one second). This worked fine for a while, but as I added more reporting servers, InfluxDB slowed down to the point where Grafana takes 10+ seconds to pull stats for a single server. My guess is that the slowdown is caused by all those servers reporting metrics every second.

Does my InfluxDB configuration look okay? Are there any changes I can make so that it handles all the incoming metrics better?


@Ralph That depends on how much data you are pulling with each query, and on how large an instance you are using to host InfluxDB. Another complicating factor is that writing Graphite-format metrics to InfluxDB is not optimal for the storage engine.
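As a rough illustration of why the amount of data per query matters, here is a small sketch that estimates how many raw points a query has to read for a given time window and collection interval. The function name and the series count of 10 are made up for the example, and it assumes no downsampling or GROUP BY time() aggregation is applied.

```python
def points_scanned(window_seconds: int, interval_seconds: int, series: int = 1) -> int:
    """Estimate the raw points a query reads when nothing is downsampled.

    window_seconds   -- length of the queried time range
    interval_seconds -- how often each series is written
    series           -- number of series the query touches (hypothetical value)
    """
    return (window_seconds // interval_seconds) * series


DAY = 24 * 60 * 60  # seconds in a 24-hour dashboard window

# One series written every second vs. every 10 seconds
print(points_scanned(DAY, 1))    # 86400
print(points_scanned(DAY, 10))   # 8640

# A panel that touches 10 series (hypothetical) at a 1-second interval
print(points_scanned(DAY, 1, series=10))  # 864000
```

The point is only that the read cost grows with the window and the number of series, and shrinks as the interval gets longer. The real cost also depends on the storage engine and on the hardware.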

One thing I would suggest is using Telegraf instead of Collectd. Telegraf natively writes a schema that is optimized for the database, and it is less resource-intensive than Collectd. You could collect the same metrics with Telegraf by generating a configuration with the following command:

telegraf -sample-config -input-filter apache:disk:diskio:net:nstat:cpu:system:mem:swap:tail:mysql:processes:sensors:exec -output-filter influxdb