Can Telegraf be set up on a single server to pull server health metrics from multiple servers?


Telegraf can be configured to collect metrics from HTTP endpoints using the “http” plugin, but you will need to expose the metrics in some way.

What are you trying to accomplish with this setup?

Here is our setup:

Telegraf (collect) + InfluxDB (store) + Grafana (analyze). All of these are installed on a central server where the metrics are hosted.

Issue:

We are trying to gather server-level stats such as CPU and memory. The only way we have found to get these stats is to install the agent on each server and have it push the metrics to the central server running InfluxDB. If there is a way to collect these metrics via pull rather than push, I would greatly appreciate your guidance. If possible, we would like to stick with Telegraf for collecting these server-level metrics.

I am aware that we can pull metrics from Apache, Redis, Tomcat, Jolokia, etc. via HTTP endpoints. We are already using this feature.

Sorry, I’m still not sure I understand your issue. Are you trying to avoid installing Telegraf on your instances, or run fewer processes, or are you limited to pulling metrics by your architecture?

If you want to pull metrics, you need to expose them somehow. I’m not aware of any way to expose system metrics over the network natively in Linux, so that means running a process that will listen for requests and respond with the appropriate data.

If that isn’t an issue, you can run Telegraf on your hosts and expose metrics using the “prometheus_client” output plugin, which exposes metrics on the /metrics endpoint in Prometheus format. You can then configure your main Telegraf instance to scrape each of those endpoints using the “prometheus” input plugin.
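As a rough sketch of that pull setup, the two configuration fragments below show one way it could look. The hostnames (`host1.example.com`, `host2.example.com`) are placeholders, and port 9273 is only the port used in Telegraf's sample configuration; adjust both to your environment. The choice of `cpu` and `mem` inputs is an assumption based on the metrics mentioned earlier in the thread.

```toml
# --- telegraf.conf on each monitored host (assumed layout) ---

# Collect system-level metrics locally.
[[inputs.cpu]]
[[inputs.mem]]

# Expose everything this agent collects in Prometheus format.
# With these settings the metrics are served at http://<host>:9273/metrics.
[[outputs.prometheus_client]]
  listen = ":9273"
  path = "/metrics"

# --- telegraf.conf on the central server (assumed layout) ---

# Scrape the /metrics endpoint of every monitored host.
# The hostnames below are hypothetical.
[[inputs.prometheus]]
  urls = [
    "http://host1.example.com:9273/metrics",
    "http://host2.example.com:9273/metrics",
  ]
```

The central instance still needs an output (for example InfluxDB) to store what it scrapes, and the scrape port has to be reachable from the central server.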


Yes, we need to know whether a single Telegraf instance on the central server with InfluxDB can do it all, i.e. pull metrics (CPU, memory, etc.) from the hosts, instead of installing Telegraf agents on each and every server (Linux or Windows).

From your earlier comments, it looks like we need to install Telegraf on every machine, whether the metrics are pushed or pulled into InfluxDB.

Yup, for pull, you need a process to expose the metrics. For push, you need a process to send them. Telegraf can do both.

It’s clear to me now. Thank you for your input.

One final question:

Are there any limitations to using the “prometheus_client” output to expose metrics over /metrics? In other words, can I expose all metrics collected by the agent installed on the servers?

-Naresh

Any metrics can be outputted via any of the output plugins.

If you don’t have any architectural requirements that favor pull, and you have to install Telegraf on each instance to collect system metrics anyway, you might want to consider having each Telegraf write to InfluxDB. That would be the least complex setup.
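For comparison, a minimal sketch of that push setup on each host might look like the fragment below. It assumes an InfluxDB 1.x instance; the URL `http://central.example.com:8086` and the database name `telegraf` are placeholders, not values taken from this thread.

```toml
# telegraf.conf on each monitored host (push variant, assumed layout)

# Collect system-level metrics locally.
[[inputs.cpu]]
[[inputs.mem]]

# Write directly to the central InfluxDB.
# The URL and database name are hypothetical.
[[outputs.influxdb]]
  urls = ["http://central.example.com:8086"]
  database = "telegraf"
```

With this layout no central Telegraf instance is needed for system metrics, which is what makes it the simpler option when pull is not a requirement.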

We do have an architectural requirement to only pull. I will try your suggestions.

-Naresh

Another option is SNMP. If the servers in question are already sharing the metrics that you want via an SNMP server, that would be quite a neat solution.

I use a mixture of push and pull in my own setup: Telegraf agents installed on servers push metrics directly to a central InfluxDB, and one of my Telegraf instances pulls metrics via SNMP from appliances like routers, switches, and NAS devices, where I can’t or won’t install Telegraf. The beauty of Telegraf is its flexibility. :slight_smile:
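To illustrate the SNMP pull side, here is a hedged sketch of what the polling Telegraf instance's configuration could look like. The agent address `192.0.2.10`, the community string `public`, and the choice of OIDs (the standard MIB-2 `sysName` and `sysUpTime` objects) are assumptions for illustration only; the OIDs you actually need depend on what each device exposes.

```toml
# telegraf.conf on the Telegraf instance doing the polling (assumed layout)

[[inputs.snmp]]
  # Devices to poll; the address is a placeholder.
  agents = ["udp://192.0.2.10:161"]
  version = 2
  community = "public"

  # Tag each measurement with the device's reported name (sysName.0).
  [[inputs.snmp.field]]
    name = "hostname"
    oid = ".1.3.6.1.2.1.1.5.0"
    is_tag = true

  # Device uptime (sysUpTime.0), reported in hundredths of a second.
  [[inputs.snmp.field]]
    name = "uptime"
    oid = ".1.3.6.1.2.1.1.3.0"
```

Numeric OIDs are used here so the example does not depend on MIB files being installed on the polling host.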


Thanks @GainfulShrimp, that’s great information!