Gathering Kubernetes Pod Metrics with Telegraf

telegraf
#1

Hi there,

I’m currently investigating metrics solutions for monitoring Kubernetes and the services that run on top of it. At present, we’re using Telegraf to gather both host-level metrics and HTTP-endpoint metrics. Translating this into a Kubernetes environment seems to be challenging, however.

There are a couple of Helm charts for deploying Telegraf as a DaemonSet (https://github.com/influxdata/tick-charts/tree/master/telegraf-ds) and as a Deployment (https://github.com/influxdata/tick-charts/tree/master/telegraf-s). The DaemonSet approach would work for gathering host and container metrics. However, I’m wondering if there’s also a simple approach to gathering metrics exposed at k8s pod HTTP endpoints.

I’ve also been looking into Prometheus (and the Prometheus Operator more specifically), but it feels quite heavyweight for what I want to do. That said, I do like its approach of annotating services to denote whether or not the collector should scrape metrics from associated pods’ endpoints.
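To make the annotation idea concrete, here is a minimal sketch of how a collector could pick scrape targets from pod annotations. It assumes the `prometheus.io/scrape`, `prometheus.io/port` and `prometheus.io/path` keys that are a common convention in Prometheus example configs, and it uses a simplified, hypothetical pod dictionary rather than real Kubernetes API objects. The default port is also just a placeholder.

```python
# Annotation keys follow a common Prometheus convention (assumed here,
# not something Telegraf itself defines).
SCRAPE = "prometheus.io/scrape"
PORT = "prometheus.io/port"
PATH = "prometheus.io/path"


def scrape_targets(pods):
    """Return metrics URLs for pods that opt in through annotations.

    `pods` is a list of simplified dicts with "ip" and "annotations"
    keys (a hypothetical shape, not the Kubernetes API schema).
    """
    targets = []
    for pod in pods:
        notes = pod.get("annotations", {})
        ip = pod.get("ip")
        # Skip pods that did not opt in or that have no IP assigned yet.
        if notes.get(SCRAPE) != "true" or not ip:
            continue
        port = notes.get(PORT, "9102")  # placeholder default port
        path = notes.get(PATH, "/metrics")
        targets.append(f"http://{ip}:{port}{path}")
    return targets


pods = [
    {"name": "api", "ip": "10.0.0.5",
     "annotations": {SCRAPE: "true", PORT: "8080"}},
    {"name": "db", "ip": "10.0.0.6", "annotations": {}},
]
print(scrape_targets(pods))
```

Only the annotated `api` pod ends up in the target list; the `db` pod is ignored because it never opted in.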

Any guidance on this matter would be much appreciated!

#2

Am I to take no reply as an indication that there is currently no stable solution for deploying Telegraf and InfluxDB on Kubernetes to monitor Kubernetes itself? I am having trouble finding a full deployment example. I could be mistaken, but the Kubernetes plugin hasn’t been updated in a while, and it still says it may cause problems with medium to large k8s deployments.

Thanks. I would like to try this out to compare it with the CoreOS Prometheus Operator.

#3

I’m not an expert on it, but I believe what we do is run one Telegraf per node as a DaemonSet, collecting system-level metrics such as CPU and memory and also running the Kubernetes input.

In addition to the DaemonSet, we run Telegraf in each pod, collecting application-level metrics using plugins such as statsd or prometheus. Since Telegraf is in the pod, it can be configured to talk directly to the application without the need to discover locations.
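As a rough illustration of the in-pod setup, the sketch below renders a minimal Telegraf config for a sidecar that scrapes the application over localhost with the prometheus input and writes to InfluxDB. The function name, the application port and the InfluxDB URL are all hypothetical; only the `[[inputs.prometheus]]` and `[[outputs.influxdb]]` sections with their `urls` and `database` options are taken from Telegraf itself.

```python
def sidecar_config(metrics_url, influx_url, database="telegraf"):
    """Render a minimal telegraf.conf for an in-pod (sidecar) Telegraf.

    The helper and its arguments are illustrative; adjust the URLs to
    match where the application and InfluxDB actually listen.
    """
    lines = [
        "[[inputs.prometheus]]",
        # The app shares the pod's network namespace, so localhost works.
        f'  urls = ["{metrics_url}"]',
        "",
        "[[outputs.influxdb]]",
        f'  urls = ["{influx_url}"]',
        f'  database = "{database}"',
        "",
    ]
    return "\n".join(lines)


conf = sidecar_config("http://localhost:8080/metrics", "http://influxdb:8086")
print(conf)
```

Because the target is always `localhost`, the same config can be reused across pods, with no service discovery involved.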

One thing to be mindful of when monitoring a container orchestration tool is series cardinality, which can become quite large because UUIDs are used for container identification. I recently updated the passenger README with some advice on dealing with this, and many of the suggestions there would carry over to the Kubernetes input.
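To show why this matters, here is a small sketch that counts distinct series (measurement plus tag set) with and without a per-container UUID tag. The tag names `pod` and `container_id` and the sample data are made up for illustration. In Telegraf itself, one way to get a similar effect would be to drop the offending tag with the `tagexclude` option on the input.

```python
import uuid


def series_count(points, drop_tags=()):
    """Count unique series, where a series is a measurement plus its tag set."""
    keys = set()
    for measurement, tags in points:
        kept = tuple(sorted(
            (k, v) for k, v in tags.items() if k not in drop_tags
        ))
        keys.add((measurement, kept))
    return len(keys)


# 300 container restarts spread over 3 pods, each with a fresh UUID.
points = [
    ("cpu", {"pod": f"web-{i % 3}", "container_id": str(uuid.uuid4())})
    for i in range(300)
]

print(series_count(points))                              # one series per container
print(series_count(points, drop_tags={"container_id"}))  # one series per pod
```

With the UUID tag kept, every restarted container creates a new series; once it is dropped, the count collapses to the number of pods.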

@gianarb Anything I got wrong or you would like to add?