Etcd input plugin

Srinivas_Kotaru · June 26, 2017, 9:55pm

Hi

We are using a Kubernetes-based platform, and etcd is a key component of the Kubernetes cluster. I would like to collect metrics about etcd health, read and write speeds, latency, throughput, etc. What is the best way to collect this information using Telegraf?

Srinivas Kotaru

jackzampolin · June 26, 2017, 10:44pm

@Srinivas_Kotaru The way I’ve done it in the past is to use the Prometheus input plugin to scrape the metrics etcd exposes in that format. You can also use Kapacitor’s service discovery and scraping to do the same thing.
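To make the scraping step concrete, here is a minimal Python sketch of what parsing one line of the Prometheus text format involves. It is not Telegraf's implementation: the regular expressions and the `parse_sample` helper are hypothetical, and the sketch assumes label values contain no escaped quotes and lines carry no trailing timestamp.

```python
import re

# Simplified pattern for one sample line of the Prometheus text format:
#   metric_name{label="value",...} 1.23e+06
# Assumptions: no escaped quotes in label values, no trailing timestamp.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Return (name, labels, value) for a sample line, or None for comments/blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # HELP/TYPE comments and blank lines carry no sample
    m = SAMPLE_RE.match(line)
    if m is None:
        raise ValueError(f"unparseable line: {line!r}")
    name, label_str, value = m.groups()
    labels = dict(LABEL_RE.findall(label_str or ""))
    # float() also accepts "+Inf", "-Inf" and "NaN" as used by the format.
    return name, labels, float(value)


if __name__ == "__main__":
    text = '''
# TYPE etcd_disk_wal_fsync_duration_seconds histogram
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 3.522449e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 1.0488103e+07
'''
    for raw in text.splitlines():
        sample = parse_sample(raw)
        if sample:
            print(sample)
```

Each parsed sample keeps the `le` label, which is what lets the histogram buckets be told apart downstream.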

If you are running TICK in Kubernetes, I would suggest you check out tick-charts!

Hope that helps,

Jack

Srinivas_Kotaru · June 27, 2017, 2:31am

@jackzampolin Thanks as usual.

Can we pull the Kube API server metrics instead of running a Prometheus server? I mean, every Kubernetes component exposes metrics under the /metrics URL.

Srinivas Kotaru

jackzampolin · June 27, 2017, 3:35pm

@Srinivas_Kotaru Yup! For kubelet metrics, though, Telegraf has a plugin that pulls those, and telegraf-ds is configured with that plugin by default.

adrianlzt · July 14, 2017, 12:48pm

The etcd FAQ recommends monitoring the p99 of backend_commit_duration_seconds and wal_fsync_duration_seconds.

The data exported by the metrics endpoint looks like this:

etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 3.522449e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 1.0488103e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 1.3733184e+07
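These `_bucket` series are cumulative: each one counts the observations less than or equal to its `le` bound. A p99 can therefore be estimated by finding the bucket that contains the 99th-percentile rank and interpolating linearly inside it, which is roughly what Prometheus's `histogram_quantile()` does. The sketch below assumes the full bucket list is available, including the `+Inf` bucket (the excerpt above shows only three buckets); the `estimate_quantile` name and the example numbers are hypothetical. Because the counters accumulate since etcd started, raw values give a lifetime p99; for a recent window, subtract the bucket values from an earlier scrape first.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q < 1) of a cumulative histogram.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound and ending with (math.inf, total_count), i.e. the *_bucket
    series of one histogram (or their increase between two scrapes).
    """
    if not 0 < q < 1:
        raise ValueError("q must be between 0 and 1 (exclusive)")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("the last bucket must have upper bound +Inf")
    total = buckets[-1][1]
    if total <= 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total  # number of observations at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # The rank falls in the open-ended bucket; the best available
                # answer is the largest finite bound.
                return prev_bound
            # Interpolate linearly between this bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound  # unreachable: the +Inf bucket always holds total >= rank


if __name__ == "__main__":
    # Hypothetical bucket values in seconds (not taken from the thread).
    example = [(0.001, 10), (0.002, 50), (0.004, 90), (0.008, 99), (math.inf, 100)]
    print(estimate_quantile(0.99, example))  # → 0.008
```

The estimate can never be more precise than the bucket layout, so a p99 landing exactly on a bucket bound only says "somewhere in that bucket".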

Has anyone created a graph for these values? How?
Thanks!

adrianlzt · July 14, 2017, 1:02pm

My approximation is to draw three lines with queries:

SELECT mean("0.016")/mean("count") FROM "etcd_disk_backend_commit_duration_seconds" WHERE $timeFilter GROUP BY time($__interval) fill(null)
SELECT mean("0.032")/mean("count") FROM "etcd_disk_backend_commit_duration_seconds" WHERE $timeFilter GROUP BY time($__interval) fill(null)

Then draw a threshold line at 0.99.
According to the etcd docs, if the disk is fast enough, the 32 ms line should be above the 99% threshold line.
The same applies to wal_fsync_duration_seconds.
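The ratio in these queries works because the buckets are cumulative: `bucket{le="0.032"} / count` is the share of commits that finished within 32 ms, so a value of at least 0.99 means the p99 is at or below 32 ms. Below is a minimal Python sketch of the same check; the helper names and counter values are hypothetical. One assumption worth stating: both series are counters since process start, so the ratio of raw values describes the whole lifetime of the etcd process, and the optional earlier-scrape arguments restrict it to a recent interval instead.

```python
def fraction_within(bucket_now, count_now, bucket_before=0.0, count_before=0.0):
    """Share of observations at or below a bucket's `le` bound.

    bucket_now / count_now: current values of e.g.
    etcd_disk_backend_commit_duration_seconds_bucket{le="0.032"} and
    etcd_disk_backend_commit_duration_seconds_count.
    bucket_before / count_before: optional values from an earlier scrape;
    the defaults of 0 give the share since the etcd process started.
    """
    observed = count_now - count_before
    if observed <= 0:
        return None  # no new observations (e.g. after a counter reset)
    return (bucket_now - bucket_before) / observed


def meets_threshold(fraction, threshold=0.99):
    """True if at least `threshold` of observations fell within the bucket."""
    return fraction is not None and fraction >= threshold


if __name__ == "__main__":
    # Hypothetical counter values from two scrapes (not from the thread).
    lifetime = fraction_within(9_950, 10_000)
    recent = fraction_within(10_900, 11_000, 9_950, 10_000)
    print(lifetime, meets_threshold(lifetime))  # 0.995 True
    print(recent, meets_threshold(recent))      # 0.95 False
```

In this made-up example the lifetime ratio still clears 0.99 while the most recent interval does not, which is why a per-interval difference tends to be more useful on a dashboard than the raw cumulative ratio.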
