Telegraf scrape Kubernetes Services (inputs.prometheus) Auto-Discovery

Hi,
So I'm using the inputs.prometheus plugin with RBAC configured in K8s. I'm also using the prometheus annotations to discover and scrape the pods in Kubernetes.

Now I would like to scrape some services in Kubernetes. I see in the documentation that this is an option, but only via Consul Catalog (see the attached screenshot):

Hi,

You said you tried the pod annotation route and did not get any metrics. Did you add the correct annotations, as called out in the documentation, to your pods?
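
For reference, the plugin looks for the standard prometheus.io annotations on each pod. A quick way to check and set them from the shell (the pod name, namespace, path, and port here are placeholders to adapt):

	# show the annotations currently on the pod
	kubectl get pod my-pod -n my-namespace -o jsonpath='{.metadata.annotations}'
	# set the annotations the plugin looks for (path/port must match the app)
	kubectl annotate pod my-pod -n my-namespace \
	  prometheus.io/scrape=true \
	  prometheus.io/path=/actuator/prometheus \
	  prometheus.io/port=8080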

Thanks

Hi,

Yes, I added the correct annotations.

Please look at the attachment again; for some reason I cannot add more than one link per post, as the forum complains that I'm a new user.

So scraping the Kubernetes service endpoints works 100% correctly. I would have thought that adding the pod annotations to the corresponding pods (the pods behind that service) would return the same metrics, but it does not.
Below is the inputs.prometheus plugin config, and of course RBAC is working, as it returns a path that it is scraping, as seen in the attachment above.

	[[inputs.prometheus]]
	  metric_version = 2
	  monitor_kubernetes_pods = true
	  pod_scrape_scope = "cluster"
	  pod_scrape_interval = 60
	  response_timeout = "40s"
	  insecure_skip_verify = true
	  monitor_kubernetes_pods_namespace = "namespace"
	  namepass = ['metrics1', 'metrics2', 'metrics3']

OK, so I just verified that my configs are correct. Usually I scrape the service endpoints from a static list, and that works 100%. What I'm doing now is scraping the pods (with the annotations).
I checked the service details via kubectl and verified them against the URLs returned by inputs.prometheus, and they match exactly, but still no metrics…
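
For anyone following along, the comparison can be done with commands along these lines (the service and namespace names are placeholders):

	# list the pod IPs behind the service
	kubectl get endpoints my-service -n my-namespace
	# list the pod IPs directly, to compare with the scrape URLs in the Telegraf log
	kubectl get pods -n my-namespace -o wide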

OK, so it sounds like the plugin is doing the right thing. The question is why it is not collecting anything.

Can you share any further log messages? Preferably from when Telegraf actually attempts a collection at an interval.

It would also be great if you could copy and paste the logs rather than use screenshots.

Every time I try to post anything with more than two links, the forum complains that, as a new user, I'm limited to only two links. That's why I'm uploading screenshots.

Thanks for the logs:

2022-09-13T16:11:40Z E! [inputs.prometheus] Error in plugin: http://10.63.77.202:8080/actuator/prometheus returned HTTP status 404 Not Found
2022-09-13T16:11:40Z E! [inputs.prometheus] Error in plugin: http://10.63.77.25:8080/actuator/prometheus returned HTTP status 404 Not Found
2022-09-13T16:12:42Z E! [inputs.prometheus] Error in plugin: http://10.63.77.202:8080/actuator/prometheus returned HTTP status 404 Not Found
2022-09-13T16:12:42Z E! [inputs.prometheus] Error in plugin: http://10.63.77.25:8080/actuator/prometheus returned HTTP status 404 Not Found
2022-09-13T16:13:27Z D! [outputs.file] Buffer fullness: 0 / 20000 metrics
2022-09-13T16:13:32Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 20000 metrics
2022-09-13T16:13:33Z D! [outputs.prometheus_client] Buffer fullness: 0 / 20000 metrics

The prometheus plugin will go through all URLs, create a goroutine for each URL, and collect data. So while there are a couple that return 404, the others should have been captured.

Are you certain that these endpoints are reporting valid prometheus metrics? I would have expected an error in that case as well, but the fact that no metrics are returned makes me wonder if the metric endpoints are empty.

Hi,

As mentioned above, if I take those endpoints and put them in a static list, they return metrics (a sketch of that list approach follows the config below).
These endpoints all return metrics via Prometheus, as that is the current monitoring app. I'm testing out Telegraf and it works great; it's just this auto-discovery mode of the inputs.prometheus plugin that is giving me an issue:
	[[inputs.prometheus]]
	  metric_version = 2
	  monitor_kubernetes_pods = true
	  pod_scrape_scope = "cluster"
	  pod_scrape_interval = 60
	  response_timeout = "40s"
	  insecure_skip_verify = true
	  monitor_kubernetes_pods_namespace = "namespace"
	  namepass = ['metrics1', 'metrics2', 'metrics3']
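
For comparison, the static-list variant that does work for me looks roughly like this (the URL is a placeholder for the real service endpoints):

	[[inputs.prometheus]]
	  metric_version = 2
	  # static list of endpoints instead of pod auto-discovery (placeholder URL)
	  urls = ["http://my-service.my-namespace.svc:8080/actuator/prometheus"]
	  response_timeout = "40s"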

I ran that telegraf debug command in the container shell and it returns nothing. It's strange.

I'll do a screenshot today with auto-discovery plus the shell command, and then the static list plus the shell command.

Here is my Telegraf Agent Config:

	[[outputs.influxdb_v2]]
	  urls = ["$INFLUXDB_URL"]
	  token = "$INFLUX_TOKEN"
	  organization = "$INFLUX_ORG"
	  bucket = "$INFLUX_BUCKET"

	[agent]
	  interval = "60s"
	  round_interval = false
	  metric_batch_size = 3000
	  metric_buffer_limit = 20000
	  collection_jitter = "10s"
	  flush_interval = "125s"
	  flush_jitter = "20s"
	  precision = "1ns"
	  hostname = ""
	  omit_hostname = false
	  debug = true
	  quiet = false
	  logtarget = "file"
	  logfile = "/etc/telegraf/log"
	  logfile_rotation_max_size = "150MB"
	  logfile_rotation_max_archives = 5

	# Read metrics from one or many apps
	[[inputs.prometheus]]
	  metric_version = 2
	  monitor_kubernetes_pods = true
	  pod_scrape_scope = "cluster"
	  pod_scrape_interval = 60
	  response_timeout = "40s"
	  insecure_skip_verify = true
	  monitor_kubernetes_pods_namespace = "namespace"
	  namepass = ['metrics list']

	[[outputs.file]]
	  files = ["/tmp/metrics.out"]
	  use_batch_format = true
	  rotation_max_size = "150MB"
	  rotation_max_archives = 5
	  data_format = "json"
	  json_timestamp_units = "1s"

	[[outputs.prometheus_client]]
	  expiration_interval = "180s"
	  listen = ":9273"
	  path = "/metrics"
	  string_as_label = false

Hi Josh,

Maybe there is something in the agent config above that is causing an issue.

I did find this in my other logs:

2022-09-13T07:49:33Z D! [inputs.prometheus] registered a delete request for "my-app-bc55d4954-hld5n" in namespace "my-namespace"
2022-09-13T07:49:33Z D! [inputs.prometheus] will stop scraping for "http://10.63.73.142:8080/actuator/prometheus"

I'm not sure if there was an issue with this app at the time, but that is about the only error I can see related to a particular app. I'm scraping quite a few apps, as you can see.

Is there any other debug command that I can use in the shell to verify this plugin?

I'm using this:

telegraf --config /etc/telegraf/telegraf.conf --input-filter prometheus --test --debug

Is there any other debug command that I can use in the shell to verify this plugin?
telegraf --config /etc/telegraf/telegraf.conf --input-filter prometheus --test --debug

Hmm, --test does have some edge cases when using a service input, which the prometheus input with Kubernetes discovery is. I would suggest running with --test-wait 120 to ensure you fulfill at least one collection interval.
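
That is, something along these lines:

	telegraf --config /etc/telegraf/telegraf.conf --input-filter prometheus --test --test-wait 120 --debug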

Unfortunately, I do think we are reaching the end of my knowledge of this plugin. At this point I would file a bug and we can see about building a debug version with more log output to understand why data is not getting parsed.

Hi,

OK, I tried that and got nothing.

Question: is there any other way to scrape K8s service endpoints apart from using Consul Catalog?

Thanks for your help, BTW!

Question: is there any other way to scrape K8s service endpoints apart from using Consul Catalog?

Hmm, I do not know of one.

So I eventually figured out what the problem is: the namepass filtering breaks the inputs.prometheus plugin. Any ideas on what this could be? Or is namepass not compatible with this auto-discovery?
	[[inputs.prometheus]]
	  metric_version = 2
	  monitor_kubernetes_pods = true
	  pod_scrape_scope = "cluster"
	  pod_scrape_interval = 60
	  response_timeout = "40s"
	  insecure_skip_verify = true
	  monitor_kubernetes_pods_namespace = "namespace"
	  namepass = ['metrics1', 'metrics2', 'metrics3']

When I remove the namepass metric filter, the metrics start flowing in.

Sorry for the delay, I've been taking some time off.

Ah! So namepass determines which metric names are emitted. I had wrongly assumed you changed those names for anonymity, but if you don't have any metrics actually called "metrics1", "metrics2", or "metrics3", then nothing will be emitted.

This is helpful in cases where you want to slim down the metrics, or only want certain metrics to go to a specific output, for example.
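
For example, since the /actuator/prometheus path suggests Spring Boot apps, a namepass with patterns matching real metric names might look like the sketch below. The patterns are illustrative; namepass accepts glob-style matching:

	[[inputs.prometheus]]
	  metric_version = 2
	  monitor_kubernetes_pods = true
	  pod_scrape_scope = "cluster"
	  monitor_kubernetes_pods_namespace = "namespace"
	  # keep only metrics whose names match these glob patterns (illustrative)
	  namepass = ["jvm_*", "process_*", "http_server_requests*"]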