I have Telegraf configured to scrape metrics from kube-state-metrics. kube-state-metrics publishes metrics from custom resources, some of which are of type stateset. The Telegraf Prometheus input plugin is failing to decode the metrics with the following error:
2025-04-23T07:30:05Z E! [inputs.prometheus::kube_state_metrics] Error in plugin: error reading metrics for “http://someip:someportmetrics”: decoding response failed: text format parsing error in line 7979: unknown metric type “StateSet”
# HELP capi_cluster_status_phase The clusters current phase.
# TYPE capi_cluster_status_phase stateset
capi_cluster_status_phase{customresource_group="cluster.x-k8s.io",customresource_kind="Cluster",customresource_version="v1beta1",name="somecluster",namespace="somenamespace",phase="Deleting",uid="some-uid"} 0
capi_cluster_status_phase{customresource_group="cluster.x-k8s.io",customresource_kind="Cluster",customresource_version="v1beta1",name="somecluster",namespace="somenamespace",phase="Failed",uid="some-uid-2"} 0
Can someone help resolve this issue?
I see the issue with your Telegraf configuration when scraping Kube State Metrics. The error is occurring because Telegraf’s Prometheus input plugin doesn’t recognize the “stateset” metric type that kube-state-metrics is using for some custom resources.
The error message specifically points to the capi_cluster_status_phase metric, which is defined as type “stateset”, but Telegraf doesn’t know how to decode this type.
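To see which metrics would trip a parser like this, here is a minimal Python sketch. It is not Telegraf's parser. It only assumes the parser accepts the five types of the classic Prometheus text format (counter, gauge, histogram, summary, untyped), while stateset comes from OpenMetrics. The function name `find_unsupported_types` is made up for illustration.

```python
# Metric types defined by the classic Prometheus text exposition format.
# Assumption: the failing parser accepts only these; "stateset" is an
# OpenMetrics type and is not in this set.
CLASSIC_TYPES = {"counter", "gauge", "histogram", "summary", "untyped"}


def find_unsupported_types(exposition_text):
    """Return (line_number, metric_name, declared_type) for every
    '# TYPE' line whose type is outside the classic set."""
    problems = []
    for line_number, line in enumerate(exposition_text.splitlines(), start=1):
        parts = line.split()
        # A TYPE line has exactly four tokens: '#', 'TYPE', name, type.
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            name, metric_type = parts[2], parts[3]
            if metric_type.lower() not in CLASSIC_TYPES:
                problems.append((line_number, name, metric_type))
    return problems
```

Running it against a saved copy of the /metrics output lists every metric family that would need attention, not just the first one the error stops at.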
There are a few ways to resolve this:
- Use metric filtering: Configure Telegraf to ignore these specific metrics that are causing problems:
[[inputs.prometheus]]
# Your existing configuration
# ...
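# Add metric filtering
metric_version = 2
# Exclude the problematic metrics. With metric_version = 2 each Prometheus
# metric becomes a field, so the standard field filter applies
# (fieldexclude; fielddrop on older Telegraf releases).
# Caveat: filters run on already-parsed metrics, so this may not prevent
# the decoding error itself.
fieldexclude = ["capi_cluster_status_phase"]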
- Create a transform to convert the metrics: You could use Telegraf’s processors to transform these metrics into a format it can handle.
- Configure kube-state-metrics: If possible, consider configuring kube-state-metrics to expose these metrics in a different format that Telegraf can understand.
The simplest approach is typically to use the metric filtering option to skip the problematic metrics if they’re not critical for your monitoring needs.
Thanks @skartikey for the suggestions.
I’m also leaning towards the second option. However, I’m not sure what differs in the structure of the metric between a gauge and a stateset. For example, in the snippet below both metrics have the same structure, except for the metadata that declares the type as stateset:
# HELP capi_cluster_status_condition_last_transition_time The condition's last transition time of a cluster.
# TYPE capi_cluster_status_condition_last_transition_time gauge
capi_cluster_status_condition_last_transition_time{customresource_group="cluster.x-k8s.io",customresource_kind="Cluster",customresource_version="v1beta1",name="somename",namespace="somenamespace",status="True",type="WorkerMachinesUpToDate",uid="50e401fb-9651-47e2-9baf-43d518cea952"} 1.744873934e+09
capi_cluster_status_condition_last_transition_time{customresource_group="cluster.x-k8s.io",customresource_kind="Cluster",customresource_version="v1beta1",name="somename",namespace="somenamespace",status="True",type="WorkersAvailable",uid="50e401fb-9651-47e2-9baf-43d518cea952"} 1.745265493e+09
# HELP capi_cluster_status_phase The clusters current phase.
# TYPE capi_cluster_status_phase stateset
capi_cluster_status_phase{customresource_group="cluster.x-k8s.io",customresource_kind="Cluster",customresource_version="v1beta1",name="somename",namespace="somenamespace",phase="Deleting",uid="50e401fb-9651-47e2-9baf-43d518cea952"} 0
capi_cluster_status_phase{customresource_group="cluster.x-k8s.io",customresource_kind="Cluster",customresource_version="v1beta1",name="somename",namespace="somenamespace",phase="Failed",uid="50e401fb-9651-47e2-9baf-43d518cea952"} 0
Is the metadata the one causing the issue? If so, would a filter work on this?
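Judging by the snippet, the sample lines of a stateset have the same shape as gauge samples, so only the `# TYPE` line differs. A minimal Python sketch of rewriting that line before the text reaches a classic-format parser is below. It assumes you can run such a step somewhere between kube-state-metrics and Telegraf (for example a small proxy or sidecar, which is not shown), and `stateset_to_gauge` is a hypothetical helper, not a Telegraf or kube-state-metrics feature.

```python
import re

# Matches a TYPE metadata line that declares a stateset, e.g.
# "# TYPE capi_cluster_status_phase stateset".
_STATESET_TYPE = re.compile(r"^(# TYPE \S+) stateset\s*$", re.IGNORECASE)


def stateset_to_gauge(exposition_text):
    """Rewrite 'stateset' TYPE declarations to 'gauge'.

    Sample lines and HELP lines are passed through unchanged; only the
    type metadata is touched.
    """
    rewritten = [
        _STATESET_TYPE.sub(r"\1 gauge", line)
        for line in exposition_text.splitlines()
    ]
    return "\n".join(rewritten) + "\n"
```

Since each stateset sample is already a 0 or 1 value with the state in a label (`phase` here), treating it as a gauge should not lose information, but that is an assumption worth checking against your dashboards and alerts.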