Prometheus with Telegraf

Hi,
I am using Telegraf to expose my metrics on the /metrics path so that my Prometheus Operator ServiceMonitor can capture application metrics.
The problem is that, since I am parsing my logs with Telegraf, every new timestamp creates a new time series.
For example: metric_x(time=t1), metric_x(time=t2)
Logically these should form a single time series, with those timestamps as the sample timestamps, but I am not sure how to achieve that with Telegraf and Prometheus.
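To make it concrete, here is roughly what I mean in Prometheus exposition format (simplified and made up; the label name and the second value are just for illustration):

    # what seems to happen: each parsed timestamp becomes its own series
    metric_x{time="t1"} 4113
    metric_x{time="t2"} 4098

    # what I want: a single series whose samples simply fall at t1 and t2
    metric_x 4113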
Need help on this…

Hello @rj23495,
Thanks for your question. Just to be sure, you’re using the prometheus input plugin? Can you please share the output from your /metrics endpoint, the InfluxDB metrics, and your telegraf config?

@Anaisdg, I am using the prometheus_client output plugin and storing my data in Prometheus as the TSDB.
I also want to parse and split the path field, which is added as a label in the Prometheus output (rough idea of what I mean in the sketch below).
My log line looks something like this:
2019-12-05 05:06:04.499 [634dcaa5-4ed5-491b-a21c-7e16ba882e6b] metric TabularOperations_getTaskTabularData_success_latency int32 4113 milliseconds
There is also an issue where not all of the log lines seem to be shipped/exposed to Prometheus, so the data I am getting does not look reliable.
To avoid the timestamp issue I have dropped that field, so that aggregation is done at the pod level.
Should I be using the inputs.tail plugin or logparser, given that there are a lot of files to parse?
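For the path splitting, I was thinking something like the regex processor might work; this is just an untested sketch, and the directory layout and the new tag name "pod" are assumptions on my part:

    [[processors.regex]]
      # pull the second wildcard directory out of the "path" tag into a new "pod" tag
      [[processors.regex.tags]]
        key = "path"
        pattern = '^/var/log/abc/[^/]+/([^/]+)/logs/service\.log$'
        replacement = "${1}"
        result_key = "pod"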

telegraf.conf: |+
    [[outputs.prometheus_client]]
      listen = ":9273"
      path = "/metrics"
      expiration_interval = "0"
    [[inputs.logparser]]
      files = ["/var/log/abc/*/*/logs/service.log"]
      from_beginning = false
      [inputs.logparser.grok]
        patterns = ['%{TIMESTAMP_ISO8601} \[%{DATA}\] %{DATA:metric} %{DATA:metricname} %{DATA:datatype} %{NUMBER:metricvalue:float} %{GREEDYDATA:unit}']
        measurement = "test_service_log"
    [[processors.parser]]
      parse_fields = ["metricname"]
      drop_original = true
      merge = "override"
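
In case it helps to compare, I believe the inputs.tail equivalent of my logparser block would be roughly this (untested sketch, same grok pattern):

    [[inputs.tail]]
      files = ["/var/log/abc/*/*/logs/service.log"]
      from_beginning = false
      name_override = "test_service_log"
      data_format = "grok"
      grok_patterns = ['%{TIMESTAMP_ISO8601} \[%{DATA}\] %{DATA:metric} %{DATA:metricname} %{DATA:datatype} %{NUMBER:metricvalue:float} %{GREEDYDATA:unit}']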

@Anaisdg any update? I am stuck on this. Log file lines are being skipped very frequently and I am losing many data points of my application metrics.
I also added outputs.file, and it showed all the log lines being read, but they are not being exposed to Prometheus; only a single data point is exposed.

It is picking up only the last line printed and parsing that; ideally it should ingest all the lines and create a time series sample for every data point.

@rj23495,
I’m afraid I’m not sure… I’m asking around. This might be more of a question for the Prometheus community. I’ll post more info as soon as I get it. Have you tried setting debug = true in your config to get more information?

Yes, I have set the debug flag to true but did not see any error messages.
I also tried the file output with stdout, and all the data points seem to be there as expected.
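For reference, the file output I used for that test was just the stock plugin writing line protocol to stdout, roughly:

    [[outputs.file]]
      files = ["stdout"]
      data_format = "influx"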

It sounds like you have everything else working if you see the metrics you expect in stdout, but you’re still missing metrics in Prometheus. This might have something to do with how often Telegraf publishes metrics relative to how often Prometheus is scraping them.

Both Telegraf and Prometheus operate periodically; that is, Telegraf gathers and publishes all metrics according to the flush interval set in your configuration, while Prometheus collects metrics based on the scrape interval set in your configuration. If these intervals are set incorrectly it could result in lost metrics; for example, if the flush interval of Telegraf is set to 15s and the scrape interval for Prometheus is set to 30s, you would “miss” half of your metrics.
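For illustration (values are examples only), the relevant settings are Telegraf's agent flush_interval and Prometheus's scrape_interval; keeping the scrape interval at or below the flush interval means every published batch is scraped at least once:

    # telegraf.conf
    [agent]
      interval = "10s"        # how often inputs are gathered
      flush_interval = "10s"  # how often outputs (the /metrics endpoint) are refreshed

    # prometheus.yml
    scrape_configs:
      - job_name: "telegraf"
        scrape_interval: 10s  # scrape at least as often as Telegraf flushes
        static_configs:
          - targets: ["telegraf:9273"]  # address is illustrative; 9273 matches the listen port above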

I guess there is an issue in the way my metrics are being exposed; ideally, for every unique set of labels there should be a new time series, right?
Also, regarding the scrape interval and flush interval: the flush interval is 20 s and the Prometheus scrape interval is 5 s, so ideally I should be getting every metric that is exposed, but not all metric labels are being exposed.