Prometheus Alertmanager not triggering MySQL/Nginx PID alert using Telegraf exporter procstat metric

jobet · December 27, 2024, 11:13am

Prometheus 2.39.0
Alertmanager 0.24.0
Telegraf 1.20.2

I’m trying to set up an alert that checks whether MySQL and Nginx are running on a remote host.
I’ve set up two Prometheus jobs.
For MySQL:

  - job_name: "gm_mysql_pid"
    scheme: "https"
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets:
          [
            "host1.com:9273",
          ]

And for Nginx:

  - job_name: "gm_telegraf_exporter"
    scheme: "https"
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets:
          [
            "host1.com:9273",
          ]
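Both jobs scrape the same endpoint, host1.com:9273. As a rough illustration (a simplified sketch that ignores relabelling and honor_labels), Prometheus adds a job label taken from job_name and an instance label taken from the target address to every scraped series, so each Telegraf series ends up stored once per job. The sample series are copied from the Telegraf output quoted in this post; the helper attach_target_labels is hypothetical.

```python
# Simplified sketch of Prometheus scrape-time labelling: no relabelling and
# no honor_labels handling. Each scrape job adds `job` and `instance` labels.


def attach_target_labels(exposed, job_name, target):
    """Return copies of the exposed series with `job` and `instance` added."""
    return [
        ({**labels, "job": job_name, "instance": target}, value)
        for labels, value in exposed
    ]


# Series exposed by Telegraf on host1.com:9273 (copied from the post).
exposed = [
    ({"__name__": "procstat_lookup_pid_count", "host": "host1.com",
      "pid_finder": "pgrep", "pidfile": "/var/run/mysqld/mysqld.pid",
      "result": "success"}, 1),
    ({"__name__": "procstat_lookup_pid_count", "host": "host1.com",
      "pid_finder": "pgrep", "pidfile": "/var/run/nginx.pid",
      "result": "success"}, 1),
]

# Both jobs point at the same target, so every series is ingested twice.
stored = []
for job in ("gm_mysql_pid", "gm_telegraf_exporter"):
    stored += attach_target_labels(exposed, job, "host1.com:9273")

print(len(stored))  # 4: each Telegraf series is stored once per job
```

This is why a selector such as job="gm_mysql_pid" only picks the copy scraped by that job, even though the Nginx series is present under both jobs.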

Alertmanager configuration:

route:
  group_by: ["alertname", "group", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 120h
  receiver: devops-team
  routes:
    - match:
        group: gm_telegraf_exporter
      continue: true
      receiver: prometheus-receiver
receivers:
  - name: "prometheus-receiver"
    slack_configs:
      - api_url: https://hooks.slack.com/xxx
        channel: "alerts-prometheus"
        send_resolved: true
        title: '{{ template "custom_title" . }}'
        text: '{{ template "custom_slack_message" . }}'
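How Alertmanager picks a receiver can be sketched by porting its route-matching logic to Python (simplified: legacy equality match only, no inheritance of other route settings). An alert walks the children of the root route in order; the first matching child stops the search unless it sets continue: true, and if no child matches, the root route's own receiver is used. Under that logic, only alerts labelled group="gm_telegraf_exporter" reach prometheus-receiver, while gm_mysql_pid alerts fall through to devops-team, which does not appear in the receivers excerpt above (presumably it is configured elsewhere). The Route class is hypothetical.

```python
# Simplified port of Alertmanager's route matching (equality matchers only).
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    match: dict = field(default_factory=dict)  # legacy `match:` equality map
    continue_: bool = False                     # `continue:` in the YAML
    routes: list = field(default_factory=list)

    def matches(self, labels):
        # A missing label compares as "", like an equality matcher.
        return all(labels.get(k, "") == v for k, v in self.match.items())

    def find(self, labels):
        """Return the routes whose receivers get this alert."""
        if not self.matches(labels):
            return []
        found = []
        for child in self.routes:
            hits = child.find(labels)
            found += hits
            if hits and not child.continue_:
                break  # first matching child wins unless it sets continue
        # No child matched: this node handles the alert itself.
        return found or [self]


# The routing tree from the configuration above.
root = Route(
    receiver="devops-team",
    routes=[Route(receiver="prometheus-receiver",
                  match={"group": "gm_telegraf_exporter"},
                  continue_=True)],
)

for alert in ({"alertname": "gm_nginx_pid", "group": "gm_telegraf_exporter"},
              {"alertname": "gm_mysql_pid", "group": "gm_mysql_pid"}):
    print(alert["alertname"], [r.receiver for r in root.find(alert)])
# gm_nginx_pid ['prometheus-receiver']
# gm_mysql_pid ['devops-team']
```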

Alert rules configuration:

MySQL:
  - alert: gm_mysql_pid
    expr: procstat_lookup_pid_count{job="gm_mysql_pid",pid_finder="",pidfile="/var/run/mysqld/mysqld.pid"} <= 0
    labels:
      group: "gm_mysql_pid"
    annotations:
      identifier: "Host: {{$labels.host}}"
      description: "Trigger: MySQL service is down!"

Nginx:
  - alert: gm_nginx_pid
    expr: procstat_lookup_pid_count{job="gm_telegraf_exporter",pid_finder="",pidfile="/var/run/nginx.pid"} <= 0
    labels:
      group: "gm_telegraf_exporter"
    annotations:
      identifier: "Host: {{$labels.host}}"
      description: "Trigger: Nginx service is down!"

Looking at the Telegraf metrics, I can see both series:

procstat_lookup_pid_count{host="host1.com",pid_finder="pgrep",pidfile="/var/run/mysqld/mysqld.pid",result="success"} 1
procstat_lookup_pid_count{host="host1.com",pid_finder="pgrep",pidfile="/var/run/nginx.pid",result="success"} 1
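In PromQL, an equality matcher such as pid_finder="" selects series whose label is empty or missing entirely. Both rules include pid_finder="", while the series above carry pid_finder="pgrep", so the selectors should match neither series. The sketch reimplements only equality matching (selector_matches is a hypothetical helper). Its second case assumes, without confirmation, that some Telegraf versions do not emit a pid_finder tag, which would be one possible reason the behaviour depends on the Telegraf version.

```python
# Sketch of PromQL equality matching only (no regex or negative matchers).
# A label that is missing from a series compares as the empty string.


def selector_matches(matchers, labels):
    """True if every name="value" matcher agrees with the series labels."""
    return all(labels.get(name, "") == value for name, value in matchers.items())


# Label matchers from the gm_nginx_pid rule (metric name left out).
nginx_rule = {
    "job": "gm_telegraf_exporter",
    "pid_finder": "",
    "pidfile": "/var/run/nginx.pid",
}

# Nginx series from the Telegraf output, plus the job label added at scrape time.
nginx_series = {
    "host": "host1.com",
    "pid_finder": "pgrep",
    "pidfile": "/var/run/nginx.pid",
    "result": "success",
    "job": "gm_telegraf_exporter",
}

print(selector_matches(nginx_rule, nginx_series))  # False: "pgrep" != ""

# Hypothetical: the same series from an exporter that emits no pid_finder tag.
no_finder = {k: v for k, v in nginx_series.items() if k != "pid_finder"}
print(selector_matches(nginx_rule, no_finder))  # True: missing label == ""
```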

The issue is that when I stop both services (MySQL and Nginx), no alert is triggered, even though the metrics show that both services are down:

procstat_lookup_pid_count{host="host1.com",pid_finder="pgrep",pidfile="",result="lookup_error"} 0

Am I missing something?
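A rule only creates an alert for series present in its expression result, and a comparison such as <= 0 without the bool modifier simply filters that result. This sketch (hypothetical helpers, a single evaluation, equality matchers only) evaluates the MySQL rule against the series shown while both services were stopped. The selector already fails on pid_finder, and even without that matcher the error series has pidfile="", which does not equal the configured path. Either way the result is empty, so nothing would be sent to Alertmanager.

```python
# Sketch of one rule evaluation: equality matchers only, and `<= 0` used as
# a filtering comparison (no `bool` modifier), as in the rules above.


def selector_matches(matchers, labels):
    # Missing labels compare as "", as in PromQL equality matchers.
    return all(labels.get(name, "") == value for name, value in matchers.items())


def evaluate_rule(matchers, threshold, samples):
    """Return the samples the selector picks that also pass `<= threshold`."""
    selected = [(l, v) for l, v in samples if selector_matches(matchers, l)]
    return [(l, v) for l, v in selected if v <= threshold]


mysql_rule = {"job": "gm_mysql_pid", "pid_finder": "",
              "pidfile": "/var/run/mysqld/mysqld.pid"}

# Series shown while both services were stopped, plus the scrape-time job label.
down = [({"host": "host1.com", "pid_finder": "pgrep", "pidfile": "",
          "result": "lookup_error", "job": "gm_mysql_pid"}, 0)]

firing = evaluate_rule(mysql_rule, 0, down)
print(firing)  # []: the selector matches nothing, so no alert is created

# Even without the pid_finder matcher, pidfile="" does not equal the path.
relaxed = {k: v for k, v in mysql_rule.items() if k != "pid_finder"}
print(evaluate_rule(relaxed, 0, down))  # []
```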

jobet · December 30, 2024, 2:48pm

I found that the issue was caused by the Telegraf version running on the server.
The expression works with version 1.8 but not with version 1.25.
Downgrading to version 1.8 solved the issue!

Anaisdg · January 3, 2025, 8:10pm

@jobet thanks for sharing your question and solution with the community! I appreciate it!
