Prometheus 2.39.0
Alertmanager 0.24.0
Telegraf 1.20.2
I’m trying to set up an alert that checks whether MySQL and Nginx are running on a remote host.
I’ve set up two Prometheus jobs.
For MySQL:
- job_name: "gm_mysql_pid"
  scheme: "https"
  tls_config:
    insecure_skip_verify: true
  static_configs:
    - targets:
        [
          "host1.com:9273",
        ]
and for Nginx
- job_name: "gm_telegraf_exporter"
  scheme: "https"
  tls_config:
    insecure_skip_verify: true
  static_configs:
    - targets:
        [
          "host1.com:9273",
        ]
Alertmanager configuration
route:
  group_by: ["alertname", "group", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 120h
  receiver: devops-team
  routes:
    - match:
        group: gm_telegraf_exporter
      continue: true
      receiver: prometheus-receiver
receivers:
  - name: "prometheus-receiver"
    slack_configs:
      - api_url: https://hooks.slack.com/xxx
        channel: "alerts-prometheus"
        send_resolved: true
        title: '{{ template "custom_title" . }}'
        text: '{{ template "custom_slack_message" . }}'
Alert rules configuration
MySQL
- alert: gm_mysql_pid
  expr: procstat_lookup_pid_count{job="gm_mysql_pid",pid_finder="",pidfile="/var/run/mysqld/mysqld.pid"} <= 0
  labels:
    group: "gm_mysql_pid"
  annotations:
    identifier: "Host: {{$labels.host}}"
    description: "Trigger: MySQL service is down!"
Nginx
- alert: gm_nginx_pid
  expr: procstat_lookup_pid_count{job="gm_telegraf_exporter",pid_finder="",pidfile="/var/run/nginx.pid"} <= 0
  labels:
    group: "gm_telegraf_exporter"
  annotations:
    identifier: "Host: {{$labels.host}}"
    description: "Trigger: Nginx service is down!"
Looking at the Telegraf output, I see both metrics while the services are running:
procstat_lookup_pid_count{host="host1.com",pid_finder="pgrep",pidfile="/var/run/mysqld/mysqld.pid",result="success"} 1
procstat_lookup_pid_count{host="host1.com",pid_finder="pgrep",pidfile="/var/run/nginx.pid",result="success"} 1
The issue is that if I stop both services (MySQL and Nginx), the Alertmanager alert is not triggered, even though the metrics show that both services are down:
procstat_lookup_pid_count{host="host1.com",pid_finder="pgrep",pidfile="",result="lookup_error"} 0
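To compare the rule selectors with the series above, here is a minimal Python sketch of how I understand equality matchers to behave. It is a simplified model, not Prometheus code: `matches` and `scraped` are hypothetical helpers, and I am assuming that a matcher with an empty value (such as pid_finder="") only selects series where that label is missing or empty, and that Prometheus adds the job label at scrape time.

```python
# Simplified model of PromQL equality matchers (label="value").
# Assumption: label="" only matches series where the label is absent or empty.

def matches(selector, series):
    """Return True if every equality matcher in the selector agrees with the series."""
    return all(series.get(name, "") == value for name, value in selector.items())


def scraped(job, labels):
    """Hypothetical helper: attach the job label that Prometheus adds when scraping."""
    return {"job": job, **labels}


# Series as exposed by Telegraf (copied from the metrics shown in this post).
running_mysql = {
    "host": "host1.com",
    "pid_finder": "pgrep",
    "pidfile": "/var/run/mysqld/mysqld.pid",
    "result": "success",
}
stopped = {
    "host": "host1.com",
    "pid_finder": "pgrep",
    "pidfile": "",
    "result": "lookup_error",
}

# Label matchers taken from the gm_mysql_pid alert expression.
mysql_rule = {
    "job": "gm_mysql_pid",
    "pid_finder": "",
    "pidfile": "/var/run/mysqld/mysqld.pid",
}

results = {
    "running": matches(mysql_rule, scraped("gm_mysql_pid", running_mysql)),
    "stopped": matches(mysql_rule, scraped("gm_mysql_pid", stopped)),
}
print(results)
```

If this model is right, the selector matches neither series: pid_finder is "pgrep" in both cases, and once the service is stopped pidfile is empty as well. The expression would then return no data, so the `<= 0` comparison never has anything to evaluate.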
Am I missing something?