Hi everyone! I’m trying to collect and parse Nginx logs with Telegraf in Docker Swarm, but I’ve hit a strange problem: Telegraf is not reading the Nginx logs and does not throw any errors, so I can’t tell what went wrong. If somebody could help me with this, that would be amazing!
Previously I created a similar setup in Docker Compose, with the same Telegraf config and the same Nginx log format, and everything worked fine. For Docker Swarm I changed one thing: in Docker Compose I used a bind-mounted folder to persist and share the Nginx logs between the Nginx and Telegraf containers, whereas in Docker Swarm I am using a local Docker volume for that. I also checked that both containers have access to the log file: Nginx is writing logs as expected, and from the Telegraf container I can tail the file and read the logs.
When I make some requests, I see new entries appear in the log file, but my Grafana dashboard shows no information about those requests. I can see Nginx status and resource usage there, so Prometheus can pull data from Telegraf and Grafana can get data from Prometheus.
I ran curl localhost:9100/metrics from the Telegraf container to see whether there are any metrics for the Nginx log. There were other metrics, but nothing for the Nginx log.
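The manual check above can also be scripted. Below is a minimal sketch, assuming the tail input's metrics would show up with names starting with nginxlog_ (the name_override value followed by a field name, which is how I understand the prometheus_client output builds metric names). The helper name metric_names_with_prefix and the sample text are hypothetical, not real output from my setup.

```python
def metric_names_with_prefix(exposition_text, prefix):
    """Return the sorted, de-duplicated metric names that start with prefix."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="v"} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


# Hypothetical excerpt in the Prometheus text exposition format.
SAMPLE = """\
# HELP cpu_usage_idle Telegraf collected metric
# TYPE cpu_usage_idle gauge
cpu_usage_idle{cpu="cpu0",host="telegraf"} 97.5
nginx_active{host="telegraf",port="8080",server="gateway"} 1
"""

if __name__ == "__main__":
    # An empty list means no metric from the tail input was exported.
    print(metric_names_with_prefix(SAMPLE, "nginxlog_"))
```

To run it against the live endpoint, the text could be fetched with urllib.request.urlopen on the same URL used in the curl check and passed in instead of SAMPLE.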
My Telegraf config:
[agent]
interval = "15s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "15s"
flush_jitter = "0s"
precision = "15s"
[[inputs.nginx]]
urls = ["http://gateway:8080/nginx_status"]
response_timeout = "5s"
[[inputs.tail]]
name_override = "nginxlog"
files = ["/var/log/nginx/access-telegraf.log"]
from_beginning = false
pipe = false
watch_method = "inotify"
data_format = "grok"
grok_patterns = ["%{COMBINED_LOG_FORMAT} %{NUMBER:request_time:float} %{NUMBER:upstream_response_time:float}"]
[[inputs.cpu]]
percpu = true
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.net]]
[[inputs.mem]]
[[inputs.system]]
[[outputs.prometheus_client]]
listen = "telegraf:9100"
My Nginx log config:
log_format main '$http_x_real_ip - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'$request_time $upstream_response_time $pipe';
access_log /var/log/nginx/access-telegraf.log main;
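One thing that could be checked offline is whether the tail of a log line ($request_time $upstream_response_time $pipe) fits the custom part of the grok pattern. Below is a rough sketch, assuming a simplified stand-in for grok's NUMBER pattern rather than the exact definition Telegraf ships; the sample tails are hypothetical. It does not model COMBINED_LOG_FORMAT or whether the parser anchors the pattern to the end of the line. Nginx writes "-" for $upstream_response_time when a request is not passed to an upstream, so that case is included.

```python
import re

# Simplified stand-in for grok's NUMBER (assumption: optional sign, digits,
# optional fraction). This is NOT the exact pattern definition used by Telegraf.
NUMBER = r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)"

# Mirrors the custom part of grok_patterns:
# "<request_time> <upstream_response_time>"
TRAILER = re.compile(
    rf"(?P<request_time>{NUMBER}) (?P<upstream_response_time>{NUMBER})"
)


def parse_trailer(line_tail):
    """Return both timing fields as floats, or None if the tail does not match."""
    m = TRAILER.match(line_tail)
    if m is None:
        return None
    return {key: float(value) for key, value in m.groupdict().items()}


if __name__ == "__main__":
    # Hypothetical tails of access-log lines.
    print(parse_trailer("0.004 0.003 ."))  # request proxied to an upstream
    print(parse_trailer("0.000 - ."))      # no upstream involved
```

If the second case fails in a check like this, the same kind of line would presumably not satisfy a NUMBER field in the real pattern either, but that is something to confirm against Telegraf itself.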
Telegraf logs on startup:
2022-09-17T03:12:13Z I! Using config file: /etc/telegraf/telegraf.conf
2022-09-17T03:12:13Z I! Starting Telegraf 1.24.0
2022-09-17T03:12:13Z I! Available plugins: 222 inputs, 9 aggregators, 26 processors, 20 parsers, 57 outputs
2022-09-17T03:12:13Z I! Loaded inputs: cpu disk diskio mem net nginx system tail
2022-09-17T03:12:13Z I! Loaded aggregators:
2022-09-17T03:12:13Z I! Loaded processors:
2022-09-17T03:12:13Z I! Loaded outputs: prometheus_client
2022-09-17T03:12:13Z I! Tags enabled: domain=test-avito-swarm-so.liis.su host=telegraf
2022-09-17T03:12:13Z I! [agent] Config: Interval:15s, Quiet:false, Hostname:"telegraf", Flush Interval:15s
2022-09-17T03:12:13Z I! [outputs.prometheus_client] Listening on http://10.0.44.240:9100/metrics
As far as I can tell from the Telegraf logs, it loads the tail input, but for some reason it ignores the access log file.