Telegraf is not reading nginx logs via inputs.tail in Docker Swarm

Hi everyone! I’m trying to collect and parse Nginx logs with Telegraf in Docker Swarm, but I’m facing a strange problem: Telegraf is not reading the Nginx logs and does not throw any errors, so I can’t figure out what went wrong. If somebody could help me with this, it would be amazing!

I previously created a similar setup with Docker Compose, using the same Telegraf config and Nginx log format, and everything worked just fine. For Docker Swarm I changed one thing: in Docker Compose I used a folder bind mount to persist and share the Nginx logs between the Nginx and Telegraf containers, while in Docker Swarm I am using a Docker local volume for that. I also checked that both containers have access to the log file: Nginx is writing logs as expected, and from the Telegraf container I can tail the file and read the logs.
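Roughly, the log sharing looks like this in the stack file (service and volume names are illustrative, trimmed to the relevant parts):

```yaml
services:
  gateway:                          # Nginx
    image: nginx
    volumes:
      - nginx-logs:/var/log/nginx   # Nginx writes access logs here
  telegraf:
    image: telegraf:1.24
    volumes:
      - nginx-logs:/var/log/nginx:ro  # Telegraf tails the same files read-only

volumes:
  nginx-logs:
    driver: local
```

Note that a `local` volume is node-local, so this only works when both services are scheduled on the same node.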

When I make some requests, I see log lines appear in the log file, but when I open my Grafana dashboard there is no information about the requests. I can see information about Nginx status and resource usage, so Prometheus can pull data from Telegraf and Grafana is able to get data from Prometheus.
I tried curling localhost:9100/metrics from the Telegraf container to see if there are any metrics for the Nginx log. There were other metrics, but nothing for the Nginx log.

My Telegraf config:

[agent]
  interval = "15s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "15s"
  flush_jitter = "0s"
  precision = "15s"

[[inputs.nginx]]
  urls = ["http://gateway:8080/nginx_status"]
  response_timeout = "5s"

[[inputs.tail]]
  name_override = "nginxlog"
  files = ["/var/log/nginx/access-telegraf.log"]
  from_beginning = false
  pipe = false
  watch_method = "inotify"
  data_format = "grok"
  grok_patterns = ["%{COMBINED_LOG_FORMAT} %{NUMBER:request_time:float} %{NUMBER:upstream_response_time:float}"]

[[inputs.cpu]]
  percpu = true

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

[[inputs.diskio]]

[[inputs.mem]]

[[inputs.net]]

[[inputs.system]]

[[outputs.prometheus_client]]
  listen = "telegraf:9100"

My Nginx log config:

    log_format main '$http_x_real_ip - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    '$request_time $upstream_response_time $pipe';

    access_log  /var/log/nginx/access-telegraf.log main;

Telegraf logs on startup:

2022-09-17T03:12:13Z I! Using config file: /etc/telegraf/telegraf.conf
2022-09-17T03:12:13Z I! Starting Telegraf 1.24.0
2022-09-17T03:12:13Z I! Available plugins: 222 inputs, 9 aggregators, 26 processors, 20 parsers, 57 outputs
2022-09-17T03:12:13Z I! Loaded inputs: cpu disk diskio mem net nginx system tail
2022-09-17T03:12:13Z I! Loaded aggregators: 
2022-09-17T03:12:13Z I! Loaded processors: 
2022-09-17T03:12:13Z I! Loaded outputs: prometheus_client
2022-09-17T03:12:13Z I! Tags enabled: host=telegraf
2022-09-17T03:12:13Z I! [agent] Config: Interval:15s, Quiet:false, Hostname:"telegraf", Flush Interval:15s
2022-09-17T03:12:13Z I! [outputs.prometheus_client] Listening on

As far as I can tell from the Telegraf logs, the tail input is loaded, but for some reason it is ignoring the access log file.

In the end I solved this myself.
Posting the solution in case somebody else needs it.

I enabled debug logging in Telegraf with the --debug flag and saw messages saying that grok could not find pattern matches in the log lines. The reason was that the lines contained no client IP address, because the log format used $http_x_real_ip instead of $remote_addr. It worked previously in Docker Compose because that Nginx instance was behind another proxy, which passed the X-Real-IP header.
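So the fix on the Nginx side was simply to switch the first field back to $remote_addr (the rest of the format stays unchanged):

```nginx
log_format main '$remote_addr - $remote_user [$time_local] '
                '"$request" $status $body_bytes_sent '
                '"$http_referer" "$http_user_agent" '
                '$request_time $upstream_response_time $pipe';
```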

Also, to get the real client IP instead of the Docker Swarm load balancer IP, I configured port publishing in the stack file like this:

    ports:
      - mode: host
        target: 80
        published: 80
      - mode: host
        target: 443
        published: 443
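To see concretely why grok dropped the lines: COMBINED_LOG_FORMAT expects an IP address or hostname in the client field, and the literal "-" that Nginx logs when the X-Real-IP header is absent does not match. A rough Python approximation (this regex is illustrative, not grok's exact pattern; the sample lines are made up):

```python
import re

# Rough stand-in for grok's %{IPORHOST}: an IPv4 address or a dotted hostname.
CLIENT = r"(?:\d{1,3}(?:\.\d{1,3}){3}|[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
# Followed by the rest of a combined-format line (ident, user, time, request, status, bytes).
LINE = re.compile(CLIENT + r' - \S+ \[[^\]]+\] "[^"]*" \d{3} \d+')

# With $http_x_real_ip and no X-Real-IP header, Nginx logs "-" for the client:
broken = '- - user [17/Sep/2022:03:12:13 +0000] "GET / HTTP/1.1" 200 612'
# With $remote_addr there is always a client IP:
fixed = '10.0.0.2 - user [17/Sep/2022:03:12:13 +0000] "GET / HTTP/1.1" 200 612'

print(bool(LINE.match(broken)))  # False -- grok silently drops this line
print(bool(LINE.match(fixed)))   # True
```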

I hope this helps somebody!
