Inputs.tail is not flushing data unless using the --once option

Hi there!

I’ve been working on collecting Docker daemon logs from an instance using inputs.tail with a grok pattern that matches the log lines from dockerd and containerd. I’ve been testing with /path/to/telegraf/executable --config /path/to/my/telegraf/configfile --once, occasionally adding the --debug flag. With this and an output to a file it has been working well, but if I run Telegraf with its usual configuration, no logs are sent to Graylog, their final destination. As I said, if I run Telegraf with the --once option it works as expected and I can see the logs in Graylog.

This is my inputs configuration:

[[inputs.tail]]
  name_override = "userdata_logs"
  files = ["/first/file.log"]
  from_beginning = true
  pipe = false
  watch_method = "inotify"
  character_encoding = "utf-8"
  data_format = "grok"
  grok_patterns = ["%{GREEDYDATA:message}"]
  [inputs.tail.tags]
    metric_type = "logs"
[[inputs.tail]]
  name_override = "docker_daemon_journal"
  files = ["/var/log/messages"]
  from_beginning = true
  pipe = false
  watch_method = "inotify"
  character_encoding = "utf-8"
  data_format = "grok"
  grok_patterns = ['%{SYSLOGTIMESTAMP:timestamp}\s*%{SYSLOGHOST:hostname}\s*%{WORD:process}\s*:\s*time="%{TIMESTAMP_ISO8601:log_timestamp}"\s*level=%{LOGLEVEL:log_level}\s*msg="%{GREEDYDATA:message}"']
  [inputs.tail.tags]
    metric_type = "logs"

All other inputs and outputs are working as expected except these two. These are the logs when using --once --debug:

2023-07-27T22:27:44Z D! [agent] Connecting outputs
2023-07-27T22:27:44Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2023-07-27T22:27:44Z D! [agent] Successfully connected to outputs.influxdb_v2
2023-07-27T22:27:44Z D! [agent] Starting service inputs
2023-07-27T22:27:44Z E! [agent] Starting input inputs.syslog: listen udp :5514: bind: address already in use
2023-07-27T22:27:44Z D! [inputs.tail] Tail added for "/first/file.log"
2023-07-27T22:27:44Z D! [inputs.tail] Tail added for "/var/log/messages"
2023-07-27T22:27:44Z E! [agent] Starting input inputs.influxdb_v2_listener: listen tcp :8186: bind: address already in use
##########
Many lines of "Grok no match found for:" after
##########
2023-07-27T22:27:46Z D! [agent] Stopping service inputs
2023-07-27T22:27:46Z D! [inputs.tail] Tail removed for "/first/file.log"
2023-07-27T22:27:46Z D! [inputs.tail] Tail removed for "/var/log/messages"
2023-07-27T22:27:46Z D! [agent] Input channel closed
2023-07-27T22:27:47Z D! [agent] Processor channel closed
2023-07-27T22:27:48Z D! [agent] Processor channel closed
2023-07-27T22:27:48Z I! [agent] Hang on, flushing any cached metrics before shutdown
2023-07-27T22:27:48Z D! [outputs.influxdb_v2] Wrote batch of 939 metrics in 139.440173ms
2023-07-27T22:27:48Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics

Could you help me with this?

@luroto your log file only shows an InfluxDB output but no Graylog one. If you start Telegraf with “the usual configuration”, are you sure you are using the same config file? If so, can you please show the full config?

Hi there, thanks for answering!

I’m sorry, I forgot to explain that. I have a Telegraf instance that acts as a proxy, and we send the logs through it to Graylog. On my machine we have the following output configuration pointing to that Telegraf instance:


[[outputs.influxdb_v2]]
  urls = ["http://url.to.my.telegraf.instance:port"]
  token = "telegraf-token-asdf123"
  organization = "primary"
  bucket = "my-awesome-bucket"

This is the Graylog output on the Telegraf instance that acts as a proxy:

[[outputs.graylog]]
  namepass = [
    "metric1*",
    "metric2*",
    "userdata_logs*",
    "docker_daemon_journal*"
  ]
  servers = [
    "udp://graylog-url:port"
  ]

Sorry, I’m a bit confused by the setup and the limited insight from the configs and logs. So what you are saying is that you have a first Telegraf instance A on host hostA that scrapes the log and sends it to a second Telegraf instance B on hostB, which then distributes the data to Graylog (among others)?

You are further saying that if you use --once on Telegraf A you see the data in Graylog while you don’t see it if you omit the --once. Is this correct?

If so, please check your flush_interval on Telegraf A. Furthermore, you should check the log to see if metrics are flushed at all on the first instance. If all looks good there, check what is arriving on Telegraf B.
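One way to do the log check on Telegraf A, as a minimal sketch: filter the agent log down to the lines that show tailing and flushing. The helper name flush_lines is made up for illustration, and the patterns are taken from the debug output posted earlier in this thread. Note that the "Wrote batch" and "Buffer fullness" lines are debug-level, so they only show up when debug logging is enabled (for example with the --debug flag).

```shell
# Hypothetical helper: keep only the log lines that show the tail input
# being set up and the outputs writing batches. Reads the log from stdin.
flush_lines() {
  grep -E '\[inputs\.tail\]|Wrote batch|Buffer fullness'
}

# Example usage (assumes the systemd unit is named "telegraf"):
#   journalctl -u telegraf -f | flush_lines
# or, when running in the foreground without --once:
#   /path/to/telegraf/executable --config /path/to/my/telegraf/configfile --debug 2>&1 | flush_lines
```

If batches are being written on Telegraf A, the next place to look is what arrives on Telegraf B.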

So what you are saying is you have a first Telegraf instance A on host hostA that scrapes the log and sends it to a second Telegraf instance B on hostB which then distributes the data to Graylog (among others).

It’s as you say. And yes, I can see the logs in Graylog only when I use the --once option.

Regarding the flush_interval on Telegraf A: I can’t see any agent configuration, so I assume it is using the default values. It is strange, because the other inputs (syslog, docker, mem, among others) are flushing as expected, although some of them have a timeout declared in telegraf.conf. Maybe the flush_interval is the default one?

I’ve checked with sudo systemctl status telegraf and the output is similar to this:
Jul 28 19:39:23 hostname telegraf[21164]: 2023-07-28T19:39:23Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"hostname", Flush Interval:10s

I’ve found my issue! The user that runs Telegraf didn’t have access to the tailed files. Once I set the proper permissions, it worked.
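For anyone hitting the same symptom, a quick way to confirm a permissions problem is to test readability as the service user. This is a minimal sketch: the function name check_readable is made up for illustration, and the service user name telegraf is an assumption (check which user actually runs the service on your system).

```shell
# Hypothetical helper: report whether the current user can read each file
# passed as an argument. Prints "OK <file>" or "NO ACCESS <file>" and
# returns non-zero if at least one file is not readable.
check_readable() {
  rc=0
  for f in "$@"; do
    if [ -r "$f" ]; then
      echo "OK $f"
    else
      echo "NO ACCESS $f"
      rc=1
    fi
  done
  return "$rc"
}

# Example usage, assuming the function is saved in check_readable.sh with a
# final line `check_readable "$@"`, and the service user is "telegraf":
#   sudo -u telegraf sh check_readable.sh /first/file.log /var/log/messages
```

Running the check as the service user matters here: a manual run under a different account may be able to read files that the service account cannot.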