Telegraf does not write its state to the statefile when using the file input plugin

Hi, I am using the file input plugin. After restarting the Telegraf service, all files are read again, which produces duplicates.
I found the statefile setting in the [agent] section.
But Telegraf is not writing its state (a checkpoint of how far it has read in each file) to the statefile.
Am I using the correct configuration? Please suggest a solution.
Configuration:
Telegraf.conf

[agent]
  collection_jitter = "0s"
  debug = true
  flush_interval = "60s"
  flush_jitter = "0s"
  interval = "60s"
  logfile = ""
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = false
  precision = ""
  quiet = false
  round_interval = true
  hostname = "dev"
  statefile = "/usr/share/telegraf/data/statefile"

InputFile.conf

[[inputs.file]]
  alias = "lcmfile"
  files = ["/var/log/containers/lcm-service*"]
  data_format = "grok"
  grok_patterns = ['{"log":"%{GREEDYDATA:log_message}","stream":"%{WORD:stream}","time":"%{TIMESTAMP_ISO8601:timestamp}"}']
  #grok_patterns = ['%{TIMESTAMP_ISO8601:log_timestamp} %{DATA:log_source} %{WORD:log_level} %{GREEDYDATA:log}']
  tags = {host_ip="$HOSTNAME",lcm-log=""}
  file_tag = "filepath"

[[processors.regex]]
  namepass = ["file"]
  tagpass = ["lcm-log"]
  alias = "regex_lcm"
  [[processors.regex.tags]]
    key = "filepath"
    pattern = "^(.*?)_.*?log"
    replacement = "${1}"
    result_key = "podname"
    append = false

[[outputs.elasticsearch]]
  urls = ["http://opensearch-master:9200"]
  index_name = "{{podname}}-%Y.%m.%d"
  username = "admin"
  password = "admin"
  namepass = ["file"]
  metric_batch_size = 100
  tagpass = ["lcm-log"]

Logs:

2023-10-26T12:41:07Z I! Loading config: /etc/telegraf/telegraf.conf
2023-10-26T12:41:07Z I! Loading config: /etc/telegraf/telegraf.d/baremetal-metrics.conf
2023-10-26T12:41:07Z I! Loading config: /etc/telegraf/telegraf.d/bmc-log.conf
2023-10-26T12:41:07Z I! Loading config: /etc/telegraf/telegraf.d/lcm-log.conf
2023-10-26T12:41:07Z I! Loading config: /etc/telegraf/telegraf.d/telegraf.conf
2023-10-26T12:41:07Z I! Starting Telegraf 1.28.2 brought to you by InfluxData the makers of InfluxDB
2023-10-26T12:41:07Z I! Available plugins: 240 inputs, 9 aggregators, 29 processors, 24 parsers, 59 outputs, 5 secret-stores
2023-10-26T12:41:07Z I! Loaded inputs: cpu disk diskio file (5x) kernel mem net processes swap system
2023-10-26T12:41:07Z I! Loaded aggregators: 
2023-10-26T12:41:07Z I! Loaded processors: regex
2023-10-26T12:41:07Z I! Loaded secretstores: 
2023-10-26T12:41:07Z I! Loaded outputs: elasticsearch (2x) prometheus_client
2023-10-26T12:41:07Z I! Tags enabled: host=dev
2023-10-26T12:41:07Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"dev", Flush Interval:1m0s
2023-10-26T12:41:07Z D! [agent] Initializing plugins
2023-10-26T12:41:07Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead
2023-10-26T12:41:07Z D! [agent] Initializing plugin states
2023-10-26T12:41:07Z D! [agent] Connecting outputs
2023-10-26T12:41:07Z D! [agent] Attempting connection to [outputs.prometheus_client]
2023-10-26T12:41:07Z I! [outputs.prometheus_client] Listening on http://[::]:9273/metrics
2023-10-26T12:41:07Z D! [agent] Successfully connected to outputs.prometheus_client
2023-10-26T12:41:07Z D! [agent] Attempting connection to [outputs.elasticsearch]

Please help me understand the problem and provide the correct configuration.

Hello @Vivek_parody,
I apologize, I’m not quite sure I understand the problem. I don’t see any errors in the logs.

The file plugin parses the complete contents of a file. It doesn’t have any checkpoints.

Perhaps the tail input plugin is closer to what you’re looking for?

Thanks @Anaisdg for the response.
Yes, I can use the tail plugin, but I think it has one limitation: it does not read files that are created after the Telegraf service starts. That is why I am using the file input plugin. Is there a solution for this issue with the tail plugin?

@Vivek_parody I’m not sure I follow what your goal is. Are you trying to read new files as they come in? If so, I would look at the directory monitor plugin.

Hey @jpowers, my goal is to read the Kubernetes pod logs, which are written to the directory /var/log/containers/*. If I use the file plugin, it does not maintain state. If I use the tail plugin, it does not read new files created after the Telegraf service starts.
If you are suggesting the directory monitor plugin, will it solve the above challenges?

I am still not clear why you are trying to use the state file. Are you constantly reloading telegraf?

If I use the file plugin, it does not maintain state.

The file plugin reads the entire file at every collection interval. When reading logs this is probably not what you want. It does not save state.

The tail plugin effectively tails a file: it reads it from the beginning at first, and then at each collection interval reads only new lines. It does save state when the plugin is shut down safely. A sudden shutdown will not save state.

The directory monitor plugin will monitor files as they come in, process them, and move on. I don’t believe this plugin keeps state either.

Each has its own use cases, even though they are very similar.
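
For illustration, here is a minimal sketch of what switching the configuration from the original post to the tail plugin might look like. It reuses the glob, grok pattern, and statefile path posted above. Using path_tag in place of file_tag is an assumption to verify against the tail plugin documentation for your Telegraf version; this is not a tested setup.

```toml
[agent]
  # State is written here on a clean shutdown (path taken from the original post).
  statefile = "/usr/share/telegraf/data/statefile"

[[inputs.tail]]
  alias = "lcmfile"
  # Same glob as the original inputs.file section.
  files = ["/var/log/containers/lcm-service*"]
  # "inotify" or "poll"
  watch_method = "inotify"
  data_format = "grok"
  grok_patterns = ['{"log":"%{GREEDYDATA:log_message}","stream":"%{WORD:stream}","time":"%{TIMESTAMP_ISO8601:timestamp}"}']
  # Assumption: tail uses path_tag rather than file_tag, and the tag value may
  # be the path of the file rather than its base name, so the regex processor
  # pattern that derives "podname" may need adjusting.
  path_tag = "filepath"
```

Metrics from this input are likely to be named tail rather than file, so any namepass = ["file"] filters on the processor and output would probably need updating as well.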

Thanks for your response @jpowers,
Telegraf may go down in various cases, e.g. when it is OOM-killed, or when a config update requires restarting the Telegraf instance. I think saving state is basic functionality that it should provide.
Since the tail plugin does not read new files, I cannot use it either.
Can I request that save-state functionality be added to the Telegraf file input plugin?

I think you are looking for something similar to filebeat/fluentbit to read logs.

Yes, exactly: an alternative to filebeat/fluentbit.

A restart would save state, as we have time to clean up. An OOM kill is not in scope. That would be a sudden shutdown where Telegraf gets killed, and it is not a scenario we are looking to support.

Since the tail plugin does not read new files, I cannot use it either.

Tail reads new files at each gather interval.

Hi @jpowers,

A restart would save state, as we have time to clean up. An OOM kill is not in scope. That would be a sudden shutdown where Telegraf gets killed, and it is not a scenario we are looking to support.

I have deployed Telegraf in a Kubernetes environment as a pod. When I delete the pod/container, it does not save its state to the statefile. I have cross-checked this many times. Maybe Telegraf treats it as a sudden shutdown.

Tail reads new files at each gather interval.

Can you please provide the configuration for the gather interval?

That is probably the case

Can you please provide the configuration for the gather interval?

By default this is every 10 seconds. Otherwise it can be set in the agent settings:

[agent]
   interval = "10s"

That is probably the case

@jpowers Okay, so I cannot use the file plugin for my case.

Sorry for asking the same thing again, but just to confirm: are you sure that the Telegraf tail plugin reads new files created after it starts? I have tried the tail plugin and it is not reading new files.

The gather function is called at each interval, and it goes and looks for new files.

The tailNewFiles function will glob based on the file patterns you provided. One other item: if you are on Windows, the watch_method may need to be poll.
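
As a rough sketch, switching the watch method is a single setting on the tail input. The files glob here is copied from the original post. Whether poll is actually needed depends on the platform and filesystem, so treat this as something to try rather than a required change.

```toml
[[inputs.tail]]
  files = ["/var/log/containers/lcm-service*"]
  # "inotify" is the default; "poll" checks the files periodically instead
  # of relying on filesystem notifications.
  watch_method = "poll"
```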

Thanks @jpowers, the tail plugin solves my problem: it writes its state to the statefile and also tails new files.