Telegraf fails to write to InfluxDB randomly

biren · August 18, 2017, 8:03pm

Hi, for some reason Telegraf keeps failing after a random amount of time and has to be restarted.

What seems to happen is that the log fills up with “unable to parse” errors due to “bad timestamp”, but that makes no sense: the data looks good in the log message, and I’m seeing the data actually end up in Influx. So I’m not sure why I’m getting those errors. Then, after a few minutes, it dies altogether and nothing makes it into Influx anymore. I get the error: E! Error writing to output [influxdb]: Could not write to any InfluxDB server in cluster

I can restart Telegraf and it works fine for a while until it dies again. The failure isn’t consistent across my pool of servers, even though all of them run the same Telegraf config file (the only difference is the hostname that’s set). Any ideas?

I wasn’t having any problems until I introduced this log parser. Here’s its configuration:

[[inputs.logparser]]
   files = ["/opt/applications/core/current/log/sidekiq.log"]
   from_beginning = false
   [inputs.logparser.grok]
     patterns = ["%{CUSTOM_SIDEKIQ_LOG}"]
     measurement = "sidekiq_log"
     custom_patterns = '''
       THREAD_ID (?:TID-\S+)
       JOB_ID (?:JID-\S+)
       CUSTOM_SIDEKIQ_LOG %{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02T15:04:05.000Z"} %{POSINT:pid:int} %{THREAD_ID:tid} %{NOTSPACE:worker:tag} %{JOB_ID:jid} %{LOGLEVEL:severity:tag}: %{GREEDYDATA:message}
     '''
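
One way to narrow down “bad timestamp” errors is to run a few lines from sidekiq.log through a local check before they reach Telegraf. The sketch below is only a rough Python stand-in for the CUSTOM_SIDEKIQ_LOG pattern above, not Telegraf's grok engine: the regex is stricter than grok's TIMESTAMP_ISO8601 and LOGLEVEL, and the function name and the sample line are made up for illustration.

```python
import re
from datetime import datetime

# Approximation of the CUSTOM_SIDEKIQ_LOG grok pattern from the config above.
# Assumption: timestamps always carry exactly three fractional digits and a
# literal "Z", matching the ts-"2006-01-02T15:04:05.000Z" layout.
SIDEKIQ_LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) "
    r"(?P<pid>[1-9][0-9]*) "
    r"(?P<tid>TID-\S+) "
    r"(?P<worker>\S+) "
    r"(?P<jid>JID-\S+) "
    r"(?P<severity>[A-Z]+): "
    r"(?P<message>.*)"
)


def parse_sidekiq_line(line):
    """Return a dict of fields, or None if the line or its timestamp does not parse."""
    match = SIDEKIQ_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # strptime counterpart of the Go layout "2006-01-02T15:04:05.000Z"
        fields["timestamp"] = datetime.strptime(
            fields["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
        )
    except ValueError:
        return None
    fields["pid"] = int(fields["pid"])
    return fields


# Hypothetical sample line, not taken from a real log
sample = "2017-08-18T20:03:11.123Z 4242 TID-abc123 ExampleWorker JID-def456 INFO: done"
```

Lines for which parse_sidekiq_line returns None (for example, lines without a worker or JID field) are the ones worth comparing against the lines quoted in the Telegraf error messages.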

I’m also seeing this error on some other servers: E! InfluxDB Output Error: Response Error: Status Code [400], expected [204], [partial write: unable to parse 'sidekiq_log, … (I’m leaving out the rest of the error, as it contains details I don’t feel like scrubbing before posting, haha.)

daniel · August 18, 2017, 8:22pm

What version of Telegraf is this with?

biren · August 18, 2017, 8:46pm

Sorry, should’ve included that… It’s 1.3.5

daniel · August 18, 2017, 9:15pm

This looks like a bug. Can you open a new issue?

biren · August 21, 2017, 1:12pm

Sure, issue opened!
