Telegraf fails to write to InfluxDB randomly

Hi, Telegraf keeps failing after a random amount of time and has to be restarted.

What seems to happen is that the log fills up with “unable to parse” errors due to “bad timestamp”. That makes no sense, since the data in the log message looks good and, on top of that, I’m seeing the data actually end up in Influx, so I’m not sure why I’m getting those errors. Then, after a few minutes, Telegraf dies altogether and nothing makes it into Influx anymore. I get the error E! Error writing to output [influxdb]: Could not write to any InfluxDB server in cluster

I can restart Telegraf and it works fine for a while until it dies again. Across my pool of servers the failure isn’t very consistent, even though all of the servers are running the same Telegraf config file (the only difference being the hostname that’s set). Any ideas?

I wasn’t having any problems until I introduced this log parser. Here’s its configuration:

[[inputs.logparser]]
   files = ["/opt/applications/core/current/log/sidekiq.log"]
   from_beginning = false
   [inputs.logparser.grok]
     patterns = ["%{CUSTOM_SIDEKIQ_LOG}"]
     measurement = "sidekiq_log"
     custom_patterns = '''
       THREAD_ID (?:TID-\S+)
       JOB_ID (?:JID-\S+)
       CUSTOM_SIDEKIQ_LOG %{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02T15:04:05.000Z"} %{POSINT:pid:int} %{THREAD_ID:tid} %{NOTSPACE:worker:tag} %{JOB_ID:jid} %{LOGLEVEL:severity:tag}: %{GREEDYDATA:message}
     '''
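
One way to rule the timestamp layout in or out is to check sample log timestamps against it outside of Telegraf. The ts-"..." modifier in the pattern above takes a Go reference-time layout, so a minimal Go sketch can test it directly. The sample values and the checkTimestamp helper below are hypothetical and are not taken from the original log. This only checks the layout and does not by itself explain the write failures.

```go
package main

import (
	"fmt"
	"time"
)

// sidekiqLayout is the Go reference-time layout copied from the
// ts-"..." modifier in the grok pattern above.
const sidekiqLayout = "2006-01-02T15:04:05.000Z"

// checkTimestamp (hypothetical helper) returns nil when raw parses
// under sidekiqLayout, and the parse error otherwise.
func checkTimestamp(raw string) error {
	_, err := time.Parse(sidekiqLayout, raw)
	return err
}

func main() {
	// Hypothetical sample values; substitute timestamps from the real log.
	samples := []string{
		"2017-08-18T14:22:33.123Z", // three fractional digits, as the layout expects
		"2017-08-18T14:22:33Z",     // no fractional seconds
	}
	for _, s := range samples {
		if err := checkTimestamp(s); err != nil {
			fmt.Printf("%q: rejected (%v)\n", s, err)
			continue
		}
		fmt.Printf("%q: ok\n", s)
	}
}
```

If every timestamp taken from the real sidekiq.log passes this check, the layout is probably not the source of the “bad timestamp” errors, and the problem is more likely elsewhere in the written line.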

I’m also seeing this error on some other servers: E! InfluxDB Output Error: Response Error: Status Code [400], expected [204], [partial write: unable to parse 'sidekiq_log, (I’m leaving out the rest of the error, as it contains details I don’t feel like scrubbing before posting, haha.)

What version of Telegraf is this with?

Sorry, should’ve included that… It’s 1.3.5

This looks like a bug. Can you open a new issue?

Sure, issue opened!