Telegraf inputs delayed when one input produces partial writes?

telegraf
#1

In InfluxDB, I have a default retention policy set on the ‘telegraf’ database like this:

CREATE RETENTION POLICY realtime ON telegraf DURATION 4w REPLICATION 1 DEFAULT

For some reason, I have a process that emits a few events per second with timestamps in the past, well beyond the 4-week retention policy. Telegraf complains in the syslog with messages like these:

E! InfluxDB Output Error: Response Error: Status Code [400], expected [204], [partial write: points beyond retention policy dropped=23]
E! Error writing to output [influxdb]: Could not write to any InfluxDB server in cluster
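
To make the drop condition concrete, here is a minimal Python sketch of the check as I understand it: a point is rejected when its timestamp is older than "now" minus the retention duration. It is only an approximation for illustration, not InfluxDB's actual code (the server's real cutoff may differ slightly). The names is_beyond_retention and split_points are hypothetical, and the sample points are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: mirrors the 4w DURATION of the 'realtime' retention policy above.
RETENTION = timedelta(weeks=4)


def is_beyond_retention(point_time, now, retention=RETENTION):
    """Return True if point_time is older than (now - retention)."""
    return point_time < now - retention


def split_points(points, now, retention=RETENTION):
    """Split (timestamp, payload) pairs into (accepted, dropped) lists."""
    accepted, dropped = [], []
    for point_time, payload in points:
        if is_beyond_retention(point_time, now, retention):
            dropped.append((point_time, payload))
        else:
            accepted.append((point_time, payload))
    return accepted, dropped


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical sample points: one recent, one five weeks old.
    points = [
        (now - timedelta(days=1), "cpu usage_idle=97.5"),
        (now - timedelta(weeks=5), "events value=1"),
    ]
    accepted, dropped = split_points(points, now)
    print(f"accepted={len(accepted)} dropped={len(dropped)}")
```

A check like this in the emitting process would keep the stale points from ever reaching http_listener, but it does not explain why the other inputs are delayed.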

So far, that behaviour is expected. But while that process is running, all the other metric values (cpu, for example) reach InfluxDB with a 4 to 5 minute delay. No values appear to be lost in the end, but Chronograf shows a red status for the host running that process (not to mention a dashboard that is 5 minutes out of date).

Can one input plugin delay the others, and if so, why? I’m using http_listener, and it delays all the usual plugins (cpu, disk, mem, …).

Can anyone comment on this behaviour?

#2

I posted GitHub issue #3144 with an environment to reproduce the problem.
DanielNelson took it from there, and there is now pull request #3155.
