Telegraf with multiple outputs: if one is down, none of them gets the data

Hi there,

I have a few things that don’t play well together.

I send data with Telegraf to multiple outputs (a production instance, a test instance, and a 2.0 beta instance).

Now I see two effects I don’t understand:

  1. If one output is down for some reason, the other outputs don’t seem to get the data either.

  2. There is no backfill: my understanding was that, as long as metric_buffer_limit isn’t saturated, write requests get buffered until the output is back again.
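The buffering expected in point 2 can be sketched as follows. This is a hypothetical per-output buffer, not Telegraf's implementation: each output holds up to `limit` metrics (the role `metric_buffer_limit` plays), evicts the oldest one when full, and only clears its buffer after a successful write, so an output that comes back receives the backlog. The names `OutputBuffer`, `add` and `flush` are illustrative.

```python
from collections import deque


class OutputBuffer:
    """Hypothetical per-output metric buffer (illustration only)."""

    def __init__(self, limit):
        # deque(maxlen=...) discards the oldest item when a new one is appended to a full deque
        self.metrics = deque(maxlen=limit)
        self.dropped = 0

    def add(self, metric):
        if len(self.metrics) == self.metrics.maxlen:
            self.dropped += 1  # the oldest metric is about to be evicted
        self.metrics.append(metric)

    def flush(self, write):
        """Try to write the whole buffer; keep it if the write fails."""
        batch = list(self.metrics)
        try:
            write(batch)
        except OSError:
            return False  # output still down: retry on the next flush
        self.metrics.clear()
        return True


def down(batch):
    raise ConnectionRefusedError("output is down")


received = []
buf = OutputBuffer(limit=3)
for i in range(5):
    buf.add(i)
    buf.flush(down)  # every flush fails while the output is down

ok = buf.flush(received.extend)  # output is back: the backlog is written
print(received)  # [2, 3, 4]
```

With a limit of 3 and five metrics arriving during the outage, the two oldest are dropped and the remaining three are backfilled once the write succeeds.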

That is not the case. If I shut down one InfluxDB instance from my output list, none of the other instances get data, and no backfill happens either.

This is what the Telegraf config looks like:

```toml
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "telegraf"
  namedrop = ["_test"]
  [outputs.influxdb.tagdrop]
    influxdb_database = ["*"]

[[outputs.influxdb]]
  urls = ["http://192.168.x.x:8086"]
  database = "telegraf"
  namedrop = ["_test"]
  metric_buffer_limit = 100000
  [outputs.influxdb.tagdrop]
    influxdb_database = ["*"]
```

What am I missing here?

Just an update on this:

It happens when one of the outputs runs into a timeout, i.e. the target stops responding instead of rejecting the connection. If a target output is simply shut down and the server replies with “connection refused”, the metric buffer works as intended.
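The difference between the two failure modes can be reproduced locally with Python's `socket` module. In this sketch (an illustration, not Telegraf code), a closed port is refused right away, while a listener that completes the TCP handshake but never replies stands in for a peer that stops answering: the client waits until its own timeout expires, and with no timeout at all `recv` would block indefinitely. The helper name `probe` is hypothetical.

```python
import socket
import time


def probe(port, timeout):
    """Connect to 127.0.0.1:port, wait for one byte, and report the outcome."""
    start = time.monotonic()
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
            s.recv(1)  # the socket timeout also applies to this read
            outcome = "reply"
    except ConnectionRefusedError:
        outcome = "refused"
    except socket.timeout:
        outcome = "timeout"
    return outcome, time.monotonic() - start


# A port nothing listens on: the connection is rejected immediately.
tmp = socket.socket()
tmp.bind(("127.0.0.1", 0))
closed_port = tmp.getsockname()[1]
tmp.close()

# A listener that accepts the handshake (via the backlog) but never answers.
silent = socket.socket()
silent.bind(("127.0.0.1", 0))
silent.listen(1)
silent_port = silent.getsockname()[1]

refused = probe(closed_port, timeout=3.0)
hung = probe(silent_port, timeout=0.5)
silent.close()

print(refused[0], f"{refused[1]:.2f}s")
print(hung[0], f"{hung[1]:.2f}s")
```

The refused case returns almost instantly, while the silent peer costs the full timeout on every attempt.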

I worked around this by putting an instance of influxdb-relay in between (on the machine where Telegraf is running). Timeouts on the destination output no longer block the other outputs.
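A minimal sketch of why a hung write can stall sibling outputs and why a relay in between helps. This does not reproduce Telegraf's actual flush logic; it only contrasts writing to outputs one after another in a single loop with giving each destination its own queue and worker thread, which is roughly the isolation a local relay provides. All function names here are hypothetical.

```python
import queue
import threading
import time


def serial_flush(outputs, batch):
    """Write to each output in turn: one slow write delays every later one."""
    for write in outputs:
        write(batch)


def start_workers(outputs):
    """One queue and one daemon thread per output, so a hung output only
    delays its own queue."""
    queues = []
    for write in outputs:
        q = queue.Queue()

        def worker(q=q, write=write):  # default args bind the current q/write
            while True:
                batch = q.get()
                write(batch)
                q.task_done()

        threading.Thread(target=worker, daemon=True).start()
        queues.append(q)
    return queues


def fan_out(queues, batch):
    for q in queues:
        q.put(batch)  # unbounded queue: put() never blocks


received = {}


def fast(batch):
    received["fast"] = time.monotonic()


def hung(batch):
    time.sleep(1.0)  # stands in for a write stuck waiting on a timeout


start = time.monotonic()
serial_flush([hung, fast], ["cpu"])
serial_delay = received["fast"] - start

queues = start_workers([hung, fast])
start = time.monotonic()
fan_out(queues, ["cpu"])
queues[1].join()  # wait only for the fast output's queue to drain
independent_delay = received["fast"] - start

print(f"serial: {serial_delay:.1f}s, independent: {independent_delay:.1f}s")
```

In the serial loop the healthy output waits for the hung one; with per-output workers it is written immediately.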

This should definitely not happen. What version of Telegraf are you using?

1.12.4; it happened with 1.12.1 too.

If I gracefully shut down one output, it works as intended. If an output goes away because of an unexpected network failure (VPN link down), it hangs, and all other outputs hang too. This is 100% reproducible.

Can you run Telegraf with --debug and then collect the logs when you shut down one of the outputs?

I will once I get back; it will take a few days.