Update from 1.24.4 to 1.25 breaks output to graphite

After my update from telegraf 1.24.4 to 1.25, the agent was no longer able to send metrics to our graphite instance (running on docker).
The only message shown in our logs is:
2023-01-03T14:28:30Z E! [agent] Error writing to outputs.graphite: could not write to any Graphite server in cluster

I was able to fix it by reverting to 1.24.4.

Please advise.

Can you give a short example in order to quickly reproduce this?

Hi @jpowers, my setup is very straightforward. I'm currently running telegraf 1.24.4 on an AWS EC2 instance with Amazon Linux 2. Graphite is running on the same instance, in a docker container.

6ff757d94a56   graphiteapp/graphite-statsd:1.1.10-1   "/entrypoint"            13 days ago     Up 13 days     0.0.0.0:2003-2004->2003-2004/tcp, :::2003-2004->2003-2004/tcp, 2013-2014/tcp, 127.0.0.1:2023-2024->2023-2024/tcp, 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp, 8125/tcp, 127.0.0.1:8126->8126/tcp, 0.0.0.0:8125->8125/udp, :::8125->8125/udp, 0.0.0.0:8800->80/tcp, :::8800->80/tcp, 0.0.0.0:8443->443/tcp, :::8443->443/tcp   graphite.service

The relevant part of my telegraf config:

[[outputs.graphite]]
  ## TCP endpoint for your graphite instance.
  ## If multiple endpoints are configured, output will be load balanced.
  ## Only one of the endpoints will be written to with each iteration.
  servers = ["192.168.0.108:2003"] # dash on VPC
  ## Prefix metrics name
  prefix = "telegraf"
  ## Graphite output template
  ## see https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  template = "host.tags.measurement.field"

  ## Enable Graphite tags support
  #graphite_tag_support = true

  ## timeout in seconds for the write connection to graphite
  timeout = 2

As stated before, this works with telegraf 1.24.4. As soon as I upgrade to 1.25, the telegraf agent starts complaining.

A quick look at the diff shows no changes to that plugin.

I ran graphite with this docker command:

docker run -p 2003:2003 -p 80:80 graphiteapp/graphite-statsd:1.1.10-1

And validated I could connect to port 80 and 2003:

❯ telnet localhost 80
Trying ::1...
Connected to localhost.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
❯ telnet localhost 2003
Trying ::1...
Connected to localhost.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
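
The telnet check above can also be scripted, and extended to push one line in Graphite's plaintext format (`<path> <value> <unix-timestamp>`) to the listener. This is only a rough sketch: the function names and the `telegraf.probe.connectivity` metric path are made up for illustration, and it assumes bash (for `/dev/tcp`) and coreutils `timeout` are available.

```shell
#!/usr/bin/env bash
# Sketch: check that a Graphite plaintext listener accepts a TCP connection
# and one metric line. Function names and the metric path are hypothetical.

# Format one line of the Graphite plaintext protocol: "<path> <value> <unix-ts>".
# The timestamp defaults to the current time when omitted.
graphite_line() {
  local path=$1 value=$2 ts=${3:-$(date +%s)}
  printf '%s %s %s\n' "$path" "$value" "$ts"
}

# Send one test metric through bash's built-in /dev/tcp redirection.
# Returns non-zero if the connection or the write fails, or if it
# takes longer than 2 seconds (the same timeout as in the config above).
graphite_probe() {
  local host=$1 port=${2:-2003}
  graphite_line "telegraf.probe.connectivity" 1 \
    | timeout 2 bash -c "cat > /dev/tcp/$host/$port"
}
```

For example, `graphite_probe 192.168.0.108 2003 && echo reachable` run from the host where telegraf runs. A successful probe only shows that the port accepted the connection and the write, not that Graphite stored the metric.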

And this telegraf config:

[[outputs.graphite]]
  servers = ["127.0.0.1:2003"]
  prefix = "telegraf"
  template = "host.tags.measurement.field"
  timeout = 2

[[inputs.cpu]]

./telegraf --config config.toml --debug
2023-01-17T15:01:56Z I! Starting Telegraf 1.26.0-a586101d
2023-01-17T15:01:56Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-01-17T15:01:56Z I! Loaded inputs: cpu
2023-01-17T15:01:56Z I! Loaded aggregators: 
2023-01-17T15:01:56Z I! Loaded processors: 
2023-01-17T15:01:56Z I! Loaded secretstores: 
2023-01-17T15:01:56Z I! Loaded outputs: graphite
2023-01-17T15:01:56Z I! Tags enabled: host=ryzen
2023-01-17T15:01:56Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ryzen", Flush Interval:10s
2023-01-17T15:01:56Z D! [agent] Initializing plugins
2023-01-17T15:01:56Z D! [agent] Connecting outputs
2023-01-17T15:01:56Z D! [agent] Attempting connection to [outputs.graphite]
2023-01-17T15:01:56Z D! [agent] Successfully connected to outputs.graphite
2023-01-17T15:01:56Z D! [agent] Starting service inputs
2023-01-17T15:02:06Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-01-17T15:02:16Z D! [outputs.graphite] Wrote batch of 33 metrics in 11.531671ms
2023-01-17T15:02:16Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-01-17T15:02:26Z D! [outputs.graphite] Wrote batch of 33 metrics in 12.392485ms
2023-01-17T15:02:26Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-01-17T15:02:36Z D! [outputs.graphite] Wrote batch of 33 metrics in 11.32413ms
2023-01-17T15:02:36Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-01-17T15:02:46Z D! [outputs.graphite] Wrote batch of 33 metrics in 10.980473ms
2023-01-17T15:02:46Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-01-17T15:02:56Z D! [outputs.graphite] Wrote batch of 33 metrics in 11.358861ms
2023-01-17T15:02:56Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2023-01-17T15:03:06Z D! [outputs.graphite] Wrote batch of 33 metrics in 11.48569ms
2023-01-17T15:03:06Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics

2023-01-03T14:28:30Z E! [agent] Error writing to outputs.graphite: could not write to any Graphite server in cluster

Can you provide more logs than the one error message? Did this work at all on v1.25.0 and then just stop?
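
A possible way to collect more than the single error line is sketched below. It assumes a package install managed by systemd with the unit name `telegraf` and the config at `/etc/telegraf/telegraf.conf`. Both are assumptions about the reporter's machine, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: gather Telegraf logs around the failure.
# Assumes a systemd unit called "telegraf" and the default package config path.

# Print the service's journal entries from a recent window (default: last hour).
telegraf_recent_logs() {
  journalctl -u telegraf --since "${1:-1 hour ago}" --no-pager
}

# Run one gather-and-flush cycle in the foreground with debug logging,
# so any error from the output write is printed next to the debug lines.
telegraf_debug_once() {
  telegraf --config "${1:-/etc/telegraf/telegraf.conf}" --debug --once
}
```

Calling `telegraf_debug_once` as the same user the service runs as keeps file permissions and environment comparable to the failing agent.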

No, it never worked on 1.25.x. I moved from 1.24 to 1.25 and it stopped working.

OK, this is weird. I just cleared my versionlock and upgraded telegraf to 1.25.0-1 again … and it now works as if nothing had happened. I will need to check my other machines as well before declaring victory.

I have updated all the other instances with 1.25.0-1, and it appears that all is well.

Gremlins…