Hello, everybody!
For the past few days I have had a problem with the graphite output plugin in Telegraf. I am using Telegraf environment variables to specify the Graphite server details.
My /etc/default/telegraf is:
GRAPHITE_SERVER_ADDRESS='192.168.127.12'
GRAPHITE_SERVER_PORT='2023'
I’m not a Graphite expert, but the syntax looks OK, and it also seems that Telegraf can connect to Graphite:
2020-02-05T12:41:18Z D! [agent] Attempting connection to [outputs.graphite]
2020-02-05T12:41:18Z D! [agent] Successfully connected to outputs.graphite
{…}
2020-02-05T12:41:30Z E! [agent] Error writing to outputs.graphite: Could not write to any Graphite server in cluster
It may be related to something else like graphite authentication (a quick search shows the very same error as related to some authentication problems).
You can try to increase the Telegraf logging level by enabling “debug” (if not already enabled) but I’m not sure it will tell you something more.
If possible you should also check the logs on the graphite side.
I forgot to say that when I specify the server IP and port directly in the config file, everything works well:
servers = ["192.168.127.12:2023"]
or even:
servers = [ "${GRAPHITE_SERVER_ADDRESS}:${GRAPHITE_SERVER_PORT}", "192.168.127.12:2023" ]
That’s why I suspected that the environment variables aren’t being imported into the config file.
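One quick way to test that suspicion is to source the environment file in a subshell and print the server string it would produce. This is only a sketch: the function name preview_graphite_server is made up for illustration, and it assumes the file contains plain shell-style KEY='value' lines like the ones shown above. It shows what the file defines, not what the running Telegraf process actually received.

```shell
#!/bin/sh
# Print the "address:port" string defined by an environment file.
# Hypothetical helper; assumes shell-compatible KEY='value' assignments.
preview_graphite_server() {
    env_file="$1"
    (
        # Start from a clean slate so only the file's values are shown.
        unset GRAPHITE_SERVER_ADDRESS GRAPHITE_SERVER_PORT
        # Source in a subshell so the calling shell is left untouched.
        . "$env_file"
        printf '%s:%s\n' "${GRAPHITE_SERVER_ADDRESS:-}" "${GRAPHITE_SERVER_PORT:-}"
    )
}

# Example (use an absolute path):
# preview_graphite_server /etc/default/telegraf
```

If this prints only ":" or a half-empty string, the file itself does not define the variables as expected.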
Here are some lines of debug output for the above settings:
telegraf --debug
2020-02-05T16:08:09Z I! Starting Telegraf 1.13.1
2020-02-05T16:08:09Z I! Using config file: /etc/telegraf/telegraf.conf
2020-02-05T16:08:09Z I! Loaded inputs: tail
2020-02-05T16:08:09Z I! Loaded aggregators:
2020-02-05T16:08:09Z I! Loaded processors:
2020-02-05T16:08:09Z I! Loaded outputs: graphite
2020-02-05T16:08:09Z I! Tags enabled:
2020-02-05T16:08:09Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:10s
2020-02-05T16:08:09Z D! [agent] Initializing plugins
2020-02-05T16:08:09Z D! [agent] Connecting outputs
2020-02-05T16:08:09Z D! [agent] Attempting connection to [outputs.graphite]
2020-02-05T16:08:09Z D! [agent] Successfully connected to outputs.graphite
2020-02-05T16:08:09Z D! [agent] Starting service inputs
2020-02-05T16:08:09Z D! [inputs.tail] Tail added for "/app/src/console/runtime/telegraf-metrics.out"
2020-02-05T16:08:20Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:08:30Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:08:40Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:08:50Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:00Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:10Z D! [outputs.graphite] Wrote batch of 245 metrics in 18.32672ms
2020-02-05T16:09:10Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:20Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:30Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:40Z D! [outputs.graphite] Wrote batch of 7 metrics in 10.656501ms
2020-02-05T16:09:40Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:50Z D! [outputs.graphite] Wrote batch of 3 metrics in 10.489131ms
2020-02-05T16:09:50Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:10:00Z D! [outputs.graphite] Wrote batch of 10 metrics in 10.784975ms
2020-02-05T16:10:00Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:10:10Z D! [outputs.graphite] Wrote batch of 247 metrics in 15.364215ms
2020-02-05T16:10:10Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:10:20Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:10:30Z D! [outputs.graphite] Wrote batch of 9 metrics in 10.70936ms
2020-02-05T16:10:30Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
For roughly a month Telegraf worked well. A couple of days ago I discovered this issue and don’t know how to fix it.
This is what happened when I tried to see what is substituted for my variables by using them as the prefix for my metrics:
[[outputs.graphite]]
  # servers = ["${GRAPHITE_SERVER_ADDRESS}:${GRAPHITE_SERVER_PORT}"]
  servers = ["192.168.127.12:2023"]
  prefix = "${GRAPHITE_SERVER_ADDRESS}:${GRAPHITE_SERVER_PORT}"
  tagexclude = ["path"]
  timeout = 2
It seems like Telegraf picks up the environment variables only intermittently.
How are you starting Telegraf? Are you running it via the systemd service file? Double-check the permissions on /etc/default/telegraf as well: when run with systemd, Telegraf runs as the telegraf user and group.
I run Telegraf using:
service telegraf start
Permissions for /etc/default/telegraf are:
ls -l /etc/default/
-rw-r--r-- 1 root root 244 Aug 12 2017 supervisor
-rw-r--r-- 1 telegraf telegraf 69 Feb 7 07:14 telegraf
-rw-r--r-- 1 root root 1118 Jan 25 2018 useradd
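One thing worth ruling out: /etc/default/telegraf is typically loaded by the service manager (the packaged systemd unit or init script), not by the telegraf binary itself. A Telegraf started by hand from a shell, for example with telegraf --debug, would therefore not see those variables unless they are exported first. Below is a minimal sketch for exporting them for a single run, assuming the file contains shell-compatible assignments. The helper name run_with_env_file is hypothetical.

```shell
#!/bin/sh
# Run a command with every variable from an environment file exported to it.
# Hypothetical helper; assumes shell-compatible KEY='value' assignments.
run_with_env_file() {
    env_file="$1"
    shift
    (
        set -a            # auto-export every variable assigned from here on
        . "$env_file"     # load the assignments (use an absolute path)
        set +a
        "$@"              # run the requested command with that environment
    )
}

# Example: start Telegraf in the foreground with the service's variables
# run_with_env_file /etc/default/telegraf telegraf --debug
```

Because everything happens in a subshell, the variables do not leak into the calling shell, so a manual run and a service run can be compared side by side.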