Environment variables aren't imported into Telegraf conf file

Hello, everybody!
For the past few days I have had problems with the graphite output plugin in Telegraf. I use environment variables to pass the Graphite server address and port to Telegraf.
My /etc/default/telegraf is:
GRAPHITE_SERVER_ADDRESS='192.168.127.12'
GRAPHITE_SERVER_PORT='2023'

My graphite output plugin:
[[outputs.graphite]]
servers = ["${GRAPHITE_SERVER_ADDRESS}:${GRAPHITE_SERVER_PORT}"]
tagexclude = ["path"]
graphite_tag_support = true
timeout = 2

Recently I noticed errors in Telegraf. Below is part of the output of ‘telegraf --debug’:
2020-02-05T12:41:18Z I! Starting Telegraf 1.13.1
2020-02-05T12:41:18Z I! Using config file: /etc/telegraf/telegraf.conf
2020-02-05T12:41:18Z I! Loaded inputs: tail
2020-02-05T12:41:18Z I! Loaded aggregators:
2020-02-05T12:41:18Z I! Loaded processors:
2020-02-05T12:41:18Z I! Loaded outputs: graphite
2020-02-05T12:41:18Z I! Tags enabled:
2020-02-05T12:41:18Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:10s
2020-02-05T12:41:18Z D! [agent] Initializing plugins
2020-02-05T12:41:18Z D! [agent] Connecting outputs
2020-02-05T12:41:18Z D! [agent] Attempting connection to [outputs.graphite]
2020-02-05T12:41:18Z D! [agent] Successfully connected to outputs.graphite
2020-02-05T12:41:18Z D! [agent] Starting service inputs
2020-02-05T12:41:18Z D! [inputs.tail] Tail added for "/app/src/console/runtime/telegraf-metrics.out"
2020-02-05T12:41:30Z E! Graphite: Reconnecting and retrying:
2020-02-05T12:41:30Z D! [outputs.graphite] Buffer fullness: 10 / 10000 metrics
2020-02-05T12:41:30Z E! [agent] Error writing to outputs.graphite: Could not write to any Graphite server in cluster
2020-02-05T12:41:40Z E! Graphite: Reconnecting and retrying:
2020-02-05T12:41:40Z D! [outputs.graphite] Buffer fullness: 14 / 10000 metrics
2020-02-05T12:41:40Z E! [agent] Error writing to outputs.graphite: Could not write to any Graphite server in cluster
2020-02-05T12:41:50Z E! Graphite: Reconnecting and retrying:
2020-02-05T12:41:50Z D! [outputs.graphite] Buffer fullness: 14 / 10000 metrics
2020-02-05T12:41:50Z E! [agent] Error writing to outputs.graphite: Could not write to any Graphite server in cluster

I guess the environment variables aren’t being imported into the config file.

I’m not a Graphite expert, but the syntax looks OK, and it seems that Telegraf can connect to Graphite:

2020-02-05T12:41:18Z D! [agent] Attempting connection to [outputs.graphite]
2020-02-05T12:41:18Z D! [agent] Successfully connected to outputs.graphite
{…}
2020-02-05T12:41:30Z E! [agent] Error writing to outputs.graphite: Could not write to any Graphite server in cluster

It may be related to something else, such as Graphite authentication (a quick search shows the very same error reported alongside authentication problems).
You can try raising the Telegraf logging level by enabling “debug” (if not already enabled), but I’m not sure it will tell you anything more.
If possible you should also check the logs on the graphite side.

Does it work when you aren’t using environment variables, and just place the values directly into the configuration file?

I forgot to say that when I specify the server IP and port directly in the config file, everything works:
servers = ["192.168.127.12:2023"]
or even
servers = [
"${GRAPHITE_SERVER_ADDRESS}:${GRAPHITE_SERVER_PORT}",
"192.168.127.12:2023"
]

That is why I assumed the environment variables aren’t imported into the config file.
Here is part of the debug output for the settings above:
telegraf --debug
2020-02-05T16:08:09Z I! Starting Telegraf 1.13.1
2020-02-05T16:08:09Z I! Using config file: /etc/telegraf/telegraf.conf
2020-02-05T16:08:09Z I! Loaded inputs: tail
2020-02-05T16:08:09Z I! Loaded aggregators:
2020-02-05T16:08:09Z I! Loaded processors:
2020-02-05T16:08:09Z I! Loaded outputs: graphite
2020-02-05T16:08:09Z I! Tags enabled:
2020-02-05T16:08:09Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:10s
2020-02-05T16:08:09Z D! [agent] Initializing plugins
2020-02-05T16:08:09Z D! [agent] Connecting outputs
2020-02-05T16:08:09Z D! [agent] Attempting connection to [outputs.graphite]
2020-02-05T16:08:09Z D! [agent] Successfully connected to outputs.graphite
2020-02-05T16:08:09Z D! [agent] Starting service inputs
2020-02-05T16:08:09Z D! [inputs.tail] Tail added for "/app/src/console/runtime/telegraf-metrics.out"
2020-02-05T16:08:20Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:08:30Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:08:40Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:08:50Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:00Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:10Z D! [outputs.graphite] Wrote batch of 245 metrics in 18.32672ms
2020-02-05T16:09:10Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:20Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:30Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:40Z D! [outputs.graphite] Wrote batch of 7 metrics in 10.656501ms
2020-02-05T16:09:40Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:09:50Z D! [outputs.graphite] Wrote batch of 3 metrics in 10.489131ms
2020-02-05T16:09:50Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:10:00Z D! [outputs.graphite] Wrote batch of 10 metrics in 10.784975ms
2020-02-05T16:10:00Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:10:10Z D! [outputs.graphite] Wrote batch of 247 metrics in 15.364215ms
2020-02-05T16:10:10Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:10:20Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics
2020-02-05T16:10:30Z D! [outputs.graphite] Wrote batch of 9 metrics in 10.70936ms
2020-02-05T16:10:30Z D! [outputs.graphite] Buffer fullness: 0 / 10000 metrics

For roughly a month Telegraf worked well. A couple of days ago I discovered this issue and don’t know how to fix it.

How can I print the config file with the environment variables substituted?
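
One rough way to preview the substitution outside Telegraf is to load the environment file in a shell and replace the placeholders by hand. The sketch below is only an approximation: it does not call Telegraf, the temporary files are stand-ins for /etc/default/telegraf and /etc/telegraf/telegraf.conf, and it handles only the two variable names used in this thread.

```shell
#!/bin/sh
# Sketch: preview how the ${VAR} placeholders in the servers line would resolve.
# The temp files below are stand-ins (assumptions), not the real system files.
workdir=$(mktemp -d)
env_file="$workdir/telegraf.env"     # stand-in for /etc/default/telegraf
conf_file="$workdir/telegraf.conf"   # stand-in for /etc/telegraf/telegraf.conf

cat > "$env_file" <<'EOF'
GRAPHITE_SERVER_ADDRESS='192.168.127.12'
GRAPHITE_SERVER_PORT='2023'
EOF

cat > "$conf_file" <<'EOF'
[[outputs.graphite]]
servers = ["${GRAPHITE_SERVER_ADDRESS}:${GRAPHITE_SERVER_PORT}"]
EOF

# Load the variables as a POSIX shell would; the shell strips the single quotes.
set -a
. "$env_file"
set +a

# Replace the two known placeholders and keep only the servers line.
resolved=$(sed \
  -e 's|[$]{GRAPHITE_SERVER_ADDRESS}|'"$GRAPHITE_SERVER_ADDRESS"'|g' \
  -e 's|[$]{GRAPHITE_SERVER_PORT}|'"$GRAPHITE_SERVER_PORT"'|g' \
  "$conf_file" | grep '^servers')

echo "$resolved"
rm -rf "$workdir"
```

If the printed line still contains a literal ${...}, the variable was not set in the shell that ran the script. That only describes this shell, not the environment of the running Telegraf service.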

This is what happened when I tried to see what my variables are replaced with, by putting them in the prefix for my metrics:
[[outputs.graphite]]
# servers = ["${GRAPHITE_SERVER_ADDRESS}:${GRAPHITE_SERVER_PORT}"]
servers = ["192.168.127.12:2023"]
prefix = "${GRAPHITE_SERVER_ADDRESS}:${GRAPHITE_SERVER_PORT}"
tagexclude = ["path"]
timeout = 2

Screenshot from 2020-02-06 09-46-43

It seems like Telegraf uses environment variables intermittently

How are you starting Telegraf? Are you running it via the systemd service file? Double-check the permissions on /etc/default/telegraf as well; when run with systemd, Telegraf runs as the telegraf user and group.

I run Telegraf using:
service telegraf start

Permissions for /etc/default/telegraf are:
ls -l /etc/default/
-rw-r--r-- 1 root root 244 Aug 12 2017 supervisor
-rw-r--r-- 1 telegraf telegraf 69 Feb 7 07:14 telegraf
-rw-r--r-- 1 root root 1118 Jan 25 2018 useradd

I don’t see anything that looks off. Perhaps it will be helpful to check for the environment variables while Telegraf is running:

cat /proc/$(pgrep telegraf)/environ | tr '\0' '\n' | grep GRAPHITE
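
If that check prints nothing, one possible explanation (an assumption, not confirmed in this thread) is that the process was started from a shell where the variables were never exported. Telegraf itself does not read /etc/default/telegraf; the service wrapper loads that file, so a foreground ‘telegraf --debug’ run only sees what the calling shell exports. A minimal sketch of the difference, using sh -c as a stand-in for any child process:

```shell
#!/bin/sh
# Start from a clean state so an inherited value does not mask the effect.
unset GRAPHITE_SERVER_PORT

# Assigned but not exported: a child process does not inherit the variable.
GRAPHITE_SERVER_PORT='2023'
seen_unexported=$(sh -c 'echo "${GRAPHITE_SERVER_PORT:-unset}"')

# Exported: the child process now inherits it.
export GRAPHITE_SERVER_PORT
seen_exported=$(sh -c 'echo "${GRAPHITE_SERVER_PORT:-unset}"')

echo "before export: $seen_unexported"
echo "after export: $seen_exported"
```

To reproduce the service's environment in a foreground run, the variables would need to be exported in that shell first, for example by sourcing the file between set -a and set +a.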