I’m running Telegraf on a Raspberry Pi to send metrics to an InfluxDB database. I’ve got it running as a service and it works fine, except after a reboot. The service is enabled in systemctl and it tries to start on boot but fails every time. It starts up and works properly when I start it manually by running “sudo systemctl start telegraf”.
I’m guessing there is some obvious step I missed when setting this up, but I haven’t been able to figure out what it is.
pi@raspberrypi:~ $ systemctl status telegraf
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2022-04-24 16:52:30 EDT; 4h 16min ago
Docs: https://github.com/influxdata/telegraf
Process: 606 ExecStart=/usr/bin/telegraf -config http://**********:8086/api/v2/telegrafs/092490deb132a000 (code=exited, status=1/FAILURE)
Main PID: 606 (code=exited, status=1/FAILURE)
CPU: 263ms
Apr 24 16:52:30 raspberrypi systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 5.
Apr 24 16:52:30 raspberrypi systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 24 16:52:30 raspberrypi systemd[1]: telegraf.service: Start request repeated too quickly.
Apr 24 16:52:30 raspberrypi systemd[1]: telegraf.service: Failed with result 'exit-code'.
Apr 24 16:52:30 raspberrypi systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.
telegraf.service
[Unit]
Description=The plugin-driven server agent for reporting metrics into InfluxDB
Documentation=https://github.com/influxdata/telegraf
After=network.target
[Service]
Type=notify
EnvironmentFile=-/etc/default/telegraf
PassEnvironment=INFLUX_TOKEN
User=telegraf
ExecStart=/usr/bin/telegraf -config http://********:8086/api/v2/telegrafs/092490deb132a000
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartForceExitStatus=SIGPIPE
KillMode=control-group
[Install]
WantedBy=multi-user.target
This means your network is not yet up when the service first starts. systemd retries a number of times and then gives up. When you restart the service manually, the network is already up, so things start to work.
Does your network come up automatically? Does it take longer than normal? Based on the hostname, this is a raspberry pi, but I would still expect it to come up fairly quickly.
The network is certainly up when I start it manually since I’m doing it via ssh.
It is a Raspberry Pi. It’s using wifi with a weak signal, so I’m assuming that is causing the delay?
Does the “After=network.target” in the service file not take care of that?
Is there a way to have systemd wait to start telegraf until the network is up or to have it just try more times?
I’m still very much a novice when it comes to linux.
Hmm, I think we should be using network-online.target instead. network.target does not mean that IP-level configuration has occurred, while network-online.target does. I think the NetworkTarget docs say something similar.
Can you file a bug on the Telegraf GitHub project?
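For anyone hitting the same thing, here is a rough sketch of what that change could look like as a systemd drop-in rather than an edit to the packaged unit file. The drop-in path is an assumption (it is what "sudo systemctl edit telegraf" would normally create), and I have not verified this on a Pi. Also note that network-online.target only delays startup if the system's network manager provides a wait-online service, so it may need to be enabled separately.

```ini
# Assumed location: /etc/systemd/system/telegraf.service.d/override.conf
# (created by running: sudo systemctl edit telegraf)
[Unit]
# Pull network-online.target into the boot transaction...
Wants=network-online.target
# ...and do not start telegraf until that target has been reached.
After=network-online.target
```

If the file is created by hand instead of through "systemctl edit", run "sudo systemctl daemon-reload" afterwards and then reboot to test.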
I wonder if network-online.target is only needed when specifying a remote config. I’d imagine the network not being up yet is a bigger deal if Telegraf needs the network to retrieve its config.
Thanks for giving it a shot and confirming the fix with the extra line!
I was surprised we have not heard reports about this before. You are right that if you are not fetching something from the network right when Telegraf starts, the odds of hitting this are diminished. I do think the right thing would still be to change this setting for a better experience in the future.
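On the earlier question about having systemd just try more times: the "Start request repeated too quickly" message in the status output is systemd's start rate limit kicking in after five quick restarts. A possible sketch, assuming a drop-in file for the telegraf unit (the path and the 10 second delay below are illustrative choices, not values from the Telegraf package):

```ini
# Assumed location: /etc/systemd/system/telegraf.service.d/override.conf
[Unit]
# 0 disables the start rate limit, so systemd keeps retrying
# instead of giving up after a burst of failed starts.
StartLimitIntervalSec=0

[Service]
# Wait 10 seconds between restart attempts (the default is 100ms),
# which gives a slow wifi link time to come up.
RestartSec=10
```

This only works around the race by retrying until the network is there, so it is more of a fallback than a replacement for ordering the service after the network.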