Telegraf failed to start after reboot

I’m running Telegraf on a Raspberry Pi to send metrics to an InfluxDB database. I’ve got it running as a service and it works fine, except after a reboot. The service is enabled in systemctl and tries to start on boot, but it fails every time. It starts up and works properly when I start it manually by running “sudo systemctl start telegraf”.

I’m guessing there is some obvious step I missed setting this up, but I haven’t been able to figure out what it is.

pi@raspberrypi:~ $ systemctl status telegraf
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2022-04-24 16:52:30 EDT; 4h 16min ago
Docs: https://github.com/influxdata/telegraf
Process: 606 ExecStart=/usr/bin/telegraf -config http://**********:8086/api/v2/telegrafs/092490deb132a000 (code=exited, status=1/FAILURE)
Main PID: 606 (code=exited, status=1/FAILURE)
CPU: 263ms

Apr 24 16:52:30 raspberrypi systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 5.
Apr 24 16:52:30 raspberrypi systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 24 16:52:30 raspberrypi systemd[1]: telegraf.service: Start request repeated too quickly.
Apr 24 16:52:30 raspberrypi systemd[1]: telegraf.service: Failed with result 'exit-code'.
Apr 24 16:52:30 raspberrypi systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.

telegraf.service


[Unit]
Description=The plugin-driven server agent for reporting metrics into InfluxDB
Documentation=https://github.com/influxdata/telegraf
After=network.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/telegraf
PassEnvironment=INFLUX_TOKEN
User=telegraf
ExecStart=/usr/bin/telegraf -config http://********:8086/api/v2/telegrafs/092490deb132a000
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartForceExitStatus=SIGPIPE
KillMode=control-group

[Install]
WantedBy=multi-user.target

Hi,

Can you pull the full logs of the telegraf service so we can see what error message showed up please? Run the following:

journalctl --no-pager -u telegraf
-- Boot ff81dee3937a4db9934efb290f336921 --
Apr 25 21:12:52 raspberrypi systemd[1]: Starting The plugin-driven server agent for reporting metrics into InfluxDB...
Apr 25 21:12:55 raspberrypi telegraf[495]: 2022-04-26T01:12:55Z E! [telegraf] Error running agent: Error loading config file http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000: Retry 0 of 3 failed connecting to HTTP config server Get "http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000": dial tcp: lookup influx.mydomain.us on 75.75.76.76:53: dial udp 75.75.76.76:53: connect: network is unreachable
Apr 25 21:12:55 raspberrypi systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 21:12:55 raspberrypi systemd[1]: telegraf.service: Failed with result 'exit-code'.
Apr 25 21:12:55 raspberrypi systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:12:56 raspberrypi systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 1.
Apr 25 21:12:56 raspberrypi systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:12:56 raspberrypi systemd[1]: Starting The plugin-driven server agent for reporting metrics into InfluxDB...
Apr 25 21:12:56 raspberrypi telegraf[575]: 2022-04-26T01:12:56Z E! [telegraf] Error running agent: Error loading config file http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000: Retry 0 of 3 failed connecting to HTTP config server Get "http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000": dial tcp: lookup influx.mydomain.us on 75.75.76.76:53: dial udp 75.75.76.76:53: connect: network is unreachable
Apr 25 21:12:56 raspberrypi systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 21:12:56 raspberrypi systemd[1]: telegraf.service: Failed with result 'exit-code'.
Apr 25 21:12:56 raspberrypi systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:12:56 raspberrypi systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 2.
Apr 25 21:12:56 raspberrypi systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:12:56 raspberrypi systemd[1]: Starting The plugin-driven server agent for reporting metrics into InfluxDB...
Apr 25 21:12:56 raspberrypi telegraf[583]: 2022-04-26T01:12:56Z E! [telegraf] Error running agent: Error loading config file http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000: Retry 0 of 3 failed connecting to HTTP config server Get "http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000": dial tcp: lookup influx.mydomain.us on 75.75.76.76:53: dial udp 75.75.76.76:53: connect: network is unreachable
Apr 25 21:12:56 raspberrypi systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 21:12:56 raspberrypi systemd[1]: telegraf.service: Failed with result 'exit-code'.
Apr 25 21:12:56 raspberrypi systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:12:57 raspberrypi systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 3.
Apr 25 21:12:57 raspberrypi systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:12:57 raspberrypi systemd[1]: Starting The plugin-driven server agent for reporting metrics into InfluxDB...
Apr 25 21:12:57 raspberrypi telegraf[592]: 2022-04-26T01:12:57Z E! [telegraf] Error running agent: Error loading config file http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000: Retry 0 of 3 failed connecting to HTTP config server Get "http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000": dial tcp: lookup influx.mydomain.us on 75.75.76.76:53: dial udp 75.75.76.76:53: connect: network is unreachable
Apr 25 21:12:57 raspberrypi systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 21:12:57 raspberrypi systemd[1]: telegraf.service: Failed with result 'exit-code'.
Apr 25 21:12:57 raspberrypi systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:12:57 raspberrypi systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 4.
Apr 25 21:12:57 raspberrypi systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:12:57 raspberrypi systemd[1]: Starting The plugin-driven server agent for reporting metrics into InfluxDB...
Apr 25 21:12:57 raspberrypi telegraf[600]: 2022-04-26T01:12:57Z E! [telegraf] Error running agent: Error loading config file http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000: Retry 0 of 3 failed connecting to HTTP config server Get "http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000": dial tcp: lookup influx.mydomain.us on 75.75.76.76:53: dial udp 75.75.76.76:53: connect: network is unreachable
Apr 25 21:12:57 raspberrypi systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 21:12:57 raspberrypi systemd[1]: telegraf.service: Failed with result 'exit-code'.
Apr 25 21:12:57 raspberrypi systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:12:58 raspberrypi systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 5.
Apr 25 21:12:58 raspberrypi systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:12:58 raspberrypi systemd[1]: telegraf.service: Start request repeated too quickly.
Apr 25 21:12:58 raspberrypi systemd[1]: telegraf.service: Failed with result 'exit-code'.
Apr 25 21:12:58 raspberrypi systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:18:37 raspberrypi systemd[1]: Starting The plugin-driven server agent for reporting metrics into InfluxDB...
Apr 25 21:18:38 raspberrypi telegraf[868]: 2022-04-26T01:18:38Z I! Starting Telegraf 1.22.0
Apr 25 21:18:38 raspberrypi telegraf[868]: 2022-04-26T01:18:38Z I! Loaded inputs: cpu disk diskio file mem net processes prometheus swap system
Apr 25 21:18:38 raspberrypi telegraf[868]: 2022-04-26T01:18:38Z I! Loaded aggregators:
Apr 25 21:18:38 raspberrypi telegraf[868]: 2022-04-26T01:18:38Z I! Loaded processors:
Apr 25 21:18:38 raspberrypi telegraf[868]: 2022-04-26T01:18:38Z I! Loaded outputs: influxdb_v2
Apr 25 21:18:38 raspberrypi telegraf[868]: 2022-04-26T01:18:38Z I! Tags enabled: host=raspberrypi
Apr 25 21:18:38 raspberrypi telegraf[868]: 2022-04-26T01:18:38Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"raspberrypi", Flush Interval:10s
Apr 25 21:18:38 raspberrypi systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Apr 25 21:18:53 raspberrypi telegraf[868]: 2022-04-26T01:18:53Z E! [outputs.influxdb_v2] When writing to [http://influx.mydomain.us:8086]: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:18:53 raspberrypi telegraf[868]: 2022-04-26T01:18:53Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:19:03 raspberrypi telegraf[868]: 2022-04-26T01:19:03Z E! [outputs.influxdb_v2] When writing to [http://influx.mydomain.us:8086]: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:19:03 raspberrypi telegraf[868]: 2022-04-26T01:19:03Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:19:13 raspberrypi telegraf[868]: 2022-04-26T01:19:13Z E! [outputs.influxdb_v2] When writing to [http://influx.mydomain.us:8086]: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:19:13 raspberrypi telegraf[868]: 2022-04-26T01:19:13Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:19:23 raspberrypi telegraf[868]: 2022-04-26T01:19:23Z E! [outputs.influxdb_v2] When writing to [http://influx.mydomain.us:8086]: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:19:23 raspberrypi telegraf[868]: 2022-04-26T01:19:23Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:19:33 raspberrypi telegraf[868]: 2022-04-26T01:19:33Z E! [outputs.influxdb_v2] When writing to [http://influx.mydomain.us:8086]: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:19:33 raspberrypi telegraf[868]: 2022-04-26T01:19:33Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:19:43 raspberrypi telegraf[868]: 2022-04-26T01:19:43Z E! [outputs.influxdb_v2] When writing to [http://influx.mydomain.us:8086]: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Apr 25 21:19:43 raspberrypi telegraf[868]: 2022-04-26T01:19:43Z E! [agent] Error writing to outputs.influxdb_v2: Post "http://influx.mydomain.us:8086/api/v2/write?bucket=Greenhouse&org=Home": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

There are errors after manually starting the service but I’m seeing data in the db.

Retry 0 of 3 failed connecting to HTTP config server Get "http://influx.mydomain.us:8086/api/v2/telegrafs/092490deb132a000": dial tcp: lookup influx.mydomain.us on 75.75.76.76:53: dial udp 75.75.76.76:53: connect: network is unreachable

This means your network is not yet up. The systemd service retries a number of times and gives up. When you restart the service it is after the network is up and then things start to work.

Does your network come up automatically? Does it take longer than normal? Based on the hostname, this is a raspberry pi, but I would still expect it to come up fairly quickly.
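
As a side note on the "retries a number of times and gives up" behaviour: systemd restarts a failed unit after RestartSec= (100 ms by default) and stops trying once StartLimitBurst= starts (5 by default) have happened within StartLimitIntervalSec= (10 s by default), which matches the "restart counter is at 5" and "Start request repeated too quickly" lines in the log. A minimal sketch of a drop-in that spaces the attempts out is below. The file name restart.conf, the scratch directory and the specific values are assumptions for illustration only; on a real system the file would live under /etc/systemd/system/telegraf.service.d/ and need root to write.

```shell
# Sketch only: write a drop-in that makes systemd retry telegraf more patiently.
# DROPIN_DIR defaults to a scratch directory so this can be tried without root.
DROPIN_DIR="${DROPIN_DIR:-./telegraf.service.d}"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/restart.conf" <<'EOF'
[Unit]
# Allow up to 10 start attempts within a 5-minute window before giving up.
StartLimitIntervalSec=300
StartLimitBurst=10

[Service]
# Wait 15 seconds between attempts instead of the default 100 ms.
RestartSec=15
EOF

# Show the result for review before copying it into place.
cat "$DROPIN_DIR/restart.conf"
```

This only papers over a slow network by waiting longer between attempts; it does not tell systemd to wait for the network itself.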

The network is certainly up when I start it manually since I’m doing it via ssh.
It is a Raspberry Pi. It’s using Wi-Fi with a weak signal; I’m assuming that is causing the delay?
Does the “After=network.target” in the service file not take care of that?
Is there a way to have systemd wait to start telegraf until the network is up or to have it just try more times?
I’m still very much a novice when it comes to Linux.

Hmm, I think we should be using network-online.target instead. network.target does not mean that IP-level configuration has occurred, while network-online.target does. I think the NetworkTarget docs say something similar.

Can you file a bug on the Telegraf GitHub project?

Also if you wanted to test this, I think you could try to do the following:

  1. sudo systemctl edit --full telegraf
  2. Change the After=network.target line to say After=network-online.target
  3. sudo systemctl daemon-reload
  4. Reboot and see if telegraf comes up as expected

Thanks!

Thanks for the help, I got it to work. I had to also specify Wants as in the article you linked to.

After=network-online.target
Wants=network-online.target
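
For anyone who would rather not touch the packaged unit file, the same two lines can go in a drop-in with a [Unit] header. This is a minimal sketch, not necessarily how the change was applied in this thread: the file name override.conf and the scratch directory are assumptions, and on a real system the file would go in /etc/systemd/system/telegraf.service.d/ (needs root).

```shell
# Sketch only: write a drop-in that orders telegraf after network-online.target
# and pulls that target in. DROPIN_DIR defaults to a scratch directory so the
# snippet can be tried without root.
DROPIN_DIR="${DROPIN_DIR:-./telegraf.service.d}"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Unit]
After=network-online.target
Wants=network-online.target
EOF

# Show the result for review before copying it into place.
cat "$DROPIN_DIR/override.conf"
```

After copying the file into place and running sudo systemctl daemon-reload, systemctl cat telegraf should list the drop-in below the original unit. After= in a drop-in adds to the existing ordering, so the packaged After=network.target stays in effect alongside it.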

I wonder if network-online.target is only needed if specifying a remote config. I’d imagine the network not being up yet is a bigger deal if telegraf needs the network to retrieve its config.

Thanks for giving it a shot and confirming the fix with the extra line!

I wonder if network-online.target is only needed if specifying a remote config. I’d imagine the network not being up yet is a bigger deal if telegraf needs the network to retrieve its config.

I was surprised we have not heard reports about this before. You are right that if Telegraf is not fetching something from the network right when it starts, the odds of hitting this are diminished. I do think the right thing would still be to change this setting for a better experience in the future.

I’ve put up pull request #11042 on influxdata/telegraf, “fix: have telegraf service wait for network up”, with a fix.

Thanks again!