I had this working previously on CentOS 7 (I'm unsure which telegraf or InfluxDB versions). It ran successfully for years (I have the data to prove it!). Recently we migrated everything to Ubuntu 20.04.3 LTS (focal) with telegraf 1.21.2 and InfluxDB 1.8.10. I’ve been re-installing our telegraf inputs, and the majority have been no trouble, except this one. The trouble is that all my troubleshooting and debugging shows a functional input configuration generating nice, happy data. Yet when I start telegraf, expecting it to process this input without issue, it never writes its data! Here are the steps I’ve taken, starting with the exec input configuration:
[[inputs.exec]]
interval = "10m"
timeout = "900s"
commands = ["/usr/local/bin/speedtest-wrapper"]
data_format = "json"
name_suffix = "_speedtest"
json_string_fields = ["server_name","server_url","server_host"]
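One detail in this config I'd like to rule out: timeout = "900s" is longer than interval = "10m" (600 seconds). Since a single run finishes in a few minutes, this is probably not the cause, but it is cheap to check. The sketch below uses hypothetical helper names of my own (nothing here is part of telegraf). It converts simple integer durations with an s, m, or h suffix into seconds and flags a timeout longer than the interval. It is not a full parser of telegraf's duration syntax.

```shell
#!/bin/bash
# Hypothetical helpers (not part of telegraf): compare an exec input's
# timeout with its interval. Only simple integer durations with an
# s, m, or h suffix are handled; telegraf accepts more formats than this.

# Convert a duration such as "900s", "10m" or "1h" to whole seconds.
duration_to_seconds() {
  local d="$1"
  local n="${d%[smh]}"        # strip one trailing unit letter
  local unit="${d##*[0-9]}"   # everything after the last digit
  if [[ ! "$n" =~ ^[0-9]+$ ]]; then
    echo "unsupported duration: $d" >&2
    return 1
  fi
  case "$unit" in
    s) echo $((10#$n)) ;;         # 10# avoids octal parsing of e.g. "010"
    m) echo $((10#$n * 60)) ;;
    h) echo $((10#$n * 3600)) ;;
    *) echo "unsupported duration: $d" >&2; return 1 ;;
  esac
}

# Print a message; return 1 if the timeout is longer than the interval,
# or 2 if either duration could not be parsed.
check_timeout_vs_interval() {
  local interval_s timeout_s
  interval_s=$(duration_to_seconds "$1") || return 2
  timeout_s=$(duration_to_seconds "$2") || return 2
  if (( timeout_s > interval_s )); then
    echo "timeout ${timeout_s}s exceeds interval ${interval_s}s"
    return 1
  fi
  echo "timeout ${timeout_s}s fits within interval ${interval_s}s"
}
```

For this config, `check_timeout_vs_interval 10m 900s` prints `timeout 900s exceeds interval 600s` and returns 1.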
Here are the permissions on the wrapper:
# ls -ld /usr/local/bin/speedtest-wrapper
-rwxr-xr-x 1 root root 168 Jan 21 11:39 /usr/local/bin/speedtest-wrapper*
The contents of the script:
# cat /usr/local/bin/speedtest-wrapper
#!/bin/bash
echo `date` >>/tmp/telegraf_date
# sleeptime=$(($RANDOM%1200))
# sleep $sleeptime
/bin/speedtest-cli --json
exit 0
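The wrapper ends with `exit 0`, so telegraf treats every run as a success even if speedtest-cli fails, and I keep no record of what speedtest-cli prints to stderr. Below is a hypothetical diagnostic variant (a sketch, not what I'm currently running). It keeps the JSON on stdout for telegraf; appends stderr, the effective user, HOME, PATH, the exit status, and the elapsed time to the same log file; and returns speedtest-cli's real exit status. The SPEEDTEST_CLI and LOG_FILE overrides are illustrative additions that default to the original paths.

```shell
#!/bin/bash
# Hypothetical diagnostic variant of /usr/local/bin/speedtest-wrapper.
# SPEEDTEST_CLI and LOG_FILE are illustrative overrides, not part of the
# original script; they default to the paths the original uses.
SPEEDTEST_CLI="${SPEEDTEST_CLI:-/bin/speedtest-cli}"
LOG_FILE="${LOG_FILE:-/tmp/telegraf_date}"

run_speedtest() {
  local start=$SECONDS status
  # Record when and as whom the wrapper was started, plus key environment.
  {
    date
    echo "user=$(id -un) HOME=${HOME:-<unset>} PATH=${PATH}"
  } >>"$LOG_FILE"

  # JSON still goes to stdout for telegraf; stderr is appended to the log
  # instead of being lost.
  "$SPEEDTEST_CLI" --json 2>>"$LOG_FILE"
  status=$?

  echo "exit status: ${status}, elapsed: $((SECONDS - start))s" >>"$LOG_FILE"
  # Propagate the real status instead of always returning 0.
  return "$status"
}

# Run only when executed directly (not when sourced).
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  run_speedtest
fi
```

My understanding is that with a non-zero exit status, telegraf's exec input reports the command as an error in its own log instead of quietly producing nothing. That is the kind of messaging missing here. One trade-off: because stderr goes to the log file, it would not also appear in telegraf's error message.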
Here’s an execution of the wrapper as user telegraf:
# sudo -u telegraf time /usr/local/bin/speedtest-wrapper
{"download": 23777119.54694799, "upload": 14309538.638977839, "ping": 1800000.0,
 "server": {"url": "http://us.bgp.nkeo.top:8080/speedtest/upload.php", "lat": "99.982", "lon": "-102.363", "name": "San Jose, CA", "country": "United States", "cc": "US", "sponsor": "Neko Neko Cloud", "id": "46047", "host": "us.bgp.nkeo.top:8080", "d": 70.43544359346778, "latency": 1800000.0},
 "timestamp": "2022-01-22T01:14:36.827956Z", "bytes_sent": 20226048, "bytes_received": 29808112, "share": null,
 "client": {"ip": "1.1.199.118", "lat": "37.7562", "lon": "-122.4866", "isp": "asdf.net, LLC", "isprating": "3.7", "rating": "0", "ispdlavg": "0", "ispulavg": "0", "loggedin": "0", "country": "US"}}
real 3m9.538s
user 0m1.202s
sys 0m1.402s
Here is a --test run of the defined input as user telegraf:
# sudo -u telegraf telegraf --config /etc/telegraf/telegraf.d/exec.conf --config /etc/telegraf/telegraf.conf --test
2022-01-22T01:30:31Z I! Starting Telegraf 1.21.2
> exec_speedtest,host=proxy01 bytes_received=24176112,bytes_sent=17063936,download=18691198.516819958,ping=1800000,server_d=49.56358932433334,server_host="speedtest.baynic.net:8080",server_latency=1800000,server_name="Fremont, CA",server_url="http://speedtest.baynic.net:8080/speedtest/upload.php",upload=12089092.321201608 1642811874000000000
Now without --test:
# sudo -u telegraf telegraf --config /etc/telegraf/telegraf.d/exec.conf --config /etc/telegraf/telegraf.conf
2022-01-22T01:34:22Z I! Starting Telegraf 1.21.2
I can see the execution of speedtest-cli at the defined interval…
# cat /tmp/telegraf_date
Fri 21 Jan 2022 17:40:02 PST
# ps auwx | grep speed
telegraf 982220 1.0 1.1 29548 23776 pts/2 S+ 17:40 0:00 /usr/bin/python3 /bin/speedtest-cli --json
A few minutes later I can see data has successfully been written:
telegraf.log:
2022-01-22T01:43:14Z D! [outputs.influxdb] Wrote batch of 1 metrics in 11.641873ms
influxdb.log:
Jan 21 17:43:14 influx001 influxd-systemd-start.sh[1959236]: [httpd] 192.168.20.62 - influx_telegraf [21/Jan/2022:17:43:14 -0800] "POST /write?db=telegraf HTTP/1.1 " 204 0 "-" "Telegraf/1.21.2 Go/1.17.5" aa4fe635-7b24-11ec-a431-10c37b4d9415 9434
influx:
> select * from exec_speedtest ORDER BY time DESC
2022-01-22T01:43:12Z 28947952 19865600 22817584.22876692 proxy01 1800000 19.662641598786678 speedtest.open ...
At this point I assume everything is fine and I start telegraf normally with systemctl start telegraf. This is what the logs show:
2022-01-22T02:04:57Z I! Loaded inputs: exec
2022-01-22T02:04:57Z I! Loaded aggregators:
2022-01-22T02:04:57Z I! Loaded processors:
2022-01-22T02:04:57Z I! Loaded outputs: influxdb
2022-01-22T02:04:57Z I! Tags enabled: host=proxy01
2022-01-22T02:04:57Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"proxy01", Flush Interval:10s
2022-01-22T02:04:57Z D! [agent] Initializing plugins
2022-01-22T02:04:57Z D! [agent] Connecting outputs
2022-01-22T02:04:57Z D! [agent] Attempting connection to [outputs.influxdb]
2022-01-22T02:04:59Z W! [outputs.influxdb] When writing to [https://influxdb.fgh.net:8086]: database "telegraf" creation failed: 403 Forbidden
2022-01-22T02:04:59Z D! [agent] Successfully connected to outputs.influxdb
2022-01-22T02:04:59Z D! [agent] Starting service inputs
2022-01-22T02:05:09Z D! [outputs.influxdb] Buffer fullness: 0 / 10000 metrics
2022-01-22T02:05:19Z D! [outputs.influxdb] Buffer fullness: 0 / 10000 metrics
...
2022-01-22T02:31:49Z D! [outputs.influxdb] Buffer fullness: 0 / 10000 metrics
…and this just carries on indefinitely. I would expect the buffer to show something other than 0 / 10000 metrics roughly every 10 to 15 minutes, but no data is ever written, and nothing in the logs points to an issue.
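The same configuration writes data when I run telegraf in the foreground as the telegraf user, but not when systemd starts it. So the next thing worth comparing is how systemd actually launches the service. `systemctl cat telegraf` prints the unit file and any drop-ins. The helper below is a hypothetical name of my own, not a systemd tool. It filters that text down to directives that could plausibly differ from the foreground run: the command line and its config flags, the user, the environment, and some sandboxing options. I have not confirmed that any of these differ on this host.

```shell
#!/bin/bash
# Hypothetical helper (not a systemd tool): read unit-file text on stdin,
# e.g. from "systemctl cat telegraf", and print only the directives that
# could plausibly differ from a foreground "sudo -u telegraf telegraf ..." run.
# Commented-out lines are skipped because a match must start with a directive.
relevant_unit_directives() {
  grep -E '^[[:space:]]*(ExecStart|User|Group|Environment|EnvironmentFile|WorkingDirectory|PrivateTmp|ProtectSystem|ProtectHome|NoNewPrivileges)='
}
```

Usage: `systemctl cat telegraf | relevant_unit_directives`. Directives that are not written in the unit file keep their defaults, and `systemctl show -p PrivateTmp telegraf` (for example) reports the effective value. `journalctl -u telegraf` shows what systemd itself recorded for the service.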
Please help me determine what is going on here. This input could definitely use more messaging when debug is enabled!