Telegraf not sending certain data when running as service

telegraf

#1

Hi,

I’ve just found Telegraf and I’m really impressed! However, I’ve set it up on a pair of Raspberry Pis and I’m seeing odd behaviour.

With identical configuration, one Pi is not sending the netstat metrics to InfluxDB, but the other is. Similarly, the one that is sending netstat metrics is not sending one of my two inputs.exec metrics, but the first Pi is.

Focusing on the inputs.exec plugin (because I’ve got a bit more control over that): when I run telegraf on the command line, including running it as the telegraf user to rule out permissions problems, the inputs.exec data is gathered and posted to InfluxDB. However, when I run “systemctl start telegraf.service”, the data for that one plugin isn’t collected or sent. I’ve even run Wireshark on the InfluxDB server, and I can see that that one metric isn’t being posted. I’ve also added a call to ‘tee’ in the script that inputs.exec runs, so it writes its output to a file as well as returning it, and I can see that when telegraf is running as a service, the exec script isn’t being run.

I’ve disabled EVERY input plugin except the problematic one. When running as a service, the log shows “I! Loaded inputs: inputs.exec”, but after that I only see "D! Output [influxdb] buffer fullness: 0 / 10000 metrics."

I’ve enabled debugging/logging to file, and there are no error messages logged about the failing plug-in when running as a service. With no error messages I’m at a bit of a loss as to how to continue.
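One way to see what the service actually hands an exec command is a temporary wrapper script. This is a minimal sketch, not something from this thread: point commands at the wrapper instead of the real script. It appends the date, the user, the PATH and whether vcgencmd can be found to a log file (the LOG variable and the /tmp/telegraf-exec-env.log path are arbitrary, hypothetical choices), then runs the real script with its stderr appended to the same log so any error message is kept.

```shell
#!/bin/bash
# Hypothetical debugging wrapper: temporarily point inputs.exec at this script
# to record the environment the service gives the command.
# LOG and its /tmp location are illustrative choices, not Telegraf settings.
LOG="${LOG:-/tmp/telegraf-exec-env.log}"

{
    date
    id                                   # which user/groups the command runs as
    echo "PATH=$PATH"                    # the PATH the service actually passes on
    command -v vcgencmd || echo "vcgencmd: not found on PATH"
} >> "$LOG" 2>&1

# Run the real script; its stdout still goes to Telegraf,
# its stderr is appended to the log so errors are not lost.
/home/dave/w1_temp/pi_gpu_gather_telegraf.sh 2>> "$LOG"
```

Comparing the log written under the service with one written from an interactive shell shows quickly whether the two environments differ.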

The two Pis are running the version from the Jessie Apt repository, telegraf v1.2.1-1. The InfluxDB server is running on an Ubuntu 16.04 box, version 1.2.2-1.

Here’s my config for the inputs.exec plugin:

[[inputs.exec]]
commands = ["/home/dave/w1_temp/pi_gpu_gather_telegraf.sh"]

name_override = "gpu_temp"

data_format = "value"
data_type = "float" # required
interval = "120s"

And here’s the bash script it’s supposed to call (which I’ve confirmed works from the command line as the telegraf user):

#!/bin/bash
vcgencmd measure_temp | sed 's/.*=\(.*\).C/\1/g' | tee -a /tmp/gpu.log

If someone could point out what I’m doing wrong, I’d really appreciate it.


#2

What do you get if you replace your script with this one:

#!/bin/bash
echo "42.0"

Make sure the telegraf user has permissions to execute the script and run Telegraf like: sudo -u /bin/telegraf --config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d --test.


#3

Hi Daniel,

Thanks for the suggestions. Firstly, in case anyone else stumbles across this post, I’ll correct the command line you gave, which was missing the user and also used a different path to telegraf than my Raspberry Pi install. So I ran:

sudo -u telegraf /usr/bin/telegraf --config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d --test

I’d tried this before with no joy, BUT your suggestion of changing my script to just “echo 42.0” (which worked) gave my brain the push it needed to realise the cause of the problem.

As I said, the script I was running was:

vcgencmd measure_temp | sed 's/.*=\(.*\).C/\1/g' 

What I realised was that my normal user, and the telegraf user on the other (working) Pi, both had vcgencmd in their PATH, but on the problematic Pi the command was not in the telegraf user’s PATH. So the command wasn’t being run, and the script’s error message wasn’t going anywhere I could see it. I changed the script to the following, which then worked when telegraf was run as a service:

#!/bin/bash
/opt/vc/bin/vcgencmd measure_temp | sed 's/.*=\(.*\).C/\1/g'
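A likely reason nothing was logged: by default a bash pipeline’s exit status is that of its last command, so when vcgencmd failed with “command not found” (status 127), sed still exited 0 on its empty input and the script as a whole looked successful. Below is a slightly more defensive sketch of the same script (my own variant, not the one from this thread; the gpu_temp function name is illustrative). It turns on pipefail and checks the binary up front, so a failure yields a non-zero exit status and a message on stderr, which Telegraf is more likely to report, assuming the exec input treats a non-zero exit as an error.

```shell
#!/bin/bash
# Sketch of a more defensive gather script (not the script from the thread).
# pipefail: a pipeline's status is the last non-zero status of any command in it,
# so a missing or failing vcgencmd is no longer masked by sed exiting 0.
set -o pipefail

gpu_temp() {
    # Absolute path to vcgencmd, passed in by the caller.
    local vcgencmd="$1"
    if [[ ! -x "$vcgencmd" ]]; then
        echo "gpu_temp: '$vcgencmd' is missing or not executable" >&2
        return 1
    fi
    # vcgencmd prints something like temp=48.3'C; keep only the number.
    "$vcgencmd" measure_temp | sed 's/.*=\(.*\).C/\1/g'
}

# Absolute path, so the result does not depend on the service's PATH.
gpu_temp /opt/vc/bin/vcgencmd
```

The script still prints just the number on success, which is what data_format = "value" expects.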

I still need to work out why my other Pi isn’t sending net metrics, but looking closer at the db, it’s sending some, just not things like “bytes_recv” and “bytes_sent”. That clearly isn’t going to be the same issue as my input.exec problem, so I’ll keep digging to see what I can unearth.

Cheers,
–Dave


#4

Hi,
I have run into a similar issue: telegraf is collecting metrics for some services but not for a few others. Telegraf is also running as a service, alongside the services it collects metrics for. I have compared the working and non-working services and cannot find any difference between them, yet telegraf is not collecting metrics for a few of them.

All the services have been defined under /etc/sv (including the telegraf service).

My inputs.jolokia2_agent metric definition looks as below. Metrics are collected for certain client-ids. I even tried setting “client-id=exact_client_id_that_is_not_getting_collected”, but this gives no output.

[[inputs.jolokia2_agent.metric]]
name = "mm_consumer"
mbean = "kafka.consumer:type=,client-id="
tag_keys = ["client-id"]

Please help


#5

I ran into a similar issue. It turned out my config files in /etc/telegraf/telegraf.d/conf.d/ were owned by root rather than the telegraf user. Telegraf should print an error if it can’t load or access files.
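Ownership by root is not a problem in itself (a root-owned file with mode 644 is readable by everyone); what matters is whether the telegraf user can read the files and enter the directories. A minimal sketch for checking that, assuming GNU find (the -readable test is a GNU extension) and a hypothetical script name, check_readable.sh:

```shell
#!/bin/bash
# Sketch: list entries under a config tree that the *current* user cannot read.
# Run it as the service user to mirror what Telegraf sees, e.g.
#   sudo -u telegraf bash check_readable.sh /etc/telegraf
# (check_readable.sh is a hypothetical file name.)
# -readable is a GNU find extension: it checks access for the current user.

find_unreadable() {
    local dir="$1"
    # Prints each unreadable file or directory; find also reports
    # directories it cannot descend into on stderr.
    find "$dir" ! -readable -print
}

find_unreadable "${1:-/etc/telegraf}"
```

Any path it prints is one that Telegraf, running as that user, cannot read.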