I am trying to monitor the size of my InfluxDB wal and data directories using Telegraf and InfluxDB. I am running Telegraf and InfluxDB in separate Docker containers, and the data/wal directories are stored on an external drive which is mounted on the host system and passed through to the container at /mnt/SSD_240GB.
My method for obtaining the directory sizes is a simple shell script which runs disk usage (du) on each of the directories, and which Telegraf runs automatically via the exec plugin (this seems to be a pretty common way to do it). The problem is that the directory sizes arriving in InfluxDB aren’t correct.
No, I am using InfluxDB 1.8, and am not setting any retention policies. The issue is that the values I get when I run the shell script myself do not match the values collected when the script is run via the exec plugin (and then sent to InfluxDB).
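For readers who want to picture the setup, here is a minimal sketch of such a script. It is not the poster's actual script: the measurement name `dir_size`, the tag `path`, the function names, and the `/mnt/SSD_240GB/data` and `/mnt/SSD_240GB/wal` paths are assumptions for illustration. It emits InfluxDB line protocol, which assumes the exec input is configured with `data_format = "influx"`.

```shell
#!/bin/sh
# Hypothetical sketch: report directory sizes in bytes as InfluxDB line protocol.
# Measurement "dir_size" and tag "path" are illustrative names, not from the thread.

dir_size_bytes() {
    # du -s summarises the whole tree; -k reports 1024-byte units (POSIX).
    # stderr is left alone so "Permission denied" messages stay visible.
    kb=$(du -sk "$1" | cut -f1)
    echo $(( ${kb:-0} * 1024 ))
}

emit_line() {
    # One point per directory: measurement,tag=value field=value
    # Note: paths containing spaces, commas or '=' would need escaping.
    printf 'dir_size,path=%s size_bytes=%si\n' "$1" "$(dir_size_bytes "$1")"
}

# Assumed paths; adjust to the actual layout under the mount point.
for d in /mnt/SSD_240GB/data /mnt/SSD_240GB/wal; do
    if [ -d "$d" ]; then
        emit_line "$d"
    fi
done
```

Running the script by hand and comparing its output with what arrives in InfluxDB is a quick way to see whether the discrepancy comes from the script itself or from the environment Telegraf runs it in.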
This will let us see the raw line protocol being produced. My two bets are:
Either there is a strange parsing issue going on via the json serializer,
or, since Telegraf runs as the telegraf user while you are running the script as root within the container, there is a permissions issue which is leading to inconsistent results.
Will it be a problem that Telegraf runs as the “telegraf” user even though I have changed the permissions of my shell script to allow all users to read/write/execute?
When I look at the file’s permissions from inside the container, I can see that it is owned by “root”, but that all users can read/write/execute:
If the permissions of a script are rwx for everyone, it makes no difference who
the owner is, but it does suggest there’s a deeper problem (or at least a
better solution), since doing “chmod 777 xyz” is very very rarely the Right
Thing To Do.
I regard it as the equivalent of mislaying your house keys, so you leave the
front door open all the time as a solution to the problem.
Yes I know - thanks for the comment. I did chmod 777 only in order to try to guarantee that it was not a permissions issue with the file.
It seems more like the problem is that when the Telegraf service runs the shell script it is not able to see the actual size of the mounted directory - only some kind of link to the directory I guess (which it regards as having a size of 4096 bytes).
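A 4096-byte result is consistent with du being unable to descend into the directory, in which case it reports only the directory entry itself and writes an error to stderr. The helper below is a hypothetical diagnostic (the name `check_du_access` is made up for this sketch) that shows which user is running and what du reports, errors included.

```shell
# Hypothetical helper: show the current user, whether the directory can be
# listed and traversed, and what du reports (including error messages).
check_du_access() {
    dir=$1
    echo "running as: $(id -un 2>/dev/null || id -u)"
    if [ -r "$dir" ] && [ -x "$dir" ]; then
        echo "$dir: readable and traversable"
    else
        echo "$dir: NOT readable/traversable by this user"
    fi
    # Keep stderr in the output: "Permission denied" is the useful part here.
    du -sk "$dir" 2>&1 || echo "du reported errors for $dir"
}
```

To be meaningful, this has to run as the same user Telegraf uses, for example through `docker exec -u telegraf` against the Telegraf container, rather than as root.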
I think you are correct, this is a permissions issue. Not with the shell script, but instead with the directories that I am trying to run du on within this script (namely /wal and /data). Here you can see the permissions were restricted on those folders:
If I run chmod 777 on these directories, then the correct data starts flowing from Telegraf into InfluxDB:
So my question really is now: how should I set the user/permissions without simply doing chmod 777?
The owner of the filesystem /mnt/SSD_240GB that is mounted is “root”, but you say that Telegraf runs as the “telegraf” user. I cannot change the owner of /mnt/SSD_240GB to telegraf, because other Docker containers (unrelated to Telegraf) need to use it. Do I need to create a user “telegraf” on the host system (i.e. outside of the Docker container), and do something with that?
I saw this blog on the InfluxData website about passing in a user, but I am not sure if it is what’s needed here.
I am still having trouble with this. I have opened a new issue here.
Can you suggest the recommended way to configure the user/permissions in such a case (I believe it is a common setup)?
The container is run as non-root user (named tom), and the telegraf process inside the container runs as user telegraf (this is the docker image default).
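One commonly used alternative to chmod 777 is to grant read-only access through a group instead of to everyone. The sketch below is an assumption-laden illustration, not a confirmed fix for this setup: `grant_group_read` is a hypothetical helper, and it only helps if the Telegraf process in the container is a member of a group with the same numeric GID (for instance via the `--group-add` option of `docker run`).

```shell
# Hypothetical helper: give one group read-only access to a directory tree.
# $1 = directory, $2 = group name or numeric GID (must match inside the container).
grant_group_read() {
    dir=$1
    group=$2
    # Hand the tree to the group without changing the owning user.
    chgrp -R "$group" "$dir"
    # g+r: group may read; capital X: add execute/search only on directories
    # (and on files that are already executable), never write.
    chmod -R g+rX "$dir"
}

# Example (assumed path and GID, run on the host with sufficient privileges):
# grant_group_read /mnt/SSD_240GB/data 1500
```

Files that InfluxDB creates later may get their own owner and mode, so this may need re-applying or combining with default ACLs; that part is untested here.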