Telegraf disk plugin inconsistencies

Hi all – I’ve encountered a weird issue with Telegraf. I’m not sure whether it is a bug, so I figured I’d start here and then open a GitHub issue if necessary.

I have a number of LVM disks named with the pattern prod_o_d or dev_o_d. What I have found is that the ones that start with dev will use the /dev/mapper device, but the ones that start with “prod” will use the dm-# device. I have this issue on multiple systems (dev and prod are both built from the same image). I have also done some more testing: we have other systems with mapper names that start with trn/snd/util, and those likewise only show the dm-# devices and not the mapper ones. I have noticed this behavior on both RHEL 7 and RHEL 8.

Here is the telegraf -config /etc/telegraf/telegraf.conf -input-filter disk -test output for a prod and a dev system; my telegraf.conf is linked at the end of this post.
Dev (working as expected):

2023-01-27T23:05:24Z I! Starting Telegraf 1.25.0
2023-01-27T23:05:24Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-01-27T23:05:24Z I! Loaded inputs: disk
2023-01-27T23:05:24Z I! Loaded aggregators: 
2023-01-27T23:05:24Z I! Loaded processors: 
2023-01-27T23:05:24Z I! Loaded secretstores: 
2023-01-27T23:05:24Z W! Outputs are not used in testing mode!
2023-01-27T23:05:24Z I! Tags enabled: host=vs-dentdev
> disk,device=dm-0,fstype=xfs,host=vs-dentdev,mode=rw,path=/ free=63612837888i,inodes_free=33490077i,inodes_total=33554432i,inodes_used=64355i,total=68685922304i,used=5073084416i,used_percent=7.3859158410173436 1674860724000000000
> disk,device=sda1,fstype=xfs,host=vs-dentdev,mode=rw,path=/boot free=818626560i,inodes_free=523973i,inodes_total=524288i,inodes_used=315i,total=1063256064i,used=244629504i,used_percent=23.007581360946748 1674860724000000000
> disk,device=mapper/dev_dent_d-dev_dent_d,fstype=xfs,host=vs-dentdev,mode=rw,path=/storage/scsi1 free=51953291264i,inodes_free=26165538i,inodes_total=26212352i,inodes_used=46814i,total=53656686592i,used=1703395328i,used_percent=3.174618926719134 1674860724000000000

Prod (example of the issue):

2023-01-27T23:06:33Z I! Starting Telegraf 1.25.0
2023-01-27T23:06:33Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-01-27T23:06:33Z I! Loaded inputs: disk
2023-01-27T23:06:33Z I! Loaded aggregators: 
2023-01-27T23:06:33Z I! Loaded processors: 
2023-01-27T23:06:33Z I! Loaded secretstores: 
2023-01-27T23:06:33Z W! Outputs are not used in testing mode!
2023-01-27T23:06:33Z I! Tags enabled: host=vs-dentprod
> disk,device=dm-0,fstype=xfs,host=vs-dentprod,mode=rw,path=/ free=61784248320i,inodes_free=33488591i,inodes_total=33554432i,inodes_used=65841i,total=68685922304i,used=6901673984i,used_percent=10.048163804882144 1674860794000000000
> disk,device=sda1,fstype=xfs,host=vs-dentprod,mode=rw,path=/boot free=818622464i,inodes_free=523972i,inodes_total=524288i,inodes_used=316i,total=1063256064i,used=244633600i,used_percent=23.00796659270217 1674860794000000000
> disk,device=dm-2,fstype=xfs,host=vs-dentprod,mode=rw,path=/storage/scsi1 free=21691576320i,inodes_free=26151805i,inodes_total=26212352i,inodes_used=60547i,total=53656686592i,used=31965110272i,used_percent=59.573395791394 1674860794000000000

Here is my telegraf.conf: https://pastebin.com/iCyKrzrY

Hi,

the ones that start with dev will use the /dev/mapper device

  • Was this ever working in a previous telegraf version?
  • Can you collect lsblk and /proc/self/mountinfo from prod and dev? I’d like to see how they might be different.

Thanks!

This is a new install; I’m in the process of trying to move from a Graphite/Grafana/Nagios setup to TICK.

$ rpm -qa|grep telegraf
telegraf-1.25.0-1.x86_64

Dev:

NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                       8:0    0   82G  0 disk 
├─sda1                    8:1    0    1G  0 part /boot
└─sda2                    8:2    0   80G  0 part 
  ├─rhel-root           253:0    0   64G  0 lvm  /
  └─rhel-swap           253:1    0   16G  0 lvm  [SWAP]
sdb                       8:16   0   50G  0 disk 
└─dev_dent_d-dev_dent_d 253:2    0   50G  0 lvm  /storage/scsi1
sr0                      11:0    1  6.6G  0 rom  
cat /proc/self/mountinfo
21 96 0:20 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
22 96 0:5 / /proc rw,nosuid,nodev,noexec,relatime shared:25 - proc proc rw
23 96 0:6 / /dev rw,nosuid shared:21 - devtmpfs devtmpfs rw,size=8184088k,nr_inodes=2046022,mode=755
24 21 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:3 - securityfs securityfs rw
25 23 0:21 / /dev/shm rw shared:22 - tmpfs tmpfs rw,size=12582912k
26 23 0:22 / /dev/pts rw,nosuid,noexec,relatime shared:23 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
27 96 0:23 / /run rw,nosuid,nodev shared:24 - tmpfs tmpfs rw,mode=755
28 21 0:24 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:4 - tmpfs tmpfs ro,mode=755
29 28 0:25 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:5 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
30 21 0:26 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:17 - pstore pstore rw
31 21 0:27 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:18 - bpf bpf rw,mode=700
32 28 0:28 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,memory
33 28 0:29 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:7 - cgroup cgroup rw,cpu,cpuacct
34 28 0:30 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:8 - cgroup cgroup rw,net_cls,net_prio
35 28 0:31 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,pids
36 28 0:32 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,freezer
37 28 0:33 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,devices
38 28 0:34 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,perf_event
39 28 0:35 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,cpuset
40 28 0:36 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,blkio
41 28 0:37 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,hugetlb
42 28 0:38 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,rdma
43 21 0:12 / /sys/kernel/tracing rw,relatime shared:19 - tracefs none rw
92 21 0:39 / /sys/kernel/config rw,relatime shared:20 - configfs configfs rw
96 1 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/rhel-root rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
20 22 0:19 / /proc/sys/fs/binfmt_misc rw,relatime shared:26 - autofs systemd-1 rw,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=19739
44 21 0:8 / /sys/kernel/debug rw,relatime shared:27 - debugfs debugfs rw
45 23 0:41 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs rw,pagesize=2M
46 23 0:18 / /dev/mqueue rw,relatime shared:29 - mqueue mqueue rw
47 21 0:42 / /sys/fs/fuse/connections rw,relatime shared:30 - fusectl fusectl rw
112 96 8:1 / /boot rw,relatime shared:61 - xfs /dev/sda1 rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
115 96 253:2 / /storage/scsi1 rw,relatime shared:63 - xfs /dev/mapper/dev_dent_d-dev_dent_d rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
118 96 253:2 /dent /opt/dent rw,relatime shared:63 - xfs /dev/mapper/dev_dent_d-dev_dent_d rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
189 96 0:43 / /var/lib/nfs/rpc_pipefs rw,relatime shared:66 - rpc_pipefs sunrpc rw
414 27 0:44 / /run/user/13070 rw,nosuid,nodev,relatime shared:218 - tmpfs tmpfs rw,size=1640416k,mode=700,uid=13070,gid=10513
433 27 0:46 / /run/user/0 rw,nosuid,nodev,relatime shared:255 - tmpfs tmpfs rw,size=1640416k,mode=700

Prod:

NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                         8:0    0   82G  0 disk 
├─sda1                      8:1    0    1G  0 part /boot
└─sda2                      8:2    0   80G  0 part 
  ├─rhel-root             253:0    0   64G  0 lvm  /
  └─rhel-swap             253:1    0   16G  0 lvm  [SWAP]
sdb                         8:16   0   50G  0 disk 
└─prod_dent_d-prod_dent_d 253:2    0   50G  0 lvm  /storage/scsi1
sr0                        11:0    1 1024M  0 rom  
21 96 0:20 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
22 96 0:5 / /proc rw,nosuid,nodev,noexec,relatime shared:25 - proc proc rw
23 96 0:6 / /dev rw,nosuid shared:21 - devtmpfs devtmpfs rw,size=8183868k,nr_inodes=2045967,mode=755
24 21 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:3 - securityfs securityfs rw
25 23 0:21 / /dev/shm rw shared:22 - tmpfs tmpfs rw,size=12582912k
26 23 0:22 / /dev/pts rw,nosuid,noexec,relatime shared:23 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
27 96 0:23 / /run rw,nosuid,nodev shared:24 - tmpfs tmpfs rw,mode=755
28 21 0:24 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:4 - tmpfs tmpfs ro,mode=755
29 28 0:25 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:5 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
30 21 0:26 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:17 - pstore pstore rw
31 21 0:27 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:18 - bpf bpf rw,mode=700
32 28 0:28 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,hugetlb
33 28 0:29 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:7 - cgroup cgroup rw,memory
34 28 0:30 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:8 - cgroup cgroup rw,devices
35 28 0:31 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,net_cls,net_prio
36 28 0:32 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,perf_event
37 28 0:33 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,cpuset
38 28 0:34 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,cpu,cpuacct
39 28 0:35 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,pids
40 28 0:36 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,rdma
41 28 0:37 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,freezer
42 28 0:38 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,blkio
43 21 0:12 / /sys/kernel/tracing rw,relatime shared:19 - tracefs none rw
92 21 0:39 / /sys/kernel/config rw,relatime shared:20 - configfs configfs rw
96 1 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/rhel-root rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
20 23 0:18 / /dev/mqueue rw,relatime shared:26 - mqueue mqueue rw
44 22 0:19 / /proc/sys/fs/binfmt_misc rw,relatime shared:27 - autofs systemd-1 rw,fd=45,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=20777
45 21 0:8 / /sys/kernel/debug rw,relatime shared:28 - debugfs debugfs rw
46 23 0:41 / /dev/hugepages rw,relatime shared:29 - hugetlbfs hugetlbfs rw,pagesize=2M
47 21 0:42 / /sys/fs/fuse/connections rw,relatime shared:30 - fusectl fusectl rw
112 96 8:1 / /boot rw,relatime shared:61 - xfs /dev/sda1 rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
115 96 253:2 / /storage/scsi1 rw,relatime shared:63 - xfs /dev/mapper/prod_dent_d-prod_dent_d rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
118 96 253:2 /dent /opt/dent rw,relatime shared:63 - xfs /dev/mapper/prod_dent_d-prod_dent_d rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
406 96 0:43 / /var/lib/nfs/rpc_pipefs rw,relatime shared:175 - rpc_pipefs sunrpc rw
431 27 0:44 / /run/user/13070 rw,nosuid,nodev,relatime shared:227 - tmpfs tmpfs rw,size=1640372k,mode=700,uid=13070,gid=10513

I believe I’ve managed to reproduce locally:

disk,device=mapper/dev_dent_d-dev_dent_d,fstype=xfs,host=ryzen,mode=rw,path=/storage/scsi1 inodes_total=8388608i,inodes_free=8388605i,inodes_used=3i,total=17112760320i,free=16959598592i,used=153161728i,used_percent=0.8950147441789215 1675261594000000000

I used a pair of spare USB drives and ran the following to build a similar set of LVM + XFS devices:

sudo wipefs --all --backup /dev/sdb
sudo pvcreate /dev/sdb
sudo vgcreate dev_dent_d /dev/sdb
sudo lvcreate -L 16G -n dev_dent_d dev_dent_d
sudo mkfs.xfs /dev/dev_dent_d/dev_dent_d
sudo mkdir -p /storage/scsi1
sudo mount /dev/dev_dent_d/dev_dent_d /storage/scsi1

sudo wipefs --all --backup /dev/sdc
sudo pvcreate /dev/sdc
sudo vgcreate prod_dent_d /dev/sdc
sudo lvcreate -L 16G -n prod_dent_d prod_dent_d
sudo mkfs.xfs /dev/prod_dent_d/prod_dent_d
sudo mkdir -p /storage/scsi2
sudo mount /dev/prod_dent_d/prod_dent_d /storage/scsi2
$ lsblk
sdb                         8:16   1 119.5G  0 disk 
└─dev_dent_d-dev_dent_d   254:1    0    16G  0 lvm  /storage/scsi1
sdc                         8:32   1 119.5G  0 disk 
└─prod_dent_d-prod_dent_d 254:2    0    16G  0 lvm  /storage/scsi2

Which resulted in:

disk,device=mapper/dev_dent_d-dev_dent_d,fstype=xfs,host=ryzen,mode=rw,path=/storage/scsi1 total=17112760320i,free=16959598592i,used=153161728i,used_percent=0.8950147441789215,inodes_total=8388608i,inodes_free=8388605i,inodes_used=3i 1675262405000000000
disk,device=dm-2,fstype=xfs,host=ryzen,mode=rw,path=/storage/scsi2 used=153161728i,used_percent=0.8950147441789215,inodes_total=8388608i,inodes_free=8388605i,inodes_used=3i,total=17112760320i,free=16959598592i 1675262405000000000

It looks like we are getting the device name directly from the shirou/gopsutil library, which Telegraf uses for a number of system operations such as collecting disk information. As such, I am going to try to reproduce this with just that library and see if it is a known issue:

	0: {"device":"/dev/mapper/dev_dent_d-dev_dent_d","mountpoint":"/storage/scsi1","fstype":"xfs","opts":["rw","relatime"]}
	1: {"device":"/dev/dm-2","mountpoint":"/storage/scsi2","fstype":"xfs","opts":["rw","relatime"]}

If I run:

package main

import (
	"fmt"

	"github.com/shirou/gopsutil/v3/disk"
)

func main() {
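	// Partitions(true) returns all mounted filesystems, not just physical devices.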
	parts, err := disk.Partitions(true)
	if err != nil {
		panic(err)
	}
	for _, part := range parts {
		fmt.Println(part.String())
	}
}

I get the same info:

{"device":"/dev/mapper/dev_dent_d-dev_dent_d","mountpoint":"/storage/scsi1","fstype":"xfs","opts":["rw","relatime"]}
{"device":"/dev/dm-2","mountpoint":"/storage/scsi2","fstype":"xfs","opts":["rw","relatime"]}

Looking at PartitionsWithContext, it reads the values from /proc/1/mountinfo, which for our devices contains:

142 26 254:1 / /storage/scsi1 rw,relatime shared:719 - xfs /dev/mapper/dev_dent_d-dev_dent_d rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
1298 26 254:2 / /storage/scsi2 rw,relatime shared:751 - xfs /dev/mapper/prod_dent_d-prod_dent_d rw,attr2,inode64,logbufs=8,logbsize=32k,noquota

It splits those lines on the " - " separator and takes the device name from the second field after it:

/dev/mapper/dev_dent_d-dev_dent_d
/dev/mapper/prod_dent_d-prod_dent_d
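For reference, here is a simplified sketch of that parsing step (not the library's exact code). In a mountinfo line, the fields after the " - " separator are the filesystem type, the mount source (the device), and the super options:

package main

import (
	"fmt"
	"strings"
)

func main() {
	line := "115 96 253:2 / /storage/scsi1 rw,relatime shared:63 - xfs /dev/mapper/prod_dent_d-prod_dent_d rw,attr2,inode64,logbufs=8,logbsize=32k,noquota"

	// Everything after " - " is: fstype, mount source (device), super options.
	after := strings.SplitN(line, " - ", 2)[1]
	fields := strings.Fields(after)
	fstype, device := fields[0], fields[1]

	fmt.Println(fstype, device) // xfs /dev/mapper/prod_dent_d-prod_dent_d
}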

Further on it then checks if the device starts with /dev/mapper to resolve the dm- based name:

devpath, err := filepath.EvalSymlinks(common.HostDev(strings.Replace(d.Device, "/dev", "", -1)))

That replacement is the root cause. I believe the intent is to strip the leading /dev/, but with a count of -1 it replaces every occurrence of /dev, including the one at the start of the mapper device name:

/dev/mapper/dev_dent_d-dev_dent_d
becomes
/mapper_dent_d-dev_dent_d
instead of
mapper/dev_dent_d-dev_dent_d

Which will not resolve the symlink in /dev/mapper:

❯ ls -l /dev/mapper/
total 0
crw------- 1 root root 10, 236 Feb  1 06:09 control
lrwxrwxrwx 1 root root       7 Feb  1 07:25 dev_dent_d-dev_dent_d -> ../dm-1
lrwxrwxrwx 1 root root       7 Feb  1 07:39 prod_dent_d-prod_dent_d -> ../dm-2
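
For what it's worth, here is a quick sketch that demonstrates the over-replacement (this only illustrates the problem; the actual fix is in the PR below):

package main

import (
	"fmt"
	"strings"
)

func main() {
	device := "/dev/mapper/dev_dent_d-dev_dent_d"

	// Replacing every occurrence of "/dev" also mangles mapper names that
	// happen to contain "/dev", e.g. volume groups named dev_*:
	fmt.Println(strings.Replace(device, "/dev", "", -1)) // /mapper_dent_d-dev_dent_d

	// Stripping only the leading "/dev" keeps the mapper name intact:
	fmt.Println(strings.TrimPrefix(device, "/dev")) // /mapper/dev_dent_d-dev_dent_d
}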

I have filed lvm group named dev fails to resolve · Issue #1411 · shirou/gopsutil and put up fix(disk): correctly replace /dev in /dev/mapper by powersj · Pull Request #1412 · shirou/gopsutil.

Looking at everything, it sounds like in the end everything would still be dm-#. Not sure if it is possible, but is there a way to add an option to the plugin config to return the mapper device instead of dm-#?

Thank you so much!

Correct, it will ultimately go back to the disk device (e.g. dm-#) rather than the LVM LV.

There is no option that I am aware of, but have you looked to see if the LVM plugin would get you what you want?

I did not know that existed; I’ll probably switch over to that one. In any case, I am very happy the bug was identified and a fix submitted. Thank you again for all your help!

I’ve been working with the LVM plugin, and it does not report filesystem usage, just PV usage. While going through the gopsutil source, I do see it is possible to get the label straight from disk.go:

// Label returns label of given device or empty string on error.
// Name of device is expected, eg. /dev/sda
// Supports label based on devicemapper name
// See https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-block-dm
func Label(name string) (string, error) {
	return LabelWithContext(context.Background(), name)
}

According to the diskio plugin documentation, dm-# is near-meaningless. It looks like it would be possible to add the label to the disk plugin; however, I am not familiar enough with Go to add it, so I am not sure how hard it would be. Would you have a minute or two to look at this?

Thanks!

Would you have a minute or two to look at this?

Yeah, can I ask you to go open a Telegraf Feature Request on GitHub?

Looking at this briefly, I don’t think that will do what you want. There is no label unless you have specified one. Check out the lsblk output:

lsblk -o name,mountpoint,label

It sounds like you are after the untranslated device name (e.g. /dev/mapper...), which is lost after this translation.

The only reason I was going after the untranslated name is that I needed a way to identify the disk in a graph; the label works for my purposes just as well (I typically label my filesystems the same as the LV). I opened a feature request here: Add Label to disk plugin · Issue #12594 · influxdata/telegraf. I also came up with some proof-of-concept code (it is rough; I have never done anything in Go before today).
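
For illustration only, here is a minimal sketch of what such a proof of concept might look like, based on the Label function quoted above. It assumes Label accepts the bare kernel device name (e.g. dm-2), as the sysfs-block-dm reference in its comment suggests; dm-2 is just the example device from this thread:

package main

import (
	"fmt"

	"github.com/shirou/gopsutil/v3/disk"
)

func main() {
	// Assumption: passing the bare kernel name works, and for devicemapper
	// devices the returned "label" is the mapper name (e.g. prod_dent_d-prod_dent_d),
	// which the disk plugin could attach as an extra tag.
	label, err := disk.Label("dm-2")
	if err != nil {
		panic(err)
	}
	fmt.Println(label)
}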