Does the plugin support collecting information via ipmitool chassis power status?
Here is the plugin readme: telegraf/plugins/inputs/ipmi_sensor at master · influxdata/telegraf · GitHub
From the example output we do have some information about the power supplies:
ipmi_sensor,server=10.20.2.203,name=power_supply_1,unit=watts status=1i,value=110 1517125513000000000
ipmi_sensor,server=10.20.2.203,name=power_supply_2,unit=watts status=1i,value=120 1517125513000000000
ipmi_sensor,server=10.20.2.203,name=power_supplies value=0,status=1i 1517125513000000000
Is this what you are after or is there more detailed information?
@jpowers
Right. What I need is a way to tell whether the server is powered on or off, and to use that result as a filter.
The only place I found this information is ipmitool chassis power status.
Reason:
When a fan fails, its status_code is reported as nr. The same status_code is reported when the server is powered off, so it is not possible to create a reliable alert.
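To make the ambiguity concrete, here is a minimal sketch of the alert rule being asked for. The function name fan_alert and its parameters are hypothetical, chosen only for illustration; they are not part of Telegraf or ipmitool.

```python
def fan_alert(status_code: str, chassis_power_on: bool) -> bool:
    """Return True only when a fan reports 'nr' while the chassis is powered on.

    Hypothetical helper: 'nr' on a powered-off server is expected and
    should not raise an alert.
    """
    return chassis_power_on and status_code.strip().lower() == "nr"


# A failed fan on a running server should alert.
print(fan_alert("nr", True))
# The same status on a powered-off server should not.
print(fan_alert("nr", False))
```

Without the power state as a second input, both cases above look identical, which is the problem described here.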
Can you provide some example output of this?
The IPMI plugin was originally developed to collect sensor data, namely the values from ipmitool sdr. I’m not opposed to extending this to other data, but I would need examples, and we would need to hide this behind a config option as well.
Here are two examples: one with the server powered off, and one with the server powered on but with a fan error.
SYS_FAN1 | 0 RPM | nr
SYS_FAN2 | 0 RPM | nr
SYS_FAN3 | 0 RPM | nr
SYS_FAN4 | 0 RPM | nr
SYS_FAN5 | 0 RPM | nr
SYS_FAN6 | 0 RPM | nr
SW_FAN1 | 0 RPM | nr
SW_FAN2 | 0 RPM | nr
SW_FAN3 | 0 RPM | nr
SW_FAN4 | 0 RPM | nr
SW_FAN5 | 0 RPM | nr
GPU_FAN1 | 0 RPM | nr
GPU_FAN2 | 0 RPM | nr
GPU_FAN3 | 0 RPM | nr
GPU_FAN4 | 0 RPM | nr
GPU_FAN5 | 0 RPM | nr
GPU_FAN6 | 0 RPM | nr
GPU_FAN7 | 0 RPM | nr
GPU_FAN8 | 0 RPM | nr
GPU_FAN9 | 0 RPM | nr
GPU_FAN10 | 0 RPM | nr
REAR_FAN1 | 0 RPM | nr
REAR_FAN2 | 0 RPM | nr
REAR_FAN3 | 0 RPM | nr
REAR_FAN4 | 0 RPM | nr
PSU1 Slow FAN1 | 0x00 | ok
PSU2 Slow FAN1 | 0x00 | ok
PSU3 Slow FAN1 | 0x00 | ok
PSU4 Slow FAN1 | 0x00 | ok
PSU5 Slow FAN1 | 0x00 | ok
PSU6 Slow FAN1 | 0x00 | ok
# /usr/bin/ipmitool -H 172.21.72.150 -U tsdb -I lanplus -L USER chassis power status
Chassis Power is off
Note that the server with the fan failure, shown below, also reports nr for the failed fan.
# /usr/bin/ipmitool -H 172.21.72.114 -U tsdb -I lanplus -L USER sdr |grep FAN
SYS_FAN1 | 10200 RPM | ok
SYS_FAN2 | 10100 RPM | ok
SYS_FAN3 | 10200 RPM | ok
SYS_FAN4 | 10200 RPM | ok
SYS_FAN5 | 10100 RPM | ok
SYS_FAN6 | 10100 RPM | ok
SW_FAN1 | 12600 RPM | ok
SW_FAN2 | 12800 RPM | ok
SW_FAN3 | 15200 RPM | ok
SW_FAN4 | 14800 RPM | ok
SW_FAN5 | 9400 RPM | ok
GPU_FAN1 | 5300 RPM | ok
GPU_FAN2 | 5400 RPM | ok
GPU_FAN3 | 5200 RPM | ok
GPU_FAN4 | 5200 RPM | ok
GPU_FAN5 | 5300 RPM | ok
GPU_FAN6 | 5300 RPM | ok
GPU_FAN7 | 5300 RPM | ok
GPU_FAN8 | 0 RPM | nr
GPU_FAN9 | 5300 RPM | ok
GPU_FAN10 | 5300 RPM | ok
REAR_FAN1 | 9620 RPM | ok
REAR_FAN2 | 9230 RPM | ok
REAR_FAN3 | 8450 RPM | ok
REAR_FAN4 | 8450 RPM | ok
PSU1 Slow FAN1 | 0x00 | ok
PSU2 Slow FAN1 | 0x00 | ok
PSU3 Slow FAN1 | 0x00 | ok
PSU4 Slow FAN1 | 0x00 | ok
PSU5 Slow FAN1 | 0x00 | ok
PSU6 Slow FAN1 | 0x00 | ok
# /usr/bin/ipmitool -H 172.21.72.114 -U tsdb -I lanplus -L USER chassis power status
Chassis Power is on
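As a rough illustration of how the two outputs above could be combined outside of Telegraf, the sketch below parses the sdr rows and the chassis power status line. All function names are hypothetical, and the parsing assumes the exact "NAME | READING | STATUS" and "Chassis Power is on/off" formats shown in this post.

```python
def parse_sdr_line(line):
    """Split one 'NAME | READING | STATUS' row into three stripped fields.

    Raises ValueError if the row does not have exactly three columns.
    """
    name, reading, status = (part.strip() for part in line.split("|"))
    return name, reading, status


def parse_chassis_power(output):
    """Map 'Chassis Power is on' / 'Chassis Power is off' to True / False."""
    text = output.strip().lower()
    if text == "chassis power is on":
        return True
    if text == "chassis power is off":
        return False
    raise ValueError(f"unexpected chassis power output: {output!r}")


def failed_fans(sdr_output, power_output):
    """List sensors reporting 'nr', but only if the chassis is powered on."""
    if not parse_chassis_power(power_output):
        return []  # a powered-off server reports 'nr' for every fan
    rows = (parse_sdr_line(l) for l in sdr_output.splitlines() if l.strip())
    return [name for name, _reading, status in rows if status == "nr"]


sample = "GPU_FAN7 | 5300 RPM | ok\nGPU_FAN8 | 0 RPM | nr\n"
print(failed_fans(sample, "Chassis Power is on"))
```

With the power state available, the powered-off server yields an empty list while the running server yields only the fan that actually failed.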
Hi again,
Can you grab one of the artifacts from this PR: feat(inputs.ipmi_sensor): Collect additional commands by powersj · Pull Request #15495 · influxdata/telegraf · GitHub
You will need to update your IPMI sensor plugin config as follows:
[[inputs.ipmi_sensor]]
sensors = ["sdr", "chassis_power_status"]
Can you let me know if that creates a metric with a name tag with the value chassis_power_status?
If that works, can you also try the dcmi_power_reading sensor as well?
[[inputs.ipmi_sensor]]
sensors = ["sdr", "chassis_power_status", "dcmi_power_reading"]
Thanks!
Hi,
I added it to the conf file, but it showed an error when checking.
error:
2024-06-13T13:50:33Z I! Loading config: ipmi_inputs.conf
2024-06-13T13:50:33Z E! error loading config file ipmi_inputs.conf: plugin inputs.ipmi_sensor: line 1: configuration specified the fields ["sensors"], but they weren't used
[[inputs.ipmi_sensor]]
interval = "300s"
timeout = "30s"
metric_version = 2
privilege = "USER"
use_cache = true
sensors = ["sdr", "chassis_power_status"]
servers = ["tsdb:xxxxxx@lanplus(172.21.72.150)", "tsdb:xxxxx@lanplus(172.21.72.114)"]
Thanks
That sounds like you didn’t use the artifact and are maybe running your locally installed telegraf instead?
Can you enable debug mode and ensure the version matches?
telegraf-1.32.0/usr/bin
❯ ./telegraf --version
Telegraf 1.32.0-9885bb54 (git: pull/15495@9885bb54)
telegraf-1.32.0/usr/bin
❯ vim config.toml
telegraf-1.32.0/usr/bin took 19s
❯ ./telegraf --config config.toml --once
2024-06-13T14:00:09Z I! Loading config: config.toml
2024-06-13T14:00:09Z I! Starting Telegraf 1.32.0-9885bb54 brought to you by InfluxData the makers of InfluxDB
2024-06-13T14:00:09Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-06-13T14:00:09Z I! Loaded inputs: ipmi_sensor
2024-06-13T14:00:09Z I! Loaded aggregators:
2024-06-13T14:00:09Z I! Loaded processors:
2024-06-13T14:00:09Z I! Loaded secretstores:
2024-06-13T14:00:09Z I! Loaded outputs: file
2024-06-13T14:00:09Z I! Tags enabled: host=ryzen
2024-06-13T14:00:09Z D! [agent] Initializing plugins
2024-06-13T14:00:09Z E! [telegraf] Error running agent: could not initialize input inputs.ipmi_sensor: looking up "ipmitool" failed: exec: "ipmitool": executable file not found in $PATH
[agent]
debug = true
[[inputs.ipmi_sensor]]
interval = "300s"
timeout = "30s"
metric_version = 2
privilege = "USER"
use_cache = true
sensors = ["sdr", "chassis_power_status"]
servers = ["tsdb:xxxxxx@lanplus(172.21.72.150)", "tsdb:xxxxx@lanplus(172.21.72.114)"]
[[outputs.file]]
telegraf --version
Telegraf 1.28.5 (git: HEAD@77e1a498)
[root@ccl-lab-collector01 telegraf.d]# service telegraf status
Redirecting to /bin/systemctl status telegraf.service
× telegraf.service - Telegraf
Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Thu 2024-06-13 10:38:43 EDT; 31s ago
Duration: 1month 1w 5d 10h 36min 25.400s
Docs: GitHub - influxdata/telegraf: Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
Process: 2015297 ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS (code=exited, status=1/FAILURE)
Main PID: 2015297 (code=exited, status=1/FAILURE)
CPU: 140ms
Jun 13 10:38:43 ccl-lab-collector01 systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 5.
Jun 13 10:38:43 ccl-lab-collector01 systemd[1]: Stopped Telegraf.
Jun 13 10:38:43 ccl-lab-collector01 systemd[1]: telegraf.service: Start request repeated too quickly.
Jun 13 10:38:43 ccl-lab-collector01 systemd[1]: telegraf.service: Failed with result 'exit-code'.
Jun 13 10:38:43 ccl-lab-collector01 systemd[1]: Failed to start Telegraf.
After removing the line sensors = ["sdr", "chassis_power_status"]:
2024-06-13T14:39:39Z I! Starting Telegraf 1.28.5 brought to you by InfluxData the makers of InfluxDB
2024-06-13T14:39:39Z I! Available plugins: 240 inputs, 9 aggregators, 29 processors, 24 parsers, 59 outputs, 5 secret-stores
2024-06-13T14:39:39Z I! Loaded inputs: cpu disk io ipmi_sensor mem net snmp swap system
2024-06-13T14:39:39Z I! Loaded aggregators:
2024-06-13T14:39:39Z I! Loaded processors:
2024-06-13T14:39:39Z I! Loaded secretstores:
2024-06-13T14:39:39Z I! Loaded outputs: opentsdb
2024-06-13T14:39:39Z I! Tags enabled: host=ccl-lab-collector01
2024-06-13T14:39:39Z W! Deprecated inputs: 1 and 0 options
2024-06-13T14:39:39Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ccl-lab-collector01", Flush Interval:10s
2024-06-13T14:39:39Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead
Yeah, that is the wrong version. That is probably the version you have installed, not the one from the artifact.
It looks like you have telegraf running as a service. My suggestion is to download the artifact, extract it, and run it manually from the CLI, like so:
telegraf --config config.toml --once --debug
[root@ccl-lab-collector01 bin]# ./telegraf --version
Telegraf 1.31.0 (git: HEAD@fbfaba05)
[root@ccl-lab-collector01 bin]# ./telegraf --config …/…/etc/telegraf/telegraf.toml --once --debug
2024-06-13T16:06:26Z I! Loading config: …/…/etc/telegraf/telegraf.toml
2024-06-13T16:06:26Z E! error loading config file …/…/etc/telegraf/telegraf.toml: plugin inputs.ipmi_sensor: line 9: configuration specified the fields ["sensors"], but they were not used. This is either a typo or this config option does not exist in this version.
[root@ccl-lab-collector01 bin]#
[root@ccl-lab-collector01 bin]# cat …/…/etc/telegraf/telegraf.toml
[agent]
debug = true
[[inputs.ipmi_sensor]]
interval = "300s"
timeout = "30s"
metric_version = 2
privilege = "USER"
use_cache = true
sensors = ["sdr", "chassis_power_status"]
servers = ["tsdb:xxxxxx@lanplus(172.21.72.150)", "tsdb:xxxxx@lanplus(172.21.72.114)"]
[[outputs.file]]
[root@ccl-lab-collector01 bin]#
We are looking for:
Telegraf 1.32.0-9885bb54 (git: pull/15495@9885bb54)
Here are some direct links from the PR: feat(inputs.ipmi_sensor): Collect additional commands by powersj · Pull Request #15495 · influxdata/telegraf · GitHub
Let me know if you are using some other OS + arch.
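If it helps to check this without reading the version string by eye, the following is a small sketch that tests whether the output of telegraf --version comes from a pull request build. The function name is_pr_build is hypothetical, and the pattern only assumes the "(git: pull/<number>@<hash>)" format seen in the version strings in this thread.

```python
import re


def is_pr_build(version_output, pr_number):
    """Return True if the version string names the given pull request.

    Assumes the '(git: pull/<number>@<hash>)' suffix shown in this thread;
    release builds show '(git: HEAD@<hash>)' instead and do not match.
    """
    match = re.search(r"\(git: pull/(\d+)@[0-9a-f]+\)", version_output)
    return bool(match) and int(match.group(1)) == pr_number


print(is_pr_build("Telegraf 1.32.0-9885bb54 (git: pull/15495@9885bb54)", 15495))
print(is_pr_build("Telegraf 1.31.0 (git: HEAD@fbfaba05)", 15495))
```

Both earlier attempts in this thread (1.28.5 and 1.31.0) report HEAD builds, which is why the sensors option was rejected.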
Wonderful. It works correctly.
[root@hci-mtl3-collector01 bin]# ./telegraf --version
Telegraf 1.32.0-9885bb54 (git: pull/15495@9885bb54)
[root@hci-mtl3-collector01 bin]# ./telegraf --config …/…/etc/telegraf/telegraf.conf --once --debug |grep power
2024-06-13T17:13:37Z I! Loading config: …/…/etc/telegraf/telegraf.conf
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu1_power_in,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu1_power_out,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu2_power_in,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu2_power_out,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu3_power_in,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu3_power_out,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu4_power_in,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu4_power_out,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu5_power_in,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu5_power_out,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu6_power_in,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu6_power_out,server=172.21.72.150,status_code=ok,unit=watts value=0 1718298818000000000
ipmi_sensor,host=hci-mtl3-collector01,name=chassis_power_status,server=172.21.72.150 value=0i 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu1_power_in,server=172.21.72.114,status_code=ok,unit=watts value=368 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu1_power_out,server=172.21.72.114,status_code=ok,unit=watts value=320 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu2_power_in,server=172.21.72.114,status_code=ok,unit=watts value=368 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu2_power_out,server=172.21.72.114,status_code=ok,unit=watts value=336 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu3_power_in,server=172.21.72.114,status_code=ok,unit=watts value=368 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu3_power_out,server=172.21.72.114,status_code=ok,unit=watts value=320 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu4_power_in,server=172.21.72.114,status_code=ok,unit=watts value=352 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu4_power_out,server=172.21.72.114,status_code=ok,unit=watts value=304 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu5_power_in,server=172.21.72.114,status_code=ok,unit=watts value=336 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu5_power_out,server=172.21.72.114,status_code=ok,unit=watts value=288 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu6_power_in,server=172.21.72.114,status_code=ok,unit=watts value=352 1718298818000000000
ipmi_sensor,entity_id=10.0,host=hci-mtl3-collector01,name=psu6_power_out,server=172.21.72.114,status_code=ok,unit=watts value=304 1718298818000000000
ipmi_sensor,host=hci-mtl3-collector01,name=chassis_power_status,server=172.21.72.114 value=1i 1718298818000000000
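For anyone consuming this output directly, here is a minimal sketch that extracts the power state per server from lines like the ones above. It is not a full line protocol parser: it assumes no escaped spaces or commas, which holds for this sample, and the function names are hypothetical.

```python
def parse_line(line):
    """Split one line into measurement, tags, fields, and timestamp.

    Simplified: assumes exactly three space-separated sections and no
    escaped characters, as in the sample output in this thread.
    """
    head, fields, timestamp = line.strip().split(" ")
    measurement, *tag_parts = head.split(",")
    tags = dict(part.split("=", 1) for part in tag_parts)
    field_map = dict(part.split("=", 1) for part in fields.split(","))
    return measurement, tags, field_map, int(timestamp)


def power_state_by_server(lines):
    """Map each server address to True (on) or False (off)."""
    states = {}
    for line in lines:
        _measurement, tags, fields, _ts = parse_line(line)
        if tags.get("name") == "chassis_power_status":
            # The sample shows value=1i for on and value=0i for off.
            states[tags["server"]] = fields["value"] == "1i"
    return states


sample = [
    "ipmi_sensor,host=hci-mtl3-collector01,name=chassis_power_status,server=172.21.72.150 value=0i 1718298818000000000",
    "ipmi_sensor,host=hci-mtl3-collector01,name=chassis_power_status,server=172.21.72.114 value=1i 1718298818000000000",
]
print(power_state_by_server(sample))
```

The resulting mapping can then be used to ignore nr fan readings from any server whose state is off.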
Awesome! Thank you very much for confirming!