Storage health data from HP iLO using the IPMI Sensor plugin and Telegraf

Hi,

I have a fairly extensive Telegraf setup that gathers metrics from HP and Dell servers, and everything is set up correctly. However, while testing Grafana alerts, I pulled hard drives out of one of the HP servers. The server has a 4-disk RAID 5 configuration, and I pulled three of the four disks out of their bays. This is what happened:

  • In the iLO dashboard, the storage status shows up as “Failed”
  • The server’s physical health LED is blinking red

However, in the data collected by the Telegraf ipmi_sensor plugin, the disk status stays ‘OK’, with the following status description:

(This is taken from the Grafana dashboard. I have verified the data in my InfluxDB as well.)

I’m trying to figure out whether the status should be set to something other than OK. Is this a bug or expected behaviour? Any suggestions on how to change it?

Thanks!

Please provide the following:

  1. The raw output from Telegraf, captured with the [[outputs.file]] output plugin
  2. Your configuration
  3. Telegraf logs with debug enabled
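For items 1 and 3, a minimal sketch of the relevant telegraf.conf settings might look like the following. It assumes you are fine printing metrics to stdout alongside your existing outputs; adjust the destination if you would rather write to a file.

```toml
[agent]
  ## Print debug-level messages to the Telegraf log
  debug = true

## Print every metric to stdout in InfluxDB line protocol,
## so the raw ipmi_sensor values can be inspected directly
[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
```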

I looked at this some more: we don’t really transform the text, so we should be reporting whatever we get from ipmitool. I would also suggest looking at the output of:

ipmitool sdr

or, if you are using the version 2 metric schema:

ipmitool sdr elist
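For reference, the schema is selected with the metric_version option of the ipmi_sensor input. A minimal sketch, assuming the rest of your input settings stay as they are:

```toml
[[inputs.ipmi_sensor]]
  ## Select the version 2 metric schema
  metric_version = 2
```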

Hey,

Thank you so much for your quick responses, and apologies for the delay on my side.

I will check ipmitool’s output. Unfortunately, I cannot provide the rest of the requested logs and configuration. I’ll see if I can find something that helps, and then I’ll update the thread.

Hi!

Sharing the method that I used for this use case.

Use Case
The ipmi_sensor plugin returns a tag, status_desc, which contains the description returned by the device. In the case of a drive, status_desc may indicate a failure while status stays “ok”. I need a field that gives me status_desc in the form of a number so that I can trigger Grafana alerts on it, because “status” is not reliable for this. Please note that the database being used is InfluxDB.

Method used

  1. Used the processors.converter Telegraf plugin to convert the InfluxDB tag key status_desc to a field key
  2. Used the processors.enum Telegraf plugin to map the values of the status_desc field key to integers
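For anyone trying the same approach, here is a rough sketch of how those two processors could be chained in telegraf.conf. This is not the exact configuration from this thread. The destination field name status_desc_code, the default value, and the mapped strings are hypothetical: replace the keys under value_mappings with the exact status_desc strings your drives report (check them with ipmitool or the [[outputs.file]] output). The order option makes sure the tag is converted before enum tries to map it.

```toml
## Step 1: turn the status_desc tag into a string field
[[processors.converter]]
  order = 1
  namepass = ["ipmi_sensor"]
  [processors.converter.tags]
    string = ["status_desc"]

## Step 2: map the status_desc field to an integer field for alerting
[[processors.enum]]
  order = 2
  namepass = ["ipmi_sensor"]
  [[processors.enum.mapping]]
    ## Older Telegraf releases use the singular `field = "status_desc"`
    fields = ["status_desc"]
    ## HYPOTHETICAL destination field name
    dest = "status_desc_code"
    ## Any description not listed below is treated as "unknown" (1)
    default = 1
    [processors.enum.mapping.value_mappings]
      ## HYPOTHETICAL strings: copy the exact values from your own data
      "ok" = 0
      "Drive Present" = 0
      "Drive Fault" = 2
```

With a numeric field like this, a Grafana alert can simply fire when the value is above 0. Setting default to a non-zero value is a deliberate choice here: a description you did not anticipate raises an alert instead of being silently ignored.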

Thanks for following up with what you used!