Inspur server motherboard shows b8: hardware monitoring on Inspur servers (ipmitool, MegaCli)

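This post looks at the details of hardware monitoring on Inspur servers with ipmitool and MegaCli, covering fan, processor, memory, disk, and power supply status, as well as using MegaCli for array management and for retrieving detailed information. It compares this with monitoring on Dell servers and provides sample output and notes on the commands.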


Hardware monitoring on Inspur servers (ipmitool, MegaCli)

Why monitor with ipmitool and MegaCli?

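On Inspur servers, the information that ipmitool reports corresponds one-to-one with what the management card shows. For example, here are the fan status values on the management card:

[Screenshot: fan status on the management card]

And here is the same fan information retrieved with ipmitool from inside the operating system:

ipmitool sdr list | grep 'FAN[0-6]'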

FAN0_F_Speed | 4224 RPM | ok

FAN0_R_Speed | 3744 RPM | ok

FAN1_F_Speed | 4224 RPM | ok

FAN1_R_Speed | 3744 RPM | ok

FAN2_F_Speed | 4320 RPM | ok

FAN2_R_Speed | 3840 RPM | ok

FAN3_F_Speed | 4224 RPM | ok

FAN3_R_Speed | 3840 RPM | ok

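This output shows each fan's status and its reading. The difference between Dell and Inspur fan monitoring is this: on Dell, an abnormal fan reading is reflected in the returned status, whereas Inspur only monitors whether the fan is present. Suppose a fan's reading drops to 200 while the fan is still present. Dell would report an abnormal status and raise an alert; Inspur would not. (This rarely happens in practice, though; I have never seen it myself.)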
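Since the status column stays ok as long as the fan is present, one way to cover the low-reading case is to check the RPM value yourself. The following is only a minimal sketch: the function name low_rpm_fans and the 1000 RPM threshold are assumptions made for illustration, and the 200 RPM line in the sample is made up to match the scenario described above.

```python
def low_rpm_fans(sdr_list_output, min_rpm=1000):
    """Return the names of fan sensors whose RPM reading is below min_rpm.

    Expects text in the three-column `ipmitool sdr list` layout:
    name | reading | status. The min_rpm default is an assumed value.
    """
    low = []
    for line in sdr_list_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Skip anything that is not a "name | NNNN RPM | status" line.
        if len(fields) != 3 or not fields[1].endswith("RPM"):
            continue
        rpm = int(fields[1].split()[0])
        if rpm < min_rpm:
            low.append(fields[0])
    return low


# Hypothetical sample: the second line simulates a fan that is present but slow.
sample = """\
FAN0_F_Speed | 4224 RPM | ok
FAN0_R_Speed | 200 RPM | ok
"""
print(low_rpm_fans(sample))
```

In a real check the text would come from running ipmitool sdr list (for example through subprocess), and the threshold would need tuning per fan model.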

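Monitoring with ipmitool

Processor status monitoring

Here the state to look at is not the ok column but the Presence detected column.

ipmitool sdr elist | grep -i 'cpu[0-2]_status'

The corresponding view on the management card:

[Screenshot: processor status on the management card]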

CPU0_Status | 7Dh | ok | 3.0 | Presence detected

CPU1_Status | 7Eh | ok | 3.0 | Presence detected
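To act on the last column rather than the ok column, the five-field elist layout can be split on the pipe character. This is a rough sketch, assuming the layout shown above; parse_elist and absent_cpus are names invented for this example, and the blank CPU1 reading in the sample is hypothetical.

```python
def parse_elist(output):
    """Parse `ipmitool sdr elist` text into a list of dicts, one per sensor."""
    rows = []
    for line in output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue  # not a sensor line
        name, sensor_id, status, entity, reading = fields
        rows.append({"name": name, "id": sensor_id, "status": status,
                     "entity": entity, "reading": reading})
    return rows


def absent_cpus(output):
    """Return CPU status sensors whose last column is not 'Presence detected'."""
    return [
        row["name"]
        for row in parse_elist(output)
        if row["name"].upper().startswith("CPU")
        and row["name"].upper().endswith("_STATUS")
        and row["reading"].lower() != "presence detected"
    ]


# Hypothetical sample: CPU1 has an empty reading column.
sample = """\
CPU0_Status | 7Dh | ok | 3.0 | Presence detected
CPU1_Status | 7Eh | ok | 3.0 |
"""
print(absent_cpus(sample))
```

The comparison is case-insensitive because the capitalization of the reading differs between sensor types in the outputs in this post.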

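Checking memory status

Note: Inspur is rather nasty here. Its memory sensors are named with a CPU prefix (ugh), so check the actual sensor names before writing the grep. The sensor names are also inconsistent from one server model to another, so monitoring has to accommodate the differences. Here too, the thing to check is whether the status value is Presence Detected.

[Screenshot: memory status on the management card]

ipmitool sdr elist | grep -i 'CPU[0-1]_C[0-1]D[0-1]'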

CPU0_C0D0 | 83h | ok | 32.0 | Presence Detected

CPU0_C0D1 | 84h | ok | 32.1 |

CPU0_C1D0 | 85h | ok | 32.2 | Presence Detected

CPU0_C1D1 | 86h | ok | 32.3 |

CPU1_C0D0 | 8Fh | ok | 32.12 | Presence Detected

CPU1_C0D1 | 90h | ok | 32.13 |

CPU1_C1D0 | 91h | ok | 32.14 | Presence Detected

CPU1_C1D1 | 92h | ok | 32.15 |
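Because an empty slot also shows ok, a monitor has to know which slots are supposed to be populated and compare against that. Below is one possible sketch; the function names and the idea of passing in an expected set are assumptions for illustration, and the name pattern only matches the naming seen in the output above, which other models may not share.

```python
import re

# Matches names such as CPU0_C0D1 (naming taken from the output above).
DIMM_NAME = re.compile(r"^CPU\d+_C\d+D\d+$")


def populated_dimms(elist_output):
    """Return the set of DIMM sensors that report 'Presence Detected'."""
    slots = set()
    for line in elist_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if (len(fields) == 5
                and DIMM_NAME.match(fields[0])
                and fields[4].lower() == "presence detected"):
            slots.add(fields[0])
    return slots


def missing_dimms(elist_output, expected):
    """Return expected slots that are not reported as present, sorted by name."""
    return sorted(set(expected) - populated_dimms(elist_output))


sample = """\
CPU0_C0D0 | 83h | ok | 32.0 | Presence Detected
CPU0_C0D1 | 84h | ok | 32.1 |
CPU0_C1D0 | 85h | ok | 32.2 | Presence Detected
"""
print(missing_dimms(sample, {"CPU0_C0D0", "CPU0_C0D1", "CPU0_C1D0"}))
```

The expected set would typically be recorded once per machine when it is known to be healthy.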

Checking disk slot status

ipmitool sdr elist | grep -i disk

Below are the disk slots:

DISK0_Status | B4h | ok | 4.0 | Drive Present

DISK1_Status | B5h | ok | 4.1 | Drive Present

DISK2_Status | B6h | ok | 4.2 | Drive Present

DISK3_Status | B7h | ok | 4.3 | Drive Present

DISK4_Status | B8h | ok | 4.4 | Drive Present

DISK5_Status | B9h | ok | 4.5 | Drive Present

DISK6_Status | BAh | ok | 4.6 | Drive Present

DISK7_Status | BBh | ok | 4.7 | Drive Present

DISK8_Status | BCh | ok | 4.8 | Drive Present

DISK9_Status | BDh | ok | 4.9 | Drive Present

DISK10_Status | BEh | ok | 4.10 | Drive Present

DISK11_Status | BFh | ok | 4.11 | Drive Present

DISK12_Status | C0h | ok | 4.12 |

DISK13_Status | C1h | ok | 4.13 |

DISK14_Status | C2h | ok | 4.14 |

DISK15_Status | C3h | ok | 4.15 |

DISK16_Status | C4h | ok | 4.16 |

DISK17_Status | C5h | ok | 4.17 |

DISK18_Status | C6h | ok | 4.18 |

DISK19_Status | C7h | ok | 4.19 |

DISK20_Status | C8h | ok | 4.20 |

DISK21_Status | C9h | ok | 4.21 |

DISK22_Status | CAh | ok | 4.22 |

DISK23_Status | CBh | ok | 4.23 |

DISK24_Status | D4h | ok | 4.24 |

Below are the disk backplane slots:

DISK0_R_Status | CCh | ok | 4.0 | Drive Present

DISK1_R_Status | CDh | ok | 4.1 | Drive Present

DISK2_R_Status | CEh | ok | 4.2 |

DISK3_R_Status | CFh | ok | 4.3 |

DISK4_R_Status | D0h | ok | 4.4 |

DISK5_R_Status | D1h | ok | 4.5 |

DISK6_R_Status | D2h | ok | 4.6 |

DISK7_R_Status | D3h | ok | 4.7 |
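As with memory, an empty slot is still ok, so what is worth tracking is which slots report Drive Present. The sketch below groups slot numbers by the two sensor families shown above. The function name present_disks and the group labels "slot" and "backplane" are made up for this example, and the pattern assumes the DISKn_Status / DISKn_R_Status naming from this particular machine.

```python
import re

# Sensor names as they appear in the output above; "_R" marks the backplane group.
SLOT_NAME = re.compile(r"^DISK(\d+)(_R)?_Status$")


def present_disks(elist_output):
    """Return slot numbers reporting 'Drive Present', grouped by sensor family."""
    present = {"slot": [], "backplane": []}
    for line in elist_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue
        match = SLOT_NAME.match(fields[0])
        if match and fields[4] == "Drive Present":
            group = "backplane" if match.group(2) else "slot"
            present[group].append(int(match.group(1)))
    return present


sample = """\
DISK0_Status | B4h | ok | 4.0 | Drive Present
DISK12_Status | C0h | ok | 4.12 |
DISK1_R_Status | CDh | ok | 4.1 | Drive Present
DISK2_R_Status | CEh | ok | 4.2 |
"""
print(present_disks(sample))
```

Comparing the result against the slot list recorded for the machine would reveal a disk that has dropped out.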

Power supply information

ipmitool sdr elist | grep -i 'psu[0-1]_status'

[Screenshot: power supply status on the management card]

PSU0_Status | 74h | ok | 10.0 | Presence detected

PSU1_Status | 75h | ok | 10.0 | Presence detected

Fan presence information

ipmitool sdr elist | grep -i 'fan[0-9]_Present'

FAN0_Present | 60h | ok | 29.0 | Device Present

FAN1_Present | 61h | ok | 29.1 | Device Present

FAN2_Present | 62h | ok | 29.2 | Device Present

FAN3_Present | 63h | ok | 29.3 | Device Present

Temperature monitoring

ipmitool sdr elist | grep -i temp

Inlet_Temp | 00h | ok | 12.0 | 22 degrees C

Outlet_Temp | 01h | ok | 55.1 | 32 degrees C

CPU0_Temp | 06h | ok | 3.0 | 28 degrees C

CPU1_Temp | 07h | ok | 3.0 | 26 degrees C

CPU0_DIMM_Temp | 0Eh | ok | 32.0 | 34 degrees C

CPU1_DIMM_Temp | 0Fh | ok | 32.0 | 32 degrees C

CPU0_VR_Temp | 02h | ok | 3.0 | 31 degrees C

CPU1_VR_Temp | 03h | ok | 3.1 | 30 degrees C

PCH_Temp | 16h | ok | 3.0 | 44 degrees C

OCP_Temp | 29h | ns | 11.0 | No Reading

NVME_Temp | 28h | ns | 11.1 | No Reading

PSU0_Temp | 1Ch | ok | 32.0 | 28 degrees C

PSU1_Temp | 1Dh | ok | 32.0 | 27 degrees C

RAID0_Temp | 17h | ok | 11.0 | 58 degrees C

RAID1_Temp | 18h | ns | 11.1 | No Reading

RAID2_Temp | 19h | ns | 11.2 | No Reading

RAID3_Temp | 1Ah | ns | 11.3 | No Reading

GPU0_Temp | 20h | ns | 11.0 | No Reading

GPU1_Temp | 21h | ns | 11.1 | No Reading

GPU2_Temp | 22h | ns | 11.2 | No Reading

GPU3_Temp | 23h | ns | 11.3 | No Reading

GPU4_Temp | 24h | ns | 11.4 | No Reading

GPU5_Temp | 25h | ns | 11.5 | No Reading

GPU6_Temp | 26h | ns | 11.6 | No Reading

GPU7_Temp | 27h | ns | 11.7 | No Reading

PCIE_SSD0_Temp | A7h | ns | 11.0 | No Reading

PCIE_SSD1_Temp | A8h | ns | 11.1 | No Reading

PCIE_SSD2_Temp | A9h | ns | 11.2 | No Reading

PCIE_SSD3_Temp | AAh | ns | 11.3 | No Reading

PCIE_SSD4_Temp | ABh | ns | 11.4 | No Reading

PCIE_SSD5_Temp | ACh | ns | 11.5 | No Reading

PCIE_SSD6_Temp | ADh | ns | 11.6 | No Reading

PCIE_SSD7_Temp | AEh | ns | 11.7 | No Reading

M.2_Inlet_Temp | 05h | ok | 55.0 | 28 degrees C

Rear_HDDBP_Temp | 2Ah | ns | 11.0 | No Reading

SWITCH0_Temp | 4Ah | ns | 11.0 | No Reading

SWITCH1_Temp | 4Bh | ns | 11.1 | No Reading

HDD_Max_Temp | 2Bh | ok | 11.0 | 32 degrees C
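Many of these sensors report ns / No Reading because the corresponding device is not installed, so a temperature check has to skip them and only compare real readings against limits. A minimal sketch follows; hot_sensors, the per-sensor limits, and the default limit of 80 are all assumed values for illustration, not vendor thresholds.

```python
import re

# Matches a reading such as "58 degrees C".
TEMP = re.compile(r"^(-?\d+) degrees C$")


def hot_sensors(elist_output, limits, default_limit=80):
    """Return {sensor name: temperature} for readings above their limit.

    limits maps sensor names to thresholds in degrees C; sensors without
    an entry use default_limit. All threshold values are assumptions.
    """
    hot = {}
    for line in elist_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5 or fields[2] == "ns":
            continue  # not a sensor line, or the sensor has no reading
        match = TEMP.match(fields[4])
        if not match:
            continue
        value = int(match.group(1))
        if value > limits.get(fields[0], default_limit):
            hot[fields[0]] = value
    return hot


sample = """\
Inlet_Temp | 00h | ok | 12.0 | 22 degrees C
OCP_Temp | 29h | ns | 11.0 | No Reading
RAID0_Temp | 17h | ok | 11.0 | 58 degrees C
"""
# Hypothetical limits chosen so that the RAID0 sensor trips.
print(hot_sensors(sample, {"Inlet_Temp": 35, "RAID0_Temp": 55}))
```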

Array monitoring

For other uses of MegaCli64, search online (Baidu, for example).

Physical disk information

sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll -NoLog | egrep -iv "exit|Adapter"

Enclosure Device ID: 8 # id

Slot Number: 13 # disk slot

Enclosure position: 0

Device Id: 14

Sequence Number: 2

Media Error Count: 0

Other Error Count: 0

Predictive Failure Count: 0

Last Predictive Failure Event Seq Number: 0

PD Type: SATA

Raw Size: 3.638 TB [0x1d1c0beb0 Sectors] # device size

Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]

Coerced Size: 3.637 TB [0x1d1b00000 Sectors]

Firmware state: Online, Spun Up # disk state; this is the value to watch when monitoring a disk

SAS Address(0): 0x56c92bf001fa0bcd

Connected Port Number: 0(path0)

Inquiry Data: V6J3J9SS HGST HUS726T4TALA6L4 VLGAW41G

FDE Capable: Not Capable

FDE Enable: Disable

Secured: Unsecured

Locked: Unlocked

Needs EKM Attention: No

Foreign State: None

Device Speed: 6.0Gb/s

Link Speed: Unknown

Media Type: Hard Disk Device

Drive: Not Certified

Drive Temperature :27C (80.60 F) # temperature
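The -PDList output repeats this block once per disk, so a monitor needs to split it into records before checking Firmware state. The following is a sketch under a few assumptions: each record is taken to start at the Enclosure Device ID line, only states beginning with "Online" are treated as healthy by default (hot spares or unconfigured disks would need adding to ok_prefixes), and the "Failed" record in the sample is hypothetical. The function names are invented for this example.

```python
def parse_pdlist(output):
    """Split MegaCli -PDList text into one dict per physical drive."""
    drives = []
    current = None
    for line in output.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Enclosure Device ID":
            # Assumed record boundary: every drive block starts with this key.
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def unhealthy_drives(output, ok_prefixes=("Online",)):
    """Return (enclosure, slot, state) for drives whose state is not accepted."""
    bad = []
    for drive in parse_pdlist(output):
        state = drive.get("Firmware state", "")
        if not state.startswith(ok_prefixes):
            bad.append((drive.get("Enclosure Device ID"),
                        drive.get("Slot Number"), state))
    return bad


# Hypothetical sample: the second drive is given a failed state.
sample = """\
Enclosure Device ID: 8
Slot Number: 13
Firmware state: Online, Spun Up
Enclosure Device ID: 8
Slot Number: 14
Firmware state: Failed
"""
print(unhealthy_drives(sample))
```

The same record dicts also carry Media Error Count and Predictive Failure Count, which could be checked in the same loop.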

Virtual drive information

A server may have many arrays; only one of them is shown here.

sudo /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aAll -NoLog | egrep -iv "exit|Adapter"

Virtual Drive: 9 (Target Id: 9)

Name :

RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0 # this means RAID 0

Size : 3.637 TB

State : Optimal # the state of this array; array monitoring watches this value

Strip Size : 64 KB # the strip size of the array

Number Of Drives : 1

Span Depth : 1

Default Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU

Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU

Access Policy : Read/Write

Disk Cache Policy : Disk's Default

Encryption Type : None

Bad Blocks Exist: No

Number of Spans: 1

Span: 0 - Number of PDs: 1

# Below is the information for the disks in this array. Note that a disk appears in this list only while it is online or a hot spare.

PD: 0 Information

Enclosure Device ID: 8

Slot Number: 10

Enclosure position: 0

Device Id: 17

Sequence Number: 2

Media Error Count: 0

Other Error Count: 0

Predictive Failure Count: 0

Last Predictive Failure Event Seq Number: 0

PD Type: SATA

Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]

Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]

Coerced Size: 3.637 TB [0x1d1b00000 Sectors]

Firmware state: Online, Spun Up

SAS Address(0): 0x56c92bf001fa0bca

Connected Port Number: 0(path0)

Inquiry Data: V6J3J1BS HGST HUS726T4TALA6L4 VLGAW41G

FDE Capable: Not Capable

FDE Enable: Disable

Secured: Unsecured

Locked: Unlocked

Needs EKM Attention: No

Foreign State: None

Device Speed: 6.0Gb/s

Link Speed: Unknown

Media Type: Hard Disk Device

Drive: Not Certified

Drive Temperature :28C (82.40 F)
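For array monitoring, the value of interest is the State line that follows each Virtual Drive header. A small sketch of that check is shown here; degraded_arrays is a name made up for this example, and the second virtual drive in the sample, with a Degraded state, is hypothetical.

```python
def degraded_arrays(output):
    """Return {virtual drive number: state} for arrays whose State is not Optimal."""
    bad = {}
    vd = None
    for line in output.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Virtual Drive":
            # Value looks like "9 (Target Id: 9)"; keep the leading number.
            vd = int(value.split()[0])
        elif key == "State" and vd is not None and value != "Optimal":
            # Exact key match, so "Firmware state" and "Foreign State" are ignored.
            bad[vd] = value
    return bad


# Hypothetical sample: virtual drive 10 is shown as degraded.
sample = """\
Virtual Drive: 9 (Target Id: 9)
State : Optimal
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 10 (Target Id: 10)
State : Degraded
"""
print(degraded_arrays(sample))
```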

Viewing detailed information about the RAID controller (here, the status of its BBU)

sudo /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aAll

BBU status for Adapter: 0

BatteryType: CVPM02

Voltage: 9431 mV

Current: 0 mA

Temperature: 25 C

BBU Firmware Status:

Charging Status : None

Voltage : OK

Temperature : OK

Learn Cycle Requested : No

Learn Cycle Active : No

Learn Cycle Status : OK

Learn Cycle Timeout : No

I2c Errors Detected : No

Battery Pack Missing : No

Battery Replacement required : No

Remaining Capacity Low : No

Periodic Learn Required : No

Transparent Learn : No

No space to cache offload : No

Pack is about to fail & should be replaced : No

Cache Offload premium feature required : No

Module microcode update required : No

Battery state:

GasGuageStatus:

Fully Discharged : Yes

Fully Charged : Yes

Discharging : Yes

Initialized : Yes

Remaining Time Alarm : No

Remaining Capacity Alarm: Yes

Discharge Terminated : Yes

Over Temperature : No

Charging Terminated : Yes

Over Charged : No

Pack energy : 247 J

Capacitance : 110

Remaining reserve space : 0

BBU Design Info for Adapter: 0

Date of Manufacture: 08/06, 2019

Design Capacity: 288 J

Design Voltage: 9500 mV

Serial Number: 1550

Manufacture Name: LSI

Device Name: CVPM02

Device Chemistry: EDLC

Battery FRU: N/A

TMM FRU: N/A

Module Version: 6635-02A

BBU Properties for Adapter: 0

Auto Learn Period: 2412000 Sec

Next Learn time: 634778466 Sec

Learn Delay Interval:0 Hours

Auto-Learn Mode: Enabled
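Most of the BBU output is informational, but a few of the Yes/No lines under BBU Firmware Status look like natural alert conditions. Which ones to alert on is a judgment call; the selection in ALERT_FLAGS below is an assumption for illustration, as is the function name bbu_alerts, and the sample with a Yes value is hypothetical. The sketch also does not separate multiple adapters.

```python
# Flags from the "BBU Firmware Status" section that this sketch treats as alerts.
# The selection is an assumption, not a vendor recommendation.
ALERT_FLAGS = (
    "Battery Pack Missing",
    "Battery Replacement required",
    "Remaining Capacity Low",
    "Pack is about to fail & should be replaced",
)


def bbu_alerts(output):
    """Return the alert flags that are set to Yes in -AdpBbuCmd output."""
    alerts = []
    for line in output.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in ALERT_FLAGS and value == "Yes":
            alerts.append(key)
    return alerts


# Hypothetical sample: the replacement flag is set.
sample = """\
Battery Pack Missing : No
Battery Replacement required : Yes
Remaining Capacity Low : No
Fully Charged : Yes
"""
print(bbu_alerts(sample))
```

Lines such as Fully Charged are ignored on purpose, since a Yes there is not a fault.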
