Huawei Servers

Troubleshooting

Issue 16
Date 2019-11-15

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2019. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior written
consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the
customer. All or part of the products, services and features described in this document may not be within the
purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information,
and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.


Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: https://e.huawei.com

Issue 16 (2019-11-15) Copyright © Huawei Technologies Co., Ltd. i


Huawei Servers
Troubleshooting About This Document

About This Document

Overview
This document describes how to collect logs, diagnose faults, upgrade software, perform
preventive maintenance and common operations, and collect the information required to
troubleshoot Huawei E9000, E6000, X6000, X8000, X6800, rack, and G Series
heterogeneous servers.

It guides you through the server troubleshooting process.

Intended Audience
This document is intended for:

• Technical support engineers
• Maintenance engineers

Symbol Conventions
The symbols that may be found in this document are defined as follows.

Symbol | Description
DANGER | Indicates a hazard with a high level of risk which, if not avoided, will result in death or serious injury.
WARNING | Indicates a hazard with a medium level of risk which, if not avoided, could result in death or serious injury.
CAUTION | Indicates a hazard with a low level of risk which, if not avoided, could result in minor or moderate injury.
NOTICE | Indicates a potentially hazardous situation which, if not avoided, could result in equipment damage, data loss, performance deterioration, or unanticipated results. NOTICE is used to address practices not related to personal injury.
NOTE | Supplements the important information in the main text. NOTE is used to address information not related to personal injury, equipment damage, and environment deterioration.

Change History
Issue | Date | Description
16 | 2019-11-15 | This issue is the sixteenth official release. Added content about the Atlas 800 AI server (model 3010).
15 | 2019-09-19 | This issue is the fifteenth official release. Added content related to the MM920/MM921.
14 | 2019-07-05 | This issue is the fourteenth official release. Added information about SmartKit.
13 | 2019-01-08 | This issue is the thirteenth official release.
12 | 2018-07-13 | This issue is the twelfth official release.
11 | 2018-05-18 | This issue is the eleventh official release. Added a description of the FusionServer G2500 heterogeneous server.
10 | 2018-03-12 | This issue is the tenth official release. Added information about the Atlas G5500 heterogeneous server.
09 | 2017-12-14 | This issue is the ninth official release. Added a description of the CX916 switch module of the E9000 server.
08 | 2017-08-08 | This issue is the eighth official release. Modified 4.4.1.1 Connecting a PC to the Ethernet Switching Plane.
07 | 2017-07-20 | This issue is the seventh official release. Modified 4.4.2.1 Collection Method.
06 | 2017-04-18 | This issue is the sixth official release. Added a statement in 5.5 Checking Indicators to Locate Faults that faulty E9000 compute nodes cannot be reseated.
05 | 2016-10-27 | This issue is the fifth official release. Added the quick recovery method for E9000 switch modules in 5.6 Handling Faults Based on Symptoms.
04 | 2016-07-11 | This issue is the fourth official release. Modified 4.4.2.7 Using the Switch Module CLI to Collect FC Switching Plane Information (MX210/MX220). Added the quick recovery method in 5.6 Handling Faults Based on Symptoms.
03 | 2016-05-10 | This issue is the third official release. Deleted the "Using the Web Tools of a Switch Module to Collect Information About an FC Switching Plane (NX120/NX220/MX210/MX220)" section. Modified 4.4.2.4 Using the V8 Switch Module CLI to Collect Ethernet Switching Plane Information. Added 9 Other Resources.
02 | 2015-10-27 | This issue is the second official release. Added 5.5 Checking Indicators to Locate Faults. Added a description of how to collect FreeBSD and Solaris host information in 4.2 Collecting OS Logs.
01 | 2015-10-09 | This issue is the first official release.

Contents

About This Document
1 Safety Instructions
2 Troubleshooting Process
3 Preparing for Troubleshooting
4 Collecting Information
4.1 Collecting Basic Information
4.2 Collecting OS Logs
4.3 Collecting Hardware Logs
4.4 Collecting Switch Module Logs (for E9000+MM910)
4.4.1 Preparing for Log Collection
4.4.1.1 Connecting a PC to the Ethernet Switching Plane
4.4.1.2 Querying the Software Version of the Ethernet Switching Plane
4.4.2 Collecting Switch Module Logs
4.4.2.1 Collection Method
4.4.2.2 Using InfoCollect to Collect Switch Module Logs
4.4.2.3 Using the V5 Switch Module CLI to Collect Ethernet Switching Plane Information
4.4.2.4 Using the V8 Switch Module CLI to Collect Ethernet Switching Plane Information
4.4.2.5 Using the Web Tools Page of a Switch Module to Collect FC Switching Plane Information (MX510)
4.4.2.6 Using the Switch Module CLI to Collect FC Switching Plane Information (MX510)
4.4.2.7 Using the Switch Module CLI to Collect FC Switching Plane Information (MX210/MX220)
4.5 Collecting Switch Module Logs (for E9000+MM920/MM921)
4.6 Collecting Qlogic HBA Logs
4.7 Collecting Other Logs
5 Diagnosing and Rectifying Faults
5.1 Fault Diagnosis Rules
5.2 Using Tools to Diagnose Faults
5.3 Handling Alarms
5.4 Using Error Codes to Locate Faults
5.5 Checking Indicators to Locate Faults
5.6 Handling Faults Based on Symptoms
5.6.1 Power Failures
5.6.2 KVM Login Faults
5.6.3 POST Faults
5.6.4 Memory Errors
5.6.5 Drive I/O Faults
5.6.6 Ethernet Controller Faults
5.6.7 FC Controller Faults
5.6.8 Switch Module Faults
5.6.9 OS Faults
6 Software and Firmware Upgrade
7 Preventive Maintenance
7.1 Inspecting the Equipment Room Environment and Cable Layout
7.1.1 Precautions
7.1.2 Inspecting the Equipment Room Environment
7.1.3 Inspecting Cable Layout
7.2 Inspecting Servers
7.2.1 Precautions
7.2.2 Inspecting Indicators
7.2.3 Using SmartKit to Perform Health Inspection
7.2.4 Checking the System Status Through iBMC
7.3 Huawei Server Inspection Report
8 Common Operations
8.1 Obtaining a Product SN
8.2 Using iMana 200 to Collect Information in Batches
8.3 Using iBMC to Collect Information in Batches
8.4 Using the MM910 WebUI to Collect Information in Batches (for Versions Earlier Than U54 2.20)
8.5 Using the MM910 WebUI to Collect Information in Batches (for U54 2.20 or Later)
8.6 Using the FusionDirector WebUI to Collect Information in Batches
8.7 Using the MM510 CLI to Collect Information (FusionServer G5500)
8.8 Logging In to the iMana 200 WebUI
8.9 Logging In to the iBMC WebUI
8.10 Logging In to the Web Tools of the MX510
8.11 Logging In to the MM910 WebUI
8.12 Logging In to the FusionDirector WebUI
8.13 Logging In to the MM510 CLI
8.14 Logging In to the RMC CLI
8.15 Logging In to a Server Over a Network Port by Using PuTTY
8.16 Logging In to a Server Over a Serial Port by Using PuTTY
8.17 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the SOL Function of the MM910
8.18 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the SOL Function of the MM920/MM921
8.19 Using WinSCP to Transfer Files
8.20 Configuring an FTP Server
8.21 Using SFTP to Transfer Files
9 Other Resources
9.1 Obtaining Technical Support
9.2 Product Information Resources
9.3 Product Configuration Resources
9.4 Maintenance Tools

1 Safety Instructions

General Instructions
• Comply with all local laws and regulations when installing the hardware. These Safety
  Instructions are only a supplement.
• Observe the instructions that accompany all "DANGER", "WARNING", "CAUTION",
  and "NOTE" symbols in this document. Follow them in conjunction with these Safety
  Instructions.
• Observe all safety instructions provided on the device labels when installing hardware.
  Follow them in conjunction with these Safety Instructions.
• Operations involving high voltages or moving equipment must be performed by
  authorized, qualified personnel.
• Take protective measures against radio interference before operating the device in
  residential areas.

Personal Safety
• Only personnel certified or authorized by Huawei are allowed to install equipment or its
  components.
• Discontinue any dangerous operations and take protective measures. Report anything
  that could cause personal injury or equipment damage to a project supervisor.
• Do not move devices or install cabinets and power cables in hazardous weather
  conditions.
• The weight carried by each person must not exceed the maximum acceptable weight
  of lift (MAWL) allowed by local safety regulations. Before moving a device, check the
  maximum device weight and arrange the required personnel.
• Wear clean protective gloves, ESD clothing, a protective hat, and protective shoes, as
  shown in Figure 1-1.

Figure 1-1 Protective clothing

• Before touching devices, wear ESD clothing and ESD gloves, and remove conductive
  objects such as watches and jewelry, as shown in Figure 1-2.

Figure 1-2 Conductive objects to be removed

Figure 1-3 shows how to wear an ESD wrist strap.


1. Secure the wrist strap around your wrist.
2. Fasten the strap buckle and ensure that the ESD wrist strap is snug against the skin.
3. Insert the attached ground terminal into the jack on the grounded rack or chassis.

Figure 1-3 Wearing a wrist strap

• Exercise caution when using tools that could cause personal injury.
• Use a stacker when lifting hardware above shoulder height.
• Avoid any contact with high-voltage cables.
• Ensure that the device is properly grounded before powering it on.
• Do not use a ladder alone.
• Do not look into optical ports without eye protection.

Equipment Safety
• Use dedicated power cables to ensure equipment and personal safety.
• Use power cables only for dedicated devices.
• When moving a device, hold the handles or bottom of the device. Do not hold the handles
  of installed modules, such as power modules, fan modules, drives, or mainboards.
• Connect the power cables to separate power distribution units (PDUs) for active/standby
  operation.

Transportation Precautions
• The logistics company engaged to transport the equipment must be reliable and comply
  with international standards for transporting electronics. Ensure that the equipment being
  transported is always kept upright. Take necessary precautions to prevent collisions,
  corrosion, package damage, damp conditions, and pollution.
• Transport the equipment in its original packaging.
• If the original packaging is not used, package heavy, bulky items (such as chassis and
  compute nodes) and fragile components (such as PCIe GPUs, SSDs, and optical
  modules) separately.
  NOTE: Use the Intelligent Computing Compatibility Checker to search for components
  supported by the compute nodes or servers.
• Power off all equipment before transportation. Do not transport hazardous materials.

Weight Limits Per Person

To reduce the risk of personal injury, comply with local regulations with regard to the
maximum weight one person is permitted to carry.

Table 1-1 lists the maximum weight that one person is permitted to carry, as specified by
each standards organization.

Table 1-1 Maximum handling weight


Organization | Weight (kg/lb)
European Committee for Standardization (CEN) | 25/55.13
International Organization for Standardization (ISO) | 25/55.13
National Institute for Occupational Safety and Health (NIOSH) | 23/50.72
Health and Safety Executive (HSE) | 25/55.13
General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China (AQSIQ) | Male: 15/33.08; Female: 10/22.05
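
The staffing check mentioned under Personal Safety (compare the device weight with the per-person limit, then arrange enough people) can be sketched in Python. This is only an illustration and not part of the Huawei guidance: the dictionary keys and the function name people_required are hypothetical, the limits are the kilogram values from Table 1-1, and the calculation assumes that the load is shared evenly. Local regulations take precedence over these figures.

```python
import math

# Per-person limits in kg, copied from Table 1-1.
# The dictionary keys are hypothetical labels chosen for this sketch.
MAX_HANDLING_WEIGHT_KG = {
    "CEN": 25,
    "ISO": 25,
    "NIOSH": 23,
    "HSE": 25,
    "AQSIQ (male)": 15,
    "AQSIQ (female)": 10,
}


def people_required(device_weight_kg: float, limit_kg: float) -> int:
    """Return the minimum number of people needed to carry a device.

    Assumes the weight is distributed evenly among the carriers.
    """
    if device_weight_kg <= 0 or limit_kg <= 0:
        raise ValueError("weights must be positive")
    return math.ceil(device_weight_kg / limit_kg)


if __name__ == "__main__":
    # Example: a hypothetical 60 kg chassis under each limit in Table 1-1.
    for organization, limit in MAX_HANDLING_WEIGHT_KG.items():
        print(organization, people_required(60, limit))
```

Rounding up is deliberate: any fractional result means that one more person is needed to stay within the limit.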

2 Troubleshooting Process

Troubleshooting is the process of using appropriate methods to find the cause of a fault and
rectify it. The guiding principle is to narrow down the possible causes of a fault, which
reduces troubleshooting complexity and helps you identify the root cause and rectify the
fault.
Figure 2-1 shows the recommended troubleshooting process.

Figure 2-1 Troubleshooting flowchart

Table 2-1 Troubleshooting steps


Step | Description
3 Preparing for Troubleshooting | Prepare the manuals and tools required for fault diagnosis and rectification.
4 Collecting Information | Collect comprehensive information for fault diagnosis.
5 Diagnosing and Rectifying Faults | Locate the fault and take troubleshooting measures.
9.1 Obtaining Technical Support | If a fault is difficult to locate or rectify after you refer to documents, contact Huawei technical support.
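
The order of the steps in Table 2-1 can be expressed as a small Python sketch. It is illustrative only: the function name next_step and its parameters are hypothetical, and it assumes that the three main steps run in sequence and that a fault that is still unresolved afterwards is escalated to technical support, as the table suggests.

```python
from typing import Optional

# Main steps in the order given in Table 2-1.
MAIN_STEPS = (
    "3 Preparing for Troubleshooting",
    "4 Collecting Information",
    "5 Diagnosing and Rectifying Faults",
)
ESCALATION_STEP = "9.1 Obtaining Technical Support"


def next_step(steps_completed: int, fault_rectified: bool) -> Optional[str]:
    """Return the next step to perform, or None when nothing remains to do."""
    if steps_completed < 0:
        raise ValueError("steps_completed must not be negative")
    if fault_rectified:
        return None  # the fault is fixed, so the process ends
    if steps_completed < len(MAIN_STEPS):
        return MAIN_STEPS[steps_completed]
    if steps_completed == len(MAIN_STEPS):
        return ESCALATION_STEP  # all main steps done, fault still present
    return None  # already escalated


if __name__ == "__main__":
    print(next_step(0, fault_rectified=False))
    print(next_step(3, fault_rectified=False))
```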

3 Preparing for Troubleshooting

Scenarios
This section describes how to prepare for troubleshooting.

Basic Knowledge and Skills


Get familiar with the following basic knowledge and skills before troubleshooting:
• Server product knowledge
• Danger signs and levels
• Server hardware architecture
• Indicators on the front and rear panels
• Systems that run on servers
• Device operating conditions
• Common hardware operations such as power-on and power-off
• Common software operations such as upgrade
• Device maintenance process

Essential Materials
Table 3-1 lists the materials that you must read before routine maintenance for Huawei
servers.

Table 3-1 Essential materials for routine maintenance

Document Type | Description | How to Obtain
User Guide | Describes the server structure, specifications, and installation method. Each Huawei server has a user guide or maintenance and service guide. | 1. Log in to the Support > Intelligent Servers or Support > AI Computing Platform page. 2. Choose a server model to access the product page. 3. On the Documentation tab page, choose Operation & Maintenance. 4. View the required user guide or maintenance and service guide.
Alarm Handling | Describes common alarms reported to the server iMana 200/iBMC or management module, and alarm handling suggestions. Each Huawei server has an alarm reference. | 1. Log in to the Support > Management Software > Server Management Software > iBMC > Troubleshooting > Alarm Handling page. 2. View the corresponding alarm handling manual.
Equipment Room Management Regulations | Describes the regulations for equipment room management and routine maintenance. | Comply with the customer's equipment room management regulations during onsite maintenance.

Software Tools
Table 3-2 lists the software tools required for routine maintenance of Huawei servers.

Table 3-2 Tools for routine maintenance

Tool | Server and Version | Description
FusionServer Tools Toolkit | Huawei-developed V2 and V3 servers (For details, see the FusionServer Tools 2.0 Toolkit User Guide.) | Diagnoses and configures servers for fault locating. Download link: FusionServer Tools
FusionServer Tools 2.0 SmartKit | See the FusionServer Tools 2.0 SmartKit User Guide. | Used for new site deployment and delivery, troubleshooting, and firmware upgrade. Download link: FusionServer Tools
Smart Provisioning | Huawei-developed V5 servers (For details, see the Smart Provisioning User Guide.) | Used to install OSs without a physical DVD-ROM drive, configure RAID, upgrade firmware, and perform troubleshooting.
PuTTY | All Huawei servers of all versions | Third-party tool used for remote access. You can obtain the tool from the Internet.
WinSCP | All Huawei servers of all versions | Third-party tool used for file transfer for iMana 200/iBMC or the management module. You can obtain the tool from the Internet.
WFTPD | All Huawei servers of all versions | Third-party tool used for file transfer for the Ethernet switching plane of a switch module. You can obtain the tool from the Internet.
CoreFTPServer/mini-sftp-server | All Huawei servers of all versions | Third-party tools used for file transfer for the FC switching plane of a switch module. You can obtain the tools from the Internet.

Hardware Tools
Table 3-3 lists the hardware tools required for routine maintenance of Huawei servers.

Table 3-3 Hardware tools required for routine maintenance


Tool | Description
Floating nut hook | Used to guide floating nuts to the holes in the mounting bars of a rack.
Screwdriver | Used to tighten and loosen screws. A screwdriver can be a flat-head, Phillips, or hex screwdriver.
Diagonal pliers | Used to trim insulation tubes and cable ties.
Multimeter | Used to measure the resistance and voltage and to check connectivity.
ESD wrist strap | Used to prevent ESD damage when you touch or operate devices or components.
Electrostatic discharge (ESD) gloves | Used to prevent ESD damage to a board or precision instrument when you insert, remove, or hold it.
Cable tie | Used to bind cables.
Ladder | Used to perform operations at heights.
PC | Used to access the management network port or a service network port over the network to capture data. (You need to prepare a network cable.)
Serial cable | Used to connect the serial port on the server. The serial port is usually a DB9 or RJ45 port.
Thermometer and hygrometer | Used to measure the equipment room temperature and relative humidity.
Oscilloscope | Used to measure the voltage and time sequence.

4 Collecting Information

About This Chapter


If a fault occurs on a server, collect logs for fault diagnosis.
Collect logs immediately upon fault occurrence to obtain the original data.
4.1 Collecting Basic Information
4.2 Collecting OS Logs
4.3 Collecting Hardware Logs
4.4 Collecting Switch Module Logs (for E9000+MM910)
4.5 Collecting Switch Module Logs (for E9000+MM910/MM921)
4.6 Collecting Qlogic HBA Logs
4.7 Collecting Other Logs

4.1 Collecting Basic Information


Before submitting a service request, the customer must collect the basic information listed
in Table 4-1.

Table 4-1 Server fault records


Server fault records
Trouble Ticket No. | Example: 123456
Fault Report Time | Example: 2015-10-18 20:30:00
Customer Name | Full name of your organization
Address | Example: 20 Baker Street, New York
Customer Contact/ASP Name | Example: John Smith
Contact Info | Phone number and email address
Device Model | Example: RH2285 V2
SN/ESN | Example: 2102310XXXXX (For details about how to obtain the value, see 8.1 Obtaining a Product SN.)
Hardware Configuration | If the device configuration (CPUs, DIMMs, RAID controller cards, or NICs) is modified, you need to provide the modified configuration. If the configuration is not modified, enter None.
OS and Service Software Version | Example: SLES 11 SP1 64-bit or Oracle 10.2. (Consider the fault symptom to determine whether to collect the OS and service software versions.)
Fault Occurrence Time | Example: 2015-10-18 20:30:00
Fault Symptom | Example: The server frequently restarts during OS installation or the server stops responding upon power-on.
Action Before Fault Occurrence | Example: BIOS settings configuration, memory capacity expansion, network settings modification.
Action and Result After Fault Occurrence (Optional) | Example: After the power cable is disconnected and then reconnected, the fault persists. After the DVD-ROM is replaced, the fault persists. ...

4.2 Collecting OS Logs


Collect OS logs after an OS fault occurs.

NOTICE

l Obtain the customer's written authorization before collecting information.


l Logs collected by SmartKit may contain sensitive customer information. If sensitive
customer information is involved, obtain the customer's written authorization before
performing any maintenance operation.

Table 4-2 describes the methods for collecting logs of different OSs.

Table 4-2 Methods for collecting OS logs


OS Collection Method

Windows, Linux Use SmartKit to collect logs of Windows and Linux OSs. For details, see the
FusionServer Tools 2.0 SmartKit User Guide.

VMware l If the purple screen of death (PSOD) does not occur, perform the following
steps:
1. Log in to the ESX server console as the root user.
2. Run the vm-support command to collect all VMware logs.
3. After logs are collected, check that a log file in the esxsupport-YYYY-
[email protected] format is generated in the /var/tmp
directory.
l If the PSOD occurs and the customer retains the site environment, perform
the following steps:
1. Capture a screenshot of the PSOD or take a photo to save the displayed
information.
2. Press Alt+F12 to switch to forcible memory information output mode,
and press Alt+PageUp/Alt+PageDown to capture screenshots and
photos. Ensure that screenshots and photos of the last several screens are
captured after the PSOD occurs.
3. Hot-restart the system, and run the vm-support command to collect all
VMware logs.
4. After logs are collected, check that a log file in the esxsupport-YYYY-
[email protected] format is generated in the /var/tmp
directory.
l If the PSOD occurs and the customer hot-restarts the system, run vm-
support to collect all of the VMware logs and check that a log file in the
[email protected] format is generated in
the /var/tmp directory.

FreeBSD Log in to the OS CLI over SSH and copy all files in /var/log/.
Copy the messages file and all files prefixed with messages (for example,
messages.0) in /var/log/ before copying other files.

Solaris Log in to the OS CLI over SSH and copy all files in the /var/log/ directory
and /var/adm/ directory.
Copy the syslog file and all files prefixed with syslog (for example, syslog.0)
in /var/log/, and copy the messages file and files prefixed with messages (for
example, messages.0) in /var/adm/ before copying other files.
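
The copy order described for FreeBSD and Solaris can be sketched as follows. This is a minimal illustration only: it assumes the file names of the log directory are already available as a list, and the function name order_log_files is hypothetical, not part of any Huawei tool.

```python
def order_log_files(names, priority_prefixes):
    """Return file names with the priority-prefixed files first.

    The relative order inside each group is preserved.
    """
    def is_priority(name):
        return any(name.startswith(prefix) for prefix in priority_prefixes)

    first = [name for name in names if is_priority(name)]
    rest = [name for name in names if not is_priority(name)]
    return first + rest


# FreeBSD: messages and messages.* in /var/log/ are copied before other files.
freebsd_files = ["auth.log", "messages.0", "cron", "messages"]
print(order_log_files(freebsd_files, ["messages"]))

# Solaris: syslog and syslog.* in /var/log/ are copied before other files.
solaris_files = ["authlog", "syslog.0", "syslog"]
print(order_log_files(solaris_files, ["syslog"]))
```

Copying the priority files first means the most relevant system messages are secured even if the remaining copy is interrupted.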

4.3 Collecting Hardware Logs


Collect hardware logs after a hardware fault occurs.


NOTICE

l Obtain the customer's written authorization before collecting information.


l Logs collected by SmartKit may contain sensitive customer information. If sensitive
customer information is involved, obtain the customer's written authorization before
performing any maintenance operation.

You can use one of the following methods to collect hardware logs:

l Use SmartKit to collect server hardware information in batches. For details about the
supported servers and operations, see section "Using SmartKit > Collecting Server Logs"
in the FusionServer Tools 2.0 SmartKit User Guide.
l Use iBMC to collect hardware logs of a single server. For details, see 8.3 Using iBMC
to Collect Information in Batches.
l Use iMana 200/iBMC to collect hardware logs. For details, see 8.2 Using iMana 200
to Collect Information in Batches or 8.3 Using iBMC to Collect Information in
Batches.
l Use SmartKit to collect hardware logs and Windows/Linux logs. For details, see the
FusionServer Tools 2.0 SmartKit User Guide.

4.4 Collecting Switch Module Logs (for E9000+MM910)

4.4.1 Preparing for Log Collection

4.4.1.1 Connecting a PC to the Ethernet Switching Plane


Connect a PC to the Ethernet switching plane before logging in to the switching plane.

Procedure
Step 1 Connect the Ethernet port of the PC to the management network ports of the active and
standby MM910 modules over the LAN. Figure 4-1 shows the network connection.

NOTICE

l The MGMT port on the MM910 panel is the management network port.
l If the active MM910 MGMT port has been connected to the network by using a network
cable and the client needs to be directly connected to the MM910, do not directly
disconnect the network cable from the active MM910 MGMT port. Otherwise, an active/
standby MM910 switchover will be triggered, which may cause network interruption. You
are advised to connect the client to the active MM910 STACK port in the chassis by using
a network cable. If the active MM910 STACK port has been connected to the MGMT port
in another chassis, use an idle active MM910 STACK port in another chassis.


Figure 4-1 Network connections

l In V2.25 and earlier versions, the MM910 MGMT port is accessed by the external network through
the 2X and 3X switch modules by default. In this case, do not connect the MM910 MGMT port and
the switch module network ports to the same network. Otherwise, a network storm will occur and
the network connection will be interrupted.
You are advised to run the smmset -d outportmode -v 1 command on the CLI to provide the
MM910 MGMT port for the external network.
l In V2.26 and later versions, the MM910 MGMT port is provided as the default management
network port for the external network.

Step 2 Use an SSH tool and the MM910 floating IP address to connect to the MM910 CLI.
For details about how to use PuTTY for SSH login, see 8.15 Logging In to a Server Over a
Network Port by Using PuTTY.

Step 3 to Step 5 configure the IP address and routing information for the management network port of
the Ethernet switching plane. If the IP address and routing information of the management network port
have been configured, skip Step 3 to Step 5.

Step 3 (Optional) Run the following command to query the IP address of the management network
port of the Ethernet switching plane:
smmget -l swiN:fruM -d swipcontrol
The parameters are described as follows:
l N indicates the slot number of the switch module. The value range is 1 to 4, mapping to
logical slot numbers 1E, 2X, 3X, and 4E from left to right on the panel respectively.
l M: indicates the ID of the switching plane. The value for the Ethernet switching plane is
2.


Check whether the IP address is 0.0.0.0.


l If yes, go to Step 4.
l If no, go to Step 5.
Step 4 (Optional) Run the following command to set an IP address for the management network port
of the Ethernet switching plane:
ipmcset -l <bladeN|swiN> -d ipaddr -v <ipaddr> <mask> [gateway]
The parameters are described as follows:
l N indicates the slot number of the switch module. The value range is 1 to 4, mapping to
logical slot numbers 1E, 2X, 3X, and 4E from left to right on the panel respectively.
l ipaddr: indicates the IP address of the management network port.
l mask: indicates the subnet mask of the management network port.
l gateway: indicates the gateway IP address. This parameter is optional.
Step 5 (Optional) Configure the gateway for the switching plane by running the following command
so that the switching plane can communicate with the PC:

For stacked switching planes, configure the gateway only for the master switching plane.

smmset -l swiN:fruM -d route -v targetvalue maskvalue gatewayvalue


The parameters are described as follows:
l N indicates the slot number of the switch module. The value range is 1 to 4, mapping to
logical slot numbers 1E, 2X, 3X, and 4E from left to right on the panel respectively.
l M: indicates the ID of the switching plane. The value for the Ethernet switching plane is
2.
l targetvalue: indicates the target network segment IP address of the switching plane.
l maskvalue: indicates the subnet mask of the switching plane.
l gatewayvalue: indicates the gateway IP address of the switching plane.
Example of setting the IP address by using the command in Step 4:
Administrator@SMM:/#ipmcset -l swi2 -d ipaddr -v 172.200.2.153 255.255.0.0 172.200.0.1

----End
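
The commands in Step 3 and Step 5 follow a fixed pattern, which the sketch below assembles from their parameters. It is an illustration under assumptions: the helper names are hypothetical, only the command strings shown in this section are produced, and nothing is sent to the MM910.

```python
import ipaddress

# Logical slot names from left to right on the panel, as listed above.
SLOT_NAMES = {1: "1E", 2: "2X", 3: "3X", 4: "4E"}
ETHERNET_PLANE_FRU = 2  # ID of the Ethernet switching plane


def _check_slot(slot):
    if slot not in SLOT_NAMES:
        raise ValueError(f"slot must be 1 to 4, got {slot!r}")


def query_ip_command(slot, fru=ETHERNET_PLANE_FRU):
    """Build the Step 3 query command for the management network port."""
    _check_slot(slot)
    return f"smmget -l swi{slot}:fru{fru} -d swipcontrol"


def set_route_command(slot, target, mask, gateway, fru=ETHERNET_PLANE_FRU):
    """Build the Step 5 gateway (route) command after validating the addresses."""
    _check_slot(slot)
    for value in (target, mask, gateway):
        ipaddress.IPv4Address(value)  # raises ValueError on a malformed address
    return f"smmset -l swi{slot}:fru{fru} -d route -v {target} {mask} {gateway}"


def next_step(queried_ip):
    """Return 4 if the port has no IP address (0.0.0.0), otherwise 5."""
    return 4 if queried_ip == "0.0.0.0" else 5


print(query_ip_command(3))
print(set_route_command(3, "0.0.0.0", "0.0.0.0", "192.168.112.1"))
print(next_step("0.0.0.0"))
```

Validating the slot number and addresses before building the string helps catch typing errors before a command is run on the management module.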

4.4.1.2 Querying the Software Version of the Ethernet Switching Plane


Query the software version of the switching plane before upgrading a switch module.

Prerequisites
l The switch modules have been powered on.
l For logging in to the Ethernet switching plane over SSH, the default username is root
and the default password is Huawei12#$.
l By default, the MM910 username is root and the password is Huawei12#$.
l You are familiar with the parameters required for this operation.


Table 4-3 Parameter description


Parameter                                   Example Value

IP address and subnet mask of the           IP address of the management network port: 192.168.9.61
management network port on the              Subnet mask: 255.255.255.0
Ethernet switching plane

Floating IP address, subnet mask, and       IP address: 10.85.4.77
gateway of the MM910                        Subnet mask: 255.255.255.0
                                            Gateway: 10.85.4.1

Procedure
Step 1 Connect the PC to the Ethernet switching plane.
For details, see 4.4.1.1 Connecting a PC to the Ethernet Switching Plane.
Step 2 Log in to the CLI of the Ethernet switching plane by using the SOL function of the MM910.
For details about SOL login, see 8.17 Logging In to a Compute Node, Passthrough
Module, or Switch Module by Using the SOL Function of the MM910.
Step 3 Run the following command to query the version of the Ethernet switching plane:
display version
l Information similar to the following is displayed:
BoardName : CX910
CPLD Version : 003
PCB Version : VER.A
Bootrom Version : 008
Creation Time : Sep 17 2012, 09:53:25
Backup Bootrom Version : 008
Creation Time : Sep 17 2012, 09:53:25
Switch Version : 1.1.0.200.3
Creation Time : Oct 17 2012, 17:10:28
Backup Switch Version : 1.1.0.200.3
FC BoardName : UNKNOWN
FC PCB Version : UNKNOWN

If the command output contains Switch Version, the software version is V5.


l Information similar to the following is displayed:
Huawei Versatile Routing Platform Software
VRP (R) software, Version 8.60 (OSCA V100R002C01)
Copyright (C) 2012-2013 Huawei Technologies Co., Ltd.
HUAWEI OSCA uptime is 0 day, 0 hour, 20 minutes

CX910_10GE(Master) 3 : uptime is 0 day, 0 hour, 20 minutes


StartupTime 2013/12/16 01:54:58
Memory Size : 2048 M bytes
Flash Size : 1024 M bytes
CX910_10GE version information
1. PCB Version : CX910_10GE VER C
2. MAB Version : 1
3. Board Type : CX910_10GE
4. CPLD1 Version : 013
5. BIOS Version : 038
6. Software Version : 1.2.1.0.39

If the command output contains Software Version, the software version is V8.
----End
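
The rule above (Switch Version indicates V5, Software Version indicates V8) can be applied to captured display version output as in the following sketch. The function name detect_platform is hypothetical, and the sketch assumes the output uses the "label : value" layout shown in the two samples.

```python
import re


def detect_platform(display_version_output):
    """Return "V5", "V8", or "unknown" based on the field labels in the output."""
    for line in display_version_output.splitlines():
        label = line.split(":", 1)[0].strip()
        # V8 output numbers its fields, for example "6. Software Version".
        label = re.sub(r"^\d+\.\s*", "", label)
        if label == "Switch Version":
            return "V5"
        if label == "Software Version":
            return "V8"
    return "unknown"


v5_sample = """BoardName : CX910
CPLD Version : 003
Switch Version : 1.1.0.200.3
Backup Switch Version : 1.1.0.200.3
"""

v8_sample = """Huawei Versatile Routing Platform Software
VRP (R) software, Version 8.60 (OSCA V100R002C01)
5. BIOS Version : 038
6. Software Version : 1.2.1.0.39
"""

print(detect_platform(v5_sample))
print(detect_platform(v8_sample))
```

Matching the whole label rather than a substring avoids a false match on the banner line "Huawei Versatile Routing Platform Software".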


4.4.2 Collecting Switch Module Logs

4.4.2.1 Collection Method


Table 4-4 lists the methods for collecting switch module logs.

Table 4-4 Methods for collecting switch module logs


Switching plane type: Both the Ethernet switching plane and the FC switching plane
    Switch module type: V5 switch modules and V8 switch modules
    Prerequisites:
        l You have downloaded the InfoCollect tool.
        l The MM910 IP address of the target server can be pinged.
    Log collection method: Using InfoCollect
        NOTE
        You can use SmartKit to collect only Ethernet switching plane logs of V8 switch
        modules in batches. In this case, the HMM version must be 5.57 or later, and the
        switch software version must be VRP8 5.32 or later. For details, see the
        FusionServer Tools 2.0 SmartKit User Guide (for Engineers). Logs collected by
        using this method are the same as those collected by using the CLI.
    Reference link: 4.4.2.2 Using InfoCollect to Collect Switch Module Logs

Switching plane type: Only the Ethernet switching plane
    Switch module type: V5 switch modules
    Prerequisites: 4.4.1.1 Connecting a PC to the Ethernet Switching Plane
    Log collection method:
        l Using the CLI
          NOTE
          Switch module fabric plane logs collected by using this method are the same as
          those collected by using InfoCollect. For details about how to collect logs of
          the switch module fabric plane by using InfoCollect, see the FusionServer
          Tools 2.0 InfoCollect User Guide.
        l Using the WebUI
    Reference link:
        l Using the CLI: 4.4.2.3 Using the V5 Switch Module CLI to Collect Ethernet
          Switching Plane Information
        l WebUI: 8.5 Using the MM910 WebUI to Collect Information in Batches (for U54
          2.20 or Later)

    Switch module type: V8 switch modules
    NOTE
    The prerequisites for using the one-click full collection function of the MM910
    WebUI to collect switching plane logs are as follows:
        l The MM910 software version is 6.00 or later.
        l The switching plane software version is 5.30 or later.
    Reference link:
        l Using the CLI: 4.4.2.4 Using the V8 Switch Module CLI to Collect Ethernet
          Switching Plane Information
        l WebUI: 8.5 Using the MM910 WebUI to Collect Information in Batches (for U54
          2.20 or Later)

Switching plane type: Only the FC switching plane
    Switch module type: CX311, CX911, and CX915 switch modules
    l Prerequisites: 4.4.1.1 Connecting a PC to the Ethernet Switching Plane
      Log collection method: Using the CLI
      Reference link: 4.4.2.6 Using the Switch Module CLI to Collect FC Switching Plane
      Information (MX510)
    l Prerequisites: 8.10 Logging In to the Web Tools of the MX510
      Log collection method: Using the GUI
      Reference link: 4.4.2.5 Using the Web Tools Page of a Switch Module to Collect FC
      Switching Plane Information (MX510)

    Switch module type: CX210, CX220, CX912, and CX916 switch modules
    Prerequisites: 4.4.1.1 Connecting a PC to the Ethernet Switching Plane
    Log collection method: Using the CLI
    Reference link: 4.4.2.7 Using the Switch Module CLI to Collect FC Switching Plane
    Information (MX210/MX220)

4.4.2.2 Using InfoCollect to Collect Switch Module Logs


For details about how to use InfoCollect to collect logs for the E9000 switch module, see
section "Collecting Switch Module Log Files" in the FusionServer Tools 2.0 InfoCollect
User Guide.
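
The notes in Table 4-4 state minimum versions for SmartKit batch collection (HMM 5.57 and VRP8 5.32) and for the MM910 WebUI one-click collection (MM910 6.00 and switching plane software 5.30). The sketch below shows one way to check such thresholds. It assumes that version strings are dot-separated integers compared field by field, which the source does not confirm, and all function names are hypothetical.

```python
def parse_version(text):
    """Split a dotted version string such as "5.57" into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(actual, minimum):
    """Return True if actual is the same as or later than minimum."""
    a, m = parse_version(actual), parse_version(minimum)
    width = max(len(a), len(m))
    # Pad the shorter tuple with zeros so that "6" and "6.00" compare as equal.
    a += (0,) * (width - len(a))
    m += (0,) * (width - len(m))
    return a >= m


def smartkit_batch_supported(hmm_version, vrp8_version):
    """Thresholds from the SmartKit note in Table 4-4."""
    return meets_minimum(hmm_version, "5.57") and meets_minimum(vrp8_version, "5.32")


def webui_one_click_supported(mm910_version, plane_version):
    """Thresholds from the WebUI one-click collection note in Table 4-4."""
    return meets_minimum(mm910_version, "6.00") and meets_minimum(plane_version, "5.30")


print(smartkit_batch_supported("5.57", "5.32"))
print(webui_one_click_supported("5.99", "5.30"))
```

Comparing integer tuples instead of raw strings avoids errors such as treating "5.6" as later than "5.57".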


4.4.2.3 Using the V5 Switch Module CLI to Collect Ethernet Switching Plane
Information

Operation Scenario
Use the E9000 server switch module CLI of the V5 platform to collect Ethernet switching
plane information, including:

l Logs
l Debugging information
l Trap information

For details about how to query the Ethernet switching plane version, see 4.4.1.2 Querying the
Software Version of the Ethernet Switching Plane.

Prerequisites
Conditions

l WFTPD 4.2.4.610 or later has been installed on the PC.


l You have logged in to the Ethernet switching plane CLI. For details, see 8.15 Logging
In to a Server Over a Network Port by Using PuTTY or 8.17 Logging In to a
Compute Node, Passthrough Module, or Switch Module by Using the SOL
Function of the MM910.

Data

Table 4-5 describes the required parameters.

Table 4-5 Data list


Parameter          Example Value

IP address         CX911: 192.168.1.100
                   CX912: 10.77.77.77

Subnet mask        255.255.255.0

Default gateway    0.0.0.0

The default username of the switching plane is root, and the default password is
Huawei12#$.

You can query and set IP addresses of all modules. For details, see 8.11 Logging In to the MM910
WebUI.
l For the MM910 versions earlier than (U54) 2.20, choose System Management > Network
Management > xx > IP addresses.
l For the MM910 (U54) 2.20 or later, choose Chassis Settings > Network Settings > xx.

Software Tools


wftpd32.exe: used to transfer files between different platforms, for example, from a PC to a
switch module. This tool is a free third-party tool. You can obtain it from the Internet.

Procedure
Step 1 Configure the FTP server.
For details about the configuration operations, see 8.20 Configuring an FTP Server.
Step 2 Configure the IP address of the management network port.
1. After logging in to the switch module by using a serial port or the SOL function, run the
following commands on the switching plane CLI to query and set the IP address of the
management network port so that the switch module can properly communicate with the
FTP server:

Skip this step if you log in to the switch module by using a network port.
<Fabric>system-view
[Fabric]interface MEth 0/0/1
[Fabric-MEth0/0/1]ip address 192.168.100.123 24
[Fabric-MEth0/0/1]display this
#
interface MEth0/0/1
ip address 192.168.100.123 255.255.255.0
#
return

[Fabric-MEth0/0/1]quit
[Fabric]quit
2. If the configured IP address and the FTP server address are not on the same network
segment, run the following command on the HMM CLI to configure a gateway for the
switching plane:
smmset -l swiN:fruM -d route -v targetvalue maskvalue gatewayvalue
The parameters are described as follows:
– N indicates the slot number of the switch module. The value range is 1 to 4,
mapping to logical slot numbers 1E, 2X, 3X, and 4E from left to right on the panel
respectively.
– M: indicates the ID of the switching plane. The value for the Ethernet switching
plane is 2.
– targetvalue: indicates the target network segment IP address of the switching plane.
– maskvalue: indicates the subnet mask of the switching plane.
– gatewayvalue: indicates the gateway IP address of the switching plane.
For example, if the IP address is 192.168.112.1, run the following command:
smmset -l swi3:fru2 -d route -v 0.0.0.0 0.0.0.0 192.168.112.1
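
Whether a gateway is needed depends on whether the management IP address and the FTP server address are on the same network segment. The following minimal sketch performs that check with the Python standard ipaddress module. The function name needs_gateway is hypothetical, and the addresses are the example values used in this section.

```python
import ipaddress


def needs_gateway(mgmt_ip, prefix_len, ftp_server_ip):
    """Return True if the FTP server is outside the management port's network segment."""
    network = ipaddress.ip_interface(f"{mgmt_ip}/{prefix_len}").network
    return ipaddress.ip_address(ftp_server_ip) not in network


# Management port 192.168.100.123/24 and an FTP server on the same segment.
print(needs_gateway("192.168.100.123", 24, "192.168.100.122"))

# Same management port and an FTP server on another segment.
print(needs_gateway("192.168.100.123", 24, "200.1.1.126"))
```

If the function returns True, configure the gateway with the smmset route command shown above before starting the transfer.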
Step 3 Obtain the log information.
1. Run the following command to collect logs.
<Fabric>display diagnostic-information diag-info.txt
Now saving the diagnostic information to the device
Info: The diagnostic information was saved to the device successfully.

<Fabric>save logfile
Save log file successfully.

2. View the log file system.


<Fabric>dir
Directory of flash:/
  Idx  Attr     Size(Byte)  Date        Time(LMT)  FileName
    0  -rw-          1,075  Apr 01 2000 23:55:17   private-data.txt
    1  -rw-          1,260  Apr 02 2000 00:00:13   hostkey
    2  -rw-            540  Apr 02 2000 00:00:17   serverkey
    3  -rw-        148,848  Sep 08 2015 11:22:40   diag-info.txt
16,129 KB total (15,976 KB free)

<Fabric>dir flashvx:/logfile/
Directory of flashvx:/logfile/
  Idx  Attr     Size(Byte)  Date        Time(LMT)  FileName
    0  -rw-      2,939,200  Apr 01 2000 23:55:02   log.dblg
    1  -rw-         95,988  Jan 07 2014 19:16:00   2014-01-07.19-13-54.log.zip
    2  -rw-        172,081  Jan 07 2014 21:35:14   2014-01-07.21-31-56.log.zip
    3  -rw-      2,716,484  Jan 23 2014 01:35:24   log.log
    4  -rw-      4,589,648  Jan 17 2014 12:30:48   2000-04-01.23-55-08.dblg

3. Enter the IP address, username, and password to log in to the FTP server. In the
following example, the FTP server address is 200.1.1.126 and the username is root.
<Fabric>ftp 200.1.1.126
Trying 200.1.1.126 ...
Press CTRL+K to abort
Connected to 200.1.1.126.
220 WFTPD 2.0 service (by Texas Imperial Software) ready for new user
User(200.1.1.126 none):root
331 Give me your password, please
Enter password:
230 Logged in successfully
[ftp]

The IP address of the FTP server is configured by the user and is on the same network segment as
the management IP address of the switch module.
4. Set the FTP transfer mode to binary.
[ftp]binary
5. Obtain the log file.
[ftp]put flash:/diag-info.txt
200 PORT command okay
150 "F:\diag-info.txt" file ready to receive in IMAGE / Binary mode
226 Transfer finished successfully.
FTP: 148848 byte(s) sent in 0.280 second(s) 531.60Kbyte(s)/sec.

[ftp]lcd flashVX:/logfile
The current local directory is flashVX:/logfile.

[ftp]mput *
Error: The file name . is invalid.
Error: The file name .. is invalid.


200 PORT command okay


150 "F:\log.dblg" file ready to receive in IMAGE / Binary mode
226 Transfer finished successfully.
FTP: 1513938 byte(s) sent in 1.160 second(s) 1305.11Kbyte(s)/sec.
200 PORT command okay
150 "F:\log.log" file ready to receive in IMAGE / Binary mode
226 Transfer finished successfully.
FTP: 2689148 byte(s) sent in 1.940 second(s) 1386.15Kbyte(s)/sec.

[ftp]quit

6. View the log file in the FTP directory on the PC.

----End
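
After the transfer, the byte counts reported by the FTP client can be compared with the sizes listed by dir. The sketch below does this for saved copies of the two outputs. It assumes the V5 transcript wording shown in this procedure (a 150 line followed by an "FTP: ... byte(s) sent" line), and the function names are hypothetical.

```python
import re

DIR_LINE = re.compile(
    r"^\s*\d+\s+\S+\s+([\d,]+)\s+\w{3}\s+\d{2}\s+\d{4}\s+\d{2}:\d{2}:\d{2}\s+(\S+)\s*$"
)
READY_LINE = re.compile(r'^150 "(?:[A-Za-z]:\\)?([^"\\]+)" file ready')
SENT_LINE = re.compile(r"^FTP: (\d+) byte\(s\) sent")


def sizes_from_dir(listing):
    """Map file name to size in bytes from dir output (directories are skipped)."""
    sizes = {}
    for line in listing.splitlines():
        match = DIR_LINE.match(line)
        if match:
            sizes[match.group(2)] = int(match.group(1).replace(",", ""))
    return sizes


def sizes_from_ftp(transcript):
    """Map file name to the number of bytes the FTP client reported as sent."""
    sizes, current = {}, None
    for line in transcript.splitlines():
        line = line.strip()
        ready = READY_LINE.match(line)
        if ready:
            current = ready.group(1)
            continue
        sent = SENT_LINE.match(line)
        if sent and current is not None:
            sizes[current] = int(sent.group(1))
            current = None
    return sizes


def mismatches(dir_sizes, ftp_sizes):
    """Return the files whose sent byte count differs from the listed size."""
    return {name: (dir_sizes.get(name), sent)
            for name, sent in ftp_sizes.items() if dir_sizes.get(name) != sent}


DIR_OUTPUT = """  Idx  Attr     Size(Byte)  Date        Time(LMT)  FileName
    0  -rw-          1,075  Apr 01 2000 23:55:17   private-data.txt
    3  -rw-        148,848  Sep 08 2015 11:22:40   diag-info.txt
"""

FTP_LOG = r"""[ftp]put flash:/diag-info.txt
200 PORT command okay
150 "F:\diag-info.txt" file ready to receive in IMAGE / Binary mode
226 Transfer finished successfully.
FTP: 148848 byte(s) sent in 0.280 second(s) 531.60Kbyte(s)/sec.
"""

print(mismatches(sizes_from_dir(DIR_OUTPUT), sizes_from_ftp(FTP_LOG)))
```

An empty result means every transferred file matches its listed size. Log files that are still being written may legitimately differ, so a mismatch is a prompt to check rather than proof of a failed transfer.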

4.4.2.4 Using the V8 Switch Module CLI to Collect Ethernet Switching Plane
Information

Operation Scenario
Use the CLI of an E9000 switch module to collect the following information about the V8
platform:

l Logs
l Debugging information
l Trap information

For details about how to query the Ethernet switching plane version, see 4.4.1.2 Querying the
Software Version of the Ethernet Switching Plane.

Prerequisites
Conditions

l WFTPD 4.2.4.610 or later has been installed on the PC.


l You have logged in to the Ethernet switching plane CLI. For details, see 8.15 Logging
In to a Server Over a Network Port by Using PuTTY or 8.17 Logging In to a
Compute Node, Passthrough Module, or Switch Module by Using the SOL
Function of the MM910.

Data

Table 4-6 describes the required parameters.

Table 4-6 Data list

Parameter          Example Value

IP address         CX911, CX311, and CX915: 192.168.1.100
                   CX912, CX210, and CX220: 10.77.77.77

Subnet mask        255.255.255.0

Default gateway    0.0.0.0


The default username of the switching plane is root, and the default password is
Huawei12#$.

You can query and set IP addresses of all modules. For details, see 8.11 Logging In to the MM910
WebUI.
l For the MM910 versions earlier than (U54) 2.20, choose System Management > Network
Management > xx > IP addresses.
l For the MM910 (U54) 2.20 or later, choose Chassis Settings > Network Settings > xx.

Software Tools

wftpd32.exe: used to transfer files between different platforms, for example, from a PC to a
switch module. wftpd32.exe is a free third-party tool. You can obtain it from the Internet.

Procedure
Step 1 Configure the FTP server.

For details, see 8.20 Configuring an FTP Server.

Step 2 After logging in through the serial port or SOL function, run the following commands on the
Ethernet switching plane CLI to check whether the management network port IP address has
been configured.

Skip this step if you log in to the switch module by using a network port.

<HUAWEI>system-view

[~HUAWEI]interface MEth 0/0/0

[~HUAWEI-MEth0/0/0]display this

l If the command output is as follows with no IP address displayed, go to Step 3.


#
interface MEth0/0/0
#
return

l If the command output contains an IP address and gateway address, go to Step 4.


#
interface MEth0/0/0
ip address 192.168.100.123 255.255.255.0
#
return

Step 3 (Optional) After logging in to the switch module by using a serial port or the SOL function,
run the following commands on the Ethernet switching plane CLI to query and set the IP
address of the management network port so that the switch module can properly communicate
with the FTP server:

Skip this step if you log in to the switch module by using a network port.

<HUAWEI>system-view

[~HUAWEI]interface MEth 0/0/0


[~HUAWEI-MEth0/0/0]ip address 192.168.100.123 24


[~HUAWEI-MEth0/0/0]commit
[~HUAWEI-MEth0/0/0]display this
#
interface MEth0/0/0
ip address 192.168.100.123 255.255.255.0
#
return

[~HUAWEI-MEth0/0/0]quit
[~HUAWEI]quit
Step 4 Obtain the log information.
1. Collect the diagnostic information, save the log file, and view the log file system.
<HUAWEI>system-view
Enter system view, return user view with return command.

[~HUAWEI]diagnose
Warning: Enter diagnose view, return user view by pressing Ctrl+Z.
Info: The diagnose view is used to debug system hardware and software. Misuse
of some commands in this view will affect system performance. Therefore, use
these commands with the guidance of Huawei engineers.

[~HUAWEI-diagnose]collect diagnostic information


Info: Succeeded in collecting diagnostic information in slot 3.

[~HUAWEI-diagnose]display diagnostic-information diag-info.txt


Now saving the diagnostic information to the
device........................................................................
..............................................................................
....
Info: The diagnostic information was saved to the device successfully.

[~HUAWEI-diagnose]return
<HUAWEI>save logfile
Info: Save logfile successfully.

<HUAWEI>dir
Directory of flash:/

  Idx  Attr     Size(Byte)  Date        Time      FileName
    0  drwx              -  Apr 07 2014 22:32:50  $_checkpoint
    1  dr-x              -  Feb 21 2014 15:03:54  $_security_info
    2  -rw-    117,788,305  Jan 01 1970 00:03:53  xxx.cc
    3  -rw-    117,784,209  Feb 21 2014 14:47:03  xxx.cc
    4  -rw-     76,227,537  Feb 21 2014 14:41:45  xxx.cc
    5  drwx              -  Jan 01 1970 00:00:19  POST
    6  -rw-         10,568  Feb 21 2014 18:20:01  cfg_from_smm
    7  -rw-          6,575  Mar 22 2014 04:14:27  cfg_local
    8  -rw-         19,435  Mar 22 2014 04:14:24  device.sys
    9  -rw-      1,130,184  Apr 08 2014 16:22:11  diag-info.txt
   10  drwx              -  Apr 08 2014 16:18:55  logfile
   11  -rw-          1,838  Mar 22 2014 04:14:24  vrpcfg.zip
1,048,576 KB total (367,972 KB free)

<HUAWEI>dir logfile/
Directory of flash:/logfile/

  Idx  Attr     Size(Byte)  Date        Time      FileName
    0  -rw-      7,971,326  Apr 08 2014 16:35:00  diag.log
    1  -rw-        444,920  Feb 21 2014 18:23:11  diaglog_3_20140221182310.log.zip
    2  -rw-      1,756,870  Apr 08 2014 16:18:55  diagnostic_information.zip
    3  -rw-      4,269,737  Apr 08 2014 16:45:08  log.log
    4  -rw-        354,428  Dec 22 2013 11:32:34  log_3_20131222113233.log.zip
    5  -rw-        353,715  Jan 16 2014 08:50:19  log_3_20140116085018.log.zip
1,048,576 KB total (367,972 KB free)

2. Query stack information.


Record the queried slot numbers and roles of the stacked switch modules.
<HUAWEI>display stack
--------------------------------------------------------------------------------
MemberID  Role     MAC             Priority  Device Type  Bay/Chassis
--------------------------------------------------------------------------------
2         Standby  dcd2-fcf8-5600  100       CX910        2X/300
3         Master   dcd2-fcf8-55c0  100       CX910        3X/300
--------------------------------------------------------------------------------

Role specifies the switch module role. The value can be Master, Standby, or Slave,
indicating the primary switch module, standby switch module, and slave switch module
respectively. Bay in Bay/Chassis indicates the switch module slot number.
3. Obtain the log file.
<HUAWEI>ftp 192.168.100.122
Trying 192.168.100.122 ...
Press CTRL+K to abort
Connected to 192.168.100.122.
220 WFTPD 2.0 service (by Texas Imperial Software) ready for new user
User(192.168.100.122:(none)):huawei
331 Give me your password, please
Enter password:
230 Logged in successfully

[ftp]binary
200 Type is Image (Binary)


# On the FTP server, create a log receiving directory for the master switch module in the
stack. In this example, the number 3 in swi3 indicates the stack ID (same as the slot
number) of the master switch module. (If the switch modules are not stacked, create a
log receiving directory for the current switch module. The number 3 in swi3 indicates the
slot number of the current switch module.)
[ftp]mkdir swi3
[ftp]cd swi3
[ftp]put flash:/diag-info.txt
200 Port command successful.
150 Opening data connection for diag-info.txt.
/ 100% [***********]
226 File received ok
FTP: 1756870 byte(s) send in 0.308 second(s) 5570.431Kbyte(s)/sec.

[ftp]mput flash:/logfile/*
200 Port command successful.
150 Opening data connection for diag.log.
/ 100% [***********]
226 File received ok

FTP: 7971326 byte(s) send in 0.798 second(s) 9755.010Kbyte(s)/sec.


200 Port command successful.
150 Opening data connection for diaglog_3_20140221182310.log.zip.
/ 100% [***********]
226 File received ok

FTP: 444920 byte(s) send in 0.113 second(s) 3845.061Kbyte(s)/sec.


200 Port command successful.
150 Opening data connection for diagnostic_information.zip.
/ 100% [***********]
226 File received ok

FTP: 1756870 byte(s) send in 0.308 second(s) 5570.431Kbyte(s)/sec.


200 Port command successful.
150 Opening data connection for log.log.
/ 100% [***********]
226 File received ok

FTP: 4272491 byte(s) send in 3.492 second(s) 1194.832Kbyte(s)/sec.


200 Port command successful.
150 Opening data connection for log_3_20131222113233.log.zip.
/ 100% [***********]
226 File received ok

FTP: 354428 byte(s) send in 0.238 second(s) 1454.289Kbyte(s)/sec.


200 Port command successful.
150 Opening data connection for log_3_20140116085018.log.zip.
/ 100% [***********]
226 File received ok
FTP: 353715 byte(s) send in 0.265 second(s) 1303.486Kbyte(s)/sec.

[ftp]cd ..
# On the FTP server, create a log receiving directory for the standby or slave switch
module in the stack. In this example, the number 2 in swi2 indicates the stack ID (same
as the slot number) of the standby switch module. (If the switch modules are not stacked,
log in to each switch module and repeat the preceding log collection procedure.)
[ftp]mkdir swi2
[ftp]cd swi2
[ftp]mput 2#flash:/logfile/*
[ftp]cd ..
[ftp]quit


221 Windows FTP Server (WFTPD, by Texas Imperial Software) says goodbye
<HUAWEI>

– When you use the mput command in the FTP CLI, 2#flash:/ indicates the flash root directory
of the switch module with the stack ID 2. You can obtain the stack ID and role information by
using the display stack command.
– The flash root directory of the master switch module in a stack is flash:/.
– If multiple switch modules are displayed after running the display stack command, obtain the
log file of each switch module in the logfile directory.
4. View the log file in the FTP directory on the PC.

----End
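
The per-member FTP commands in the procedure above follow from the display stack output: the master uses flash:/ and every other member uses its stack ID followed by #flash:/. The sketch below derives the command sequence from saved display stack output. It is illustrative only: the function names are hypothetical, and the column layout is assumed to match the example in this section.

```python
import re

ROW = re.compile(r"^(\d+)\s+(Master|Standby|Slave)\s+\S+\s+\d+\s+\S+\s+(\S+)/(\S+)$")


def parse_stack(output):
    """Extract member ID, role, bay, and chassis from display stack output."""
    members = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            member_id, role, bay, chassis = match.groups()
            members.append({"id": int(member_id), "role": role,
                            "bay": bay, "chassis": chassis})
    return members


def ftp_commands(members):
    """Build the FTP CLI commands for each stack member, master first."""
    commands = []
    for member in sorted(members, key=lambda m: m["role"] != "Master"):
        is_master = member["role"] == "Master"
        # Non-master members are addressed as <stack ID>#flash:/.
        prefix = "" if is_master else f"{member['id']}#"
        commands += [f"mkdir swi{member['id']}", f"cd swi{member['id']}"]
        if is_master:
            commands.append("put flash:/diag-info.txt")
        commands.append(f"mput {prefix}flash:/logfile/*")
        commands.append("cd ..")
    return commands


STACK_OUTPUT = """MemberID  Role     MAC             Priority  Device Type  Bay/Chassis
2         Standby  dcd2-fcf8-5600  100       CX910        2X/300
3         Master   dcd2-fcf8-55c0  100       CX910        3X/300
"""

for command in ftp_commands(parse_stack(STACK_OUTPUT)):
    print(command)
```

Generating the list from the recorded roles reduces the risk of uploading the logs of one member into the directory of another.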

4.4.2.5 Using the Web Tools Page of a Switch Module to Collect FC Switching
Plane Information (MX510)

Operation Scenario
Use the Web Tools page of a switch module (MX510) to collect information about the FC
switching plane.

This section applies to the CX311, CX911, and CX915.

Prerequisites
Conditions

l The connection between the management IP address of the FC switch module and the
server IP address is normal.
l You have logged in to the FC switching plane Web Tools page. For details, see 8.10
Logging In to the Web Tools of the MX510.

Data

Table 4-7 Data list


Parameter Example Value

IP address 192.168.1.100

Subnet mask 255.255.255.0

Default gateway 0.0.0.0

For exporting the dump_support log file, the username is images, and the default password
is Huawei12#$.

Procedure
Step 1 On Web Tools, choose Switch > Download Support File, as shown in Figure 4-2.


Figure 4-2 Web Tools home page

Step 2 Select the directory for storing the log file, and click Start.
The log file download starts. If "Support file saved" is displayed in the Status area, the log
file has been successfully exported. See Figure 4-3.

Figure 4-3 Download Support File dialog box

----End


4.4.2.6 Using the Switch Module CLI to Collect FC Switching Plane Information
(MX510)

Operation Scenario
Use the CLI of a switch module (MX510) to collect FC switching plane information.

This section applies to the CX311, CX911, and CX915.

Prerequisites
Conditions

l The PC has been connected to the management network port of the server by using a
network cable.
l You have obtained mini-sftp-server.exe.

If the MX510 firmware version is earlier than 9.8.2.6.0, you can use the FTP tool WFTPD to collect
information. For details, see 8.20 Configuring an FTP Server.
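Whether the FTP alternative applies depends on a numeric comparison of dotted version strings, which is easy to get wrong when compared as text (for example, 9.8.10 sorts before 9.8.2 alphabetically). The following minimal sketch compares versions field by field. The function names are hypothetical, and the optional leading "V" is assumed from the ActiveSWVersion format shown by the show version command.

```python
def parse_version(text):
    """Convert 'V9.8.0.10.0' or '9.8.2.6.0' to a tuple of integers."""
    return tuple(int(part) for part in text.strip().lstrip("Vv").split("."))


# Threshold stated in the prerequisites: earlier versions can use WFTPD (FTP).
FTP_CUTOFF = parse_version("9.8.2.6.0")


def can_use_ftp(active_sw_version):
    """Return True if the firmware is earlier than 9.8.2.6.0."""
    return parse_version(active_sw_version) < FTP_CUTOFF


if __name__ == "__main__":
    print(can_use_ftp("V9.8.0.10.0"))
```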

Data

Table 4-8 Data list


Parameter Example Value

IP address 192.168.1.100

Subnet mask 255.255.255.0

Default gateway 0.0.0.0

The default username of the switching plane is admin, and the default password is
Huawei12#$.

Software Tools

mini-sftp-server.exe: used to transfer files between different platforms, for example, from a
switch module to a PC. mini-sftp-server.exe is a free third-party tool. You can obtain it from
the Internet.

Procedure
Step 1 Configure an SFTP server.

For details, see 8.21 Using SFTP to Transfer Files.

Step 2 Log in to the MX510.

For details about how to access the FC switching plane CLI, see 8.15 Logging In to a Server
Over a Network Port by Using PuTTY or 8.17 Logging In to a Compute Node,
Passthrough Module, or Switch Module by Using the SOL Function of the MM910.


Step 3 Query and set the management IP address.


1. Run the following command on the CLI of the FC switching plane to query the
management IP address:
FCoE_GW: admin> show version
*****************************************************
* *
* Command Line Interface SHell (CLISH) *
* *
*****************************************************

SystemDescription Huawei FCoE-FC Gateway module


HostName <undefined>
Eth1IPv4NetworkAddr 192.168.96.96
Eth1IPv6NetworkAddr fe80::2c0:ddff:fe24:21fe
MAC1Address 00:c0:dd:24:21:fe
WorldWideName 10:00:00:c0:dd:24:21:fd
SerialNumber 2198080446DQCB46F882
SymbolicName FCoE_GW
ActiveSWVersion V9.8.0.10.0
ActiveTimestamp Fri May 17 21:19:51 2013
POSTStatus Passed
SwitchMode Full Fabric

Eth1IPv4NetworkAddr indicates the management IP address.


2. Set the management IP address so that the switch module can properly communicate
with the configured SFTP server.
FCoE_GW: admin> admin start
FCoE_GW (admin): admin> set setup system ipv4
Set a static IPv4 address as prompted.
EthIPv4NetworkDiscovery (1=Static, 2=Bootp, 3=Dhcp, 4=Rarp) :1
EthIPv4NetworkAddress (dot-notated IP Address) : 192.168.101.123
EthIPv4NetworkMask (dot-notated IP Address) : 255.255.255.0
EthIPv4GatewayAddress (dot-notated IPv4 Address) : 192.168.101.254
Do you want to save and activate this system setup? (y/n): [n] y

3. Query the static IPv4 address of the FCoE gateway.


FCoE_GW (admin): admin> show setup system ipv4
The following information is displayed:
System Information
------------------
Eth1IPv4NetworkDiscovery Static
Eth1IPv4NetworkAddress 192.168.101.123
Eth1IPv4NetworkMask 255.255.255.0
Eth1IPv4GatewayAddress 192.168.101.254
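Before starting log collection, it can help to confirm on paper that the configured address, mask, and gateway allow the switch module to reach the SFTP server. The sketch below uses the Python standard library ipaddress module and the example values from this procedure. The function name and return strings are hypothetical, and the check covers addressing only, not cabling or firewall rules.

```python
import ipaddress


def route_to_server(switch_ip, netmask, gateway, server_ip):
    """Classify how the switch module would reach the SFTP server."""
    interface = ipaddress.ip_interface(f"{switch_ip}/{netmask}")
    server = ipaddress.ip_address(server_ip)
    if server in interface.network:
        return "direct"  # same subnet, no gateway needed
    gateway_addr = ipaddress.ip_address(gateway)
    if not gateway_addr.is_unspecified and gateway_addr in interface.network:
        return "via gateway"  # different subnet, gateway is on-link
    return "unreachable"  # no usable gateway (for example 0.0.0.0)


if __name__ == "__main__":
    print(route_to_server("192.168.101.123", "255.255.255.0",
                          "192.168.101.254", "192.168.100.214"))
```

With the example values, the SFTP server at 192.168.100.214 is outside the 192.168.101.0/24 subnet, so the gateway setting is what makes the transfer possible.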

Step 4 Obtain the log information.


1. Run the following command to collect logs and save the logs to the local PC:
FCoE_GW: admin> create support
2. Set the log collection parameters as prompted and start log collection. See Figure 4-4.
The key information is described as follows:
– In this example, 192.168.100.214 is the PC IP address.
– The username and password for the SFTP server are both vxworks.
– If you press Enter when the CLI prompts you to specify the directory for storing
the dump file, the dump file is automatically downloaded to the default directory on
the SFTP server.


Figure 4-4 Collecting information

----End

4.4.2.7 Using the Switch Module CLI to Collect FC Switching Plane Information
(MX210/MX220)

Operation Scenario
Use the CLI of a switch module (MX210/MX220) to collect FC switching plane information.

This section applies to the CX210, CX220, CX912, and CX916. The FC switching planes of
the CX210 and CX912 are the MX210, and those of the CX220 and CX916 are the MX220.

Prerequisites
Conditions

l The PC has been connected to the management network port of the server by using a
network cable.
l You have obtained mini-sftp-server.exe.

Data

Table 4-9 Data list

Parameter Example Value

IP address 10.77.77.77

Subnet mask 255.255.255.0

Default gateway 0.0.0.0

The default username of the switching plane is admin, and the default password is
Huawei12#$.

Software Tools


mini-sftp-server.exe: used to transfer files between different platforms, for example, from a
switch module to a PC. mini-sftp-server.exe is a free third-party tool. You can obtain it from
the Internet.

Procedure
Step 1 Configure an SFTP server.

For details, see 8.21 Using SFTP to Transfer Files.

Step 2 Log in to the MX210 or MX220.

For details about how to access the FC switching plane CLI, see 8.15 Logging In to a Server
Over a Network Port by Using PuTTY or 8.17 Logging In to a Compute Node,
Passthrough Module, or Switch Module by Using the SOL Function of the MM910.

Step 3 Run the ipaddrset command to set the management IP address and then run the ipaddrshow
command to check whether the IP address is correct.
l IPv4
FC_SW:admin> ipaddrset
Ethernet IP Address [10.77.77.77]:10.32.53.47
Ethernet Subnetmask [255.255.255.0]:255.255.240.0
Fibre Channel IP Addresss [none]:
Fibre Channel Subnetmask [none]:
Gateway IP Address [0.0.0.0]:10.32.48.1
DHCP [Off]:
IP address is being changed...Done.

FC_SW:admin> ipaddrshow
Ethernet IP Address: 10.32.53.47
Ethernet Subnetmask: 255.255.240.0
Fibre Channel IP Addresss: none
Fibre Channel Subnetmask: none
Gateway IP Address 10.32.48.1
DHCP: Off

l IPv6
FC_SW:admin> ipaddrset -ipv6 --add fd00:60:69bc:82:205:33ff:fed7:f6fe/64
IP address is being changed...Done.

FC_SW:admin> ipaddrshow
SWITCH
Ethernet IP Address: 10.20.24.55
Ethernet Subnetmask: 255.255.240.0
Gateway IP Address: 10.20.16.1
DHCP: Off
IPv6 Autoconfiguration Enabled: No
Local IPv6 Addresses:
static fd00:60:69bc:82:205:33ff:fed7:f6fe/64 preferred
IPv6 Gateways: fe80:21b:3dff:fe0b:7800 fe80:21b:edff:fe0b:2400

The current environment uses IPv4 addresses. You do not need to set the IPv6 address.
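If the ipaddrshow output is saved from the terminal session, the check in this step can be done mechanically. The following minimal sketch parses "Name: value" lines and reports settings that differ from the intended values. The function names are hypothetical, and the parser assumes every field of interest is printed with a colon followed by a space; lines without that separator are ignored.

```python
def parse_ipaddrshow(output):
    """Parse 'Name: value' lines from saved ipaddrshow output into a dict."""
    settings = {}
    for line in output.splitlines():
        name, separator, value = line.partition(": ")
        if separator and value.strip():
            settings[name.strip()] = value.strip()
    return settings


def mismatches(expected, actual):
    """Return the names of settings whose actual value differs from the expected one."""
    return [name for name, value in expected.items() if actual.get(name) != value]


if __name__ == "__main__":
    sample = (
        "Ethernet IP Address: 10.32.53.47\n"
        "Ethernet Subnetmask: 255.255.240.0\n"
        "Gateway IP Address: 10.32.48.1\n"
        "DHCP: Off\n"
    )
    expected = {"Ethernet IP Address": "10.32.53.47", "DHCP": "Off"}
    print(mismatches(expected, parse_ipaddrshow(sample)))
```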

Step 4 Run the supportsave command on the CLI to collect logs.


1. Run the following command to save logs to the SFTP server.


FC_SW:admin> supportsave
This command collects RASLOG, TRACE, supportShow, core file, FFDC data
and then transfer them to a FTP/SCP/SFTP server or a USB device.
This operation can take several minutes.
NOTE: supportSave will transfer existing trace dump file first, then
automatically generate and transfer latest one. There will be two trace dump
files transferred after this command.
OK to proceed? (yes, y, no, n): [no] y
2. Set the log collection parameters as prompted and start log collection.
– Host IP or Host Name: specifies the address for storing logs on the target device
(the SFTP server IP address).
– User Name: specifies the username for logging in to the target device (the SFTP
server username).
– Password: specifies the password for logging in to the target device (the SFTP
server password).
– Protocol: specifies the transfer protocol. Set this parameter to sftp.
– Remote Directory: specifies the directory for storing log files on the SFTP server.
Create the /support directory in the home directory of the SFTP server, and set
Remote Directory to /support.
(Optional) When "Do you want to continue with CRA (Y/N)" is displayed, enter n
to start collecting logs.

Figure 4-5 Collecting logs

If "SupportSave completed" is displayed, logs are successfully collected.


----End

4.5 Collecting Switch Module Logs (for E9000+MM910/MM921)
l FusionDirector has been installed on the MM920 and can be used to collect chassis
information.
l After FusionDirector manages the chassis of the MM921, you can use FusionDirector to
collect information.


Step 1 Log in to the FusionDirector WebUI.

For details, see 8.12 Logging In to the FusionDirector WebUI.

Step 2 Choose Compute > Hardware > Add Device > Add Online to add the chassis of the
MM920/MM921.

For details, see the FusionDirector User Guide.

Step 3 Choose Compute > Hardware > Chassis.

The list of chassis managed by FusionDirector is displayed.

Step 4 Click the chassis name.

The Overview tab page is displayed, as shown in Figure 4-6.

Figure 4-6 Overview tab page of the chassis

Step 5 Click Export Log in the Control Panel area.

The list of installed devices is displayed.

Step 6 Select the switch modules whose logs you want to export and click OK.

After the task is complete, decompress the downloaded package to obtain switch module logs.

----End

4.6 Collecting QLogic HBA Logs


Collect HBA logs when an NIC is faulty.

Collecting QLogic HBA logs has no adverse impact on services. The following describes how
to collect QLogic HBA logs on mainstream OSs. You can download the scripts from the
official QLogic support website.


Table 4-10 Collecting QLogic HBA logs on mainstream OSs


OS | Information to Be Collected
Windows | Collect system information to help technical support personnel diagnose and rectify faults.
Linux | Collect information to help diagnose Fibre Channel (FC) and iSCSI HBA faults.
Solaris | Collect data.
VMware | Collect VMware system logs, including QLogic HBA logs.

4.7 Collecting Other Logs


Use the following methods to collect other host logs:
l Collect Emulex HBA logs when an NIC is faulty. Use the official tool OneCapture to
collect Emulex HBA logs. This tool may affect services.
l Collect recorded videos. For details, see "Video Play" in the iMana 200 User Guide or
"Play Back" in the iBMC User Guide.
You can obtain the user guide from the HUAWEI Server Information Service
Platform.


5 Diagnosing and Rectifying Faults

5.1 Fault Diagnosis Rules


5.2 Using Tools to Diagnose Faults
5.3 Handling Alarms
5.4 Using Error Codes to Locate Faults
5.5 Checking Indicators to Locate Faults
5.6 Handling Faults Based on Symptoms

5.1 Fault Diagnosis Rules

NOTICE

l Obtain the customer's written authorization before performing any operation.


l Before performing any operation, ensure that service data will not be lost or has been
backed up.

Observe the following fault diagnosis rules:

l Check the external components and then the internal components.


During troubleshooting, first check external devices for faults (such as a power failure or
a peer device failure).
l Check the network and then network elements (NEs).
According to the network topology, check whether the network environment is normal
and then check whether the NEs are normal. Determine which NE is faulty if possible.
l Check the high-speed signal alarms and then the low-speed signal alarms.
Alarm signal streams show that high-speed signal alarms often cause low-speed signal
alarms. Therefore, clear high-speed signal alarms first.
l Analyze alarms of high severity and then those of low severity.
Analyze critical or major alarms first, and then analyze minor alarms.
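The severity rule can be applied to an exported alarm list with a simple stable sort. The sketch below is illustrative only: the alarm records and field names are hypothetical and do not reflect the export format of any Huawei management software.

```python
# Lower rank means the alarm is analyzed earlier.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


def order_for_analysis(alarms):
    """Sort alarms so that critical and major alarms come before minor ones.

    sorted() is stable, so alarms of equal severity keep their original order.
    """
    return sorted(alarms, key=lambda alarm: SEVERITY_RANK[alarm["severity"].lower()])


if __name__ == "__main__":
    # Hypothetical alarm records for illustration.
    alarms = [
        {"severity": "Minor", "text": "Fan speed deviation"},
        {"severity": "Critical", "text": "PSU input lost"},
        {"severity": "Major", "text": "DIMM correctable errors"},
    ]
    for alarm in order_for_analysis(alarms):
        print(alarm["severity"], alarm["text"])
```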


5.2 Using Tools to Diagnose Faults

NOTICE

FusionServer Tools Toolkit and Smart Provisioning can be used only after services on the
server are stopped. Notify the customer to back up data before using the tools.

Table 5-1 Diagnosis tools


Tool | Supported Server | Feature | Document Link
FusionServer Tools Toolkit | Huawei FusionServer V2 and V3 servers | Hardware information collection; quick diagnosis; tests for CPUs, drives, and DIMMs; reference tools and scripts for common configuration and deployment; creation of a bootable USB flash drive for easy O&M; automatic configuration diagnosis for channel partners | FusionServer Tools 2.0 Toolkit User Guide
Smart Provisioning | Huawei FusionServer V5 servers | OS installation; RAID configuration; firmware upgrade; configuration import and export; hardware diagnosis; log collection | Smart Provisioning User Guide

5.3 Handling Alarms


This section describes how to use the server management system to handle alarms. Search for
alarm codes in the alarm handling manual to find the handling methods. See Table 5-2 to
obtain the server alarm handling manual.


Table 5-2 Methods for handling alarms


Server Type Reference

E9000 For details, see the FusionServer Pro E9000 Server V100R001
HMM Alarm Handling.
To check switch module alarms, run the following commands on
the Ethernet switching plane:
l display trapbuffer
l display alarm active
l display alarm history
NOTE
For details about how to log in to the Ethernet switching plane of a switch
module, see 8.15 Logging In to a Server Over a Network Port by Using
PuTTY, 8.16 Logging In to a Server Over a Serial Port by Using
PuTTY, or 8.17 Logging In to a Compute Node, Passthrough Module,
or Switch Module by Using the SOL Function of the MM910.

E6000 For details, see the E6000 Server V100R002 Alarm Reference.

Rack server For details, see the FusionServer Pro Rack Server iBMC Alarm
Handling.

X6000 For details, see the FusionServer Pro X6000 Server iBMC (Earlier
than V250) Alarm Handling or X6000 Server Alarm Handling
(iMana 200).

X8000 For details, see the X8000 Server V100R001 Alarm Reference.

X6800 For details, see the FusionServer Pro X6800 Server iBMC (Earlier
than V250) Alarm Handling.

G2500 For details, see the G2500 Server 1.0.0 iBMC Alarm Handling.

FusionServer G5500 For details, see the FusionServer G5500 Server Alarm Handling.

Atlas 800 AI server For details, see the Atlas 800 AI Server (Model 3010) iBMC Alarm
(model 3010) Handling.

5.4 Using Error Codes to Locate Faults


The following servers support the fault diagnosis LED: RH1288 V3, RH2288 V3, RH2288H
V3, RH5885 V3, 5288 V3, Atlas 800 AI (model 3010), 1288H V5, 2288H V5, 2488 V5,
2488H V5, and 5885H V5. Table 5-3 describes the states of the fault diagnosis LED and their
meanings. Figure 5-1 shows the location of the fault diagnosis LED on the RH1288 V3. See
the alarm handling manual to find the handling methods for the error codes displayed on the
fault diagnosis LED.


Table 5-3 Error codes


Module Name | Displayed Information | Meaning | Diagnosis Procedure
Fault diagnosis LED | --- | The server is operating properly. | N/A
Fault diagnosis LED | Error code | A component fault has occurred. | For details about error codes, see the section "Error Code Handling" in the FusionServer Pro Rack Server iBMC Alarm Handling or Atlas 800 AI Server (Model 3010) iBMC Alarm Handling.

Figure 5-1 Location of the fault diagnosis LED

5.5 Checking Indicators to Locate Faults


For details about the positions of indicators, see the sections about the front and rear panels in
the user guide of the specific server. To obtain the user guide of each server, perform the
following steps:
1. Log in to the Support > Intelligent Servers or Support > AI Computing Platform
page.
2. Choose a server model to access the product page.
3. On the Documentation tab page, choose Operation & Maintenance > User Guide.
4. View the required user guide.

Process
Figure 5-2 shows the process for checking the indicators.


Figure 5-2 Process for checking the indicators

Indicators Available on All Servers


Step 1 Observe the status indicators of the servers.

Table 5-4 Status indicators

Indicator | Status | Meaning | Diagnosis
Health indicator (HLY) | Steady green | The server is operating properly. | N/A
Health indicator (HLY) | Blinking red | A fault alarm is generated. | 1. Log in to the iMana 200 or iBMC WebUI to view the alarm information. For details, see section "Basic Operations" in the corresponding iMana user guide or section "Alarm and SEL" in the corresponding iBMC user guide. 2. (Optional) View the error code on the fault diagnosis LED on the front panel. For details, see 5.4 Using Error Codes to Locate Faults.
Health indicator (HLY) | Off | The device is powered off or is faulty. | See the diagnosis for the blinking red state.
Power indicator (PWR) | Steady green | The server is powered on. | N/A
Power indicator (PWR) | Blinking yellow | The iMana 200 or iBMC management system is being started. In this case, you cannot power on or off the server by pressing the power button. | N/A
Power indicator (PWR) | Steady yellow | The server is ready to power on. | Press the power button to power on the server. If the server cannot be powered on, log in to the iMana 200 or iBMC WebUI and view the alarm to rectify the fault.
Power indicator (PWR) | Off | The server is not connected to a power source. | 1. If the iMana 200 or iBMC WebUI is accessible, log in and check for any alarms. 2. If the iMana 200 or iBMC WebUI is inaccessible, check whether the PSU indicator and management module indicator on the rear of the E9000 chassis are steady green. If yes, the chassis power supply is normal. If no, check the external power supply. 3. If you have confirmed that the power supply and PSUs of an E9000 server are normal, the compute nodes are faulty. Contact Huawei technical support to replace the compute nodes. Do not reseat the compute nodes or power on or off the chassis.
UID button/indicator | Steady blue | The server is being located. | See the NOTE.
UID button/indicator | Blinking blue | The server is distinguished from multiple servers that have also been located. | See the NOTE.
UID button/indicator | Off | The server is not being located or is not connected to a power source. | See the NOTE.

NOTE
The UID indicator is located on the UID button. The UID indicator displays a steady blue light
to help identify and locate a server. To turn on or off the UID indicator, you can manually press
the UID button or run commands on the iMana 200 or iBMC CLI.
To reset iMana 200 or iBMC, hold down the UID button for 4 to 6 seconds.
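The diagnosis for a power indicator that is off is a short decision sequence. The sketch below restates it for an E9000 chassis as a function. The function and parameter names are hypothetical, and the returned strings paraphrase the table rather than quote any tool output.

```python
def triage_power_indicator_off(webui_accessible, psu_and_mm_steady_green):
    """Return the next action when the power indicator (PWR) is off.

    webui_accessible: True if the iMana 200 or iBMC WebUI can be reached.
    psu_and_mm_steady_green: True if the PSU indicator and the management
        module indicator on the rear of the E9000 chassis are steady green.
    """
    if webui_accessible:
        return "Log in to the WebUI and check for alarms."
    if not psu_and_mm_steady_green:
        return "Check the external power supply."
    # Power supply and PSUs are normal, so the compute nodes are suspected.
    return ("Contact Huawei technical support to replace the compute nodes. "
            "Do not reseat the compute nodes or power the chassis on or off.")


if __name__ == "__main__":
    print(triage_power_indicator_off(webui_accessible=False,
                                     psu_and_mm_steady_green=True))
```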

Step 2 View iMana 200 or iBMC system event logs (SELs) to locate faults.

Step 3 Check the status indicators of the components.


l Table 5-5, Table 5-6, Table 5-7, and Table 5-8 describe the drive indicators, NVMe
PCIe SSD indicators, PSU indicators, and NIC indicators on all servers and the
corresponding handling procedures.
l Table 5-9 describes the module indicators on the RH5885 V2, RH5885 V3, and
RH5885H V3 and the corresponding handling procedures.
l Table 5-10 describes the module indicators on the RH8100 and X6800 and the
corresponding handling procedures.
l Table 5-11, Table 5-12, and Table 5-13 describe the MM910 management module
indicators, E9000 fan module indicators, and E9000 switch module indicators and the
corresponding handling procedures.

Table 5-5 Drive status indicators


Drive Active Indicator | Drive Fault Indicator | Meaning | Diagnosis
Steady green | Off | The drive is operating properly. | N/A
Blinking green | Off | Data is being read from or written to the drive. | N/A
Steady green or blinking green | Blinking yellow | The server is locating the drive or rebuilding RAID. | N/A
Steady green, blinking green, or off | Steady yellow | The drive is faulty. | Log in to the iMana 200 or iBMC and use FusionServer Tools Toolkit or Smart Provisioning to check for any drive faults.
Off | Off | The drive is faulty or not detected. | Check whether the drive is connected, or log in to the iMana 200 or iBMC and use FusionServer Tools Toolkit or Smart Provisioning to check for any drive faults.
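Reading a drive requires combining both indicators, because the same active indicator state can mean different things depending on the fault indicator. The following minimal sketch encodes the combinations from Table 5-5. The function name and the short result strings are hypothetical labels, not messages from any Huawei tool.

```python
def drive_state(active, fault):
    """Map the drive active and fault indicator states to a meaning (Table 5-5)."""
    active, fault = active.lower(), fault.lower()
    if fault == "steady yellow":
        # Applies whether the active indicator is steady green, blinking green, or off.
        return "faulty"
    if fault == "blinking yellow" and active in ("steady green", "blinking green"):
        return "being located or rebuilding RAID"
    if fault == "off":
        if active == "steady green":
            return "operating properly"
        if active == "blinking green":
            return "data being read or written"
        if active == "off":
            return "faulty or not detected"
    return "combination not listed in Table 5-5"


if __name__ == "__main__":
    print(drive_state("Blinking green", "Steady yellow"))
```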

The NVMe PCIe SSD indicators are available only on high-density servers, specific rack servers
(RH1288 V3, RH2288 V3, RH2288H V3, RH5288 V3, RH5885 V3, RH5885H V3, and RH8100 V3),
and CH225 V3 compute nodes in the E9000 blade server.

Table 5-6 NVMe PCIe SSD indicators


NVMe PCIe SSD Active Indicator | NVMe PCIe SSD Fault Indicator | Meaning | Diagnosis
Steady green | Off | The NVMe PCIe SSD is detected and working properly. | N/A
Blinking green at 2 Hz | Off | Read/Write operations on the NVMe PCIe SSD are in progress. | N/A
Off | Off | The NVMe PCIe SSD is not detected. | N/A
Steady green or off | Steady yellow | The NVMe PCIe SSD is faulty. | Reseat the NVMe PCIe SSD. If the problem persists, replace the SSD.
Off | Blinking yellow at 2 Hz | The NVMe PCIe SSD is being hot-swapped. | N/A
Off | Blinking yellow at 0.5 Hz | The hot removal of the NVMe PCIe SSD is complete and the SSD can be removed. | Remove the NVMe PCIe SSD. NOTE: If the fault indicator is blinking yellow at 0.5 Hz after the NVMe PCIe SSD is inserted, reseat the SSD.

Table 5-7 PSU status indicators


Indicator | Status | Meaning | Diagnosis
PSU operating status indicator (460 W/750 W/800 W/1200 W) | Steady green | The power supply is normal. | N/A
PSU operating status indicator (460 W/750 W/800 W/1200 W) | Off | The PSU has no power, or the system is on standby or abnormal. | Check whether the power cable is connected properly to the PSU and whether the PSU is normal.
PSU operating status indicator (2000 W/2500 W/3000 W) | Steady green | The PSU is operating properly. | N/A
PSU operating status indicator (2000 W/2500 W/3000 W) | Blinking green (once every 2 seconds) | The PSU is in hibernation or is not connected properly. | If the fault occurs in an E9000 server, check whether hibernation settings are enabled. If hibernation settings are disabled or the fault occurs in another type of server, check whether the PSU is connected properly.
PSU operating status indicator (2000 W/2500 W/3000 W) | Steady red | The PSU is not functioning correctly. | 1. Check whether the PSU is functioning correctly. 2. If the PSU is functioning correctly, check whether the external power supply to the PSU is functioning correctly.
PSU operating status indicator (2000 W/2500 W/3000 W) | Off | The PSU has no power input or is faulty. | Check whether the power cable is connected properly.
PSU operating status indicator (500 W/900 W/1500 W) | Steady green | The PSU is operating properly. | N/A
PSU operating status indicator (500 W/900 W/1500 W) | Blinking green (once every second) | The power supply is normal, or the input is overvoltage or undervoltage. | NOTE: Do not reseat a PSU. Check whether the external power supply to the PSU is functioning correctly.
PSU operating status indicator (500 W/900 W/1500 W) | Blinking green (four times every second) | The PSU is being upgraded online. | N/A
PSU operating status indicator (500 W/900 W/1500 W) | Steady orange | The input is normal, but no power output is supplied due to overheat protection, overcurrent protection, short circuit protection, output overvoltage protection, or some component failures. | Reseat the PSU and check whether the fault is rectified. If the fault persists, replace the PSU.
PSU operating status indicator (500 W/900 W/1500 W) | Off | The PSU has no power input or is faulty. | 1. Check whether the PSU is functioning correctly. 2. If the PSU is functioning correctly, check whether the external power supply to the PSU is functioning correctly.

Table 5-8 NIC indicators


NIC Type | Chip Model | Port | Indicator Status | Network Status | Operation
SM211 flexible NIC with two flexible GE ports; SM212 flexible NIC with four GE electrical ports | i350 | Active | Blinking yellow | Data is being transmitted on the network. | N/A
SM211; SM212 | i350 | Active | Off | No data is being transmitted on the network. | 1. Connect the network port to another switch and to another network cable to determine whether the original switch and network cable are faulty. 2. Check whether the NIC is properly operating.
SM211; SM212 | i350 | Link | Steady green | The network connection is normal. | N/A
SM211; SM212 | i350 | Link | Off | The network connection is unavailable. | 1. Connect the network port to another switch and to another network cable to determine whether the original switch and network cable are faulty. 2. Check whether the NIC is properly operating.
SM231 flexible NIC with two 10GE optical ports | 82599 | Active | Steady yellow | No data is being transmitted on the network. | N/A
SM231 | 82599 | Active | Blinking yellow | Data is being transmitted on the network. | N/A
SM231 | 82599 | Link | Steady green or blinking green | The network connection is normal. | N/A
SM231 | 82599 | Link | Off | The network connection is unavailable. | 1. Connect the network port to another switch and to another network cable to determine whether the original switch and network cable are faulty. 2. Check whether the NIC is properly operating.
SM233 flexible NIC with two 10GE electrical ports | X540 | Link Speed | Steady green | High speed (10 Gbit/s) | N/A
SM233 | X540 | Link Speed | Steady yellow | Low speed (1 Gbit/s) | 1. Connect the network port to another switch and to another network cable to determine whether the original switch and network cable are faulty. 2. Check whether the NIC is properly operating.
SM233 | X540 | Link Speed | Off | The network connection is unavailable. | 1. Connect the network port to another switch and to another network cable to determine whether the original switch and network cable are faulty. 2. Check whether the NIC is properly operating.
SM233 | X540 | Link/Active | Steady green | No data is being transmitted on the network. | N/A
SM233 | X540 | Link/Active | Blinking green | Data is being transmitted on the network. | N/A
SM233 | X540 | Link/Active | Off | The network connection is unavailable. | 1. Connect the network port to another switch and to another network cable to determine whether the original switch and network cable are faulty. 2. Check whether the NIC is properly operating.
SM251 flexible NIC with one 56G IB optical port; SM252 flexible NIC with two 56G IB optical ports | CX3 | Active | Steady green | The network connection is normal. | N/A
SM251; SM252 | CX3 | Active | Blinking green | The network connection is abnormal. | 1. Connect the network port to another switch and to another network cable to determine whether the original switch and network cable are faulty. 2. Check whether the NIC is properly operating.
SM251; SM252 | CX3 | Active | Off | The network connection is unavailable. | 1. Connect the network port to another switch and to another network cable to determine whether the original switch and network cable are faulty. 2. Check whether the NIC is properly operating.
SM251; SM252 | CX3 | Link | Steady yellow | No data is being transmitted on the network. | N/A
SM251; SM252 | CX3 | Link | Blinking yellow | Data is being transmitted on the network. | N/A
SM251; SM252 | CX3 | Link | Off | The network connection is unavailable. | 1. Connect the network port to another switch and to another network cable to determine whether the original switch and network cable are faulty. 2. Check whether the NIC is properly operating.

NOTE
For details, visit the official websites of the PCIe card vendors.

----End

Indicators Available Only on the RH5885 V2, RH5885 V3, and RH5885H V3

Table 5-9 Module indicators on the RH5885 V2, RH5885 V3, and RH5885H V3
Indicator | Status | Meaning | Diagnosis
Power indicator on a memory riser | Steady green | The memory riser is on. | N/A
Power indicator on a memory riser | Off | The memory riser is off. | N/A
Memory riser fault indicator | Steady red | A DIMM on the memory riser is faulty. | Locate the faulty DIMM according to the DIMM fault locating indicator, and replace the faulty DIMM with a spare one.
Memory riser fault indicator | Off | All DIMMs on the memory riser are normal. | N/A
DIMM fault locating indicator | Steady red | The DIMM is faulty. | After you remove the memory riser and hold down the DIMM fault locating button, the indicator of the faulty DIMM turns on.
DIMM fault locating indicator | Off | The DIMM is operating properly. | N/A
Memory riser mirroring indicator (available only on the RH5885H V3) | Steady green | Memory mirroring has been configured on the memory riser. | N/A
Memory riser mirroring indicator (available only on the RH5885H V3) | Off | Memory mirroring has not been configured on the memory riser. | N/A
Status indicator on a hot-swappable PCIe card | Steady yellow | The PCIe card is abnormal, or the server is in the power-on self-test (POST) phase. | If this indicator is steady yellow but the server is not in the POST phase, check and replace the PCIe card.
Status indicator on a hot-swappable PCIe card | Off | The PCIe card is operating properly. | N/A
Power indicator on a hot-swappable PCIe card | Steady green | The power supply to the PCIe card is normal. | N/A
Power indicator on a hot-swappable PCIe card | Blinking green | The PCIe card is powering on or off. | N/A
Power indicator on a hot-swappable PCIe card | Off | The PCIe card is off. | N/A
Diagnostic panel on the RH5885 V2 server | Steady green | A fault alarm is generated for the server component. | For details, see 2.5.1 "Components on the Front Panel" and 2.5.2 "Indicators and Buttons" in the RH5885 V2 Server (8S) V100R001C02 User Guide.
Diagnostic panel on the RH5885 V2 server | Off | The server component is operating properly. | N/A
Fault diagnosis panel on the RH5885 V3 server | Steady red | A fault alarm is generated for the server component. | For details, see 2.4 "Indicators and Buttons" in the RH5885 V3 Server V100R003 User Guide.
Fault diagnosis panel on the RH5885 V3 server | Off | The server component is operating properly. | N/A

Indicators Available Only on the RH8100 V3 and X6800

Table 5-10 Module indicators on the RH8100 and X6800


Indicator | Status | Meaning | Diagnosis
RH8100 V3 fan module indicator | Steady green | The fan module hardware or backplane is faulty or the fan module software is performing an online upgrade. (An online upgrade takes about 3 minutes.) | Check whether the fan module hardware or backplane is faulty and whether the fan module software is performing an online upgrade.
RH8100 V3 fan module indicator | Blinking green (once every 2 seconds) | The fan module is properly communicating with the iBMC. | N/A
RH8100 V3 fan module indicator | Blinking green (four times every second) | The communication between the fan module and the iBMC is abnormal. | Log in to the iBMC WebUI and check whether the iBMC is running properly. If the iBMC software is abnormal, upgrade the iBMC software or replace the high-performance fusion console (HFC). For details, see 6 Software and Firmware Upgrade. If the iBMC is normal, reseat the fan module. If the alarm persists, replace the fan module.
RH8100 V3 fan module indicator | Steady red | The fan module hardware or backplane is faulty. | Reseat the fan module. If the alarm persists, replace the fan module.
RH8100 V3 fan module indicator | Blinking red | The fan module has an alarm, or the fan module hardware or backplane is damaged. | Reseat the fan module. If the alarm persists, replace the fan module.
RH8100 V3 fan module indicator | Off | The fan module is not powered on. | N/A
Fan module operating status indicator on the X6800 | Steady green | The fan module is operating properly. | N/A
Fan module operating status indicator on the X6800 | Steady red | The fan module is faulty. | Replace the faulty fan module.
Fan module operating status indicator on the X6800 | Off | The fan module has no power supply. | Check whether the fan module is installed properly.
Memory riser button/status indicator | Steady green | The memory riser is operating properly. | N/A
Memory riser button/status indicator | Blinking green | The memory riser is in the intermediate state of hot swap. |
Memory riser button/status indicator | Blinking red (once every second) | The memory riser is not operating properly. | View the iBMC event alarm logs to check whether the memory riser is faulty.
Memory riser button/status indicator | Blinking red (five times every second) | The memory riser is not installed properly. | Check whether the memory riser is installed properly.
Memory riser button/status indicator | Off | The memory riser is off. |
Memory riser ATTN indicator | Steady yellow | The hot insertion or removal operation has failed. | Check whether services can be migrated or stopped. After services are stopped, power off and then power on the server. If the indicator is off, attempt to hot-swap the memory riser again. If hot swap fails again, replace the memory riser and DIMMs on it. If the indicator is steady yellow, replace the memory riser and DIMMs on it.
Memory riser ATTN indicator | Blinking yellow | The memory riser is waiting to cancel the hot swap operation. To cancel the operation, press the memory riser button again within 5 seconds. | N/A
Memory riser ATTN indicator | Off | The hot insertion or removal operation is normal. |
Memory riser backup indicator | Steady green | The memory riser is idle. | N/A
Memory riser backup indicator | Off | The memory riser is not idle. |
Memory riser mirroring indicator | Steady green | Memory mirroring has been configured on the memory riser. | N/A
Memory riser mirroring indicator | Off | Memory mirroring has not been configured on the memory riser. |
Compute module status indicator | Steady green | The compute module is operating properly. | N/A
Compute module status indicator | Blinking red (once every second) | The compute module is faulty. | View the iBMC event alarm logs to check whether the compute module is faulty.
Compute module status indicator | Blinking red (five times every second) | The compute module is not installed properly. | Check whether the compute module is installed properly.
Compute module status indicator | Off | The compute module is not powered on. |

Indicators Available Only on the E9000

Table 5-11 MM910 management module indicators


Indicator | Status | Meaning | Diagnosis
Power indicator (PWR) on the MM910 | Steady green | The MM910 has been powered on. | N/A
Power indicator (PWR) on the MM910 | Blinking green | The MM910 is being powered on. |
Power indicator (PWR) on the MM910 | Off | The MM910 is not powered on. | Check whether the MM910 is installed properly.
Health status indicator (HLY) on the MM910 | Steady green | All components inside the chassis are operating properly. | N/A
Health status indicator (HLY) on the MM910 | Blinking red (once every second) | A major alarm is generated for a component in the chassis. The indicators on both the active and standby MM910s are red. | Check whether the MM910 is installed properly and log in to the HMM WebUI to view alarms.
Health status indicator (HLY) on the MM910 | Blinking red (four times every second) | A critical alarm is generated for a component in the chassis, and the indicators on both the active and standby MM910s are red. |
Health status indicator (HLY) on the MM910 | Blinking red (five times every second) | The MM910 is not installed properly. |
Health status indicator (HLY) on the MM910 | Off | The MM910 is not powered on or is being powered on. | N/A
Active/standby status indicator (ACT) on the MM910 | Steady green | The MM910 is active. | N/A
Active/standby status indicator (ACT) on the MM910 | Off | The MM910 is in standby mode. |
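
The HLY rows of Table 5-11 amount to a lookup from an observed blink pattern to a chassis condition. The following Python sketch shows one way to record that lookup for onsite checklists. The function name `decode_hly` and the (color, blinks per second) key format are illustrative assumptions and are not part of any Huawei tool.

```python
# Illustrative lookup for the MM910 health (HLY) indicator, transcribed from Table 5-11.
# Keys are (color, blinks_per_second): 0 means steady, (None, None) means the indicator is off.
HLY_MEANINGS = {
    ("green", 0): "All components inside the chassis are operating properly.",
    ("red", 1): "A major alarm is generated for a component in the chassis.",
    ("red", 4): "A critical alarm is generated for a component in the chassis.",
    ("red", 5): "The MM910 is not installed properly.",
    (None, None): "The MM910 is not powered on or is being powered on.",
}


def decode_hly(color, blinks_per_second):
    """Return the Table 5-11 meaning for an observed HLY pattern, or a fallback string."""
    return HLY_MEANINGS.get(
        (color, blinks_per_second), "Pattern not listed in Table 5-11."
    )


# Example: an HLY indicator blinking red four times every second.
print(decode_hly("red", 4))
```

A pattern that is not in the table returns a fallback string instead of raising an error, so an unexpected observation can still be logged.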

Table 5-12 E9000 fan module indicators


Indicator | Status | Meaning | Diagnosis Procedure
Fan module operating status indicator on an E9000 | Blinking green (once every 2 seconds) | The fan module is operating properly. | N/A
Fan module operating status indicator on an E9000 | Blinking green (four times every second) | The communication between the fan module and the MM910 is abnormal, and the fan module has no alarm. | Check whether the fan module is faulty by inserting it into a working slot. Check whether the slot is faulty by inserting a working fan module into that slot.
Fan module operating status indicator on an E9000 | Blinking red (once every 2 seconds) | The fan module has reported an alarm. | 1. Log in to the HMM WebUI and check fan alarms. 2. Check whether the power connector of the fan module is connected properly. If it is connected properly, replace the fan module.
Fan module operating status indicator on an E9000 | Off | The fan module has no power supply. | Check whether the fan module is installed properly and whether its control circuit is functioning correctly.

Table 5-13 E9000 switch module indicators


Indicator | Status | Meaning | Diagnosis Procedure
Stack status indicator (STAT) | Steady green | A switch module that can be stacked is active in stacking mode or is not stacked, and is operating properly. A switch module that cannot be stacked is operating properly. | N/A
Stack status indicator (STAT) | Blinking green | A switch module that can be stacked is standby or slave in stacking mode and is operating properly. A switch module that cannot be stacked is being powered on. |
Stack status indicator (STAT) | Off | The switch module is not powered on. |
Health indicator (HLY) | Steady green | The switch module is operating properly. | N/A
Health indicator (HLY) | Blinking red | The switch module has a fault alarm or is not installed properly. | Log in to the HMM WebUI to view event alarms, and check whether the switch module is installed and operating properly.
Health indicator (HLY) | Off | The switch module is not powered on. | N/A
GE electrical port indicator | Steady green | The network is connected properly. | N/A
GE electrical port indicator | Blinking green | Data is being transmitted. |
GE electrical port indicator | Off | No data is being transmitted or the network is disconnected. | 1. Connect the network port to another switch, optical cable, and optical module to determine whether the switch and optical cable are functioning correctly and whether the optical module is of the correct type and speed. 2. Check the NIC status in the OS. 3. Check whether the ports on the switch and NIC are up.
Connection status indicator of a 10GE optical port; 25GE optical port connectivity status indicator | Steady green | The port is connected properly. | N/A
Connection status indicator of a 10GE optical port; 25GE optical port connectivity status indicator | Off | The port is not connected properly. | 1. Connect the network port to another switch, optical cable, and optical module to determine whether the switch and optical cable are functioning correctly and whether the optical module is of the correct type and speed. 2. Check the NIC status in the OS. 3. Check whether the ports on the switch and NIC are up.
Data transmission status indicator of a 10GE optical port; data transmission status indicator of a 25GE optical port | Blinking orange | Data is being transmitted or received over the port. | N/A
Data transmission status indicator of a 10GE optical port; data transmission status indicator of a 25GE optical port | Off | No data is being transmitted over the port. |

40GE optical port indicator | Steady green | The network is connected properly. | N/A
40GE optical port indicator | Blinking green | Data is being transmitted. |
40GE optical port indicator | Off | No data is being transmitted or the network is disconnected. | 1. Connect the network port to another switch, optical cable, and optical module to determine whether the switch and optical cable are functioning correctly and whether the optical module is of the correct type and speed. 2. Check the NIC status in the OS. 3. Check whether the ports on the switch and NIC are up.
Connection status indicator of the 8G FC optical port; data transmission status indicator for the 16G FC optical port | Steady orange | Signals are not synchronized between the port on the switch module and the port on the peer device. | Check whether the network cable is connected properly and whether the optical module and NIC are normal.
Connection status indicator of the 8G FC optical port; data transmission status indicator for the 16G FC optical port | Blinking orange (once every 2 seconds) | The port is disabled. |
Connection status indicator of the 8G FC optical port; data transmission status indicator for the 16G FC optical port | Blinking orange (twice every second) | The port is not functioning correctly. |
Connection status indicator of the 8G FC optical port; data transmission status indicator for the 16G FC optical port | Off | If the connection status indicator is off, no optical module is installed or the optical module is not receiving optical signals properly. |
Connection status indicator of an 8G FC optical port; connection status indicator for the 16G FC optical port | Steady green | The port is functioning correctly and its link is connected. | N/A
Connection status indicator of an 8G FC optical port; connection status indicator for the 16G FC optical port | Blinking green (once every 2 seconds) | The port is functioning correctly but isolated. No link is set up. | If the peer device is a switch, check whether the working mode of the CX912 matches that of the peer device. For details, see the FusionServer Pro E9000 Server V100R001 Deployment Guide. If the peer device is a storage device, check the port.
Connection status indicator of an 8G FC optical port; connection status indicator for the 16G FC optical port | Blinking green (twice every second) | The port is in inloop (diagnosis) mode. | N/A
Connection status indicator of an 8G FC optical port; connection status indicator for the 16G FC optical port | Blinking green (four times every second) | The link is connected and data is being transmitted. |
Connection status indicator of an 8G FC optical port; connection status indicator for the 16G FC optical port | Off | If the diagnosis status indicator is off, no optical module is installed or the optical module is not receiving optical signals properly. | Check whether the optical module is installed and operating properly and whether the optical cable is faulty.
Data transmission status indicator of an 8G FC optical port on the CX911 | Blinking orange (twice every second) | An overtemperature alarm is generated if the connection status indicator is blinking green. | View the iMana 200 or iBMC event alarm logs to check whether an overtemperature alarm is generated.
Data transmission status indicator of an 8G FC optical port on the CX911 | Blinking orange (more than twice every second) | Data is being transmitted or received over the port. | N/A
Data transmission status indicator of an 8G FC optical port on the CX911 | Off | No data is being transmitted over the port. |
Connection status indicator of an 8G FC optical port on the CX911 | Steady green | The link is connected properly. | N/A
Connection status indicator of an 8G FC optical port on the CX911 | Blinking green (once every second) | The CX911 is being registered, or the port is in the diagnostic state. |
Connection status indicator of an 8G FC optical port on the CX911 | Blinking green (twice every second) | The link is not connected properly or the port is not functioning correctly. An overtemperature alarm is generated if the data transmission status indicator is blinking orange twice every second. | Check the port, optical module, and optical cable.
Connection status indicator of an 8G FC optical port on the CX911 | Off | No optical module is installed or the optical module is receiving optical signals abnormally. | Check the optical module and optical cable.
InfiniBand (IB) optical port status indicator; OPA port status indicator | Steady green | The port is connected properly. | N/A
InfiniBand (IB) optical port status indicator; OPA port status indicator | Blinking green | Data is being transmitted or received over the port. |
InfiniBand (IB) optical port status indicator; OPA port status indicator | Off | The port is not connected.
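
Several "Off" rows in Table 5-13 end with "Check the NIC status in the OS" and "Check whether the ports on the switch and NIC are up". The following Python sketch shows one way to read the link state on the server side. It assumes the server runs Linux, where the kernel exposes the state in `/sys/class/net/<interface>/operstate`. The function name `link_state` and the interface name `eth0` are placeholders chosen for illustration.

```python
from pathlib import Path


def link_state(interface, sysfs_root="/sys/class/net"):
    """Return the kernel-reported operstate (for example "up" or "down").

    Returns None if the interface does not exist or the file cannot be read.
    """
    state_file = Path(sysfs_root) / interface / "operstate"
    try:
        return state_file.read_text().strip()
    except OSError:
        return None


# Example: "eth0" is a placeholder interface name.
print(link_state("eth0"))
```

A result other than "up" only shows that the OS does not see an active link. The switch port, optical module, and cable still need to be checked as described in the table.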

5.6 Handling Faults Based on Symptoms


Table 5-14 lists the minimum configuration of servers.

Table 5-14 Minimum configuration of servers

Model | Minimum Configuration | Remarks
RH1288 V3, RH2288 V3, RH2288H V3, and 5288 V3 | One CPU in the CPU1 socket; one DIMM in the DIMM000 (A) slot | None
RH8100 V3 (8P) | One CPU in the CPU1 socket; one memory board in slot 1; one DIMM in the DIMM000 slot; one HFC board in the HFC2 slot | Dual system mode (one PSU in any slot)
RH8100 V3 (dual-system primary 4P) | One CPU in the CPU1 slot; one memory board in slot 1; one DIMM in the DIMM000 slot; one HFC board in the HFC2 slot | Dual-system primary 4P (one PSU in any slot)
RH8100 V3 (dual-system secondary 4P) | One CPU in the CPU5 slot; one memory board in slot 9; one DIMM in the DIMM000 slot; one HFC board in the HFC1 slot | Dual-system secondary 4P (one PSU in any slot)
RH5885 V3 | Two CPUs in the CPU1 and CPU2 sockets; one DIMM in the DIMM000 slot | One PSU in any slot
RH5885H V3 | Two CPUs in the CPU1 and CPU2 sockets; one DIMM in the DIMM A1 slot of the first memory board | One PSU in any slot
CH121 V5, CH242 V5, CH121L V5, and CH221 V5 | One CPU in the CPU1 socket; one DIMM in the DIMM000 slot | None
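
When a server is stripped down to its minimum configuration during fault isolation, Table 5-14 can be treated as a checklist. The following Python sketch records three rows of the table and reports which required components are missing from a given set. The dictionary `MIN_CONFIG`, the component labels, and the function name are illustrative assumptions. The PSU remark ("one PSU in any slot") is not modeled because it does not name a fixed slot.

```python
# Minimum configurations transcribed from Table 5-14 (subset shown for brevity).
# The component labels are informal names chosen for this sketch.
MIN_CONFIG = {
    "RH2288 V3": {"CPU1", "DIMM000"},
    "RH5885 V3": {"CPU1", "CPU2", "DIMM000"},
    "RH8100 V3 (8P)": {"CPU1", "memory board 1", "DIMM000", "HFC2"},
}


def missing_components(model, installed):
    """Return the Table 5-14 components that are absent from the installed set."""
    return sorted(MIN_CONFIG[model] - set(installed))


# Example: an RH5885 V3 with only CPU1 and one DIMM installed.
print(missing_components("RH5885 V3", ["CPU1", "DIMM000"]))
```

An empty list means that the installed set covers the minimum configuration listed for that model.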

5.6.1 Power Failures


The terms describing server power status are defined as follows:
• Power connected: The server is connected to a power source and the power indicator is on.
• Standby: The server is connected to a power source and the power indicator is steady yellow.
• Power-on: The server is on and the power indicator is steady green.
• POST: The server is in the power-on self-test process.
Diagnose and rectify power failures depending on the symptoms.

• If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs to be rectified quickly onsite, see "Quick Recovery Method".
• For more fault symptoms and solutions, see the Intelligent Computing Case Library. The Intelligent Computing Product Case Query Assistant is available only to Huawei partners and Huawei engineers.


Fault Symptom: A PSU is faulty (the PSU has no power output and the health indicator is blinking red).

Handling Procedure:
1. Check the PSU indicator and record any alarms on the iMana 200 or iBMC WebUI. For details, see 5.5 Checking Indicators to Locate Faults.
   NOTE
   For E9000 servers, record alarms on the MM910 WebUI.
2. Check whether an "AC lost" alarm is generated.
   • If yes, check that the power cable is connected properly and that the PDU is supplying power properly.
   • If no, go to 3.
3. Replace the PSU with a spare PSU and check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 4.
4. Replace the PSU backplane or replace the mainboard if no PSU backplane is configured. Check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, contact Huawei technical support.

Quick Recovery Method:
1. Check whether the current configuration has sufficient power supplies.
   • If yes, services are not affected.
   • If no, contact Huawei technical support.
2. Replace the faulty PSU with a spare PSU. Do not install the faulty PSU into a server again.


Fault Symptom: The rack server or Atlas 800 AI server (model 3010) has no power. (All indicators are off.)

Handling Procedure:
1. Check whether the external power supply to the rack server is normal.
   • If yes, go to 2.
   • If no, resolve this issue.
2. Replace the PSU with a normal one and check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 3.
3. Replace the mainboard and PSU backplane and check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, contact Huawei technical support.

Quick Recovery Method:
Follow the handling procedure to replace any faulty modules.


Fault Symptom: The chassis where a blade server or a high-density server is located has no power.

Handling Procedure:
1. Check whether the external power supply to the chassis is normal or whether a power overload has occurred.
2. Remove all compute nodes, switch modules, management modules, and fan modules, label them with the slot numbers, and check whether their power connectors are normal.
3. Remove all PSUs, install the PSUs back one at a time in ascending order by slot number (ensure that only one PSU is installed at a time), and check whether the chassis can be connected to the power source. If the chassis cannot be connected to the power source no matter which PSU is installed, replace the chassis.
4. If the chassis cannot be connected to the power source only when a particular PSU is installed, replace that PSU.
5. After verifying that the chassis and PSUs can be connected to the power source, install only one PSU. Then install the switch modules, compute nodes, fan modules, and management modules one at a time in ascending order by slot number, and check whether each module can be connected to the power source.
6. After the fault is rectified, install the switch modules, compute nodes, fan modules, and management modules back into their original slots.

Quick Recovery Method:
Follow the handling procedure to replace any faulty modules.


Fault Symptom: The chassis of a blade server or high-density server has power but a compute node or server node does not.

Handling Procedure:
1. Remove the compute node or server node, and check whether its power connector is damaged.
   • If yes, replace the compute node or server node mainboard or replace the chassis.
   • If no, go to 2.
2. Do not install the faulty compute node or server node into a server again. Install a spare component when available.

Quick Recovery Method:
1. Remove the faulty compute node or server node. Check whether other compute nodes or server nodes work properly. (Do not install the node into a server again.)
   • If yes, services are not affected.
   • If no, contact Huawei technical support.
2. Follow the handling procedure to replace any faulty modules.
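
The handling procedure for the "A PSU is faulty" symptom in this section is a short decision sequence. The following Python sketch restates it as a function that returns the next action for the checks completed so far. The function and parameter names are illustrative assumptions. The returned strings paraphrase the procedure and are not output from any Huawei tool.

```python
def psu_next_step(ac_lost_alarm, fixed_by_spare_psu=None, fixed_by_backplane=None):
    """Return the next action in the 'A PSU is faulty' handling procedure.

    Pass None for a check that has not been performed yet.
    """
    if ac_lost_alarm:
        # Step 2: an "AC lost" alarm points to the power cable or the PDU.
        return "Check the power cable connection and the PDU power supply."
    if fixed_by_spare_psu is None:
        # Step 3 has not been tried yet.
        return "Replace the PSU with a spare PSU and check whether the fault is rectified."
    if fixed_by_spare_psu:
        return "No further action is required."
    if fixed_by_backplane is None:
        # Step 4 has not been tried yet.
        return "Replace the PSU backplane, or the mainboard if no PSU backplane is configured."
    if fixed_by_backplane:
        return "No further action is required."
    return "Contact Huawei technical support."


# Example: no "AC lost" alarm, and a spare PSU did not rectify the fault.
print(psu_next_step(False, fixed_by_spare_psu=False))
```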

5.6.2 KVM Login Faults


1. Diagnose the fault based on the fault symptoms listed in the following table.

   • If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs to be rectified quickly onsite, see "Quick Recovery Method".
   • For more fault symptoms and solutions, see the Intelligent Computing Case Library. The Intelligent Computing Product Case Query Assistant is available only to Huawei partners and Huawei engineers.
2. If the KVM connection is abnormal, you are advised to use the Independent Remote Console for login.


Fault Symptom: The KVM is inaccessible.

Handling Procedure:
1. Use a third-party tool, such as PuTTY, to run the telnet IP address:8208 command to check whether the KVM port is normal. The default port number is 8208. To obtain the actual port number, log in to the iMana 200 or iBMC WebUI, choose Configuration > Services, and check the VMM parameter. If Telnet access is unavailable, use a PC to directly connect to iMana 200 or iBMC for troubleshooting.
2. Clear the browser and Java caches and close all browsers. Then log in to iMana 200 or iBMC again.
3. Adjust the Java security level to medium or lower, or add the KVM address to the Java exception site list.
4. Check the OS and browser versions on the client. Firefox 23.0 is recommended. For details about the operating environment requirements, see the iMana 200 or iBMC help document.

Quick Recovery Method:
1. Follow the handling procedure to replace any faulty modules.
2. Restart iMana 200/iBMC and replace the local PC.
3. Connect the management network port to the local PC directly instead of through a switching network.

Fault Symptom: The KVM displays an error message.

Handling Procedure:
• If the number of login users exceeds the maximum value, use the iBMC WebUI or CLI to check whether other users are using the KVM. If other users are using the KVM, restart iMana 200 or iBMC to force the users to log out.
• If the KVM displays a message indicating an unauthorized user, clear the browser and Java caches and close all browsers. Then log in to iMana 200/iBMC again.
• If the input signal is out of range, check whether the OS resolution exceeds the maximum value of 1280 x 1024.


Fault Symptom: Login to the KVM is successful, but the KVM functions abnormally.

Handling Procedure:
• If the keyboard and mouse cannot be used but services are operating properly, reset the USB and check whether the problem is solved.
  – If yes, no further action is required.
  – If no, restart the service system, clear the CMOS, and upgrade iMana 200/iBMC and BIOS.
• If an ISO file fails to be mounted to the virtual DVD-ROM drive, attempt to log in to the virtual DVD-ROM drive port over Telnet to check whether the port is normal, attempt to mount the ISO file by using FusionServer Tools Toolkit or Smart Provisioning to check whether the ISO file is correct, and upgrade the HMM, iMana 200, iBMC, and BIOS versions.
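
The port check in the first handling step for an inaccessible KVM can also be done without a Telnet client. The following Python sketch attempts a plain TCP connection to the management address. The function name `port_is_open` is an illustrative assumption, 192.0.2.10 is a documentation placeholder address, and 8208 is the default port number given in this section. Use the port number shown on the WebUI if it has been changed.

```python
import socket


def port_is_open(host, port=8208, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Example: 192.0.2.10 is a placeholder for the iMana 200 or iBMC management IP address.
print(port_is_open("192.0.2.10", 8208, timeout=1.0))
```

A result of False does not distinguish between a closed port and a network path problem. Connecting a PC directly to the management port, as described in the quick recovery method, helps separate the two.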

5.6.3 POST Faults


Diagnose and rectify power-on self-test (POST) faults depending on the symptoms.

• If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs to be rectified quickly onsite, see "Quick Recovery Method".
• For more fault symptoms and solutions, see the Intelligent Computing Case Library. The Intelligent Computing Product Case Query Assistant is available only to Huawei partners and Huawei engineers.


Fault Symptom: The server fails to enter the standby mode after it powers on. (The power indicator is blinking yellow for over 5 minutes.)

Handling Procedure:
1. View serial port logs to determine whether the iMana 200 or iBMC has been repeatedly reset. If the iMana 200 or iBMC has been repeatedly reset, the logs repeatedly record the following information:

     ### JFFS2 load complete: 1107083 bytes loaded to 0x8b000000
     ## Booting kernel from Legacy Image at 8a000000 ...
     Image Name: linux-2.6.34
     Image Type: ARM Linux Kernel Image (uncompressed)
     Data Size: 1511292 Bytes = 1.4 MiB
     Load Address: 86008000
     Entry Point: 86008000
     Verifying Checksum ... OK
     ## Loading init Ramdisk from Legacy Image at 8b000000 ...
     Image Name: Ramdisk Image
     Image Type: ARM Linux RAMDisk Image (uncompressed)
     Data Size: 1107019 Bytes = 1.1 MiB
     Load Address: 00000000
     Entry Point: 00000000
     Verifying Checksum ... OK
     Loading Kernel Image ... OK
     OK

     Starting kernel ...

   NOTE
   • The CH140 and CH140 V3 compute nodes of the E9000 do not provide any serial ports. Directly ping the IP address of the iMana 200 or iBMC. If the ping tests occasionally or always fail, use the quick recovery method. If the problem persists, contact Huawei technical support.
   • During the iMana 200 or iBMC startup process, the serial port on a server is used by the iMana 200 or iBMC by default. After the startup is complete, the serial port is switched to the system serial port.
2. Contact Huawei technical support to query a case or replace the mainboard.

Quick Recovery Method:
For a rack server or Atlas 800 AI server (model 3010), perform the following operations:
1. Power off the server, remove and reinstall the power cables, power on the server, and check whether the iMana 200 or iBMC is functioning correctly.
   • If yes, upgrade iMana 200 or iBMC by using software of its current version or a later version.
   • If no, check the iMana 200 or iBMC version. If the version is 1.91 or later, go to 2; otherwise, go to 3.
2. With the power cables removed, add a jumper cap to the Clear_BMC_PW pin on the mainboard to attempt to restore the default settings of the iMana 200 or iBMC. Then reconnect the power cables.
3. Replace the mainboard or BMC board.
For an E9000 server, perform the following operations:
1. Remove and reinstall the compute node and check whether the iMana 200 or iBMC is functioning correctly.
   • If yes, upgrade the iMana 200 or iBMC by using software of its current version or a later version.
   • If no, check the iMana 200 or iBMC version. If the version is 1.91 or later, go to 2; otherwise, go to 3.
2. With the compute node removed, add a jumper cap to the Clear_BMC_PW pin on the mainboard to attempt to restore the default settings of iMana 200 or iBMC. Then reinstall the compute node.
3. Replace the mainboard or BMC board.

Fault Symptom: A server in standby mode cannot power on. (The power indicator is steady yellow.)

Handling Procedure:
1. Collect iMana 200 or iBMC logs, and query the complex programmable logic device (CPLD) register to determine whether the power supply link to the mainboard has failed.
2. Check whether the mainboard (with integrated CPUs) and DIMMs are installed properly.

Quick Recovery Method:
1. Remove the external PCIe devices such as NICs and FC HBAs. Then check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 2.
2. Retain only the minimum server configuration (a single CPU, a single mainboard, and a single DIMM). Then check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 3.
3. Check whether the CPUs, mainboard, and memory modules are faulty, and replace the faulty components.

Fault Symptom: A server powers off immediately when powered on.

Handling Procedure:
1. Collect iMana 200 or iBMC logs, and query the CPLD register to determine whether the power supply link to the mainboard has failed.
   NOTE
   For an E9000 server, you are advised to use the MM910 for one-click log collection.
2. Check the power supply unit (PSU) backplane and the mainboard.

Quick Recovery Method:
1. Check all external power supplies, including the PDUs, PSUs, and power cables. Replace any faulty components and check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 2.
2. Replace the mainboard or PSU backplane.


Fault Symptom: The message "no signal" is displayed immediately after the server powers on.

Handling Procedure:
1. Collect iMana 200 or iBMC logs, and query the CPLD register to determine whether the power supply link to the mainboard has failed.
   NOTE
   For an E9000 server, you are advised to use the MM910 for one-click log collection.
2. Set the BIOS debug print level by using the iMana 200 or iBMC CLI, restart the server, and save system serial port logs. When the fault recurs, collect iMana 200 or iBMC logs and download the .bin file of the BIOS.

Fault Symptom: The server repeatedly powers on and then powers off.

Handling Procedure:
1. Enable the video recording function on the iMana 200 or iBMC WebUI.
2. Set the BIOS debug print level by using the iMana 200 or iBMC CLI, restart the server, and save system serial port logs. When the fault recurs, collect iMana 200 or iBMC logs and download the .bin file of the BIOS.
3. Restore the default BIOS settings, and check whether the server operates properly.
   • If yes, modify the BIOS parameters on the OS side based on actual requirements.
   • If no, collect iMana 200 or iBMC logs and download the .bin file of the BIOS. For details, see the iBMC User Guide of the corresponding version.
   NOTE
   For an E9000 server, you are advised to use the MM910 for one-click log collection.

Quick Recovery Method (shared by the two symptoms above):
1. Run the ipmcset -d clearcmos command to clear the CMOS. Then check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 2.
   NOTICE
   Running the ipmcset -d clearcmos command will restore the BIOS defaults. Exercise caution when running this command.
2. Upgrade the iMana 200 or iBMC, and the BIOS. Then check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 3.
3. Remove the external devices, including the PCIe cards and HBAs. Then check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 4.
4. Retain only the minimum server configuration (a single CPU, a single mainboard, and a single DIMM). Then check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 5.
5. Check whether the CPUs, mainboard, and memory modules are faulty, and replace the faulty components.


Fault Symptom: The POST stops responding at a screen.

Handling Procedure:
1. Capture the current screen.
2. Collect iMana 200 or iBMC logs, and query the CPLD register to determine whether the power supply link to the mainboard has failed.
3. Set the BIOS debug print level by using the iMana 200 or iBMC CLI.
4. Enable the video recording function on the iMana 200 or iBMC WebUI, restart the server, and save system serial port logs. When the fault recurs, collect iMana 200 or iBMC logs and download the .bin file of the BIOS.
5. Check the external USB devices, CPUs, drives, DIMMs, and PCIe devices.


Fault Symptom: RAID self-check is suspended.

Handling Procedure:
1. Capture the current iMana 200/iBMC KVM or local KVM screen.
2. Collect iMana 200 or iBMC logs.

Quick Recovery Method:
1. If a RAID controller card firmware error exists, replace the RAID controller card, supercapacitor, or BBU. Then check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 2.
2. Check whether the drives, drive backplane, and SAS cables are faulty.
   • If yes, replace faulty components.
   • If no, go to 3.
3. If the RAID array is offline, import it again. Then check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, go to 4.
4. If the BBU or supercapacitor runs out of power, follow the instructions shown in the displayed messages to keep the server running. After the server runs for 30 minutes, check the BBU or supercapacitor status. If the BBU or supercapacitor is abnormal, replace it.

Fault Symptom: NIC Preboot Execution Environment (PXE) has failed.

Handling Procedure:
1. Check whether the NIC supports PXE.
2. Check the BIOS PXE configuration. Ensure that the NIC PXE function and NIC UMC function are enabled. To enable the NIC PXE function, press Ctrl+S.
3. Check the NIC.
4. Check the PXE network environment on the service side.

Quick Recovery Method:
Follow the handling procedure.

5.6.4 Memory Errors


Diagnose and rectify memory errors depending on the symptoms.


• If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs to be rectified quickly onsite, see "Quick Recovery Method".
• For more fault symptoms and solutions, see the Intelligent Computing Case Library. The Intelligent Computing Product Case Query Assistant is available only to Huawei partners and Huawei engineers.

Fault symptom: The memory capacity detected by the system is less than the configured memory capacity.

Handling procedure:
1. Check whether the DIMMs are compatible with the server by using the Intelligent Computing Compatibility Checker.
   • If yes, go to 2.
   • If no, replace the DIMM with a compatible model specified by the Intelligent Computing Compatibility Checker.
2. Check whether memory mirroring has been enabled in the BIOS.
   • If yes, the memory capacity is reduced by 50% due to the memory mirroring function. You can disable the function in the BIOS. If the problem persists, go to 3.
   • If no, go to 3.
3. Check whether the DIMM installation positions meet configuration rules.
   • If yes, go to 4.
   • If no, reinstall the DIMMs in correct slots according to the configuration rules.
4. Check whether a "DIMM configuration error" alarm is generated by iBMC.
   • If yes, replace the faulty DIMM. For details, see 5.3 Handling Alarms.
   • If no, go to 5.
5. Check whether any DIMM slots are abnormal. If a DIMM slot is abnormal, replace the mainboard.

Quick recovery method:
1. If the iBMC generates the "DIMMxxx Configuration Error" alarm, replace the related DIMM.
2. If the DIMM status displayed in iBMC or the OS is abnormal (unidentified or faulty), replace the faulty DIMMs.
3. If memory mirroring or memory rank sparing is configured in the BIOS, the total available memory capacity is less than the configured physical memory capacity.
4. If the DIMMs do not comply with the DIMM installation rules, use Huawei Server Product Memory Configuration Assistant to reinstall the DIMMs.
5. If DIMM installation slots are faulty, replace the mainboard.
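
The memory mirroring check in the handling procedure comes down to simple arithmetic. The following is a minimal sketch, assuming only the 50% reduction stated for memory mirroring; the function names, the tolerance, and the example sizes are illustrative and do not come from any Huawei tool. Memory rank sparing is not modeled because no figure is given for it.

```python
def expected_usable_gib(installed_gib: float, mirroring_enabled: bool) -> float:
    """Return the capacity the system should detect for the installed DIMMs.

    Assumption: memory mirroring halves the usable capacity (the 50% figure
    from the handling procedure); no other BIOS feature is modeled.
    """
    if installed_gib < 0:
        raise ValueError("installed capacity cannot be negative")
    return installed_gib / 2 if mirroring_enabled else installed_gib


def capacity_is_explained(installed_gib: float, detected_gib: float,
                          mirroring_enabled: bool,
                          tolerance_gib: float = 1.0) -> bool:
    """True if the detected capacity matches what mirroring alone would give."""
    expected = expected_usable_gib(installed_gib, mirroring_enabled)
    return abs(detected_gib - expected) <= tolerance_gib


if __name__ == "__main__":
    # Hypothetical example: 256 GiB installed, mirroring enabled in the BIOS.
    print(expected_usable_gib(256, mirroring_enabled=True))
```

If `capacity_is_explained` returns False, the shortfall is not accounted for by mirroring, and the remaining steps (installation positions, alarms, DIMM slots) still apply.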


Fault symptom: An uncorrectable DIMM error is generated.

Handling procedure:
1. Install the faulty DIMM on a different channel, and use a test tool to check whether the DIMM is causing the error.
   • If the error is caused by the DIMM, replace the DIMM.
   • If the error is caused by the DIMM slot, check the DIMM connector. If the connector is damaged, replace the mainboard or memory board.
2. Remove the CPU connected to the faulty DIMM channel, and check whether the CPU socket pins are damaged.
   • If yes, replace the mainboard.
   • If no, go to 3.
3. Replace the CPU connected to the faulty DIMM channel.

Quick recovery method:
1. Swap the positions of the DIMM you suspect to be faulty and a DIMM that is functioning correctly. Then, determine whether the fault is caused by the DIMM or the DIMM slot.
   • If the fault is caused by the DIMM you suspect to be faulty, replace the DIMM.
   • If the fault is caused by the DIMM slot, swap the CPU with a normal one. If the problem is caused by the CPU, replace the CPU. Otherwise, replace the mainboard or memory board.
2. If the preceding steps do not reproduce the fault, use FusionServer Tools Toolkit or Smart Provisioning to perform memory pressure tests. If the fault is reproduced, perform step 1 again. Otherwise, contact Huawei technical support.

5.6.5 Drive I/O Faults


Diagnose and rectify drive I/O faults depending on the symptoms.

• If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs to be rectified quickly onsite, see "Quick Recovery Method".
• For more fault symptoms and solutions, see the Intelligent Computing Case Library. The Intelligent Computing Product Case Query Assistant is available only to Huawei partners and Huawei engineers.


Fault symptom: A "Disk Fault" alarm is reported to iMana 200 or iBMC.

Handling procedure:
1. If the drive is in a RAID array and the RAID array is not functioning correctly, troubleshoot the RAID array.
2. If the server has stopped, use Smart Provisioning to inspect the server hardware. If the server is operating, replace the drive.
3. If the fault persists, insert the new drive into the slot that you suspect to be faulty to check whether that slot is faulty.
   NOTE
   For RAID controller cards that support out-of-band management, if a hard drive is in the Unconfigured Good (Foreign) state, an iBMC alarm will be generated but the fault indicator will be off.

Quick recovery method:
1. If the faulty drive is not in a RAID array (except drives in passthrough mode), the drive cannot be used and needs to be replaced. It is recommended that you configure RAID for all drives and then deploy the redundant services.
2. Back up the data of redundant RAID arrays to avoid data loss.
3. Follow the handling procedure to replace any faulty modules.

Fault symptom: A RAID controller card fails to identify one or more drives.

Handling procedure:
1. Power off the server, swap the drive that cannot be identified with a normal drive, and power on the server to check whether the drive is faulty.
   • If the fault is caused by the drive, replace the drive.
   • If the fault is caused by the drive slot, check whether SAS cables are connected properly to all SAS ports on the drive backplane. For details, see the server user guide.
   • If the fault persists, go to 2.
2. Replace the RAID controller card first, the SAS cables second, and the drive backplane third.

Quick recovery method:
1. If the redundant RAID array fails or no RAID array is configured, the related drive partitions are unavailable.
2. Move the unidentified drives or all drives in the RAID array to a standby server. Ensure that you retain their order during this process and attempt to back up data.
3. Follow the handling procedure to replace any faulty modules.


Fault symptom: A RAID controller card cannot identify any drives.

Handling procedure:
1. Check whether the active indicators on the drives are on. If they are off, ensure that both the power cable and drive are installed properly.
2. If the fault persists, check that the SAS cables and signal cables are connected properly. For details, see "Internal Cabling" in the user guide.
3. If the fault persists, replace the RAID controller card first, the SAS cables second, and the drive backplane third.

Quick recovery method:
Follow the handling procedure to replace any faulty modules without changing the drive installation positions.

Note: If a fault occurs on the RH2288A V2 server, check whether the cable connecting the
mainboard to the power adapter board is connected properly. Figure 5-3 shows the cable
connection.

Figure 5-3 Cable connection

5.6.6 Ethernet Controller Faults


Diagnose and rectify Ethernet controller faults depending on the symptoms.

• If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs to be rectified quickly onsite, see "Quick Recovery Method".
• For more fault symptoms and solutions, see the Intelligent Computing Case Library. The Intelligent Computing Product Case Query Assistant is available only to Huawei partners and Huawei engineers.


Fault symptom: A network port is invisible.

Diagnosis procedure:
1. Ensure that the NIC type, NIC driver, OS, BIOS version, and iMana 200 or iBMC version on the server or compute node are compatible.
   • If the OS compatibility is not specified by the Intelligent Computing Compatibility Checker, contact the technical support team of the OS vendor to resolve the problem.
     NOTE
     You are advised to use compatible OSs specified by the Intelligent Computing Compatibility Checker.
   • If the NIC driver version is incompatible, upgrade the driver before continuing.
2. To check whether the PCI device of the NIC is visible, run the lspci | grep -i eth command in Linux (or equivalent in other operating systems) and observe the response.
   • If yes, go to 4.
   • If no, go to 3.
3. If the PCI device is invisible, perform the following steps:
   a. Check the logical topology of the NIC. If the NIC PCI bus does not have a CPU, screw-in PCI cards connected to the bus are invisible.
   b. Power the iMana 200 or iBMC off and then on. Check whether the fault persists.
   c. Insert the NIC you suspect to be faulty into another slot, and a normal NIC into the slot you suspect to be faulty. Then check which of these causes the fault.
4. If the PCI device is visible but its network port is invisible, the driver cannot be loaded. To rectify the fault, perform the following steps:
   a. Run the ifconfig ethN up command in Linux (or equivalent in other operating systems) to check whether the information in the network port configuration file is consistent with the actual physical network ports and whether the network ports are up.
   b. If the driver fails to install when running the compilation script, check whether GNU C Compiler (GCC) and C/C++ Compiler and Tools have been correctly installed.
   c. Check the optical module type. If an Intel NIC and a non-Intel optical module are configured, the driver cannot be loaded and the network port is invisible.
   d. Reinstall the driver. Check that no errors are reported during the driver installation and check whether system logs record any failures when loading the driver.
5. Collect OS logs.

Quick recovery method:
1. If a visible NIC port becomes invisible when the server is running, and services can be interrupted, power the server off and on. If the fault persists, go to 2.
2. Insert the NIC into another PCIe slot and check whether the fault is rectified.
   • If the NIC is causing the fault, replace the NIC.
   • If the PCIe slot is causing the fault, replace the mainboard.
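
The branching between the PCI visibility check and the network port check can be sketched as a small helper. This is an illustrative sketch only: it assumes the lspci text output has already been captured, that interface names follow the ethN pattern used in this document, and the function names and the sample lines are hypothetical.

```python
from typing import List


def ethernet_pci_devices(lspci_output: str) -> List[str]:
    """Return the lspci lines that describe Ethernet controllers."""
    return [line.strip() for line in lspci_output.splitlines()
            if "ethernet" in line.lower()]


def next_step(lspci_output: str, interface_names: List[str]) -> int:
    """Map the two observations to a step number of the diagnosis procedure.

    3: no Ethernet PCI device is listed (PCI device invisible)
    4: a PCI device is listed, but no ethN interface exists
    5: both are visible, so only log collection remains
    """
    if not ethernet_pci_devices(lspci_output):
        return 3
    if not any(name.startswith("eth") for name in interface_names):
        return 4
    return 5


if __name__ == "__main__":
    # Illustrative sample in the usual lspci line layout, not real server data.
    sample = ("00:1f.2 SATA controller: Example SATA device\n"
              "03:00.0 Ethernet controller: Example 10-Gigabit adapter\n")
    print(next_step(sample, ["lo"]))
```

On a Linux host, the interface list could be taken from the entries under /sys/class/net. Systems that use other naming schemes would need a different prefix test.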


Fault symptom: A communication error occurs on a network port.

Diagnosis procedure:
1. Check whether the network cable is connected properly to the network port.
2. Ensure that the NIC type, NIC driver, OS, BIOS version, and iMana 200 or iBMC version meet the compatibility requirements of the server or compute node. If the NIC driver is incompatible, upgrade the driver before continuing.
3. To check whether the network ports are up, run the ifconfig ethN up command in Linux (the command may vary in different OSs). To check whether IP addresses are set for the required network ports, run the ethtool ethN command.
4. Run the ethtool -p ethN command in Linux (the command may vary in other OSs) to check whether the information in the network port configuration file of the rack server or Atlas 800 AI server (model 3010) is consistent with the actual physical network ports, and check whether the network port status indicators are on and whether the network ports on the switch are up.
   NOTE
   The ethtool -p ethN command applies only to plug-in PCIe cards.
5. Check whether the network ports on the compute node and switch module are up. For details, see E9000 Blade Server Mezzanine Module-Switch Module Interface Mapping Tool.
6. Check the settings of IP addresses, gateway addresses, VLANs, bondings, and uplink switch network ports.
7. Collect OS logs.

Quick recovery method:
1. Use the ping command to check whether the server or other servers on the network have network faults.
   • If the fault occurs on more than one server, check whether the external switching network is normal.
   • If the fault occurs only on one server, go to 2.
2. Check the indicator to see the NIC port status. If the indicator is off, swap the optical module, optical cable, and uplink switch port of the faulty NIC port with those of a normal NIC port to determine whether any of these components is faulty. Then replace the faulty components.
3. If the NIC is causing the fault, restart the server when interruption will not affect services, and check whether the communication is normal. If the fault persists, power the server off and on. If the fault still persists, replace the NIC.


Fault symptom: A packet error or packet loss occurs on a network port.

Diagnosis procedure:
1. Ensure that the NIC type, NIC driver, OS, BIOS version, and iMana 200 or iBMC version meet the compatibility requirements of the server or compute node. If the NIC driver is incompatible, upgrade the driver before continuing.
2. Check whether the number of packet losses and errors on the network port keeps increasing. If there is no continuous increase, ignore this error.
3. Insert the NIC that you suspect to be faulty into another slot, and insert a normal NIC into the slot that you suspect to be faulty. Then, check which of these is causing the fault.
4. Connect the suspicious network cable to a normal server, connect a normal network cable to the suspicious server, and check whether the fault is caused by the suspicious network cable.
5. Switch the service traffic from the network port that you suspect to be faulty to a different network port. Then, check whether the fault is caused by the network port.
6. To check parameters regarding the packet error or loss, run the ethtool -S ethN command in Linux (or similar in other operating systems).
7. Collect OS logs.

Quick recovery method:
1. Check whether the packet loss occurs only on a single server. Run the ethtool -S ethN command to check the packet loss type and run the top command to check the system resource usage (software interrupts, CPU usage, and memory usage) and NIC traffic.
2. When you have the customer's permission to interrupt services, connect a PC to the port and check for packet loss. Connect the PC to other working ports, and check optical modules, optical cables, and uplink switches. Then, replace or adjust components based on the actual situation.
3. If the NIC is causing the fault, restart the server when interruption will not affect services, and check whether the communication is normal. If the fault persists, power the server off and on. If the fault still persists, replace the NIC.
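
Deciding whether loss and error counts keep increasing requires comparing two samples of the ethtool -S ethN output taken some time apart. The following is a rough sketch, assuming the usual "name: value" line layout of that output. Counter names differ between NIC drivers, so the keyword filter and the sample counters below are illustrative assumptions.

```python
import re
from typing import Dict, Iterable

# Matches lines such as "     rx_errors: 7"; header lines without a number are skipped.
_COUNTER_LINE = re.compile(r"^\s*([^:]+):\s*(\d+)\s*$")


def parse_ethtool_stats(text: str) -> Dict[str, int]:
    """Parse captured `ethtool -S ethN` output into {counter_name: value}."""
    stats = {}
    for line in text.splitlines():
        match = _COUNTER_LINE.match(line)
        if match:
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


def increasing_error_counters(before: Dict[str, int], after: Dict[str, int],
                              keywords: Iterable[str] = ("err", "drop", "miss")
                              ) -> Dict[str, int]:
    """Return {counter: delta} for error-like counters that grew between samples."""
    keywords = tuple(keywords)
    deltas = {}
    for name, new_value in after.items():
        if any(key in name.lower() for key in keywords):
            delta = new_value - before.get(name, 0)
            if delta > 0:
                deltas[name] = delta
    return deltas


if __name__ == "__main__":
    # Illustrative samples only; real counter names depend on the driver.
    first = "NIC statistics:\n     rx_packets: 1000\n     rx_errors: 2\n     rx_dropped: 0\n"
    second = "NIC statistics:\n     rx_packets: 5000\n     rx_errors: 7\n     rx_dropped: 0\n"
    print(increasing_error_counters(parse_ethtool_stats(first),
                                    parse_ethtool_stats(second)))
```

An empty result between several consecutive samples corresponds to the "no continuous increase" case, in which the error can be ignored.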


Fault symptom: The performance of a network port does not meet requirements.

Diagnosis procedure:
1. Ensure that the NIC type, NIC driver, OS, BIOS version, and iMana 200 or iBMC version meet the compatibility requirements of the server or compute node. If the NIC driver is incompatible, upgrade the driver before continuing.
2. Check whether the physical network port meets performance requirements.
3. Check whether the binding between the network port interrupt and CPU queue has been modified.
4. To check whether the TSO and GSO settings of the network port have been modified, run the ethtool -k ethN command in Linux (or equivalent in other operating systems).
5. To check whether the network port buffer information has been modified, run the ethtool -g ethN command in Linux (or equivalent in other operating systems).
6. Collect OS logs.
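
Checking whether TSO and GSO "have been modified" implies comparing the current ethtool -k ethN output against a known baseline. The sketch below assumes the output has been captured as text and that the two features appear under the names tcp-segmentation-offload and generic-segmentation-offload. The baseline values are a hypothetical record of the intended configuration, not something this document defines.

```python
from typing import Dict, Optional, Tuple

FEATURES = ("tcp-segmentation-offload", "generic-segmentation-offload")


def offload_settings(ethtool_k_output: str) -> Dict[str, str]:
    """Extract the TSO and GSO states ("on" or "off") from `ethtool -k` output."""
    settings = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name in FEATURES:
            words = value.split()  # drops suffixes such as "[fixed]"
            settings[name] = words[0] if words else ""
    return settings


def changed_from_baseline(current: Dict[str, str], baseline: Dict[str, str]
                          ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {feature: (baseline_state, current_state)} for every mismatch."""
    return {name: (expected, current.get(name))
            for name, expected in baseline.items()
            if current.get(name) != expected}


if __name__ == "__main__":
    # Illustrative output fragment; a real listing contains many more features.
    sample = ("Features for eth0:\n"
              "tcp-segmentation-offload: on\n"
              "generic-segmentation-offload: off\n")
    baseline = {name: "on" for name in FEATURES}  # hypothetical intended state
    print(changed_from_baseline(offload_settings(sample), baseline))
```

The same comparison pattern could be applied to the ring buffer values reported by ethtool -g ethN, which are not parsed here.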

5.6.7 FC Controller Faults


Common FC Controller Faults and Handling Procedures
Diagnose and rectify FC controller faults according to the symptoms.

For more fault symptoms and solutions, see the Intelligent Computing Case Library. The Intelligent
Computing Product Case Query Assistant is available only to Huawei partners and Huawei engineers.


Fault symptom: The storage device fails to identify the host World Wide Port Name (WWPN).

Handling procedure:
1. Connect to the switch and run the brocade: switchshow command to query port connection status.
2. If the switch fails to obtain the host WWPN, the host bus adapter (HBA) cannot register with the switch. In this case, do as follows:
   a. Check that the HBA and the processor connected to the PCIe bus are installed properly.
   b. (Optional) Check the mapping between the HBAs and switch modules for E9000 and E6000 servers.
   c. Check FC links between the HBA and the switch by checking the optical cable connections and the optical module power. If E9000 servers are used, check the HBA work mode.
   d. Ensure that the lpfc driver and firmware matching the E9000 are installed.
   e. If multiple switches are connected, check whether the switch connection mode (AG or TR) is correct.
   f. Collect the OS message logs and check lpfc driver information for faults.
   g. Collect log information of the switches.
3. If the HBA has registered with the switch and the switch has obtained the host WWPN, but the storage device cannot identify the host WWPN, rectify the fault as follows:
   a. Check the FC links (optical cables and modules) between the switch and the storage device.
   b. Check whether the HBA and the storage ports are in the same zone.
   c. Check whether the zone configurations are the same for switches from the same vendor.
   d. Collect the OS message logs and check lpfc driver information for faults.
   e. Collect the log information of switches.

Fault symptom: The storage device has identified the HBA WWPN, but LUNs cannot be mapped to the host.

Handling procedure:
1. Check whether the lpfc driver and firmware matching the E9000 have been installed.
2. Collect the OS message logs and check lpfc driver information for faults.
3. Collect log information of the switches.
4. If no faults are identified, faults may exist on the storage device or OS SCSI application layer. Contact the OS or storage device vendor.


Fault symptom: Some multipath links of LUNs are down.

Handling procedure:
1. Ensure that the installed lpfc driver and firmware match the E9000.
2. Check for error codes on FC links between the HBA and the storage device.
3. Collect the OS message log and check lpfc and multipath driver information for faults.
4. Collect log information of the switches.
5. Contact the OS multipath driver vendor or storage device vendor.

Fault symptom: Poor data read/write performance of LUNs

Handling procedure:
1. Check whether the installed lpfc driver and firmware match the E9000.
2. Check for error codes on FC links between the HBA and the storage device.
3. Run the iostat command on the host to query the I/O delay and concurrent I/O operations.
4. Collect the OS message log and check the lpfc driver information and the I/O queue depth configured for the HBA driver.
5. Perform drive performance tests (read and write 100 GB and 100 MB files).
6. Contact storage analysis engineers.
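
Where no dedicated test tool is at hand, a read/write test of the kind mentioned in the procedure can be approximated with a short script. The following is a rough sketch under stated assumptions: the function name, chunk size, and temporary file handling are illustrative, the target directory is assumed to sit on the LUN under test, and the read figure can be inflated by the OS page cache because the file is read back immediately after being written.

```python
import os
import tempfile
import time
from typing import Tuple


def write_read_throughput(directory: str, size_bytes: int,
                          chunk_bytes: int = 1024 * 1024) -> Tuple[float, float]:
    """Write and read back a temporary file; return (write_MBps, read_MBps)."""
    chunk = os.urandom(chunk_bytes)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            remaining = size_bytes
            while remaining > 0:
                piece = chunk[:min(chunk_bytes, remaining)]
                handle.write(piece)
                remaining -= len(piece)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the data reaches the device
        write_seconds = max(time.perf_counter() - start, 1e-9)

        start = time.perf_counter()
        total = 0
        with open(path, "rb") as handle:
            while True:
                data = handle.read(chunk_bytes)
                if not data:
                    break
                total += len(data)
        read_seconds = max(time.perf_counter() - start, 1e-9)
        if total != size_bytes:
            raise IOError("read back %d bytes, expected %d" % (total, size_bytes))
    finally:
        os.remove(path)
    megabytes = size_bytes / 1e6
    return megabytes / write_seconds, megabytes / read_seconds


if __name__ == "__main__":
    # 100 MB case from the procedure; the 100 GB case needs matching free space.
    print(write_read_throughput(tempfile.gettempdir(), 100 * 1000 * 1000))
```

Running it once with a small file and once with a large file gives two figures to compare. A large gap between them is one data point to pass on to the storage analysis engineers.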

Quick Recovery from FC Controller Faults


Table 5-15 describes the common quick recovery methods and handling procedures of FC
controller faults.


Table 5-15 Quick recovery methods and handling procedures of FC controller faults
Fault symptom: All HBA links are disconnected.

Quick recovery method:
1. Check the link redundancy status.
   • If the links are redundant, reset the switch module ports connected to the faulty HBAs, and go to 2.
   • If the links are not redundant, go to 3.
2. Check whether the ports connected to the faulty HBAs are functioning correctly.
   • If yes, check whether the fault is rectified.
   • If no, migrate all services, and safely power off the server. Next, remove and reinstall the compute node, and power on the server. If the fault persists, apply for spare HBAs to replace the faulty ones.
3. Before contacting Huawei technical support, it is recommended that you migrate services and collect switch module logs, OS logs, LLD networking information, and device time differences.

Fault symptom: Storage services are affected but HBA links are normal.

Quick recovery method:
1. Migrate all services, and safely power off the server. Next, remove and reinstall the compute node, and power on the server. Then, check whether the fault is rectified.
   • If yes, no further action is required.
   • If no, contact the storage vendor for quick fault recovery.
2. Before contacting Huawei technical support, it is recommended that you migrate services and collect switch module logs, OS logs, LLD networking information, and device time differences.

Fault symptom: Storage LUN performance issues

Quick recovery method:
1. Check for FC link error codes on the FC switch module. If error codes exist, run the porterrshow command and determine the cause of the fault based on the port mapping relationships.
   • If any links between the switch modules and the external switches are faulty, remove and reconnect the optical cables and modules. If a link is still faulty and spare components are available, replace any related optical cables and modules and try again.
   • If a link between an HBA and switch module is faulty, move the compute node to a working slot to check whether the fault is caused by the HBA, switch module, or backplane. Replace any faulty modules as required.
2. Clear the error code count history, observe the error codes for 10 minutes, test the performance, and contact the storage vendor for quick fault recovery.


5.6.8 Switch Module Faults


Switch Module Quick Recovery Method
Rectify switch module faults depending on the symptoms.

For more fault symptoms and solutions, see the Intelligent Computing Case Library. The Intelligent
Computing Product Case Query Assistant is available only to Huawei partners and Huawei engineers.

Fault symptom: A switch module fails to be started. After logging in to the switch module over SOL, the SOL screen displays the following: Can not get config file from smm. Begin reboot ....

Quick recovery method:
1. Switch between active and standby MM910s and check whether the switch module can start normally.
   • If yes, no further action is required.
   • If no, go to 2.
2. Restart the baseboard management controller (BMC) of the switch module and check whether the switch module can be started properly.
   • If yes, no further action is required.
   • If no, go to 3.
3. Upgrade the switch module software to the latest version. For details, see the "Upgrading Software by Using U-Boot" section in the "Common Operations" chapter of the E9000 Server V100R001 Upgrade Guide.

Fault symptom: A switch module fails to start. After logging in to the switch module over SOL, the SOL screen displays the following: Ensure that the optical fibers or cables are inserted on the same ports on the panel after the board replacement. During system startup, do not power off or remove the board. To continue the startup, press Y:.

Quick recovery method:
1. If services are running, connect the network cable or the optical cable to the switch module and press Y to continue.
2. If no services are running, press Y to continue.

Fault symptom: After logging in to a switch module over SOL, the SOL screen shows Critical Error! and only the meth port can be displayed by running display interface.

Quick recovery method:
Upgrade the switch module software to a specified version or the latest version depending on the displayed message.


Fault symptom: A network storm occurs (the Multicast and Broadcast counters of a port encounter a fault).

Quick recovery method:
Perform one of the following operations:
• Run the following commands to disable the port with abnormal traffic:
  [~HUAWEI]interface 10ge 1/17/1
  [~HUAWEI-10ge 1/17/1]shutdown
• Disconnect the optical cable or network cable from the port that has abnormal traffic.

Fault symptom: A port is Up but no traffic passes through the port.

Quick recovery method:
1. In the interface view, run the following commands to restart the port, and then check whether the fault is rectified:
   [~HUAWEI]interface 10ge 1/17/1
   [~HUAWEI-10ge 1/17/1]restart
   • If yes, no further action is required.
   • If no, go to 2.
2. Run the reboot command to restart the switch module.

Fault symptom: Incorrect packets are generated (running the display interface command shows that the value of Total Error in the Input area is not zero and keeps increasing).

Quick recovery method:
Run the display interface command and check CRC and Symbols.
1. If the values of CRC and Symbols are not zero, perform the following operations:
   • Ensure that the optical cables are connected properly to the faulty switch module and the device it is directly connected to.
   • Check whether any optical cables are damaged.
   • Check whether the optical modules of the faulty switch module and the device it is directly connected to are working properly.
   • If there is a transmission device between the switch module and its connected device, check the transmission device gateway for alarms.
2. If the values of CRC and Symbols are zero, run the reboot command to restart the switch module.

5.6.9 OS Faults
OS Installation Faults
Diagnose and rectify faults related to OS installation depending on the symptoms.


For more fault symptoms and solutions, see the Intelligent Computing Case Library. The Intelligent
Computing Product Case Query Assistant is available only to Huawei partners and Huawei engineers.

Possible cause: Incompatible OS

Diagnosis procedure:
Use the Intelligent Computing Compatibility Checker to determine whether the OS is compatible with the server.

Possible cause: Incorrect installation method

Diagnosis procedure:
Use the Intelligent Computing Compatibility Checker to check compatible OSs on the server and installation description. For details, see the Huawei Server OS Installation Guide.

Possible cause: ServiceCD issue

Diagnosis procedure:
1. Use the Intelligent Computing Compatibility Checker to determine whether the OS installation requires the ServiceCD.
2. Ensure that the ServiceCD version is correct.
3. Check whether the installation method selected using the ServiceCD is correct.

Possible cause: Installation process issue

Diagnosis procedure:
1. Check whether the OS installation procedure is correct. For details, see the Huawei Server OS Installation Guide.
2. Check whether the OS installation requires a physical DVD drive or other media.
3. Check whether the OS installation requires a special installation DVD, for example, one integrated with drivers.
4. Check whether the OS installation DVD is an original from the manufacturer or whether it has been modified by a third party.
5. Disconnect any external storage devices.
6. Ensure that the default BIOS settings are used.
7. Ask the OS vendor for installation support.


Possible cause: Drive identification issue

Diagnosis procedure:
1. Ensure that the target drive is identified by the RAID controller, and use the Intelligent Computing Compatibility Checker to check whether the target drive is compatible with the server. Then check the BIOS to see whether the target storage devices, including SATADOMs, microSD cards, and built-in USB flash drives, are identified.
2. Check the RAID controller card model and determine whether to configure RAID (LSI SAS1078, LSI SAS2108, LSI SAS2208, LSI SAS3008, LSI SAS2308, LSI SAS3108, Avago SAS 3408, Avago SAS 3416iMR, Avago SAS 3416IT, Avago SAS 3508, Software RAID).
   NOTE
   The V5 server or Atlas 800 AI server (model 3010) supports OS installation on the drive that is managed by the standard RAID controller card.
3. Check the RAID array properties to ensure that the boot drive and the target drive are the same or in the same RAID array.
4. Set the BIOS mode to UEFI if the drive capacity is over 2 TB.
   NOTE
   V1 and V3 servers do not support UEFI mode.
5. Check whether the drive is a 4K drive.
6. Check whether the loaded RAID controller card driver is correct.
7. Format the drive or reconfigure the RAID array.

OS Faults
If you have confirmed that faults are not caused by other factors, diagnose them as follows:

Fault symptom: The server is suspended or restarted.

Diagnosis method: Disable C state, P state, T state, and ASPM in the BIOS and ensure that the server functions correctly.
Conclusion: The OS version does not support CPUs of the current platform.

Diagnosis method: Check whether the Kdump information contains crashed process names or board vendor names. For example, FC_XX indicates an FC device breakdown.
Conclusion: The built-in OS drivers are incompatible.


Diagnosis method: Check whether it is a PCIe card compatibility issue.
Conclusion: The PCIe card is incompatible.
• There is a power supply issue. (A cat err alarm is generated on iMana 200 or iBMC.)
• The PCIe protocol is not supported.
• There is a driver issue.

Diagnosis method: Check whether the breakdown screenshot contains CPUidle.
Conclusion: The OS kernel is incompatible with the hardware platform.
NOTE
The G2500 server does not currently support this method.

Diagnosis method: Use the iMana 200 or iBMC to locate the fault. For example, determine whether the alarm was reported for the DIMM, drive, or mainboard component.
Conclusion: Circuit hardware is faulty.

Diagnosis method: Check whether the system logs contain read-only file system records, and use FusionServer Tools Toolkit or Smart Provisioning to rate the drive. Decide whether to replace the drive based on the result.
Conclusion: A drive fault occurred.

Diagnosis method: Check whether an imana cat err alarm is displayed on iMana 200. Use the fdm log of iMana 200 to locate the fault.
Conclusion: Hardware is faulty.

Diagnosis method: Check whether there is a Machine Check Exception issue. Locate such a fault by checking the /var/log/mce.log and error codes of serial port Kdump information.
Conclusion:
• The hardware is faulty.
• The software or hardware interface setting is incorrect.


Diagnosis method: Collect the following information:
• For new servers, confirm the proportion of abnormal servers and check whether normal and abnormal servers have the same configurations.
• For existing servers, confirm the number of servers that are not functioning correctly, and check whether the issues occur under specific circumstances.
• Check iMana 200 or iBMC for hardware alarms.
After collecting the preceding information, use FusionServer Tools Toolkit or Smart Provisioning to check whether the issue occurs on a single server or multiple servers.
Conclusion: Locate the fault based on the report.

Diagnosis method: Check whether a breakdown occurs under specific circumstances after software upgrades have been performed for customer service software, database, middleware, kernel, BIOS, management modules, iMana 200 or iBMC, or storage devices.
Conclusion:
• The new software version has bugs.
• Original interfaces are disabled for security purposes causing issues.

Diagnosis method: Check whether the Kdump information of the breakdown screenshot periodically displays update_cpu_power, divide_error, or timer_xx.
Conclusion: The OS has bugs or kernel defects.
NOTE
The G2500 server does not currently support this method.

Diagnosis method: Check whether the Kdump information of the breakdown screenshot non-periodically displays gethostbyname.
NOTE
The G2500 server does not currently support this method.

Issue 16 (2019-11-15) Copyright © Huawei Technologies Co., Ltd. 94


Huawei Servers
Troubleshooting 5 Diagnosing and Rectifying Faults

Fault Symptom Diagnosis Method Conclusion

Check whether the breakdown The OS kernel is


screenshot contains CPUidle. incompatible with the
NOTE hardware platform.
The G2500 server does not
currently support this method.


6 Software and Firmware Upgrade

Table 6-1 lists the software and firmware to be upgraded and reference documents of TaiShan
servers.

Table 6-1 Upgradeable software and firmware and reference documents


Server Series | Upgradable Software and Firmware
E9000 | MM910: software, complex programmable logic device (CPLD), fan module firmware, and online help; chassis intelligent display: software; compute node: iMana 200/iBMC, BIOS, and CPLD; switch module: iBMC, CPLD, daughter card CPLD, and switching plane; mezzanine card: firmware
Rack server | iMana 200/iBMC, BIOS, and LCD
X6800 | BIOS, HMM, and iBMC
X6000 | BIOS and iMana 200/iBMC
X8000 | BIOS and iMana 200
FusionServer G5500 | BIOS, HMM, and iBMC
Atlas 800 AI server (model 3010) | iBMC, BIOS, and LCD

Reference for the E9000, rack servers, X6800, X6000, and X8000:
● For details, see the upgrade guide. To obtain the upgrade guide, perform the following steps:
  1. Log in to the Support > Intelligent Servers page.
  2. Choose a server model to access the product page.
  3. On the Documentation tab page, choose Installation & Upgrade > Upgrade Guide.
  4. View the required upgrade guide.
● To obtain the upgrade package, perform the following steps:
  1. Log in to the Support > Intelligent Servers page.
  2. Choose a server model to access the product page.
  3. Click the Software Download tab.
  4. Select the latest patch version.
  5. Download the required upgrade package.

Reference for the FusionServer G5500 and Atlas 800 AI server (model 3010):
● For details, see the upgrade guide. To obtain the upgrade guide, perform the following steps:
  1. Log in to the Support > AI Computing Platform page.
  2. Choose a server model to access the product page.
  3. On the Documentation tab page, choose Installation & Upgrade > Upgrade Guide.
  4. View the required upgrade guide.
● To obtain the upgrade package, perform the following steps:
  1. Log in to the Support > AI Computing Platform page.
  2. Choose a server model to access the product page.
  3. Click the Software Download tab.
  4. Select the latest patch version.
  5. Download the required upgrade package.


7 Preventive Maintenance

About This Chapter


Preventive maintenance quickly detects, diagnoses, and rectifies server faults.
Obtain authorization from the customer before performing preventive maintenance on Huawei
servers.

NOTICE

Take protective measures to prevent ESD damage and any other damage to servers during
preventive maintenance.

7.1 Inspecting the Equipment Room Environment and Cable Layout


7.2 Inspecting Servers
7.3 Huawei Server Inspection Report

7.1 Inspecting the Equipment Room Environment and Cable Layout

7.1.1 Precautions
Familiarize yourself with the security icons listed in Table 7-1 before preventive maintenance
to reduce the chance of injury to yourself or damage to the equipment. These security icons
will be on some server components.


Table 7-1 Security icons

Icon Description

Indicates that removing the cover of this component can result in an electric shock. To prevent an electric shock, do not remove the cover of the component.
Warning: All components with this icon have electric shock risks, and there are no serviceable parts inside these components.

Indicates a hazard. Operation of the component may cause an electric shock. There are no serviceable parts inside the component, so do not remove its cover.
Warning: To prevent an electric shock, do not remove the cover of the component.

Indicates that this component operates at a high temperature and touching it can result in burns. To prevent burns, do not touch the component until it cools down.

Indicates a hazard. Misoperations can damage the device or cause personal injury.

Indicates that this device can cause personal injury or can fail to operate properly if it is not externally grounded. Each end of a ground cable should be connected to a different device, and the devices must be connected to ground points.

Indicates that this device can cause personal injury or can fail to operate properly if it is not internally grounded. Each end of a ground cable should be connected to different device components, and the device must be connected to a ground point.

Indicates an ESD-sensitive area, in which devices can easily be damaged. To prevent damage, do not touch devices with bare hands when operating in this area, and take ESD measures, such as wearing an ESD wrist strap or ESD gloves.

7.1.2 Inspecting the Equipment Room Environment


The environmental factors to inspect in an equipment room include the temperature, relative
humidity, altitude, and power supply conditions.

For details, see 7.3 Huawei Server Inspection Report.

7.1.3 Inspecting Cable Layout


Visually inspect the cable layout. Obtain the customer's permission before removing or
inserting cables.

To prevent any damage to the cables, take the following precautions before inspecting the
cable layout:

● Check that power cables meet the following requirements:
  – The connector surface of each three-wire power ground cable is in good condition.
  – All power cable types are correct.
  – The insulation layer of each power cable is in good condition.
● Keep cables slack and away from heat sources.
● Do not use excessive force to install or remove a cable.
● Install or remove a cable by holding its connectors.
● Do not twist or tear cables.
● Lay out and connect cables properly, and ensure that they are not in contact with any components that are removable or replaceable.
For details, see 7.3 Huawei Server Inspection Report.

7.2 Inspecting Servers

7.2.1 Precautions
● Obtain the customer's consent before inspecting servers. Do not modify server configuration or power on/power off servers before obtaining written consent from the customer.
● Before inspecting servers, obtain the iMana 200 or iBMC IP address, MM910 IP address, and password of the root user for each server to be inspected. After inspecting servers, advise the customer to change the password of the root user as soon as possible.

7.2.2 Inspecting Indicators


The front and rear panels of Huawei servers have various indicators and buttons, such as the
UID button/indicator, health indicator, network port status indicators, fan module indicator,
and power button/indicator. You can observe the indicators on a server to determine the server
status. For details about the indicator status and handling measures, see 5.5 Checking
Indicators to Locate Faults.

Indicators on the Front Panel


Check the following indicators on the server front panel:
● Health indicator
● Power button/indicator
● Drive indicators
● NIC status indicator (on the front NICs or on the MM610 or MM620)

Indicators on the Rear Panel


Check the following indicators on the server rear panel:
● Power indicator
● Network port/Optical port status indicators
● Fan module indicators
● E9000 switch module indicators
● E9000 management module indicators

For details, see 7.3 Huawei Server Inspection Report.

7.2.3 Using SmartKit to Perform Health Inspection


Use SmartKit to inspect server health status. SmartKit provides the following functions:

● Supports inspection for rack servers, high-density servers, blade servers, KunLun servers, and Atlas servers, and allows users to export inspection reports.
● Supports inspection for mainstream OSs including SLES, RHEL, CentOS, VMware, Ubuntu, and Windows, and allows users to export inspection reports.
● Supports batch log collection for BMC and blade server management modules, and supports SLES, RHEL, and CentOS mainstream versions.
● Supports batch upgrade for BMC, BIOS, CPLD, and Smart Provisioning firmware of rack servers, high-density servers, blade servers, KunLun servers, and Atlas servers.
● Supports firmware bundle upgrade by using the E9000 active management module.
● Supports batch configuration for PSUs, BIOSs, BMCs, and RAID controller cards of rack servers, high-density servers, blade servers, KunLun servers, and Atlas servers.
● Supports batch configuration for E9000 management modules.

Inspection and log collection do not modify data, collect service data, or affect services, and will delete
the collection scripts and files when finished.

For details about the supported server models and detailed inspection operations, see the
FusionServer Tools 2.0 SmartKit User Guide.

7.2.4 Checking the System Status Through iBMC

Prerequisites
You can log in to the iBMC WebUI.

Procedure
Step 1 Log in to the iBMC WebUI. For details, see 8.9 Logging In to the iBMC WebUI.

Step 2 View system alarms and events.


1. On the menu bar of the iBMC WebUI, choose Alarm & SEL.
2. In the navigation tree, choose Current Alarms to view current alarms.
3. In the navigation tree, choose System Events to view system events.

Step 3 View the status of hardware, including drives, DIMMs, and sensors.
1. On the menu bar of the iBMC WebUI, choose Information.
2. In the navigation tree, choose System Info. On the right panel, click the Storage tab and
view hardware status information.


3. In the navigation tree, choose Real-Time Monitoring to view the CPU usage, memory
usage, and air intake vent temperature.

– The RH5885 V3, RH5885H V3, and RH8100 V3 do not support display of the CPU usage and
memory usage.
– After iBMA 2.0 is installed and started on the server OS, the CPU usage is obtained from the
iBMA 2.0 and the CPU usage data is the same as the data collected on the OS.
– If iBMA 2.0 is not installed on the server OS or iBMA 2.0 has not completely started, the CPU
usage data is obtained from the Intel Management Engine (ME). The CPU usage is the average
compute usage per second of all CPU cores calculated by the CPU internal module.
– If iBMA 2.0 is not installed on the server OS, obtain the latest iBMA user guide and software
package, and install iBMA 2.0 by referring to the user guide.
4. In the navigation tree, choose Sensor Info to view the status of sensors.

----End

7.3 Huawei Server Inspection Report


Inspection Information

Customer Information
Customer Name:
Equipment Room Address:          Equipment Room Name:
Equipment Room Director:         Phone Number:

Inspecting Party Information
Time of Inspection:
Inspected By:                    Phone Number:
Huawei Contact:                  Phone Number:

Service Hotline
Enterprise China Region: 4008229999
Enterprise global technical assistance center (TAC): Global Service Hotline
China region carrier TAC: 400830218 (customer service)/800830218/02986360000
Huawei engineers and partners: 8008303118/02981770177
Global carrier TAC: 02981770999

Inspecting the Equipment Room Environment


Equipment Room Environment Inspection Results

No. | Item | Criteria | Result
1 | Operating temperature | 10°C to 35°C (50°F to 95°F) | □ Normal □ Abnormal Brief description:
2 | Storage temperature | –40°C to +65°C (–40°F to +149°F) | □ Normal □ Abnormal Brief description:
3 | Temperature change rate | 20°C/h (36°F/h) | □ Normal □ Abnormal Brief description:
4 | Operating humidity | 8% to 90% RH (non-condensing) | □ Normal □ Abnormal Brief description:
5 | Storage humidity | 5% to 95% RH (non-condensing) | □ Normal □ Abnormal Brief description:
6 | Operating altitude | ≤ 3050 m | □ Normal □ Abnormal Brief description:
7 | Power supply | AC input: 100 V to 240 V AC at 50 or 60 Hz. DC input: –57.6 V to –38.4 V DC (nominal voltage –48 V DC), 192 V to 288 V DC (nominal voltage 240 V DC), or 260 V to 400 V DC (nominal voltage 380 V DC) | □ Normal □ Abnormal Brief description:
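
The operating criteria in the table above can be checked mechanically when readings are logged during an inspection. The following is a minimal sketch, not part of any Huawei tool: the `RoomReading` class and `check_room` function are hypothetical names, and the 20°C/h value is assumed to be an upper limit on the rate of change. The storage criteria are left out because they apply to servers that are not in operation.

```python
from dataclasses import dataclass

# Limits copied from the operating rows of the inspection table.
OPERATING_TEMP_C = (10.0, 35.0)
OPERATING_HUMIDITY_RH = (8.0, 90.0)
MAX_TEMP_CHANGE_C_PER_H = 20.0  # assumed to be an upper limit
MAX_ALTITUDE_M = 3050.0


@dataclass
class RoomReading:
    """Hypothetical container for values measured in the equipment room."""
    temp_c: float
    humidity_rh: float
    temp_change_c_per_h: float
    altitude_m: float


def check_room(reading: RoomReading) -> dict:
    """Return 'Normal' or 'Abnormal' for each operating criterion."""
    def verdict(ok: bool) -> str:
        return "Normal" if ok else "Abnormal"

    return {
        "Operating temperature": verdict(
            OPERATING_TEMP_C[0] <= reading.temp_c <= OPERATING_TEMP_C[1]),
        "Temperature change rate": verdict(
            abs(reading.temp_change_c_per_h) <= MAX_TEMP_CHANGE_C_PER_H),
        "Operating humidity": verdict(
            OPERATING_HUMIDITY_RH[0] <= reading.humidity_rh
            <= OPERATING_HUMIDITY_RH[1]),
        "Operating altitude": verdict(reading.altitude_m <= MAX_ALTITUDE_M),
    }


if __name__ == "__main__":
    # Example values only; replace them with measured readings.
    sample = RoomReading(temp_c=24.0, humidity_rh=45.0,
                         temp_change_c_per_h=3.0, altitude_m=120.0)
    for item, result in check_room(sample).items():
        print(f"{item}: {result}")
```

The returned labels match the Normal/Abnormal check boxes of the report, so the output can be copied into the Result column directly.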

Inspecting Cable Layout


Cable Layout Inspection Results

No. | Item | Criteria | Result
1 | General cable layout | Route the service cables and power cables along the two sides of the cabinet respectively. | □ Normal □ Abnormal Brief description:
2 | Power cable layout | Power cables are not tangled and are arranged in an orderly fashion. Power cables are arranged in the same way as those in any existing cabinets. No power cables are coiled. | □ Normal □ Abnormal Brief description:
3 | Service cable layout | Service cables are not tangled and are arranged in an orderly fashion. Service cables are arranged in the same way as those in any existing cabinets. | □ Normal □ Abnormal Brief description:
4 | Optical cable layout | Optical cables are not coiled too tightly, bent at acute angles, or stretched. | □ Normal □ Abnormal Brief description:
5 | Ground cable connection | Ground cables are connected properly. | □ Normal □ Abnormal Brief description:
6 | Cable labels | Cable labels are properly attached. The information on the labels is legible, correct, and easy to understand. | □ Normal □ Abnormal Brief description:
7 | Power cable connection | Power cables are connected to power sockets properly. | □ Normal □ Abnormal Brief description:
8 | Signal cable connector | Signal cables and data cables are connected to devices such as servers and switches properly. | □ Normal □ Abnormal Brief description:

Inspecting Servers
View the inspection report generated by SmartKit to check server health status. An item has
passed the inspection if the value of Result for the item is OK in the report.

Server Inspection Results

No. | Item | Criteria | Result
1 | iMana 200/iBMC information | Server health status logs contain no alarm information. | □ Normal □ Abnormal Brief description:
2 | Management module information | Server health status logs contain no alarm information. | □ Normal □ Abnormal Brief description:

Inspection Conclusions and Suggestions


Huawei's preventive maintenance engineers will perform a comprehensive inspection of your
Huawei servers to quickly detect any potential problems. These engineers will then submit a
detailed inspection report, and suggestions, to help improve your service availability.
If you receive inspection results, please provide your comments and suggestions in the
following Customer's Inspection Comments and Suggestions table:

Inspection Conclusions and Suggestions

Inspected By:          Phone Number:          Date:

Customer's Inspection Comments and Suggestions

Inspected By:          Phone Number:          Date:


8 Common Operations

8.1 Obtaining a Product SN


8.2 Using iMana 200 to Collect Information in Batches
8.3 Using iBMC to Collect Information in Batches
8.4 Using the MM910 WebUI to Collect Information in Batches (for Versions Earlier Than
U54 2.20)
8.5 Using the MM910 WebUI to Collect Information in Batches (for U54 2.20 or Later)
8.6 Using the FusionDirector WebUI to Collect Information in Batches
8.7 Using the MM510 CLI to Collect Information (FusionServer G5500)
8.8 Logging In to the iMana 200 WebUI
8.9 Logging In to the iBMC WebUI
8.10 Logging In to the Web Tools of the MX510
8.11 Logging In to the MM910 WebUI
8.12 Logging In to the FusionDirector WebUI
8.13 Logging In to the MM510 CLI
8.14 Logging In to the RMC CLI
8.15 Logging In to a Server Over a Network Port by Using PuTTY
8.16 Logging In to a Server Over a Serial Port by Using PuTTY
8.17 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the
SOL Function of the MM910
8.18 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the
SOL Function of the MM920/MM921
8.19 Using WinSCP to Transfer Files
8.20 Configuring an FTP Server
8.21 Using SFTP to Transfer Files


8.1 Obtaining a Product SN


Overview
A serial number (SN) or equipment serial number (ESN) uniquely identifies a server and is
required when you request technical support from Huawei.

Check the first two digits of the product SN before reading the following information.
● If the first two digits of the product SN are 02 or 03, see Figure 8-1.

A product SN starts with SN or ESN. The following is an example.

Figure 8-1 SN example

No. | Description
1 | SN ID (two characters).
2 | Material identification code (four characters).
3 | Vendor code (two characters). The value 10 indicates Huawei, and other values indicate outsourcing vendors.
4 | Year and month (two characters). The first character indicates the year: the digits 1 to 9 indicate 2001 to 2009, the letters A to H indicate 2010 to 2017, the letters J to N indicate 2018 to 2022, and the letters P to Y indicate 2023 to 2032. The second character indicates the month: the digits 1 to 9 indicate January to September, and the letters A to C indicate October to December. NOTE: Years from 2010 onward are represented by uppercase letters excluding I, O, and Z, because these three letters resemble the digits 1, 0, and 2.
5 | Sequence number (six characters).
6 | RoHS compliance (one character). Y indicates environmentally friendly processing.
7 | Internal model, that is, product name.

● If the first two digits are 21, see Figure 8-2.

A product SN starts with SN or ESN. The following is an example.

Figure 8-2 ESN example

No. | Description
1 | SN ID (two characters), which is 21.
2 | Material identification code (eight digits), that is, processing code.
3 | Vendor code (two characters). The value 10 indicates Huawei, and other values indicate outsourcing vendors.
4 | Year and month (two characters). The first character indicates the year: the digits 1 to 9 indicate 2001 to 2009, the letters A to H indicate 2010 to 2017, the letters J to N indicate 2018 to 2022, and the letters P to Y indicate 2023 to 2032. The second character indicates the month: the digits 1 to 9 indicate January to September, and the letters A to C indicate October to December. NOTE: Years from 2010 onward are represented by uppercase letters excluding I, O, and Z, because these three letters resemble the digits 1, 0, and 2.
5 | Sequence number (six characters).
6 | RoHS compliance (one character). Y indicates environmentally friendly processing.
7 | Internal model, that is, product name.
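
The field layout and the year and month encoding described above can be decoded with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the sample serial numbers are invented for the example, and it is an assumption that fields 1 to 5 appear back to back at the start of the SN string. Fields 6 and 7 are not parsed.

```python
# Year letters skip I, O, and Z: A = 2010 ... Y = 2032.
YEAR_LETTERS = "ABCDEFGHJKLMNPQRSTUVWXY"
# Month characters: 1 to 9 = January to September, A to C = October to December.
MONTH_CHARS = "123456789ABC"


def decode_year_month(code: str) -> tuple:
    """Decode the two-character year and month field into (year, month)."""
    if len(code) != 2:
        raise ValueError("year and month field must be two characters")
    year_char, month_char = code[0].upper(), code[1].upper()
    if year_char in "123456789":
        year = 2000 + int(year_char)
    elif year_char in YEAR_LETTERS:
        year = 2010 + YEAR_LETTERS.index(year_char)
    else:
        raise ValueError(f"invalid year character: {year_char!r}")
    if month_char not in MONTH_CHARS:
        raise ValueError(f"invalid month character: {month_char!r}")
    return year, MONTH_CHARS.index(month_char) + 1


def split_sn(sn: str) -> dict:
    """Split an SN into fields 1 to 5 (assumed to be contiguous)."""
    if sn.startswith("21"):
        material_len = 8   # eight-digit processing code
    elif sn.startswith(("02", "03")):
        material_len = 4   # four-character material code
    else:
        raise ValueError("SN must start with 02, 03, or 21")
    widths = [("sn_id", 2), ("material", material_len), ("vendor", 2),
              ("year_month", 2), ("sequence", 6)]
    if len(sn) < sum(width for _, width in widths):
        raise ValueError("SN is too short")
    fields, pos = {}, 0
    for name, width in widths:
        fields[name] = sn[pos:pos + width]
        pos += width
    return fields


if __name__ == "__main__":
    # Invented sample values, not real serial numbers.
    for sample in ("2102310ABC10K6000001", "02123410E9000042"):
        fields = split_sn(sample)
        print(fields, decode_year_month(fields["year_month"]))
```

Keeping the year letters in one string makes the skipped letters explicit and avoids separate range arithmetic for A to H, J to N, and P to Y.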

Obtaining a Product SN
Use one of the following methods to obtain a product SN:

● Use SmartKit.
Use the server inspection function of SmartKit to obtain ESNs in batches. For details
about the product SN, see "Asset Inspection Information" > "Board SN" in the inspection
report.
● View the product label.
A product label is attached to each Huawei server. You can view the product label to
obtain its ESN. The product label position varies with the Huawei server model. For
details, see the user guide of a specific server.
– Figure 8-3 shows the product SN of a rack server.

Figure 8-3 Product SN of a rack server

– Figure 8-4 shows the product SN of an Atlas 800 AI server (model 3010).

Figure 8-4 Product SN of an Atlas 800 AI server (model 3010)

– Figure 8-5 shows the product SN of an X6800. In Figure 8-5, (1) is the product
label of the server, and (2) is the product label of a server node.

Figure 8-5 Product SN of an X6800


– Figure 8-6 shows the product SN of an E9000. In Figure 8-6, (1) is the product
label of the server, and (2) is the product label of a compute node.

Figure 8-6 Product SN of an E9000

The product labels of switch modules and MM910s are on their ejector levers.
● Use the iMana 200 WebUI.

iMana 200 applies to the following products:
  ● Rack servers: RH1288 V2, RH2265 V2, RH2268 V2, RH2285 V2, RH2285H V2, RH2288 V2, RH2288E V2, RH2288H V2, RH2485 V2, RH2488 V2, RH5885 V2, RH5885 V3, and RH5885H V3
  ● X6000 server node: XH310 V2, XH311 V2, XH320 V2, XH321 V2, and XH621 V2
  ● X8000 server node: DH310 V2, DH320 V2, DH321 V2, DH620 V2, DH621 V2, DH626 V2, and DH628 V2
  ● E9000 compute node: CH121, CH140, CH220, CH221, CH222, CH240, CH242, and CH242 V3

a. Log in to the iMana 200 WebUI. For details, see 8.8 Logging In to the iMana 200
WebUI.


b. On the Overview page, view the product SN of the server. See Figure 8-7.

Figure 8-7 Viewing the product SN of the server

● Use the iBMC WebUI.

iBMC applies to the following products:
  ● Rack server: RH1288 V2, RH2288A V2, 5288 V3, RH1288 V3, RH2288 V3, RH2288H V3, RH5885 V3, RH5885H V3, RH8100 V3, 1288H V5, 2288 C V5, 2288H V5, 2488 V5, 2488H V5, 5288 V5, and 5885H V5
  ● Atlas 800 AI server (model 3010)
  ● X6000 server node: XH310 V3, XH321 V3, XH321 V5, and XH321L V5
  ● X6800 server node: XH620 V3, XH622 V3, XH628 V3, and XH628 V5
  ● E9000 compute node: CH121 V3, CH121L V3, CH140 V3, CH140L V3, CH220 V3, CH222 V3, CH225 V3, CH226 V3, CH121 V5, CH121L V5, CH221 V5, CH225 V5, and CH242 V5
  ● Kunlun server: 9008 V5

a. Log in to the iBMC WebUI. For details, see 8.9 Logging In to the iBMC WebUI.
b. Choose Information > Information Summary/Overview/Summary. (The menu
varies depending on software versions.) View the product SN of the server. See
Figure 8-8.

Figure 8-8 Viewing the product SN of the server

● Use the MM910 WebUI.

This method applies only to E9000 servers whose MM910 version is (U54) 2.20 or later.
a. Log in to the MM910 WebUI. For details, see 8.11 Logging In to the MM910
WebUI.
b. Choose Chassis Information > Manufacturing Information and view the product
SN of the server. See Figure 8-9.

Figure 8-9 Viewing the product SN of the server

c. Choose Chassis Information > Compute Node Slot Number > Manufacturing
Information and view the SN of the compute node, as shown in Figure 8-10.

Figure 8-10 Viewing the product SN of the compute node

● Use the FusionDirector WebUI.

  ● This method applies only to E9000 servers whose management module is the MM920/MM921.
  ● Before the operations, add the MM920/MM921 to FusionDirector.
a. Log in to the FusionDirector WebUI. For details, see 8.12 Logging In to the
FusionDirector WebUI.

b. Choose Menu > Compute > Hardware > Chassis.


c. On the displayed chassis list, click a chassis name to access the chassis details page.
d. Click the Overview tab to view the chassis SN, as shown in Figure 8-11.

Figure 8-11 Product SN

e. Click the Device tab and click Server, Management Module, and Switch Module
respectively to view the SNs of the compute node, management module, and switch
module, as shown in Figure 8-12.

Figure 8-12 Product SN

8.2 Using iMana 200 to Collect Information in Batches


This method applies only to servers and blades. To collect logs of switch modules in batches,
use the MM910 WebUI.

Procedure
Step 1 Use PuTTY to log in to the server. For details, see 8.15 Logging In to a Server Over a
Network Port by Using PuTTY or 8.17 Logging In to a Compute Node, Passthrough
Module, or Switch Module by Using the SOL Function of the MM910.
Step 2 On the iMana 200 CLI, run the imtool command (for versions earlier than 7.01) or the
ipmcset -t maintenance -d imtool command (for 7.01 and later versions). Information
similar to the following is displayed:
root@BMC:/#ipmcset -t maintenance -d imtool
tar: removing leading '/' from member names
Tar result information success.
iMana:/->


If the following information is displayed, log collection is successful.


tar: removing leading '/' from member names
Tar result information success.

Step 3 Use a cross-platform file transfer tool to connect to the iMana 200 IP address.

In this document, WinSCP is used as the cross-platform file transfer tool. For details, see 8.19
Using WinSCP to Transfer Files.

Step 4 Download the tar.gz package in the /tmp directory on iMana 200 to a directory on the local
PC. See Figure 8-13.

Figure 8-13 WinSCP window

----End
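
When the collection in Step 2 is scripted, the choice between the two commands and the success check can be expressed as follows. This is a hedged sketch: the helper names are hypothetical, and it assumes that the iMana 200 version is a plain "major.minor" string such as 6.96 or 7.01. The command strings and the success message are taken from the procedure above.

```python
SUCCESS_MARKER = "Tar result information success."


def imana_collect_command(version: str) -> str:
    """Return the log collection command for the given iMana 200 version."""
    # Assumes a numeric "major.minor" version string, for example "7.01".
    major, minor = (int(part) for part in version.split("."))
    if (major, minor) < (7, 1):
        return "imtool"
    return "ipmcset -t maintenance -d imtool"


def collection_succeeded(cli_output: str) -> bool:
    """Check the CLI output for the success message shown in Step 2."""
    return SUCCESS_MARKER in cli_output


if __name__ == "__main__":
    print(imana_collect_command("6.96"))
    print(imana_collect_command("7.01"))
```

The sketch only builds and checks strings. How the command is sent to the iMana 200 CLI (for example, over an SSH session) is left to the tool in use.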

8.3 Using iBMC to Collect Information in Batches


Scenarios

Table 8-1 One-click information collection by the iBMC for each server
Server Series: One-Click Information Collection (Description)

E9000: Compute node information. For details about one-click information collection on the E9000, see section "Information Collection" in the MM910 Management Module User Guide.
E6000: N/A
Rack server or Atlas 800 AI server (model 3010): Server information
X6000, X8000, and X6800: Compute node information
FusionServer G5500: Server information, MM510 management module information, and heterogeneous node information

Procedure
Step 1 Log in to the iBMC WebUI. For details, see 8.9 Logging In to the iBMC WebUI.

Step 2 Choose Information > Overview > Shortcuts > One-Click Info Collection, as shown in
Figure 8-14.

Figure 8-14 One-click information collection

Step 3 Click One-Click Info Collection.


When information collection is complete, a file named dump_info.tar.gz is generated.
Step 4 Click the file name and download the file to the local PC as prompted.

----End
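
Before sending the downloaded package to technical support, it can be useful to confirm that the archive opens and to see what it contains. The following minimal sketch uses only the Python standard library. The function name is hypothetical, and the file is assumed to be in the current directory of the local PC under the name dump_info.tar.gz given in Step 3.

```python
import sys
import tarfile


def list_archive(path: str) -> list:
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    # Default to the file name generated by one-click collection.
    archive_path = sys.argv[1] if len(sys.argv) > 1 else "dump_info.tar.gz"
    try:
        for name in list_archive(archive_path):
            print(name)
    except (OSError, tarfile.TarError) as err:
        print(f"Cannot read {archive_path}: {err}")
```

The same function works for the one_touch_info_all.tar.gz file produced by the MM910 WebUI, because it only relies on the tar.gz format.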

8.4 Using the MM910 WebUI to Collect Information in Batches (for Versions Earlier Than U54 2.20)
Operation Scenario
For versions earlier than (U54) 2.20, use the MM910 WebUI to collect logs in batches.

Procedure
Step 1 Log in to the MM910 WebUI. For details, see 8.11 Logging In to the MM910 WebUI.

Step 2 Choose System Management on the menu bar, choose SEL Information in the navigation
tree, and click the SMM tab and then the One touch collect tab.

The log collection page is displayed.

Step 3 On the log collection page, choose Collect All > Start.

Log collection takes about 20 minutes. When log collection is complete, a log file named
one_touch_info_all.tar.gz is displayed in the File Name area.

Step 4 Click the log file name and download it to the local PC as prompted.

For MM910 earlier than (U54) 2.20, you need to collect logs of both the active and standby HMMs.

----End

8.5 Using the MM910 WebUI to Collect Information in Batches (for U54 2.20 or Later)
Operation Scenario
For (U54) 2.20 or later, use the MM910 WebUI to collect logs in batches.

Procedure
Step 1 Log in to the MM910 WebUI. For details, see 8.11 Logging In to the MM910 WebUI.

Step 2 Choose System Management > Information Collection, and set log collection parameters.
● Select MM for Collected from.
● Select One-click full collection for Collected content.

Step 3 Click Collect.

Log collection takes about 20 minutes. When log collection is complete, a log file named
one_touch_info_all.tar.gz is displayed in the File Name area.


Step 4 In the dialog box displayed, download the log file to the local PC as prompted. (In some
browsers, the log file is automatically saved in the default directory.)

----End

8.6 Using the FusionDirector WebUI to Collect Information in Batches
Operation Scenario
If the management module is MM920 or MM921, you can use the FusionDirector WebUI to
collect logs.

Prerequisites
The MM920 or MM921 has been managed by FusionDirector.

Procedure
Step 1 Log in to the FusionDirector WebUI. For details, see 8.12 Logging In to the FusionDirector
WebUI.

Step 2 Choose Menu > Alarms and Logs > Log. The Log page is displayed.

Step 3 Click Collect Log. In the displayed dialog box, click OK.

The Task area is displayed on the right of the page, showing the progress and status of the log
collecting task.

When the task is complete, a message indicating success is displayed.

Step 4 Click Export Log to export the log information to a local directory.

----End

8.7 Using the MM510 CLI to Collect Information (FusionServer G5500)
The MM510 is the management module of the FusionServer G5500.

Use the MM510 CLI to collect information about the MM510 and heterogeneous nodes in batches. To
collect information about the server, MM510, and heterogeneous nodes in batches, use the iBMC. For
details, see 8.3 Using iBMC to Collect Information in Batches.

Prerequisites
You have logged in to the CLI of the MM510. For details, see 8.13 Logging In to the
MM510 CLI.


Example
# One-click information collection
iBMC:/->ipmcget -d diaginfo
Download diagnose info to /tmp/ successfully.

8.8 Logging In to the iMana 200 WebUI


Operation Scenario
This section describes how to log in to the iMana 200 WebUI by using a browser on the local
PC. This section uses a PC running Windows 7 and Internet Explorer 8.0 as an example.

Prerequisites
Conditions

If the remote control function is required, ensure that the OS, browser, and Java Runtime
Environment (JRE) of the required versions have been installed on the local PC. Table 8-2
shows the system configuration requirements of the local PC.

Ensure that the local PC meets the following networking conditions:

● The local PC is properly connected to the iMana 200 management network port on the server by using a network cable.
● The IP addresses of the local PC and the iMana 200 management network port are on the same network segment.
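
The second networking condition can be verified before opening the browser. The sketch below uses the Python standard library `ipaddress` module. The function name is hypothetical, the local PC address is an invented example, and the 24-bit prefix length is an assumption that must be replaced with the mask actually configured on the management network.

```python
import ipaddress


def same_segment(pc_ip: str, bmc_ip: str, prefix_len: int) -> bool:
    """Return True if both addresses fall in the same network segment."""
    # Build the network that the local PC belongs to from its IP and prefix.
    pc_network = ipaddress.ip_interface(f"{pc_ip}/{prefix_len}").network
    return ipaddress.ip_address(bmc_ip) in pc_network


if __name__ == "__main__":
    # 192.168.2.100 is the example management address used in this section.
    print(same_segment("192.168.2.10", "192.168.2.100", 24))
    print(same_segment("192.168.3.10", "192.168.2.100", 24))
```

If the check fails, change the IP address of the local PC so that it falls in the segment of the management network port.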

Table 8-2 Local PC configuration requirements

OS | Software | Version
Windows 7 32-bit, Windows 8 32-bit, Windows Server 2008 32-bit | Browser: Internet Explorer | IE 8.0/10.0
 | Browser: Mozilla Firefox | Mozilla Firefox 9.0/23.0
 | Browser: Google Chrome | Chrome 13.0/31.0
 | JRE | 1.6.0 U25/1.7.0 U40 (32-bit)
Windows 7 64-bit, Windows 8 64-bit, Windows Server 2008 R2 64-bit, Windows Server 2012 64-bit | Browser: Internet Explorer | IE 8.0/10.0
 | Browser: Mozilla Firefox | Mozilla Firefox 9.0/23.0
 | Browser: Google Chrome | Chrome 13.0/31.0
 | JRE | 1.6.0 U25/1.7.0 U40 (64-bit)
RHEL 4.3 64-bit, RHEL 6.0 64-bit | Browser: Mozilla Firefox | Mozilla Firefox 9.0/23.0
 | JRE | JRE 1.6.0 U25/1.7.0 U40
Mac OS X v10.7 | Browser: Safari | Safari 5.1
 | Browser: Mozilla Firefox | Mozilla Firefox 9.0/23.0
 | JRE | JRE 1.6.0 U25/1.7.0 U40

If the JRE does not meet requirements, download and install a proper Java version.

Data
Table 8-3 lists the data required before you log in to the iMana 200 WebUI.

Table 8-3 Required data


Type                     Parameter   Description                                        Example

User login information   User name   Username for logging in to the iMana 200 WebUI.    root

                         Password    Password for logging in to the iMana 200 WebUI.    Huawei12#$
                                     NOTE
                                     The default iMana 200 user is root. The root user
                                     belongs to the administrator group. The default
                                     password is Huawei12#$.

Procedure
Step 1 Connect the local PC to the iMana 200 management network port on the server by using a
crossover cable or twisted pair cable.
Figure 8-15 shows the network diagram.

Figure 8-15 Network diagram


Step 2 Open Internet Explorer on the local PC.


Step 3 In the address box, enter the iMana 200 address in the format of https://IP address of the
iMana 200 management network port on the server (for example, https://192.168.2.100).
Step 4 Press Enter.
The iMana 200 login page is displayed, as shown in Figure 8-16.

l If the message "There is a problem with this website's security certificate" is displayed, click
Continue to this website (not recommended).
l If the Security Alert dialog box indicating a certificate error is displayed, click Yes.

Figure 8-16 Logging in to the iMana 200 WebUI

Step 5 On the iMana 200 login page, enter the username and password.

The user account will be locked after five consecutive login failures caused by incorrect passwords. If
your user account is locked, log in again 5 minutes later.

Step 6 Select This iMana from the Log on to drop-down list.

You can click Reset to clear the information entered on the User Login page.

Step 7 Click Log In.


The Overview page is displayed. The login username is displayed in the upper right corner of
the page.

----End


8.9 Logging In to the iBMC WebUI


Scenarios
Log in to the iBMC WebUI by using a browser on the local PC. This section uses a PC
running Windows 7 and Internet Explorer 8.0 as an example.

Prerequisites
Conditions

Before using the remote control function, ensure that the OS, browser, and Java Runtime
Environment (JRE) of the required versions have been installed on the local PC. Table 8-4
lists the required software versions.

Ensure that the local PC meets the following networking conditions:

l The local PC is connected to the iBMC management network port on the server by using
a network cable.
l The IP addresses of the local PC and the iBMC management network port are on the
same network segment.

Table 8-4 Software requirements for the local PC

OS                                  Software                     Version

l Windows 7 32-bit                  Browser: Internet Explorer   IE 8.0
l Windows 7 64-bit                  Browser: Mozilla Firefox     Mozilla Firefox 9.0/23.0
                                    Browser: Google Chrome       Chrome 13.0/31.0
                                    JRE                          JRE 1.6.0 U25/1.7.0 U40

l Windows 8 32-bit                  Browser: Internet Explorer   IE 10.0/11.0
l Windows 8 64-bit                  Browser: Mozilla Firefox     Mozilla Firefox 9.0/23.0
                                    Browser: Google Chrome       Chrome 13.0/31.0
                                    JRE                          JRE 1.6.0 U25/1.7.0 U40

Windows Server 2008 R2 64-bit       Browser: Internet Explorer   IE 8.0/10.0/11.0
                                    Browser: Mozilla Firefox     Mozilla Firefox 9.0/23.0
                                    Browser: Google Chrome       Chrome 13.0/31.0
                                    JRE                          JRE 1.6.0 U25/1.7.0 U40

Windows Server 2012 64-bit          Browser: Internet Explorer   IE 10.0/11.0
                                    Browser: Mozilla Firefox     Mozilla Firefox 9.0/23.0
                                    Browser: Google Chrome       Chrome 13.0/31.0
                                    JRE                          JRE 1.6.0 U25/1.7.0 U40

l Red Hat Enterprise Linux 4.3      Browser: Mozilla Firefox     Mozilla Firefox 9.0/23.0
  64-bit                            JRE                          JRE 1.6.0 U25/1.7.0 U40
l Red Hat Enterprise Linux 6.0
  64-bit

MAC X v10.7                         Browser: Safari              Safari 5.1
                                    Browser: Mozilla Firefox     Mozilla Firefox 9.0/23.0
                                    JRE                          JRE 1.6.0 U25/1.7.0 U40

Data
Table 8-5 lists the required data before you log in to the iBMC WebUI.

Table 8-5 Required data


Type                     Parameter   Description                                        Example

User login information   User name   Username for logging in to the iBMC WebUI.         root

                         Password    Password for logging in to the iBMC WebUI.         Huawei12#$
                                     NOTE
                                     The default username for logging in to the iBMC
                                     WebUI of V2 & V3 servers is root, and the default
                                     password is Huawei12#$.
                                     The default username for logging in to the iBMC
                                     WebUI of V5 servers or Atlas 800 AI servers
                                     (model 3010) is Administrator, and the default
                                     password is Admin@9000.

Procedure
Step 1 Connect the local PC to the iBMC management network port on the server by using a
crossover cable or twisted pair cable.
Figure 8-17 shows the network diagram.


Figure 8-17 Network diagram

Step 2 Open Internet Explorer on the local PC.

Step 3 In the address box, enter the IP address of the server iBMC management network port (for
example, https://192.168.2.100) and press Enter.

The iBMC login page is displayed, as shown in Figure 8-18.

l If the message "There is a problem with this website's security certificate" is displayed, click
Continue to this website (not recommended).
l If the Security Alert dialog box indicating a certificate error is displayed, click Yes.

Figure 8-18 Logging in to iBMC

Step 4 On the login page, enter the username and password for logging in to the iBMC WebUI.

The user account will be locked after five consecutive login failures with wrong passwords. If your user
account is locked, log in again 5 minutes later.

Step 5 Select This iBMC from the Domain drop-down list.


Step 6 Click Log In.


The Overview page is displayed, showing the username in the upper right corner.

----End

8.10 Logging In to the Web Tools of the MX510


Operation Scenario
Log in to the Web Tools of the FC switching plane MX510 by using a browser on the local
PC to configure and manage this plane.
This section applies to the CX311, CX911, and CX915.

Data
The following data is required:
l IP address of the server to be connected
l User name for logging in to the server to be connected. The default username is admin.
l User password for logging in to the server to be connected. The default user password is
Huawei12#$.

Tool
JRE: third-party free software. You can obtain it from the Internet. JRE 1.8 or later is
required.

Procedure
Step 1 Connect a client (for example, a local PC) to the management network port of the
management module by using a network cable.
Step 2 In the displayed security alert dialog box, click Allow to allow web access.

Step 3 In the displayed security alert dialog box, select Do not block this program.

Step 4 In the address box of the PC browser, enter https://IP address of the FC switching plane and
press Enter.
The login dialog box is displayed, as shown in Figure 8-19.


Figure 8-19 Login dialog box

Step 5 Enter the username and password, and click Add Fabric.

Step 6 In the dialog box displayed, click Yes.

The Web Tools home page is displayed, as shown in Figure 8-20.

Figure 8-20 Web Tools home page

----End

8.11 Logging In to the MM910 WebUI


Scenarios
Log in to the MM910 WebUI by using a browser on the local PC to configure and manage the
chassis, MM910s, compute nodes, storage nodes, switch modules, passthrough modules,
power supply units (PSUs), and fan modules.


Impact on the System


This operation has no adverse impact on the system.

l The user account will be locked if an incorrect password is entered five consecutive times. The
account is automatically unlocked after 5 minutes and cannot be forcibly unlocked. If you
attempt to enter a password again within the 5 minutes, the lock duration is reset to 5 minutes
regardless of whether the entered password is correct.
l The WebUI of the standby MM910 (displayed as "This is the standby MM.") does not display
component installation status. After logging in to the WebUI of the standby MM910, you can view
the status of the active MM910 and perform the following operations for the standby MM910: Set
the DHCP parameters and a static IP address, set and query the thresholds and hysteresis of
threshold sensors, collect system operating information, and upgrade the management software. To
perform other operations, log in to the WebUI of the active MM910.
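
The lockout rule in the first note above can be modeled as a small state machine. The following is a sketch for illustration only and is not the MM910 implementation. The class name and the time handling (plain seconds passed in by the caller) are assumptions; the thresholds come from the note.

```python
from dataclasses import dataclass
from typing import Optional

MAX_FAILURES = 5        # consecutive wrong passwords before lockout (from the note)
LOCK_SECONDS = 5 * 60   # lock duration in seconds (from the note)


@dataclass
class AccountLock:
    """Hypothetical model of the documented lockout behavior."""
    failures: int = 0
    locked_until: Optional[float] = None

    def attempt(self, password_ok: bool, now: float) -> bool:
        """Process one login attempt at time `now` (seconds); return True on success."""
        if self.locked_until is not None:
            if now < self.locked_until:
                # Any attempt during the lock restarts the 5-minute timer,
                # even if the password is correct.
                self.locked_until = now + LOCK_SECONDS
                return False
            # The lock has expired: the account is unlocked automatically.
            self.locked_until = None
            self.failures = 0
        if password_ok:
            self.failures = 0
            return True
        self.failures += 1
        if self.failures >= MAX_FAILURES:
            self.locked_until = now + LOCK_SECONDS
        return False
```

The model shows why retrying during the lock period is counterproductive: each retry pushes the unlock time further out, so waiting the full 5 minutes is the fastest way back in.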

Data
You have obtained the following data:
l Username for logging in to the server to be connected. The default username is root.
l User password for logging in to the server to be connected. The default user password is
Huawei12#$.

Procedure
Step 1 Connect the Ethernet port on the local PC to the MGMT ports on the active and standby
MM910s over the local area network (LAN).

NOTICE

If the active MM910 MGMT port has been connected to the network by using a network
cable and the client needs to be directly connected to the MM910, do not directly disconnect
the network cable from the active MM910 MGMT port that has been connected to the
network. Otherwise, an active/standby MM910 switchover will be triggered, which may cause
network interruption. You are advised to connect the client to the active MM910 STACK port
in the chassis by using a network cable. If the active MM910 STACK port in the chassis has
been connected to the MGMT port in another chassis, use an idle active MM910 STACK port
in another chassis.

Figure 8-21 shows the network connections.


Figure 8-21 Network connections

l In V2.25 and earlier versions, the MM910 MGMT port is accessed by the external network through
the 2X and 3X switch modules by default. In this case, do not connect the MM910 MGMT port and
the switch module network ports to the same network. Otherwise, a network storm will occur and
the network connection will be interrupted.
To use the MGMT port on the MM910 panel as the management network port for connecting to an
external network, run the smmset -d outportmode -v 1 command on the CLI.
l In V2.26 and later versions, the MM910 MGMT port is provided as the default management
network port for the external network.

Step 2 Set the IP address and subnet mask or route information for the local PC so that the local PC
can communicate with the MM910 properly.
Step 3 On the menu bar of Internet Explorer, choose Tools > Internet Options.
The Internet Options dialog box is displayed.

This section uses a PC running Windows 7 and Internet Explorer 8.0 as an example.

Step 4 Click the Connections tab and click LAN Settings.


The LAN Settings dialog box is displayed.
Step 5 In the Proxy server area, deselect the Use a proxy server for your LAN check box.
Step 6 Click Yes.
The LAN Settings dialog box closes.
Step 7 Click Yes.
The Internet Options dialog box closes.


Step 8 Open Internet Explorer, enter https://MM910 floating IP address in the address box, and press
Enter.
For example, enter https://10.85.4.77 in the address box.
"There is a problem with this website's security certificate" is displayed.
Step 9 Click Continue to this website (not recommended).
The page for logging in to the HMM WebUI is displayed.
Step 10 Set the parameters. See Figure 8-22 and Figure 8-23.
l Language: Select English.
l User name: Enter the username for login. The default username is root.
l Password: Enter the user password for login. The default password is Huawei12#$.
l Login To: Select This Machine/computer in most cases. Select LDAP if the system
manages domain users by using an active directory (AD) server.

Figure 8-22 Logging in to the HMM WebUI (MM910 (U54) 2.20 or later)

Figure 8-23 Logging in to the HMM WebUI (MM910 earlier than (U54) 2.20)

Step 11 Click Log In.


The HMM WebUI is displayed, as shown in Figure 8-24 or Figure 8-25.


Figure 8-24 HMM WebUI (MM910 (U54) 2.20 or later)

Figure 8-25 HMM WebUI (MM910 earlier than (U54) 2.20)

----End

8.12 Logging In to the FusionDirector WebUI


Operation Scenario
Use Google Chrome to log in to the FusionDirector WebUI. On the FusionDirector WebUI,
you can manage chassis components and cluster devices.


Prerequisites
Conditions
l Google Chrome 55 or later is required for logging in to FusionDirector.
l You have obtained the IP address, username, and password of FusionDirector.
The default username of the FusionDirector WebUI is Administrator, and the password
is Admin@9000.
l If you log in as an LDAP domain user, ensure that the LDAP server communicates with
FusionDirector properly, the LDAP function has been enabled on FusionDirector, and
the LDAP server and user group information has been configured.
l If you use the DNS domain name to log in, ensure that the DNS server communicates
with FusionDirector properly and the domain name and DNS server are configured on
FusionDirector.
Precautions
l FusionDirector supports a maximum of 100 concurrent users.
l The default timeout interval of FusionDirector is 30 minutes. If you do not perform any
operation on the WebUI within 30 minutes, the account is automatically logged out. You
need to enter the username and password to log in again.
l If the number of login failures caused by incorrect user names and passwords reaches the
value specified in the system security policy, the account is automatically locked. When
the lockout duration reaches the value specified in the security policy, the user is
automatically unlocked.
l To ensure system security, change the default password upon the first login and change
the password periodically.

Procedure
Step 1 Connect the Ethernet port of the PC to a management network port of the active or standby
MM920/MM921 over the LAN.
The 10GE optical port and MGMT port on the MM920/MM921 panel are management
network ports. This section uses the MGMT port as an example.
Figure 8-26 shows the network connections.


Figure 8-26 Network connections

Step 2 Set an IP address and a subnet mask or add route information for the PC so that the PC can
communicate with FusionDirector.
Step 3 Open the browser, enter https://ipaddr in the address box, and press Enter.

l ipaddr indicates the address used to access the FusionDirector WebUI. It can be in either of the
following formats:
– IPv4 address in dotted-decimal format XXX.XXX.XXX.XXX.
– Fully qualified domain name (FQDN) of FusionDirector.
l The browser may display a message indicating that the website has a security certificate error. Ignore
this error and continue the login if the IP address is correct.
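
The two accepted formats of ipaddr can be checked before the address is typed into the browser. The following is a minimal sketch using only the Python standard library. The function name, the FQDN check (a simplified hostname rule), and the sample name fd.example.com are assumptions for illustration and are not part of FusionDirector.

```python
import ipaddress
import re

# One DNS label: letters, digits, hyphens; must not start or end with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def login_url(ipaddr: str) -> str:
    """Return https://<ipaddr> if ipaddr is a dotted-decimal IPv4 address or an FQDN."""
    try:
        ipaddress.IPv4Address(ipaddr)
        return f"https://{ipaddr}"
    except ipaddress.AddressValueError:
        pass  # not an IPv4 address; try the FQDN rule below
    labels = ipaddr.split(".")
    looks_like_fqdn = (
        len(labels) >= 2
        and all(_LABEL.fullmatch(label) for label in labels)
        and not labels[-1].isdigit()
    )
    if looks_like_fqdn:
        return f"https://{ipaddr}"
    raise ValueError(f"not an IPv4 address or FQDN: {ipaddr!r}")


if __name__ == "__main__":
    print(login_url("192.168.2.100"))
    print(login_url("fd.example.com"))  # hypothetical domain name
```

An out-of-range address such as 999.1.1.1 is rejected, which catches a common typing error before the browser reports a connection failure.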

Step 4 Enter the login information.


Table 8-6 describes the information required on the login page.


Figure 8-27 Login page

Table 8-6 Login parameters


Parameter     Description

User name     FusionDirector supports the following user names:
              l Local users: The username is a string of 6 to 32 characters.
              l LDAP users: The username can contain a maximum of 255 characters.

Password      Specifies the password of the user. For security purposes, change the
              password periodically.

Domain Name   l If you log in as a local user, select Local.
              l If you log in as an LDAP user, select LDAP.
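
The username length rules in Table 8-6 can be expressed as a short check. This is a sketch for illustration; the function name is hypothetical, and the minimum length of 1 for LDAP users is an assumption because the table states only a maximum.

```python
def username_length_ok(username: str, domain: str) -> bool:
    """Check a username against the length limits listed in Table 8-6.

    domain is "Local" or "LDAP", matching the Domain Name options.
    """
    if domain == "Local":
        return 6 <= len(username) <= 32
    if domain == "LDAP":
        # Table 8-6 gives only the maximum; a minimum of 1 is assumed here.
        return 1 <= len(username) <= 255
    raise ValueError(f"unknown domain: {domain!r}")


if __name__ == "__main__":
    print(username_length_ok("Administrator", "Local"))
```

The check covers length only. Any character restrictions enforced by FusionDirector are not described in this section and are not modeled.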

Step 5 Click Log In.


The FusionDirector Dashboard is displayed, as shown in Figure 8-28.

l If the username or password is incorrect, you need to enter a verification code in the second login
attempt. If the verification code is not clear, click to refresh the verification code.
l If you enter an incorrect password three consecutive times, the account is locked for 5
minutes. If the account is locked, try again later or contact the administrator.


Figure 8-28 Dashboard page

----End

8.13 Logging In to the MM510 CLI


The MM510 is the management module of the FusionServer G5500.

Prerequisites
When logging in to the HMM CLI, note the following:

l If you log in to the CLI over SSH, a maximum of five concurrent users are supported.


l To log in to the CLI over the network port, you must connect the network port on the
configuration terminal to the network port on the server by using a network cable, and
ensure that the IP addresses of the two network ports are on the same network segment.
l To log in to the CLI over the serial port, you must connect the serial ports of the terminal
and the server by using a serial cable.

Login Method
l Login over SSH
l Login over the local serial port

l The HMM provides one default user Administrator, and the default password is on the
product nameplate.
l The system locks a user account if the user enters an incorrect password five consecutive
times. The account is automatically unlocked 5 minutes later, or an administrator can unlock it
on the CLI.
l For security purposes, change the initial password after the first login and change your
password periodically.

Logging In over SSH


The Secure Shell (SSH) protocol provides secure remote login and other secure network
services over an insecure network.

The method for logging in to the CMC CLI over SSH varies according to the client operating
system:

l If the client uses Linux:


a. Connect the client to the management network port on the server.
b. Run the ssh ipaddress command on the terminal tool (for example, shell) to log in
to the CLI. (In the command, ipaddress indicates the IP address of the management
network port.)

At the initial startup of the HMM, wait for about 3 minutes before you log in to the CLI.
l If the client uses Windows:
a. Download and install the SSH client communication tool.
b. Connect the client to the management network port on the server.
c. Enter the IP address, username, and password of the management network port on
the client communication tool.
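
For a Linux client, the ssh ipaddress command from step b can be assembled and checked in a script. The following is a minimal sketch; the function name is hypothetical, the optional user@ prefix is standard OpenSSH syntax rather than something this section specifies, and 192.168.2.100 is only a sample address.

```python
import ipaddress
import shlex
from typing import List, Optional


def ssh_command(mgmt_ip: str, username: Optional[str] = None) -> List[str]:
    """Build the argument list for `ssh ipaddress` (optionally `ssh user@ipaddress`)."""
    ipaddress.ip_address(mgmt_ip)  # raises ValueError if not a valid IP address
    target = f"{username}@{mgmt_ip}" if username else mgmt_ip
    return ["ssh", target]


if __name__ == "__main__":
    # Administrator is the default HMM user named in this section.
    print(shlex.join(ssh_command("192.168.2.100", "Administrator")))
```

The sketch only builds and prints the command. Pass the list to subprocess.run to open the session from a script.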

Logging In over a Serial Port


1. Connect the serial cable.
2. Log in to the CLI by using the HyperTerminal and set the following parameters:
– Bits per second: 115200
– Data bits: 8
– Parity: None


– Stop bits: 1
– Flow control: None
Figure 8-29 shows the parameters to be specified.

Figure 8-29 HyperTerminal properties

3. Enter the username and password after the connection is established.
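
The serial parameters in step 2 can be kept in one place for use with a terminal tool other than HyperTerminal. The following is a sketch; the class and method names are hypothetical. The pyserial_kwargs method returns keyword names that match the constructor of serial.Serial in the third-party pyserial package, which is an assumption about the tool you use and is not required by this procedure.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SerialSettings:
    """Serial parameters for the MM510 CLI as listed in step 2."""
    baud: int = 115200          # Bits per second
    data_bits: int = 8          # Data bits
    parity: str = "N"           # Parity: None
    stop_bits: int = 1          # Stop bits
    flow_control: bool = False  # Flow control: None

    def summary(self) -> str:
        """Return the conventional short form, for example '115200 8N1'."""
        return f"{self.baud} {self.data_bits}{self.parity}{self.stop_bits}"

    def pyserial_kwargs(self) -> Dict[str, Any]:
        """Keyword arguments for pyserial's serial.Serial (assumed third-party tool)."""
        return {
            "baudrate": self.baud,
            "bytesize": self.data_bits,
            "parity": self.parity,
            "stopbits": self.stop_bits,
            "xonxoff": self.flow_control,  # software flow control off
            "rtscts": self.flow_control,   # hardware flow control off
        }


if __name__ == "__main__":
    print(SerialSettings().summary())
```

A mismatch in any of these values typically produces garbled output or no output at all, so compare them against the terminal settings first if the login prompt does not appear.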

8.14 Logging In to the RMC CLI


Operation Scenario
Log in to the rack management controller (RMC) CLI.

Two login methods are available:

l SSH
SSH provides secure remote login and other secure network services over an insecure
network.
To log in to the RMC CLI over SSH, connect a PC to the RMC management network
port by using a network cable.
l Login over the local serial port

Prerequisites
The RMC is operating properly.


Data
l IP address of the RMC management network port. The default IP address is
192.168.2.100.
l RMC user names and passwords
The RMC provides four default users:
– User root (default password: Huawei12#$)
– User admin (default password: Huawei12#$)
– User operator (default password: Huawei12#$)
– User taobao (default password: Huawei12#$)

Tool
A terminal tool (for example, PuTTY) has been installed on the PC. PuTTY is third-party free
software. PuTTY 0.60 or later is required for login over a serial port.

Document
For details about the RMC, see the X8000 Server RMC Command Reference.

Log in to the RMC CLI over a serial port.


Step 1 Connect the PC to the RMC serial port by using a serial cable.

Step 2 On the PC, double-click PuTTY.exe.


The PuTTY Configuration window is displayed.
Step 3 Set Connection type to Serial, as shown in Figure 8-30.


Figure 8-30 PuTTY Configuration (Serial)

Step 4 Set the login parameters.


The following are examples of the parameters:
l Serial Line to connect to: COM1
l Speed (baud): 38400
l Data bits: 8
l Stop bits: 1
l Parity: None
l Flow control: None
Step 5 Click Open.
The PuTTY window is displayed, prompting "login as:" for you to enter a user name.
Step 6 Enter a user name and password.
After login, the RMC command prompt root@RMC:/ is displayed.

----End

Log in to the RMC over the management network port.


Step 1 Connect the PC to the RMC management network port by using a network cable.


Step 2 On the PC, double-click PuTTY.exe.


The PuTTY Configuration window is displayed.
Step 3 Set Connection type to SSH, as shown in Figure 8-31.

Figure 8-31 PuTTY Configuration (SSH)

Step 4 In the Host Name (or IP address) text box, enter the IP address of the RMC management
network port.
Step 5 Click Open.
The PuTTY window is displayed, prompting "login as:" for you to enter a user name.
Step 6 Enter a user name and password.
After login, the RMC command prompt root@RMC:/ is displayed.

----End


8.15 Logging In to a Server Over a Network Port by Using PuTTY


Scenarios
Use PuTTY to remotely log in to the server over a local area network (LAN) and to configure
and maintain the server.

The server in this section can be a management module, compute node, or switching plane.

Prerequisites
Conditions
The PC and the MM910/MM920/MM921 management network port have been connected by
using a network cable.
Data
You have obtained the following data:
l IP address of the server to be connected
l User name and password for logging in to the server to be connected
Software Tools
PuTTY.exe (third-party software)

Procedure
Step 1 Set an IP address and a subnet mask or add route information for the PC so that the PC can
properly communicate with the server.
You can run the Ping Server IP address command on the PC CLI to check the
communication between the PC and the server.
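
The connectivity check in Step 1 can be scripted. The following is a minimal sketch that builds the ping command for the PC's operating system; the function name and the default of four echo requests are assumptions. The -n (Windows) and -c (Linux and macOS) options both set the number of echo requests.

```python
import platform
from typing import List, Optional


def ping_command(server_ip: str, count: int = 4, system: Optional[str] = None) -> List[str]:
    """Build a ping command for the given OS (defaults to the OS running the script)."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), server_ip]


if __name__ == "__main__":
    # 191.100.34.32 is the example server address used in this section.
    print(" ".join(ping_command("191.100.34.32")))
```

Pass the returned list to subprocess.run to execute it. A return code of 0 generally indicates that the server answered.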
Step 2 Double-click PuTTY.exe.
The PuTTY Configuration window is displayed, as shown in Figure 8-32.


Figure 8-32 PuTTY Configuration

Step 3 Set the login parameters.


Set parameters as follows:
l Host Name (or IP address): Enter the IP address of the server to be logged in to, for
example, 191.100.34.32.
l Port: Retain the default value 22.
l Connection type: Retain the default value SSH.
l Close window on exit: Retain the default value Only on clean exit.

Configure Host Name and Saved Sessions, and click Save. You can double-click the saved record
under Saved Sessions to log in to the server the next time.

Step 4 (Optional) After logging in to the Ethernet plane by using PuTTY, if you fail to delete
characters on the CLI by using the Backspace key, choose Terminal > Keyboard, and select
Control-H under The Backspace key, as shown in Figure 8-33.


Figure 8-33 PuTTY Configuration

Step 5 Click Open.

The PuTTY window is displayed, prompting "login as:" for you to enter a user name.

l If this is your first login to the server, the PuTTY Security Alert dialog box is displayed. Click Yes
to proceed.
l If an incorrect user name or password is entered, you must set up a new PuTTY session.

Step 6 Enter a user name and password.

If the login is successful, the server host name is displayed on the left of the prompt.

----End

8.16 Logging In to a Server Over a Serial Port by Using PuTTY

By default, the server serial port is the OS serial port. For details about how to redirect the server serial
port, see "Querying and Redirecting the Serial Port (serialdir)" in the iBMC User Guide.


Scenarios
Use PuTTY to log in to the server over a serial port in either of the following scenarios:

l The server is configured for the first time at a new site.


l A remote connection to the server cannot be established.

The server in this section can be a management module, compute node, or switching plane.

Prerequisites
Conditions

l A PC is connected to the server by using a serial cable.


l PuTTY 0.60 or later has been installed.

Data

You have obtained the user name and password for logging in to the server to be connected.

Software Tools

PuTTY.exe (third-party software). PuTTY 0.60 or later is required for login over a serial port.

Procedure
Step 1 Double-click PuTTY.exe.

The PuTTY Configuration window is displayed.

Step 2 In the navigation tree on the left, choose Connection > Serial.

Step 3 Set the login parameters.

The following are examples:

l Serial line to connect to: COMN


l Speed (baud): 115200
l Data bits: 8
l Stop bits: 1
l Parity: None
l Flow control: None

In COMN, N indicates the serial port number, and the value is an integer.

Step 4 In the navigation tree, choose Session.

Step 5 Set Connection type to Serial, as shown in Figure 8-34.


Figure 8-34 PuTTY Configuration

Step 6 Click Open.


The PuTTY window is displayed.
Step 7 Enter a user name and password.
If the login is successful, the server host name is displayed on the left of the prompt.

----End

8.17 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the SOL Function of the MM910


Operation Scenario
You can use the Serial over LAN (SOL) function of the management module to access a
compute node, passthrough module, or switch module in a chassis for remote maintenance of
the E9000.


Prerequisites
Conditions

l You have logged in to the MM910 CLI by using the floating IP address of the MM910.
l There is no jumper cap over the pins on the mainboard of the compute node, passthrough
module, or switch module.

Data

You have obtained the following data:

l User name and password for logging in to the management module. The default user
name of the MM910 is root, and the default password is Huawei12#$.
l User name and password for logging in to the compute node to be connected. The default
user name is root, and the password is Huawei12#$.
l Password for logging in to the passthrough module or switch module to be connected.
The default password is Huawei12#$.

Procedure
Step 1 Use an SSH tool and the floating IP address of the MM910 to log in to the MM910 CLI.

In this document, PuTTY is used as the SSH tool. For details, see 8.15 Logging In to a
Server Over a Network Port by Using PuTTY.

Step 2 Log in to the SOL screen.

telnet 0 1101
*=====================================================================*
* Welcome to SMM SOL Server *
* Please log in with SMM account and password. *
*=====================================================================*
user name:

NOTICE

If you need to disconnect the service terminal or server power after logging in to the SOL
screen, exit the SOL screen first. Otherwise, re-logging in to the SOL screen will fail.

Step 3 Enter the user name and password.

The screen for selecting a slot number is displayed.


Log in Success!

*===========================================================================================================
please input the SOL Blade1~Blade16(1 ~ 16), Blade1A~Blade16A(17 ~ 32), Swi1~Swi4(33 ~ 36) and COM#(n)
press Ctrl+R to return
*===========================================================================================================

Blade1~Blade16(1 ~ 16)
Blade1A~Blade16A(17 ~ 32)
Swi1~Swi4(33 ~ 36)
Please input your choice:

The numbers in the preceding information are described as follows:


l 1 to 32 indicate the compute nodes in slots 1 to 32, respectively.
l 33 to 36 indicate the switch modules in slots 1E, 2X, 3X, and 4E, respectively.
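
The numbering above can be summarized as a lookup. The following is a minimal sketch; the function name is hypothetical, and the mapping follows the ranges shown on the slot selection screen and the slot names listed above.

```python
# Switch module numbers and the slots they represent, as listed above.
SWITCH_SLOTS = {33: "1E", 34: "2X", 35: "3X", 36: "4E"}


def describe_sol_choice(number: int) -> str:
    """Translate a number entered on the SOL slot selection screen into a target."""
    if 1 <= number <= 16:
        return f"compute node Blade{number}"
    if 17 <= number <= 32:
        return f"compute node Blade{number - 16}A"
    if number in SWITCH_SLOTS:
        return f"switch module Swi{number - 32} (slot {SWITCH_SLOTS[number]})"
    raise ValueError(f"unsupported SOL choice: {number}")


if __name__ == "__main__":
    for n in (1, 17, 34):
        print(n, "->", describe_sol_choice(n))
```

For example, entering 34 selects the switch module in slot 2X, and entering 17 selects Blade1A.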
Step 4 Enter the slot number of the compute node, passthrough module, or switch module, and press
Enter.
l If you enter a compute node slot number, the following serial port information is
displayed:
1 systemcom
2 RAIDcom
3 BMCcom
4 Exboardcom

Or
1 SYS COM
2 BMC COM

Or
1 systemcom
2 BMCcom

l If you enter a switch module slot number, the following serial port information is
displayed:
1 BMCcom
2 fabriccom
3 basecom
4 FCcom

Or
1 BMCcom
2 fabriccom

Or
1 BMCcom
2 fabriccom
3 basecom

l If you enter a passthrough module slot number, the following serial port information is
displayed:
1 BMCcom

Step 5 Enter the value representing the serial port to be connected, and press Enter.
The serial port screen is displayed. On this screen, you can perform operations such as
configuration and query.

You can press Ctrl+R once to return to the slot number selection screen shown in Step 3, or press Ctrl
+R twice to exit the SOL screen.

----End


8.18 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the SOL Function of the MM920/MM921


Scenarios
You can use the SOL function of the management module to access a compute node,
passthrough module, or switch module in a chassis for remote maintenance of the E9000.

Prerequisites
Conditions

• You have logged in to the MM920/MM921 CLI by using the floating IP address of the MM920/MM921.
• There is no jumper cap over the pins on the mainboard of the compute node, passthrough module, or switch module.

Data

You have obtained the following data:

• Username and password for logging in to the management module. The default username and password of the MM920/MM921 are Administrator and Admin@9000, respectively.
• Username and password for logging in to the compute node to be connected. The default username and password are Administrator and Admin@9000, respectively.
• Password for logging in to the passthrough module or switch module to be connected. The default password is Huawei12#$.

Procedure
Step 1 Use an SSH tool and the floating IP address of the MM920/MM921 to log in to the CLI.

In this document, PuTTY is used as the SSH tool. For details, see 8.15 Logging In to a
Server Over a Network Port by Using PuTTY.

Step 2 Run the ipmcget -l bladeN -t SOL -d cominfo or ipmcget -l swiN -t SOL -d cominfo command to query the SOL port information of the compute node, passthrough module, or switch module.

Step 3 Run the ipmcset -l bladeN -t sol -d activate -v com_value or ipmcset -l swiN -t sol -d activate -v com_value command to enter the serial port input interface. In these commands, N is the slot number and com_value is the serial port value queried in Step 2.

Step 4 Enter the username and password as prompted.

----End
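
The two commands in Steps 2 and 3 differ only in the target (bladeN or swiN) and the serial port value, so they can be generated from a template. The following Python sketch only builds the command strings quoted above; it does not run them. The function names are hypothetical, and it assumes that N is an integer slot number and that com_value is the integer serial port value reported in Step 2.

```python
# Sketch only: builds the command strings quoted in Steps 2 and 3.
# Function names are hypothetical; the command syntax is copied from the text.


def _target(kind: str, slot: int) -> str:
    """Return the -l argument, for example 'blade3' or 'swi2'."""
    if kind not in ("blade", "swi"):
        raise ValueError("kind must be 'blade' or 'swi'")
    return f"{kind}{slot}"


def sol_query_command(kind: str, slot: int) -> str:
    """Command from Step 2: query the SOL port information."""
    return f"ipmcget -l {_target(kind, slot)} -t SOL -d cominfo"


def sol_activate_command(kind: str, slot: int, com_value: int) -> str:
    """Command from Step 3: enter the selected serial port."""
    return f"ipmcset -l {_target(kind, slot)} -t sol -d activate -v {com_value}"


print(sol_query_command("blade", 3))
print(sol_activate_command("swi", 2, 1))
```

The generated strings are intended to be pasted into the MM920/MM921 CLI session opened in Step 1.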


8.19 Using WinSCP to Transfer Files


Scenarios
Use WinSCP to transfer files from a PC to a server.

Prerequisites
Conditions
The Secure File Transfer Protocol (SFTP) service has been enabled on the destination device.
Data
You have obtained the following data:
• IP address of the server to be connected
• Username and password for logging in to the server to be connected
Software Tools
WinSCP.exe (third-party free software)

Procedure
Step 1 Open the WinSCP folder, and double-click WinSCP.exe.
The WinSCP Login dialog box is displayed, as shown in Figure 8-35.

To change the UI language, click Languages.


Figure 8-35 WinSCP Login

Step 2 Set the login parameters.


The parameters are described as follows:
• Host name: Enter the IP address of the server to be connected. For example, 191.100.34.32.
• Port number: The default value is 22.
• User name: Enter the username. For example, admin123.
• Password: Enter the password. For example, admin123.
• Private key file: This parameter is left blank by default. Retain the default value.
• Protocol: Retain the default option SFTP in the File protocol drop-down list, and select Allow SCP fallback.
Step 3 Click Login.
The WinSCP file transfer window is displayed.

• If a private key file is not selected at the first login, the warning message "Continue connecting and add host key to cache" is displayed. Click Yes. The WinSCP file transfer window is displayed.
• On Windows 7, C:\Users\Administrator\Documents on the local PC is opened in the left pane, and /root on the server is opened in the right pane by default.

Step 4 In the left and right panes, create, delete, or copy folders in specific directories as required.


Figure 8-36 WinSCP window

----End
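
If the login in Step 3 fails, it can help to confirm that the server accepts TCP connections on the port set in Step 2 (22 by default). The following Python sketch is an illustration only: the function name is hypothetical, and a successful connection shows only that something is listening on the port, not that the SFTP service itself is working.

```python
import socket


def is_tcp_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_tcp_port_open("191.100.34.32") checks port 22 on the example host name used in Step 2.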

8.20 Configuring an FTP Server


Scenarios
Configure an FTP server to transfer files from a PC to a switching plane.

Prerequisites
• A PC is connected to the server by using a serial cable.
• WFTPD has been installed.

Software Tools
wftpd32.exe: used to transfer files between different platforms, for example, from a PC to a
switching plane of a switch module. wftpd32.exe is a free third-party tool. You can obtain it
from the Internet.

Procedure
Step 1 Double-click wftpd32.exe.

The No log file open - WFTPD window is displayed.

Step 2 Choose Logging > Log Options.

The Logging Options dialog box is displayed.

Step 3 Select all check boxes except Winsock Calls, and click OK.


Step 4 Choose Security > Users/rights.


The Users/Rights Security Dialog dialog box is displayed.
Step 5 Click New User. In the displayed dialog box, enter a new username (for example, vxworks)
and click OK.
The Change Password dialog box is displayed.
Step 6 Enter a new password (for example, vxworks) in the New Password and Verify Password
text boxes, and click OK.
Step 7 Copy the upgrade file to a directory (for example, D:\FTP) on the PC.

The directory can contain only English characters.

Step 8 Select vxworks from the User Name combo box, and enter the upgrade file directory (for
example, D:\FTP) in the Home Directory text box. See Figure 8-37.

Figure 8-37 Users/Rights Security Dialog dialog box

Step 9 Click Done.


The FTP server is configured.

----End
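
The note in Step 7 requires a directory path that contains only English characters. A quick way to check a candidate path is sketched below in Python. The function name is hypothetical, and the sketch assumes that "English characters" can be approximated as ASCII characters, which the text does not state explicitly.

```python
def has_only_ascii(path: str) -> bool:
    """Return True if every character in the path is an ASCII character."""
    # Assumption: "English characters" is treated as "ASCII characters".
    return path.isascii()


print(has_only_ascii(r"D:\FTP"))       # True
print(has_only_ascii("D:\\升级文件"))  # False
```

The first call uses the example directory from Step 7; the second shows a path that would be rejected under this assumption.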

8.21 Using SFTP to Transfer Files


Scenarios
Transfer files on the local PC using SFTP.

Prerequisites
The SFTP service has been enabled on the destination device.

Software Tools
mini-sftp-server.exe (free software)


Procedure
Step 1 Double-click mini-sftp-server.exe.
The Core FTP mini-sftp-server dialog box is displayed, as shown in Figure 8-38.

Figure 8-38 Core FTP mini-sftp-server

Step 2 Set the parameters as prompted.

The parameters are described as follows:
• User: specifies the username for logging in to the SFTP server.
• Password: specifies the password for logging in to the SFTP server.
• Port: specifies the port number, which is 22.
• Root path: specifies the home directory of the SFTP server.
Step 3 Click Options and enter the IP address of the SFTP server. For example, enter 191.100.34.33.
Step 4 Click Start.
The file transfer page is displayed.

----End


9 Other Resources

9.1 Obtaining Technical Support
9.2 Product Information Resources
9.3 Product Configuration Resources
9.4 Maintenance Tools

9.1 Obtaining Technical Support


Technical Support Website
Obtain technical documents at the Huawei enterprise support website.

Self-Service Platform and Community


Learn more about servers and communicate with experts at:

• HUAWEI Server Information Service Platform for specific server product documentation.
• Huawei Enterprise iKnow for learning and discussion.
• Huawei Enterprise Support Community for quick product issue query.

News
For notices about product life cycles, warnings, and updates, visit Support > Bulletins >
Product Bulletins > Life Cycle Notices.

Cases
To learn about server applications, visit the Intelligent Computing Case Library.

The Intelligent Computing Case Library is available only to Huawei engineers and partners.


Huawei Technical Support


If a fault persists after you have taken the troubleshooting measures specified in the documents, contact technical support at your local Huawei office. If your local Huawei office is not available, contact Huawei technical support as follows:

• Contact the Huawei customer service center.
  – Enterprise customers in China can contact Huawei in the following ways:
    ▪ Hotline: 400-822-9999
    ▪ Email: [email protected]
  – Enterprise customers outside China: Global Service Hotline
  – Carrier customers in China can contact Huawei in the following ways:
    ▪ Hotline: 400-830-2118
    ▪ Email: [email protected]
  – Carrier customers outside China: Global Service Hotline
• Contact the technical support personnel of the local Huawei office.

9.2 Product Information Resources


Table 9-1 describes the product information resources.

Table 9-1 Product information resources

| Information Resource | Description | How to Obtain |
|---|---|---|
| Server product documentation | Describes the server structure, specifications, and installation method. Each Huawei server has a user guide or maintenance and service guide. | 1. Log in to the Support > Intelligent Servers or Support > AI Computing Platform page. 2. Choose a server model to access the product page. 3. On the Documentation tab page, choose Operation & Maintenance. 4. View the required user guide or maintenance and service guide. |
| Intelligent Computing Compatibility Checker | Used to query OSs, components, and external devices that are compatible with servers. | Visit Intelligent Computing Compatibility Checker. |
| Maintenance Information Inquiry System | Used to query the service information about devices. | Visit Maintenance Information Inquiry. |
| Huawei Server Power Calculator | Used to calculate server power consumption with different configurations. | Visit Huawei Server Power Calculator. |
| Intelligent Computing Interactive Product Display | Used to view the 3D structure of the server hardware. | Visit Intelligent Computing Interactive Product Display. |

9.3 Product Configuration Resources


Table 9-2 describes the product configuration resources.

Table 9-2 Product configuration resources

| Tool Name | Description | How to Obtain |
|---|---|---|
| Removal and installation videos | Describe how to remove and install hardware. | Multimedia Portal |
| DIMM Configuration Assistant | Online application that shows the DIMM installation sequence in a graphical manner after the product name, CPU quantity, and DIMM quantity are specified. | Huawei Server Product Memory Configuration Assistant |

9.4 Maintenance Tools


Table 9-3 lists the software tools required for routine maintenance of Huawei servers.

Table 9-3 Software tools for routine maintenance

| Name | Server and Version | Description |
|---|---|---|
| FusionServer Tools Toolkit | See the FusionServer Tools 2.0 Toolkit User Guide. | Only Huawei FusionServer V2 & V3 servers are supported. Diagnoses and configures servers. Download link: FusionServer Tools |
| FusionServer Tools 2.0 SmartKit | See the FusionServer Tools 2.0 SmartKit User Guide. | Used for new site deployment, delivery, troubleshooting, and firmware upgrade. Download link: FusionServer Tools |
| ServiceCD | See the FusionServer Tools 2.0 ServiceCD2.0 User Guide. | Only Huawei FusionServer V2 & V3 servers are supported. This tool is used to boot and install an OS. Download link: FusionServer Tools |
| Smart Provisioning | See the Smart Provisioning User Guide. | Only Huawei FusionServer V5 servers are supported. Smart Provisioning is used to install OSs without a physical DVD-ROM drive, configure RAID, upgrade firmware, and perform troubleshooting. Download link: Smart Provisioning |
