HSS9860&HLR9820

Emergency O&M
Practice Training (GUL)

www.huawei.com

Copyright © 2017 Huawei Technologies Co., Ltd. All rights reserved.


References
• HLR9820 Product Manuals - Emergency Troubleshooting
• HLR9820 Emergency Maintenance Manuals
• HLR9820 User Manuals

Objectives
• Upon completion of this course, you will be able to:
  - Quickly locate the cause of an emergency
  - Apply emergency troubleshooting techniques

Contents
1. Overview of Emergency O&M
2. Technique of Emergency O&M
3. Common Operation of Emergency O&M

Overview of Emergency O&M
• Emergency troubleshooting means handling an emergency so that services are restored as quickly as possible, minimizing losses for carriers.

Figure: routine troubleshooting (information collection → fault definition → fault locating → troubleshooting → restore service) compared with emergency troubleshooting (emergency troubleshooting → restore service).

Emergency Conditions
• Take urgent measures immediately after any of the following situations occurs:
  - A large number of subscribers report complaints.
  - A large number of hardware failure alarms are generated.
  - Alarms indicating link congestion or the start of flow control are generated.
  - Bearer network quality deteriorates significantly.

Basic Troubleshooting Process

Overview
• Q: How can an emergency fault be located quickly?
• A: Use the three shortcuts:
  - Alarm handling: hardware faults/communication failure, flow control, license abnormality
  - Data query: data inconsistency, misoperation faults
  - Restart: FE global data abnormality

3 Shortcuts: Alarm Handling
• Alarms: ALM-6418 Communication Failure Between Boards, ALM-1003 Module Fault, ALM-22004 DSG SCTP Link Fault
  - Fault: hardware fault/communication failure
  - Troubleshooting: Query the alarm details and related information to locate the faulty board or module. Restart the board or module. If the alarm persists, isolate the board.
• Alarms: ALM-12012 Flow Control Started, ALM-10001 Flow Control Started, ALM-1707 MTP Buffer Congestion, ALM-1709 MTP L2 Congestion, ALM-1809 M3UA Link Congestion
  - Fault: congestion/flow control
  - Troubleshooting: 1. Contact the peer MSC/SGSN engineers to start flow control to reduce the number of messages sent to the HLR. 2. Add links between the MSC/SGSN and the HLR.

3 Shortcuts: Data Query
• Step 1 (fault: data inconsistency)
  - Operation: For every DRU/DSU cluster, run the following command, where x is the cluster ID: DSP MDBVERIFY: CID=x, MDBVR=RUNR, UDT=ALLDATA; If the value of InconsistentObjects for a DRU or DSU cluster exceeds 200, data has become inconsistent.
  - Troubleshooting: Load data to the slave nodes from the master node. Data on the HLR then needs to be synchronized to the VLR.
• Step 2 (fault: misoperation)
  - Operation: Obtain the time point when call loss occurred. Analyze the operation logs to check whether data was configured or maintenance commands were executed around that time. If so, roll back the data configuration.
  - Troubleshooting: Locate the misoperation and try to roll back the data configuration.
• Step 3
  - Operation: Confirm with the maintenance personnel whether hardware was changed (for example, circuits or hardware replaced) or other operations were performed on other MEs around that time.

3 Shortcuts: Restart
• Step 1: Restart the GU FEU boards one by one (fault: FE global data abnormality, or other abnormal modes such as resource abnormality or a halted system).
  1. Restart a board. For GU FEU boards that have a DPU module, wait until the other board with a DPU module has restarted successfully and no link alarm exists before restarting the next board; otherwise the links may be interrupted.
  2. Wait until the board has restarted completely. The board status is shown on the device panel; if the board and all its modules are green, the board has restarted successfully.
  3. Then restart the next board.

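The one-by-one rule can be sketched as a loop that restarts one board, then waits until it is healthy before touching the next. This is a minimal Python sketch, not a Huawei tool: `restart`, `is_all_green` and `has_link_alarm` are hypothetical callables standing in for the manual restart and the checks on the device panel and alarm window, and the polling limits are assumed values.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Board:
    name: str
    has_dpu: bool  # GU FEU boards with a DPU module need the extra link-alarm check


def rolling_restart(
    boards: List[Board],
    restart: Callable[[Board], None],        # hypothetical: performs the restart
    is_all_green: Callable[[Board], bool],   # hypothetical: board and modules green on the panel
    has_link_alarm: Callable[[], bool],      # hypothetical: any link alarm present
    poll_seconds: float = 0.0,
    max_polls: int = 100,
) -> List[str]:
    """Restart boards one at a time; never start the next until the current one is healthy."""
    restarted: List[str] = []
    for board in boards:
        restart(board)
        for _ in range(max_polls):
            healthy = is_all_green(board)
            # For DPU boards, also wait until no link alarm remains,
            # so the next restart cannot interrupt the links.
            if healthy and not (board.has_dpu and has_link_alarm()):
                break
            time.sleep(poll_seconds)
        else:
            # Loop exhausted without a healthy board: stop the rollout.
            raise RuntimeError(f"{board.name} did not recover; stop and investigate")
        restarted.append(board.name)
    return restarted
```

Raising an error instead of moving on reflects the intent of the procedure: a second board should never be restarted while the previous one is still unhealthy.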
Contents
2. Technique of Emergency O&M
  2.1 Board Fault
  2.2 Communication Failure
  2.3 Faults Caused by Misoperations
  2.4 Link Unavailable
  2.5 Flow Control
  2.6 Data Inconsistency
  2.7 High DS Node Memory Usage
  2.8 Fault in Bearer Network Between Load-Sharing FEs
  2.9 Service Provisioning Fault
  2.10 Data Loss
  2.11 License Abnormality
  2.12 Switchover to the Redundancy HLR

Board Fault
• A fault occurs in hardware, such as boards, modules, or LAN switches, and causes service failures such as call loss.
• Identification Methods
  - A board is faulty if ALM-2001 Board Fault is displayed in the Browse Alarms window.
• Troubleshooting Process
  - Locate the board based on the subrack number, slot number, and module number in the alarm, and restart the board.
  - If the fault persists after the restart, isolate the board.
  - If the fault still persists after the isolation, replace the board.

Board Fault: Troubleshooting Process
1. Check that the value of LDRSOURCE is 0. If it is not, set it to 0.
2. Restart the board. If the fault is cleared, the process ends.
3. If the fault persists, isolate the board. If the fault is cleared, the process ends.
4. If the fault still persists, replace the board.

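The escalation order above (LDRSOURCE check, then restart, isolate, replace) can be expressed as a small driver loop. This Python sketch is illustrative only: the callables are hypothetical placeholders for the manual MML and panel operations, not a Huawei interface.

```python
from typing import Callable, Dict, Optional

ESCALATION = ("restart", "isolate", "replace")  # order from the Board Fault process


def troubleshoot_board_fault(
    get_ldrsource: Callable[[], int],          # hypothetical: read LDRSOURCE
    set_ldrsource: Callable[[int], None],      # hypothetical: set LDRSOURCE
    actions: Dict[str, Callable[[], None]],    # hypothetical: one callable per step
    fault_cleared: Callable[[], bool],         # hypothetical: alarm no longer present
) -> Optional[str]:
    """Return the step that cleared the fault, or None if every step failed."""
    # Precondition from the flowchart: LDRSOURCE must be 0.
    if get_ldrsource() != 0:
        set_ldrsource(0)
    for step in ESCALATION:
        actions[step]()
        if fault_cleared():
            return step
    # The flowchart ends after replacement; None means the fault
    # needs further investigation (for example by Huawei technical support).
    return None
```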
Case: Service Fault Caused by a Faulty DSU Network Card
Problem Description
Some subscribers of operator X complained that they could not receive calls or could not connect to the PLMN. Tracing a test number that called a complaining subscriber showed that when the MSC server sent a Send Routing Information request to the HLR, the HLR returned "Unknown Subscriber". The only related alarm in the alarm panel was "ALM-4403 Communication Failure Between the OMU and Board".

Solution
1. The alarm details showed that the DSU board in slot 9 of subrack 2 had a communication failure with the OMU.
2. The DSU board was restarted. After the alarm was cleared, services recovered and the complaining subscribers could receive calls.

Cause Analysis
The DSU running logs showed a problem with a network card of the DSU: one BASE network card was on the verge of a switchover. The system consumed a large amount of resources trying to restart the board, so the board timed out when processing messages, which caused the service fault.

Communication Failure
• Fault Identification
  - A communication failure occurs when parts used for inter-board communication (such as board communication modules, SWUs, and network ports) become unavailable.
  - A communication failure has occurred if any of the following alarms is displayed in the Browse Alarms window:
    - ALM-6418 Communication Failure Between Boards
    - ALM-2069 Communication Failure Between Modules
    - ALM-12001 Node Heartbeat Timed Out

Communication Failure Between Boards

Communication Failure Between Modules

Case: Communication Failure Between DRU and DSG
Problem Description
Some subscribers of operator X complained that they could not receive calls or could not connect to the PLMN.
Tracing a test number that called a complaining subscriber showed that when the MSC server sent a Send Routing Information request to the HLR, the HLR returned "Unknown Subscriber".
The alarm panel showed a related alarm, "ALM-2069 Communication Failure Between Modules".
The alarm details showed that the DSG module in slot 4 of subrack 0 had a communication failure with the DRUs in slots 5 and 8 of subrack 0.

Solution
1. From the alarm details, it could be inferred that the DRU module was faulty.
2. After the DRU module was restarted, services recovered.

Cause Analysis
The DSG process exited abnormally, which corrupted the node information. As a result, the DSG stopped sending heartbeats to the DRU and the DRU stopped sending messages to the DSG, which caused the service fault.

Faults Caused by Misoperations
• Identification Methods
  - Misoperations during data configuration, maintenance command execution, or hardware or cable installation cause service failures.
  - Incorrect data configuration is the most common misoperation. If data is incorrectly configured, no alarm is generated, but subscribers experience call loss.
  - The most important step is to obtain the time point when call loss occurred.

Faults Caused by Misoperations: Troubleshooting Process
1. Obtain the time point when call loss occurred.
2. Check the operation logs.
3. Check whether important hardware was replaced.
4. Check the key performance measurement entities.

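Checking the operation logs around the call-loss time amounts to filtering log entries by a time window. The Python sketch below assumes the LST OPTLOG output has already been exported and parsed into (timestamp, command) pairs; that record shape and the 30-minute default window are illustrative assumptions, not the real output format or a recommended value.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

LogEntry = Tuple[datetime, str]  # hypothetical parsed operation-log record


def commands_near(
    entries: Iterable[LogEntry],
    call_loss_time: datetime,
    window: timedelta = timedelta(minutes=30),  # assumed window; adjust per site
) -> List[LogEntry]:
    """Return log entries executed within `window` before or after the call-loss time."""
    start, end = call_loss_time - window, call_loss_time + window
    # Sorted by timestamp so the most likely culprit commands read in order.
    return sorted(e for e in entries if start <= e[0] <= end)
```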
Case: Service Fault Caused by Modifying the Multi-MSISDN Area Table
Problem Description
Some subscribers of operator X complained that they could not receive calls or could not connect to the PLMN. All the complaining subscribers had subscribed to the Multi-MSISDN service. Tracing the numbers showed that the problem was on the HLR, but no alarm was found in the alarm panel of the HLR.

Solution
1. Analyze the service success rates indicated by the key performance measurement entities to obtain the time point when call loss occurred.
2. Run LST OPTLOG to check the operation logs that record the execution of configuration commands. The logs showed that a Multi-MSISDN area table had been added just before the service fault.
3. Roll back the data configuration. Services recovered.

Cause Analysis
Before the Multi-MSISDN area table was added, the maximum number of subscribers was not modified (MOD). As a result, some subscribers had been deleted, and location updates for these subscribers failed.

Link Unavailable: Link Interruption
• Identification Methods
  - A link is interrupted if any of the following alarms is generated in the Browse Alarms window:
    - ALM-1705 MTP Link Failed
    - ALM-1811 M3UA Link Fault
    - ALM-1701 MTP DSP Inaccessible

Link Unavailable: Link Interruption
• Troubleshooting Process
  1. Check whether a large number of link alarms are generated. If yes, perform a redundancy switchover.
  2. If not, check whether link congestion has occurred. If yes, perform the operations described in Link Congestion.
  3. If not, contact Huawei technical support.

Link Unavailable: Link Congestion
• Troubleshooting Process
  1. Check the number of existing links.
  2. Check whether the number of existing links exceeds 16. If it does, contact the peer MSC engineers to start flow control. If it does not, add more links.
  3. Check whether the alarms are cleared. If they are, the process ends. If not, contact Huawei technical support.

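The two link flowcharts are simple decision trees. The Python sketch below encodes them as functions that return the next action; the function names are hypothetical, and the mapping of the YES/NO branches of the 16-link check is reconstructed from the flowchart and should be treated as an assumption.

```python
LINK_LIMIT = 16  # link-count threshold from the Link Congestion flowchart


def link_interruption_step(many_link_alarms: bool, link_congestion: bool) -> str:
    """Next action in the Link Interruption flowchart."""
    if many_link_alarms:
        return "perform a redundancy switchover"
    if link_congestion:
        return "follow the Link Congestion process"
    return "contact Huawei technical support"


def link_congestion_step(existing_links: int) -> str:
    """First remedial action in the Link Congestion flowchart (branch mapping assumed)."""
    if existing_links > LINK_LIMIT:
        return "contact the peer MSC engineers to start flow control"
    return "add more links"


def after_remedy(alarms_cleared: bool) -> str:
    """Final check shared by both branches of the Link Congestion flowchart."""
    return "end" if alarms_cleared else "contact Huawei technical support"
```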
Case: Service Fault Caused by a Faulty Switch Board
Problem Description
Some subscribers of operator X complained that they could not receive calls or could not connect to the PLMN. Tracing a test number that called a complaining subscriber showed that when the MSC server sent a Send Routing Information request to the HLR, the HLR returned "Unknown Subscriber". The only related alarm in the alarm panel was "ALM-12001 Node Heartbeat Timed Out".
Solution
1. The alarm details showed that all DSU boards in subrack 2 had a communication failure with the DRU boards in subrack 1, which suggested a board fault. However, no "Board Fault" alarm was reported, so the faulty board could not be located, and restarting the DRU and DSU boards did not solve the problem.
2. After confirming that the redundancy HLR was available, services were switched over to the redundancy HLR.
Cause Analysis
The board running logs were checked and the boards were restarted. The SWU in subrack 2 was found to be on the verge of a switchover, which caused the service fault. After the board was replaced, the alarm was cleared.

Flow Control
• Identification Methods
  - When the HLR is overwhelmed with signaling messages, it starts flow control and discards some of them, which causes service failures.
  - Flow control protects the HLR from breaking down when it is overloaded.
  - Flow control has occurred if any of the following alarms is generated in the Browse Alarms window:
    - ALM-12012 Flow Control Started
    - ALM-10001 Flow Control Started

Flow Control
• Troubleshooting Process
  1. Stop low-priority tasks.
  2. Stop batch operation tasks.

Case: Link Congestion
Problem Description
Operator X complained that the call success ratio had dropped. The alarm panel of the MSC showed "MTP Buffer Congestion" and "MTP Link Failed".

Solution
The alarm panel of the HLR showed that link congestion was the main cause.
1. Contact the peer MSC engineers to start flow control to reduce the number of messages sent to the HLR.
2. Add 12 TDM 64 kbit/s links between the UMG and the HLR. Services recovered.

Data Inconsistency
• Identification Methods
  - If data is inconsistent between the master node and the slave nodes, services sometimes fail. For example, subscribers sometimes cannot receive calls. To determine whether data has become inconsistent, perform the following operations:
    1. Run LST CLUSTER to query all DRU and DSU clusters.
    2. Run DSP MDBVERIFY to check data consistency between the DRU or DSU nodes in each cluster. If the value of InconsistentObjects for a DRU or DSU cluster exceeds 200, data has become inconsistent.

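The 200-object threshold can be applied mechanically once the InconsistentObjects values have been collected. A minimal Python sketch, assuming the values from DSP MDBVERIFY have been recorded per cluster ID; no output parser is shown because the exact output format is not given here, and the function name is hypothetical.

```python
from typing import Dict, List

INCONSISTENT_OBJECTS_LIMIT = 200  # threshold stated in the identification method


def inconsistent_clusters(results: Dict[str, int]) -> List[str]:
    """Return the IDs of clusters whose InconsistentObjects value exceeds the limit.

    `results` maps a cluster ID to the InconsistentObjects value reported
    by DSP MDBVERIFY for that cluster.
    """
    return sorted(cid for cid, count in results.items() if count > INCONSISTENT_OBJECTS_LIMIT)
```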
Data Inconsistency (Cont.)
• Troubleshooting Process
  1. Set the value of LDRSOURCE to 0.
  2. Load data from the master node.
  3. Synchronize data to the VLR.
  4. Check that the value of InconsistentObjects is 0.

High DS Node Memory Usage
• Identification Methods
  - When the DS node is overloaded, its memory usage becomes high and it cannot process some services. DS node memory usage is high if ALM-369 Node Memory Usage Too High or ALM-371 Memory of the Node Insufficient is generated in the Browse Alarms window.
• Troubleshooting Process
  - Stop service provisioning and delete redundant data to reduce DS node memory usage. If the memory usage remains high, expand the system capacity.

High DS Node Memory Usage (Cont.)
• Troubleshooting Process
  1. Stop batch operation tasks.
  2. Delete the IMSIs that are not associated with any MSISDN.
  3. Delete the IMSIs whose accounts have expired.
  4. Withdraw the GPRS subscription data whose APNs are invalid.

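Steps 2 and 3 need a list of IMSIs to delete. Assuming subscriber records have been exported into a simple structure (the `SubscriberRecord` shape and field names below are hypothetical), the selection can be sketched in Python as follows. Review the resulting list before deleting anything.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SubscriberRecord:  # hypothetical export format
    imsi: str
    msisdn: Optional[str]            # None: IMSI not associated with any MSISDN
    account_expiry: Optional[date]   # None: no expiry recorded


def cleanup_candidates(records: List[SubscriberRecord], today: date) -> List[str]:
    """IMSIs without an associated MSISDN, or whose account expired before `today`."""
    return [
        r.imsi
        for r in records
        if r.msisdn is None
        or (r.account_expiry is not None and r.account_expiry < today)
    ]
```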
Fault in Bearer Network Between Load-Sharing FEs
• Identification Methods
  - The active and redundancy HLRs whose FEs work in load-sharing mode communicate over a bearer network. If the bearer network becomes unavailable, services may fail. The bearer network is unavailable if the following situations occur:
    - ALM-12011 Redundancy Failed or ALM-12001 Node Heartbeat Timed Out is generated in the Browse Alarms window, and the values of Location ID and Destination location ID in ALM-12001 Node Heartbeat Timed Out are different.
    - The Network QoS Measurement unit on the FE and the Bearer Network QoS Measurement unit on the BE indicate a decrease of at least 5% in bearer network quality compared with the previous day.

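Both identification criteria reduce to simple comparisons. This Python sketch is illustrative: it assumes the 5% is a relative decrease against the previous day's value (it could also be read as percentage points), and the function names are hypothetical.

```python
def qos_drop_triggered(qos_today: float, qos_yesterday: float, threshold: float = 0.05) -> bool:
    """True if bearer network quality fell by at least `threshold` versus the previous day.

    Assumption: the 5% criterion is a relative drop. If your site reads it as
    percentage points, compare (qos_yesterday - qos_today) with 5 directly.
    """
    if qos_yesterday <= 0:
        raise ValueError("previous-day QoS value must be positive")
    return (qos_yesterday - qos_today) / qos_yesterday >= threshold


def heartbeat_crosses_sites(location_id: int, destination_location_id: int) -> bool:
    """For ALM-12001: differing location IDs point at the inter-HLR bearer network."""
    return location_id != destination_location_id
```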
Fault in Bearer Network Between Load-Sharing FEs (Cont.)
• Troubleshooting Process
  1. On the redundancy HLR, query the link names of all links.
  2. Deactivate all links.
  3. Synchronize data to the VLR.
  4. Check whether the alarm is cleared.

Service Provisioning Fault
• Identification Methods
  - A service provisioning fault has occurred if the provisioning system remains disconnected from the HLR for over 15 minutes, or if a large number of service provisioning commands fail.
• Troubleshooting Process
  - Check the network connection between the HLR and the provisioning system, and check whether the number of subscribers served by the HLR has reached the limit. If the fault persists, perform a redundancy switchover.

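The identification rule can be written as a small predicate. A Python sketch under stated assumptions: the 15-minute limit comes from the text above, while the failure-ratio threshold is a hypothetical value, because the text only says "a large number" of commands fail. The function name is also hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

DISCONNECT_LIMIT = timedelta(minutes=15)  # from the identification method
FAILURE_RATIO_LIMIT = 0.5  # hypothetical: the text only says "a large number"


def provisioning_fault(
    disconnected_since: Optional[datetime],  # None while the connection is up
    now: datetime,
    failed_commands: int,
    total_commands: int,
) -> bool:
    """True if the provisioning link has been down over 15 minutes or too many commands fail."""
    if disconnected_since is not None and now - disconnected_since > DISCONNECT_LIMIT:
        return True
    if total_commands > 0 and failed_commands / total_commands >= FAILURE_RATIO_LIMIT:
        return True
    return False
```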
Service Provisioning Fault (Cont.)
• Troubleshooting Process
  1. Run the ping command to check the network connection.
  2. Check that the network cable is connected.
  3. Check that the port indicator is blinking green.
  4. Check the router and LAN switch.

Data Loss
• Identification Methods
  - Subscriber data loss is caused by a database abnormality. Subscribers whose data is lost cannot use services.
• Troubleshooting Process
  - Compare the subscriber list in the physical database in use with the subscriber lists in the backup database and the operation logs to identify the subscribers whose data is lost. Then redefine these subscribers.

Data Loss (Cont.)
• Troubleshooting Process
  1. Export the subscriber list.
  2. Obtain the subscriber list generated a week ago.
  3. Obtain the list of subscribers who have been defined.
  4. Obtain the list of subscribers who have been removed.
  5. Obtain the list of subscribers whose data is lost.
  6. Redefine these subscribers.

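The list comparison in this process is plain set arithmetic. A minimal Python sketch, assuming each list has already been exported as a collection of subscriber identifiers (for example IMSIs), and that the "defined" and "removed" lists cover the period since the week-old backup; the function and parameter names are illustrative.

```python
from typing import Iterable, Set


def lost_subscribers(
    current: Iterable[str],               # exported from the database in use
    backup_week_ago: Iterable[str],       # subscriber list from the backup
    defined_since_backup: Iterable[str],  # from operation logs (assumed period)
    removed_since_backup: Iterable[str],  # from operation logs (assumed period)
) -> Set[str]:
    """Subscribers who should exist but are missing from the database in use."""
    expected = (set(backup_week_ago) | set(defined_since_backup)) - set(removed_since_backup)
    return expected - set(current)
```

Plain sets ignore ordering, so a subscriber removed and then redefined within the same week would be wrongly excluded; check such cases against the operation log timestamps.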
Case: Subscriber Data Is Missing After the HLR Upgrade
Problem Description
• On 18 Dec 2009, the HLR hard disk controller switched over. An IBM engineer found that the switchover had been caused by the controller microcode, so a system microcode upgrade was planned for 21 Dec 2009. After the disk array system upgrade was completed, the customer reported the following problem:
• For some subscribers roaming across MSCs/VLRs, data in the HLR was inconsistent with the VLR. The subscriber had roamed to a new VLR and performed a location update, and the new VLR had obtained the subscriber information, but the subscriber data in the HLR still showed the old VLR number. As a result, the subscriber could not receive calls.

Solution
• First, roll back the dynamic subscriber data in the 3 clusters to the previous day. The number of affected subscribers is estimated at 78,000 × 3 = 234,000.
• Solution 1: Send a Reset to the entire network.
  - This method has a large impact but quickly resolves the HLR/VLR inconsistency.
• Solution 2:
  - Check with the MSC expert whether a forced location update time is set in the MSC (at that time, the MSC forcibly sends location updates to the HLR).
  - If it is set, subscribers perform location updates to the HLR at the specified time, so the dynamic data in the HLR and the VLR becomes consistent. This method has little impact but takes longer to resolve the problem.

Cause Analysis
• According to the first-line maintenance personnel, the dbmon process on boards 4 and 8 in frame 2 was accidentally killed while the syn process was being restarted.
• The respawn process logs on the two boards confirmed that the dbmon process had indeed been restarted, and the log times matched the operation time described by the maintenance personnel.

Suggestion and Conclusion
• Normally, when dynamic data is inconsistent, the affected subscribers need to perform location updates to resolve the problem.

License Abnormality
• Identification Methods
  - A license abnormality occurs when the license expires and the license control items return to their default values, which affects services. A license abnormality has occurred if either of the following situations occurs:
    - ALM-12015 Number of Subscribers Reached the Threshold Specified by the License exists in the Browse Alarms window.
    - Running phase is Expired in the DSP LICENSE command output.

License Abnormality (Cont.)
• Troubleshooting Process
  1. Check whether the license Running phase is Expired.
  2. Check whether the HLR maximum number of subscribers is set to a small value.

Case: Number of Subscribers Reached the Threshold
Problem Description
The operator found "ALM-12015 Number of Subscribers Reached the Threshold Specified by the License" in the alarm panel. Some subscribers of operator X complained that they could not connect to the PLMN.

Solution
1. Run DSP LICENSE. The Running phase is not Expired.
2. Run LST HLRSN. The HLR maximum number of subscribers is set to 100000.
3. Adjust the HLR maximum number of subscribers based on the site situation. The alarm is cleared.

Switchover to the Redundancy HLR
• Scenarios
  - Switch services from the active HLR to the redundancy HLR when the active HLR cannot recover in a timely manner from any of the following faults or damage:
    - Hardware or software faults
    - Unavailability of the channels (such as an IP bearer network, a transport network, or links) between the HLR and another NE
    - Faults in the power supply system
    - Damage caused by an external force

Check the Data Consistency
• Procedure
1. Run the LST CLUSTER command, without specifying any parameter, to check the information about all the clusters. Then, record all the cluster IDs.
2. Run the DSP NODE command, without specifying any parameter, to check the information about all the nodes. Then, record the IDs of the master nodes in the clusters.
3. Run the DSP NODE command, with Node ID set to the queried node IDs, to check the value of Number of the node objects. Then, record the queried values.
4. Compare the values of Number of the node objects for the master nodes that have the same cluster IDs on the active and redundancy HLRs. If the values queried on the redundancy HLR are the same as those queried on the active HLR, the data on the redundancy HLR is consistent with that on the active HLR.

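Step 4 is a per-cluster comparison of two tables. A minimal Python sketch, assuming the values from step 3 have been recorded by hand into dictionaries keyed by cluster ID; the function name and data shape are illustrative, not part of any Huawei tool.

```python
from typing import Dict, List


def mismatched_clusters(active: Dict[int, int], redundancy: Dict[int, int]) -> List[int]:
    """Cluster IDs whose master-node object counts differ between the two HLRs.

    Each dict maps a cluster ID to the "Number of the node objects" value
    recorded from DSP NODE for that cluster's master node. A cluster that
    appears on only one HLR is also reported.
    """
    clusters = set(active) | set(redundancy)
    return sorted(c for c in clusters if active.get(c) != redundancy.get(c))
```

An empty result means the redundancy HLR's data matches the active HLR for every cluster, which is the condition the procedure checks before a switchover.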
USCDB Switchover
• Prerequisites
  - Conditions: A BE switchover is required when a pair of DRUs or DSUs of the active HLR is faulty at the same time.
  - Data: The following data is required:
    - Priorities for the FEs to access the BEs. The priorities are configured according to the data planning.
    - IP addresses of the PGW of the redundancy HLR

USCDB Switchover
• Workflow

FE Switchover
• Conditions: An FE switchover is required when one of the following conditions is met:
  - All the links from the active HLR to the peer NEs are faulty, or most links are faulty and services cannot be provided normally. ALM-1705 MTP Link Failed and ALM-1811 M3UA Link Fault are generated when links are faulty.
  - The FEU boards of the active HLR are faulty and cannot process services.
  - The BSG or CCU modules of the active HLR are all faulty.
• Data: Names of the links between the active HLR and the peer NEs

FE Switchover
Workflow

Provisioning System Switchover
• Prerequisites
  - Conditions: A provisioning system switchover is required when one of the following conditions is met:
    - A pair of PGW boards is faulty and cannot provide services.
    - The communication between the provisioning system and the active HLR is interrupted.
  - Data: IP addresses of the PGW of the redundancy HLR

Provisioning System Switchover
Workflow

Information Collection
For an emergency that you cannot resolve yourself, contact Huawei and collect the following information:
First: basic information
1. Equipment version and connected NEs.
2. Alarm information (Excel).
3. Traffic measurement data for the 2 days before the incident. After collecting it, change the measurement interval from 1 hour to 0.5 hour.

Log Collection
5. Key commands: DSP LICRATE, DSP NODE, DSP MDBVERIFY (output saved as .txt).
6. Configuration: run EXP MML on the CGP ME.
Second: logs
• FE logs, in this directory:
  /opt/HUAWEI/cgp/workshop/omu/share/run_log/dev_log/ME_ME number/DEBUG/
  Collect the latest logs.
• BE logs, in this directory:
  /opt/ne/MEID/proc/workspace0/prog
  Collect the latest logs.
Third: any further information that Huawei engineers request after their analysis.

Summary
• In this course, we discussed emergency O&M, service faults, and typical service fault cases.

Thank you