IBM PowerHA SystemMirror V7.2.1 for IBM AIX Updates
Dino Quintero
Shawn Bodily
Bernhard Buehler
Bunphot Chuprasertsuk
Bing He
Maria-Katharina Esser
Fabio Martins
Matthew W Radford
Antony Steel
Redbooks
International Technical Support Organization
May 2017
SG24-8372-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page ix.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Chapter 2. IBM PowerHA SystemMirror V7.2.0 and V7.2.1 for IBM AIX new features 17
2.1 Resiliency enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.1 Integrated support for AIX Live Kernel Update . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 Automatic Repository Replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.3 Verification enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.4 Using Logical Volume Manager rootvg failure monitoring. . . . . . . . . . . . . . . . . . . 21
2.1.5 Live Partition Mobility automation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Cluster Aware AIX enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.1 Network failure detection tunable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.2 Built-in NETMON logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.3 Traffic stimulation for better interface failure detection . . . . . . . . . . . . . . . . . . . . . 26
2.2.4 Monitoring /var usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.5 New lscluster option -g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.6 CAA level added to the lscluster -c output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Enhanced split-brain handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Resource Optimized High Availability fallovers by using enterprise pools . . . . . . . . . . 27
2.5 Nondisruptive upgrades . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6 Geographic Logical Volume Manager wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.7 New option for starting PowerHA by using clmgr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.8 Graphical user interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Chapter 4. What is new with IBM Cluster Aware AIX and Reliable Scalable Clustering
Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1 Cluster Aware AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.1.1 Cluster Aware AIX tunables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.1.2 What is new in Cluster Aware AIX: Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.1.3 Monitoring /var usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.1.4 New lscluster option -g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.1.5 Network Failure Detection Time (FDT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2 Automatic repository update for the repository disk . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.2.1 Introduction to the Automatic Repository Update . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.2.2 Requirements for Automatic Repository Update. . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2.3 Configuring Automatic Repository Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2.4 Automatic Repository Update operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3 New manage option to start PowerHA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.4 Reliable Scalable Cluster Technology overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.4.1 What Reliable Scalable Cluster Technology is . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.4.2 Reliable Scalable Cluster Technology components . . . . . . . . . . . . . . . . . . . . . . . 94
4.5 PowerHA, Reliable Scalable Clustering Technology, and Cluster Aware AIX . . . . . . 103
4.5.1 Configuring PowerHA, Reliable Scalable Clustering Technology, and Cluster Aware
AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.5.2 Relationship between PowerHA, Reliable Scalable Clustering Technology, and
Cluster Aware AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.5.3 How to start and stop CAA and RSCT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.10 Example 2: Setting up one Resource Optimized High Availability cluster (with On/Off
CoD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
6.10.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
6.10.2 Hardware topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
6.10.3 Cluster configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.10.4 Showing the Resource Optimized High Availability configuration. . . . . . . . . . . 217
6.11 Test scenarios for Example 2 (with On/Off CoD) . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
6.11.1 Bringing two resource groups online. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
6.11.2 Bringing one resource group offline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
6.12 Hardware Management Console high availability introduction . . . . . . . . . . . . . . . . . 226
6.12.1 Switching to the backup HMC for the Power Enterprise Pool . . . . . . . . . . . . . . 228
6.13 Test scenario for HMC fallover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
6.13.1 Hardware topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
6.13.2 Bringing one resource group offline when the primary HMC fails . . . . . . . . . . . 232
6.13.3 Testing summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
6.14 Managing, monitoring, and troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
6.14.1 The clmgr interface to manage Resource Optimized High Availability . . . . . . . 237
6.14.2 Changing the DLPAR and CoD resources dynamically . . . . . . . . . . . . . . . . . . 240
6.14.3 View the Resource Optimized High Availability report . . . . . . . . . . . . . . . . . . . 241
6.14.4 Troubleshooting DLPAR and CoD operations . . . . . . . . . . . . . . . . . . . . . . . . . 241
10.9.4 Cluster split . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
10.9.5 Cluster merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
10.9.6 Scenario summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
10.10 Scenario: Split and merge policy is manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
10.10.1 Scenario description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
10.10.2 Split and merge configuration in PowerHA . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
10.10.3 Cluster split . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
10.10.4 Cluster merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
10.10.5 Scenario summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
10.11 Scenario: Active node halt policy quarantine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
10.11.1 Scenario description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
10.11.2 HMC password-less access configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
10.11.3 HMC configuration in PowerHA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
10.11.4 Quarantine policy configuration in PowerHA. . . . . . . . . . . . . . . . . . . . . . . . . . 391
10.11.5 Simulating a cluster split . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
10.11.6 Cluster merge occurs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
10.11.7 Scenario summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
10.12 Scenario: Enabling the disk fencing quarantine policy . . . . . . . . . . . . . . . . . . . . . . 395
10.12.1 Scenario description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
10.12.2 Quarantine policy configuration in PowerHA. . . . . . . . . . . . . . . . . . . . . . . . . . 396
10.12.3 Simulating a cluster split . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
10.12.4 Simulating a cluster merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
10.12.5 Scenario summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
AIX®, DS8000®, GPFS™, HACMP™, HyperSwap®, IBM®, IBM Spectrum™, POWER®, Power Systems™, POWER6®, POWER7®, POWER8®, PowerHA®, PowerVM®, PureSystems®, Redbooks®, Redbooks (logo)®, Redpaper™, RS/6000®, Storwize®, SystemMirror®, XIV®
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
This IBM® Redbooks® publication helps strengthen the position of the IBM PowerHA®
SystemMirror® solution with well-defined and documented deployment models within an
IBM Power Systems™ virtualized environment, which provides customers with a planned
foundation for business resilience and disaster recovery for their IBM Power Systems
infrastructure solutions.
This publication addresses topics to help meet customers’ complex high availability and
disaster recovery requirements on IBM Power Systems servers to help maximize their
systems’ availability and resources, and provide technical documentation to transfer the
how-to skills to users and support teams.
Authors
This book was produced by a team of specialists from around the world working at the
International Technical Support Organization, Poughkeepsie Center.
Dino Quintero is a Complex Solutions Project Leader and an IBM Level 3 Certified Senior IT
Specialist with the ITSO in Poughkeepsie, New York. His areas of expertise include enterprise
continuous availability, enterprise systems management, system virtualization, technical
computing, and clustering solutions. He is an Open Group Distinguished IT Specialist. Dino
holds a Master of Computing Information Systems degree and a Bachelor of Science degree
in Computer Science from Marist College.
Shawn Bodily is an IBM Champion for Power Systems and a Senior IT Consultant for Clear
Technologies in Dallas, Texas. He has 24 years of IBM AIX® experience with the last 20 years
specializing in high availability and disaster recovery that is primarily focused around
PowerHA SystemMirror. He is a double AIX Advanced Technical Expert, and is certified in
IBM POWER® Systems and IBM Storage. He has written and presented extensively about
high availability and storage at technical conferences, webinars, and onsite to customers. He
is an IBM Redbooks platinum author who has co-authored 10 other IBM Redbooks
publications and three IBM Redpaper™ publications.
Bernhard Buehler is an IT Specialist in Germany. He works for IBM STG Lab Services in La
Gaude, France. He has worked at IBM for 35 years and has 26 years of experience in AIX and
the availability field. His areas of expertise include AIX, PowerHA SystemMirror, HA
architecture, script programming, and AIX security. He is a co-author of several IBM
Redbooks publications. He is also a co-author of several courses in the IBM AIX curriculum.
Maria-Katharina Esser is an IT Specialist for pre-sales technical support and works for the
IBM System and Technology Group (STG) in Munich, Germany. She has worked for IBM for
28 years, and has 17 years of experience in AIX, POWER, and storage.
Fabio Martins is a Senior Software Support Specialist with IBM Technical Support Services
in Brazil. He has worked at IBM for 12+ years. His areas of expertise include IBM AIX, IBM
PowerVM, IBM PowerKVM, IBM PowerHA SystemMirror, PowerVC, IBM PureSystems®, IBM
DS8000®, IBM Storwize®, Linux, and Brocade SAN switches and directors. He is a Certified
Product Services Professional in Software Support Specialization and a Certified Advanced
Technical Expert on IBM Power Systems. He has worked extensively on IBM Power Systems
for Brazilian customers, providing technical leadership and support, including how-to
questions, problem determination, root cause analysis, performance concerns, and other
general complex issues. He holds a bachelor degree in Computer Science from Universidade
Paulista (UNIP).
Matthew W Radford is a UNIX support specialist in the United Kingdom. He has worked in
IBM for 18 years and has eight years of experience in AIX and High Availability Cluster
Multi-Processing (IBM HACMP™). He holds a degree in Information Technology from the
University of Glamorgan. Matt has co-authored two other IBM Redbooks publications.
Antony Steel is a Senior IT Specialist in ITS Singapore. He has 22 years of experience in the
UNIX field, predominantly AIX and Linux. He holds an honors degree in Theoretical
Chemistry from the University of Sydney. His areas of expertise include scripting, system
customization, performance, networking, high availability, and problem solving. He has written
and presented on LVM, TCP/IP, and high availability both in Australia and throughout Asia
Pacific.
Octavian Lascu
International Technical Support Organization, Poughkeepsie Center
Paul Moyer, Mike Coffey, Rajeev Nimmagadda, Prasad Dasari, Sharath Kacham
Aricent Technologies, an IBM Business Partner
Paul Desgranges
Groupe BULL
Kwan Ho Yau
IBM China
Ravi Shankar, Steven Finnes, Minh Pham, Alex Mcleod, Tom Weaver, Teresa Pham, Gus
Schlachter, Isaac Silva, Alexa Mcleod, Timothy Thornal, PI Ganesh, Gary Lowther, Gary
Domrow, Esdras E Cruz-Aguilar
IBM US
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
[email protected]
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
RSCT is a set of low-level operating system components that allow the implementation of
clustering technologies, such as IBM Spectrum™ Scale (formerly GPFS™). RSCT is
distributed with AIX. On the current AIX release, AIX 7.2, RSCT is Version 3.2.1.0. After
installing PowerHA and CAA file sets, the RSCT topology services subsystem is deactivated
and all its functions are performed by CAA.
PowerHA Version 7.1 and later relies heavily on the CAA infrastructure that was introduced in
AIX 6.1 TL6 and AIX 7.1. CAA provides communication interfaces and monitoring facilities for
PowerHA, and supports cluster-wide command execution through the clcmd command.
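For example, the clcmd distributed command runs a single AIX command on every cluster node and prefixes each node's output with the node name. The following minimal sketch assumes an already configured CAA cluster; the node names and timestamps are illustrative:
# clcmd date
-------------------------------
NODE node1
-------------------------------
Mon May  1 10:15:02 CDT 2017
-------------------------------
NODE node2
-------------------------------
Mon May  1 10:15:03 CDT 2017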
PowerHA Enterprise Edition also provides disaster recovery functions such as cross-site
mirroring, IBM HyperSwap®, Geographic Logical Volume Manager (GLVM) mirroring, and many
storage-based replication methods. These cross-site clustering methods support PowerHA
functions between two geographic sites. For more information, see the IBM PowerHA
SystemMirror 7.1.2 Enterprise Edition for AIX, SG24-8106.
For more information about features that are added in PowerHA V7.1.1 and later, see 1.3,
“History and evolution” on page 6.
High availability solutions can help to eliminate single points of failure through appropriate
design, planning, selection of hardware, configuration of software, control of applications, a
carefully controlled environment, and change management discipline.
In short, you can define high availability as the process of ensuring, by using duplicated or
shared hardware resources that are managed by a specialized software component, that an
application stays up and available for use.
A short definition for cluster multiprocessing might be multiple applications running over
several nodes with shared or concurrent access to the data.
PowerHA is only one of the high availability technologies, and it builds on increasingly reliable
operating systems, hot-swappable hardware, and increasingly resilient applications, by
offering monitoring and automated response.
A high availability solution that is based on PowerHA provides automated failure detection,
diagnosis, application recovery, and node reintegration. PowerHA can also provide excellent
horizontal and vertical scalability by combining other advanced functions, such as dynamic
logical partitioning (DLPAR) and Capacity on Demand (CoD).
The highly available solution for IBM Power Systems offers distinct benefits:
Proven solution with 27 years of product development
Using off-the-shelf hardware components
Proven commitment for supporting your customers
IP version 6 (IPv6) support for both internal and external cluster communication
Smart Assist technology enabling high availability support for all prominent applications
Flexibility (virtually any application running on a stand-alone AIX system can be protected
with PowerHA)
Figure 1-1 shows a typical PowerHA environment with both IP and non-IP heartbeat
networks. Non-IP heartbeat uses the cluster repository disk and an optional storage area
network (SAN).
The role of PowerHA is to manage the application recovery after the outage. PowerHA
provides monitoring and automatic recovery of the resources on which your application
depends.
Good design can remove single points of failure in the cluster: Nodes, storage, and networks.
PowerHA manages these components and also the resources that are required by the
application (including the application start/stop scripts).
In addition, by using Cluster Single Point of Control (C-SPOC), other management tasks such
as modifying storage and managing users can be performed without interrupting access to
the applications that are running in the cluster. C-SPOC also ensures that changes that are
made on one node are replicated across the cluster in a consistent manner.
Starting with PowerHA V7.1, the CAA feature of the operating system is used to configure,
verify, and monitor the cluster services. This major change improves the reliability of PowerHA
because the cluster service functions now run in kernel space rather than user space. CAA
was introduced in AIX 6.1 TL6. At the time of writing, the current release is PowerHA V7.2.1.
Note: Additional details and examples of implementing these features are found in IBM
PowerHA SystemMirror Standard Edition 7.1.1 for AIX Update, SG24-8030.
Note: Additional details and examples of implementing some of these features are found in
IBM PowerHA SystemMirror 7.1.2 Enterprise Edition for AIX, SG24-8106.
Note: Additional details and examples of implementing some of these features are found in
IBM PowerHA SystemMirror for AIX Cookbook, SG24-7739.
1.4.1 Terminology
The terminology that is used to describe PowerHA configuration and operation continues to
evolve. The following terms are used throughout this book:
Node An IBM Power Systems server (or LPAR) running AIX and PowerHA
that is defined as part of a cluster. Each node has a collection
of resources (disks, file systems, IP addresses, and
applications) that can be transferred to another node in the
cluster in case the node or a component fails.
Cluster A loosely coupled collection of independent systems (nodes) or
logical partitions (LPARs) that are organized into a network for
the purpose of sharing resources and communicating with
each other.
PowerHA defines relationships among cooperating systems
where peer cluster nodes provide the services that are offered
by a cluster node if that node cannot do so. These individual
nodes are responsible for maintaining the functions of one or
more applications in case of a failure of any cluster component.
Client A client is a system that can access the application running on
the cluster nodes over a local area network (LAN). Clients run
a client application that connects to the server (node) where
the application runs.
All components, CPUs, memory, and disks have a special design and provide continuous
service, even if one subcomponent fails. Only special software solutions can run on
fault-tolerant hardware.
Such systems are expensive and specialized. Implementing a fault-tolerant solution requires
much effort and a high degree of customization for all system components.
In such systems, the software that is involved detects problems in the environment, and
manages application survivability by restarting it on the same or on another available machine
(taking over the identity of the original node).
Therefore, eliminating all single points of failure (SPOF) in the environment is important. For
example, if the machine has only one network interface (connection), provide a second
network interface (connection) in the same node to take over in case the primary interface
providing the service fails.
Another important issue is to protect the data by mirroring and placing it on shared disk areas
that are accessible from any machine in the cluster.
The PowerHA software provides the framework and a set of tools for integrating applications
in a highly available system. Applications to be integrated in a PowerHA cluster can require a
fair amount of customization, possibly both at the application level and at the PowerHA and
AIX platform level. PowerHA is a flexible platform that allows integration of generic
applications running on the AIX platform, providing for highly available systems at a
reasonable cost.
White papers
– PowerHA V7.1 quick config guide
– Implementing PowerHA with Storwize V7000
– PowerHA with EMC V-Plex
– Tips and Consideration with Oracle 11gR2 with PowerHA on AIX
Tip: For more information about LKU, see AIX Live Updates.
Consider the following key points about PowerHA integrated support for LKUs:
LKU can be performed on only one cluster node at a time.
Support includes all PowerHA SystemMirror Enterprise Edition Storage replication
features, including HyperSwap and GLVM.
However, for asynchronous GLVM, you must swap to sync mode before LKU is performed,
and then swap back to async mode upon LKU completion.
During LKU operation, enhanced concurrent volume groups (VGs) cannot be changed.
Workloads continue to run without interruption.
When enabling AIX LKU through SMIT, the option is set to either yes or no. However, when
you use the clmgr command, the settings are true or false. The default is for it to be enabled
(yes/true).
To modify by using SMIT, complete the following steps, as shown in Figure 2-1:
1. Run smitty sysmirror and select Cluster Nodes and Networks → Manage Nodes →
Change/Show a Node.
2. Select the wanted node.
3. Set the Enable AIX Live Update operation field as wanted.
4. Press Enter.
Change/Show a Node
[Entry Fields]
* Node Name Jess
New Node Name []
Communication Path to Node [Jess] +
Enable AIX Live Update operation Yes +
Figure 2-1 Enabling the AIX Live Kernel Update operation
Here is an example of how to check the current value of this setting by using the clmgr
command:
[root@Jess] /# clmgr view node Jess |grep LIVE
ENABLE_LIVE_UPDATE="true"
Here is an example of how to disable this setting by using the clmgr command:
[root@Jess] /# clmgr modify node Jess ENABLE_LIVE_UPDATE=false
In order for the change to take effect, the cluster must be synchronized.
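For example, a minimal sketch of synchronizing the cluster from the command line:
[root@Jess] /# clmgr sync cluster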
Logs that are generated during the AIX Live Kernel Update operation
The two logs that are used during the operation of an AIX LKU are both in the /var/hacmp/log
directory:
lvupdate_orig.log This log file keeps information from the original source system logical
partition (LPAR).
lvupdate_surr.log This log file keeps information from the target surrogate system LPAR.
Tip: A demonstration of performing an LKU is available in this YouTube video.
A maximum of six repository disks per site can be defined in a cluster. The backup disks are
polled once a minute by clconfd to verify that they are still viable for an ARU operation. The
steps to define a backup repository disk are the same as in previous versions of PowerHA.
These steps and examples of failure situations can be found in 4.2, “Automatic repository
update for the repository disk” on page 82.
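As a minimal sketch (the disk name is an assumption), a backup repository disk can be added and then listed with clmgr:
[root@Jess] /# clmgr add repository hdisk4
[root@Jess] /# clmgr -v query repository
The -v flag shows the backup candidate with STATUS="BACKUP" in its output.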
The new detailed verification checks, which run only when explicitly enabled, include the
following actions:
The physical volume identifier (PVID) checks between the logical volume manager (LVM)
and object data manager (ODM) on various nodes.
AIX Runtime Expert checks for LVM and Network File System (NFS).
Checks whether network errors exceed a predefined 5% threshold.
GLVM buffer size.
Security configuration, such as password rules.
Kernel parameters, such as network, Virtual Memory Manager (VMM), and so on.
[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Include custom verification library checks [Yes] +
* Automatically correct errors found during [No] +
verification?
If the VG is set as a critical VG, any input/output (I/O) request failure starts the LVM metadata
write operation to check the state of the disk before returning the I/O failure. If rootvg has the
critical VG option set and the system cannot access a quorum of rootvg disks (or all rootvg
disks if quorum is disabled), then the node is halted and a message is sent to the console.
You can set and validate rootvg as a critical VG by running the commands that are shown in
Figure 2-3. The commands must be run only once because they use the clcmd CAA distributed
command.
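A minimal sketch of the commands, assuming the chvg -r option that controls the AIX critical VG attribute, run from one node and distributed to all nodes with clcmd:
# clcmd chvg -r y rootvg
# clcmd lsvg rootvg | grep -i critical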
Testing rootvg failure detection
In this environment, the rootvg is in Storwize V7000 logical unit numbers (LUNs) that are
connected to the PowerHA nodes by virtual Fibre Channel (FC) adapters. Simulating a loss of
any disk can often be accomplished in multiple ways, but often one of the following methods is
used:
From within the storage management, simply unmap the volumes from the host.
Unmap the virtual FC adapter from the real adapter on the Virtual I/O Server (VIOS).
Unzone the virtual worldwide port names (WWPNs) from the storage area network (SAN).
In this environment, we use the first option of unmapping from the storage side. The other two
options usually affect all of the disks rather than only rootvg. However, usually that is fine too.
After the rootvg LUN is disconnected and the loss is detected, a kernel panic ensues. If the failure
occurs on a PowerHA node that is hosting a resource group (RG), then an RG fallover occurs
as with any unplanned outage.
If you check the error report after restarting the system successfully, it has a kernel panic
entry, as shown in Example 2-1.
Example 2-1 Kernel panic entry in the error report
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
ASSERT STRING
PANIC STRING
Critical VG Force off, halting.
The node must be restarted and cluster services resumed. As always, when a node rejoins
the cluster, movement of RGs might be wanted, or happen automatically depending on the
cluster configuration.
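For example, a resource group can be moved back manually after the node rejoins the cluster (a minimal sketch; the RG and node names are illustrative):
[root@Jess] /# clmgr move resource_group RG1 NODE=Jess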
Note: Previously, it was preferable to unmanage a node before performing LPM, but not
many users were aware of this.
Note: A deadman switch is an action that occurs when CAA detects that a node
has become isolated in a multinode environment. This setting occurs when
nodes are not communicating with each other through the network and the
repository disk.
The AIX operating system can react differently depending on the deadman
switch setting or the deadman_mode, which is tunable. The deadman switch mode
can be set to either force a system shutdown or generate an Autonomic Health
Advisor File System (AHAFS) event.
The following new cluster heartbeat settings are associated with the auto handling of LPM:
Node Failure Detection Timeout during LPM
If specified, this timeout value (in seconds) is used during an LPM instead of the Node
Failure Detection Timeout value.
You can use this option to increase the Node Failure Detection Timeout during the LPM
duration to ensure that it is greater than the LPM freeze duration to avoid any risk of
unwanted cluster events. Enter a value 10 - 600.
LPM Node Policy
This specifies the action to be taken on the node during an LPM operation.
If unmanage is selected, the cluster services are stopped with the Unmanage Resource
Groups option for the duration of the LPM operation. Otherwise, PowerHA
SystemMirror continues to monitor the RGs and application availability.
As is common, these options can be set by using both SMIT and the clmgr command line. To
change these options by using SMIT, run smitty sysmirror and select Custom Cluster
Configuration → Cluster Nodes and Networks → Manage the Cluster → Cluster
Heartbeat Settings, as shown in Figure 2-4.
An example of using clmgr to check and change these settings is shown in Example 2-2.
No matter which method you choose to change these settings, the cluster must be
synchronized for the change to take effect cluster-wide.
Note: The listed AIX and PowerHA levels are the preferred combinations to use all new
features. However, these are not the only possible combinations.
Note: The network_fdt tunable is also available for PowerHA V7.1.3. To get it for PowerHA
V7.1.3, you must open a PMR and request the “Tunable FDT interim fix bundle”.
The self-adjusting network heartbeat behavior (CAA), which was introduced with PowerHA
V7.1.0, still exists and is still used. It has no impact on the network failure detection time.
For more information, see 4.1.5, “Network Failure Detection Time (FDT)” on page 80.
2.2.2 Built-in NETMON logic
NETMON logic was previously handled by RSCT. Because it was difficult to keep the CAA and
RSCT layers synchronized about the adapter state, the NETMON logic was moved into the
CAA layer.
For more details, see 4.1.3, “Monitoring /var usage” on page 69.
The new option -g lists the interfaces that can potentially be used as CAA communication
paths between the cluster nodes. For a more detailed description, see 4.1.4, “New lscluster
option -g” on page 71.
This add-on was backported and is automatically included with AIX 7.1.4.2 or later
and with AIX 7.2.0.2 or later.
Hardware requirement for using Enterprise Pool CoD license
– IBM POWER7+: 9117-MMD, 9179-MHD with FW780.10 or later
– IBM POWER8®: 9119-MME, 9119-MHE with FW820 or later
Full details about using this integrated support can be found in Chapter 6, “Resource
Optimized High Availability” on page 143.
Note: This new option was backported to PowerHA V7.2.0 and V7.1.3.
At the time of writing, the only way to obtain the new option is to open a PMR and ask for
an interim fix for the defect 100862, or ask for an interim fix for APAR IV90262.
The new GUI has several features. The following list is just a brief overview. For a detailed
description, see Chapter 9, “IBM PowerHA SystemMirror User Interface” on page 303.
Visual display of relationships among resources.
Systems with the highest severity problems are highly visible.
Visualizes the health status for each resource.
Formatted events are easy to scan.
Visually distinguish critical, warning, and maintenance events.
Organized by day and time.
Can filter and search for specific types of events.
You can see the progression of events by using the timeline.
You can zoom in to see details or zoom out to see health over time.
You can search for an event in the event log.
If your system has internet access, you can open a browser to the PowerHA IBM
Knowledge Center.
Figure 3-1 shows a high-level diagram of a cluster. In this example, there are two networks,
two managed systems, two Virtual Input/Output Servers (VIOS) per managed system, and
two storage subsystems. This example also uses the Logical Volume Manager (LVM)
mirroring for maintaining a complete copy of data within each storage subsystem.
This example also has a logical unit number (LUN) for the Cluster Aware AIX (CAA)
repository disk on each storage subsystem. For details about how to set up the CAA
repository disk, see 3.2, “Cluster Aware AIX repository disk” on page 36.
Another main concern is having redundant storage and verifying that the data within the
storage devices is synchronized across sites. The following section presents a method for
synchronizing the shared data.
The SAN Volume Controller in a stretched configuration allows the PowerHA cluster to
provide continuous availability of the storage LUNs even if there is a single component failure
anywhere in the storage environment. With this combination, the behavior of the cluster is
similar, in terms of function and failure scenarios, to that of a local cluster (Figure 3-3).
TCP/IP communications are essential, and multiple links and routes are suggested so that
communications between sites can be maintained even if a single network component or
path fails.
For data replication in synchronous mode where both writes must complete before
acknowledgment is sent to the application, the distance can greatly affect application
performance. Synchronous mode is commonly used for 100 kilometers or less. Asynchronous
modes are often used for distances over 100 km. However, these are only general baseline
recommendations.
If there is a failure that requires moving the workload to the remaining site, PowerHA interacts
directly with the storage to switch the direction of the replication. PowerHA then makes the
LUNs read/write capable and varies on the appropriate volume groups (VGs) to activate the
application on the remaining site.
The amount of configuration information that is stored on this repository disk directly depends
on the number of cluster entities, such as shared disks, number of nodes, and number of
adapters in the environment. You must ensure that you have enough space for the following
components when you determine the size of a repository disk:
Node-to-node communication
CAA Cluster topology management
All migration processes
The preferred size for the repository disk in a two-node cluster is 1 GB.
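For example, the size of a candidate repository disk (reported in MB) can be checked before creating the cluster (a minimal sketch; the disk name is an assumption):
# getconf DISK_SIZE /dev/hdisk3
1024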
If you have a multi-storage environment, such as the one that is described in 3.1.1, “Mirrored
architecture” on page 32, then see 3.2.2, “Cluster Aware AIX with multiple storage devices”
on page 36.
If you plan to use one or more disks as potential backup disks for the CAA repository, it is a
preferred practice to rename the disks, as described in “Renaming the hdisk” on page 38.
However, this is not possible in all cases.
Renaming these types of disks by using the AIX rendev command can confuse the
third-party MPIO software and create disk-related issues. For more information about any
disk renaming tool that is available as part of the vendor’s software kit, see your vendor
documentation.
Example 3-1 The lspv output before configuring Cluster Aware AIX
# lspv
hdisk0 00f71e6a059e7e1a rootvg active
hdisk1 00c3f55e34ff43cc None
hdisk2 00c3f55e34ff433d None
hdisk3 00f747c9b40ebfa5 None
hdisk4 00f747c9b476a148 None
hdisk5 00f71e6a059e701b rootvg active
#
After selecting hdisk3 as the CAA repository disk, creating and synchronizing the cluster, and
creating the application VG, you get the output that is listed in Example 3-2. The following
commands were used for this example:
clmgr add cluster test_cl
clmgr sync cluster
As shown in Example 3-2, the problem is that the lspv command does not show that hdisk4 is
reserved as the backup disk for the CAA repository.
Example 3-2 The lspv output after configuring Cluster Aware AIX
# lspv
hdisk0 00f71e6a059e7e1a rootvg active
hdisk1 00c3f55e34ff43cc testvg
hdisk2 00c3f55e34ff433d testvg
hdisk3 00f747c9b40ebfa5 caavg_private active
hdisk4 00f747c9b476a148 None
hdisk5 00f71e6a059e701b rootvg active
#
To see which disk is reserved as a backup disk, use the clmgr -v query repository
command or the odmget HACMPsircol command. Example 3-3 shows the output of the clmgr
command, and Example 3-4 on page 38 shows the output of the odmget command.
NAME="hdisk4"
NODE="c2n1"
PVID="00f747c9b476a148"
UUID="c961dda2-f5e6-58da-934e-7878cfbe199f"
BACKUP="1"
TYPE="mpioosdisk"
DESCRIPTION="MPIO IBM 2076 FC Disk"
SIZE="1024"
AVAILABLE="95808"
CONCURRENT="true"
ENHANCED_CONCURRENT_MODE="true"
STATUS="BACKUP"#
In the clmgr output, you can directly see the hdisk name. The odmget command output
(Example 3-4) lists only the PVIDs.
HACMPsircol:
name = "c2n1_cluster_sircol"
id = 0
uuid = "0"
ip_address = ""
repository = "00f747c9b40ebfa5"
backup_repository = "00f747c9b476a148"
#
Renaming these types of disks by using the AIX rendev command can confuse the
third-party MPIO software and create disk-related issues. For more information about any
disk renaming tool that is available as part of the vendor’s software kit, see your vendor
documentation.
Initially, we decided to use a longer name (caa_reposX). Example 3-6 shows the commands that
we ran and what the lspv command output looks like afterward.
Example 3-6 The lspv output after using rendev (using a long name)
#rendev -l hdisk3 -n caa_repos0
#rendev -l hdisk4 -n caa_repos1
# lspv
hdisk0 00f71e6a059e7e1a rootvg active
hdisk1 00c3f55e34ff43cc None
hdisk2 00c3f55e34ff433d None
caa_repos0 00f747c9b40ebfa5 None
caa_repos1 00f747c9b476a148 None
hdisk5 00f71e6a059e701b rootvg active
#
[Entry Fields]
* Cluster Name c2n1_cluster
* Heartbeat Mechanism Unicast +
* Repository Disk [] +
Cluster Multicast Address []
+--------------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press Enter. |
| |
| caa_rep (00f747c9b40ebfa5) on all cluster nodes |
| caa_rep (00f747c9b476a148) on all cluster nodes |
| hdisk1 (00c3f55e34ff43cc) on all cluster nodes |
| hdisk2 (00c3f55e34ff433d) on all cluster nodes |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 3-5 SMIT panel that uses long repository disk names
As Figure 3-5 shows, the longer names are truncated in the SMIT repository disk selection list, so we renamed the disks again by using shorter names. Example 3-7 shows the result.
Example 3-7 The lspv output after using rendev (using a short name)
#rendev -l hdisk3 -n caa_r0
#rendev -l hdisk4 -n caa_r1
# lspv
hdisk0 00f71e6a059e7e1a rootvg active
hdisk1 00c3f55e34ff43cc None
hdisk2 00c3f55e34ff433d None
caa_r0 00f747c9b40ebfa5 None
caa_r1 00f747c9b476a148 None
hdisk5 00f71e6a059e701b rootvg active
#
[Entry Fields]
* Cluster Name c2n1_cluster
* Heartbeat Mechanism Unicast +
* Repository Disk [] +
Cluster Multicast Address []
+--------------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press Enter. |
| |
| caa_r0 (00f747c9b40ebfa5) on all cluster nodes |
| caa_r1 (00f747c9b476a148) on all cluster nodes |
| hdisk1 (00c3f55e34ff43cc) on all cluster nodes |
| hdisk2 (00c3f55e34ff433d) on all cluster nodes |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 3-6 SMIT panel that uses short names
Attention: Do not change any of these tunables without the explicit permission of IBM
technical support.
In general, never modify these values yourself because they are set and managed by
PowerHA.
In general, do not change the monitoring values unless you are instructed by IBM.
Note: Starting with PowerHA V7.2, the traffic stimulation feature makes this flag obsolete.
For your information, some details are listed in this section. To list and change this setting, use
the clmgr command. Keep in mind that this change affects all IP networks.
Attention: Do not change this tunable without the explicit permission of IBM technical
support.
In the clctrl -tune -a command output, this is listed as no_if_traffic_monitor. The value 0
means enabled and the value 1 means disabled.
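For example, the current value can be displayed without changing it (a minimal sketch that uses the clctrl output that is described above):
# clctrl -tune -a | grep no_if_traffic_monitor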
The option poll_uplink can be defined directly on the virtual interface if you are using shared
Ethernet adapter (SEA) fallover or the Etherchannel device that points to the virtual
interfaces. To enable poll_uplink, use the following command:
chdev -l entX -a poll_uplink=yes -P
There are no additional changes to PowerHA and CAA needed. The information about the
virtual link status is automatically detected by CAA. There is no need to change the
MONITOR_INTERFACE setting. Details about MONITOR_INTERFACE are described in
3.3.1, “CAA network monitoring” on page 42.
Figure 3-7 shows an overview of how the option works. In production environments, you
normally have at least two physical interfaces on the VIOS, and you can also use a dual-VIOS
setup. In a multiple physical interface environment, the virtual link is reported as down only
when all physical connections on the VIOS for this SEA are down.
To display the settings, use the lsattr -El entX command. Example 3-9 shows the default
settings for poll_uplink.
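A minimal sketch of checking the attribute on a virtual Ethernet interface (the adapter name is an assumption; the default value of poll_uplink is no):
# lsattr -El ent0 | grep poll_uplink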
Compared to Example 3-10, Example 3-11 shows the entstat command output on a system
where poll_uplink is enabled and where all physical links that are related to this virtual
interface are up. The text in bold shows the additional displayed content:
VIRTUAL_PORT
PHYS_LINK_UP
Bridge Status: Up
The network down failure detection is much faster if poll_uplink is used and the link is
marked as down.
In PowerHA V7.1, this solution can still be used, but it is not recommended. The
cross-adapter checking logic is not implemented in PowerHA V7. The advantage of not
having this feature is that PowerHA V7.1 and later versions do not require that IP source
routing is enabled.
Note: With a single adapter, you use the SEA fallover or the Etherchannel fallover.
This setup simplifies the configuration from a TCP/IP point of view, and it also reduces the
content of the netmon.cf file. However, netmon.cf must still be used.
Independent of the PowerHA version that is used, if possible, define more than one address
per interface. Using the gateway address is not recommended: Modern gateways start
dropping ICMP packets when they are under a high workload, and the network team can also
decide to drop all ICMP packets that are addressed to the gateway. ICMP packets that are
sent to an address behind the gateway are not affected by this behavior.
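A minimal sketch of netmon.cf entries that use the !REQD format (the interface names and target addresses are illustrative; the targets are hosts behind the gateway rather than the gateway itself):
# cat /usr/es/sbin/cluster/netmon.cf
!REQD en0 192.168.100.20
!REQD en1 192.168.200.20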
Split-brain situation
A cluster split-brain event can occur when a group of nodes cannot communicate with the
remaining nodes in a cluster. For example, in a two-site linked cluster, a split occurs if all
communication links between the two sites fail. Depending on the communication network
topology and the location of the interruption, a cluster split event splits the cluster into two (or
more) partitions, each of them containing one or more cluster nodes. The resulting situation
is commonly referred to as a split-brain situation.
In a split-brain situation, the two partitions have no knowledge of each other’s status, each of
them considering the other as being offline. As a consequence, each partition tries to bring
online the other partition’s resource groups (RGs), thus generating a high risk of data
corruption on all shared disks. To prevent a split-brain situation, and subsequent potential
data corruption, split and merge policies are available to be configured.
Tie breaker
The tie-breaker feature uses a tie-breaker resource to choose a surviving partition that
continues to operate when a cluster split-brain event occurs. This feature prevents data
corruption on the shared cluster disks. The tie breaker is identified either as a SCSI disk or an
NFS-mounted file that must be accessible, under normal conditions, to all nodes in the
cluster.
Split policy
When a split-brain situation occurs, each partition attempts to acquire the tie breaker by
placing a lock on the tie-breaker disk or on the NFS file. The partition that first locks the SCSI
disk or reserves the NFS file wins, and the other loses.
All nodes in the winning partition continue to process cluster events, and all nodes in the
losing partition attempt to recover according to the defined split and merge action plan. This
plan most often implies either the restart of the cluster nodes, or merely the restart of cluster
services on those nodes.
Merge policy
There are situations in which, depending on the cluster split-brain policy, the cluster can have
two partitions that run independent of each other. However, most often, it is a preferred
practice to configure a merge policy that allows the partitions to operate together again after
communications are restored between them.
In this second approach, when partitions that were part of the cluster are brought back online
after the communication failure, they must be able to communicate with the partition that
owns the tie-breaker disk or NFS file. If a partition that is brought back online cannot
communicate with the tie-breaker disk or the NFS file, it does not join the cluster. The
tie-breaker disk or NFS file is released when all nodes in the configuration rejoin the cluster.
The merge policy configuration, in this case an NFS-based tie breaker, must be of the same
type as that for the split policy.
Because the goal was to test the NFS tie-breaker function as a method for handling split-brain
situations, the additional local nodes in a linked multisite cluster were considered irrelevant,
and therefore not included in the test setup. Each node had its own cluster repository disk
(clnode_1r and clnode_2r), and both nodes shared a common cluster disk (clnode_12, which
is the one that must be protected from data corruption that is caused by a split-brain
situation), as shown in Example 3-13.
clnode_2:/# lspv
clnode_2r 00f6f5d0f8ceed1a caavg_private active
clnode_12 00f6f5d0f8ca34ec datavg concurrent
hdisk0 00f6f5d09570f31b rootvg active
clnode_2:/#
During the setup of the cluster, the NFS communication network, with the en1 network
adapters in Example 3-14, was discovered and automatically added to the cluster
configuration as a heartbeat network, as net_ether_02. However, we manually removed it
afterward to prevent interference with the NFS tie-breaker tests. Therefore, the cluster
eventually had only one heartbeat network: net_ether_01.
At the end of our environment preparation, the cluster was active. The RG, IBM Hypertext
Transfer Protocol (HTTP) Server that is installed on the clnode_12 cluster disk with the datavg
VG was online, as shown in Example 3-16.
clnode_1:/#
clnode_1:/# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
rg_IHS ONLINE clnode_1@site1
ONLINE SECONDARY clnode_2@site2
clnode_1:/#
Our test environment used an NFS server that is configured on an AIX 7.1 TL3 SP5 LPAR.
This, of course, is not a requirement for deploying an NFS version 4 server.
A number of services must be active to allow NFSv4 communication between clients and
servers:
On the NFS server:
– biod
– nfsd
– nfsrgyd
– portmap
– rpc.lockd
– rpc.mountd
– rpc.statd
– TCP
On the NFS client (all cluster nodes):
– biod
– nfsd
– rpc.mountd
– rpc.statd
– TCP
Most of the previous services are usually active by default, and particular attention is required
for the setup of the nfsrgyd service. As mentioned previously, this daemon must be running
on both the server and the clients, which in our case are the two cluster nodes. This daemon
provides a name conversion service for NFS servers and clients that use NFS version 4.
Starting the nfsrgyd daemon requires that the local NFS domain is set. The local NFS
domain is stored in the /etc/nfs/local_domain file and it can be set by using the chnfsdom
command, as shown in Example 3-17.
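A minimal sketch of setting the local NFS domain and restarting the registry daemon on a node (the domain name is an assumption):
# chnfsdom itso.ibm.com
# stopsrc -s nfsrgyd
# startsrc -s nfsrgyd
Running chnfsdom without an argument displays the currently configured local domain.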
Alternatively, the root node, the public node, and the local NFS domain can be set with SMIT.
Use the smit nfs command, follow the path Network File System (NFS) → Configure NFS
on This System, then select the corresponding option:
Change Version 4 Server Root Node
Change Version 4 Server Public Node
Configure NFS Local Domain → Change NFS Local Domain
As a final step for the NFS configuration, create the NFS resource, also known as the NFS
export. Example 3-19 shows the NFS resource that was created by using SMIT by running
the smit mknfs command.
[Entry Fields]
* Pathname of directory to export [/nfs_root/nfs_tie_breaker] /
[...]
Public filesystem? no +
[...]
Allow access by NFS versions [4] +
[...]
* Security method 1 [sys,krb5p,krb5i,krb5,dh] +
* Mode to export directory read-write +
[...]
Test the NFS configuration by manually mounting the NFS export to the clients, as shown in
Example 3-20. The date column was removed from the output for clarity.
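A minimal sketch of such a manual test from one of the cluster nodes, using the server and export names from the previous examples (the mount options are illustrative):
# mkdir -p /nfs_tie_breaker
# mount -o vers=4 nfsserver_nfs:/nfs_tie_breaker /nfs_tie_breaker
# mount | grep nfs_tie_breaker
# umount /nfs_tie_breaker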
To configure the NFS tie breaker by using SMIT, complete the following steps:
1. The SMIT menu that enables the configuration of NFS Tie Breaker split policy can be
accessed by following the path Custom Cluster Configuration → Cluster Nodes and
Networks → Initial Cluster Setup (Custom) → Configure Cluster Split and Merge
Policy.
2. Select Split Management Policy, as shown in Example 3-21.
+-------------------------------------------------------------+
| Split Handling Policy |
| |
| Move cursor to desired item and press Enter. |
| |
| None |
| TieBreaker |
| Manual |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1=Help | /=Find n=Find Next |
F9=Shell +-------------------------------------------------------------+
+-------------------------------------------------------------+
| Select TieBreaker Type |
| |
| Move cursor to desired item and press Enter. |
| |
| Disk |
| NFS |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1=Help | /=Find n=Find Next |
F9=Shell +-------------------------------------------------------------+
4. After selecting NFS as the method for tie breaking, specify the NFS export server, directory,
and the local mount point, as shown in Example 3-23.
Example 3-23 Configuring the NFS tie breaker for split handling policy by using SMIT
NFS TieBreaker Configuration
[Entry Fields]
Split Handling Policy NFS
* NFS Export Server [nfsserver_nfs]
* Local Mount Directory [/nfs_tie_breaker]
* NFS Export Directory [/nfs_tie_breaker]
+-------------------------------------------------------------+
| Merge Handling Policy |
| |
| Move cursor to desired item and press Enter. |
| |
| Majority |
| TieBreaker |
| Manual |
| Priority |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1=Help | /=Find n=Find Next |
F9=Shell +-------------------------------------------------------------+
2. Selecting the TieBreaker option opens the menu that is shown in Example 3-25, where we again choose NFS as the method to use for tie breaking.
Example 3-25 Configuring NFS tie breaker for merge handling policy with SMIT
NFS TieBreaker Configuration
[Entry Fields]
Merge Handling Policy NFS
* NFS Export Server [nfsserver_nfs]
* Local Mount Directory [/nfs_tie_breaker]
* NFS Export Directory [/nfs_tie_breaker]
Example 3-26 Configuring the NFS tie breaker for the split and merge handling policy by using the CLI
clnode_1:/# /usr/es/sbin/cluster/utilities/cl_sm -s 'NFS' -k'nfsserver_nfs'
-g'/nfs_tie_breaker' -p'/nfs_tie_breaker'
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : NFS
Merge Handling Policy : NFS
NFS Export Server : nfsserver_nfs
Local Mount Directory : /nfs_tie_breaker
NFS Export Directory : /nfs_tie_breaker
Split and Merge Action Plan : Restart
The configuration must be synchronized to make this change known across the
cluster.
clnode_1:/#
At this point, a PowerHA cluster synchronization and restart and a CAA cluster restart are
required. Complete the following steps:
1. Verify and synchronize the changes across the cluster either by using the SMIT menu (run the smit sysmirror command, then follow the path Cluster Applications and Resources → Resource Groups → Verify and Synchronize Cluster Configuration), or from the CLI by using the clmgr sync cluster command.
2. Stop cluster services for all nodes in the cluster by running the clmgr stop cluster
command.
3. Stop the CAA daemon on all cluster nodes by running the stopsrc -s clconfd command.
4. Start the CAA daemon on all cluster nodes by running the startsrc -s clconfd
command.
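The following sequence summarizes these steps (a sketch: the clmgr commands run from one node, and the clconfd stop and start must be run on every cluster node):
clmgr sync cluster
clmgr stop cluster
stopsrc -s clconfd
startsrc -s clconfd
Afterward, start cluster services again, for example, by running the clmgr start cluster command.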
Important: Verify all output messages that are generated by the synchronization and restart of the cluster. An error that occurs when the NFS tie-breaker policies are activated does not necessarily cause the overall cluster synchronization to report an error.
When all cluster nodes are synchronized and active, and the split and merge management policies are applied, the NFS resource is accessed by all nodes, as shown in Example 3-27 (the date column is removed for clarity).
Example 3-27 Checking for the NFS export that is mounted on clients
clnode_1:/# mount | egrep "node|---|tie"
node mounted mounted over vfs options
------------- --------------- ---------------- ---- ----------------------
nfsserver_nfs /nfs_tie_breaker /nfs_tie_breaker nfs4
vers=4,fg,soft,retry=1,timeo=10
clnode_1:/#
Example 3-28 The cluster.mmddyyyy log for the node releasing the resource group
Nov 13 14:42:13 EVENT START: network_down clnode_1 net_ether_01
Nov 13 14:42:13 EVENT COMPLETED: network_down clnode_1 net_ether_01 0
Nov 13 14:42:13 EVENT START: network_down_complete clnode_1 net_ether_01
Nov 13 14:42:13 EVENT COMPLETED: network_down_complete clnode_1 net_ether_01 0
Nov 13 14:42:20 EVENT START: resource_state_change clnode_1
Nov 13 14:42:20 EVENT COMPLETED: resource_state_change clnode_1 0
Nov 13 14:42:20 EVENT START: rg_move_release clnode_1 1
Nov 13 14:42:20 EVENT START: rg_move clnode_1 1 RELEASE
Nov 13 14:42:20 EVENT START: stop_server app_IHS
Nov 13 14:42:20 EVENT COMPLETED: stop_server app_IHS 0
Nov 13 14:42:21 EVENT START: release_service_addr
Nov 13 14:42:22 EVENT COMPLETED: release_service_addr 0
Nov 13 14:42:25 EVENT COMPLETED: rg_move clnode_1 1 RELEASE 0
Nov 13 14:42:25 EVENT COMPLETED: rg_move_release clnode_1 1 0
Nov 13 14:42:27 EVENT START: rg_move_fence clnode_1 1
Nov 13 14:42:27 EVENT COMPLETED: rg_move_fence clnode_1 1 0
Nov 13 14:42:30 EVENT START: network_up clnode_1 net_ether_01
Nov 13 14:42:30 EVENT COMPLETED: network_up clnode_1 net_ether_01 0
Nov 13 14:42:31 EVENT START: network_up_complete clnode_1 net_ether_01
Nov 13 14:42:31 EVENT COMPLETED: network_up_complete clnode_1 net_ether_01 0
Nov 13 14:42:33 EVENT START: rg_move_release clnode_1 1
Nov 13 14:42:33 EVENT START: rg_move clnode_1 1 RELEASE
Nov 13 14:42:33 EVENT COMPLETED: rg_move clnode_1 1 RELEASE 0
Nov 13 14:42:33 EVENT COMPLETED: rg_move_release clnode_1 1 0
Nov 13 14:42:35 EVENT START: rg_move_fence clnode_1 1
Nov 13 14:42:36 EVENT COMPLETED: rg_move_fence clnode_1 1 0
Nov 13 14:42:38 EVENT START: rg_move_fence clnode_1 1
Nov 13 14:42:39 EVENT COMPLETED: rg_move_fence clnode_1 1 0
Nov 13 14:42:39 EVENT START: rg_move_acquire clnode_1 1
Nov 13 14:42:39 EVENT START: rg_move clnode_1 1 ACQUIRE
Nov 13 14:42:39 EVENT COMPLETED: rg_move clnode_1 1 ACQUIRE 0
Nov 13 14:42:39 EVENT COMPLETED: rg_move_acquire clnode_1 1 0
Nov 13 14:42:41 EVENT START: rg_move_complete clnode_1 1
Nov 13 14:42:42 EVENT COMPLETED: rg_move_complete clnode_1 1 0
Nov 13 14:42:46 EVENT START: rg_move_fence clnode_1 1
Nov 13 14:42:47 EVENT COMPLETED: rg_move_fence clnode_1 1 0
Nov 13 14:42:47 EVENT START: rg_move_acquire clnode_1 1
Nov 13 14:42:47 EVENT START: rg_move clnode_1 1 ACQUIRE
Nov 13 14:42:47 EVENT COMPLETED: rg_move clnode_1 1 ACQUIRE 0
Nov 13 14:42:47 EVENT COMPLETED: rg_move_acquire clnode_1 1 0
Nov 13 14:42:49 EVENT START: rg_move_complete clnode_1 1
Nov 13 14:42:53 EVENT COMPLETED: rg_move_complete clnode_1 1 0
Nov 13 14:42:55 EVENT START: resource_state_change_complete clnode_1
Nov 13 14:42:55 EVENT COMPLETED: resource_state_change_complete clnode_1 0
Example 3-29 The cluster.mmddyyyy log for the node acquiring the resource group
Nov 13 14:42:13 EVENT START: network_down clnode_1 net_ether_01
Nov 13 14:42:13 EVENT COMPLETED: network_down clnode_1 net_ether_01 0
Nov 13 14:42:14 EVENT START: network_down_complete clnode_1 net_ether_01
Nov 13 14:42:14 EVENT COMPLETED: network_down_complete clnode_1 net_ether_01 0
Nov 13 14:42:20 EVENT START: resource_state_change clnode_1
Nov 13 14:42:20 EVENT COMPLETED: resource_state_change clnode_1 0
Nov 13 14:42:20 EVENT START: rg_move_release clnode_1 1
Nov 13 14:42:20 EVENT START: rg_move clnode_1 1 RELEASE
Nov 13 14:42:20 EVENT COMPLETED: rg_move clnode_1 1 RELEASE 0
Nov 13 14:42:20 EVENT COMPLETED: rg_move_release clnode_1 1 0
Nov 13 14:42:27 EVENT START: rg_move_fence clnode_1 1
Nov 13 14:42:29 EVENT COMPLETED: rg_move_fence clnode_1 1 0
Nov 13 14:42:31 EVENT START: network_up clnode_1 net_ether_01
Nov 13 14:42:31 EVENT COMPLETED: network_up clnode_1 net_ether_01 0
Nov 13 14:42:31 EVENT START: network_up_complete clnode_1 net_ether_01
Nov 13 14:42:31 EVENT COMPLETED: network_up_complete clnode_1 net_ether_01 0
Nov 13 14:42:33 EVENT START: rg_move_release clnode_1 1
Nov 13 14:42:33 EVENT START: rg_move clnode_1 1 RELEASE
Nov 13 14:42:34 EVENT COMPLETED: rg_move clnode_1 1 RELEASE 0
Nov 13 14:42:34 EVENT COMPLETED: rg_move_release clnode_1 1 0
Nov 13 14:42:36 EVENT START: rg_move_fence clnode_1 1
Nov 13 14:42:36 EVENT COMPLETED: rg_move_fence clnode_1 1 0
Nov 13 14:42:39 EVENT START: rg_move_fence clnode_1 1
Nov 13 14:42:39 EVENT COMPLETED: rg_move_fence clnode_1 1 0
Nov 13 14:42:39 EVENT START: rg_move_acquire clnode_1 1
Nov 13 14:42:39 EVENT START: rg_move clnode_1 1 ACQUIRE
Nov 13 14:42:39 EVENT COMPLETED: rg_move clnode_1 1 ACQUIRE 0
Nov 13 14:42:39 EVENT COMPLETED: rg_move_acquire clnode_1 1 0
Nov 13 14:42:42 EVENT START: rg_move_complete clnode_1 1
Nov 13 14:42:45 EVENT COMPLETED: rg_move_complete clnode_1 1 0
Nov 13 14:42:47 EVENT START: rg_move_fence clnode_1 1
Nov 13 14:42:47 EVENT COMPLETED: rg_move_fence clnode_1 1 0
Nov 13 14:42:47 EVENT START: rg_move_acquire clnode_1 1
Nov 13 14:42:47 EVENT START: rg_move clnode_1 1 ACQUIRE
Nov 13 14:42:49 EVENT START: acquire_takeover_addr
Nov 13 14:42:50 EVENT COMPLETED: acquire_takeover_addr 0
Nov 13 14:42:50 EVENT COMPLETED: rg_move clnode_1 1 ACQUIRE 0
Nov 13 14:42:50 EVENT COMPLETED: rg_move_acquire clnode_1 1 0
Nov 13 14:42:50 EVENT START: rg_move_complete clnode_1 1
Nov 13 14:42:50 EVENT START: start_server app_IHS
Nov 13 14:42:51 EVENT COMPLETED: start_server app_IHS 0
Nov 13 14:42:52 EVENT COMPLETED: rg_move_complete clnode_1 1 0
Nov 13 14:42:55 EVENT START: resource_state_change_complete clnode_1
Nov 13 14:42:55 EVENT COMPLETED: resource_state_change_complete clnode_1 0
Example 3-30 The cluster nodes and resource group status before the simulated network down event
clnode_1:/# clmgr -cva name,state,raw_state query node
# NAME:STATE:RAW_STATE
clnode_1:NORMAL:ST_STABLE
clnode_2:NORMAL:ST_STABLE
clnode_1:/#
clnode_1:/# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
rg_IHS ONLINE clnode_1@site1
ONLINE SECONDARY clnode_2@site2
clnode_1:/#
2. Within about a minute of the previous step, as a response to the split-brain situation, the
node clnode_2 (with no communication to the NFS server) restarted itself. This can be
seen on the virtual terminal console opened (by using the HMC) on that node, and is also
reflected by the status of the cluster nodes (Example 3-32).
Example 3-32 Cluster nodes status immediately after a simulated network down event
clnode_1:/# clmgr -cva name,state,raw_state query node
# NAME:STATE:RAW_STATE
clnode_1:NORMAL:ST_STABLE
clnode_2:UNKNOWN:UNKNOWN
clnode_1:/#
3. After a restart, the node clnode_2 was functional, but with cluster services stopped
(Example 3-33).
Example 3-33 Cluster nodes and resource group status after node restart
clnode_1:/# clmgr -cva name,state,raw_state query node
# NAME:STATE:RAW_STATE
clnode_1:NORMAL:ST_STABLE
clnode_2:OFFLINE:ST_INIT
clnode_1:/#
clnode_2:/# clRGinfo
-----------------------------------------------------------------------------
5. You are now back to the point before the simulated network loss event, with both nodes
operational and the RG online on node clnode_1 (Example 3-35).
Example 3-35 Cluster nodes and resource group status after cluster services start
clnode_2:/# clmgr -cva name,state,raw_state query node
# NAME:STATE:RAW_STATE
clnode_1:NORMAL:ST_STABLE
clnode_2:NORMAL:ST_STABLE
clnode_2:/#
clnode_2:/# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
rg_IHS ONLINE clnode_1@site1
ONLINE SECONDARY clnode_2@site2
clnode_2:/#
The test was performed exactly like the one on the standby node, as described in “Loss of all network communication on standby node” on page 60, and the process was similar. The only notable difference was that the previously active node, now disconnected, restarted, and the other node, previously the standby node, brought the RG online, thus ensuring service availability.
LABEL: CONFIGRM_PENDINGQUO
Description
The operational quorum state of the active peer domain has changed to
PENDING_QUORUM. This state usually indicates that exactly half of the nodes that
are defined in the peer domain are online. In this state cluster resources cannot
be recovered although none will be stopped explicitly.
LABEL: LVM_GS_RLEAVE
Description
Remote node Concurrent Volume Group failure detected
LABEL: CONFIGRM_HASQUORUM_
Description
The operational quorum state of the active peer domain has changed to HAS_QUORUM.
In this state, cluster resources may be recovered and controlled as needed by
management applications.
The disconnected or restarted node includes log entries that are presented in chronological
order with the older entries listed first, as shown in Example 3-37.
LABEL: CONFIGRM_PENDINGQUO
Description
The operational quorum state of the active peer domain has changed to
PENDING_QUORUM. This state usually indicates that exactly half of the nodes that
are defined in the peer domain are online. In this state cluster resources cannot
be recovered although none will be stopped explicitly.
LABEL: LVM_GS_RLEAVE
Description
Remote node Concurrent Volume Group failure detected
LABEL: CONFIGRM_NOQUORUM_E
Description
The operational quorum state of the active peer domain has changed to NO_QUORUM.
LABEL: CONFIGRM_REBOOTOS_E
Description
The operating system is being rebooted to ensure that critical resources are
stopped so that another subdomain that has operational quorum may recover
these resources without causing corruption or conflict.
LABEL: REBOOT_ID
Description
SYSTEM SHUTDOWN BY USER
LABEL: CONFIGRM_HASQUORUM_
Description
The operational quorum state of the active peer domain has changed to HAS_QUORUM.
In this state, cluster resources may be recovered and controlled as needed by
management applications.
LABEL: CONFIGRM_ONLINE_ST
Description
The node is online in the domain indicated in the detail data.
The restarted node’s log includes information that is similar to the surviving node’s log, plus information about the restart event. This log also includes the information about the network_down event.
Attention: Do not change any of these tunables without the explicit permission of IBM technical support.
In general, never modify these values manually because they are managed by PowerHA.
To change the default values, use the chssys command. The -t option specifies the threshold as a percentage, and the -i option specifies the interval:
chssys -s clconfd -a "-t 80 -i 10"
To check which values are currently used, you have two options: run ps -ef | grep clconfd or odmget -q "subsysname='clconfd'" SRCsubsys. Example 4-2 shows the output of the two commands with the default values. With the odmget command, the cmdargs line lists no arguments; likewise, ps -ef displays no arguments after clconfd.
subsysname = "clconfd"
synonym = ""
cmdargs = ""
path = "/usr/sbin/clconfd"
uid = 0
auditid = 0
standin = "/dev/null"
standout = "/dev/null"
standerr = "/dev/null"
action = 1
multi = 0
contact = 2
svrkey = 0
svrmtype = 0
priority = 20
signorm = 2
sigforce = 9
display = 1
waittime = 20
grpname = "caa"
Example 4-3 shows what happens when you change the default values, and what the output
of odmget and ps -ef looks like after that change.
Important: You need to stop and start the subsystem to activate your changes.
SRCsubsys:
subsysname = "clconfd"
synonym = ""
cmdargs = "-t 80 -i 10"
path = "/usr/sbin/clconfd"
uid = 0
auditid = 0
standin = "/dev/null"
standout = "/dev/null"
standerr = "/dev/null"
action = 1
multi = 0
contact = 2
svrkey = 0
svrmtype = 0
priority = 20
signorm = 2
sigforce = 9
display = 1
waittime = 20
grpname = "caa"
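As a consolidated sketch, the following sequence changes the values, restarts the subsystem to activate them, and verifies the result (the threshold and interval values are examples):
chssys -s clconfd -a "-t 80 -i 10"
stopsrc -s clconfd
startsrc -s clconfd
odmget -q "subsysname='clconfd'" SRCsubsys | grep cmdargs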
If the threshold is exceeded, then you get an entry in the error log. Example 4-4 shows what
such an error entry can look like.
Description
/var filesystem is running low on space
Probable Causes
Unknown
Failure Causes
Unknown
Recommended Actions
RSCT could malfunction if /var gets full
Increase the filesystem size or delete unwanted files
Detail Data
Percent full
81
Percent threshold
80
Note: At the time of writing, this option was not available in AIX versions earlier than
AIX 7.1.4.
The lscluster -i command lists all of the communication paths that are seen by CAA, but it does not show whether all of them can potentially be used for heartbeating. This is particularly the case if you use a network that is set to private, or if you removed a network from the PowerHA configuration.
c2svc service service_net ether public powerha-c2n1 172.16.150.125
255.255.0.0 16
n2adm boot adm_net ether public powerha-c2n2 10.17.1.110
en1 255.255.255.0 24
powerha-c2n2 boot service_net ether public powerha-c2n2 172.16.150.122
en0 255.255.0.0 16
c2svc service service_net ether public powerha-c2n2 172.16.150.125
255.255.0.0 16
#
In this case, the lscluster -i output looks like what is shown in Example 4-6.
Node powerha-c2n1.munich.de.ibm.com
Node UUID = 63b68a36-e61b-11e5-8016-4217e0ce7b02
Number of interfaces discovered = 3
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E0:CE:7B:02
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.150.121 broadcast 172.16.255.255 netmask
255.255.0.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 2, en1
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E0:CE:7B:05
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 10.17.1.100 broadcast 10.17.1.255 netmask
255.255.255.0
Node powerha-c2n2.munich.de.ibm.com
Node UUID = 63b68a86-e61b-11e5-8016-4217e0ce7b02
Number of interfaces discovered = 3
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E4:E6:1B:02
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.150.122 broadcast 172.16.255.255 netmask
255.255.0.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 2, en1
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E4:E6:1B:05
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 10.17.1.110 broadcast 10.17.1.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 3, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 750
Mean deviation in network RTT across interface = 1500
Probe interval for interface = 22500 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
root@powerha-c2n1:/>
Example 4-7 shows the output of the lscluster -g command. When you compare the output of the lscluster -g command with the lscluster -i command, you should not find any differences because, in this example, all of the networks can potentially be used for heartbeat.
Example 4-7 The lscluster -g command output in relation to the cllsif output
# > lscluster -g
Network/Storage Interface Query
Node powerha-c2n1.munich.de.ibm.com
Node UUID = 63b68a36-e61b-11e5-8016-4217e0ce7b02
Number of interfaces discovered = 3
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E0:CE:7B:02
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.150.121 broadcast 172.16.255.255 netmask 255.255.0.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 2, en1
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E0:CE:7B:05
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 10.17.1.100 broadcast 10.17.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 3, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 750
Mean deviation in network RTT across interface = 1500
Probe interval for interface = 22500 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Node powerha-c2n2.munich.de.ibm.com
Node UUID = 63b68a86-e61b-11e5-8016-4217e0ce7b02
Number of interfaces discovered = 3
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E4:E6:1B:02
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.150.122 broadcast 172.16.255.255 netmask 255.255.0.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 2, en1
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E4:E6:1B:05
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 10.17.1.110 broadcast 10.17.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 3, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 750
Mean deviation in network RTT across interface = 1500
Probe interval for interface = 22500 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
root@powerha-c2n1:/>
One network set to private
The following examples in this section describe the lscluster command output when you
decide to change one or more networks to private. Example 4-8 shows the starting point for
this example. In our testing environment, we changed one network to private.
Note: Private networks cannot be used for any services. When you want to use a service
IP address, the network must be public.
Because we did not change the architecture of our cluster, the output of the lscluster -i
command is still the same, as shown in Example 4-6 on page 72.
Remember: You must synchronize your cluster before the change to private is visible in
CAA.
Example 4-9 shows the lscluster -g command output after the synchronization. If you now compare the output of the lscluster -g command with the lscluster -i command, or with the lscluster -g output from the previous example, you see that the entries about en1 (in our example) no longer appear. The list of networks that can potentially be used for heartbeat is shorter.
Node powerha-c2n1.munich.de.ibm.com
Node UUID = 55284db0-e6a7-11e5-8035-4217e0ce7b02
Number of interfaces discovered = 2
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E0:CE:7B:02
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.150.121 broadcast 172.16.255.255 netmask
255.255.0.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 2, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 750
Mean deviation in network RTT across interface = 1500
Probe interval for interface = 22500 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
Node powerha-c2n2.munich.de.ibm.com
Node UUID = 55284df6-e6a7-11e5-8035-4217e0ce7b02
Number of interfaces discovered = 2
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E4:E6:1B:02
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.150.122 broadcast 172.16.255.255 netmask
255.255.0.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 2, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 750
Mean deviation in network RTT across interface = 1500
Probe interval for interface = 22500 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
#
Because we did not change the architecture of our cluster, the output of the lscluster -i
command is still the same as listed in Example 4-6 on page 72.
You must synchronize your cluster before the change to private is visible in CAA.
Example 4-11 shows the lscluster -g output after the synchronization. If you now compare
the output of the lscluster -g command with the previous lscluster -i command, or with
the lscluster -g output in “Using all interfaces” on page 71, you see that the entries about
en1 (in our example) do not appear.
When you compare the content of Example 4-11 with the content of Example 4-9 on page 76
in “One network set to private” on page 76, you see that the output of the lscluster -g
commands is identical.
Node powerha-c2n1.munich.de.ibm.com
Node UUID = 63b68a36-e61b-11e5-8016-4217e0ce7b02
Number of interfaces discovered = 2
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E0:CE:7B:02
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.150.121 broadcast 172.16.255.255 netmask
255.255.0.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 2, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 750
Mean deviation in network RTT across interface = 1500
Probe interval for interface = 22500 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
Node powerha-c2n2.munich.de.ibm.com
Node UUID = 63b68a86-e61b-11e5-8016-4217e0ce7b02
Number of interfaces discovered = 2
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = 42:17:E4:E6:1B:02
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.150.122 broadcast 172.16.255.255 netmask
255.255.0.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.150.121
Interface number 2, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 750
Mean deviation in network RTT across interface = 1500
Probe interval for interface = 22500 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
root@powerha-c2n1:/>
#
Note: The network_fdt tunable is also available in PowerHA V7.1.3. To get it for your
PowerHA V7.1.3 version, you must open a PMR and request the Tunable FDT interim fix
bundle.
The self-adjusting network heartbeating behavior (CAA) that was introduced with PowerHA V7.1.0 is still present and used. It has no impact on the network failure detection time.
The network_fdt tunable can be set to zero to maintain the default behavior. In newer versions, it might be enforced to be at least 5 seconds. The minimum delta between the network_fdt tunable and the node_timeout tunable must be 10 seconds.
The default recognition time for a network problem is not affected by this tunable. It is 0 for
hard failures and 5 seconds for soft failures (since PowerHA V7.1.0). CAA continues to check
the network, but it waits until the end of the defined timeout to create a network down event.
For PowerHA nodes, when the effective level of CAA is 4, also known as the 2015 release,
CAA automatically sets the network_fdt to 20 seconds and the node_timeout to 30 seconds.
To check for the effective CAA level, use the lscluster -c command. The last two lines of the lscluster -c output list the local CAA level and the effective CAA level. In normal situations, these two show the same level. During an operating system update, they can temporarily show different levels. Example 4-12 shows the numbers that you get when you are on CAA level 4.
Note: Depending on the installed AIX Service Pack and fixes, the CAA level might not be displayed.
In this case, the only way to know whether the CAA level is 4 is to check whether AUTO_REPOS_REPLACE is listed in the effective cluster-wide capabilities in the output of the lscluster -c command.
Example 4-13 shows how to both check and change the CAA network tunable attribute for PowerHA V7.1.3 by using the native CAA clctrl command. Keep in mind that this applies to the PowerHA V7.1.3 FDT interim fix only. In this case, the values are listed in milliseconds.
Attention: Do not use the clctrl command to change the network_fdt tunable. Remember to use clmgr or SMIT.
Example 4-13 applies only to old versions where clmgr or SMIT cannot be used.
For PowerHA V7.2 and newer versions of the FDT interim fix bundle, the correct way to change the tunables is by using clmgr or SMIT. Example 4-14 shows the SMIT panel where the network_fdt and node_timeout tunables can be changed. The values must be specified in seconds. In SMIT, the network_fdt tunable is labeled Network Failure Detection Time. The SMIT path is smit sysmirror → Custom Cluster Configuration → Cluster Nodes and Networks → Manage the Cluster → Cluster heartbeat settings, or use the fast path smit cm_chng_tunables.
Important: Only use clmgr or smit to change the network_fdt and/or the node_timeout
tunable.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Network Failure Detection Time [20] #
* Node Failure Detection Timeout [30] #
* Node Failure Detection Grace Period [10] #
* Node Failure Detection Timeout during LPM [0] #
* LPM Node Policy [manage] +
Remember that when you increase the FDT value, you must also increase the value for the Node Failure Detection Timeout. The minimum delta between these two values must be 10 seconds. So, if we want to change the FDT to 40 seconds and keep the minimum delta, the command looks like the following example:
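The following form is a sketch that assumes the clmgr cluster attribute names NETWORK_FAILURE_DETECTION_TIME and NODE_FAILURE_DETECTION_TIMEOUT; verify the exact attribute names at your clmgr level before using it:
clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=40 NODE_FAILURE_DETECTION_TIMEOUT=50
The change must then be verified and synchronized across the cluster.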
With later versions of PowerHA, features were added to make the cluster more resilient if a PowerHA repository disk fails. The ability to survive a repository disk failure, in addition to the ability to manually replace a repository disk without an outage, increased the resiliency of PowerHA. PowerHA V7.2.0 introduced a feature that increases the resiliency further, which is called Automatic Repository Update (ARU).
The purpose of ARU is to automate the replacement of a failed active repository disk without intervention from a system administrator and without affecting the active cluster services. All that is needed is to point PowerHA to the backup repository disks to use if the active repository disk fails.
When the active repository disk fails, PowerHA detects the failure and verifies that the disk is no longer usable. If it is unusable, PowerHA attempts to switch to the backup repository disk. If the switch is successful, the backup repository disk becomes the active repository disk.
For more information about the PowerHA repository disk requirements, see IBM Knowledge
Center.
This section shows an example of ARU in a 2-site, 2-node cluster. The cluster configuration is
similar to Figure 4-1.
Figure 4-1 Storage example for PowerHA ARU showing linked and backup repository disks
For the purposes of this example, we configure a backup repository disk for each site of this
2-site cluster.
Configuring a backup repository disk
The following process details how to configure a backup repository disk. For our example, we
perform this process for each site in our cluster.
1. Using SMIT, run smitty sysmirror and select Cluster Nodes and Networks → Manage Repository Disks → Add a Repository Disk. Because our example is a 2-site cluster, you are prompted for a site, and then given a selection of possible repository disks. The panels that are shown in the following sections provide more details.
When you select Add a Repository Disk, you are prompted to select a site, as shown in
Example 4-15.
+--------------------------------------------------------------------------+
| Select a Site |
| |
| Move cursor to desired item and press Enter. |
| |
| primary_site1 |
| standby_site2 |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
2. After you select primary_site1, the repository disk menu opens, as shown in Example 4-16.
[Entry Fields]
Site Name primary_site1
* Repository Disk [] +
[Entry Fields]
Site Name primary_site1
* Repository Disk [] +
+--------------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| hdisk3 (00f61ab295112078) on all nodes at site primary_site1 |
| hdisk4 (00f61ab2a61d5bc6) on all nodes at site primary_site1 |
| hdisk5 (00f61ab2a61d5c7e) on all nodes at site primary_site1 |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F7=Select F8=Image F10=Exit |
F5| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
4. After selecting the appropriate disk, the choice is shown in Example 4-18.
[Entry Fields]
Site Name primary_site1
* Repository Disk [(00f61ab295112078)] +
5. Next, after pressing the Enter key to make the changes, the confirmation panel appears,
as shown in Example 4-19.
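As an alternative to the SMIT panels, the backup repository disk can also be added from the command line. The following is a hedged sketch that uses the disk and site names from our example; verify the exact clmgr syntax at your level:
clmgr add repository hdisk3 SITE=primary_site1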
To induce a failure of the primary repository disk, we log in to the Virtual I/O Server (VIOS)
servers that present storage to the cluster LPARs and deallocate the disk LUN that
corresponds to the primary repository disk on one site of our cluster. This disables the
primary repository disk, and PowerHA ARU detects the failure and automatically activates the
backup repository disk as the active repository disk.
This section presents the following examples that are captured during this process:
1. Before disabling the primary repository disk, we look at the lspv command output and
note that the active repository disk is hdisk1, as shown in Example 4-20.
2. We then proceed to log in to the VIOS servers that present the repository disk to this
logical partition (LPAR) and de-allocate that logical unit (LUN) so that the cluster LPAR no
longer has access to that disk. This causes the primary repository disk to fail.
Example 4-21 The /var/adm/ras/syslog.caa file showing repository disk failure and recovery
Nov 12 09:13:29 primo_s2_n1 caa:info cluster[14025022]: caa_config.c
run_list 1377 1 = = END REPLACE_REPOS Op = = POST Stage = =
Nov 12 09:13:30 primo_s2_n1 caa:err|error cluster[14025022]: cluster_utils.c
cluster_repository_read 5792 1 Could not open cluster repository
device /dev/rhdisk1: 5
Nov 12 09:13:30 primo_s2_n1 caa:info cluster[14025022]: cluster_utils.c
cl_kern_repos_check 11769 1 Could not read the respository.
Nov 12 09:13:30 primo_s2_n1 caa:info cluster[14025022]: cluster_utils.c
cl_run_log_method 11862 1 START '/usr/sbin/importvg -y
caavg_private_t -O hdisk1'
Nov 12 09:13:32 primo_s2_n1 caa:info cluster[14025022]: cluster_utils.c
cl_run_log_method 11893 1 FINISH return = 1
Nov 12 09:13:32 primo_s2_n1 caa:info cluster[14025022]: cluster_utils.c
cl_run_log_method 11862 1 START '/usr/sbin/reducevg -df
caavg_private_t hdisk1'
Nov 12 09:13:32 primo_s2_n1 caa:info cluster[14025022]: cluster_utils.c
cl_run_log_method 11893 1 FINISH return = 1
Nov 12 09:13:33 primo_s2_n1 caa:err|error cluster[14025022]: cluster_utils.c
cluster_repository_read 5792 1 Could not open cluster repository
device /dev/rhdisk1: 5
Nov 12 09:13:33 primo_s2_n1 caa:info cluster[14025022]: cl_chrepos.c
destroy_old_repository 344 1 Failed to read repository data.
Nov 12 09:13:34 primo_s2_n1 caa:err|error cluster[14025022]: cluster_utils.c
cluster_repository_write 5024 1 return = -1, Could not open
cluster repository device /dev/rhdisk1: I/O error
Nov 12 09:13:34 primo_s2_n1 caa:info cluster[14025022]: cl_chrepos.c
destroy_old_repository 350 1 Failed to write repository data.
Nov 12 09:13:34 primo_s2_n1 caa:warn|warning cluster[14025022]: cl_chrepos.c
destroy_old_repository 358 1 Unable to destroy repository disk
hdisk1. Manual intervention is required to clear the disk of cluster
identifiers.
Nov 12 09:13:34 primo_s2_n1 caa:info cluster[14025022]: cl_chrepos.c
automatic_repository_update 2242 1 Replaced hdisk1 with hdisk2
Nov 12 09:13:34 primo_s2_n1 caa:info cluster[14025022]: cl_chrepos.c
automatic_repository_update 2255 1 FINISH rc = 0
Nov 12 09:13:34 primo_s2_n1 caa:info cluster[14025022]: caa_protocols.c
recv_protocol_slave 1542 1 Returning from Automatic Repository
replacement rc = 0
4. As an extra verification, the AIX error log has an entry showing that a successful
repository disk replacement occurred, as shown in Example 4-22.
Example 4-22 AIX error log showing successful repository disk replacement message
LABEL: CL_ARU_PASSED
IDENTIFIER: 92EE81A5
Node Id: primo_s2_n1
Class: H
Type: INFO
WPAR: Global
Resource Name: CAA ARU
Resource Class: NONE
Resource Type: NONE
Location:
Description
Automatic Repository Update succeeded.
Probable Causes
Primary repository disk was replaced.
Failure Causes
A hardware problem prevented local node from accessing primary repository disk.
Recommended Actions
Primary repository disk was replaced using backup repository disk.
Detail Data
Primary Disk Info
hdisk1 6c1b76e1-3e0a-ff3c-3c43-cb6c3881c3bf
Replacement Disk Info
hdisk2 5890b139-e987-1451-211e-24ba89e7d1df
It is safe to remove the failed repository disk and replace it. The replacement disk can
become the new backup repository disk by following the steps in “Configuring a backup
repository disk” on page 84.
Example 4-24 Output of the AIX errpt command showing failed repository disk replacement
LABEL: CL_ARU_FAILED
IDENTIFIER: F63D60A2
Description
Automatic Repository Update failed.
Probable Causes
Unknown.
Failure Causes
Unknown.
Recommended Actions
Try manual replacement of cluster repository disk.
Detail Data
Primary Disk Info
hdisk1 6c1b76e1-3e0a-ff3c-3c43-cb6c3881c3bf
6. In addition, we note that ARU verifies the primary repository disk and fails, as shown in the
CAA log /var/adm/ras/syslog.caa in Example 4-25.
Nov 12 09:13:20 primo_s2_n1 caa:info cluster[14025022]: cluster_utils.c
cl_run_log_method 11862 1 START '/usr/lib/cluster/caa_syslog '
Nov 12 09:13:20 primo_s2_n1 caa:info unix: kcluster_event.c find_event_disk 742
Find disk called for hdisk4
Nov 12 09:13:20 primo_s2_n1 caa:info unix: kcluster_event.c
ahafs_Disk_State_register 1504 diskState set opqId = 0xF1000A0150301A00
Nov 12 09:13:20 primo_s2_n1 caa:info cluster[14025022]: cluster_utils.c
cl_run_log_method 11893 1 FINISH return = 0
Nov 12 09:13:20 primo_s2_n1 caa:info cluster[14025022]: caa_message.c
inherit_socket_inetd 930 1 IPv6=::ffff:127.0.0.1
Nov 12 09:13:20 primo_s2_n1 caa:info cluster[14025022]: cluster_utils.c
cl_kern_repos_check 11769 1 Could not read the respository.
Nov 12 09:13:20 primo_s2_n1 caa:info cluster[14025022]: caa_message.c cl_recv_req
172 1 recv successful, sock = 0, recv rc = 32, msgbytes = 32
Nov 12 09:13:20 primo_s2_n1 caa:info cluster[14025022]: caa_protocols.c
recv_protocol_slave 1518 1 Automatic Repository Replacement request being
processed.
7. ARU attempts to activate the backup repository disk, but it fails because an AIX VG previously existed on the disk, as shown in Example 4-26.
Example 4-26 Messages from the /var/adm/ras/syslog.caa log file showing an ARU failure
Nov 12 09:11:26 primo_s2_n1 caa:info unix: kcluster_lock.c xcluster_lock 659
xcluster_lock: nodes which responded: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Nov 12 09:11:26 primo_s2_n1 caa:info cluster[8716742]: cluster_utils.c
cl_run_log_method 11862 1 START '/usr/sbin/mkvg -y caavg_private_t hdisk2'
Nov 12 09:11:26 primo_s2_n1 caa:info cluster[8716742]: cluster_utils.c
cl_run_log_method 11893 1 FINISH return = 1
Nov 12 09:11:26 primo_s2_n1 caa:err|error cluster[8716742]: cl_chrepos.c
check_disk_add 2127 1 hdisk2 contains an existing vg.
Nov 12 09:11:26 primo_s2_n1 caa:info cluster[8716742]: cl_chrepos.c
automatic_repository_update 2235 1 Failure to move to hdisk2
Nov 12 09:11:26 primo_s2_n1 caa:info cluster[8716742]: cl_chrepos.c
automatic_repository_update 2255 1 FINISH rc = -1
Nov 12 09:11:26 primo_s2_n1 caa:info cluster[8716742]: caa_protocols.c
recv_protocol_slave 1542 1 Returning from Automatic Repository replacement
rc = -1
Example 4-27 Site selection prompt after selecting “Replace the Primary Repository Disk”
Problem Determination Tools
[MORE...1]
View Current State
PowerHA SystemMirror Log Viewing and Management
2. In our example, we select standby_site2 and a panel opens with an option to select the
replacement repository disk, as shown in Example 4-28.
3. Pressing the F4 key shows the available backup repository disks, as shown in
Example 4-29.
[Entry Fields]
Site Name standby_site2
* Repository Disk [] +
+--------------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press Enter. |
| |
| 00f6f5d0ba49cdcc |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
4. Selecting the backup repository disk opens the SMIT panel showing the selected disk, as
shown in Example 4-30.
[Entry Fields]
Site Name standby_site2
* Repository Disk [00f6f5d0ba49cdcc] +
5. Finally, pressing the Enter key runs the repository disk replacement. After the repository disk is replaced, the panel that is shown in Example 4-31 opens.
When this option is used, all cluster managers are initialized, and then all RGs are started. Under the covers, the system performs a manual cluster start first, and then activates all RGs.
This option is helpful if you have start dependencies that are defined across more than one node.
Figure 4-2 shows what can happen if RG dependencies are defined and the cluster is started by using the automatic manage option. The left side shows what we intend to achieve, and the right side shows what can happen. This example defines three RGs with a start after dependency. All RGs have the same startup policy but different home nodes.
Figure 4-3 on page 94 shows how the cluster start happens when the new delayed option is used. Compared to the example in Figure 4-2, nothing changes from a configuration point of view. The only difference is that the cluster is started with the manage=delayed option.
Figure 4-3 Cluster start using manage option delayed
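A hedged sketch of starting the whole cluster with this option follows; it assumes the MANAGE=delayed value of the clmgr start command that is described above:
clmgr start cluster MANAGE=delayed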
The RMC application programming interface (API) is the only interface that applications can use to exchange data with the RSCT components. RMC manages the RMs and receives data from them. Group Services is a client of RMC. If PowerHA V7 is installed, it connects to CAA; otherwise, it connects to RSCT Topology Services.
RSCT domains
An RSCT management domain is a set of nodes with resources that can be managed and
monitored from one of the nodes, which is designated as the management control point
(MCP). All other nodes are considered to be managed nodes. Topology Services and Group
Services are not used in a management domain. Figure 4-5 shows the high-level architecture of an RSCT management domain.
An RSCT peer domain is a set of nodes that have a consistent knowledge of the existence of
each other, and of the resources shared among them. On each node within the peer domain,
RMC depends on a core set of cluster services, which include Topology Services, Group
Services, and cluster security services. Figure 4-6 shows the high-level architecture of an
RSCT peer domain.
Figure 4-7 shows the high-level architecture for how an RSCT-managed domain and RSCT
peer domains can be combined. In this example, Node Y is an RSCT management server.
You have three nodes as managed nodes (Node A, Node B, and Node C). Node B and Node
C are part of an RSCT peer domain.
You can have multiple peer domains within a managed domain. A node can be part of a
managed domain and a peer domain. A given node can belong to only a single peer domain,
as shown in Figure 4-7.
Example of a management and a peer domain
The example here is simplified. It shows one Hardware Management Console (HMC) that is managing three LPARs, where two of them are used for a 2-node PowerHA cluster.
In a Power Systems environment, the HMC is always the management server in the RSCT
management domain. The LPARs are clients to this server from an RSCT point of view. For
example, this management domain is used to do dynamic LPAR (DLPAR) operations on the
different LPARs.
The peer domain that represents a CAA cluster acquires configuration information and
liveness results from CAA. It introduces some differences in the mechanics of peer domain
operations, but few in the view of the peer domain that is available to the users.
Only one CAA cluster can be defined on a set of nodes. Therefore, if a CAA cluster is defined,
the peer domain that represents it is the only peer domain that can exist, and it exists and is
online for the life of the CAA cluster.
When your cluster is configured and synchronized, you can check the RSCT peer domain by
using the lsrpdomain command. To list the nodes in this peer domain, you can use the
lsrpnode command. Example 4-32 shows a sample output of these commands.
The RSCTActiveVersion number of the lsrpdomain output can show a back-level version
number. This is the lowest RSCT version that is required by a new joining node. In a PowerHA
environment, there is no need to modify this number.
The value of yes for MixedVersions means that you have at least one node with a higher version than the displayed RSCT version. The lsrpnode command lists the RSCT version that is used by each node.
To be clear, doing such an update does not give you any advantages in a PowerHA environment. In fact, if you delete the cluster and then re-create it manually or by using an existing snapshot, the RSCT peer domain version goes back to the original version, which was 3.1.5.0 in our example.
Example 4-34 shows that in our case CAA is running on the local node, where we used the lscluster command, but CAA was stopped on the remote node.
To stop CAA, we use the clmgr off node powerha-c2n2 STOP_CAA=yes command.
Example 4-35 shows what the RSCT looks like in our 2-node cluster.
Because we define each of our nodes to a different site, the lscluster -c command lists only
one node. Example 4-36 shows an example output from node 1.
Local node maximum capabilities: CAA_NETMON, AUTO_REPOS_REPLACE, HNAME_CHG,
UNICAST, IPV6, SITE
Effective cluster-wide capabilities: CAA_NETMON, AUTO_REPOS_REPLACE, HNAME_CHG,
UNICAST, IPV6, SITE
#
Figure 4-11 PowerHA, Reliable Scalable Clustering Technology, and Cluster Aware AIX overview
4.5.1 Configuring PowerHA, Reliable Scalable Clustering Technology, and
Cluster Aware AIX
There is no need to configure RSCT or CAA. You just need to configure or migrate PowerHA.
To set it up, use the smitty sysmirror panels or the clmgr command, as shown in
Figure 4-12. The different migration processes operate in a similar way.
Figure 4-12 Set up PowerHA, Reliable Scalable Clustering Technology, and Cluster Aware AIX
In normal situations, there is no need to use CAA or RSCT commands because they are all managed by PowerHA.
To check whether the services are up, you can use different commands. In the following
examples, we use the clmgr, clRGinfo, lsrpdomain, and lscluster commands. Example 4-38
shows the output of the clmgr and clRGinfo PowerHA commands.
To check whether RSCT is running, use the lsrpdomain command. Example 4-39 shows the
output of the command.
Example 4-39 Checking for RSCT when all components are running
# lsrpdomain
Name OpState RSCTActiveVersion MixedVersions TSPort GSPort
CL1_N1_cluster Online 3.1.5.0 Yes 12347 12348
#
To check whether CAA is running correctly, we use the lscluster command, which requires that an option is specified. We use the -m option in Example 4-40. In most cases, any other valid option can be used as well; however, to be sure, use the -m option.
Generally, when you get valid output, CAA is running. Otherwise, you get an error message informing you that the cluster services are not active.
The following examples use the same commands as in “All PowerHA components are up” on
page 105 to check the status of the different components. Example 4-41 shows the output of
the clmgr and clRGinfo PowerHA commands.
As expected, the output of the lsrpdomain RSCT command shows that RSCT is still online
(see Example 4-42).
Also, as expected, checking for CAA shows that it is running, as shown in Example 4-43.
Again, we use the commands used in “All PowerHA components are up” on page 105 to
check the status of the different components. Example 4-44 shows the output of the PowerHA
commands clmgr and clRGinfo.
As expected, the clmgr command shows that PowerHA is offline, and clRGinfo returns an
error message.
The output of the RSCT lsrpdomain command shows that RSCT is still online
(Example 4-45).
The check for CAA shows that it is running, as shown in Example 4-46.
When RSCT is running, CAA must be up as well. This statement is only true for a PowerHA
cluster.
There are situations when you need to stop all three cluster components, for example, when
you must change the RSCT or CAA software, as shown in Figure 4-16 on page 109.
Example 4-47 shows the status of the cluster with all services stopped. As in the previous
examples, we use the clmgr and clRGinfo commands.
The lsrpdomain command shows that the RSCT cluster is offline, as shown in Example 4-48.
The output of the lscluster command creates an error message in this case, as shown in
Example 4-49.
4.5.3 How to start and stop CAA and RSCT
CAA and RSCT are stopped and started together. They are automatically started as part of an operating system start (if they were configured by PowerHA).
If you want to stop CAA and RSCT, you must use the clmgr command (at the time of writing, SMIT does not support this operation) with the STOP_CAA=yes argument. The argument stops both CAA and RSCT, and it can be applied to the complete cluster or to a set of nodes.
The fact that you stopped CAA manually is preserved across an operating system restart. So, if you want to start PowerHA on a node where CAA and RSCT were stopped deliberately, you must use the START_CAA argument.
To start CAA and RSCT, you can use the clmgr command with the argument START_CAA=yes.
This command also starts PowerHA.
Example 4-50 shows how to stop or start CAA and RSCT. All of these examples stop all three
components or start all three components.
Example 4-50 Using clmgr to start and stop CAA and RSCT
To Stop CAA and RSCT:
- clmgr off cluster STOP_CAA=yes
- clmgr off node system-a STOP_CAA=yes
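The corresponding start commands, sketched from the START_CAA=yes argument that is described above, are:
To Start CAA and RSCT (and PowerHA):
- clmgr on cluster START_CAA=yes
- clmgr on node system-a START_CAA=yes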
Starting with AIX 7.1 TL4 or AIX 7.2, you can use the clctrl command to stop or start CAA
and RSCT. To stop it, use the -stop option for the clctrl command. This also stops
PowerHA. To start CAA and RSCT, you can use the -start option. If -start is used, only
CAA and RSCT start. To start PowerHA, you must use the clmgr command, or use SMIT
afterward.
Chapter 5. Migration
This chapter covers the migration options from PowerHA V7.1.3 to PowerHA V7.2.
Before beginning the migration procedure, always have a contingency plan in case any
problems occur. Here are some general suggestions:
Create a backup of rootvg.
In some cases of upgrading PowerHA, depending on the starting point, updating or
upgrading the AIX base operating system is also required. Therefore, a preferred practice
is to save your existing rootvg. One method is to create a clone by using alt_disk_copy
on other available disks on the system. That way, a simple change to the bootlist and a
restart can easily return the system to the beginning state.
Other options are available, such as mksysb, alt_disk_install, and multibos.
Save the existing cluster configuration.
Create a cluster snapshot before the migration. By default, it is stored in the following directory. Make a copy of it, and also keep a copy off the cluster nodes for additional safety (see the sketch after this list).
/usr/es/sbin/cluster/snapshots
Save any user-provided scripts.
This most commonly refers to custom events, pre- and post-events, application controller,
and application monitoring scripts.
Save common configuration files that are needed for proper functioning, such as:
/etc/hosts
/etc/cluster/rhosts
/usr/es/sbin/cluster/netmon.cf
Verify, by using the lslpp -h cluster.* command, that the current version of PowerHA is in
the COMMIT state and not in the APPLY state. If not, run smit install_commit before you
install the most recent software version.
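As a rough sketch, the preparation steps above map to commands similar to the following ones (hdisk1 is a placeholder for a free disk on your system; adjust the names to your environment):
# alt_disk_copy -d hdisk1 (clone rootvg to a spare disk)
# cp -Rp /usr/es/sbin/cluster/snapshots /tmp/snapshots.save (keep an extra copy of the cluster snapshots)
# lslpp -h "cluster.*" (confirm that all PowerHA filesets are in the COMMIT state)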
Software requirements
The software requirements are as follows:
IBM AIX 7.1 with Technology Level 3 with Service Pack 5, or later
IBM AIX 7.1 with Technology Level 4 with Service Pack 2, or later
IBM AIX 7.2 with Service Pack 1, or later
IBM AIX 7.2 with Technology Level 1, or later
Hardware
Support is available only for POWER5 technologies and later.
Important: Always start with the latest service packs that are available for PowerHA, AIX,
and Virtual I/O Server (VIOS).
Rolling method
A rolling migration provides the least amount of downtime by upgrading one node at a time.
Important: Always start with the latest service packs that are available for PowerHA, AIX,
and VIOS.
Snapshot method
Some of these steps can often be performed in parallel because the entire cluster is offline.
Additional specifics when migrating from PowerHA 6.1, including crucial interim fixes, can be
found in the PowerHA SystemMirror interim fix Bundles information.
Important: Always start with the latest service packs that are available for PowerHA, AIX,
and VIOS.
Nondisruptive upgrade
This method applies only when the AIX level is already at appropriate levels to support
PowerHA V7.2.1 or later. Complete the following steps on one node:
1. Stop cluster services by unmanaging the RGs.
2. Upgrade PowerHA (update_all).
3. Start cluster services with an automatic manage of the RGs.
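As a sketch, assuming a node that is named Cass and the PowerHA update images in the current directory, these three steps correspond to the following commands (the same commands appear in the examples later in this chapter):
# clmgr stop node=Cass manage=unmanage
# install_all_updates -vY -d .
# clmgr start node=Cass manage=auto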
Important: When restarting cluster services with the Automatic option for managing RGs,
the application start scripts are invoked. Make sure that the application scripts can detect
that the application is already running, or copy them and put a dummy blank executable
script in their place and then copy them back after start.
Important: Migrating from PowerHA V6.1 to V7.2.1 is not supported. You must upgrade to
either V7.1.x or V7.2.0 first.
From V6.1: Update to SP17, then Rb, S, O are all viable options; Rb, S, O; N/A
From V7.1.0: Rb, S, O; Rb, S, O; Rb, S, O; Rb, S, O; N/A
From V7.1.1: R, S, O, Nb; R, S, O, Nb; R, S, O, Nb; N/A
From V7.1.2: R, S, O, Nb; R, S, O, Nb; N/A
From V7.1.3: R, S, O, Nb; R, S, O, Nb
From V7.2.0: R, S, O, Nb
a. R: Rolling, S: Snapshot, O: Offline, and N: Nondisruptive.
b. This option is available only if the beginning AIX level is high enough to support the newer
version.
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [Cass] +
BROADCAST cluster shutdown? true +
* Select an Action on Resource Groups Bring Resource Groups> +
2. Upgrade AIX.
In our scenario, we have supported AIX levels for PowerHA V7.2.1 and do not need to
perform this step. But if you do, a restart is required before continuing.
3. Verify that the clcomd daemon is active, as shown in Figure 5-3.
5. Ensure that the file /usr/es/sbin/cluster/netmon.cf exists and that it contains at least
one pingable IP address because the installation or upgrade of PowerHA filesets can
overwrite this file with an empty one.
6. Start cluster services on node Cass by running smitty clstart or clmgr start
node=Cass.
A message displays about cluster verification being skipped because of mixed versions,
as shown in Figure 5-5 on page 119.
Important: While the cluster is in a mixed-version state, do not make any cluster changes
or attempt to synchronize the cluster.
After starting, validate that the cluster is stable before continuing by running the following
command:
lssrc -ls clstrmgrES | grep -i state
7. Repeat the previous steps for node Jess. However, when stopping cluster services, choose
the Move Resource Groups option, as shown in Figure 5-6.
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [Jess] +
BROADCAST cluster shutdown? true +
* Select an Action on Resource Groups Move Resource Groups +
Important: If upgrading to AIX 7.2.0, see the AIX 7.2 Release Notes regarding RSCT
filesets when upgrading.
In our scenario, we have supported AIX levels for PowerHA V7.2 and do not need to
perform this step. But if you do, a restart is required before continuing.
9. Verify that the clcomd daemon is active, as shown in Figure 5-7.
10.Upgrade PowerHA on node Jess. To upgrade PowerHA, run smitty update_all, as shown
in Figure 5-4 on page 118, or run the following command from within the directory in which
the updates are:
install_all_updates -vY -d .
11.Ensure that the file /usr/es/sbin/cluster/netmon.cf exists and that it contains at least
one pingable IP address because the installation or upgrade of PowerHA filesets can
overwrite this file with an empty one.
Important: Both nodes must show version=17; otherwise, the migration did not complete
successfully. Call IBM Support.
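One way to check this on both nodes (a sketch; the exact output format can vary by level, and Example 5-1 shows the full output) is to run:
# clcmd lssrc -ls clstrmgrES | grep -i version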
Although the migration is complete, the resource is running on node Cass. If you want, move
the RG back to node Jess, as shown in Example 5-2.
Waiting for the cluster to process the resource group movement request....
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [Jess,Cass] +
BROADCAST cluster shutdown? true +
* Select an Action on Resource Groups Bring Resource Groups> +
Important: If upgrading to AIX 7.2.0, see the AIX 7.2 Release Notes regarding RSCT
filesets when upgrading.
In our scenario, we have supported AIX levels for PowerHA v7.2.1 and do not need to
perform this step. But if you do, a restart is required before continuing.
3. Verify that the clcomd daemon is active on both nodes, as shown in Figure 5-9.
-------------------------------
NODE Jess
-------------------------------
Subsystem Group PID Status
clcomd caa 20775182 active
-------------------------------
NODE Cass
-------------------------------
Subsystem Group PID Status
clcomd caa 5177840 active
Figure 5-9 Verify that clcomd is active
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [Jess,Cass] +
BROADCAST cluster shutdown? true +
* Select an Action on Resource Groups Bring Resource Groups> +
[Entry Fields]
* Cluster Snapshot Name [pre721migration] /
Custom-Defined Snapshot Methods [] +
* Cluster Snapshot Description [713 SP5 cluster]
Important: If upgrading to AIX 7.2.0, see the AIX 7.2 Release Notes regarding RSCT
filesets when upgrading.
In our scenario, we have supported AIX levels for PowerHA V7.2 and do not need to
perform this step. But if you do, a restart is required before continuing.
4. Verify that the clcomd daemon is active on both nodes, as shown in Figure 5-12.
-------------------------------
NODE Jess
-------------------------------
Subsystem Group PID Status
clcomd caa 20775182 active
-------------------------------
NODE Cass
-------------------------------
Subsystem Group PID Status
clcomd caa 5177840 active
Figure 5-12 Verify that clcomd is active
5. Next, uninstall PowerHA 6.1 on both nodes Jess and Cass by running smitty remove on
cluster.*.
6. Install PowerHA V7.2.1 by running smitty install_all on both nodes.
7. Convert the previously created snapshot as follows:
/usr/es/sbin/cluster/conversion/clconvert_snapshot -v 7.1.3 -s pre721migration
Extracting ODM's from snapshot file... done.
Converting extracted ODM's... done.
Rebuilding snapshot file... done.
[Entry Fields]
Cluster Snapshot Name pre721migration>
Cluster Snapshot Description 713 SP5 cluster>
Un/Configure Cluster Resources? [Yes] +
Force apply if verify fails? [No] +
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
Broadcast message from root@Cass (tty) at 11:39:28 ...
2. Upgrade PowerHA (update_all) by running the following command from within the
directory in which the updates are:
install_all_updates -vY -d .
3. Start cluster services by using an automatic manage of the RGs on Cass, as shown in
Example 5-4.
Example 5-4 Start the cluster node with the automatic manage option
# clmgr start node=Cass
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
Warning: "MANAGE" must be specified. Since it was not, a default of "auto" will
be used.
Important: Restarting cluster services with the Automatic option for managing RGs
invokes the application start scripts. Make sure that the application scripts can detect
that the application is already running, or copy and put a dummy blank executable script
in their place and then copy them back after start.
Example 5-5 Stop the cluster node with the unmanage option
# clmgr stop node=Jess manage=unmanage
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
Broadcast message from root@Jess (tty) at 11:52:48 ...
PowerHA SystemMirror on Jess shutting down. Please exit any cluster applications...
Jess: 0513-044 The clevmgrdES Subsystem was requested to stop.
.
"Jess" is now unmanaged.
Example 5-6 Start a cluster node with the automatic manage option
# clmgr start node=Jess
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
Warning: "MANAGE" must be specified. Since it was not, a default of "auto" will
be used.
Important: Restarting cluster services with the Automatic option for managing RGs
invokes the application start scripts. Make sure that the application scripts can detect
that the application is already running, or copy and put a dummy blank executable script
in their place and then copy them back after start.
7. Verify that the version numbers show correctly, as shown in Example 5-1 on page 120.
8. Ensure that the file /usr/es/sbin/cluster/netmon.cf exists on all nodes and that it
contains at least one pingable IP address because the installation or upgrade of PowerHA
filesets can overwrite this file with an empty one.
Although the version level is different, the steps are identical to those used when starting from
Version 7.2.0.
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [Cass] +
BROADCAST cluster shutdown? true +
* Select an Action on Resource Groups Bring Resource Groups> +
You can also stop cluster services by using the clmgr command:
clmgr stop node=Cass
2. Upgrade AIX.
In our scenario, we have supported AIX levels for PowerHA V7.2.1 and do not need to
perform this step. But if you do, a restart is required before continuing.
3. Verify that the clcomd daemon is active, as shown in Figure 5-16.
cluster.es.client.clcomd 7.2.1.0
cluster.es.client.lib 7.2.1.0
cluster.es.client.rte 7.2.1.0
cluster.es.client.utils 7.2.1.0
cluster.es.cspoc.cmds 7.2.1.0
cluster.es.cspoc.rte 7.2.1.0
cluster.es.migcheck 7.2.1.0
cluster.es.server.diag 7.2.1.0
cluster.es.server.events 7.2.1.0
cluster.es.server.rte 7.2.1.0
cluster.es.server.testtool 7.2.1.0
cluster.es.server.utils 7.2.1.0
cluster.license 7.2.1.0
cluster.es.client.clcomd 7.2.1.0
cluster.es.client.lib 7.2.1.0
cluster.es.client.rte 7.2.1.0
cluster.es.client.utils 7.2.1.0
cluster.es.cspoc.cmds 7.2.1.0
cluster.es.cspoc.rte 7.2.1.0
cluster.es.migcheck 7.2.1.0
cluster.es.server.diag 7.2.1.0
cluster.es.server.events 7.2.1.0
cluster.es.server.rte 7.2.1.0
cluster.es.server.testtool 7.2.1.0
cluster.es.server.utils 7.2.1.0
cluster.license 7.2.1.0
SUCCESSES
---------
Filesets listed in this section passed pre-installation verification
and will be installed.
+-----------------------------------------------------------------------------+
BUILDDATE Verification ...
+-----------------------------------------------------------------------------+
Verifying build dates...done
FILESET STATISTICS
------------------
13 Selected to be installed, of which:
13 Passed pre-installation verification
----
13 Total to be installed
+-----------------------------------------------------------------------------+
Installing Software...
+-----------------------------------------------------------------------------+
5765H3900
Copyright International Business Machines Corp. 2001, 2016.
5765H3900
Copyright International Business Machines Corp. 2010, 2016.
5765H3900
Copyright International Business Machines Corp. 1985, 2016.
5765H3900
Copyright International Business Machines Corp. 1985, 2016.
5765H3900
Copyright International Business Machines Corp. 2008, 2016.
5765H3900
Copyright International Business Machines Corp. 1985, 2016.
Some configuration files could not be automatically merged into the system
during the installation. The previous versions of these files have been
saved in a configuration directory as listed below. Compare the saved files
and the newly installed files to determine whether you need to recover
configuration data. Consult product documentation to determine how to
merge the data.
Please wait...
/usr/sbin/rsct/install/bin/ctposti
0513-071 The ctrmc Subsystem has been added.
0513-059 The ctrmc Subsystem has been started. Subsystem PID is 12583318.
0513-059 The IBM.ConfigRM Subsystem has been started. Subsystem PID is
11665748.
cthagsctrl: 2520-208 The cthags subsystem must be stopped.
0513-029 The cthags Subsystem is already active.
Multiple instances are not supported.
0513-095 The request for subsystem refresh was completed successfully.
done
+-----------------------------------------------------------------------------+
Summaries:
+-----------------------------------------------------------------------------+
Installation Summary
--------------------
Name Level Part Event Result
5. Ensure that the file /usr/es/sbin/cluster/netmon.cf exists and that it contains at least
one pingable IP address because the installation or upgrade of PowerHA filesets can
overwrite this file with an empty one.
6. Start cluster services on node Cass by running smitty clstart or clmgr start
node=Cass.
During the start, a message displays about cluster verification being skipped because of
mixed versions, as shown in Figure 5-17.
Important: While the cluster is in this mixed-version state, do not make any cluster
changes or attempt to synchronize the cluster.
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [Jess] +
BROADCAST cluster shutdown? true +
* Select an Action on Resource Groups Move Resource Groups +
Important: If upgrading to AIX 7.2.0, see the AIX 7.2 Release Notes regarding RSCT
filesets when upgrading.
In our scenario, we have supported AIX levels for PowerHA V7.2.1 and do not need to
perform this step. But if you do, a restart is required before continuing.
9. Verify that the clcomd daemon is active, as shown in Figure 5-19.
10.Upgrade PowerHA on node Jess. To upgrade PowerHA, run smitty update_all, as shown
in Figure 5-4 on page 118, or run the following command from within the directory in which
the updates are:
install_all_updates -vY -d .
11.Ensure that the file /usr/es/sbin/cluster/netmon.cf exists and that it contains at least
one pingable IP address because the installation or upgrade of PowerHA filesets can
overwrite this file with an empty one.
12.Start cluster services on node Jess by running smitty clstart or clmgr start node=Jess.
Important: Both nodes must show version=17; otherwise, the migration did not
complete. Call IBM Support.
14.Although the migration is complete, the resource is running on node Cass. If you want,
move the RG back to node Jess, as shown in Example 5-10.
Waiting for the cluster to process the resource group movement request....
Although the version level is different, the steps are identical to those used when starting from
Version 7.2.0.
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [Jess,Cass] +
BROADCAST cluster shutdown? true +
* Select an Action on Resource Groups Bring Resource Groups> +
Important: If upgrading to AIX 7.2.0, see the AIX 7.2 Release Notes regarding RSCT
filesets when upgrading.
In our scenario, we have supported AIX levels for PowerHA V7.2.1 and do not need to
perform this step. But if you do, a restart is required before continuing.
3. Verify that the clcomd daemon is active on both nodes, as shown in Figure 5-21.
-------------------------------
NODE Jess
-------------------------------
Subsystem Group PID Status
clcomd caa 20775182 active
-------------------------------
NODE Cass
-------------------------------
Subsystem Group PID Status
clcomd caa 5177840 active
Figure 5-21 Verify that clcomd is active
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
PowerHA SystemMirror on Jessica shutting down. Please exit any cluster applications...
Cass: 0513-004 The Subsystem or Group, clinfoES, is currently inoperative.
Cass: 0513-044 The clevmgrdES Subsystem was requested to stop.
Jess: 0513-004 The Subsystem or Group, clinfoES, is currently inoperative.
Jess: 0513-044 The clevmgrdES Subsystem was requested to stop.
...
Make sure that the cluster node is in the ST_INIT state by reviewing the output of clcmd lssrc -ls
clstrmgrES | grep state.
[Entry Fields]
* Cluster Snapshot Name [720cluster] /
Custom-Defined Snapshot Methods [] +
* Cluster Snapshot Description [720 SP1 cluster]
Important: If upgrading to AIX 7.2.0, see the AIX 7.2 Release Notes regarding RSCT
filesets when upgrading.
In our scenario, we have supported AIX levels for PowerHA V7.2.1 and do not need to
perform this step. But if you do, a restart is required before continuing.
4. Verify that the clcomd daemon is active on both nodes, as shown in Figure 5-24.
-------------------------------
NODE Jess
-------------------------------
Subsystem Group PID Status
clcomd caa 2102992 active
-------------------------------
NODE Cass
-------------------------------
Subsystem Group PID Status
clcomd caa 5110698 active
Figure 5-24 Verifying that clcomd is active
5. Uninstall PowerHA 6.1 on both nodes Jess and Cass by running smitty remove on
cluster.*.
6. Install PowerHA V7.2.1 by running smitty install_all on both nodes.
7. Convert the previously created snapshot:
/usr/es/sbin/cluster/conversion/clconvert_snapshot -v 7.2 -s 720cluster
Extracting ODM's from snapshot file... done.
Converting extracted ODM's... done.
Rebuilding snapshot file... done.
[Entry Fields]
Cluster Snapshot Name 720cluster>
Cluster Snapshot Description 720 SP1 cluster>
Un/Configure Cluster Resources? [Yes] +
Force apply if verify fails? [No] +
Although the version level is different, the steps are identical to those used when starting from
Version 7.2.0.
1. Stop cluster services by performing an unmanage of the RGs on node Cass, as shown in
Example 5-11.
Example 5-11 Stop the cluster node with the unmanage option
# clmgr stop node=Cass manage=unmanage
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
Broadcast message from root@Cass (tty) at 14:27:38 ...
2. Upgrade PowerHA (update_all) by running the following command from within the
directory in which the updates are (see Example 5-8 on page 128):
install_all_updates -vY -d .
3. Start the cluster services with an automatic manage of the RGs on Cass, as shown in
Example 5-12.
Example 5-12 Start the cluster node with the automatic manage option
# clmgr start node=Cass
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
Warning: "MANAGE" must be specified. Since it was not, a default of "auto" will
be used.
Important: Restarting cluster services with the Automatic option for managing RGs
invokes the application start scripts. Make sure that the application scripts can detect
that the application is already running, or copy and put a dummy blank executable script
in their place and then copy them back after start.
Example 5-13 Stop the cluster node with the unmanage option
# clmgr stop node=Jess manage=unmanage
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
Broadcast message from root@Jess (tty) at 14:52:58 ...
5. Upgrade PowerHA (update_all) by running the following command from within the
directory in which the updates are:
install_all_updates -vY -d .
A summary of the PowerHA filesets update is shown in Example 5-14.
+-----------------------------------------------------------------------------+
Summaries:
+-----------------------------------------------------------------------------+
Installation Summary
--------------------
Name Level Part Event Result
-------------------------------------------------------------------------------
cluster.license 7.2.1.0 USR APPLY SUCCESS
cluster.es.migcheck 7.2.1.0 USR APPLY SUCCESS
cluster.es.migcheck 7.2.1.0 ROOT APPLY SUCCESS
cluster.es.cspoc.rte 7.2.1.0 USR APPLY SUCCESS
cluster.es.cspoc.cmds 7.2.1.0 USR APPLY SUCCESS
cluster.es.cspoc.rte 7.2.1.0 ROOT APPLY SUCCESS
cluster.es.client.rte 7.2.1.0 USR APPLY SUCCESS
cluster.es.client.utils 7.2.1.0 USR APPLY SUCCESS
cluster.es.client.lib 7.2.1.0 USR APPLY SUCCESS
cluster.es.client.clcomd 7.2.1.0 USR APPLY SUCCESS
cluster.es.client.rte 7.2.1.0 ROOT APPLY SUCCESS
cluster.es.client.lib 7.2.1.0 ROOT APPLY SUCCESS
cluster.es.client.clcomd 7.2.1.0 ROOT APPLY SUCCESS
cluster.es.server.testtool 7.2.1.0 USR APPLY SUCCESS
cluster.es.server.rte 7.2.1.0 USR APPLY SUCCESS
cluster.es.server.utils 7.2.1.0 USR APPLY SUCCESS
cluster.es.server.events 7.2.1.0 USR APPLY SUCCESS
cluster.es.server.diag 7.2.1.0 USR APPLY SUCCESS
cluster.es.server.rte 7.2.1.0 ROOT APPLY SUCCESS
cluster.es.server.utils 7.2.1.0 ROOT APPLY SUCCESS
cluster.es.server.events 7.2.1.0 ROOT APPLY SUCCESS
cluster.es.server.diag 7.2.1.0 ROOT APPLY SUCCESS
Example 5-15 Start the cluster node with the automatic manage option
# clmgr start node=Jess
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
Warning: "MANAGE" must be specified. Since it was not, a default of "auto" will
be used.
Important: Restarting cluster services with the Automatic option for managing RGs
invokes the application start scripts. Make sure that the application scripts can detect
that the application is already running, or copy and put a dummy blank executable script
in their place and then copy them back after start.
7. Verify that the version numbers show correctly, as shown in Example 5-1 on page 120.
8. Ensure that the file /usr/es/sbin/cluster/netmon.cf exists on all nodes and that it
contains at least one pingable IP address because the installation or upgrade of PowerHA
filesets can overwrite this file with an empty one.
By integrating with DLPAR and CoD resources, PowerHA SystemMirror ensures that each
node can support the application with reasonable performance at a minimum cost. This way,
you can tune the capacity of the logical partition flexibly when your application requires more
resources, without having to pay for idle capacity until you need it (for On/Off CoD), or without
keeping acquired resources if you do not use them (for Enterprise Pool CoD).
You can configure cluster resources so that the logical partition with minimally allocated
resources serves as a standby node, and the application is on another LPAR node that has
more resources than the standby node. This way, you do not use any additional resources
that the frames have until the resources are required by the application.
PowerHA SystemMirror uses the system-connected HMC to perform DLPAR operation and
manage CoD resources.
Table 6-1 displays all available types of the CoD offering. Only two of them are dynamically
managed and controlled by PowerHA SystemMirror: EPCoD and On/Off CoD.
Utility CoD (temporary), Memory and Processor: Utility CoD is performed automatically at the PHYP/System level.
PowerHA cannot play a role in the same system.
Trial CoD
Trial CoD resources are temporary resources, but they are not set to On or Off to follow dynamic
needs. When a Trial CoD standard or exception code is entered into the HMC, these resources
are On immediately, and the elapsed time starts immediately. The amount of resources that is
granted by Trial CoD directly enters the available DLPAR resources. It is as though these
resources were configured as DLPAR resources.
Therefore, PowerHA SystemMirror can dynamically control the Trial CoD resources after the
customer manually enters a code to activate the resources through the HMC.
Figure 6-1 shows a summary of the SMIT menu navigation for all new ROHA panels. For the
new options of clmgr command, see 6.14.1, “The clmgr interface to manage Resource
Optimized High Availability” on page 237.
Table 6-2 Context-sensitive help for the Resource Optimized High Availability entry point
Resource Optimized High Availability (# smitty cm_cfg_roha): Choose this option to configure
ROHA. ROHA performs dynamic management of hardware resources (memory and CPU) on
behalf of PowerHA SystemMirror. This dynamic management of resources uses three types of
mechanisms: DLPAR, On/Off CoD, and Enterprise Pool CoD. If the resources that are available
on the central electronic complex are not sufficient, and cannot be obtained through a DLPAR
operation, it is possible to fetch external pools of resources that are provided by CoD: either
On/Off or Enterprise Pool. On/Off CoD can result in extra costs, and a formal agreement from
the user is required. The user must configure Hardware Management Consoles (HMCs) for the
acquisition and release of resources.
HMC Configuration
Hardware Resource Provisioning for Application Controller
Change/Show Default Cluster Tunables
Figure 6-3 Resource Optimized High Availability panel
Table 6-3 shows the help information for the ROHA panel.
Table 6-3 Context-sensitive help for the Resource Optimized High Availability panel
HMC Configuration (# smitty cm_cfg_hmc): This option configures the HMCs that are used by
your cluster configuration, and optionally associates an HMC with your cluster's nodes. If no
HMC is associated with a node, PowerHA SystemMirror uses the default cluster configuration.
Change/Show Hardware Resource Provisioning for Application Controller (# smitty
cm_cfg_hr_prov): This option changes or shows CPU and memory resource requirements for
any Application Controller that runs in a cluster that uses DLPAR, CoD, or Enterprise Pool CoD
capable nodes, or a combination.
Change/Show Default Cluster Tunables (# smitty cm_cfg_def_cl_tun): This option modifies or
views the DLPAR, CoD, and Enterprise Pool CoD configuration parameters.
HMC Configuration
Table 6-4 shows the help information for the HMC configuration.
Add HMC Definition (# smitty cm_cfg_add_hmc): Choose this option to add an HMC and its
communication parameters, and add this new HMC to the default list. All the nodes of the
cluster use these HMC definitions by default to perform DLPAR operations, unless you
associate a particular HMC with a node.
Change/Show HMC Definition (# smitty cm_cfg_ch_hmc): Choose this option to modify or view
an HMC host name and communication parameters.
Remove HMC Definition (# smitty cm_cfg_rm_hmc): Choose this option to remove an HMC,
and then remove it from the default list.
Change/Show HMC List for a Node (# smitty cm_cfg_hmcs_node): Choose this option to modify
or view the HMC list of a node.
Change/Show HMC List for a Site (# smitty cm_cfg_hmcs_site): Choose this option to modify or
view the HMC list of a site.
Change/Show Default HMC Tunables (# smitty cm_cfg_def_hmc_tun): Choose this option to
modify or view the HMC default communication tunables.
Change/Show Default HMC List (# smitty cm_cfg_def_hmcs): Choose this option to modify or
view the default HMC list that is used by default by all nodes of the cluster. Nodes that define
their own HMC list do not use this default HMC list.
Note: Before you add an HMC, you must set up password-less communication from the AIX
nodes to the HMC. For more information, see 6.4.1, “Consideration before Resource Optimized
High Availability configuration” on page 163.
To add an HMC, select Add HMC Definition. The next panel is a dialog window with a title dialog
header and several dialog command options. Its fast path is cm_cfg_add_hmc. Each item has a
context-sensitive help window that you access by pressing F1, and can have an associated
list (press F4).
Figure 6-5 shows the menu to add the HMC definition and its entry fields.
[Entry Fields]
* HMC name [] +
Table 6-5 Context-sensitive help and associated list for Add HMC Definition menu
HMC name: Enter the host name for the HMC. An IP address is also accepted here. Both IPv4
and IPv6 addresses are supported. Associated list (F4): Yes (single-selection). The list is
obtained by running the following command:
/usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc -a IP
Nodes: Enter the list of nodes that use this HMC. Associated list (F4): Yes (multiple-selection).
A list of nodes to be proposed can be obtained by running the following command:
odmget HACMPnode
Sites: Enter the sites that use this HMC. All nodes of the sites then use this HMC by default,
unless the node defines an HMC at its own level. Associated list (F4): Yes (multiple-selection).
A list of sites to be proposed can be obtained by running the following command:
odmget HACMPsite
Check connectivity between the HMC and nodes: Select Yes to check communication links
between the nodes and the HMC. Associated list (F4): <Yes>|<No>. The default is Yes.
If Domain Name Service (DNS) is configured in your environment and DNS can do resolution
for HMC IP and host name, then you can use F4 to select one HMC to perform the add
operation.
HMC name
e16hmc1 is 9.3.207.130
e16hmc3 is 9.3.207.133
PowerHA SystemMirror also supports entering the HMC IP address to add the HMC.
Figure 6-7 shows an example of entering one HMC IP address to add the HMC.
| HMC name
|
| Move cursor to desired item and press Enter.
|
| e16hmc1
| e16hmc3
|
| F1=Help F2=Refresh F3=Cancel
| Esc+8=Image Esc+0=Exit Enter=Do
| /=Find n=Find Next
Figure 6-8 Select an HMC from a list during a change or show HMC configuration
[Entry Fields]
* HMC name e16hmc1
HMC name
e16hmc1
e16hmc3
[Entry Fields]
* HMC name e16hmc1
Select a Node
ITSO_rar1m3_Node1
ITSO_r1r9m1_Node1
Press Enter on an existing node to modify it. The next panel (Figure 6-13) is a dialog window
with a title dialog header and two dialog command options.
You cannot add or remove an HMC from this list. You can only reorder (set in the correct
precedence order) the HMCs that are used by the node.
[Entry Fields]
* Node name ITSO_rar1m3_Node1
HMC list [e16hmc1 e16hmc3]
Figure 6-13 Change/Show HMC list for a Node
Table 6-6 shows the help information to change or show the HMC list for a node.
Table 6-6 Context-sensitive help for Change or Show HMC list for a Node
Node name: This is the node name to associate with one or more HMCs.
HMC list: The precedence order of the HMCs that are used by this node. The first in the list is
tried first, then the second, and so on. You cannot add or remove any HMC. You can modify
only the order of the already set HMCs.
Select a Site
site1
site2
Press Enter on an existing site to modify it. The next panel (Figure 6-15) is a dialog window
with a title dialog header and two dialog command options.
[Entry Fields]
* Site Name site1
HMC list [e16hmc1 e16hmc3]
Figure 6-15 Change/Show HMC List for a Site menu
You cannot add or remove an HMC from the list. You can only reorder (set in the correct
precedence order) the HMCs used by the site. See Table 6-7.
Site name: This is the site name to associate with one or more HMCs.
HMC list: The precedence order of the HMCs that are used by this site. The first in the list is
tried first, then the second, and so on. You cannot add or remove any HMC. You can modify
only the order of the already set HMCs.
[Entry Fields]
Resources Optimized High Availability management No +
can take advantage of On/Off CoD resources.
On/Off CoD use would incur additional costs.
Do you agree to use On/Off CoD and be billed
for extra costs?
Figure 6-19 On/Off CoD Agreement menu
This option can be modified later in the Change/Show Default Cluster Tunables panel, as
shown in Figure 6-22 on page 160.
App1
App2
To create a Hardware Resource Provisioning for an Application Controller, the list displays
only application controllers that do not already have hardware resource provisioning, as
shown in Figure 6-21.
To modify or remove a Hardware Resource Provisioning for an Application Controller, the list
displays application controllers that already have hardware resource provisioning.
Press Enter on an existing application controller to modify it. The next panel is a dialog
window with a title dialog header and three dialog command options. Each item has a
context-sensitive help window (press F1) and can have an associated list (press F4).
[Entry Fields]
* Application Controller Name App1
Application Controller Name: This is the application controller for which you configure DLPAR
and CoD resource provisioning.
Use desired level from the LPAR profile: There is no default value. You must make one of the
following choices:
Enter Yes if you want the LPAR hosting your node to reach only the level of resources that is
indicated by the desired level of the LPAR's profile. By choosing Yes, you trust the desired level
of the LPAR profile to fit the needs of your application controller.
Enter No if you prefer to enter exact optimal values for memory, processor (CPU), or both.
These optimal values match the needs of your application controller, and enable you to better
control the level of resources that are allocated to your application controller.
Enter nothing if you do not need to provision any resource for your application controller.
For all application controllers that have this tunable set to Yes, the allocation that is performed
lets the LPAR reach the desired value of the LPAR profile.
Suppose that you have a mixed configuration, in which some application controllers have this
tunable set to Yes, and other application controllers have this tunable set to No with some
optimal level of resources specified. In this case, the allocation that is performed lets the LPAR
reach the desired value of the profile added to the optimal values.
Optimal number of gigabytes of memory: Enter the amount of memory that PowerHA
SystemMirror attempts to acquire for the node before starting this application controller.
This Optimal number of gigabytes of memory value can be set only if the Use desired level
from the LPAR profile value is set to No.
Enter the value in multiples of ¼, ½, ¾, or 1 GB. For example, 1 represents 1 GB or 1024 MB,
1.25 represents 1.25 GB or 1280 MB, 1.50 represents 1.50 GB or 1536 MB, and 1.75
represents 1.75 GB or 1792 MB.
If this amount of memory is not satisfied, PowerHA SystemMirror takes resource group (RG)
recovery actions to move the RG with this application to another node. Alternatively, PowerHA
SystemMirror can allocate less memory depending on the Start RG even if resources are
insufficient cluster tunable.
Optimal number of dedicated processors: Enter the number of processors that PowerHA
SystemMirror attempts to allocate to the node before starting this application controller.
This attribute is only for nodes running on an LPAR with Dedicated Processing Mode.
This Optimal number of dedicated processors value can be set only if the Use desired level
from the LPAR profile value is set to No.
If this number of CPUs is not satisfied, PowerHA SystemMirror takes RG recovery actions to
move the RG with this application to another node. Alternatively, PowerHA SystemMirror can
allocate fewer CPUs depending on the Start RG even if resources are insufficient cluster
tunable.
For more information about how to acquire mobile resources at the RG onlining stage, see 6.6,
“Introduction to resource acquisition” on page 175. For more information about how to release
mobile resources at the RG offlining stage, see 6.7, “Introduction to release of resources” on
page 185.
Optimal number of processing units: Enter the number of processing units that PowerHA
SystemMirror attempts to allocate to the node before starting this application controller.
This attribute is only for nodes running on an LPAR with Shared Processing Mode.
This Optimal number of processing units value can be set only if the Use desired level from
the LPAR profile value is set to No.
Processing units are specified as a decimal number with two decimal places, 0.01 - 255.99.
This value is used only on nodes that support allocation of processing units.
If this number of CPUs is not satisfied, PowerHA SystemMirror takes RG recovery actions to
move the RG with this application to another node. Alternatively, PowerHA SystemMirror can
allocate fewer CPUs depending on the Start RG even if resources are insufficient cluster
tunable.
For more information about how to acquire mobile resources at the RG onlining stage, see 6.6,
“Introduction to resource acquisition” on page 175. For more information about how to release
mobile resources at the RG offlining stage, see 6.7, “Introduction to release of resources” on
page 185.
Optimal number of virtual processors: Enter the number of virtual processors that PowerHA
SystemMirror attempts to allocate to the node before starting this application controller.
This attribute is only for nodes running on an LPAR with Shared Processing Mode.
This Optimal number of dedicated or virtual processors value can be set only if the Use
desired level from the LPAR profile value is set to No.
If this number of virtual processors is not satisfied, PowerHA SystemMirror takes RG recovery
actions to move the RG with this application to another node. Alternatively, PowerHA
SystemMirror can allocate fewer CPUs depending on the Start RG even if resources are
insufficient cluster tunable.
To modify an application controller configuration, select Change/Show. The next panel is the
same selector window, as shown in Figure 6-21 on page 157. Press Enter on an existing
application controller to modify it. The next panel is the same dialog window shown in
Figure 6-21 on page 157 (except the title, which is different).
To delete an application controller configuration, select Remove. The next panel is the same
selector window that was shown previously. Press Enter on an existing application controller
to remove it.
If Use desired level from the LPAR profile is set to No, then at least the memory (Optimal
number of gigabytes of memory) or CPU (Optimal number of dedicated or virtual
processors) setting is mandatory.
[Entry Fields]
Dynamic LPAR
Always start Resource Groups Yes +
Adjust Shared Processor Pool size if required No +
Force synchronous release of DLPAR resources No +
On/Off CoD
I agree to use On/Off CoD and be billed for Yes +
extra costs
Number of activating days for On/Off CoD requests [30]
Figure 6-22 Change/Show Default Cluster Tunables menu
Always start Resource Groups: Enter Yes to have PowerHA SystemMirror start RGs even if
there is any error in ROHA resources activation. This can occur when the total requested
resources exceed the LPAR profile's maximum or the combined available resources, or if there
is a total loss of HMC connectivity. In that case, a best-can-do allocation is performed.
Enter No to prevent starting RGs if any error occurs during ROHA resources acquisition.
The default is Yes.
Adjust Shared Processor Pool size if required: Enter Yes to authorize PowerHA SystemMirror
to dynamically change the user-defined Shared Processor Pool boundaries, if necessary. This
change can occur only at takeover, and only if CoD resources are activated for the central
electronic complex, so that changing the maximum size of a particular Shared Processor Pool
is not done to the detriment of other Shared Processor Pools.
The default is No.
Force synchronous release of DLPAR resources: Enter Yes to have PowerHA SystemMirror
release CPU and memory resources synchronously, for example, if the client must free
resources on one side before they can be used on the other side. By default, PowerHA
SystemMirror automatically detects the resource release mode by looking at whether the active
and backup nodes are on the same or different CECs.
A leading practice is to have asynchronous release in order to not delay the takeover.
The default is No.
I agree to use On/Off CoD and be billed for extra costs: Enter Yes to have PowerHA
SystemMirror use On/Off Capacity on Demand (On/Off CoD) to obtain enough resources to
fulfill the optimal amount that is requested. Using On/Off CoD requires an activation code to be
entered on the HMC and can result in extra costs due to the usage of the On/Off CoD license.
The default is No.
Number of activating days for On/Off CoD requests: Enter a number of activating days for
On/Off CoD requests. If the requested available resources are insufficient for this duration, then
a longest-can-do allocation is performed: PowerHA SystemMirror tries to allocate the amount of
resources that is requested for the longest possible duration. To do that, it considers the overall
resources that are available, which is the sum of the On/Off CoD resources that are already
activated but not yet used, and the On/Off CoD resources that are not yet activated.
The default is 30.
Customers can use the verification tool to ensure that their environment is correct regarding
their ROHA setup. Discrepancies are called out by PowerHA SystemMirror, and the tool
assists customers to correct the configuration if possible.
The user is actively notified of critical errors. A distinction can be made between errors that
are raised during configuration and errors that are raised during cluster synchronization.
As a general principle, any problems that are detected at configuration time are presented as
warnings instead of errors.
Another general principle is that PowerHA SystemMirror checks only what is being configured
at configuration time and not the whole configuration. PowerHA SystemMirror checks the
whole configuration at verification time.
For example, when adding an HMC, you check only the new HMC (verify that it is pingable, at
an appropriate software level, and so on) and not all of the HMCs. Checking the whole
configuration can take some time and is done at verify and sync time rather than each
individual configuration step.
Check that all RG active and standby nodes are on different CECs, which enables the
asynchronous mode of releasing resources (configuration: Info; verification: Warning).
Check that all HMCs share the same level (the same version of HMC) (configuration: Warning;
verification: Warning).
Check that all HMCs administer the central electronic complex hosting the current node.
Configure two HMCs administering the central electronic complex hosting the current node; if
not, PowerHA gives a warning message (configuration: Warning; verification: Warning).
Check whether the HMC level supports FSP Lock Queuing (configuration: Info; verification: Info).
Check that all CECs are Enterprise Pool capable (configuration: Info; verification: Info).
Determine which HMC is the master, and which HMC is the non-master (configuration: Info;
verification: Info).
Check that the nodes of the cluster are on different pools, which enables the asynchronous
mode of releasing resources (configuration: Info; verification: Info).
Check that all HMCs are at level 7.8 or later (configuration: Info; verification: Warning).
Check that the central electronic complex has unlicensed resources (configuration: Info;
verification: Warning).
Check that for one given node, the total of optimal memory (of the RGs on this node) added to
the profile's minimum does not exceed the profile's maximum (configuration: Warning;
verification: Error).
Check that for one given node, the total of optimal CPU (of the RGs on this node) added to the
profile's minimum does not exceed the profile's maximum (configuration: Warning; verification:
Error).
Check that for one given node, the total of optimal PU (of the RGs on this node) added to the
profile's minimum does not exceed the profile's maximum (configuration: Warning; verification:
Error).
Check that the total processing units do not break the minimum processing units per virtual
processor ratio (configuration: Error; verification: Error).
The configuration XML file is used to enable and generate mobile resources.
The deactivation code is used to deactivate some of the permanent resources, that is, to set
them to inactive mode. The number of resources that the code deactivates is the same as the
number of mobile resources in the server's order.
For example, in one order there are two Power Systems servers. Each one has 16 static CPUs,
eight mobile CPUs, and eight inactive CPUs, for a total of 32 CPUs. When you power them on
for the first time, you see that each server has 24 permanent CPUs: the 16 static CPUs plus the
8 mobile CPUs.
After you create the Enterprise Pool with the XML configuration file, you see that 16 mobile
CPUs are generated in the Enterprise Pool, but the previous eight mobile CPUs are still in
permanent status in each server. As a result, the server's status differs from its original order,
which can cause issues in future post-sales activities.
After you finish these two steps, each server has 16 static CPUs and 16 inactive CPUs, and
the Enterprise Pool has 16 mobile CPUs. Then, the mobile CPUs can be assigned to each of
the two servers through the HMC GUI or the command-line interface.
Note: These two steps will be combined into one step in the future. At the time of writing,
you must perform each step separately.
Note: This de-activation code updates the IBM CoD website after you receive the note.
This de-activation code has RPROC and RMEM. RPROC is for reducing processor
resources, and RMEM is for reducing memory resources.
4. After entering the de-activation code, you must send a listing of the updated Vital Product
Data (VPD) output to the CoD Project office at [email protected].
Collect the VPD by using the HMC command line, as shown in Example 6-1.
5. With the receipt of the lscod profile, the Project Office updates the CoD database records
and closes out your request.
For more information about how to use the configuration XML file to create Power Enterprise
Pool and some management concept, see Power Enterprise Pools on IBM Power Systems,
REDP-5101.
If there is a Power Enterprise Pool that is configured, configure a backup HMC for Enterprise
Pool and add both of them into PowerHA SystemMirror by running the clmgr add hmc <hmc>
command or through the SMIT menu. Thus, PowerHA SystemMirror can provide the fallover
function if the master HMC fails. Section 6.12.1, “Switching to the backup HMC for the Power
Enterprise Pool” on page 228 introduces some prerequisites when you set up the Power
Enterprise Pool.
Note: At the time of writing, Power Systems Firmware supports a pair of HMCs to manage
one Power Enterprise Pool: One is in master mode, and the other one is in backup mode.
Note: At the time of writing, for one Power Systems server, IBM only supports at most two
HMCs to manage it.
Example 6-2 Show HMC information with the clmgr view report ROHA through AIX
...
Enterprise pool 'DEC_2CEC'
State: 'In compliance'
Master HMC: 'e16hmc1' --> Master HMC name of EPCoD
Backup HMC: 'e16hmc3' --> Backup HMC name of EPCoD
Enterprise pool memory
Activated memory: '100' GB
Available memory: '100' GB
Unreturned memory: '0' GB
Enterprise pool processor
Activated CPU(s): '4'
Available CPU(s): '4'
Unreturned CPU(s): '0'
Used by: 'rar1m3-9117-MMD-1016AAP'
Activated memory: '0' GB
Unreturned memory: '0' GB
Activated CPU(s): '0' CPU(s)
Unreturned CPU(s): '0' CPU(s)
Used by: 'r1r9m1-9117-MMD-1038B9P'
Activated memory: '0' GB
Unreturned memory: '0' GB
Activated CPU(s): '0' CPU(s)
Unreturned CPU(s): '0' CPU(s)
Example 6-3 Show EPCoD HMC information with lscodpool through the HMC
hscroot@e16hmc1:~> lscodpool -p DEC_2CEC --level pool
name=DEC_2CEC,id=026F,state=In
compliance,sequence_num=41,master_mc_name=e16hmc1,master_mc_mtms=7042-CR5*06K0040,
backup_master_mc_name=e16hmc3,backup_master_mc_mtms=7042-CR5*06K0036,mobile_procs=
4,avail_mobile_procs=1,unreturned_mobile_procs=0,mobile_mem=102400,avail_mobile_me
m=60416,unreturned_mobile_mem=0
Before PowerHA SystemMirror acquires the resource from EPCoD or releases the resource
back to EPCoD, PowerHA tries to check whether the HMC is accessible by using the ping
command. So, AIX must be able to perform the resolution between the IP address and the
host name. You can use /etc/hosts, the DNS, or other technology to achieve resolution. For
example, on AIX, run ping e16hmc1 and ping e16hmc3 to check whether the resolution works.
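For example, a minimal /etc/hosts sketch for the HMCs that are used in this scenario (the addresses are the ones listed earlier in this chapter):
9.3.207.130 e16hmc1
9.3.207.133 e16hmc3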
If the HMCs are in the DNS configuration, configure these HMCs into PowerHA SystemMirror
by using their names, and not their IPs.
In Figure 6-24, before the application starts, PowerHA SystemMirror checks the current LPAR
processor mode. If it is dedicated, then two available CPUs are its target. If it is shared mode,
then 1.5 available CPUs and three available VPs are its target.
[Entry Fields]
* Application Controller Name AppController1
PowerHA SystemMirror gets the LPAR name from the uname -L command’s output and uses
this name to do DLPAR operations through the HMC. LPAR names of the LPAR hosting
cluster node are collected and persisted into HACMPdynresop so that this information is
always available.
Setting up SSH for password-less communication with the HMC requires that the user run
ssh-keygen on each LPAR node to generate a public and private key pair. The public key must
then be copied to the HMC's public authorized keys file. Then, ssh from the LPAR can contact
the HMC without you needing to type in a password. Example 6-4 shows how to set up HMC
password-less communication.
# cd /.ssh/
# ls
id_rsa id_rsa.pub
# export MYKEY=`cat /.ssh/id_rsa.pub`
# ssh hscroot@172.16.15.42 mkauthkeys -a \"$MYKEY\"
The authenticity of host '172.16.15.42 (172.16.15.42)' can't be established.
RSA key fingerprint is b1:47:c8:ef:f1:82:84:cd:33:c2:57:a1:a0:b2:14:f0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.16.15.42' (RSA) to the list of known hosts.
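If the key pair does not exist yet, a sketch of generating it and then verifying password-less access follows (lssyscfg is a standard HMC command that lists the managed systems; the HMC address is the same example address that is used above):
# ssh-keygen -t rsa -N "" -f /.ssh/id_rsa
# ssh hscroot@172.16.15.42 lssyscfg -r sys -F name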
The stated minimum values of the resources must be available when an LPAR node starts. If
more resources are available in the free pool on the frame, an LPAR can allocate up to the
stated wanted values. During dynamic allocation operations, the system does not allow the
values for CPU and memory to go below the minimum or above the maximum amounts that
are specified for the LPAR.
In the planning stage, you must carefully consider how many resources are needed to bring all
the RGs online, and set the LPAR's minimum and maximum parameters correctly.
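For example, a sketch of checking the current minimum, assigned, and maximum memory of an LPAR from the HMC command line (the managed system and LPAR names are the ones used later in this chapter; the attribute names are an assumption based on common lshwres output):
hscroot@e16hmc3:~> lshwres -r mem -m r1r9m1-9117-MMD-1038B9P --level lpar --filter lpar_names=ITSO_S2Node1 -F curr_min_mem,curr_mem,curr_max_mem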
Note: When you deal with EPCoD or On/Off CoD resources, it does not matter if there
is one or two frames. For case scenarios with EPCoD or On/Off CoD, you activate (for
On/Off) and acquire (for EPCoD), and modify the portion of code that deals with On/Off
activation and EPCoD acquisition.
While initially bringing the RG online, PowerHA SystemMirror must wait for all the resources
acquisition to complete before it can start the user’s application.
While performing a takeover (fallover to the next priority node, for example), PowerHA
SystemMirror tries to perform some operations (DLPAR or adjust CoD and EPCoD resource)
in parallel to the release of resources on the source node and the acquisition of resources on
target node if the user allows it in the tunables (the value of Force synchronous release of
DLPAR resources is No).
Table 6-15 on page 171 shows the testing results of the DLPAR operation. The result might
be different in other environments, particularly if the resource is being used.
There is one LPAR, its current running CPU resource size is 2C, and the running memory
resource size is 8 GB. The DLPAR operation includes add and remove.
2C and 8 GB: 5.5 s, 8 s, 6 s, 88 s (1 m 28 s)
8C and 32 GB: 13 s, 27 s, 23 s, 275 s (4 m 35 s)
For example, you configure one profile for an LPAR with 8 GB (minimum) and 40 GB
(wanted). When you activate this LPAR, the maximum pinned memory of ProbeVue is set to 4
GB (10% of the system running memory), as shown in Example 6-5.
From AIX 7.1 TL4 onward, the tunables are derived based on the available system memory.
MAX pinned memory is set to 10% of the system memory. It cannot be adjusted when you
restart the operating system or adjust the memory size with the DLPAR operation.
Now, if you want to reduce the memory from 40 GB to 8 GB, run the following command:
chhwres -r mem -m r1r9m1-9117-MMD-1038B9P -o r -p ITSO_S2Node1 -q 32768
The command fails with the error that is shown in Example 6-6.
Example 6-6 Error information when you reduce the memory through the DLPAR
hscroot@e16hmc3:~> chhwres -r mem -m r1r9m1-9117-MMD-1038B9P -o r -p ITSO_S2Node1
-q 32768
HSCL2932 The dynamic removal of memory resources failed: The operating system
prevented all of the requested memory from being removed. Amount of memory
removed: 0 MB of 32768 MB. The detailed output of the OS operation follows:
0930-023 The DR operation could not be supported by one or more kernel extensions.
From AIX, the error report also generates some error information, as shown in Example 6-7
and Example 6-8.
Example 6-7 AIX error information when you reduce the memory through the DLPAR
47DCD753 1109140415 T S PROBEVUE DR: memory remove failed by ProbeVue rec
252D3145 1109140415 T S mem DR failed by reconfig handler
Description
DR: memory remove failed by ProbeVue reconfig handler
Probable Causes
Exceeded one or more ProbeVue Configuration Limits or other
Failure Causes
Max Pinned Memory For Probevue tunable would cross 40% limit
Recommended Actions
Reduce the Max Pinned Memory For Probevue tunable
Detail Data
DR Phase Name
PRE
Current System Physical Memory
42949672960 -->> This is 40 GB, which is the current running memory
size.
Memory that is requested to remove
34359738368 -->> This is 32 GB, which you want to remove.
ProbeVue Max Pinned Memory tunable value
4294967296 -->> This is 4 GB, which is current maximum pinned memory
for ProbeVue.
Set it to 3276 MB, which is less than 3276.8 (8 GB*40%). This change takes effect
immediately. But if you want this change to take effect after the next start, you need to run
/usr/sbin/bosboot -a before the restart.
If you do not want the ProbeVue component online, you can turn it off with the command
that is shown in Example 6-10.
This change takes effect immediately. But if you want this change to take effect after the
next start, you need to run /usr/sbin/bosboot -a before the restart.
First, you must configure some generic elements for the PowerHA SystemMirror cluster:
Cluster name
Nodes in the cluster
CAA repository disk
Shared VG
Application controller
Service IP
RG
Other user-defined contents, such as pre-event or post-event
For the resource releasing process, in some cases, PowerHA SystemMirror tries to return
EPCoD resources before doing the DLPAR remove operation from the LPAR, and this
generates unreturned resource on this server. This is an asynchronous process and is helpful
to speed up RG takeover. The unreturned resource is reclaimed after the DLPAR remove
operation is completed.
There are several cases in which the script returns success. The script returns immediately if
the application controllers are not configured with optimal resources. The script also exits if
enough resources are already allocated. Finally, the script exits when the entire acquisition
process succeeds.
In a shared processor partition, more operations must be done. For example, both virtual CPUs
and processing units must be accounted for, instead of only a number of processors. To
activate On/Off CoD resources or to acquire Enterprise Pool CoD resources, decimal
processing units are converted to integers, and decimal gigabytes of memory are converted to
integers.
On shared processor pool partitions, the maximum pool size can be automatically adjusted, if
necessary and if authorized by the user.
6.6.1 Query
In the query step, PowerHA SystemMirror gets the information that is listed in the following
sections.
Tip: The lshwres commands are given in Table 6-16 as examples, but it is not necessary
for the user to run these commands. These commands are embedded in to the ROHA run
time, and run as part of the ROHA acquisition and release steps.
Figure 6-28 Describing the difference between available resources and free resources
Figure 6-29 Formula to calculate free resources of one central electronic complex
Note: You read the level of configured resources (configurable_sys_mem in the formula),
and you remove from that the level of reserved resources (sys_firmware_mem in the
formula), then you end up with the level of resources that is needed to run one started
partition.
Moreover, when computing the free processing units of a CEC, you consider the reserved
processing units of any used Shared Processor Pool (the reserved in the formula).
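As a plain-text sketch of the relationship that Figure 6-29 conveys (a reconstruction for readability, not the original figure):
free memory of a CEC = configurable_sys_mem - sys_firmware_mem - (memory currently assigned to all started partitions)
free processing units of a CEC = configurable processing units - (processing units currently assigned to all started partitions) - (reserved processing units of the used Shared Processor Pools)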
Tip: The lscod commands are given in Table 6-17 as examples, but it is not necessary for
the user to run these commands. These commands are embedded in to the ROHA run
time, and run as part of the ROHA acquisition and release steps.
Table 6-18 Get the EPCoD available resources from the HMC
Memory: lscodpool -p <pool> --level pool -F avail_mobile_mem
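The processor counterpart of this row is presumably similar (an assumption based on the avail_mobile_procs attribute that is shown in Example 6-3):
Processors: lscodpool -p <pool> --level pool -F avail_mobile_procs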
Tip: The lscodpool commands are given in Table 6-18 as examples, but it is not necessary
for the user to run these commands. These commands are embedded in to the ROHA run
time, and run as part of the ROHA acquisition and release steps.
Note: If the execution of this command fails (either because the link is down or other
errors), after the last retry but before trying another HMC, PowerHA SystemMirror changes
the master HMC for its Enterprise Pool.
In Figure 6-30, case 2b is the normal case. The currently allocated resources level matches
the blue level, which is the level of resources for the application controllers currently running.
PowerHA SystemMirror adds the yellow amount to the blue amount.
But in some cases, where these two levels do not match, consider having a “start fresh”
policy. This policy readjusts the allocation to the exact needs of the currently running
application controllers plus the application controllers that are being brought online, so that
an optimal amount of resources is always provided to the application controllers.
Those alternative cases can occur when the user has manually released (case 2a) or
acquired (case 2c) resources.
In shared processor partitions, both virtual CPUs and processing units are computed. In
shared processor partitions that are part of a Shared Processor Pool, the computed need is
checked against the maximum size of the pool. If the need is less than or equal to that
maximum, everything is fine and the process continues. If it is greater and the Adjust SPP
size if required tunable is set to No, the process stops and returns an error. Otherwise, it
raises a warning, changes the pool size to the new size, and goes on.
When the correct strategy is chosen, there are three types of resource allocations to be done:
1. Release on other CECs: You might need to release EPCoD resources on other CEC so
that these resources are made available on the local CEC.
2. Acquisition/Activation to the CEC: Resources can come from the Enterprise Pool CoD or
the On/Off CoD pools.
3. Allocation to the partition: Resources come from the CEC to the LPAR.
Figure 6-31 shows the identified policy in the resource acquisition process.
In shared processor partitions, PowerHA SystemMirror accounts for the minimum ratio of
assigned processing units to assigned virtual processors for the partition that is supported by
the CEC. In an IBM POWER6® server, the ratio is 0.1 and in an IBM POWER7® server, the
ratio is 0.05.
For example, if the partition currently has 0.6 processing units and six virtual processors
assigned (a ratio of exactly 0.1, which is the minimum on a POWER6 server), and PowerHA
SystemMirror acquires only virtual processors, it raises an error because doing so would
break the minimum ratio rule. The same occurs when PowerHA SystemMirror releases only
processing units. Therefore, PowerHA SystemMirror must compare the expected ratio to the
configured minimum ratio.
Tip: The chcodpool commands are given in Table 6-19 as examples, but it is not necessary
for the user to run these commands. These commands are embedded in to the ROHA run
time, and run as part of the ROHA acquisition and release steps.
Tip: The chcod commands are given in Table 6-20 as examples, but it is not necessary for
the user to run these commands. These commands are embedded in to the ROHA run
time, and run as part of the ROHA acquisition and release steps.
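For illustration only, the change operations that ROHA embeds typically take the following form (pool, server, and quantity values are placeholders; memory quantities for chcodpool are in MB, as in the examples later in this chapter, and the chcod options that are shown are assumptions to check against your HMC level):
chcodpool -p <pool> -m <cec> -o add -r mem -q <mb_of_memory>       --> move EPCoD mobile memory to this server
chcodpool -p <pool> -m <cec> -o remove -r mem -q <mb_of_memory>    --> return it to the Enterprise Pool
chcod -m <cec> -o a -c onoff -r proc -q <cpus> -d <days>           --> activate On/Off CoD processors for <days> days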
Note: For acquiring the Power Enterprise Pool and the On/Off CoD resources, every
amount of memory is expressed in MB but aligned on a whole number of GB (for
example, 1024 or 4096), and every number of processing units is rounded up to the next
whole integer.
All Power Enterprise Pool and On/Off CoD resources that are acquired are in the CEC’s
free pool, and these are automatically added to the target LPAR by using DLPAR.
Table 6-21 Assign resources from the server’s free pool to target LPAR
Dedicated memory chhwres -m <cec> -p <lpar> -o a -r mem -q <mb_of_memory>
Shared CPU chhwres -m <cec> -p <lpar> -o a -r proc --procs <vp> --proc_units <pu>
Tip: The chhwres commands are given in Table 6-21 as examples, but it is not necessary
for the user to run these commands. These commands are embedded in to the ROHA run
time, and run as part of the ROHA acquisition and release steps.
For shared processor partitions in a Shared-Processors Pool that is not the default pool, it
might be necessary to adjust the maximum processing units of the Shared Processor Pool. To
do so, use the operation that is shown in Example 6-11, which uses the HMC chhwres
command. The enablement of this adjustment is authorized or not by a tunable.
Example 6-11 shows the command that PowerHA SystemMirror uses to change the
Shared-Processor Pool’s maximum processing units. You do not need to run this command.
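You do not need to run it, but a possible form of that adjustment (the procpool resource type and the max_pool_proc_units attribute are assumptions; see Example 6-11 and your HMC documentation for the exact syntax) is:
chhwres -m <cec> -r procpool -o s --poolname <pool> -a "max_pool_proc_units=<pu>"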
6.7.1 Query
In the query step, PowerHA SystemMirror gets the information that is described in the
following sections for the compute step.
Table 6-22 Get On/Off active resources in this server from the HMC
Memory lscod -m <cec> -t cap -c onoff -r mem -F activated_onoff_mem
Tip: The lscod commands are given in Table 6-22 as examples, but it is not necessary for
the user to run these commands. These commands are embedded in to the ROHA run
time, and run as part of the ROHA acquisition and release steps.
Tip: The lscodpool commands are given in Table 6-23 as examples, but it is not necessary
for the user to run these commands. These commands are embedded in to the ROHA run
time, and run as part of the ROHA acquisition and release steps.
Resource computation
The level of resources to be left on the LPAR is computed by using the fit to remaining RGs
policy. What is above this level is released, and it accounts for the following information:
1. The configuration of the LPAR (minimum, current, and maximum amount of resources).
2. The optimal resources that are configured for the applications currently running on the
LPAR. PowerHA SystemMirror tries to fit to the level of remaining RGs running on the
node.
3. The optimal amount of resources of the stopping RGs because you do not de-allocate
more than this.
Table 6-24 Release resources from the LPAR to the CEC through the HMC
Dedicated memory chhwres -m <cec> -p <lpar> -o r -r mem -q <mb_of_memory>
Shared CPU chhwres -m <cec> -p <lpar> -o r -r proc --procs <vp> --proc_units <pu>
A timeout is given with the -w option, and this timeout is set to the value that is configured at
the cluster level (DLPAR operations timeout) plus 1 minute per GB. For example, to release
100 GB when the default timeout value is set to 10 minutes, the timeout is set to 110 minutes
(10 + 100).
For large memory releases, instead of making one 100 GB release request, PowerHA
SystemMirror makes ten 10 GB release requests. You can see the logs in the hacmp.out log file.
At release, the de-allocation order is reversed: On/Off CoD resources are released first,
which prevents the user from paying extra costs. Figure 6-34 shows the process.
This asynchronous process happens only under the following two conditions:
1. If there are only two nodes in the cluster and those two nodes are on different managed
systems, or if there are more than two nodes in the cluster and the operation is a move to
target node and the source node is on another managed system.
2. If the Force synchronous release of DLPAR resources tunable is set to its default value,
which is No. For more information, see 6.2.5, “Change/Show Default Cluster Tunable” on page 160.
When an unreturned resource is generated, a grace period timer starts for the unreturned
Mobile CoD resources on that server, and EPCoD is in Approaching out of compliance
(within server grace period) status. After the releasing operation completes physically on
the primary node, the unreturned resource is reclaimed automatically, and the EPCoD’s
status is changed back to In compliance.
Note: For more information about the Enterprise Pool’s status, see the IBM Knowledge
Center.
Releasing resources
This section describes the release resource concept.
By default, the release is asynchronous. This default behavior can be changed with a cluster
tunable.
For example, if one PowerHA SystemMirror cluster includes two nodes, the two nodes are
deployed on different servers and the two servers share one Power Enterprise Pool. In this
case, if you are keeping asynchronous mode, you can benefit from the RG move scenarios
because EPCoD’s unreturned resource feature and asynchronous release mode can reduce
takeover time.
During RG offline, operations to release resources to EPCoD pool can be done even if
physical resources are not free on the server at that time. The freed resources are added
back to the EPCoD pool as available resources immediately so that the backup partition can
use these resources to bring the RG online at once.
A history of what was allocated for the partition is kept in the AIX ODM object database, and
PowerHA SystemMirror uses it to release the same amount of resources at boot time.
Note: You do not need to start the PowerHA SystemMirror service to activate this process
after an operating system restart because this operation is triggered by the
/usr/es/sbin/cluster/etc/rc.init script, which is listed in the /etc/inittab file.
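To confirm that the entry is present, you can list it directly from the inittab; either of the following commands should show the /usr/es/sbin/cluster/etc/rc.init entry (the hacmp label that is shown here is the usual one, but verify it on your system):
# lsitab hacmp
# grep rc.init /etc/inittab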
6.8.1 Requirement
We have two IBM Power 770 D model servers, and they are in one Power Enterprise Pool.
We want to deploy one PowerHA SystemMirror cluster with two nodes that are in different
servers. We want the PowerHA SystemMirror cluster to manage the server’s free resources
and EPCoD mobile resource to automatically satisfy the application’s hardware requirements
before we start it.
There are two HMCs to manage the EPCoD named e16hmc1 and e16hmc3. Here, e16hmc1
is the master and e16hmc3 is the backup. There are two applications in this cluster and the
related resource requirements.
CAA Unicast
Primary disk: repdisk1
Backup disk: repdisk2
HMC configuration
There are two HMCs to add, as shown in Table 6-26 and Table 6-27.
Number of retries 2
Sites N/A
Number of retries 2
Sites N/A
Additionally, in /etc/hosts, there are resolution details between the HMC IP and the HMC
host name, as shown in Example 6-12.
I agree to use On/Off CoD and be billed for extra costs No (default)
I agree to use On/Off CoD and be billed for extra costs No (default)
Cluster-wide tunables
All the tunables use the default values, as shown in Table 6-30.
I agree to use On/Off CoD and be billed for extra costs No (default)
Perform the PowerHA SystemMirror Verify and Synchronize Cluster Configuration process
after finishing the previous configuration.
Section 6.6, “Introduction to resource acquisition” on page 175 introduces four steps for
PowerHA SystemMirror to acquire resources. In this case, the following section provides the
detailed description for the four steps.
Query step
PowerHA SystemMirror queries the server, the EPCoD, the LPARs, and the current RG
information. The data is shown in yellow in Figure 6-36.
Compute step
In this step, PowerHA SystemMirror computes how many resources must be added through
DLPAR. It needs 7C and 46 GB. The purple table in Figure 6-36 shows the process. For
example:
The expected total CPU number is as follows: 1 (Min) + 2 (RG1 requires) + 6 (RG2
requires) + 0 (running RG requires, there is no running RG) = 9C.
Compare this value with the LPAR’s profile: it must be less than or equal to the Maximum
value and greater than or equal to the Minimum value.
If the requirement is satisfied, subtract the currently running CPUs from this value
(9 - 2 = 7) to get the number of CPUs to add through DLPAR.
Note: During this process, PowerHA SystemMirror adds mobile resources from EPCoD to
the server’s free pool first, then adds all the free pool’s resources to the LPAR through
DLPAR. To describe the process clearly, the free pool means only the available resources
of one server before adding the EPCoD’s resources to it.
The orange tables (Figure 6-36 on page 199) show the result after the resource acquisition,
and include the LPAR’s running resource, EPCoD, and the server’s resource status.
Example 6-14 The hacmp.out log shows the resource acquisition process for example 1
# egrep "ROHALOG|Close session|Open session" /var/hacmp/log/hacmp.out
+RG1 RG2:clmanageroha[roha_session_open:162] roha_session_log 'Open session
Open session 22937664 at Sun Nov 8 09:11:39 CST 2015
INFO: acquisition is always synchronous.
=== HACMProhaparam ODM ====
--> Cluster-wide tunables display
ALWAYS_START_RG = 0
ADJUST_SPP_SIZE = 0
FORCE_SYNC_RELEASE = 0
AGREE_TO_COD_COSTS = 0
ONOFF_DAYS = 30
===========================
------------------+----------------+
HMC | Version |
------------------+----------------+
9.3.207.130 | V8R8.3.0.1 |
9.3.207.133 | V8R8.3.0.1 |
------------------+----------------+
------------------+----------------+----------------+
MANAGED SYSTEM | Memory (GB) | Proc Unit(s) |
------------------+----------------+----------------+
Name | rar1m3-9117-MMD-1016AAP | --> Server name
State | Operating |
Region Size | 0.25 | / |
VP/PU Ratio | / | 0.05 |
Installed | 192.00 | 12.00 |
Configurable | 52.00 | 8.00 |
Example 6-15 The update in the Resource Optimized High Availability report shows the resource
acquisition process for example 1
# clmgr view report roha
...
Managed System 'rar1m3-9117-MMD-1016AAP' --> this is P770D-01 server
Hardware resources of managed system
Installed: memory '192' GB processing units '12.00'
Configurable: memory '93' GB processing units '11.00'
Inactive: memory '99' GB processing units '1.00'
Available: memory '0' GB processing units '0.00'
...
In this case, we split this move into two parts: One is the RG offline at the primary node, and
the other is the RG online at the standby node.
Figure 6-37 Resource group offline procedure at the primary node during the resource group move
Query step
PowerHA SystemMirror queries the server, EPCoD, the LPARs, and the current RG
information. The data is shown in the yellow tables in Figure 6-37.
Compute step
In this step, PowerHA SystemMirror computes how many resources must be removed by
using the DLPAR. PowerHA SystemMirror needs 2C and 30 GB. The purple tables show the
process, as shown in Figure 6-37:
In this case, RG1 is released and RG2 is still running. PowerHA calculates how many
resources it can release based on whether RG2 has enough resources to run. So, the
formula is: 9 (current running) - 1 (Min) - 6 (RG2 still running) = 2C. Two CPUs can be
released.
PowerHA accounts for the fact that you sometimes adjust the currently running resources by
using a manual DLPAR operation, for example, when you add resources to satisfy another
application that was not started by PowerHA. To avoid removing this kind of resource,
PowerHA checks how many resources it allocated before.
Figure 6-38 HMC message shows that there are unreturned resources that are generated
Example 6-17 Displaying unreturned resources from the AIX command line
# clmgr view report roha
...
Enterprise pool 'DEC_2CEC'
State: 'Approaching out of compliance (within server grace period)'
Master HMC: 'e16hmc1'
Backup HMC: 'e16hmc3'
Enterprise pool memory
Activated memory: '100' GB
Available memory: '89' GB -->the 30 GB has been changed to EPCoD
available status
Unreturned memory: '30' GB -->the 30 GB is marked 'unreturned'
Enterprise pool processor
Activated CPU(s): '4'
Available CPU(s): '3' --> the 2CPU has been changed to EPCoD
available status
Unreturned CPU(s): '2' --> the 2CPU is marked 'unreturned'
Used by: 'rar1m3-9117-MMD-1016AAP' -->show unreturned resource from
server’s view
Activated memory: '11' GB
Unreturned memory: '30' GB
Activated CPU(s): '1' CPU(s)
Unreturned CPU(s): '2' CPU(s)
Used by: 'r1r9m1-9117-MMD-1038B9P'
Activated memory: '0' GB
Unreturned memory: '0' GB
Activated CPU(s): '0' CPU(s)
Unreturned CPU(s): '0' CPU(s)
From the HMC command line, you can see the unreturned resources that are generated, as
shown in Example 6-18.
Example 6-18 Showing the unreturned resources and the status from the HMC command line
hscroot@e16hmc1:~> lscodpool -p DEC_2CEC --level sys
name=rar1m3-9117-MMD-1016AAP,mtms=9117-MMD*1016AAP,mobile_procs=1,non_mobile_procs
=8,unreturned_mobile_procs=2,inactive_procs=1,installed_procs=12,mobile_mem=11264,
non_mobile_mem=53248,unreturned_mobile_mem=30720,inactive_mem=101376,installed_mem
=196608
name=r1r9m1-9117-MMD-1038B9P,mtms=9117-MMD*1038B9P,mobile_procs=0,non_mobile_procs
=16,unreturned_mobile_procs=0,inactive_procs=16,installed_procs=32,mobile_mem=0,no
n_mobile_mem=97280,unreturned_mobile_mem=0,inactive_mem=230400,installed_mem=32768
0
hscroot@e16hmc1:~> lscodpool -p DEC_2CEC --level pool
name=DEC_2CEC,id=026F,state=Approaching out of compliance (within server grace
period),sequence_num=41,master_mc_name=e16hmc1,master_mc_mtms=7042-CR5*06K0040,bac
kup_master_mc_name=e16hmc3,backup_master_mc_mtms=7042-CR5*06K0036,mobile_procs=4,a
vail_mobile_procs=3,unreturned_mobile_procs=2,mobile_mem=102400,avail_mobile_mem=9
1136,unreturned_mobile_mem=30720
When the DLPAR operation completes, the unreturned resource is reclaimed immediately,
and some messages are shown on the HMC (Figure 6-39). The Enterprise Pool’s status is
changed back to In compliance.
Figure 6-39 The unreturned resource is reclaimed after the DLPAR operation
You can see the changes from HMC command line, as shown in Example 6-19.
Example 6-19 Showing the unreturned resource that is reclaimed from the HMC command line
hscroot@e16hmc1:~> lscodpool -p DEC_2CEC --level sys
name=rar1m3-9117-MMD-1016AAP,mtms=9117-MMD*1016AAP,mobile_procs=1,non_mobile_procs=8,unretu
rned_mobile_procs=0,inactive_procs=3,installed_procs=12,mobile_mem=11264,non_mobile_mem=532
48,unreturned_mobile_mem=0,inactive_mem=132096,installed_mem=196608
name=r1r9m1-9117-MMD-1038B9P,mtms=9117-MMD*1038B9P,mobile_procs=0,non_mobile_procs=16,unret
urned_mobile_procs=0,inactive_procs=16,installed_procs=32,mobile_mem=0,non_mobile_mem=97280
,unreturned_mobile_mem=0,inactive_mem=230400,installed_mem=327680
hscroot@e16hmc1:~> lscodpool -p DEC_2CEC --level pool
name=DEC_2CEC,id=026F,state=In compliance,sequence_num=41,master_mc_name=e16hmc1,
master_mc_mtms=7042-CR5*06K0040,backup_master_mc_name=e16hmc3,backup_master_mc_mtms=7042-CR
5*06K0036,mobile_procs=4,avail_mobile_procs=3,unreturned_mobile_procs=0,mobile_mem=102400,a
vail_mobile_mem=91136,unreturned_mobile_mem=0
Example 6-20 The hacmp.out log file information about the resource group offline process
#egrep "ROHALOG|Close session|Open session" /var/hacmp/log/hacmp.out
...
===== Compute ROHA Memory =====
minimum + running = total <=> current <=> optimal <=> saved
8.00 + 40.00 = 48.00 <=> 78.00 <=> 30.00 <=> 46.00 : => 30.00 GB
============ End ==============
===== Compute ROHA CPU(s) =====
minimal + running = total <=> current <=> optimal <=> saved
1 + 6 = 7 <=> 9 <=> 2 <=> 7 : => 2 CPU(s)
============ End ==============
===== Identify ROHA Memory ====
Total Enterprise Pool memory to return back: 30.00 GB
Total On/Off CoD memory to de-activate: 0.00 GB
Total DLPAR memory to release: 30.00 GB
============ End ==============
=== Identify ROHA Processor ===
Total Enterprise Pool CPU(s) to return back: 2.00 CPU(s)
Total On/Off CoD CPU(s) to de-activate: 0.00 CPU(s)
Total DLPAR CPU(s) to release: 2.00 CPU(s)
============ End ==============
clhmccmd: 30.00 GB of Enterprise Pool CoD have been returned.
clhmccmd: 2 CPU(s) of Enterprise Pool CoD have been returned.
The following resources were released for application controllers App1Controller.
DLPAR memory: 30.00 GB On/Off CoD memory: 0.00 GB Enterprise Pool
memory: 30.00 GB.
DLPAR processor: 2.00 CPU(s) On/Off CoD processor: 0.00 CPU(s)
Enterprise Pool processor: 2.00 CPU(s)Close session 22937664 at Sun Nov 8
09:12:32 CST 2015
..
During the releasing process, the de-allocation order is EPCoD and then the local server’s
free pool. Because EPCoD is shared between different servers, the standby node on other
servers always needs this resource to bring the RG online in a takeover scenario.
Note: Before the acquisition process starts, the 2C and 30 GB resources are already
available in the Enterprise Pool, so these resources can also be used by the standby node.
This acquisition process differs from the scenario that is described in 6.9.1, “Bringing two
resource groups online” on page 198. The expected resources to add to the LPAR is 1C and
6 GB and the system’s free pool can satisfy it, so it does not need to acquire resources from
EPCoD.
Removing the resource (2C and 30 GB) from the LPAR to a free pool on the primary node
costs 257 seconds, from 10:52:51 to 10:57:08, but we are not concerned with this time
because it is an asynchronous process.
Example 6-21 The key time stamp in hacmp.out on the primary node (ITSO_S1Node1)
# egrep "EVENT START|EVENT COMPLETED" hacmp.out
Nov 8 10:52:27 EVENT START: external_resource_state_change ITSO_S2Node1
Nov 8 10:52:27 EVENT COMPLETED: external_resource_state_change ITSO_S2Node1 0
Nov 8 10:52:27 EVENT START: rg_move_release ITSO_S1Node1 1
Nov 8 10:52:27 EVENT START: rg_move ITSO_S1Node1 1 RELEASE
Nov 8 10:52:27 EVENT START: stop_server App1Controller
Nov 8 10:52:28 EVENT COMPLETED: stop_server App1Controller 0
Nov 8 10:52:53 EVENT START: release_service_addr
Nov 8 10:52:54 EVENT COMPLETED: release_service_addr 0
Nov 8 10:52:56 EVENT COMPLETED: rg_move ITSO_S1Node1 1 RELEASE 0
Nov 8 10:52:56 EVENT COMPLETED: rg_move_release ITSO_S1Node1 1 0
Nov 8 10:52:58 EVENT START: rg_move_fence ITSO_S1Node1 1
Nov 8 10:52:58 EVENT COMPLETED: rg_move_fence ITSO_S1Node1 1 0
Nov 8 10:53:00 EVENT START: rg_move_fence ITSO_S1Node1 1
Nov 8 10:53:00 EVENT COMPLETED: rg_move_fence ITSO_S1Node1 1 0
Nov 8 10:53:00 EVENT START: rg_move_acquire ITSO_S1Node1 1
Nov 8 10:53:00 EVENT START: rg_move ITSO_S1Node1 1 ACQUIRE
Nov 8 10:53:00 EVENT COMPLETED: rg_move ITSO_S1Node1 1 ACQUIRE 0
Nov 8 10:53:00 EVENT COMPLETED: rg_move_acquire ITSO_S1Node1 1 0
Nov 8 10:53:18 EVENT START: rg_move_complete ITSO_S1Node1 1
Nov 8 10:53:19 EVENT COMPLETED: rg_move_complete ITSO_S1Node1 1 0
Nov 8 10:53:50 EVENT START: external_resource_state_change_complete ITSO_S2Node1
Nov 8 10:53:50 EVENT COMPLETED: external_resource_state_change_complete
ITSO_S2Node1 0
Example 6-23 The key time stamp in hacmp.out on the standby node (ITSO_S2Node1)
#egrep "EVENT START|EVENT COMPLETED" hacmp.out
Nov 8 10:52:24 EVENT START: rg_move_release ITSO_S1Node1 1
Nov 8 10:52:24 EVENT START: rg_move ITSO_S1Node1 1 RELEASE
Nov 8 10:52:25 EVENT COMPLETED: rg_move ITSO_S1Node1 1 RELEASE 0
Nov 8 10:52:25 EVENT COMPLETED: rg_move_release ITSO_S1Node1 1 0
Nov 8 10:52:55 EVENT START: rg_move_fence ITSO_S1Node1 1
Nov 8 10:52:55 EVENT COMPLETED: rg_move_fence ITSO_S1Node1 1 0
Nov 8 10:52:57 EVENT START: rg_move_fence ITSO_S1Node1 1
Nov 8 10:52:57 EVENT COMPLETED: rg_move_fence ITSO_S1Node1 1 0
Nov 8 10:52:57 EVENT START: rg_move_acquire ITSO_S1Node1 1
Nov 8 10:52:57 EVENT START: rg_move ITSO_S1Node1 1 ACQUIRE
Nov 8 10:52:57 EVENT START: acquire_takeover_addr
Nov 8 10:52:58 EVENT COMPLETED: acquire_takeover_addr 0
Nov 8 10:53:15 EVENT COMPLETED: rg_move ITSO_S1Node1 1 ACQUIRE 0
Nov 8 10:53:15 EVENT COMPLETED: rg_move_acquire ITSO_S1Node1 1 0
Nov 8 10:53:15 EVENT START: rg_move_complete ITSO_S1Node1 1
Nov 8 10:53:43 EVENT START: start_server App1Controller
Nov 8 10:53:43 EVENT COMPLETED: start_server App1Controller 0
Nov 8 10:53:45 EVENT COMPLETED: rg_move_complete ITSO_S1Node1 1 0
Nov 8 10:53:47 EVENT START: external_resource_state_change_complete ITSO_S2Node1
Nov 8 10:53:47 EVENT COMPLETED: external_resource_state_change_complete
ITSO_S2Node1 0
6.9.3 Restarting with the current configuration after the primary node crashes
This case introduces the Automatic Release After a Failure (ARAF) process. We simulate a
primary node that crashed immediately. We do not describe how the RG is online on standby
node; we describe only what PowerHA SystemMirror does after the primary node restarts.
Assume that we activate this node with the current configuration, which means that this LPAR
still can hold the same amount of resources as before the crash.
As described in 6.7.3, “Automatic resource release process after an operating system crash”
on page 191, after the primary node restarts, the /usr/es/sbin/cluster/etc/rc.init script
is triggered by /etc/inittab and performs the resource releasing operation.
The process is similar to “Resource group offline at the primary node (ITSO_S1Node1)” on
page 206. In this process, PowerHA SystemMirror tries to release all the resources that were
held by the two RGs before.
Testing summary
If a resource was not released because of a PowerHA SystemMirror service crash or an AIX
operating system crash, PowerHA SystemMirror can do the release operation automatically
after this node starts. This operation occurs before you start the PowerHA SystemMirror
service through the smitty clstart or the clmgr start cluster commands.
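For reference, a typical way to start the cluster services again afterward is shown below; the options are illustrative, and MANAGE=auto brings the RGs online automatically:
# clmgr start cluster WHEN=now MANAGE=auto
# clmgr online node <node_name>          --> alternatively, start the services on a single node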
6.10.1 Requirements
We have two Power 770 D model servers in one Power Enterprise Pool, and each server has
an On/Off CoD license. We want to deploy one PowerHA SystemMirror cluster, and include
two nodes that are in different servers. We want the PowerHA SystemMirror cluster to
manage the server’s free resources, EPCoD mobile resources, and On/Off CoD resources
automatically to satisfy the application’s hardware requirement before starting it.
There are two HMCs to manage the EPCoD that are named e16hmc1 and e16hmc3. Here,
e16hmc1 is the master and e16hmc3 is the backup. There are two applications in this cluster
and related resource requirements.
If the number of activating days is set to 30, for example, it means that we want to activate the
resources for 30 days. Therefore, if only 600 GB.Days are available in the On/Off CoD pool,
the tunable allows only 20 GB of memory to be activated (600 / 30 = 20).
I agree to use On/Off CoD and be billed for extra costs Yes
I agree to use On/Off CoD and be billed for extra costs Yes
Cluster-wide tunables
All the tunables are at the default value, as shown in Table 6-33.
I agree to use On/Off CoD and be billed for extra costs Yes
This configuration requires that you perform a Verify and Synchronize Cluster Configuration
after changing the previous configuration.
Example 6-24 Shows Resource Optimized High Availability data with the clmgr view report roha command
# clmgr view report roha
Cluster: ITSO_ROHA_cluster of NSC type --> NSC means No Site Cluster
Cluster tunables --> Following is the cluster tunables
Dynamic LPAR
Start Resource Groups even if resources are insufficient: '0'
Adjust Shared Processor Pool size if required: '0'
Force synchronous release of DLPAR resources: '0'
On/Off CoD
I agree to use On/Off CoD and be billed for extra costs: '1'
Number of activating days for On/Off CoD requests: '30'
Node: ITSO_S1Node1 --> Information of ITSO_S1Node1 node
HMC(s): 9.3.207.130 9.3.207.133
Managed system: rar1m3-9117-MMD-1016AAP
LPAR: ITSO_S1Node1
Current profile: 'ITSO_profile'
Memory (GB): minimum '8' desired '32' current '32' maximum
'160'
Processing mode: Shared
Shared processor pool: 'DefaultPool'
Processing units: minimum '0.5' desired '1.5' current '1.5' maximum
'9.0'
Virtual processors: minimum '1' desired '3' current '3' maximum '18'
ROHA provisioning for resource groups
No ROHA provisioning.
Section 6.6, “Introduction to resource acquisition” on page 175 introduces four steps for
PowerHA SystemMirror to acquire the resources. In this case, the following sections are the
detailed descriptions of the four steps.
Query step
PowerHA SystemMirror queries the server, EPCoD, the On/Off CoD, the LPARs, and the
current RG information. The data is shown in the yellow tables in Figure 6-43.
For the On/Off CoD resources, we do not display the available resources because there are
enough resources in our testing environment:
P770D-01 has 9959 CPU.days and 9917 GB.days.
P770D-02 has 9976 CPU.days and 9889 GB.days.
PowerHA SystemMirror gets the remaining 5 GB of this server, all 100 GB from EPCoD, and
21 GB from the On/Off CoD. The process is shown in the green table in Figure 6-43 on
page 220.
Note: During this process, PowerHA SystemMirror adds mobile resources from EPCoD to
the server’s free pool first, then adds all the free pool’s resources to the LPAR through
DLPAR. To describe this clearly, the free pool means the available resources of only one
server before adding the EPCoD’s resources to it.
The orange table shows (Figure 6-43 on page 220) the result of this scenario, including the
LPAR’s running resources, EPCoD, On/Off CoD, and the server’s resource status.
Example 6-25 The hacmp.out log shows the resource acquisition of example 2
===== Compute ROHA Memory =====
minimal + optimal + running = total <=> current <=> maximum
8.00 + 150.00 + 0.00 = 158.00 <=> 32.00 <=> 160.00 : => 126.00 GB
============ End ==============
=== Compute ROHA PU(s)/VP(s) ==
minimal + optimal + running = total <=> current <=> maximum
1 + 16 + 0 = 17 <=> 3 <=> 18 : => 14 Virtual
Processor(s)
minimal + optimal + running = total <=> current <=> maximum
0.50 + 8.00 + 0.00 = 8.50 <=> 1.50 <=> 9.00 : => 7.00 Processing
Unit(s)
============ End ==============
===== Identify ROHA Memory ====
Remaining available memory for partition: 5.00 GB
Total Enterprise Pool memory to allocate: 100.00 GB
Total Enterprise Pool memory to yank: 0.00 GB
Example 6-26 Resource Optimized High Availability data after acquiring resources in example 2
# clmgr view report roha
Cluster: ITSO_ROHA_cluster of NSC type
Cluster tunables
Dynamic LPAR
Start Resource Groups even if resources are insufficient: '0'
Adjust Shared Processor Pool size if required: '0'
Force synchronous release of DLPAR resources: '0'
On/Off CoD
I agree to use On/Off CoD and be billed for extra costs: '1'
Number of activating days for On/Off CoD requests: '30'
Node: ITSO_S1Node1
HMC(s): 9.3.207.130 9.3.207.133
Managed system: rar1m3-9117-MMD-1016AAP
LPAR: ITSO_S1Node1
Current profile: 'ITSO_profile'
Memory (GB): minimum '8' desired '32' current
'158' maximum '160'
Processing mode: Shared
Shared processor pool: 'DefaultPool'
Processing units: minimum '0.5' desired '1.5' current
'8.5' maximum '9.0'
Virtual processors: minimum '1' desired '3' current '17'
maximum '18'
ROHA provisioning for 'ONLINE' resource groups
No ROHA provisioning.
...
The clmgr view report roha command output (Example 6-26 on page 222) has some
updates about the resources of P770D-01, Enterprise Pool, and On/Off CoD.
After the RG is online, the status of the On/Off CoD resource is shown in Example 6-28.
For memory, PowerHA SystemMirror assigns 21 GB and the activation day is 30 days, so the
total is 630 GB.Day. (21*30=630), and the remaining available GB.Day in On/Off CoD is 9277
(9907 - 630 = 9277).
The process is similar to the one that is shown in 6.9.2, “Moving one resource group to
another node” on page 205.
In the release process, the de-allocation order is On/Off CoD, then EPCoD, and then the
server’s free pool because you always need to pay an extra cost for the On/Off CoD.
Example 6-29 The hacmp.out log information in the release process of example 2
===== Compute ROHA Memory =====
minimum + running = total <=> current <=> optimal <=> saved
8.00 + 80.00 = 88.00 <=> 158.00 <=> 70.00 <=> 126.00 : => 70.00 GB
============ End ==============
=== Compute ROHA PU(s)/VP(s) ==
minimal + running = total <=> current <=> optimal <=> saved
1 + 9 = 10 <=> 17 <=> 7 <=> 14 : => 7 Virtual
Processor(s)
minimal + running = total <=> current <=> optimal <=> saved
0.50 + 4.50 = 5.00 <=> 8.50 <=> 3.50 <=> 7.00 : => 3.50
Processing Unit(s)
============ End ==============
===== Identify ROHA Memory ====
Total Enterprise Pool memory to return back: 49.00 GB
Total On/Off CoD memory to de-activate: 21.00 GB
Total DLPAR memory to release: 70.00 GB
============ End ==============
=== Identify ROHA Processor ===
Total Enterprise Pool CPU(s) to return back: 1.00 CPU(s)
Total On/Off CoD CPU(s) to de-activate: 3.00 CPU(s)
Total DLPAR PU(s)/VP(s) to release: 7.00 Virtual Processor(s) and
3.50 Processing Unit(s)
============ End ==============
clhmccmd: 49.00 GB of Enterprise Pool CoD have been returned.
clhmccmd: 1 CPU(s) of Enterprise Pool CoD have been returned.
The following resources were released for application controllers App1Controller.
DLPAR memory: 70.00 GB On/Off CoD memory: 21.00 GB Enterprise Pool
memory: 49.00 GB.
DLPAR processor: 3.50 PU/7.00 VP On/Off CoD processor: 3.00 CPU(s)
Enterprise Pool processor: 1.00 CPU(s)
This section describes the mechanism that enables PowerHA SystemMirror to switch from
one HMC to another HMC.
Suppose that you have, for a given node, three HMCs in the following order: HMC1, HMC2,
and HMC3. (These HMCs can be set either at the node level, at the site level, or at the cluster
level. What counts at the end is that you have an ordered list of HMCs for a given node).
A given node uses the first HMC in its list, for example, HMC1, and uses it while it works.
If the first HMC (HMC1) is not usable, the node switches to the second HMC in the list
(HMC2). The HMC that is currently in use (for example, HMC2) is persisted into the ODM,
which helps prevent the ROHA function from trying again and failing again with the first
HMC (HMC1).
6.12.1 Switching to the backup HMC for the Power Enterprise Pool
For Enterprise Pool operations, querying operations can be run on the master or backup
HMC, but changing operations must run on the master HMC. If the master HMC fails, the
PowerHA SystemMirror actions are as follows:
For querying operations, PowerHA SystemMirror tries to switch to the backup HMC to
continue the operation, but does not set the backup HMC as the master.
For changing operations, PowerHA SystemMirror tries to set the backup HMC as the
master, and then continues the operation. Example 6-30 shows the command that
PowerHA SystemMirror performs to set the backup HMC as the master. This command is
triggered by PowerHA SystemMirror automatically.
There are some prerequisites in PowerHA SystemMirror before switching to the backup HMC
when the master HMC fails:
Configure the master HMC and the backup HMC for your Power Enterprise Pool.
For more information about how to configure the backup HMC for the Power Enterprise
Pool, see the IBM Knowledge Center and Power Enterprise Pools on IBM Power Systems,
REDP-5101.
Both HMCs are configured in PowerHA SystemMirror.
Establish password-less communication between the PowerHA SystemMirror nodes and the
two HMCs (a sample command sequence follows this list).
Ensure reachability (pingable) from PowerHA SystemMirror nodes to the master and
backup HMCs.
Ensure that all of the servers that participate in the pool are connected to the two HMCs.
Ensure that the participating servers are in either the Standby state or the Operating state.
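The password-less communication that is mentioned in the list can be established, for example, as follows (user names and key paths are illustrative; mkauthkeys is the HMC command that registers the node's public key):
# ssh-keygen -t rsa                                --> on each PowerHA node, accept the defaults
# KEY=$(cat ~/.ssh/id_rsa.pub)
# ssh hscroot@<hmc> "mkauthkeys --add '$KEY'"      --> repeat for the master and the backup HMC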
There are two HMCs to manage the EPCoD, named HMCEP1 and HMCEP2.
HMCEP1 is the master and HMCEP2 is the backup, as shown in Example 6-31.
In the AIX /etc/hosts file, define the resolution between the HMC IP address, and the HMC’s
host name, as shown in Example 6-32.
Example 6-32 Define the resolution between the HMC IP and HMC name in /etc/hosts
172.16.50.129 P780_09_Lpar1
172.16.50.130 P780_10_Lpar1
172.16.51.129 testservice1
172.16.51.130 testservice2
172.16.50.253 HMCEP1
172.16.50.254 HMCEP2
Figure 6-46 Resource Acquisition process during the start of the PowerHA SystemMirror service
In this process, HMCEP1 acts as the primary HMC and does all the query and resource
acquisition operations. Example 6-33 and Example 6-34 on page 232 show the detailed
commands that are used in the acquisition step.
Note: We do not display the DLPAR and EPCoD operations in the query step in the
previous examples.
6.13.2 Bringing one resource group offline when the primary HMC fails
After the RG is online, we bring the RG offline. During this process, we shut down HMCEP1 to
see how PowerHA SystemMirror handles this situation.
Query step
In this step, PowerHA SystemMirror must query the server’s data and the EPCoD’s data.
To get the server’s information, PowerHA SystemMirror uses the default primary HMC
(172.16.50.253, HMCEP1). At first, HMCEP1 is alive, and the operation succeeds. But after
the HMCEP1 shutdown, the operation fails and PowerHA SystemMirror uses 172.16.50.254
as the primary HMC to continue. Example 6-35 shows the takeover process.
Compute step
This step does not require an HMC operation. For more information, see Figure 6-47 on
page 232.
At this time, PowerHA SystemMirror checks whether the master HMC is available. If not, it
switches to the backup HMC automatically. Example 6-36 shows the detailed process.
Example 6-36 The EPCoD master and backup HMC switch process
+testRG2:clhmccmd[clhmcexec:3388] cmd='chcodpool -p 0019 -m SVRP7780-09-SN060C0AT
-r mem -o remove -q 23552 --force'
-->PowerHA SystemMirror try to do chcodpool operation
...
+testRG2:clhmccmd[clhmcexec:3401] : If working on an EPCoD Operation, we need
master
-->PowerHA SystemMirror want to check whether master HMC is accessible
...
ctionAttempts=3 -o TCPKeepAlive=no $'[email protected] \'lscodpool -p 0019
--level pool -F master_mc_name:backup_master_mc_name 2>&1\''
+testRG2:clhmccmd[clhmcexec:1] ssh -o StrictHostKeyChecking=no -o LogLevel=quiet
-o AddressFamily=any -o BatchMode=yes -o ConnectTimeout=3 -o ConnectionAttempts=3
-o TCPKeepAlive=no [email protected] 'lscodpool -p 0019 --level pool -F
master_mc_name:backup_master_mc_name 2>&1'
+testRG2:clhmccmd[clhmcexec:1] LC_ALL=C
+testRG2:clhmccmd[clhmcexec:3415] res=HMCEP1:HMCEP2
-->Current HMC is 172.16.50.254, so PowerHA SystemMirror query current master and
backup HMC name from it. At this time, HMCEP1 is master and HMCEP2 is backup.
...
+testRG2:clhmccmd[clhmcexec:3512] ping -c 1 -w 3 HMCEP1
+testRG2:clhmccmd[clhmcexec:3512] 1> /dev/null 2>& 1
+testRG2:clhmccmd[clhmcexec:3512] ping_output=''
+testRG2:clhmccmd[clhmcexec:3513] ping_rc=1
+testRG2:clhmccmd[clhmcexec:3514] (( 1 > 0 ))
+testRG2:clhmccmd[clhmcexec:3516] : Cannot contact this HMC. Ask following HMC in
list.
+testRG2:clhmccmd[clhmcexec:3518] dspmsg scripts.cat -s 38 500 '%1$s: WARNING:
unable to ping HMC at address %2$s.\n' clhmccmd HMCEP1
-->PowerHA SystemMirror try to ping HMCEP1, but fails
...
+testRG2:clhmccmd[clhmcexec:3510] : Try to ping the HMC at address HMCEP2.
+testRG2:clhmccmd[clhmcexec:3512] ping -c 1 -w 3 HMCEP2
+testRG2:clhmccmd[clhmcexec:3512] 1> /dev/null 2>& 1
+testRG2:clhmccmd[clhmcexec:3512] ping_output=''
+testRG2:clhmccmd[clhmcexec:3513] ping_rc=0
+testRG2:clhmccmd[clhmcexec:3514] (( 0 > 0 ))
-->PowerHA SystemMirror try to verify HMCEP2 and it is available
Example 6-37 shows the update that is performed from the EPCoD view.
Example 6-38 EPCoD status that is restored after the DLPAR operation completes
hscroot@HMCEP1:~> lscodpool -p 0019 --level pool
name=0019,id=0019,state=In
compliance,sequence_num=5,master_mc_name=HMCEP1,master_mc_mtms=V017-ffe*d33e8a1,ba
ckup_master_mc_name=HMCEP2,backup_master_mc_mtms=V017-f93*ba3e3aa,mobile_procs=64,
avail_mobile_procs=64,unreturned_mobile_procs=0,mobile_mem=2048000,avail_mobile_me
m=2048000,unreturned_mobile_mem=0
HMC configuration
The following examples show how to configure HMC with the clmgr command.
Query/Add/Modify/Delete
Example 6-39 shows how to query, add, modify, and delete HMC with the clmgr command.
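As an illustration only (the hmc object class is part of the ROHA support; the TIMEOUT and RETRY_COUNT attribute names that are shown here are assumptions and can differ by PowerHA level), the operations take the following general form:
# clmgr query hmc
# clmgr add hmc <hmc_name_or_ip> TIMEOUT=10 RETRY_COUNT=2
# clmgr modify hmc <hmc_name_or_ip> RETRY_COUNT=3
# clmgr delete hmc <hmc_name_or_ip>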
Example 6-40 Query/Modify node with a list of associated HMCs with the clmgr command
# clmgr query node -h
clmgr query node {<node>|LOCAL}[,<node#2>,...]
Example 6-41 Query/Modify a site with a list of the associated HMCs with the clmgr command
# clmgr query site -h
clmgr query site [<site> [,<site#2>,...]]
Example 6-43 Hardware resource provisioning configuration with the clmgr command
# clmgr query cod -h
clmgr query cod [<APP>[,<APP#2>,...]]
The resource allocation order specifies the order in which resources are allocated. The
resources are released in the reverse order in which they are allocated. The default value
for this field is Free Pool First.
Select Free Pool First to acquire resources from the free pool. If the amount of resources
in the free pool is insufficient, PowerHA SystemMirror first requests more resources from
the Enterprise pool and then from the CoD pool.
Select Enterprise Pool First to acquire the resources from the Enterprise pool. If the
amount of resources in the Enterprise pool is insufficient, PowerHA SystemMirror first
requests more resources from the free pool and then from the CoD pool.
The new configuration is not reflected until the next event that causes the application (hence
the RG) to be released and reacquired on another node. A change in the resource
requirements for CPUs, memory, or both does not cause the recalculation of the DLPAR
resources. PowerHA SystemMirror does not stop and restart the application controllers solely
for the purpose of making the application provisioning changes.
If another dynamic reconfiguration change causes the RGs to be released and reacquired,
the new resource requirements for DLPAR and CoD are used at the end of this dynamic
reconfiguration event.
There is an example of the report in 6.10.4, “Showing the Resource Optimized High
Availability configuration” on page 217.
Log files
There are several log files that you can use to track the ROHA operation process.
You can get detailed information to help you identify the errors’ root causes from the
clverify.log and the ver_collect_dlpar.log files, as shown in Example 6-46.
PowerHA SystemMirror simulates the resource acquisition process based on the current
configuration and generates the log in the ver_collect_dlpar.log file.
You can identify the root cause of the failure by using this information.
HMC commands
You can use the following commands on the HMC to perform monitoring or maintenance
tasks. For a detailed description of the commands, see the HMC man pages.
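The query commands that are used throughout this chapter are usually the most useful ones for monitoring, for example:
lscodpool -p <pool> --level pool        --> Enterprise Pool status, available and unreturned resources
lscodpool -p <pool> --level sys         --> per-server view of the pool
lscod -m <cec> -t cap -c onoff -r mem   --> On/Off CoD activation status for memory (-r proc for processors)
lshwres -m <cec> -r mem --level sys     --> installed, configurable, and available memory of the server
lshwres -m <cec> -r proc --level sys    --> the same information for processors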
GLVM for PowerHA SystemMirror Enterprise Edition provides automated disaster recovery
capability by using the AIX Logical Volume Manager (LVM) and GLVM subsystems to create
volume groups (VGs) and logical volumes that span across two geographically separated
sites.
You can use the GLVM technology as a stand-alone method, or use it in combination with
PowerHA SystemMirror Enterprise Edition.
The software increases data availability by providing continuing service during hardware or
software outages (or both), planned or unplanned, for a two-site cluster. The distance
between sites can be unlimited, and both sites can access the mirrored VGs serially over
IP-based networks.
Also, it enables your business application to continue running at the takeover system at a
remote site while the failed system is recovering from a disaster or a planned outage.
The software takes advantage of the following software components to reduce downtime and
recovery time during disaster recovery:
AIX LVM subsystem and GLVM
TCP/IP subsystem
PowerHA SystemMirror for AIX cluster management
GLVM example
Figure 7-1 on page 249 shows a relatively basic two-site GLVM implementation. It consists of
only one node at each site, although PowerHA does support multiple nodes within a site.
The New York site is considered the primary site because its node primarily hosts RPV
clients. The Texas site is the standby site because it primarily hosts RPV servers. However,
each site contains both RPV servers and clients based on where the resources are running.
Given this information, the GLVM wizard configures all of the following items:
GMVGs
RPV servers
RPV clients
Mirror pools
Resource group
Cluster synchronization
This process can be used for the first GMVG, but additional GMVGs must be manually
created and added into an RG.
7.2 Prerequisites
Before you use the GLVM wizard, complete the following prerequisites:
Additional filesets from the PowerHA SystemMirror Enterprise Edition media:
– cluster.xd.base
– cluster.xd.glvm
– cluster.xd.license
– glvm.rpv.client
– glvm.rpv.server
A linked cluster is configured with sites.
A repository disk is defined at each site.
The verification and synchronization process completes successfully on the cluster.
XD_data networks with persistent IP labels are defined on the cluster.
The network communication between the local site and remote site is working.
All PowerHA SystemMirror services are active on both nodes in the cluster.
The /etc/hosts file on both sites contains all of the host IP, service IP, and persistent IP
labels that you want to use in the GLVM configuration.
The remote site must have enough free disks and enough free space on those disks to
support all of the local site VGs that are created for geographical mirroring.
#cllsif
Adapter Type Network Net Type Attribute Node
Jess boot net_ether_01 ether public Jess
Jess_glvm boot net_ether_02 XD_data public Jess
Jess_glvm boot net_ether_02 XD_data public Jess
Ellie boot net_ether_01 ether public Ellie
Ellie_glvm boot net_ether_02 XD_data public Ellie
Ellie_glvm_pers persistent net_ether_02 XD_data public Ellie
Run smitty sysmirror and select Cluster Applications and Resources → Make
Applications Highly Available (Use Smart Assists) → GLVM Configuration Assistant →
Configure Synchronous GMVG.
The menu that is shown in Figure 7-3 opens. If it does not, the previously mentioned
prerequisites have not been met, and you see a message similar to the one shown in
Figure 7-4 on page 253.
[Entry Fields]
* Enter the name of the VG [syncglvm]
* Select disks to be mirrored from the local site (00f92db138ef5aee) +
* Select disks to be mirrored from the remote site (00f92db138df5181) +
Define at least one node, and ideally all nodes, prior to defining
the repository disk/disks and cluster IP address. It is important that all
nodes in the cluster have access to the repository disk or respective
repository disks(in case of a linked cluster) and can be reached via the
cluster IP addresses, therefore you should define the nodes in the cluster
first
Figure 7-4 Synchronous GLVM prerequisites not met
Node Jess uses local disk hdisk9, and node Ellie uses local disk hdisk3 for the GMVG. Each
one is associated with an rpvserver, which in turn is linked to its respective rpvclient. The
rpvclients become hdisk1 on Jess and hdisk0 on Ellie, as shown in Figure 7-2 on page 251.
The rpvclients acquire these disk names because they are the first hdisk names that are
available on each node. The output from running the synchronous GLVM wizard is shown in
Example 7-3.
Creating VG syncglvmvg
Setting attributes for 0516-1804 chvg: The quorum change takes effect immediately.
Importing the VG
Setting attributes for 0516-1804 chvg: The quorum change takes effect immediately.
Retrieving data from available cluster nodes. This could take a few minutes.
WARNING: There are IP labels known to PowerHA SystemMirror and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on node: Jess. Clverify can automat
ically populate this file to be used on a client node, if executed in auto-corre
ctive mode.
WARNING: There are IP labels known to PowerHA SystemMirror and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on node: Ellie. Clverify can automati
cally populate this file to be used on a client node, if executed in auto-correc
tive mode.
WARNING: An XD_data network has been defined, but no additional
XD heartbeat network is defined. It is strongly recommended that
an XD_ip network be configured in order to help prevent
cluster partitioning if the XD_data network fails. Cluster partitioning
may lead to data corruption for your replicated resources.
Completed 30 percent of the verification checks
This cluster uses Unicast heartbeat
Completed 40 percent of the verification checks
Completed 50 percent of the verification checks
Completed 60 percent of the verification checks
Completed 70 percent of the verification checks
Verifying XD Solutions...
Jess# lsrpvclient -H
# RPV Client Physical Volume Identifier Remote Site
# -----------------------------------------------------------
hdisk1 00f92db138df5181 Chicago
Jess# lsrpvserver -H
# RPV Server Physical Volume Identifier Physical Volume
# -----------------------------------------------------------
rpvserver0 00f92db138ef5aee hdisk9
Jess# gmvgstat
GMVG Name PVs RPVs Tot Vols St Vols Total PPs Stale PPs Sync
--------------- ---- ---- -------- -------- ---------- ---------- ----
syncglvmvg 1 1 2 0 2542 0 100%
Ellie#lsvg syncglvmvg
0516-010 : Volume group must be varied on; use varyonvg command.
#
Ellie# lsmp -A syncglvmvg
0516-010 lsmp: Volume group must be varied on; use varyonvg command.
Ellie# lsrpvserver -H
# RPV Server Physical Volume Identifier Physical Volumer
# -----------------------------------------------------------
rpvserver0 00f92db138df5181 hdisk3
# lsrpvclient -H
# RPV Client Physical Volume Identifier Remote Site
# -----------------------------------------------------------
hdisk0 00f92db138ef5aee Unknown
# gmvgstat
GMVG Name PVs RPVs Tot Vols St Vols Total PPs Stale PPs Sync
--------------- ---- ---- -------- -------- ---------- ---------- ----
Important: This procedure does not configure the GMVG on the remote node; that action
must be done manually.
When creating logical volumes, ensure that two copies are created with the superstrict
allocation policy and the mirror pools. This task should be completed on the node where the
GMVG is active, which in our case is node Jess. An example of creating a mirrored logical volume
by running smitty mklv is shown in Example 7-7. Repeat as needed for every logical volume,
and add any file systems that use the logical volumes, if applicable.
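A command-line equivalent of that SMIT panel might look like the following example. The logical volume name, size in logical partitions, and mirror pool names are illustrative only; use the mirror pools that the wizard created in your cluster:
# mklv -y app1lv -t jfs2 -c 2 -s s -u 1 -p copy1=<localmirrorpool> -p copy2=<remotemirrorpool> syncglvmvg 100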
After all logical volumes are created, it is necessary to take the VG offline on the primary
node, and then reimport the VG on the standby node by performing the following steps:
On primary node Jess:
a. Deactivate the GMVG by running varyoffvg syncglvmvg.
b. Deactivate the rpvclient, hdisk1 by running rmdev -l hdisk1.
c. Activate the rpvserver, rpvserver0 by running mkdev -l rpvserver0.
On standby node Ellie:
a. Deactivate the rpvserver, rpvserver0 by running rmdev -l rpvserver0.
b. Activate rpvclient, hdisk0 by running mkdev -l hdisk0.
c. Import the new VG information by running importvg -L syncglvmvg hdisk0.
d. Activate the VG by running varyonvg syncglvmvg.
e. Verify the GMVG information by running lsvg -l syncglvmvg.
After you are satisfied that the GMVG information is correct, reverse these procedures to
return the GMVG back to the primary node as follows:
On standby node Ellie:
a. Deactivate the VG by running varyoffvg syncglvmvg.
b. Deactivate the rpvclient, hdisk0 by running rmdev -l hdisk0.
c. Activate the rpvserver by running mkdev -l rpvserver0.
On primary node Jess:
a. Deactivate the rpvserver, rpvserver0 by running rmdev -l rpvserver0.
b. Activate the rpvclient, hdisk1 by running mkdev -l hdisk1.
c. Activate the GMVG by running varyonvg syncglvmvg.
Run a cluster verification, and if there are no errors, then the cluster can be tested.
To begin, run smitty sysmirror and select Cluster Applications and Resources → Make
Applications Highly Available (Use Smart Assists) → GLVM Configuration Assistant →
Configure Asynchronous GMVG.
The menu that is shown in Figure 7-5 on page 261 opens. If it does not, the previously
mentioned prerequisites have not been met, and you see a message similar to the one shown
in Figure 7-6 on page 261.
[Entry Fields]
* Enter the name of the VG [asyncglvm]
* Select disks to be mirrored from the local site (00f92db138ef5aee) +
* Select disks to be mirrored from the remote site (00f92db138df5181) +
* Enter the size of the ASYNC cache [2] #
Figure 7-5 Asynchronous GLVM wizard menu
COMMAND STATUS
Define at least one node, and ideally all nodes, prior to defining
the repository disk/disks and cluster IP address. It is important that all
nodes in the cluster have access to the repository disk or respective
repository disks(in case of a linked cluster) and can be reached via the
cluster IP addresses, therefore you should define the nodes in the cluster
first
Figure 7-6 Async GLVM prerequisites not met
Creating VG asyncglvmvg
Setting attributes for 0516-1804 chvg: The quorum change takes effect immediately.
Setting attributes for 0516-1804 chvg: The quorum change takes effect immediately.
Retrieving data from available cluster nodes. This could take a few minutes.
WARNING: There are IP labels known to PowerHA SystemMirror and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on node: Jess. Clverify can automat
ically populate this file to be used on a client node, if executed in auto-corre
ctive mode.
WARNING: There are IP labels known to PowerHA SystemMirror and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on node: Ellie. Clverify can automati
cally populate this file to be used on a client node, if executed in auto-correc
tive mode.
WARNING: An XD_data network has been defined, but no additional
XD heartbeat network is defined. It is strongly recommended that
an XD_ip network be configured in order to help prevent
cluster partitioning if the XD_data network fails. Cluster partitioning
may lead to data corruption for your replicated resources.
Completed 30 percent of the verification checks
This cluster uses Unicast heartbeat
Completed 40 percent of the verification checks
Verifying XD Solutions...
Jess# lsrpvclient -H
# RPV Client Physical Volume Identifier Remote Site
# -----------------------------------------------------------
hdisk1 00f92db138df5181 Chicago
Jess# lsrpvserver -H
# RPV Server Physical Volume Identifier Physical Volume
# -----------------------------------------------------------
rpvserver0 00f92db138ef5aee hdisk9
Jess# gmvgstat
GMVG Name PVs RPVs Tot Vols St Vols Total PPs Stale PPs Sync
--------------- ---- ---- -------- -------- ---------- ---------- ----
syncglvmvg 1 1 2 0 2542 0 100%
Ellie#lsvg syncglvmvg
0516-010 : Volume group must be varied on; use varyonvg command.
#
Ellie# lsmp -A syncglvmvg
0516-010 lsmp: Volume group must be varied on; use varyonvg command.
Ellie# lsrpvserver -H
# RPV Server Physical Volume Identifier Physical Volumer
# -----------------------------------------------------------
rpvserver0 00f92db138df5181 hdisk3
# lsrpvclient -H
# RPV Client Physical Volume Identifier Remote Site
# -----------------------------------------------------------
hdisk0 00f92db138ef5aee Unknown
# gmvgstat
GMVG Name PVs RPVs Tot Vols St Vols Total PPs Stale PPs Sync
--------------- ---- ---- -------- -------- ---------- ---------- ----
gmvgstat: Failed to obtain geographically mirrored volume group information using
lsglvm -v.
Important: This procedure does not configure the GMVG on the remote node. That
procedure must be done manually.
After all logical volumes are created, it is necessary to take the VG offline on the primary node
and then reimport the VG on the standby node by completing the following steps:
On primary node Jess:
a. Deactivate the GMVG by running varyoffvg asyncglvmvg.
b. Deactivate the rpvclient, hdisk1 by running rmdev -l hdisk1.
c. Activate the rpvserver, rpvserver0 by running mkdev -l rpvserver0.
On standby node Ellie:
a. Deactivate the rpvserver, rpvserver0 by running rmdev -l rpvserver0.
b. Activate rpvclient, hdisk0 by running mkdev -l hdisk0.
After you are satisfied that the GMVG information is correct, reverse these procedures to
return the GMVG back to the primary node:
On standby node Ellie:
a. Deactivate the VG by running varyoffvg asyncglvmvg.
b. Deactivate the rpvclient, hdisk0 by running rmdev -l hdisk0.
c. Activate the rpvserver by running mkdev -l rpvserver0.
On primary node Jess:
a. Deactivate the rpvserver, rpvserver0 by running rmdev -l rpvserver0.
b. Activate the rpvclient, hdisk1 by running mkdev -l hdisk1.
c. Activate the GMVG by running varyonvg asyncglvmvg.
Run a cluster verification, and if there are no errors, then the cluster can be tested.
Before PowerHA SystemMirror V7.2, if customers wanted to perform an LPM operation on an
AIX LPAR that runs the PowerHA service, they had to perform manual operations, which are
illustrated in IBM Knowledge Center.
This feature plugs into the LPM infrastructure to maintain awareness of LPM events and
adjusts the clustering related monitoring as needed for the LPM operation to succeed without
disruption. This feature reduces the burden on the administrator to perform manual
operations on the cluster node during LPM operations. For more information about this
feature, see IBM Knowledge Center.
This chapter introduces the necessary operations to ensure that the LPM operation for the
PowerHA node completes successfully. This chapter uses both PowerHA V7.1 and PowerHA
V7.2 cluster environments to illustrate the scenarios.
LPM provides the facility to perform planned hardware maintenance with no downtime.
However, LPM does not offer the same facility for software maintenance or unplanned
downtime. You can use PowerHA SystemMirror within a partition that is capable of LPM.
However, this does not mean that PowerHA SystemMirror uses LPM; PowerHA treats LPM as
another application within the partition.
LPAR freeze time is a part of LPM operational time, and it occurs when the LPM tries to
reestablish the memory state. During this time, no other processes can operate in the LPAR.
As part of this memory reestablishment process, memory pages from the source system can
be copied to the target system over the network connection. If the network connection is
congested, this process of copying over the memory pages can increase the overall LPAR
freeze time.
For a description of their relationship, see 4.5, “PowerHA, Reliable Scalable Clustering
Technology, and Cluster Aware AIX” on page 103.
The PowerHA V7.2 default node failure detection time is 40 seconds: a 30-second node
communication timeout plus a 10-second grace period. These values can be altered as
wanted.
Node A declares partner Node B to be dead if Node A has not received any communication or
heartbeats from it for more than 40 seconds. This process works well when Node B really is
dead (crashed, powered off, and so on). However, there are scenarios where Node B is not
dead, but cannot communicate for long periods.
Some scenarios can be handled in the cluster. For example, in scenario 2, when Node B is
allowed to run after the unfreeze, it recognizes that it has not been able to communicate with
the other nodes for a long time and takes evasive action. Those types of actions are called
Dead Man Switch (DMS) protection.
DMS involves timers that monitor various activities, such as I/O traffic and process health, to
recognize stray cases where there is potential for it (Node B) to be considered dead by its
peers in the cluster. In these cases, the DMS timers trigger just before the node failure
detection time and evasive action is initiated. A typical evasive action involves fencing the
node.
Note: This operation is done manually with a PowerHA SystemMirror V7.1 node.
Section 8.3, “Example: Live Partition Mobility scenario for PowerHA V7.1” on page 279
introduces the operation. This operation is done automatically with a PowerHA SystemMirror
V7.2 node, as described in 8.4, “Live Partition Mobility SMIT panel” on page 296.
Note: The Group Service (cthags) DMS timeout with AIX 7.2.1, at the time of writing, is
60 seconds. For now, it is hardcoded, and cannot be changed.
Therefore, if the LPM freeze time is longer than the Group Service DMS timeout, Group
Service (cthags) reacts and halts the node.
Without these APARs in PowerHA V7.1.1, the change requires two steps to change the CAA
node_timeout variable. For more information, see “Increasing the Cluster Aware AIX
node_timeout parameter” on page 286.
If the PowerHA version is earlier than Version 7.2, then you must perform the operations
manually. If the PowerHA version is Version 7.2 or later, PowerHA performs the operations
automatically.
This section introduces the pre-migration and post-migration operation flows during LPM. A consolidated sketch of the pre-migration commands follows the numbered steps.
2 Change the node to unmanage resource group status by running the following command:
clmgr stop node <node_name> WHEN=now MANAGE=unmanage
3 Add an entry to the /etc/inittab file, which is useful in a node crash before restoring the
managed state, by running the following command:
mkitab hacmp_lpm:2:once:/usr/es/sbin/cluster/utilities/cl_dr undopremigrate >
/dev/null 2>&1
4 Check whether RSCT DMS critical resource monitoring is enabled by running the following
command:
/usr/sbin/rsct/bin/dms/listdms -s cthags | grep -qw Enabled
5 Disable RSCT DMS critical resource monitoring by running the following commands:
/usr/sbin/rsct/bin/hags_disable_client_kill -s cthags
/usr/sbin/rsct/bin/dms/stopdms -s cthags
6 Check whether the current node_timeout value is equal to the value that you set by running
the following commands:
clodmget -n -f lpm_node_timeout HACMPcluster
clctrl -tune -x node_timeout
8 If SAN-based heartbeating is enabled, then disable this function by running the following
commands:
echo 'sfwcom' >> /etc/cluster/ifrestrict
clusterconf
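For reference, the pre-migration steps can be combined into a single ksh sketch, as shown here. This is only a sketch: <node_name> is a placeholder, the step that sets the node_timeout value is not shown, and the SAN-based heartbeating commands apply only if sfwcom is configured.
#!/bin/ksh
# Stop cluster services but leave the resource groups unmanaged
clmgr stop node <node_name> WHEN=now MANAGE=unmanage
# Protect against a node crash before the managed state is restored
mkitab "hacmp_lpm:2:once:/usr/es/sbin/cluster/utilities/cl_dr undopremigrate > /dev/null 2>&1"
# Disable RSCT DMS critical resource monitoring if it is currently enabled
if /usr/sbin/rsct/bin/dms/listdms -s cthags | grep -qw Enabled; then
    /usr/sbin/rsct/bin/hags_disable_client_kill -s cthags
    /usr/sbin/rsct/bin/dms/stopdms -s cthags
fi
# Confirm that the CAA node_timeout value matches the LPM timeout value
clodmget -n -f lpm_node_timeout HACMPcluster
clctrl -tune -x node_timeout
# Disable SAN-based heartbeating for the duration of the LPM (only if it is configured)
echo 'sfwcom' >> /etc/cluster/ifrestrict
clusterconf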
1 Check whether the current resource group status is unmanaged. If Yes, go to 2; otherwise,
go to 4.
2 Change the node back to the manage resource group status by running the following
command:
clmgr start node <node_name> WHEN=now MANAGE=auto
3 Remove the entry from the /etc/inittab file that was added in the pre-migration process
by running the following command:
rmitab hacmp_lpm
4 Check whether the RSCT DMS critical resource monitoring function is enabled before the
LPM operation.
5 Enable RSCT DMS critical resource monitoring by running the following commands:
/usr/sbin/rsct/bin/dms/startdms -s cthags
/usr/sbin/rsct/bin/hags_enable_client_kill -s cthags
6 Check whether the current node_timeout value is equal to the value that you set before by
running the following commands:
clctrl -tune -x node_timeout
clodmget -n -f lpm_node_timeout HACMPcluster
8 If SAN-based heartbeating is enabled, then enable this function by running the following
commands:
rm -f /etc/cluster/ifrestrict
clusterconf
rmdev -l sfwcomm*
mkdev -l sfwcomm*
There are two Power Systems 780 servers. The first server is P780_09 and its serial number
is 060C0AT, and the second server is P780_10 and its machine serial number is 061949T.
The following list provides additional details about the testing environment:
Each server has one Virtual I/O Server (VIOS) partition and one AIX partition.
The P780_09 server has VIOSA and AIX720_LPM1 partitions.
The P780_10 server has VIOSB and AIX720_LPM2 partitions.
There is one storage system that can be accessed by both VIOSs.
The two AIX partitions access storage by way of the NPIV protocol.
The heartbeating method includes IP, SAN, and dpcom.
The AIX version is AIX 7.2 SP1.
The PowerHA SystemMirror version is Version 7.1.3 SP4.
-------------------------------
NODE AIX720_LPM1
-------------------------------
7200-00-01-1543
PowerHA configuration
Table 8-3 shows the cluster’s configuration.
CAA Unicast
Primary disk: hdisk1
-------------------------------
NODE AIX720_LPM1
-------------------------------
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
testRG ONLINE AIX720_LPM1
OFFLINE AIX720_LPM2
Example 8-3 Cluster Aware AIX heartbeating status and the value of the node_timeout parameter
AIX720_LPM1:/ # clcmd lscluster -m
-------------------------------
NODE AIX720_LPM2
-------------------------------
Calling node query for all nodes...
Node query number of nodes examined: 2
----------------------------------------------------------------------------
-------------------------------
NODE AIX720_LPM1
-------------------------------
Calling node query for all nodes...
Node query number of nodes examined: 2
----------------------------------------------------------------------------
AIX720_LPM2:/ # prtconf
System Model: IBM,9179-MHD
Machine Serial Number: 061949T --> this server is P780_10
Example 8-6 Change the cluster service to unmanage resource groups through the SMIT menu
Stop Cluster Services
[Entry Fields]
* Stop now, on system restart or both now
Stop Cluster Services on these nodes [AIX720_LPM1]
BROADCAST cluster shutdown? true
* Select an Action on Resource Groups Unmanage Resource Groups
The second method is through the clmgr command, as shown in Example 8-7.
Example 8-7 Change the cluster service to unmanage a resource group through the clmgr command
AIX720_LPM1:/ # clmgr stop node AIX720_LPM1 WHEN=now MANAGE=unmanage
Broadcast message from root@AIX720_LPM1 (tty) at 23:52:44 ...
PowerHA SystemMirror on AIX720_LPM1 shutting down. Please exit any cluster
applications...
AIX720_LPM1: 0513-044 The clevmgrdES Subsystem was requested to stop.
.
"AIX720_LPM1" is now unmanaged.
AIX720_LPM1: Jan 26 2016 23:52:43 /usr/es/sbin/cluster/utilities/clstop: called
with flags -N -f
-------------------------------
NODE AIX720_LPM1
-------------------------------
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
testRG UNMANAGED AIX720_LPM1
UNMANAGED AIX720_LPM2
Note: In this case, there are only two nodes in this cluster, so you must disable this
function on both nodes. Only one node is shown in the example, but the command is run
on both nodes.
Example 8-8 Disabling the RSCT cthgs critical resource monitoring function
AIX720_LPM1:/ # /usr/sbin/rsct/bin/hags_disable_client_kill -s cthags
AIX720_LPM1:/ # /usr/sbin/rsct/bin/dms/stopdms -s cthags
Note: With the previous configuration, if LPM’s freeze time is longer than 600 seconds,
CAA DMS is triggered because of the CAA’s deadman_mode=a (assert) parameter. The node
crashes and its RG is moved to another node.
Note: The -f option of the clmgr command means that the HACMPcluster ODM is not updated;
instead, the CAA parameter (node_timeout) is updated directly with the clctrl command.
This function is included with the following interim fixes:
PowerHA SystemMirror Version 7.1.2 - IV79502 (SP8)
PowerHA SystemMirror Version 7.1.3 - IV79497 (SP5)
If you do not apply one of these interim fixes, then you must perform four steps to increase
the CAA node_timeout variable (Example 8-10):
1. Change the PowerHA service to online status (because cluster sync needs this status).
2. Change the HACMPcluster ODM.
3. Perform cluster verification and synchronization.
4. Change the PowerHA service to unmanage resource group status.
Example 8-10 Detailed steps to change the CAA node_timeout parameter without PowerHA interim fix
--> Step 1
AIX720_LPM1:/ # clmgr start node AIX720_LPM1 WHEN=now MANAGE=auto
Adding any necessary PowerHA SystemMirror entries to /etc/inittab and /etc/rc.net
for IPAT on node AIX720_LPM1.
AIX720_LPM1: start_cluster: Starting PowerHA SystemMirror
...
"AIX720_LPM1" is now online.
Starting Cluster Services on node: AIX720_LPM1
This may take a few minutes. Please wait...
AIX720_LPM1: Jan 27 2016 06:17:04 Starting execution of
/usr/es/sbin/cluster/etc/rc.cluster
AIX720_LPM1: with parameters: -boot -N -A -b -P cl_rc_cluster
AIX720_LPM1:
AIX720_LPM1: Jan 27 2016 06:17:04 Checking for srcmstr active...
AIX720_LPM1: Jan 27 2016 06:17:04 complete.
--> Step 2
AIX720_LPM1:/ # clmgr modify cluster HEARTBEAT_FREQUENCY="600"
--> Step 4
AIX720_LPM1:/ # clmgr stop node AIX720_LPM1 WHEN=now MANAGE=unmanage
Broadcast message from root@AIX720_LPM1 (tty) at 06:15:02 ...
PowerHA SystemMirror on AIX720_LPM1 shutting down. Please exit any cluster
applications...
AIX720_LPM1: 0513-044 The clevmgrdES Subsystem was requested to stop.
.
"AIX720_LPM1" is now unmanaged.
Note: When you stop the cluster with unmanage and then start it with auto, it tries to
bring the RG online, which does not cause any problem with the VGs, file systems, and
IPs. However, it runs the application controller one more time. If you do not perform the
appropriate checks in the application controller before running the commands, it can cause
problems with the application. Therefore, the application controller start script should check
whether the application is already online before starting it.
Note: In our scenario, SAN-based heartbeating is configured, so this step is required. You
do not need to do this step if SAN-based heartbeating is not configured.
-------------------------------
NODE AIX720_LPM2
-------------------------------
Calling node query for all nodes...
Node query number of nodes examined: 2
-------------------------------
NODE AIX720_LPM1
-------------------------------
Calling node query for all nodes...
Node query number of nodes examined: 2
----------------------------------------------------------------------------
AIX720_LPM1:/ # lscluster -i
Network/Storage Interface Query
Node AIX720_LPM1
Node UUID = 112552f0-c4b7-11e5-8014-56c6a3855d04
Number of interfaces discovered = 3
Interface number 1, en1
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = FA:97:6D:97:2A:20
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 2
IPv4 ADDRESS: 172.16.50.21 broadcast 172.16.50.255 netmask
255.255.255.0
Node AIX720_LPM2
Node UUID = 11255336-c4b7-11e5-8014-56c6a3855d04
Number of interfaces discovered = 3
Interface number 1, en1
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = FA:F2:D3:29:50:20
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.50.22 broadcast 172.16.50.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.50.21
Interface number 2, sfwcom
IFNET type = 0 (none)
NDD type = 304 (NDD_SANCOMM)
Smoothed RTT across interface = 7
Mean deviation in network RTT across interface = 3
Probe interval for interface = 990 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = DOWN RESTRICTED SOURCE HARDWARE RECEIVE SOURCE
HARDWARE TRANSMIT
Interface number 3, dpcom
IFNET type = 0 (none)
real 1m6.269s
user 0m0.001s
sys 0m0.000s
AIX720_LPM1:/ # prtconf
System Model: IBM,9179-MHD
Machine Serial Number: 061949T --> this server is P780_10
AIX720_LPM2:/ # prtconf|more
System Model: IBM,9179-MHD
Machine Serial Number: 061949T --> this server is P780_10
AIX720_LPM2:/ # rm /etc/cluster/ifrestrict
AIX720_LPM2:/ # clusterconf
-------------------------------
NODE AIX720_LPM2
-------------------------------
Calling node query for all nodes...
Node query number of nodes examined: 2
----------------------------------------------------------------------------
-------------------------------
NODE AIX720_LPM1
-------------------------------
Calling node query for all nodes...
Node query number of nodes examined: 2
----------------------------------------------------------------------------
Then, you can check the sfwcom interface’s status again by running the lscluster
command.
Note: In this case, there are only two nodes in this cluster, so you disable the function on
both nodes before LPM. Only one node is shown in this example, but the command is run
on both nodes.
Example 8-17 Changing the PowerHA service back to the normal status
Start Cluster Services
[Entry Fields]
* Start now, on system restart or both now
Start Cluster Services on these nodes [AIX720_LPM1]
* Manage Resource Groups Automatically
BROADCAST message at startup? true
Startup Cluster Information Daemon? false
Ignore verification errors? false
Automatically correct errors found during cluster start?        Interactively
Note: When you stop the cluster with unmanage and then start it with auto, it tries to
bring the RG online, which does not cause any problem with the VGs, file systems, and
IPs. However, it runs the application controller one more time. If you do not perform the
appropriate checks in the application controller before running the commands, it can cause
problems with the application. Therefore, the application controller start script should check
whether the application is already online before starting it.
-------------------------------
NODE AIX720_LPM2
-------------------------------
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
testRG ONLINE AIX720_LPM1
OFFLINE AIX720_LPM2
-------------------------------
NODE AIX720_LPM1
-------------------------------
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
testRG ONLINE AIX720_LPM1
OFFLINE AIX720_LPM2
PowerHA SystemMirror listens to LPM events and automates steps in PowerHA SystemMirror
to handle the LPAR freeze that can occur during the LPM process. As part of the automation,
PowerHA SystemMirror provides a few variables that can be changed based on the
requirements for your environment.
You can change the following LPM variables in PowerHA SystemMirror that provide LPM
automation:
Node Failure Detection Timeout during LPM
LPM Node Policy
Start smit sysmirror. Select Custom Cluster Configuration → Cluster Nodes and
Networks → Manage the Cluster → Cluster heartbeat settings. The next panel is a menu
window with a title menu option and seven item menu options.
Table 8-4 describes the context-sensitive help information for the cluster heartbeating setting.
Node Failure Detection Timeout during LPM
If specified, this timeout value (in seconds) is used during an LPM instead of the Node Failure Detection Timeout value.
You can use this option to increase the Node Failure Detection Timeout during the LPM duration to ensure that it is greater than the LPM freeze duration to avoid any risk of unwanted cluster events. The unit is seconds.
For PowerHA V7.2 GA Edition, the customer can enter a value 10 - 600.
For PowerHA V7.2 SP1 or later, the default is 600 and is unchangeable.
LPM Node Policy
Specifies the action to be taken on the node during an LPM operation.
If unmanage is selected, the cluster services are stopped with the Unmanage Resource Groups option during the duration of the LPM operation. Otherwise, PowerHA SystemMirror continues to monitor the RGs and application availability.
The default is manage.
8.5.1 Troubleshooting
The PowerHA log that is related to LPM operation is in /var/hacmp/log/clutils.log.
Example 8-20 and Example 8-21 on page 299 show the information in this log file, and
includes pre-migration and post-migration.
Note: During the operation, PowerHA SystemMirror stops the cluster with the unmanage
option in the pre-migration stage, and starts it with the auto option in the post-migration
stage automatically. PowerHA SystemMirror tries to bring the RG online in the
post-migration stage, which does not cause any problem with the VGs, file systems, and
IPs. However, it runs the application controller one more time.
If you do not perform the appropriate checks in the application controller before running the
commands, it can cause problems with the application. Therefore, the application controller
start script checks whether the application is already online before starting it.
Highlights
The SMUI provides the following advantages over the PowerHA SystemMirror command line:
Monitors the status for all clusters, sites, nodes, and resource groups (RGs) in your
environment.
Scans event summaries and reads a detailed description for each event. If the event
occurred because of an error or issue in your environment, you can read suggested
solutions to fix the problem.
Searches and compares log files. There are predefined search terms along with the ability
to enter your own:
– Error
– Fail
– Could not
Also, the format of the log file makes it easy to read and identify important information. When
you view any log that has multiple versions, such as hacmp.out and hacmp.out.1, they are
merged into a single log view.
The logs include:
– hacmp.out
– cluster.log
– clutils.log
– clstrmgr.debug
– syslog.caa
– clverify.log
– autoverify.log
View properties for a cluster, such as the PowerHA SystemMirror version, name of sites
and nodes, and repository disk information.
Filesets
The SMUI consists of the following filesets:
cluster.es.smui.agent This fileset installs the agent files. Installing this fileset does not
start the agent. This fileset is automatically installed when you
use the smit install_all command to install PowerHA
SystemMirror Version 7.2.1 or later. This fileset is automatically
installed when you add clusters to the PowerHA SystemMirror
GUI.
cluster.es.smui.common This fileset installs common files that are required by both the
agent and the PowerHA SystemMirror GUI server. The Node.js
files are an example of common files. This fileset is automatically
installed when you use the smit install_all command to install
PowerHA SystemMirror Version 7.2.1 or later.
9.2 Installation
Before installing the SMUI, your environment must meet certain requirements, as explained in
9.2.1, “Planning” on page 305.
9.2.1 Planning
To use the SMUI, proper planning is necessary. The cluster nodes and SMUI server must be
at one of the following levels of AIX:
IBM AIX 6.1 with Technology Level 9 with Service Pack 15 or later
IBM AIX 7.1 with Technology Level 3 or later
IBM AIX Version 7.2 or later
Also, these additional cluster filesets must be installed on all nodes in the cluster:
cluster.es.smui.agent
cluster.es.smui.common
All of these filesets are available in the PowerHA SystemMirror 7.2.1 installation media.
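A quick way to confirm that the filesets are present on a node is the standard AIX fileset query (a minimal sketch):
lslpp -l "cluster.es.smui*"    --> lists the installed SMUI agent and common filesets and their levels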
The PowerHA SystemMirror GUI server should have internet access to run the smuiinst.ksh
command. However, if the server does not have internet access, complete the following
steps:
1. Copy the smuiinst.ksh file from the node to a system that is running the AIX operating
system that has internet access. In our case, we copy it to our NIM server.
2. Run the smuiinst.ksh -d /directory command, where /directory is the location where
you want to download the files. We saved them to a directory that was also NFS exported to
our SMUI server.
The following additional packages were downloaded:
– bash-4.2-5.aix5.3.ppc.rpm
– cpio-2.11-2.aix6.1.ppc.rpm
– gettext-0.17-6.aix5.3.ppc.rpm
– info-4.13-3.aix5.3.ppc.rpm
– libgcc-4.9.2-1.aix6.1.ppc.rpm
– libgcc-4.9.2-1.aix7.1.ppc.rpm
– libiconv-1.13.1-2.aix5.3.ppc.rpm
– libstdc++-4.9.2-1.aix6.1.ppc.rpm
– libstdc++-4.9.2-1.aix7.1.ppc.rpm
– readline-6.2-2.aix5.3.ppc.rpm
3. Copy the downloaded files to a directory on the PowerHA SystemMirror GUI server. In our
case, we use an NFS-mounted directory, so we skip this step.
4. From the PowerHA SystemMirror GUI server, run the smuiinst.ksh -i /directory
command, where /directory is the location where you copied the downloaded files.
# /usr/es/sbin/cluster/ui/server/bin/smuiinst.ksh -i ./
https://shawnssmui.cleartechnologies.net:8080/#/login
After you log in, you can add existing clusters in your environment to the
PowerHA SystemMirror GUI.
Figure 9-1 SMUI server installation script output
Attention: During our testing, we had mixed results when using the host name to gather all
the cluster data. However, the IP address seemed to be reliable. At the time of writing, this
was a known issue to development. If you experience the same issue, then contact IBM
PowerHA support.
Upon successful discovery, the cluster information is shown, and you can close the window,
as shown in Figure 9-4 on page 309.
To remove a cluster:
1. Click the keypad icon in the top center of the window.
2. Click Remove clusters.
3. Check the box next to the correct cluster.
4. Click Remove.
9.3 Navigating
The SMUI provides a web browser interface that can monitor your PowerHA SystemMirror
environment. The following sections explain and show examples of the SMUI.
9.3.3 General
This window gives an overall view of the cluster configuration. A small portion of it is shown in
Figure 9-11. You can expand or condense any or all sections as wanted. Currently, the
configuration can only be viewed and not saved as a report; IBM intends to deliver that capability as a possible future enhancement.
9.4 Troubleshooting
This section provides details about how to troubleshoot issues with your cluster.
Figure 10-1 A healthy two-node PowerHA Cluster with heartbeat messages exchanged
When both the active and backup nodes fail to receive heartbeat messages, each node
falsely declares the other node to be down, as shown in Figure 10-2. When this happens, the
backup node attempts to take over the shared resources, including shared data volumes. As a
result, both nodes might write to the shared data and cause data corruption.
Figure 10-2 Cluster that is partitioned when nodes failed to communicate through heartbeat message
exchange
When a set of nodes fails to communicate with the remaining set of nodes in a cluster, the
cluster is said to be partitioned. This is also known as node isolation, or more commonly, split
brain.
Although increasing the number of communication paths for heartbeating can minimize the
occurrence of cluster partitioning due to communication path failure, the possibility cannot be
eliminated completely.
10.1.2 Terminology
Here is the terminology that is used throughout this chapter:
Cluster split When the nodes in a cluster fail to communicate with each
other for a period, each node declares the other node as down.
The cluster is split into partitions. A cluster split is said to have
occurred.
Split policy A PowerHA split policy defines the behavior of a cluster when a
cluster split occurs.
Cluster merge A PowerHA cluster merge policy defines the behavior of a
cluster when a cluster merge occurs.
The PowerHA split policy was first introduced in PowerHA V7.1.3 with two options:
None
This is the default option where the primary and backup nodes operate independently of
each other after a split occurs, resulting in the same behavior as earlier versions during a
split-brain situation.
Tie breaker
This option is applicable only to clusters with sites configured. When a split occurs, the
partition that fails to acquire the SCSI reservation on the tie-breaker disk has its nodes
restarted. For a two-node cluster, one node is restarted, as shown in Figure 10-3 on
page 323.
The PowerHA merge policy was first introduced in PowerHA V7.1.3 with two options:
Majority
This is the default option. The partition with the highest number of nodes remains online. If
each partition has the same number of nodes, then the partition that has the lowest node
ID is chosen. The partition that does not remain online is restarted, as specified by the
chosen action plan. This behavior is similar to previous versions, as shown in Figure 10-5.
Tie breaker
Each partition attempts to acquire a SCSI reserve on the tie-breaker disk. The partition
that cannot reserve the disk is restarted, or has cluster services that are restarted, as
specified by the chosen action plan. If this option is selected, the split-policy configuration
must also use the tie-breaker option.
Before configuring a disk as tie breaker, you can check its current reservation policy by using
the AIX command devrsrv, as shown in Example 10-1.
When cluster services are started, the first time after a disk tie breaker is configured on a
node, the reservation policy of the tie-breaker disk is properly set to PR_exclusive with a
persistent reserve key, as shown in Example 10-2.
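A minimal check looks like the following sketch (hdisk3 is used here only as an example disk name; the fields of interest in the output are the ODM reservation policy and the device reservation state):
devrsrv -c query -l hdisk3
After cluster services are started for the first time with the disk tie breaker configured, running the same query again shows the PR_exclusive policy and the registered persistent reserve key that are described above.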
When the Tie Breaker option of the split policy is selected, the merge policy is automatically
set with the same tie-breaker option.
Here the NFS server is used as tie breaker for two clusters, redbookcluster and RBcluster,
as shown in Example 10-3 on page 329.
a. Enter the host name of the NFS server that exports the tie-breaker directory for the NFS tie
breaker, for example, tiebreakers.
b. Add the IP entry for the host name of the NFS server to /etc/hosts:
• Full path name of local mount point for mounting the NFS tiebreaker directory. For
example, /tiebreaker.
• Full path name of the directory that is exported from the NFS server. In this case
/tiebreaker.
Figure 10-13 on page 331 shows an example of the NFS tie-breaker configuration.
2. Sync cluster
When cluster services are started on each node, tie-breaker files are created on the NFS
server, as shown in Example 10-4.
/tiebreakers/RBcluster:
total 0
-rwx------ 1 root system 0 Nov 23 20:50 PowerHA_NFS_Reserve
drwxr-xr-x 2 root system 256 Nov 23 20:50
PowerHA_NFS_ReserveviewFilesDir
/tiebreakers/RBcluster/PowerHA_NFS_ReserveviewFilesDir:
total 16
-rwx------ 1 root system 257 Nov 23 20:50 testnode3view
-rwx------ 1 root system 257 Nov 23 20:50 testnode4view
/tiebreakers/redbookcluster:
total 0
-rwx------ 1 root system 0 Nov 23 20:51 PowerHA_NFS_Reserve
drwxr-xr-x 2 root system 256 Nov 23 20:51
PowerHA_NFS_ReserveviewFilesDir
/tiebreakers/redbookcluster/PowerHA_NFS_ReserveviewFilesDir:
total 16
-rwx------ 1 root system 257 Nov 23 20:51 testnode1view
-rwx------ 1 root system 257 Nov 23 20:51 testnode2view
Quarantine policies were first introduced in PowerHA V7.2. A quarantine policy isolates the
previously active node that was hosting a critical RG after a cluster split event or node failure
occurs. The quarantine policy ensures that application data is not corrupted or lost.
With the active node halt policy (ANHP), in the event of a cluster split, the standby node for a
critical RG attempts to halt the active node before taking over the RG and any other related
RGs. This task is done by issuing a command to the HMC, as shown in Figure 10-14.
If the backup node fails to halt the active node, for example, because of a communication
failure with the HMC, the RG is not taken over. This policy prevents application data corruption
due to the same RGs being online on more than one node at the same time.
In the simplest configuration of a two-node cluster with one RG, there is no ambiguity as to
which node can be halted by the ANHP in the event of a cluster split. But, when there are
multiple RGs in a cluster, it is not as simple:
In a mutual takeover cluster configuration, different RGs are online on each cluster node
and the nodes back up each other. An active node for one RG also is a backup or standby
node for another RG. When a cluster split occurs, which node halts?
When a cluster with multiple nodes and RGs is partitioned or split, some of the nodes in
each partition might have RGs online, for example, there are multiple active nodes in each
partition. Which partition can have its nodes halted?
It is undesirable to have nodes halting one another, which results in the whole cluster being down.
PowerHA V7.2 introduces the Critical Resource Group so that a user can define which RG is the
most important one when multiple RGs are configured. The ANHP can then use the critical
RG to determine which node is halted or restarted. The node or the partition with the critical
RG online is halted or restarted and quarantined, as shown in Figure 10-14 on page 332.
Because this policy only fences off disks from the active node without halting or restarting it, it is
configured together with a split and merge policy.
4. Configure the ANHP and specify the Critical Resource Group, as shown in Figure 10-20,
Figure 10-21 on page 337, and Figure 10-22 on page 337.
Example 10-5 The clmgr command displaying the current quarantine policy
root@testnode1[/]#clmgr query cluster | grep -i quarantine
QUARANTINE_POLICY="fencing"
Important: The disk fencing quarantine policy cannot be enabled or disabled if cluster
services are active.
When cluster services are started on a node after enabling the Disk Fencing quarantine
policy, the reservation policy and state of the shared volumes are set to PR Shared with the
PR keys of both nodes registered. This action can be observed by using the devrsrv
command, as shown in Example 10-6.
root@testnode3[/]#clRGinfo
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
rg ONLINE testnode3
OFFLINE testnode4
root@testnode4[/]#lspv
hdisk0 00f8806f26239b8c rootvg active
hdisk2 00f8806f909bc31a caavg_private active
hdisk3 00f8806f909bc357 vg1 concurrent
hdisk4 00f8806f909bc396 vg1 concurrent
The PR Shared reservation policy uses the SCSI-3 reservation of type WRITE EXCLUSIVE,
ALL REGISTRANTS, as shown in Example 10-7 on page 341. Only nodes that are registered
can write to the shared volumes. When a cluster split occurs, the standby node ejects the PR
registration of the active node on all shared volumes of the affected RGs. In Example 10-6 on
page 339, the only registrations that are left on hdisk3 and hdisk4 are of testnode4, effectively
fencing off testnode3 from the shared volumes.
Node testnode3 is registered again on hdisk3 and hdisk4 when it successfully rejoins
testnode4 to form a cluster. This requires restarting cluster services on testnode3.
Table 10-1 Split and merge policies for all cluster types
Cluster Type Pre AIX 7.2.1 AIX 7.2.1
TieBreaker TieBreaker
Manual Manual
Split and merge policies are configured as a whole instead of separately. These options
can also vary slightly based on the exact AIX level.
The action plan for the split and merge policy is configurable.
An entry is added to the Problem Determination Tools menu for starting cluster services
on a merged node after a cluster split.
Changes were added to clmgr for configuring the split and merge policy.
All three options, None, Tie Breaker, and Manual, are now available for all cluster types, which
includes standard, stretched, and linked clusters.
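Because the split and merge policy can now also be configured with clmgr, a command-line configuration can look like the following sketch. The attribute names match those that clmgr view cluster SPLIT-MERGE displays; verify against the clmgr documentation for your level that clmgr modify cluster accepts them before relying on this form.
clmgr modify cluster SPLIT_POLICY=tiebreaker MERGE_POLICY=tiebreaker ACTION_PLAN=reboot TIEBREAKER=hdisk3
clmgr sync cluster                 --> verify and synchronize the cluster
clmgr view cluster SPLIT-MERGE     --> confirm the new settings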
Before PowerHA V7.2.1, the split policy had a default setting of None, the merge policy had a
default setting of Majority, and the default action was Reboot (Figure 10-26). This behavior
has not changed.
Note: If you specify the Split-Merge policy as None-None, the action plan is not
implemented and a restart does not occur after the cluster split and merge events. This
option is only available in your environment if it is running IBM AIX 7.2 with Technology
Level 1, or later.
Figure 10-27 Disk tie breaker split and merge action plan
Figure 10-28 NFS tie breaker split and merge action plan
The configuration must be synchronized to make this change known across the cluster.
Cluster services must be re-enabled after a split situation is healed. Until the user performs
this re-enablement, cluster services are not running on the losing partition nodes even after
the networks rejoin. The losing partition nodes join the existing CAA cluster after the
re-enablement is performed. This is done by running smitty sysmirror and selecting Problem
Determination Tools → Start CAA on Merged Node, as shown in Figure 10-30.
Tie-breaker disk preemption does not work in the case of a TBGL hard restart or power off.
The merge events are not available in a stretched cluster with versions earlier than AIX 7.2.1, as
shown in Figure 10-31.
The quarantine policy does not require additional infrastructure resources, but the split and
merge policy does. Users select the appropriate policy or combination of policies that suit
their data center environments.
Figure 10-32 Using a single NFS server as a tie breaker for multiple clusters
Figure 10-34 Testing scenario for the split and merge policy
Our testing environment is a single PowerHA standard cluster. It includes two AIX LPARs with
nodes host names PHA170 and PHA171. Each node has two network interfaces. One
interface is used for communication with HMCs and NFS server, and the other is used in the
PowerHA cluster. Each node has three FC adapters. The first adapter is used for rootvg, the
second adapter is used for user shared data access, and the third one is used for tie-breaker
access.
The PowerHA cluster uses a basic configuration, with the specific configuration options for the
different split and merge policies applied in each scenario.
CAA Unicast
Repository disk: hdisk1
Shared VG sharevg:hdisk2
The following sections contain the detailed PowerHA configuration of each scenario.
Example 10-9 PowerHA basic configuration that is shown with the cltopinfo command
# cltopinfo
Cluster Name: PHA_Cluster
Cluster Type: Standard
Heartbeat Type: Unicast
Repository Disk: hdisk1 (00fa2342a1093403)
NODE PHA170:
Network net_ether_01
PHASvc 172.16.51.172
PHA170 172.16.51.170
NODE PHA171:
Network net_ether_01
PHASvc 172.16.51.172
PHA171 172.16.51.171
PowerHA service
Example 10-10 shows the PowerHA nodes status from each PowerHA node.
Example 10-10 PowerHA nodes status in each scenario before a cluster split
# clmgr -cv -a name,state,raw_state query node
# NAME:STATE:RAW_STATE
PHA170:NORMAL:ST_STABLE
PHA171:NORMAL:ST_STABLE
Example 10-11 shows the PowerHA RG status from each PowerHA node. The RG (testRG) is
online on PHA170 node.
Example 10-11 PowerHA Resource Group status in each scenario before the cluster split
# clRGinfo -v
Example 10-12 Showing the CAA cluster configuration with the lscluster -c command
# lscluster -c
Cluster Name: PHA_Cluster
Cluster UUID: 28bf3ac0-b516-11e6-8007-faac90b6fe20
Number of nodes in cluster = 2
Cluster ID for node PHA170: 1
Primary IP address for node PHA170: 172.16.51.170
Cluster ID for node PHA171: 2
Primary IP address for node PHA171: 172.16.51.171
Number of disks in cluster = 1
Disk = hdisk1 UUID = 58a286b2-fe51-5e39-98b1-43acf62025ab cluster_major = 0 cluster_minor = 1
Multicast for site LOCAL: IPv4 228.16.51.170 IPv6 ff05::e410:33aa
Communication Mode: unicast
Local node maximum capabilities: SPLT_MRG, CAA_NETMON, AUTO_REPOS_REPLACE, HNAME_CHG, UNICAST, IPV6, SITE
Effective cluster-wide capabilities: SPLT_MRG, CAA_NETMON, AUTO_REPOS_REPLACE, HNAME_CHG, UNICAST, IPV6, SITE
Local node max level: 50000
Effective cluster level: 50000
Example 10-13 shows the CAA configuration with the lscluster -d command.
Node PHA170
Node UUID = 28945a80-b516-11e6-8007-faac90b6fe20
Number of disks discovered = 1
hdisk1:
State : UP
uDid : 33213600507680284001D5800000000005C8B04214503IBMfcp
uUid : 58a286b2-fe51-5e39-98b1-43acf62025ab
Site uUid : 51735173-5173-5173-5173-517351735173
Type : REPDISK
Node PHA171
Node UUID = 28945a3a-b516-11e6-8007-faac90b6fe20
Number of disks discovered = 1
hdisk1:
State : UP
uDid : 33213600507680284001D5800000000005C8B04214503IBMfcp
uUid : 58a286b2-fe51-5e39-98b1-43acf62025ab
Site uUid : 51735173-5173-
PowerHA V7.2 supports up to six backup repository disks. It also supports automatic
repository disk replacement in the event of repository disk failure. For more information,
see IBM PowerHA SystemMirror V7.2 for IBM AIX Updates, SG24-8278.
Example 10-14 and Example 10-15 show output from PHA170 and PHA171 nodes with the
lscluster -m command. The current heartbeat channel is the network.
Example 10-16 shows the current heartbeat devices that are configured in the testing
environment. There is not a SAN-based heartbeat device.
Node PHA171
Node UUID = 28945a3a-b516-11e6-8007-faac90b6fe20
Number of interfaces discovered = 2
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
NDD type = 7 (NDD_ISO88023)
MAC address length = 6
MAC address = FA:9D:66:B2:87:20
Smoothed RTT across interface = 0
Mean deviation in network RTT across interface = 0
Probe interval for interface = 990 ms
IFNET flags for interface = 0x1E084863
NDD flags for interface = 0x0021081B
Interface state = UP
Number of regular addresses configured on interface = 1
IPv4 ADDRESS: 172.16.51.171 broadcast 172.16.51.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPv4 MULTICAST ADDRESS: 228.16.51.170
Interface number 2, dpcom
IFNET type = 0 (none)
NDD type = 305 (NDD_PINGCOMM)
Smoothed RTT across interface = 750
Mean deviation in network RTT across interface = 1500
Probe interval for interface = 22500 ms
IFNET flags for interface = 0x00000000
NDD flags for interface = 0x00000009
Interface state = UP RESTRICTED AIX_CONTROLLED
Node PHA170
Node UUID = 28945a80-b516-11e6-8007-faac90b6fe20
Number of interfaces discovered = 2
Interface number 1, en0
IFNET type = 6 (IFT_ETHER)
Note: To identify physical FC adapters that can be used in the PowerHA cluster as the
SAN-based heartbeat, go to the IBM Knowledge Center.
At the time of writing, there is no plan to support this feature for all 16-Gb FC adapters.
This scenario keeps the default configuration for the split and merge policy and does not set
the quarantine policy. To simulate a cluster split, break the network communication between
the two PowerHA nodes, and disable the repository disk access from the PHA170 node.
After a cluster split occurs, restore communications to generate a cluster merge event.
Example 10-18 The clmgr command displays the current split/merge settings
# clmgr view cluster SPLIT-MERGE
SPLIT_POLICY="none"
MERGE_POLICY="majority"
ACTION_PLAN="reboot"
<...>
None
TieBreaker
Manual
After pressing Enter, the menu shows the policy, as shown in Example 10-20.
[Entry Fields]
Split Handling Policy None
Merge Handling Policy Majority +
Split and Merge Action Plan Reboot
2. Keep the default values and upon pressing Enter, you see the summary that is shown in
Example 10-21.
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : None
Merge Handling Policy : Majority
3. Synchronize the cluster. After the synchronization operation is complete, the cluster can
be activated.
You see that PHA171 took over the RG while the RG is still online on the PHA170 node.
To change this value, either run the PowerHA clmgr command or use the SMIT menu:
From the SMIT menu, run smitty sysmirror, select Custom Cluster Configuration →
Cluster Nodes and Networks → Manage the Cluster → Cluster heartbeat settings,
and then change the Node Failure Detection Timeout parameter.
To use the clmgr command, run the following command, where <value> is the timeout in
seconds (the default is 30):
clmgr modify cluster HEARTBEAT_FREQUENCY=<value>
Displaying the resource group status from the PHA170 node after the
cluster split
Example 10-22 shows that the PHA170 node cannot get the PHA171 node’s status.
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
...
/dev/sharelv 1310720 1309864 1% 4 1% /sharefs
Displaying the resource group status from the PHA171 node after the
cluster split
Example 10-25 shows that the PHA171 node cannot get the PHA170 node’s status.
Example 10-27 shows that the VG sharevg is varied on and the file system /sharefs is
mounted on PHA171 node, and it is writable too.
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
<...>
/dev/sharelv 1310720 1309864 1% 4 1% /sharefs
As seen in Example 10-7 on page 341, the /sharefs file system is mounted on both nodes
in writable mode. Applications on the two nodes can write at the same time. This is risky and
can easily result in data corruption.
The default merge policy is Majority and the action plan is Reboot. However, in our case, the
rule in the cluster merge event is:
The node that has the lower node ID survives, and the other node is restarted by RSCT.
This rule is also introduced in 10.2.2, “Merge policy” on page 324.
Example 10-28 shows how to display a PowerHA node’s node ID. You can see that PHA170
has the lower ID, so it is expected that PHA171 node is restarted.
# lscluster -c
Cluster Name: PHA_Cluster
Cluster UUID: 28bf3ac0-b516-11e6-8007-faac90b6fe20
Number of nodes in cluster = 2
Cluster ID for node PHA170: 1
Primary IP address for node PHA170: 172.16.51.170
Cluster ID for node PHA171: 2
Primary IP address for node PHA171: 172.16.51.171
Number of disks in cluster = 1
Disk = hdisk1 UUID = 58a286b2-fe51-5e39-98b1-43acf62025ab cluster_major = 0 cluster_minor = 1
Multicast for site LOCAL: IPv4 228.16.51.170 IPv6 ff05::e410:33aa
Example 10-29 shows that the PHA171 node was rebooted at 22:25:02.
Description
SYSTEM SHUTDOWN BY USER
Probable Causes
SYSTEM SHUTDOWN
Detail Data
USER ID
0
0=SOFT IPL 1=HALT 2=TIME REBOOT
0
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
0
PROCESS ID
13959442
PARENT PROCESS ID
4260250
PROGRAM NAME
hagsd
PARENT PROGRAM NAME
srcmstr
One new shared disk, hdisk3, is added in this scenario; it is used as the disk
tie breaker.
Note: When using a tie-breaker disk for split and merge recovery handling, the disk must
also be supported by the devrsrv command. This command is part of the AIX operating
system.
At the time of writing, the EMC PowerPath disks are not supported for use as a tie-breaker
disk.
Note: The tie-breaker disk is set to no_reserve for the reserve_policy with the chdev
command before the start of the PowerHA service on both nodes. Otherwise, the
tie-breaker policy cannot take effect in a cluster split event.
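A minimal sketch of that preparation, run on both nodes before cluster services are started (hdisk3 is the tie-breaker disk in this scenario; the lsattr check is an extra verification step, not part of the original procedure):
chdev -l hdisk3 -a reserve_policy=no_reserve
lsattr -El hdisk3 -a reserve_policy    --> confirm that no_reserve is set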
None
TieBreaker
Manual
2. After pressing Enter, select the Disk option, as shown in Example 10-31.
Disk
NFS
3. Pressing Enter shows the disk tie breaker configuration window, as shown in
Example 10-32. The merge handling policy is TieBreaker too, and you cannot change it.
Also, keep the default action plan as Reboot.
[Entry Fields]
Split Handling Policy TieBreaker
Merge Handling Policy TieBreaker
* Select Tie Breaker [] +
Split and Merge Action Plan Reboot
None
hdisk3 (00fa2342a10932bf) on all cluster nodes
hdisk3 changed
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : Tie Breaker
Merge Handling Policy : Tie Breaker
Tie Breaker : hdisk3
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
6. Synchronize the cluster. After the synchronization operation is complete, the cluster can
be activated.
7. Run the clmgr command to query the current split and merge policy, as shown in
Example 10-35.
Example 10-35 Display the newly set split and merge policies
# clmgr view cluster SPLIT-MERGE
SPLIT_POLICY="tiebreaker"
MERGE_POLICY="tiebreaker"
ACTION_PLAN="reboot"
TIEBREAKER="hdisk3"
<...>
When the tie breaker split and merge policy is enabled, the rule is that the tie breaker group
leader (TBGL) node has a higher priority to reserve the tie-breaker device than the other nodes.
If this node reserves the tie-breaker device successfully, then the other nodes are restarted.
To change the TBGL manually, see 10.8.4, “How to change the tie breaker group leader
manually” on page 370.
In this case, we broke all communication between the two nodes at 01:36:12.
Example 10-38 shows output of the errpt command on the PHA170 node. The PHA170
node restarts at 01:37:00.
Description
The operating system is being rebooted to ensure that critical resources are
stopped so that another sub-domain that has operational quorum may recover
Probable Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
Failure Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
Recommended Actions
After node finishes rebooting, resolve problems that caused the
operational
quorum to be lost.
Detail Data
DETECTING MODULE
RSCT,PeerDomain.C,1.99.22.299,23992
ERROR ID
As shown in Example 10-38 on page 368 with the time stamp, PHA170 restarts at 01:37:00.
PHA171 starts the takeover of the RG at 01:37:04. There is no opportunity for both nodes to
mount the /sharefs file system at the same time, so data integrity is maintained.
The PHA171 node holds the tie-breaker disk during a cluster split
Example 10-39 shows that the tiebreaker disk is reserved by the PHA171 node after the
cluster split event happens.
# lspath -l hdisk1
Missing hdisk1 fscsi1
Missing hdisk1 fscsi1
Within 1 minute of the repository disk being enabled, the CAA services start automatically.
You can monitor the process by viewing the /var/adm/ras/syslog.caa log file.
Using the lscluster -m command, check whether the CAA service started. When ready, start
the PowerHA service with the smitty clstart or clmgr start node PHA170 command.
You can also bring the CAA services and PowerHA services online together manually by
running the following command:
clmgr start node PHA170 START_CAA=yes
During the cluster merge process, the tiebreaker reservation is automatically released.
Figure 10-37 Split and merge topology scenario with the NFS tie breaker
In this scenario, there is one NFS server. Each PowerHA node has one network interface,
en1, which is used to communicate with the NFS server. The NFS tie breaker requires NFS
protocol version 4.
You can verify that the directory is exported by viewing the /etc/exports file, as shown in
Example 10-44.
On the NFS clients and PowerHA nodes, complete the following tasks:
Edit /etc/hosts and add the NFS server definition, as shown in Example 10-45.
172.16.15.222 nfsserver
Now, verify that the new NFS mount point can be mounted on all the nodes, as shown in
Example 10-46.
# df|grep mnt
nfsserver:/nfs_tiebreaker 786432 429256 46% 11704 20% /mnt
# umount /mnt
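The mount command itself is not shown in the excerpt above. On AIX, an NFS version 4 test mount can be performed as follows (a sketch that uses the host and directory names of this scenario):
mount -o vers=4 nfsserver:/nfs_tiebreaker /mnt    --> test mount over NFSv4
df | grep mnt                                     --> confirm that the mount succeeded
umount /mnt                                       --> remove the test mount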
Disk
NFS
[Entry Fields]
Split Handling Policy NFS
Merge Handling Policy NFS
* NFS Export Server [nfsserver]
* Local Mount Directory [/nfs_tiebreaker]
* NFS Export Directory [/nfs_tiebreaker]
Split and Merge Action Plan Reboot
After pressing enter, Example 10-49 shows the NFS TieBreaker configuration summary.
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : NFS
Merge Handling Policy : NFS
NFS Export Server :
nfsserver
Local Mount Directory :
/nfs_tiebreaker
NFS Export Directory :
/nfs_tiebreaker
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
HACMPsplitmerge:
id = 0
policy = "split"
value = "NFS"
HACMPsplitmerge:
id = 0
policy = "merge"
value = "NFS"
HACMPsplitmerge:
id = 0
HACMPsplitmerge:
id = 0
policy = "nfs_quorumserver"
value = "nfsserver"
HACMPsplitmerge:
id = 0
policy = "local_quorumdirectory"
value = "/nfs_tiebreaker"
HACMPsplitmerge:
id = 0
policy = "remote_quorumdirectory"
value = "/nfs_tiebreaker"
4. Synchronize the cluster. After the synchronization operation completes, the cluster can be
activated.
Upon the cluster start, the PowerHA nodes mount the NFS automatically on both nodes, as
shown in Example 10-51.
NODE PHA170
node mounted mounted over vfs date options
nfsserver /nfs_tiebreaker /nfs_tiebreaker nfs4 Dec 01 08:50 vers=4,fg,soft,retry=1,timeo=10
In our case, the PHA171 node is the current TBGL, as shown in Example 10-52 on page 377.
So, it is expected that the PHA171 node survives and the PHA170 node restarts. The RG on
the PHA170 node is taken over by the PHA171 node.
To change the TBGL manually, see 10.8.4, “How to change the tie breaker group leader
manually” on page 370.
Example 10-53 shows the output of the errpt command on the PHA170 node. This node
restarts at 07:24:38.
Description
The operating system is being rebooted to ensure that critical resources are
stopped so that another sub-domain that has operational quorum may recover
these resources without causing corruption or conflict.
Probable Causes
Critical resources are active and the active sub-domain does not have
Failure Causes
Critical resources are active and the active sub-domain does not have
operational quorum.
Recommended Actions
After node finishes rebooting, resolve problems that caused the
operational
quorum to be lost.
Detail Data
DETECTING MODULE
RSCT,PeerDomain.C,1.99.22.299,23992
ERROR ID
REFERENCE CODE
From the time stamp information that is shown in Example 10-53 on page 377, PHA170
restarts at 07:24:38, and PHA171 starts to take over RGs at 07:24:43. There is no opportunity
for both nodes to mount the /sharefs file system at the same time, so the data integrity is
maintained.
Example 10-54 shows that the PHA171 node wrote its node name into the
PowerHA_NFS_Reserve file successfully.
Example 10-54 NFS file that is written with the node name
# hostname
PHA171
# pwd
/nfs_tiebreaker
# ls -l
total 8
-rw-r--r-- 1 nobody nobody 257 Nov 28 07:24 PowerHA_NFS_Reserve
drwxr-xr-x 2 nobody nobody 256 Nov 28 04:06
PowerHA_NFS_ReserveviewFilesDir
# cat PowerHA_NFS_Reserve
PHA171
After CAA services start successfully, the PowerHA_NFS_Reserve file is cleaned up for the next
cluster split event. Example 10-55 shows that the size of PowerHA_NFS_Reserve file is
changed to zero after the CAA service is restored.
Example 10-55 NFS file zeroed out after the CAA is restored
# ls -l
total 0
-rw-r--r-- 1 nobody nobody 0 Nov 28 09:05 PowerHA_NFS_Reserve
drwxr-xr-x 2 nobody nobody 256 Nov 28 09:05 PowerHA_NFS_ReserveviewFilesDir
During the cluster merge process, the NFS tiebreaker reservations are released
automatically.
None
TieBreaker
Manual
After pressing Enter, the configuration panel opens, as shown in Example 10-57.
[Entry Fields]
Split Handling Policy Manual
Merge Handling Policy Manual
Notify Method []
Notify Interval (seconds) []
Maximum Notifications []
Split and Merge Action Plan Reboot
When selecting Manual as the split handling policy, the merge handling policy also is Manual.
This setting is required and cannot be changed.
There are other options that can be changed. Table 10-3 shows the context-sensitive help for
these items. This scenario keeps the default values.
Table 10-3 Information table to help explain the split handling policy
Name Context-sensitive help (F1) Associated list (F4)
Example 10-58 shows the summary after confirming the manual policy configuration.
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : Manual
Merge Handling Policy : Manual
Notify Method :
Notify Interval (seconds) :
Maximum Notifications :
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the cluster.
The PowerHA clmgr command provides an option to display the cluster split and merge
policy, as shown in Example 10-59.
Synchronize the cluster. After the synchronization operation completes, the cluster can be
activated.
Then, every console on the PHA170 node receives the message that is shown in
Example 10-60.
/usr/es/sbin/cluster/utilities/cl_sm_continue
To have the recovery action - Reboot - taken on all nodes on this partition, enter
/usr/es/sbin/cluster/utilities/cl_sm_recover
LOCAL_PARTITION 1 PHA170 OTHER_PARTITION 2 PHA171
Also, in the hacmp.out log of the PHA170 node, there is a notification that is logged about a
prompt for a split notification, as shown in Example 10-61.
Every console of the PHA170 node also receives a message, as shown in Example 10-62.
To have the recovery action - Reboot - taken on all nodes on this partition, enter
/usr/es/sbin/cluster/utilities/cl_sm_recover
LOCAL_PARTITION 2 PHA171 OTHER_PARTITION 1 PHA170
Note: When the cl_sm_continue command is run on one node, this node continues to
survive and takes over the RG if needed. Typically, this command is run on only one of the
nodes.
When the cl_sm_recover command is run on one node, this node restarts. Typically, you
do not want to run this command on both nodes.
This scenario runs the cl_sm_recover command on the PHA170 node, as shown in
Example 10-63. We also run the cl_sm_continue command on the PHA171 node.
Example 10-64 is the output of the errpt -c command. The PHA170 node restarts after
running the cl_sm_recover command.
Example 10-64 The errpt output from the PHA170 post manual split
errpt -c
4D91E3EA 1202214416 P S cluster0 A split has been detected.
2B138850 1202214416 I O ConfigRM ConfigRM received Subcluster Split event
A098BF90 1202214416 P S ConfigRM The operational quorum state of the acti
<...>
B80732E3 1202214416 P S ConfigRM The operating system is being rebooted t
<...>
9DBCFDEE 1202214616 T O errdemon ERROR LOGGING TURNED ON
69350832 1202214516 T S SYSPROC SYSTEM SHUTDOWN BY USER
<...>
The ConfigRM service log that is shown in Example 10-65 indicates that this node restarts at
21:44:48.
Note: To generate the IBM.ConfigRM service logs, run the following commands:
# cd /var/ct/IW/log/mc/IBM.ConfigRM
# rpttr -o dct trace.* > ConfigRM.out
After the PHA170 node restarts, run the cl_sm_continue command operation on the PHA171
node, as shown in Example 10-66.
Then, the PHA171 node continues and proceeds to acquire the RG, as shown in the
cluster.log file in Example 10-67.
Example 10-67 Cluster.log file from the PHA171 acquiring the resource group
Dec 2 21:45:26 PHA171 local0:crit clstrmgrES[10027332]: Fri Dec 2 21:45:26 Removing 1 from ml_idx
Dec 2 21:45:26 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: split_merge_prompt quorum
YES@SEQ@145@QRMNT@9@DE@11@NSEQ@8@OLD@1@NEW@0
Dec 2 21:45:26 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: split_merge_prompt quorum
YES@SEQ@145@QRMNT@9@DE@11@NSEQ@8@OLD@1@NEW@
0 0
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: node_down PHA170
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_down PHA170 0
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_release PHA171 1
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move PHA171 1 RELEASE
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move PHA171 1 RELEASE 0
Dec 2 21:45:27 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_release PHA171 1 0
Dec 2 21:45:28 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence PHA171 1
Dec 2 21:45:28 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence PHA171 1
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence PHA171 1 0
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_acquire PHA171 1
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move PHA171 1 ACQUIRE
Dec 2 21:45:30 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: acquire_takeover_addr
Dec 2 21:45:31 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: acquire_takeover_addr 0
Dec 2 21:45:33 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move PHA171 1 ACQUIRE 0
Dec 2 21:45:33 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_acquire PHA171 1 0
Dec 2 21:45:33 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_complete PHA171 1
Dec 2 21:45:34 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_complete PHA171 1 0
Dec 2 21:45:36 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT START: node_down_complete PHA170
Dec 2 21:45:36 PHA171 user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_down_complete PHA170 0
The steps are similar to the ones that are described in 10.8.5, “Cluster merge” on page 370.
There are two HMCs in this scenario. Each HMC has two network interfaces: one connects to the server’s FSP adapter, and the other communicates with the PowerHA nodes. In this scenario, one node tries to shut down another node through the HMC by using the ssh protocol.
The two HMCs provide high availability: if one HMC fails, PowerHA uses the other HMC to continue operations.
Example 10-68 shows how to set up the HMC password-less access from the PHA170 node
to one HMC.
Example 10-69 shows how to set up HMC password-less access from the PHA170 node to
another HMC.
Note: The operation that is shown in Example 10-69 on page 387 is also repeated for the
PHA171 node.
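Although the full contents of Examples 10-68 and 10-69 are not repeated here, the setup typically consists of generating an SSH key pair on the node and registering the public key with the HMC mkauthkeys command. A minimal sketch, in which the hscroot user and the HMC55 host name are assumptions for illustration only:
# ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# KEY=$(cat ~/.ssh/id_rsa.pub)
# ssh hscroot@HMC55 "mkauthkeys --add '$KEY'"
# ssh hscroot@HMC55 lssyscfg -r sys -F name    # should return without a password prompt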
HMC Configuration
Configure Active Node Halt Policy
2. Select Add HMC Definition, as shown in Example 10-71, and press Enter. The detailed definition menu opens, as shown in Example 10-72 on page 389.
[Entry Fields]
* HMC name [HMC55]
Table 10-4 shows the help and information list for adding the HMC definition.
Table 10-4 Context-sensitive help and associated list for adding an HMC definition
HMC name
  Context-sensitive help (F1): Enter the host name for the HMC. An IP address is also accepted here. IPv4 and IPv6 addresses are supported.
  Associated list (F4): Yes (single-selection). Obtained by running the following command:
  /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc -a IP
Number of retries
  Context-sensitive help (F1): Enter the number of times that an HMC command is retried before the HMC is considered as non-responding. The next HMC in the list is used after this number of retries fails. Setting no value means that you use the default value, which is defined in the Change/Show Default HMC Tunables panel. When -1 is displayed in this field, it indicates that the default value is used.
  Associated list (F4): None. The default value is 5.
Delay between retries (in seconds)
  Context-sensitive help (F1): Enter a delay in seconds between two successive retries. Setting no value means that you use the default value, which is defined in the Change/Show Default HMC Tunables panel. When -1 is displayed in this field, it indicates that the default value is used.
  Associated list (F4): None. The default value is 10 seconds.
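The F4 pick list of HMC addresses comes from the RMC management domain. You can run the same query manually on a cluster node to check which HMCs the node can reach, for example:
# /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc -a IP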
Checking HMC connectivity between "PHA171" node and "HMC55" HMC : success!
Checking HMC connectivity between "PHA170" node and "HMC55" HMC : success!
[Entry Fields]
* HMC name [HMC239]
You can use the clmgr commands to show the current setting of the HMC, as shown in
Example 10-75.
NAME="HMC239"
TIMEOUT="-1"
RETRY_COUNT="-1"
RETRY_DELAY="-1"
NODES="PHA171 PHA170"
STATUS="UP"
VERSION="V8R8.6.0.0"
HMC Configuration
Configure Active Node Halt Policy
2. The window in Example 10-77 is shown. Enable the Active Node Halt Policy and set the
RG testRG as the Critical Resource Group.
Example 10-77 Enabling the active node halt policy and setting the critical resource group
Active Node Halt Policy
[Entry Fields]
* Active Node Halt Policy Yes +
* Critical Resource Group [testRG] +
In this scenario, there is only one RG, so we set it as the critical RG. For a description
about the critical RG, see 10.3.1, “Active node halt quarantine policy” on page 332.
Example 10-78 shows the summary after pressing Enter.
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Note: If the split and merge policy is tiebreaker or manual, then the ANHP policy does
not take effect. Make sure to set the Split Handling Policy to None before setting the
ANHP policy.
3. Use the clmgr command to check the current configuration, as shown in Example 10-79.
4. When the HMC and ANHP configuration is complete, verify and synchronize the cluster.
During the verification and synchronization process, the LPAR name and system
information of the PowerHA nodes are added into the HACMPdynresop ODM database.
They are used when ANHP is triggered, as shown in Example 10-80.
HACMPdynresop:
key = "PHA170_LPAR_NAME"
value = "T_PHA170" -> LPAR name can be different with hostname,
hostname is PHA170
HACMPdynresop:
key = "PHA170_MANAGED_SYSTEM"
value = "8284-22A*844B4EW" -> This value is System Model * Machine
Serial Number
HACMPdynresop:
key = "PHA171_LPAR_NAME"
value = "T_PHA171"
HACMPdynresop:
key = "PHA171_MANAGED_SYSTEM"
value = "8408-E8E*842342W"
This scenario sets the Split Handling Policy to None and sets the Quarantine Policy to ANHP.
The Critical Resource Group is testRG and is online on the PHA170 node at this time. When
the cluster split occurs, it is expected that a backup node of this RG (PHA171) takes over the
RG. During this process, PowerHA tries to shut down the PHA170 node through the HMC.
Example 10-82 shows the PowerHA hacmp.out file on the PHA171 node. The log indicates
that PowerHA triggers a shutdown of the PHA170 node command at 02:44:55. This operation
is in the PowerHA rg_move_acquire event.
Note: PowerHA on the PHA171 node shuts down the PHA170 node before it acquires the service IP and varies on the shared VG. Only when this operation completes successfully does PowerHA continue with the other operations. If this operation fails, PowerHA enters the error state and does not continue, so the data in the shared VG is safe.
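For illustration only (this is not necessarily the literal command that PowerHA builds), halting a partition through the HMC CLI uses the managed system and LPAR name values that are stored in HACMPdynresop, in a form similar to the following example:
# ssh hscroot@HMC55 "chsysstate -r lpar -m '8284-22A*844B4EW' -o shutdown --immed -n T_PHA170"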
In this scenario, the quarantine policy is disk fencing. There is one RG (testRG) in this cluster, so this RG is also marked as the Critical Resource Group in the disk fencing configuration.
Note: If the ANHP policy is also enabled, in case of a cluster split, ANHP takes effect first.
HMC Configuration
Configure Active Node Halt Policy
[Entry Fields]
* Active Node Halt Policy No +
* Critical Resource Group [testRG]
Example 10-85 on page 397 shows that disk fencing is enabled and the Critical Resource
Group is testRG.
[Entry Fields]
* Disk Fencing Yes +
* Critical Resource Group [testRG]
After pressing Enter, Example 10-86 shows the summary of the split and merge policy
setting.
The PowerHA SystemMirror split and merge policies have been updated.
Current policies are:
Split Handling Policy : None
Merge Handling Policy : Majority
Split and Merge Action Plan : Reboot
The configuration must be synchronized to make this change known across the
cluster.
Disk Fencing : Yes
Critical Resource Group : testRG
Note: If you want to enable only the disk fencing policy, you also must set the split handling
policy to None.
HACMPsplitmerge:
id = 0
policy = "split"
value = "None"
HACMPsplitmerge:
id = 0
policy = "merge"
value = "Majority"
HACMPsplitmerge:
id = 0
policy = "anhp"
value = "No" -->> Important, make sure ANHP is disable.
HACMPsplitmerge:
id = 0
policy = "critical_rg"
value = "testRG"
HACMPsplitmerge:
id = 0
policy = "scsi"
value = "Yes"
Note: Before you perform a cluster verification and synchronization, check whether the reserve_policy for the shared disks is set to no_reserve.
After the verification and synchronization, you can see that the reserve_policy of hdisk2 changed to PR_shared and that one PR_key_value was generated on each node.
Example 10-89 shows the PR_key_value and reserve_policy setting in the PHA170 node.
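Although Example 10-89 is not reproduced here, the same information can be checked with lsattr, for example:
# lsattr -El hdisk2 -a PR_key_value -a reserve_policy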
This scenario sets the split handling policy to None and sets the quarantine policy to disk
fencing. The Critical Resource Group is testRG and is online on the PHA170 node at this
time. When the cluster split occurs, it is expected that the backup node of this RG (PHA171)
takes over the RG. During this process, PowerHA on the PHA171 node fences out PHA170
node from accessing the disk and allows itself to access it. PowerHA tries to use this method
to keep the data safe.
Example 10-92 shows the output of the PowerHA hacmp.out file. It indicates that PowerHA
triggers the preempt operation in the cl_scsipr_preempt script.
After some time, at 04:15:16, the /sharefs file system is fenced out and the application on
the PHA170 node cannot perform an update operation to it, but the application can still
perform read operations from it.
Example 10-93 shows the PowerHA cluster.log file of the PHA171 node.
Description
USER DATA I/O ERROR
Probable Causes
ADAPTER HARDWARE OR MICROCODE
DISK DRIVE HARDWARE OR MICROCODE
SOFTWARE DEVICE DRIVER
STORAGE CABLE LOOSE, DEFECTIVE, OR UNTERMINATED
Recommended Actions
CHECK CABLES AND THEIR CONNECTIONS
INSTALL LATEST ADAPTER AND DRIVE MICROCODE
INSTALL LATEST STORAGE DEVICE DRIVERS
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
JFS2 MAJOR/MINOR DEVICE NUMBER
0064 0001
FILE SYSTEM DEVICE AND MOUNT POINT
/dev/sharelv, /sharefs
Example 10-95 shows the output of the devrsrv command on the PHA170 node. It indicates
that hdisk2 was held by the 9007287067281030 PR key, and this key belongs to the PHA171
node.
Example 10-96 shows the output of the devrsrv command on the PHA171 node.
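The devrsrv listings in Examples 10-95 and 10-96 can be produced with the query subcommand, for example:
# devrsrv -c query -l hdisk2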
Note: From the preceding description, you can see that the PHA171 node takes over the RG, the data in the /sharefs file system is safe, and the service IP is attached to the PHA171 node. However, the service IP is still online on the PHA170 node, so there is a risk of an IP conflict. You must perform some manual operations to avoid this risk, such as rebooting the PHA170 node manually.
In this scenario, restart the PHA170 node and restore all communication between the two
nodes. After checking that the CAA service is up by running the lscluster -m command, start
the PowerHA service on the PHA170 node.
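A minimal sketch of this sequence (clmgr is one way to start cluster services; smitty clstart achieves the same result):
# lscluster -m                 # confirm that the CAA node state is UP
# clmgr online node PHA170     # start PowerHA cluster services on PHA170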
During the start of the PowerHA service, in the node_up event, PowerHA on the PHA170 node
resets the reservation for the shared disks.
Example 10-98 shows the output of the node_up event in PHA170. The log indicates that
PowerHA registers its key into the shared disks of the sharevg.
Example 10-99 shows that the PR key value of PHA170 node is registered to hdisk2. Thus, it
is ready for the next cluster split event.
Note: This new option is back-ported to PowerHA V7.2 and V7.1.3. At the time of writing, you can obtain it by opening a Problem Management Report (PMR) and asking for an interim fix for defect 1008628.
The startup takes longer than expected. Depending on your timeout settings, the start can be delayed significantly.
This behavior happens because the first cluster manager does not see its partner at initialization time. This situation always occurs, even if the delay between the starts is small. Therefore, the second cluster manager must wait until all RGs are processed by the first cluster manager.
Starting the whole cluster with the first available node settings
A solution might be to change all the RGs to start on the first available node. The Start After settings remain the same as shown in Figure 11-2 on page 409. The assumption is that you still start the whole cluster by using the clmgr online cluster command.
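One possible way to make this change and restart the cluster with clmgr is shown in the following sketch. The RG name RG1 is a placeholder, and the STARTUP attribute value OFAN (Online On First Available Node) should be verified against your level of clmgr:
# clmgr modify resource_group RG1 STARTUP=OFAN
# clmgr sync cluster
# clmgr online cluster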
Figure 11-4 Start sequence with the setting on first available node
Figure 11-5 RG start sequence when all the cluster managers are running
Attention: Do not run these commands on your systems. These commands are shown only to illustrate how disk reservations work, especially in a clustered environment, which demands extra care when managing disk reservations.
This appendix describes SCSI reservations and covers the following topics:
SCSI reservations
Persistent Reserve IN
Storage
More about PR reservations
Persistent reservation commands
SCSI 3 Persistent Reservations provide the mechanism to control access to a shared device from multiple nodes. The reservation persists even if the bus is reset for error recovery, which is not the case with SCSI 2 reservations, where device reservations do not survive a node restart. Also, SCSI 3 PR supports multiple paths from a host to a disk, where SCSI 2 works with only one path from the host to a disk. The scope of a persistent reservation is the entire logical unit.
SCSI 3 Persistent Reservations use the concept of register and reserve. Multiple nodes can
register their reservation keys (also known as PR_Key) with the shared device and establish a
reservation in any of the following modes, as shown in Table A-1.
Table A-1 Persistent reservation types
Write Exclusive (WE)                        1h
Exclusive Access (EA)                       3h
Write Exclusive Registrants Only (WERO)     5h
Exclusive Access Registrants Only (EARO)    6h
Write Exclusive All Registrants (WEAR)      7h
Exclusive Access All Registrants (EAAR)     8h
In the All Registrants types of reservations (WEAR and EAAR), each registered node is a Persistent Reservation (PR) Holder, and the PR Holder value is set to zero. The All Registrants type is an optimization that makes all cluster members equal, so if any member fails, the others continue.
In all other types of reservation, there is a single reservation holder, which is one of the following I_T nexuses:
The nexus for which the reservation was established with a PERSISTENT RESERVE OUT
command with the RESERVE service action, the PREEMPT service action, the PREEMPT AND
ABORT service action, or the REPLACE LOST RESERVATION service action.
The nexus to which the reservation was moved by a PERSISTENT RESERVE OUT command
with the REGISTER AND MOVE service action.
An I_T nexus refers to the combination of an initiator port on the host and a target port on the storage server. The Write Exclusive and Exclusive Access types are defined as follows:
1h Write Exclusive (WE)
Only the persistent reservation holder shall be permitted to perform write operations to the device. There can be only one persistent reservation holder at a time.
3h Exclusive Access (EA)
Only the persistent reservation holder shall be permitted to access (read and write operations) the device. There can be only one persistent reservation holder at a time.
Table A-2 shows the read/write operations with the type of All Registrants.
Table A-2 Read and write operations with the All Registrants type
Type WEAR (7h) WERO (5h) EAAR (8h) EARO (6h)
In the Registrants Only (RO) type, a reservation is exclusive to one of the registrants. The reservation of the device is lost if the current PR holder removes its PR Key from the device. To avoid losing the reservation, any other registrant can make itself the Persistent Reservation Holder (known as a preempt). Alternatively, in the All Registrants (AR) type, the reservation is shared among all registrants.
The lsattr command with the -E option displays the effective policy for the disk in the AIX
ODM. The -P option displays the policy when the device was last configured. This is the
reservation information about the AIX kernel that is used to enforce the reservation during
disk opens.
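For example, to compare the two views for one disk:
# lsattr -El hdisk1 -a reserve_policy    # effective value
# lsattr -Pl hdisk1 -a reserve_policy    # value from the last device configuration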
Setting these attributes by using the chdev command can fail if the resource is busy, as shown
in Example A-3.
Example A-3 Setting the disk attribute with the chdev command
# chdev -l hdisk1 -a reserve_policy=PR_shared
Method error (/usr/lib/methods/chgdisk):
0514-062 Cannot perform the requested function because the specified device is busy.
When the device is in use, you can use the -P flag with chdev to change the policy in the ODM database only. The change is applied to the device when the system is restarted. Another method is to use the -U flag, where the reservation information is updated in both the AIX ODM and the AIX kernel. However, not all devices support the -U flag. One way to determine this support is to look for the True+ value in the lsattr output, as shown in Example A-4.
Example A-4 Checking whether the device supports the U flag by using the lsattr command output
# lsattr -Pl hdisk1 -a reserve_policy
reserve_policy PR_shared Reserve Policy True+
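A short sketch of the two deferred-change options that are described above (the disk name is only an example):
# chdev -l hdisk1 -a reserve_policy=PR_shared -P    # update the ODM only; takes effect at the next restart
# chdev -l hdisk1 -a reserve_policy=PR_shared -U    # update the ODM and the kernel, if the device supports -U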
Attention: Do not run these commands on your systems. These commands are shown only to illustrate how disk reservations work, especially in a clustered environment, which demands extra care when managing disk reservations.
Persistent Reserve IN (PRIN) commands are used to obtain information about active
reservations and registrations on a device. The following PRIN service actions are commonly
used:
Read keys To read PR Keys of all registrants of the device.
Read reservation To obtain information about the Persistent Reservation Holder. The PR
Holder value is zero if the All Registrants type of reservation exists on
the device; otherwise, it is the PR Key of the node holding the
reservation of the device exclusively.
Report capabilities To read the capability information of the device. The capability bits
indicate whether the device supports persistent reservations and the
types of reservation that are supported by the device. A devrsrv
implementation of this service action is shown in Example A-5.
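On AIX, the devrsrv command surfaces this information. Example A-5 shows the full output; the query itself (assumed form) reports the registered keys, the reservation holder, and the PR capabilities of the device:
# devrsrv -c query -l hdisk1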
Attention: Do not run these commands on your systems. These commands are shown only to illustrate how disk reservations work, especially in a clustered environment, which demands extra care when managing disk reservations.
Persistent Reserve OUT (PROUT) commands are used to reserve, register, and remove the
reservations and reservation keys. The following PROUT service actions are commonly used:
Register To register and unregister a PR key with a device.
Reserve To create a persistent reservation for the device.
Release To release the selected persistent reservation and not remove any
registrations.
Clear To release any persistent reservations and remove all registrations on
the device.
Preempt To replace the persistent reservation or remove registrations.
Preempt and abort Along with preempting, to abort all tasks for one or more preempted
nodes.
The value of the service action key and the reservation type matters when Preempt or Preempt and Abort actions are performed. Therefore, a little more information about these service actions is necessary.
The PREEMPT AND ABORT service action is identical to the PREEMPT service action, except that all tasks from the device that are associated with the persistent reservations or registrations being preempted (but not the task containing the PROUT command itself) shall be aborted. See Table A-3.
Table A-3 Effects of preempt and abort under different reservation types
Reservation type          Service action reservation key          Action
Perform the register action from each system (1 - 4) to register its reservation key with the
disk and reserve action to establish the reservation. The PR_Holder_key value represents the
current reservation holder of the disk. As shown in Table A-4 on page 419, in the RO type
only one system can hold the reservation of the disk at a time (key 0x1 in our example).
PR_Holder_Key 0 0x1
A read keys command displays all of the reservation keys that are registered with the disk (0x1, 0x2, 0x3, and 0x4). The read reservation command gives the value of PR_Holder_Key, which varies per reservation type. If there is a network or any other failure such that system 1 and the rest of the systems cannot communicate with each other for a certain period, a split-brain or split-cluster situation results, as shown in Figure A-2.
Suppose that your cluster manager decides that system 2 (or the subcluster with system 2) takes ownership. That system can then issue a PROUT command with a preempt or preempt and abort option and remove the PR_Key 0x1 registration from the disk. The result is that the reservation is moved away from system 1, as shown in Table A-5, and system 1 is denied access to the shared disk.
PR_Holder_Key 0 0x2
Preempt or preempt and abort functions can take the following arguments:
Current_key The PR_key of the node issuing the command, for example, 0x2.
Disk The shared disk in discussion.
Action_key The PR_key on which the action must be taken.
The action_key is 0x1 with the RO type of reservation. The action_key can be either 0 or 0x1 with the AR type of reservation. The two methods of preempting in the case of an AR type are explained as follows:
Method 1: Zero action key
If the action key is zero, the following actions take place:
– Registration of systems 1, 3, and 4 are removed.
– The persistent reservation is removed.
– A reservation from system 2 is created.
If access for the rest of the systems in the active subcluster must be regained, perform an event to reregister the keys of those systems (systems 3 and 4).
Method 2: Nonzero action key
If the action key is nonzero (in our case, the key of the system), the persistent reservation is not released, but the registration of PR_Key 0x1 is removed, which achieves fencing, as shown in Figure A-4.
Table A-6 shows the result of prin commands after preempting system 1.
Method 1 Method 2
PR_Holder_Key 0 0x2
In the RO type of reservation, if the key that is unregistered is the PR_Holder_key (0x2), the reservation of the disk is lost along with the PR_key. Removing key 0x2 has no impact on the reservation in the case of the AR reservation type, and the same is true when other keys are removed.
Any preempt attempt by system 1 fails with a reservation conflict because its key is no longer registered with the disk.
Release request
A release request from the persistent reservation holder node releases only the reservation of the disk; the pr_keys remain registered. Referring to Table A-7, with the AR type of reservation, a release command from any of the registrants (0x2, 0x3, or 0x4) results in the reservation being removed. In the case of the RO type, a release command from the non pr_holders (0x3 or 0x4) returns successfully, but has no impact on the reservation or registration; the release request must come from the PR_holder (0x2) in this case.
Clear request
Referring again to Table A-7, if a clear request is made to the target device from any of the
nodes, the persistent reservation of the disk is lost, and all of the pr_keys that are registered
with the disk (0x2 0x3 0x4) are removed. As the T10 document suggests, the clear action
must be restricted to recovery operations because it defeats the persistent reservation feature
that protects data integrity.
Note: When a node opens the disk or a register action is performed, it registers with the
PR_key value through each path to the disk. Therefore, you can see multiple registrations
(I_T nexuses) with the same key. The number of registrations is equal to the number of
active paths from the host to the target because each path represents an I_T nexus.
Example A-6 IBM storage support with native AIX MPIO of the SCSI PR Exclusive
# lsattr -Rl hdiskx -a reserve_policy | grep PR
PR_exclusive
PR_shared
Director bits SCSI3 Interface (SC3) and SCSI Primary Commands (SC2) must be enabled. The SCSI3_persist_reserv flag must also be enabled to use persistent reservation on PowerPath devices.
If the current reservation key is zero and the TYPE field (the persistent reservation type, as shown in Table A-1 on page 414) is also 0, there is no persistent reservation on the disk. If the TYPE field is Write Exclusive All Registrants (7h), then some other host is already registered for shared access. In either case, complete the following actions:
1. Register our key on to the disk by using a PR OUT command with the Register and Ignore
Existing Key service action.
2. Reserve the disk by using a PR OUT command with the RESERVE service action and the type
of Write Exclusive All Registrants (7h).
While closing the disk, for PR_exclusive reservations alone, send a PR OUT command with the
Clear service action to the disk to clear all of the existing reservations and registration. This
command is sent through any one of the good paths of the disk (the I_T nexus where
registration was done successfully).
While changing the reserve_policy by using chdev from PR_shared to PR_exclusive, from
PR_shared or PR_exclusive to single_path (or no_reserve if the key in ODM is one of the
registered keys on the disk), send a PR OUT command with the Clear service action to the disk
to clear all of the existing reservations and registration.
The clrsrvmgr command of PowerHA V7.2 lists and clears the reservation of a disk or a
group of disks in a volume group (VG).
The utility does not guarantee the operation because disk operations depend on the accessibility of the device. However, it tries to show the reason for a failure when used with the -v option. The utility does not support operations at both the disk and VG levels together; therefore, the -l and -g options cannot coexist. At the VG level, the number of disks in the VG and each target disk name are displayed, as shown in the following code:
# clrsrvmgr -rg PRABVG
Number of disks in PRABVG: 2
hdisk1011
Configured Reserve Policy : no_reserve
Effective Reserve Policy : no_reserve
Reservation Status : No reservation
hdisk1012
Configured Reserve Policy : no_reserve
Effective Reserve Policy : no_reserve
Reservation Status : No reservation
At disk level, the disk name is not mentioned because the target device is known:
# clrsrvmgr -rl hdisk1015 -v
Configured Reserve Policy : PR_shared
Effective Reserve Policy : PR_shared
Reservation Status : No reservation
PowerHA V7.2 recognizes and supports live update of cluster member nodes:
PowerHA switches to an unmanaged mode during the operation.
It allows workload and storage activities to continue to run without interruption.
Live update can be performed on one node in the cluster at a time.
Tip: This AIX 7.2 Live Kernel update YouTube video provides a demonstration of LKU.
single = /home/dummy.150813.epkg.Z
--- EOF ---
disks:
nhdisk = <hdisk#>
mhdisk = <hdisk#>
tohdisk = <hdisk#>
tshdisk = <hdisk#>
hmc:
lpar_id =
management_console = dsolab134
user = hscroot
disks:
nhdisk = hdisk1
mhdisk = hdisk2
tohdisk = hdisk3
tshdisk = hdisk7
hmc:
lpar_id =
management_console = dsolab134
user = hscroot
software:
single = /home/dummy.150813.epkg.Z
6. Install the interim fix by running the following command. The flags that are used in the
commands are described as follows:
-d device or directory Specifies the device or directory containing the images to
install.
-k Specifies that the AIX Live Update operation is to be
performed. This is a new flag for LKU.
# geninstall -k -d /home/ dummy.150813.epkg.Z
Validating live update input data.
Computing the estimated time for the liveupdate operation:
-------------------------------------------------------
LPAR: kern102
Blackout_time(s): 82
Global_time(s): 415
....................................
Initializing live update on original LPAR.
# clRGinfo -m
------------------------------------------------------------
Group Name Group State Application state Node
------------------------------------------------------------
RG1 UNMANAGED kern102
montest OFFLINE
AIX Live Update support is automatically enabled with PowerHA V7.2.0 and AIX 7.2.0 or later. AIX Live Update is not supported on AIX 7.1.x with PowerHA V7.2.0 installed. However, if you upgrade AIX to Version 7.2.0 or later, you must enable the AIX Live Update function in PowerHA to use the Live Update support of AIX.
The publications that are listed in this section are considered suitable for a more detailed
description of the topics that are covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topics in this document. Some publications that are referenced in this list might be available in softcopy only.
IBM PowerHA SystemMirror 7.1.2 Enterprise Edition for AIX, SG24-8106
IBM PowerHA SystemMirror for AIX Cookbook, SG24-7739
IBM PowerHA SystemMirror Standard Edition 7.1.1 for AIX Update, SG24-8030
IBM PowerHA SystemMirror V7.2 for IBM AIX Updates, SG24-8278
Power Enterprise Pools on IBM Power Systems, REDP-5101
You can search for, view, download, or order these documents and other Redbooks, Redpapers, web docs, drafts, and additional materials at the following website:
ibm.com/redbooks
Other publications
This publication is also relevant as a further information source:
IBM RSCT for AIX: Guide and Reference, SA22-7889
Online resources
These websites are also relevant as further information sources:
PowerHA SystemMirror Interim Fix Bundles information:
https://aix.software.ibm.com/aix/ifixes/PHA_Migration/ha_install_mig_fixes.htm
PowerHA wiki:
http://tinyurl.com/phawiki
PowerHA LinkedIn group:
https://www.linkedin.com/grp/home?gid=8413388
SG24-8372-00
ISBN 0738442518
Printed in U.S.A.
ibm.com/redbooks