PowerHA SystemMirror
for AIX Cookbook
Dino Quintero
Shawn Bodily
Vera Cruz
Sachin P. Deshmukh
Karim El Barkouky
Youssef Largou
Jean-Manuel Lenez
Vivek Shukla
Kulwinder Singh
Tim Simon
Redbooks
IBM Redbooks
February 2023
SG24-7739-02
Note: Before using this information and the product it supports, read the information in “Notices” on
page xiii.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Part 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 3. Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.1 High availability planning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.2 Planning for PowerHA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.2.1 Planning strategy and example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2.2 Planning tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2.3 Getting started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.2.4 Current environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.2.5 Addressing single points of failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2.6 Initial cluster design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.2.7 Naming conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.2.8 Completing the cluster overview planning worksheet . . . . . . . . . . . . . . . . . . . . . . 80
3.3 Planning cluster hardware. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.3.1 Overview of cluster hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.3.2 Completing the cluster hardware planning worksheet . . . . . . . . . . . . . . . . . . . . . 82
3.4 Planning cluster software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.4.1 AIX and RSCT levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.4.2 Virtual Ethernet and vSCSI support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.4.3 Required AIX file sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.4.4 PowerHA 7.2.7 file sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.4.5 AIX files altered by PowerHA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.4.6 Application software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.4.7 Licensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.4.8 Completing the software planning worksheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.5 Operating system considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.6 Planning security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.6.1 Cluster security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.6.2 User administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.6.3 HACMP group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.6.4 Planning for PowerHA file collections. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.7 Planning cluster networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.7.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.7.2 General network considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.7.3 IP Address Takeover planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.7.4 Additional network planning considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.7.5 Completing the network planning worksheets. . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.8 Planning storage requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.8.1 Internal disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.8.2 Cluster repository disk. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.8.3 SAN-based heartbeat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.8.4 Shared disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.8.5 Enhanced Concurrent Mode (ECM) volume groups . . . . . . . . . . . . . . . . . . . . . . 109
3.8.6 How fast disk takeover works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.8.7 Enabling fast disk takeover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.8.8 Shared logical volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.8.9 Completing the storage planning worksheets . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.9 Application planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.9.1 Application controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.9.2 Application monitoring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.9.3 Availability analysis tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.9.4 Completing the application planning worksheets . . . . . . . . . . . . . . . . . . . . . . . . 116
3.10 Planning for resource groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.10.1 Resource group attributes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
3.10.2 Completing the planning worksheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
3.11 Detailed cluster design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.12 Developing a cluster test plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.12.1 Custom test plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.12.2 Cluster Test Tool. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
3.13 Developing a PowerHA 7.2.7 installation plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
3.14 Backing up the cluster configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
3.15 Documenting the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.15.1 Native HTML report. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.16 Change and problem management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
3.17 Planning tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
3.17.1 Paper planning worksheets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
3.17.2 Cluster diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Notices
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation, registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright
and trademark information” at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
AIX®, Cognos®, Db2®, DB2®, DS8000®, HyperSwap®, IBM®, IBM Cloud®, IBM FlashSystem®,
IBM Security™, IBM Spectrum®, MQSeries®, Orchestrate®, OS/400®, POWER®, POWER8®,
POWER9™, PowerHA®, PowerVM®, pureScale®, Redbooks®, Redbooks (logo)®, Storwize®,
System z®, SystemMirror®, Tivoli®, WebSphere®, XIV®
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Ansible, OpenShift, Red Hat, RHCSA, are trademarks or registered trademarks of Red Hat, Inc. or its
subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
Preface
This IBM Redbooks publication can help you install, tailor, and configure the new IBM
PowerHA SystemMirror Version 7.2.7, and understand new and improved features such as
migrations and cluster administration, as well as advanced topics such as using Resource
Optimized High Availability (ROHA), creating a cross-site LVM stretched campus cluster, and
running PowerHA SystemMirror in the IBM Power Virtual Server environment.
With this book, you can gain a broad understanding of the IBM PowerHA SystemMirror®
architecture. If you plan to install, migrate, or administer a high availability cluster, this book is
right for you.
This book can help IBM AIX professionals who seek a comprehensive and task-oriented
guide for developing the knowledge and skills required for PowerHA cluster design,
implementation, and daily system administration. It provides a combination of theory and
practical experience.
This book is targeted toward technical professionals (consultants, technical support staff, IT
architects, and IT specialists) who are responsible for providing high availability solutions and
support with the IBM PowerHA SystemMirror Standard on IBM POWER® systems.
Authors
This book was produced by a team of specialists from around the world working at IBM
Redbooks, Austin Center.
Dino Quintero is a Systems Technology Architect with IBM® Redbooks®. He has 28 years of
experience with IBM Power technologies and solutions. Dino shares his technical computing
passion and expertise by leading teams developing technical content in the areas of
enterprise continuous availability, enterprise systems management, high-performance
computing (HPC), cloud computing, artificial intelligence (including machine and deep
learning), and cognitive solutions. He is a Certified Open Group Distinguished Technical
Specialist. Dino is formerly from the province of Chiriqui in Panama. Dino holds a Master of
Computing Information Systems degree and a Bachelor of Science degree in Computer
Science from Marist College.
Shawn Bodily is an eight-time IBM Champion for Power Systems. He is known online as
“PowerHA guy” and is a Senior IT Consultant for Clear Technologies in Dallas, Texas. He has
30 years of IBM AIX experience and the last 26 years specializing in high availability and
disaster recovery primarily focused around IBM PowerHA®. He has written and presented
extensively about high availability and storage at technical conferences, webinars, and on site
to customers. He is an IBM Redbooks platinum author who has co-authored over a dozen
IBM Redbooks publications and IBM Redpaper publications. He is also the only author to
work on every version of this book.
Vera Cruz is a consultant for IBM Power Systems in IBM ASEAN Technology Lifecycle
Services. She has 28 years of IT experience doing implementation, performance
management, high-availability and risk assessment, and security assessment for IBM AIX
and IBM Power Systems across diverse industries, including banking, manufacturing, retail,
and government institutions. She has been with IBM for 8 years. Before joining IBM, she
worked for various IBM Business Partners in the Philippines and Singapore, working as Tech
Support Specialist and Systems Engineer for IBM AIX and IBM Power Systems. She holds a
degree in Computer Engineering from the Cebu Institute of Technology University in Cebu,
Philippines.
Sachin P. Deshmukh is the Global Power/AIX Platform Lead for Kyndryl, based in the USA.
His area of expertise includes IBM AIX operating system provisioning and support, IBM
PowerHA, virtualization, and cloud platforms. He provides guidance, oversight, and assistance
to global delivery teams supporting Kyndryl accounts. As a member of the Critical Response
Team, he works on major incidents and high severity issues. He participates in proactive
Technical Health Checks and Service Management Reviews. He interacts with automation,
design, procurement, architecture and support teams for setting delivery standards and
creating various best practices documentation. He creates and maintains the IBM AIX
Security Technical Specifications for Kyndryl. He also holds certifications on other platforms,
such as AWS Solutions Architect (Associate), AWS Cloud Practitioner, and RHCSA, among
others. Prior to the transition to Kyndryl in 2021, he had been with IBM since 1999. He has
been closely associated with IBM AIX and the IBM Power Systems platform for close to 30
years.
Youssef Largou is the founding director of PowerM, a platinum IBM Business Partner in
Morocco. He has 21 years of experience in systems, high-performance computing (HPC),
middleware, and hybrid cloud, including IBM Power, IBM Storage, IBM Spectrum, IBM
WebSphere®, IBM Db2®, IBM Cognos®, IBM WebSphere Portal, IBM MQ, Enterprise
Service Bus (ESB), IBM Cloud® Paks, and Red Hat OpenShift. He has worked within
numerous industries with many technologies. Youssef is an IBM Champion for 2020, 2021, and
2022, and an IBM Redbooks Gold Author who has designed many reference architectures. He
has been an IBM Beacon Award finalist five times, in the Storage, Software Defined Storage,
and LinuxONE categories. He holds an engineering degree in Computer Science from the Ecole
Nationale Supérieure des Mines de Rabat and an Executive MBA from EMLyon.
Jean-Manuel Lenez has been a presales engineer with IBM Switzerland since 1999. He
specializes in IBM Power, IBM AIX®, and IBM i server technologies, as well as associated
products such as PowerVM, PowerHA, PowerSC, Linux on Power, and IBM Cloud. In his
presales role, he leads projects with major customers on subjects such as artificial
intelligence, deep learning, SAP HANA, server consolidation, and HA/DR. He is passionate
about technology, always ready to help companies address new market challenges, and
invests in the latest technological developments, including hybrid cloud, OpenShift, DevOps,
and Ansible.
Vivek Shukla is a Presales Consultant for cloud, AI, and cognitive offerings in India and IBM
Certified L2 (Expert) Brand Technical Specialist. He has over 20 years of IT experience in
Infrastructure Consulting, IBM AIX, and IBM Power Servers and Storage implementations. He
also has hands-on experience with IBM Power Servers, IBM AIX and system software
installations, RFP reviews, SOW preparation, sizing, performance tuning, root cause
analysis, disaster recovery, and mitigation planning. He has written several Power FAQs and
is the worldwide focal point for Techline FAQ flashes. He holds a Master's degree in Information
Technology from IASE University and Bachelor's degree (BTech) in Electronics &
Telecommunication Engineering from IETE, New Delhi. His area of expertise includes Power
Enterprise Pools, Red Hat OpenShift, Cloud Paks, and Hybrid Cloud.
Kulwinder Singh is a Technical Support Professional with the IBM India Systems Development
Lab, IBM India. He has over 25 years of experience in IT infrastructure management. He
currently provides L2 development support to customers for IBM AIX, PowerHA, and
IBM VM Recovery Manager HA and DR on Power Systems. He holds a Bachelor of Computer
Application degree from St. Peter's University. His areas of expertise include IBM AIX, high
availability and disaster recovery solutions, IBM Spectrum Protect storage, and SAN.
Tim Simon is a Redbooks Project Leader in Tulsa, Oklahoma, USA. He has over 40 years of
experience with IBM primarily in a technical sales role working with customers to help them
create IBM solutions to solve their business problems. He holds a BS degree in Math from
Towson University in Maryland. He has worked with many IBM products and has extensive
experience creating customer solutions using IBM Power, IBM Storage, and IBM System z®
throughout his career.
The following individuals were part of the residency that created the updates to this
Redbooks publication as well as two additional documents. We thank them for their support of
the residency.
Felipe Bessa is an IBM Brand Technical Specialist and Partner Technical Advocate on IBM
Power Systems in Brazil.
Carlos Jorge Cabanas Aguero is a Consultant with IBM Technology Lifecycle Services in
Argentina.
Dishant Doriwala is a Senior Staff Software Engineer and Test Lead for the VM Recovery
Manager for HA and DR product. He works in IBM Systems Development Labs (ISDL),
Hyderabad, India.
Santosh S Joshi is a Senior Staff Software Engineer in the IBM India Systems Development Lab,
IBM India.
Antony Steel is a senior technical staff member working with IBM Australia.
Uma Maheswara Rao Chandolu, Director, PowerHA, IBM AIX, and VM Recovery Manager
HA/DR, IBM Systems, IBM India
Jes Kiran Chittigala, HA and DR Architect for Power Systems VMRM, Master Inventor,
IBM India Systems Development Labs
Vijay Yalamuri, IBM India
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
[email protected]
Mail your comments to:
IBM Corporation, IBM Redbooks
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Summary of changes
This section describes the technical changes made in this edition of the book and in previous
editions. This edition might also include minor corrections and editorial changes that are not
identified.
Summary of Changes
for SG24-7739-02
for PowerHA SystemMirror for AIX Cookbook
as created or updated on March 23, 2023.
New information
Includes information about the recent IBM PowerHA SystemMirror for AIX 7.2.7.
Added chapter on Cross-Site LVM stretched campus cluster.
Added chapter on PowerHA and PowerVS.
Added chapter on the GLVM wizard.
Changed information
Removed chapter on WPARs as they are no longer supported.
Updates were made to several of the chapters to incorporate the latest improvements to
IBM PowerHA SystemMirror for AIX.
Part 1. Introduction
In Part 1, we provide an overview of PowerHA and describe the PowerHA components as part
of a successful implementation.
We also introduce the basic PowerHA management concepts, with suggestions and
considerations to ease the system administrator’s job.
With PowerHA SystemMirror software, critical resources remain available. For example, a
PowerHA SystemMirror cluster could run a database server program that services client
applications. The clients send queries to the server program that responds to their requests
by accessing a database, stored on a shared external disk.
This high availability system combines custom software with industry-standard hardware to
minimize downtime by quickly restoring services when a system, component, or application
fails. Although not instantaneous, the restoration of service is rapid, usually within 30 to 300
seconds.
High availability is sometimes confused with simple hardware availability. Fault tolerant,
redundant systems (such as RAID) and dynamic switching technologies (such as DLPAR)
provide recovery of certain hardware failures, but do not provide the full scope of error
detection and recovery required to keep a complex application highly available.
Recent surveys of the causes of downtime show that actual hardware failures account for
only a small percentage of unplanned outages. Other contributing factors include:
Operator errors
Environmental problems
Application and operating system errors
Reliable and recoverable hardware simply cannot protect against failures of all these different
aspects of the configuration. Keeping these varied elements, and therefore the application,
highly available requires:
Thorough and complete planning of the physical and logical procedures for access and
operation of the resources on which the application depends. These procedures help to
avoid failures in the first place.
A monitoring and recovery package that automates the detection and recovery from
errors.
A well-controlled process for maintaining the hardware and software aspects of the cluster
configuration while keeping the application available.
PowerHA features
IBM PowerHA technology positions you to deploy an HA solution that addresses storage and
high availability requirements with one integrated configuration, with a simplified user
interface.
IBM PowerHA SystemMirror for AIX is available in either Standard Edition or Enterprise
Edition. The Standard Edition is generally used for local (single site), or close proximity
cross-site/campus style clusters. The Enterprise Edition is more synonymous with disaster
recovery by using some form of data replication across diverse sites.
Test your PowerHA SystemMirror configuration by using the Cluster Test Tool. You can
evaluate how a cluster behaves under a set of specified circumstances, such as when a
node becomes inaccessible, a network becomes inaccessible, and so forth.
Ensure high availability of applications by eliminating single points of failure in a PowerHA
SystemMirror environment.
Leverage high availability features available in AIX.
Manage how a cluster handles component failures.
Secure cluster communications.
Monitor PowerHA SystemMirror components and diagnose problems that may occur.
High availability solutions should eliminate single points of failure through appropriate design,
planning, selection of hardware, configuration of software, control of applications, a carefully
controlled environment, and change management discipline.
In short, we can define high availability as the process of ensuring – through the use of
duplicated or shared hardware resources, managed by a specialized software component –
that an application is available for use.
A short definition for cluster multiprocessing might be multiple applications running over
several nodes with shared or concurrent access to the data.
The cluster multiprocessing component depends on the application capabilities and system
implementation to efficiently use all resources available in a multi-node (cluster) environment.
This must be implemented starting with the cluster planning and design phase.
PowerHA is only one component of your high availability environment. It provides
monitoring and automated response to issues which occur in an ecosystem of increasingly
reliable operating systems, hot-swappable hardware, and increasingly resilient applications.
PowerHA also provides disaster recovery functionality such as cross site mirroring, IBM
HyperSwap® and Geographical Logical Volume Mirroring. These cross-site clustering
methods support PowerHA functionality between two geographic sites. Various additional
methods exist for replicating the data to remote sites on both IBM and non-IBM storage. For
more information about options and supported configurations, see:
https://www.ibm.com/docs/en/SSPHQG_7.2/pdf/storage_pdf.pdf.
Solution                      Downtime   Data availability        Cost
Enhanced stand-alone          Hours      Until last transaction   Double the basic hardware cost
High availability clusters    Seconds    Until last transaction   Double hardware and additional services; more costs
The highly available solution for IBM POWER systems offers distinct benefits:
Proven solution (more than 25 years of product development).
Using “off the shelf” hardware components.
Proven commitment for supporting our customers.
IP version 6 (IPv6) support for both internal and external cluster communication.
Smart Assist technology enabling high availability support for all prominent applications.
Flexibility (virtually any application running on a stand-alone AIX system can be protected
with PowerHA).
When you plan to implement a PowerHA solution, consider the following aspects:
Thorough HA design and detailed planning from end to end.
Elimination of single points of failure.
Selection of appropriate hardware.
A typical PowerHA environment is shown in Figure 1-1. Heartbeating is performed both over
the IP networks and over a non-IP path through the cluster repository disk.
1.2.1 Downtime
Downtime is the period when an application or service is unavailable to its clients. Downtime
can be classified in two categories, planned and unplanned:
Planned:
– Hardware upgrades
– Hardware/Software repair/replacement
– Software updates/upgrades
– Backups (offline backups)
– Testing (periodic testing is required for cluster validation)
– Development
Unplanned:
– Administrator errors
– Application failures
– Hardware failures
– Operating system errors
– Environmental disasters
The role of PowerHA is to maintain application availability through the unplanned outages and
normal day-to-day administrative requirements. PowerHA provides monitoring and automatic
recovery of the resources on which your application depends.
Good design can remove single points of failure in the cluster: nodes, storage, and networks.
PowerHA manages these, and also the resources required by the application (including the
application start/stop scripts).
As previously mentioned, a good design is able to avoid single points of failure, and PowerHA
can manage the availability of the application when downtimes occur. Table 1-2 lists cluster
objects which can result in loss of availability of the application if they fail. Each cluster object
can be a physical or virtual component.
Cluster object       Eliminated as a single point of failure by
Power supply         Multiple circuits, power supplies, or an uninterruptible power supply (UPS)
TCP/IP subsystem     Use of non-IP networks to connect each node to its neighbor in a ring
Resource groups      Use of resource groups to control all resources required by an application
In addition, other management tasks – such as modifying storage or managing users – can
be performed on the running cluster using the Cluster Single Point of Control (C-SPOC)
without interrupting user access to the application running on the cluster nodes. C-SPOC also
ensures that changes made on one node are replicated across the cluster in a consistent
manner.
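Most of these day-to-day tasks are driven from the C-SPOC SMIT menus. The following is a
minimal sketch, assuming the SMIT fastpath used in recent PowerHA releases (verify the
fastpath on your level):

   # Open the top-level C-SPOC menu (System Management (C-SPOC))
   smit cl_admin

From there you can manage shared LVM components, users and groups, and resource group
operations; many of the same operations are also available through the clmgr command line
and the PowerHA SystemMirror GUI.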
HACMP was originally designed as a stand-alone product (known as HACMP classic). After
the IBM high availability infrastructure, known as Reliable Scalable Cluster Technology
(RSCT), became available, HACMP adopted the technology and became HACMP Enhanced
Scalability (HACMP/ES), which provided performance and functional advantages over the
classic version. HACMP was later renamed to PowerHA starting with v5.5, and
then to PowerHA SystemMirror with v6.1.
Starting with PowerHA SystemMirror 7.1, the Cluster Aware AIX (CAA) feature of the
operating system is used to configure, verify, and monitor the cluster services. This major
change improved the reliability of PowerHA because the cluster service functions were moved
to kernel space rather than running in user space. CAA was introduced in AIX 6.1 TL6.
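You can confirm that the CAA components are installed and inspect the CAA cluster definition
directly from AIX. A minimal sketch using standard AIX and CAA commands (run after the
cluster is created):

   # Confirm that the CAA file sets are installed
   lslpp -L bos.cluster.rte bos.ahafs
   # Display the CAA cluster configuration
   lscluster -c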
At the time of the writing of this book, the current release is PowerHA SystemMirror 7.2.7.
– Create and manage logical volumes, mirror pools, and resource group dependencies.
– Generate the snapshot report and view contents of a snapshot before you restore a
snapshot.
– Enhanced cluster reports to display details about repository disks and methods.
– Non-root support for cloning cluster from snapshots.
– Configure PowerHA SystemMirror GUI server to be highly available by using the High
Availability wizard option in the PowerHA SystemMirror GUI.
– Import multiple clusters by using the Add Multiple Clusters wizard.
– Enhanced activity log to display the activity ID, start time and end time of the activity,
and the duration of the activity. You can also view details such as the number of
successful and failed login attempts, the number of new activities, and
the number of activities that are not viewed.
– Enhanced security features with options to disable anonymous login and global
access.
– Automatic download and installation of the remaining files that are required to complete
the PowerHA SystemMirror GUI installation process from the IBM website by using the
smuiinst.ksh command.
options use storage outside of the cluster domain to read and write files. This determines
the winning and losing sides of a partition. PowerHA SystemMirror Version 7.2.5
introduces a new Cloud option that uses cloud-based storage for the same purpose. This
feature supports IBM and Amazon Web Services (AWS) cloud services. For more
information, see Configuring split and merge policies.
Enhancements to clverify option for netmon.cf file content validation
PowerHA SystemMirror includes a robust verification mechanism, which checks multiple
aspects of the cluster and AIX configuration for proper settings and consistency. In
PowerHA SystemMirror Version 7.2.5, or later, the cluster verification operation includes
checking an optional netmon.cf file. This verification process avoids false network events.
The cluster verification process now verifies the content and consistency of the netmon.cf
file across the cluster nodes.
Oracle migration support
During Oracle database migration, a customer might change the Oracle home directory. In
such cases, PowerHA SystemMirror Smart Assist for Oracle must be configured with the new
Oracle home directory. In PowerHA SystemMirror Version 7.2.5, or later, a new option is
introduced, which automatically updates PowerHA SystemMirror Smart Assist for Oracle
to use the new Oracle home directory.
Oracle temporary file removal
During discovery, start, stop, and monitoring operations of PowerHA SystemMirror Smart
Assist for Oracle, temporary files are created in the /tmp directory. However, system failures
might occur because of growth of the /tmp directory and a shortage of free space.
In PowerHA SystemMirror Version 7.2.5, or later, you can use the PowerHA SystemMirror
default log directory for intermediate file operations.
SMIT enhancements
After changing the PowerHA SystemMirror configuration, the cluster must be verified and
synchronized to implement the updates. If the updates are made by using the System
Management Interface Tool (SMIT) interface, the option to verify and synchronize was not
available for some SMIT panels. In PowerHA SystemMirror Version 7.2.5, or later, the
verify and synchronize options are available on most SMIT panels.
Cross-cluster verification utility enhancements
The cluster verification utility checks the cluster configuration on all nodes within the same
cluster. In PowerHA SystemMirror Version 7.2.5, or later, you can use the cross-cluster
verification (ccv) utility to compare Cluster Aware AIX (CAA) tunables between two
different clusters.
EMC SRDF/Metro SmartDR configuration
PowerHA SystemMirror Version Enterprise Edition 7.2.5 SP1 added EMC SRDF/Metro
SmartDR configuration, which is a two-region High Availability and Disaster Recovery
(HADR) framework that integrates SRDF/Metro and SRDF/Async replicated resources.
GLVM Configuration Assistant enhancements
In PowerHA SystemMirror Enterprise Edition Version 7.2.5, the Geographic Logical
Volume Manager (GLVM) Configuration Assistant is enhanced with new features that
convert an existing volume group to a GLVM-based volume group and update an existing
resource group to include GLVM resources. Also, the delete or rollback feature that is used
for removing resources and configurations through the GLVM
Configuration Assistant is improved. For more information, see Geographic Logical
Volume Manager.
RPV statistics data is automatically sent to the PowerHA SystemMirror GUI, which
displays it in a graphical format.
– Added GLVM policies such as compression, io_grp_latency and no_parallel_ls.
Cloud RAS enhancements
For Cloud Backup Management, Reliability, Availability, and Serviceability (RAS) has been
enhanced with an improved logging process.
Standard to linked cluster conversion
In PowerHA SystemMirror Version 7.2.6, an existing standard cluster can be dynamically
changed to a linked cluster by using the clmgr command (see the clmgr sketch after this
feature list). This feature is useful for converting a standard cluster to an IBM Power Virtual
Server (PowerVS) cloud cluster. For more information, see Converting a standard cluster
to a linked cluster.
PowerHA SystemMirror GUI:
– GLVM historical charts
GLVM historical charts provide information about cache utilization data in a graphical
format. You can view the historical data about cache utilization, network utilization, and
disk utilization for the specified date range and for different time intervals (minute, hour,
day, week, and month).
– Asynchronous cache size in GLVM
You can view and modify asynchronous cache size that is set during GLVM
configuration.
– GLVM policies
GLVM tunables are used to configure the mirror pool on the physical volumes at the
remote site. In PowerHA SystemMirror Version 7.2.6, or later, you can set the following
GLVM tunable attributes:
• Compression
• I/O group latency
• Number of parallel logical volumes
– Multi-factor authentication
In PowerHA SystemMirror Version 7.2.6, or later, multi-factor authentication is enabled
for non-root GUI users. PowerHA SystemMirror GUI uses IBM Security™ Verify
Access account for multi-factor authentication. Multi-factor authentication can be
performed by using either mobile authentication or email authentication.
The mobile authentication method uses the login credentials (username and
password). For the email authentication method, you must select either a one-time
password (OTP) that is delivered through an email or through Short Message
Service (SMS).
Note: You must create an IBM Security Verify application account to use the multi-factor
authentication features.
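As noted in the standard to linked cluster conversion item earlier in this list, the change is
made with the clmgr command. The following is a minimal sketch; the TYPE attribute value
and the exact modify syntax are assumptions based on the clmgr cluster types (NSC, SC,
LC), so check the 7.2.6 documentation or the clmgr built-in help for the precise options on
your level:

   # Display the current cluster type
   clmgr query cluster | grep -i type
   # Change the standard (no-site) cluster to a linked cluster (assumed syntax)
   clmgr modify cluster TYPE=LC
   # Verify and synchronize the updated definition
   clmgr sync cluster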
1.4.1 Terminology
The terminology used to describe PowerHA configuration and operation continues to evolve.
The following terms are used throughout this book:
Cluster Loosely-coupled collection of independent systems (nodes) or logical
partitions (LPARs) organized into a network for the purpose of sharing
resources and communicating with each other.
PowerHA defines relationships among cooperating systems where
peer cluster nodes provide the services offered by a cluster node if
that node is unable to do so. These individual nodes are together
responsible for maintaining the functionality of one or more
applications in case of a failure of any cluster component.
Node An IBM Power system (or LPAR) running AIX and PowerHA that is
defined as part of a cluster. Each node has a collection of resources
(disks, file systems, IP addresses, and applications) that can be
transferred to another node in the cluster in case the node or a
component fails.
Clients A client is a system that can access the application running on the
cluster nodes over a local area network (LAN). Clients run a client
application that connects to the server (node) where the application
runs.
1.4.2 Concepts
The basic concepts of PowerHA can be classified as follows:
Topology Contains basic cluster components nodes, networks, communication
interfaces, and communication adapters.
Resources Logical components or entities that are being made highly available
(for example, file systems, raw devices, service IP labels, and
applications) by being moved from one node to another. All
resources that together form a highly available application or service,
are grouped together in resource groups (RG).
PowerHA keeps the RG highly available as a single entity that can be
moved from node to node in the event of a component or node
failure. Resource groups can be available from a single node or, in
the case of concurrent applications, available simultaneously from
multiple nodes. A cluster can host more than one resource group,
thus allowing for efficient use of the cluster nodes.
Service IP label A label that matches to a service IP address and is used for
communications between clients and the node. A service IP label is
part of a resource group, which means that PowerHA can monitor it
and keep it highly available.
IP address takeover The process whereby an IP address is moved from one adapter to
another adapter on the same logical network. This adapter can be on
the same node, or another node in the cluster. If aliasing is used as
the method of assigning addresses to adapters, then more than one
address can reside on a single adapter.
Resource takeover This is the operation of transferring resources between nodes inside
the cluster. If one component or node fails because of a hardware or
operating system problem, its resource groups are moved to the
another node.
Fallover This represents the movement of a resource group from one active
node to another node (backup node) in response to a failure on that
active node.
Fallback This represents the movement of a resource group back from the
backup node to the previous node, when it becomes available. This
movement is typically in response to the reintegration of the
previously failed node.
Heartbeat packet A packet sent between communication interfaces in the cluster, used
by the various cluster daemons to monitor the state of the cluster
components (nodes, networks, adapters).
RSCT daemons These consist of two types of processes (topology and group
services) that monitor the state of the cluster and each node. The
cluster manager receives event information generated by these
daemons and takes corresponding (response) actions in case of any
failure.
Group leader The node with the highest IP address as defined in one of the
PowerHA networks (the first network available), that acts as the
central repository for all topology and group data coming from the
RSCT daemons concerning the state of the cluster.
Group leader backup This is the node with the next highest IP address on the same
arbitrarily chosen network, that acts as a backup for the group
leader. It takes over the role of group leader in the event that the
group leader leaves the cluster.
Mayor A node chosen by the RSCT group leader (the node with the next
highest IP address after the group leader backup), if such exists, else
it is the group leader backup itself. The mayor is responsible for
informing other nodes of any changes in the cluster as determined
by the group leader.
All components, CPUs, memory, and disks have a special design and provide continuous
service, even if one sub-component fails. Only special software solutions can run on fault
tolerant hardware.
Such systems are expensive and extremely specialized. Implementing a fault tolerant solution
requires a lot of effort and a high degree of customization for all system components.
In such systems, the software involved detects problems in the environment, and manages
application survivability by restarting it on the same or on another available machine (taking
over the identity of the original machine: node).
Therefore, eliminating all single points of failure (SPOF) in the environment is important. For
example, if the machine has only one network interface (connection), provide a second
network interface (connection) in the same node to take over in case the primary interface
providing the service fails.
Another important issue is to protect the data by mirroring and placing it on shared disk areas,
accessible from any machine in the cluster.
The PowerHA software provides the framework and a set of tools for integrating applications
in a highly available system.
Remember, PowerHA is not a fault tolerant solution and should never be misconstrued as one.
AIX level     RSCT level
7200-03-01    3.2.4.0
7200-04-01    3.2.5.0
7200-03-01    3.2.4.0
7200-04-01    3.2.5.0
7100-05-05    3.2.3.0
7200-01-06    3.2.2.0
7200-02-04    3.2.3.0
7200-03-03    3.2.4.0
7200-04-01    3.2.5.0
7200-04-02    3.2.5.0
7200-05-00    3.2.6.0
7200-01-06    3.2.2.0
7200-02-06    3.2.3.0
7200-03-07    3.2.4.0
7200-04-04    3.2.5.0
7200-05-03    3.2.6.0
7300-00-00    3.3.0.0
7200-01-06    3.2.2.0
7200-02-06    3.2.3.0
7200-03-06    3.2.4.0
7200-04-06    3.2.5.0
7200-05-05    3.2.6.0
7300-00-02    3.3.0.0
7300-01-01    3.3.1.0
a. authorized program analysis report (APAR)
The current list of recommended service packs for PowerHA is available in the PowerHA AIX
Code Level Reference Table.
The following AIX base operating system (BOS) components are prerequisites for PowerHA:
bos.adt.lib
bos.adt.libm
bos.adt.syscalls
bos.ahafs
bos.cluster
bos.clvm.enh
bos.data
bos.net.tcp.client
bos.net.tcp.server
bos.rte.SRC
bos.rte.libc
bos.rte.libcfg
bos.rte.libcur
bos.rte.libpthreads
bos.rte.lvm
bos.rte.odm
devices.common.IBM.storfwork.rte (optional, but required for sancomm)
To determine if the appropriate file sets are installed and what their levels are, issue the
following commands:
/usr/bin/lslpp -l rsct.compat.basic.hacmp
/usr/bin/lslpp -l rsct.compat.clients.hacmp
/usr/bin/lslpp -l rsct.basic.rte
/usr/bin/lslpp -l rsct.core.rmc
If the file sets are not present, install the appropriate version of RSCT as shown in Table 1-3
on page 20.
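A similar check can be made for the AIX BOS prerequisites listed earlier. The following is a
minimal sketch that loops over file set names from the list above and reports anything that is
missing (adjust the list to the file sets that apply to your configuration):

   for fs in bos.adt.lib bos.adt.libm bos.adt.syscalls bos.ahafs bos.clvm.enh \
             bos.net.tcp.client bos.net.tcp.server bos.rte.SRC bos.rte.libc \
             bos.rte.lvm bos.rte.odm
   do
      lslpp -L "$fs" >/dev/null 2>&1 || echo "$fs is not installed"
   done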
The application might also require a unique node-bound license (a separate license file on
each node).
Some applications also have restrictions with the number of floating licenses available within
the cluster for that application. To avoid this problem, be sure that you have enough licenses
for each cluster node so the application can run simultaneously on multiple nodes (especially
for concurrent applications).
on the active node. This is often referred to as N+1, with N being the total cores on the active
node. If a two-node cluster is configured for mutual takeover, also known as dual hot-standby,
then you have to provision PowerHA licenses for all the activated cores on both nodes to be
license compliant. If you have any questions about licensing, contact your IBM sales
representative or IBM Business Partner.
PowerHA licensing considerations in IBM Cloud and IBM Power Virtual Server:
The following points should be considered when licensing PowerHA on
IBM Cloud:
Customer’s perpetual PowerHA licenses cannot be transferred to the cloud.
Customers cannot bring their own PowerHA license to the cloud.
Customers can acquire PowerHA for AIX fixed term licenses and deploy on a system
within their enterprise or in the cloud or a service provider machine.
When deploying in public cloud, you should consider licensing as many processors as you
want to hold in reserve on the secondary node.
Public cloud is multi-tenant, therefore scaling up capacity on the secondary node upon
fail-over cannot be guaranteed.
Enterprise Edition is required when replicating data within a PowerHA cluster.
Standard Edition is deployed only with a shared-storage configuration.
Sub-capacity licensing means that only the processor cores that are incorporated into the
cluster need be licensed.
N+1 licensing means that the second node (system/LPAR) in the cluster requires only one
PowerHA license. This is not necessarily applicable in public cloud.
For example, if all the data for a critical application resides on a single disk, then that disk is a
single point of failure for the entire cluster; if that specific disk fails, it cannot be protected
by PowerHA. AIX Logical Volume Manager or storage subsystem protection must be used in
this case. PowerHA only provides takeover of the disk on the backup node, to make the data
available for use.
This is why PowerHA planning is so important, because your major goal throughout the
planning process is to eliminate single points of failure. A single point of failure exists when a
critical cluster function is provided by a single component. If that component fails, the cluster
has no other way of providing that function, and the application or service dependent on that
component becomes unavailable.
Also keep in mind that a well-planned cluster is easy to install, provides higher application
availability, performs as expected, and requires less maintenance than a poorly planned
cluster. Planning worksheets are provided in Appendix A, “Paper planning worksheets” on
page 579 to help you get started.
If you choose NIM, you must copy all the PowerHA file sets onto the NIM server and define an
lpp_source resource before proceeding with the installation.
To install the PowerHA software on a server node, complete the following steps:
1. If you are installing directly from the installation media, such as a DVD image or from a
local repository, enter the smitty install_all fast path command. The System
Management Interface Tool (SMIT) displays the “Install and Update from ALL Available
Software” panel.
2. Enter the device name of the installation medium or installation directory in the INPUT
device/directory for software field and press Enter.
3. Enter the corresponding field values.
To select the software to install, press F4 for a software listing, or enter all to install all
server and client images. Select the packages you want to install according to your cluster
configuration. Some of the packages might require prerequisites that are not available in
your environment.
The following file sets are required and must be installed on all servers:
– cluster.es.server
– cluster.es.client
– cluster.cspoc
Read the license agreement and select Yes in the Accept new license agreements field.
You must choose Yes for this item to proceed with installation. If you choose No, the
installation might stop, and issue a warning that one or more file sets require the software
license agreements. You accept the license agreement only once for each node.
4. Press Enter to start the installation process.
Tip: A good practice is to download and install the latest PowerHA Service Pack at the time
of installation from https://tinyurl.com/pha726sps.
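If you prefer the command line to SMIT, the same file sets can be installed with the installp
command. The following is a minimal sketch; the directory /software/powerha727 is a
hypothetical location for the downloaded installation images:

   # Preview the installation first (-p makes no changes)
   installp -apgXYd /software/powerha727 cluster.es.server cluster.es.client cluster.cspoc
   # Apply the installation, accepting license agreements (-Y) and
   # automatically installing requisites (-g)
   installp -agXYd /software/powerha727 cluster.es.server cluster.es.client cluster.cspoc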
Post-installation steps
To complete the installation, complete the following steps:
1. Verify the software installation by using the AIX lppchk command, and check the installed
directories to see if the expected files are present.
2. Run the lppchk -v and lppchk -c cluster* commands. No output will be produced if the
installation is good; if not, use the proper problem determination techniques to fix any
problems.
3. A reboot might be required if RSCT prerequisites have been installed since the last time
the system was rebooted.
More information
For more information about upgrading PowerHA, see Chapter 5, “Migration” on page 155.
When the cluster is configured, the cluster topology and resource information is entered on
one node. A verification process is then run and the data synchronized out to the other nodes
that are defined in the cluster. PowerHA keeps this data in its own Object Data Manager
(ODM) classes on each node in the cluster.
Although PowerHA can be configured or modified from any node in the cluster, a good
practice is to perform administrative operations from one node to ensure that PowerHA
definitions are kept consistent across the cluster. This prevents a cluster configuration update
from multiple nodes that might result in inconsistent data.
Installation changes
The following AIX configuration changes are made:
These files are modified:
– /etc/inittab
– /etc/rc.net
– /etc/services
– /etc/snmpd.conf
– /etc/snmpd.peers
– /etc/syslog.conf
– /etc/trcfmt
– /var/spool/cron/crontabs/root
The hacmp group is added.
The /etc/hosts file can be changed by adding or modifying entries using the cluster
configuration and verification auto-correct option.
The following network options are set to 1 (1 = enabled) on startup:
– routerevalidate
The verification utility ensures that the value of each network option is consistent across
all cluster nodes for the following settings:
– tcp_pmtu_discover
– udp_pmtu_discover
– ipignoreredirects
– nbc_limit
– nbc_pseg_limit
Services:
– IBM.ConfigRM
– IBM.HostRM
– IBM.ServiceRM
– Group (cthags)
– Resource monitoring and control (ctrmc)
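To confirm that these network options are set as expected, and consistently across the nodes,
you can query them directly with the AIX no command. A minimal sketch using the option
names listed above:

   # Check the option that PowerHA enables at startup
   no -o routerevalidate
   # Review the options that cluster verification expects to be consistent
   no -a | egrep 'tcp_pmtu_discover|udp_pmtu_discover|ipignoreredirects|nbc_limit|nbc_pseg_limit'
   # Persistently set routerevalidate if it is not already enabled
   no -p -o routerevalidate=1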
Figure 2-1 on page 31 shows a typical cluster topology and has these components:
Two nodes
Two IP networks (PowerHA logical networks) with redundant interfaces on each node
Shared storage
Repository disk
2.3.2 Sites
The use of sites is optional. They are primarily designed for use in cross-site LVM mirroring,
PowerHA Enterprise Edition (PowerHA/EE) configurations, or both. A site consists of one or
more nodes that are grouped together at a location. PowerHA supports a cluster that is
divided into two sites. Site relationships also can exist as part of a resource group’s definition,
but should be set to ignore if sites are not used.
Although using sites outside of PowerHA/EE and cross-site LVM mirroring is possible,
appropriate methods or customization must be provided to handle site operations. If sites
are defined, site-specific events are run during node_up and node_down processing, which
might be unnecessary.
When defining the cluster node, a unique name must be assigned and a communication path
to that node must be supplied (IP address or a resolvable IP label associated with one of the
interfaces on that node). The node name can be the host name (short), a fully qualified name
(hostname.domain.name), or any name up to 64 characters: [a-z], [A-Z], [0-9], hyphen (-), or
underscore (_). The name can start with either an alphabetic or numeric character.
The communication path is first used to confirm that the node can be reached, then used to
populate the ODM on each node in the cluster after secure communications are established
between the nodes. However, after the cluster topology and CAA cluster are configured, any
interface can be used to attempt to communicate between nodes in the cluster.
Important: If you want the node name to differ from the system host name, you must
explicitly state the host name IP address for the communication path.
2.3.4 Networks
In PowerHA, the term network is used to define a logical entity that groups the communication
interfaces used for IP communication between the nodes in the cluster, and for client access.
The networks in PowerHA can be defined with an attribute of either public (which is the
default) or private. A private network indicates to CAA that it is not to be used for heartbeating
or cluster communication.
Each interface is capable of hosting several IP addresses. When configuring a cluster, you
define the IP addresses that PowerHA monitors by using CAA and the IP addresses that
PowerHA itself keeps highly available (the service IP addresses and persistent aliases).
Service IP label or address An IP label or address over which a service is provided. It can
be bound to a single node or shared by multiple nodes.
Although not part of the topology, these are the addresses that
PowerHA keeps highly available as they are defined as a
resource within a resource group.
Boot interface Previous versions of PowerHA used the terms boot adapter
and standby adapter depending on the function. These terms
are collapsed into one term (boot interface) to describe any IP
network interface that can be used by PowerHA to host a
service IP label or address.
IP aliases An IP alias is an IP address that is added to an interface, rather
than replacing its base IP address. This is an AIX function that
is supported by PowerHA. However, PowerHA assigns to the IP
alias the same subnet mask as the base IP address over which
it is configured.
Logical network interface The name to which AIX resolves a port (for example, en0) of a
physical network adapter.
Important: A good practice is to have all those IP addresses defined in the /etc/hosts file
on all nodes in the cluster. There is certainly no requirement to use fully qualified names.
While PowerHA is processing network changes, the NSORDER variable is set to local (for
example, pointing to /etc/hosts). However, another good practice is to set this in the
/etc/netsvc.conf file.
Network definitions can be added using the SMIT panels. However, during the initial cluster
configuration a discovery process is run which automatically defines the networks and
assigns the interfaces to them.
The discovery process harvests information from the /etc/hosts file, defined interfaces,
defined adapters, and existing enhanced concurrent mode disks. The process then creates
the following files in the /usr/es/sbin/cluster/etc/config directory:
clip_config Contains details of the discovered interfaces; used in the F4 SMIT
lists.
clvg_config Contains details of each physical volume (PVID, volume group name,
status, major number, and so on) and a list of free major numbers.
Running discovery can also reveal any inconsistency in the network at your site.
PowerHA SystemMirror 7.1 and later uses CAA services to configure, verify, and monitor the
cluster topology. This is a major reliability improvement because core functions of the cluster
services, such as topology related services, now run in the kernel space. This makes it much
less susceptible to interference by the workloads running in the user space.
Communication paths
Cluster communication is achieved by communicating over multiple redundant paths. The
following redundant paths provide a robust clustering foundation that is less prone to cluster
partitioning:
TCP/IP
PowerHA SystemMirror and Cluster Aware AIX, either through multicast or unicast, use all
network interfaces that are available for cluster communication. All of these interfaces are
discovered by default and used for health management and other cluster communication.
You can use the PowerHA SystemMirror management interfaces to remove any interface
that you do not want to be used by specifying these interfaces in a private network.
If all interfaces on that PowerHA network are unavailable on that node, PowerHA transfers
all resource groups containing IP labels on that network to another node with available
interfaces. This is a default behavior associated with a feature called selective fallover on
network loss.
SAN-based (sfwcomm)
A redundant high-speed path of communication is established between the hosts by using
the storage area network (SAN) fabric that exists in any data center between the hosts.
Discovery-based configuration reduces the burden for you to configure these links.
Repository disk
Health and other cluster communication is also achieved through the central repository
disk.
Repository disk
Cluster Aware AIX (CAA) maintains cluster-related configuration information, such as the
node list and various cluster tunables, on the repository disk. All of this configuration
information is also maintained in memory by CAA. Hence, in a live cluster, CAA can re-create
the configuration information when a replacement disk for the repository disk is provided.
Configuration management
CAA identifies the repository disk by a unique 128-bit UUID. The UUID is generated in the
AIX storage device drivers by using the characteristics of the disk concerned. CAA stores the
repository disk-related identity information in the AIX ODM CuAt class as part of the cluster
information. Example 2-1 on page 35 shows sample output from a PowerHA 7.1.3 cluster:
CuAt:
name = "cluster0"
attribute = "clvdisk"
value = "2fb6d8b9-1147-45f9-185b-4e8e67716d4d"
type = "R"
generic = "DU"
rep = "s"
nls_index = 2
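The stanza in Example 2-1 can typically be retrieved directly from the ODM with the odmget command, for example:
odmget -q "attribute=clvdisk" CuAt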
When an additional node tries to join a cluster during AIX boot time, CAA uses the ODM
information to locate the repository disk. The repository disk must be reachable to retrieve the
necessary information to join and synchronize with all other nodes in the cluster. If CAA is not
able to reach the repository disk, then CAA will not proceed with starting the cluster services
and will log the error about the repository disk in the AIX errorlog. In this case, the
administrator fixes the repository disk related issues and then starts CAA manually.
If a node failed to join a cluster because the ODM entry is missing, the ODM entry can be
repopulated and the node forced to join the cluster using clusterconf. This assumes the
administrator knows the hard disk name for the repository disk (clusterconf -r hdisk#).
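The following is a minimal sketch of that recovery; the disk name hdisk3 is hypothetical, and hdisk numbering can differ between nodes, so match the repository disk by its PVID:
lspv | grep caavg_private    # on a healthy node, note the PVID of the repository disk
lspv                         # on the affected node, find the hdisk with the matching PVID
clusterconf -r hdisk3        # repopulate the ODM entry and join the node to the cluster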
Health management
The repository disk plays a key role in bringing up and maintaining the health of the cluster.
It is used for heartbeats, cluster messages, and node-to-node synchronization. There are two
key ways the repository disk is used for health management across the cluster:
1. Continuous health monitoring.
2. Distress-time cluster communication.
For continuous health monitoring, CAA and the disk device drivers maintain health counters
per node. These health counters are updated and read at least once every two seconds by the
storage framework device driver. The health counters of the other nodes are compared every
6 seconds to determine whether the other nodes are still functional. These time settings might
be changed in the future if necessary.
When all the network interfaces on a node have failed, the node is in a distress condition.
In this distress environment, CAA and the storage framework use the repository disk for all
the necessary communication between the distressed node and the other nodes. Note that this
type of communication requires a certain area of the disk to be set aside per node for writing
the messages that are meant to be delivered to other nodes. This disk space is automatically
allocated at cluster creation time, and no action from the customer is needed. When operating
in this mode, each node must scan the message areas of all other nodes several times per
second to receive any messages meant for it.
Note that this second method of communication is not the most efficient form because it
requires more polling of the disk; it is expected to be used only when the cluster is in
distress mode. A selective fallover on network loss occurs automatically without any user
intervention.
Failure of any of these writes or reads results in repository failure-related events being
reported to CAA and PowerHA. This means that the administrator must provide a new disk to be
used as a replacement for the original failed repository disk.
In the event of a repository disk failure, PowerHA detects that the active repository disk
has failed and verifies that it is no longer usable. If it is not usable, PowerHA attempts to
switch to the backup repository disk. If the switch is successful, the backup repository disk
becomes the active repository disk. For the process for replacing the repository disk in
PowerHA 7.2.7, see 6.6, “Repository disk replacement” on page 218.
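If a backup repository disk is defined, the switch or a manual replacement can typically be driven with the clmgr command. The following is a hedged sketch; the disk name hdisk9 is hypothetical, and the exact syntax can vary by release, so verify it with the clmgr built-in help:
clmgr add repository hdisk9        # define hdisk9 as a backup repository disk
clmgr replace repository hdisk9    # replace the failed active repository with hdisk9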
Starting with PowerHA SystemMirror 7.1.2 Enterprise Edition, PowerHA does not allow a node
to operate if it no longer has access to the repository disk, and it also registers an
abnormal node down event. This allows a double-failure scenario to be tolerated.
PowerHA SystemMirror Enterprise Edition v7.1.3 introduced the manual split and merge
policies. These policies can, and should, be applied globally across the cluster. However,
there is also an option to specify whether they should apply to storage replication recovery.
Split handling policy
None: This is the default setting. Select this option for the partitions to operate
independently of each other after the split occurs.
Tie breaker - Disk: Select this option to use the disk that is specified in the Select tie
breaker field after a split occurs. When the split occurs, one site wins the SCSI reservation
on the tie breaker disk. The site that loses the SCSI reservation uses the recovery action
that is specified in the policy setting. The disk that is used must support SCSI-3 persistent
or SCSI-2 reserves to be a suitable candidate disk.
Note: If you select TieBreaker-Disk in the Merge handling policy field, you must select
TieBreaker-Disk for this field.
Tie breaker - NFS: Select this option to specify an NFS file as the tie breaker. During the
cluster split, a predefined NFS file is used to decide the winning partition. The partition
that loses the NFS file reservation uses the recovery action that is specified in the policy
setting.
Note: If you select TieBreaker-NFS in the Merge handling policy field, you must select
TieBreaker-NFS for this field.
Manual: Select this option to wait for manual intervention when a split occurs. PowerHA
SystemMirror does not perform any actions on the cluster until you specify how to recover
from the split.
Note: If you select Manual in the Merge handling policy field, you must select Manual for
this field.
Cloud: Select this option to use a bucket from either IBM Cloud or AWS in a tie-breaker
fashion.
Note: If you select Cloud in the Merge handling policy field, you must select Cloud for this
field.
Merge handling policy
Majority: Select this option to choose the partition with the highest number of nodes as the
primary partition.
Tie breaker - Disk: Select this option to use the disk that is specified in the Select tie
breaker field after a split occurs. When the split occurs, one site wins the SCSI reservation
on the tie breaker disk. The site that loses the SCSI reservation uses the recovery action
that is specified in the policy setting.
Note: If you select TieBreaker-Disk in the Split handling policy field, you must select
TieBreaker-Disk for this field.
Tie breaker - NFS: Select this option to specify an NFS file as the tie breaker. During the
cluster split, a predefined NFS file is used to decide the winning partition. The partition
that loses the NFS file reservation uses the recovery action that is specified in the policy
setting.
Note: If you select TieBreaker-NFS in the Split handling policy field, you must select
TieBreaker-NFS for this field.
Manual: Select this option to wait for manual intervention when a split occurs. PowerHA
SystemMirror does not perform any actions on the cluster until you specify how to recover
from the split.
Note: If you select Manual in the Split handling policy field, you must select Manual for
this field.
Cloud: Select this option to use a bucket from either IBM Cloud or AWS in a tie-breaker
fashion.
Note: If you select Cloud in the Split handling policy field, you must select Cloud for this
field.
Split and merge action plan
Reboot: Reboots all nodes in the site that does not win the tie breaker, or that does not
respond when the manual choice option is used.
Disable Applications Auto-Start and Reboot: Select this option to reboot nodes on the losing
partition when a cluster split event occurs. If you select this option, the resource groups
are not brought online automatically after the system reboots.
Note: This option is available only if your environment is running AIX Version 7.2.1 or
later.
Disable Cluster Services Auto-Start and Reboot: Select this option to reboot nodes on the
losing partition when a cluster split event occurs. If you select this option, Cluster Aware
AIX (CAA) is not started and the resource groups are not brought online automatically. After
the cluster split event is resolved, in SMIT, you must select Problem Determination Tools →
Start CAA on Merged Node to restore the cluster.
Select tie breaker
Select an iSCSI disk or a SCSI disk that you want to use as the tie breaker disk. It must
support either SCSI-2 or SCSI-3 reserves.
NFS export server
This field is available if you specify Tie Breaker - NFS in both the Split Handling Policy
and the Merge Handling Policy fields. Specify the fully qualified domain name of the NFS
server that is used for the NFS tie-breaker. The NFS server must be accessible from each
node in the cluster by using the NFS server IP address.
Local mount directory
This field is available if you specify Tie Breaker - NFS in both the Split Handling Policy
and the Merge Handling Policy fields. Specify the absolute path of the NFS mount point that
is used for the NFS tie-breaker. The NFS mount point must be mounted on all nodes in the
cluster.
NFS export directory
This field is available if you specify Tie Breaker - NFS in both the Split Handling Policy
and the Merge Handling Policy fields. Specify the absolute path of the NFSv4 exported
directory that is used for the NFS tie-breaker. The NFS exported directory must be
accessible from all nodes in the cluster that use NFSv4.
You must verify that the following services are active in the NFS server:
biod
nfsd
nfsrgyd
portmap
rpc.lockd
rpc.mountd
rpc.statd
TCP
You must verify that the following services are active in the NFS
client on all cluster nodes:
biod
nfsd
rpc.mountd
rpc.statd
TCP
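Most of these services can be checked quickly with the system resource controller, for example:
lssrc -g nfs       # biod, nfsd, rpc.mountd, rpc.statd, rpc.lockd, nfsrgyd
lssrc -s portmap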
Important: Starting with PowerHA 7.1.0, the RSCT topology services subsystem is
deactivated, and all of its functions are performed by CAA topology services.
Figure 2-2 shows CAA, RSCT daemons, and how they interact with each other and the
PowerHA daemons and with other applications.
As IP addresses are added to the interface through aliasing, more than one service IP label
can coexist on one interface. By removing the need for one interface per service IP address
that the node can host, IPAT through aliasing is the more flexible option and in some cases
can require less hardware. IPAT through aliasing also reduces fallover time, because adding
an alias to an interface is faster than removing the base IP address and then applying the
service IP address.
IPAT through aliasing is supported only on networks that support the gratuitous ARP function
of AIX. Gratuitous ARP is when a host sends out an ARP packet before using an IP address
and the ARP packet contains a request for this IP address. In addition to confirming that no
other host is configured with this address, it ensures that the ARP cache on each machine on
the subnet is updated with this new address.
If multiple service IP alias labels or addresses are active on one node, PowerHA by default
equally distributes them among all available interfaces on the logical network. This placement
can be controlled by using distribution policies, which is explained in more detail in 12.4,
“Site-specific service IP labels” on page 488.
For IPAT through aliasing, each boot interface on a node must be on a different subnet,
though interfaces on different nodes can obviously be on the same subnet. The service IP
labels can be on the same subnet as the boot adapter only if it is a single adapter
configuration. Otherwise, they must be on separate subnets also.
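As a hedged illustration of these subnet rules, a two-node cluster with two boot interfaces per node and one service address might be addressed as follows (all addresses and labels are hypothetical):
node1 en0 (boot)   10.1.1.11/24      node2 en0 (boot)   10.1.1.12/24
node1 en1 (boot)   10.1.2.11/24      node2 en1 (boot)   10.1.2.12/24
service IP alias   192.168.100.10/24 (placed by PowerHA on any available boot interface)
Each boot interface on a node is on a different subnet, the corresponding interfaces on the two nodes share a subnet, and the service IP label is on a third subnet.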
Important: For IPAT through aliasing networks, PowerHA briefly has the service IP
addresses active on both the failed interface and the takeover interface so that it can
preserve routing. This might cause a DUPLICATE IP ADDRESS error log entry, which can be ignored.
Assigning a persistent node IP label for a network on a node allows you to have a highly
available node-bound address on a cluster network. This address can be used for
administrative purposes because it always points to a specific node regardless of whether
PowerHA is running.
Note: There can be one persistent IP label per network per node. For example, if a node is
connected to two networks that are defined in PowerHA, that node can be identified
through two persistent IP labels (addresses), one for each network.
The persistent IP labels are defined in the PowerHA configuration, and they become available
when the cluster definition is synchronized. A persistent IP label remains available on the
interface on which it was configured, even if PowerHA is stopped on the node or the node is rebooted.
If the interface on which the persistent IP label is assigned fails while PowerHA is running, the
persistent IP label is moved to another interface in the same logical network on the same
node.
The persistent IP alias must be on a different subnet from each of the boot interface subnets,
and it can be either in the same subnet as, or in a different subnet from, the service IP
address. If the node fails, or all interfaces on the logical network on the node fail, the
persistent IP label will no longer be available.
For more details about cluster security, see 8.1, “Cluster security” on page 338.
The Cluster Communications daemon is started by inittab, with the entry being created
by the installation of PowerHA. The daemon is controlled by the system resource controller,
so startsrc, stopsrc, and refresh work. In particular, refresh is used to reread
/etc/cluster/rhosts and move the log files.
The /etc/cluster/rhosts file is used before the cluster is first synchronized, while the
environment is not yet secured. After the CAA cluster is created, the only time that the file
is needed is when more nodes are added to the cluster. After the cluster is synchronized and
the CAA cluster is created, the contents of the file can be deleted. However, do not remove the file itself.
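A minimal sketch of the file for a two-node cluster follows; the addresses are hypothetical, and one resolvable IP address or label per node is sufficient. After editing the file, refresh clcomd so that it rereads it:
/etc/cluster/rhosts:
10.1.1.11
10.1.1.12
refresh -s clcomd    # make clcomd reread the file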
The Cluster Communications daemon provides the transport medium for PowerHA cluster
verification, global ODM changes, and remote command execution. The following commands
use clcomd (they cannot be run by a standard user):
clrexec Run specific and potentially dangerous commands.
cl_rcp Copy AIX configuration files.
cl_rsh Used by the cluster to run commands in a remote shell.
clcmd Takes an AIX command and distributes it to a set of nodes that are members of
a cluster.
Logging for the clcomd daemon is turned on by default, and the log files, clcomd.log and
clcomddiag.log, can be found in the /var/hacmp/clcomd directory.
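For example, clcmd runs the same command on every node in the cluster and labels the output by node, which is convenient for quick consistency checks:
clcmd date               # show the current time on all cluster nodes
clcmd oslevel -s         # compare AIX levels across the nodes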
2.4.1 Definitions
PowerHA uses the underlying topology to ensure that the applications under its control, and
the resources that they require, are kept highly available. These resources can include the following types:
– Service IP labels or addresses
– Physical disks
– Volume groups
– Logical volumes
– File systems
– NFS
– Application controller scripts
– Application monitors
– Tape resources
The applications and the resources required are configured into resource groups. The
resource groups are controlled by PowerHA as single entities whose behavior can be tuned
to meet the requirements of clients and users.
Figure 2-4 shows resources that PowerHA makes highly available, superimposed on the
underlying cluster topology.
(Figure 2-4 shows three nodes, node1, node2, and node3, hosting resource groups rg_01, rg_02, and rg_03, each with a service IP label, and all connected to shared storage share_vg.)
The following common resources shown in Figure 2-4 are made highly available:
Service IP Labels
Applications shared between nodes
Storage shared between nodes
2.4.2 Resources
The items in this section are considered resources in a PowerHA cluster.
The service IP addresses become available when PowerHA brings the associated resource
group into an ONLINE state.
The placement of the service IP labels is determined by the specified Service IP label
distribution preference. The IP label distribution preference can also be changed dynamically,
but is only used in subsequent cluster events. This is to avoid any extra interruptions in
service. More information about the available options can be found in 12.2, “Distribution
preference for service IP aliases” on page 481.
Storage
The following storage types can be configured as resources:
Volume groups (AIX and Veritas VM)
Logical volumes (all logical volumes in a defined volume group)
File systems (jfs and jfs2): either all for the defined volume groups, or can be specified
individually
Raw disks: defined by physical volume identifier (PVID)
If storage is to be shared by some or all of the nodes in the cluster, then all components must
be on external storage and configured in such a way that failure of one node does not affect
the access by the other nodes.
For a list of supported devices by PowerHA, see the following web page:
https://www.ibm.com/support/pages/powerha-hardware-support-matrix
Important: Be aware that just because a third-party storage is not listed in the matrix does
not mean that storage is not supported. If VIOS supports third-party storage, and PowerHA
supports virtual devices through VIOS, then the storage should also be supported by
PowerHA. However, always verify support with the storage vendor.
For data protection, you can use either Redundant Array of Independent Disks (RAID)
technology (at the storage or adapter level) or AIX LVM mirroring (RAID 1).
Disk arrays are groups of disk drives that work together to achieve data transfer rates higher
than those provided by single (independent) drives. Arrays can also provide data redundancy
so that no data is lost if one drive (physical disk) in the array fails. Depending on the RAID
level, data is either mirrored, striped, or both.
RAID 1
RAID 1 is also known as disk mirroring. In this implementation, identical copies of each
chunk of data are kept on separate disks, or more commonly, each disk has a “twin” that
contains an exact replica (or mirror image) of the information. If any disk in the array fails,
then the mirror disk maintains data availability. Read performance can be enhanced
because the disk that has the actuator (disk head) closest to the required data is always
used, thereby minimizing seek times. The response time for writes can be somewhat
slower than for a single disk, depending on the write policy; the writes can be run either in
parallel (for faster response) or sequentially (for safety).
RAID 2 and RAID 3
RAID 2 and RAID 3 are parallel process array mechanisms, where all drives in the array
operate in unison. Similar to data striping, information to be written to disk is split into
chunks (a fixed amount of data), and each chunk is written to the same physical position
on separate disks (in parallel). When a read occurs, simultaneous requests for the data
can be sent to each disk. This architecture requires parity information to be written for
each stripe of data. The difference between RAID 2 and RAID 3 is that RAID 2 can use
multiple disk drives for parity; RAID 3 can use only one. If a drive fails, the system can
reconstruct the missing data from the parity and remaining drives. Performance is good for
large amounts of data, but poor for small requests, because every drive is always involved,
and there can be no overlapped or independent operation.
RAID 4
RAID 4 addresses some of the disadvantages of RAID 3 by using larger chunks of data
and striping the data across all of the drives except the one reserved for parity. Using disk
striping means that I/O requests need to reference only the drive that the required data is
actually on. This means that simultaneous and also independent reads are possible. Write
requests, however, require a read-modify-update cycle that creates a bottleneck at the
single parity drive. Each stripe must be read, the new data inserted, and the new parity
then calculated before writing the stripe back to the disk. The parity disk is then updated
with the new parity, but cannot be used for other writes until this has completed. This
bottleneck means that RAID 4 is not used as often as RAID 5, which implements the same
process but without the bottleneck.
RAID 5
RAID 5 is similar to RAID 4. The difference is that the parity information is also distributed
across the same disks used for the data, thereby eliminating the bottleneck. Parity data is
never stored on the same drive as the chunks that it protects. This means that concurrent
read and write operations can now be performed, and there are performance increases
because of the availability of an extra disk (the disk previously used for parity). Other
possible enhancements can further increase data transfer rates, such as caching
simultaneous reads from the disks and transferring that information while reading the next
blocks. This can generate data transfer rates that approach the adapter speed.
As with RAID 3, in the event of disk failure, the information can be rebuilt from the
remaining drives. A RAID 5 array also uses parity information, although regularly backing
up the data in the array is still important. RAID 5 arrays stripe data across all drives in the
array, one segment at a time (a segment can contain multiple blocks). In an array with n
drives, a stripe consists of data segments written to n-1 of the drives and a parity segment
written to the n-th drive. This mechanism also means that not all of the disk space is
available for data. For example, in an array with five 72 GB disks, although the total
storage is 360 GB, only 288 GB are available for data.
RAID 6
This is identical to RAID 5, except it uses one more parity block than RAID 5. You can have
two disks die and still have data integrity. This is also often referred to as double parity.
RAID 0+1 (RAID 10)
RAID 0+1, also known as IBM RAID-1 Enhanced, or RAID 10, is a combination of RAID 0
(data striping) and RAID 1 (data mirroring). RAID 10 provides the performance
advantages of RAID 0 while maintaining the data availability of RAID 1. In a RAID 10
configuration, both the data and its mirror are striped across all the disks in the array. The
first stripe is the data stripe, and the second stripe is the mirror, with the mirror
placed on a different physical drive than the data. RAID 10 implementations provide
excellent write performance, as they do not have to calculate or write parity data. RAID 10
can be implemented using software (AIX LVM), hardware (storage subsystem level), or in
a combination of the hardware and software. The appropriate solution for an
implementation depends on the overall requirements. RAID 10 has the same cost
characteristics as RAID 1.
Some newer storage subsystems have more specialized RAID methods that do not
fit exactly into any of these categories, for example, IBM XIV.
Important: Although all RAID levels (other than RAID 0) have data redundancy, data must
be regularly backed up. This is the only way to recover data if a file or directory is
accidentally corrupted or deleted.
Leaving quorum on, which is the default, causes a resource group fallover if quorum is lost.
The volume group is forced online on the next available node if the forced varyon of volume
groups attribute is enabled. When forced varyon of volume groups is enabled, PowerHA
checks for the following conditions:
That at least one copy of each mirrored set is in the volume group.
That each disk is readable.
That at least one accessible copy of each logical partition is in every logical volume.
If these conditions are fulfilled, then PowerHA forces the volume group varyon.
Note: The automatic fallover on volume group loss (through loss of quorum) is also referred
to as selective fallover on volume group loss. It is enabled by default and can be disabled if
desired. However, be aware that this setting affects all volume groups that are assigned as
a resource to PowerHA.
When a node is integrated into the cluster, PowerHA builds a list of all enhanced concurrent
volume groups that are a resource in any resource group containing the node. These volume
groups are then activated in passive mode.
When the resource group comes online on the node, the enhanced concurrent volume groups
are then varied on in active mode. When the resource group goes offline on the node, the
volume group is varied off to passive mode.
Important: PowerHA also utilizes the JFS2 mountguard option. This option prevents a file
system from being mounted on more than one system at a time. PowerHA v7.1.1 and later
automatically enable this feature if it is not already enabled.
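You can check and, if necessary, set the mount guard attribute manually; the file system name /app1data is hypothetical:
lsfs -q /app1data                      # the -q output includes the MountGuard setting
chfs -a mountguard=yes /app1data       # enable mount guard if it is not already enabled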
Although this is not an issue related to PowerHA, be aware that some applications using raw
logical volumes can start writing from the beginning of the device, therefore overwriting the
logical volume control block (LVCB). In this case the application should be configured to skip
at least the first 512 bytes of the logical volume where the LVCB is stored.
Custom methods are provided for Veritas Volume Manager (VxVM) starting with the Veritas
Foundation Suite v4.0. For a newer version, you might need to create a custom user-defined
resource to handle the storage appropriately. More information about this option is in 2.4.7,
“User defined resources and types” on page 52.
File systems (jfs and jfs2) recovery using fsck and logredo
AIX native file systems use database journaling techniques to maintain their structural
integrity. After a failure, AIX uses the journal file system log (JFSlog) using logredo to restore
the file system to its last consistent state. This is faster than using the fsck utility. If the
process of replaying the JFSlog fails, an error occurs and the file system will not be mounted.
The fsck utility performs a verification of the consistency of the file system, checking the
inodes, directory structure, and files. Although this is more likely to recover damaged file
systems, it does take longer. Both options are available to be chosen within a resource group
with fsck being the default setting.
Important: Restoring the file system to a consistent state does not guarantee that the data
is consistent; that is the responsibility of the application.
2.4.3 NFS
PowerHA works with the AIX network file system (NFS) to provide a highly available NFS
server, which allows the backup NFS server to recover the current NFS activity if the primary
NFS server fails. This feature is available only for two-node clusters when using
NFSv2/NFSv3, and more than two nodes when using NFSv4, because PowerHA preserves
locks for the NFS file systems and handles the duplicate request cache correctly. The
attached clients experience the same hang if the NFS resource group is acquired by another
node as they would if the NFS server reboots.
When configuring NFS through PowerHA, you can control these items:
The network that PowerHA will use for NFS mounting.
NFS exports and mounts at the directory level.
Export options for NFS exported directories and file systems. This information is kept in
/usr/es/sbin/cluster/etc/exports, which has the same format as the AIX exports file
(/etc/exports); an example entry follows this list.
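The following hypothetical entry exports a shared directory read/write, with root access granted to both cluster nodes (the directory and node names are illustrative only):
/app1/shared -rw,root=node01:node02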
NFS cross-mounts
NFS cross-mounts work as follows:
The node that is hosting the resource group mounts the file systems locally, NFS exports
them, and NFS mounts them, thus becoming both an NFS server and an NFS client.
All other participating nodes of the resource group simply NFS-mount the file systems,
thus becoming NFS clients.
If the resource group is acquired by another node, that node mounts the file systems locally
and NFS exports them, thus becoming the new NFS server.
Start script This script must be able to start the application from both a clean
and an unexpected shutdown. Output from the script is logged in
the hacmp.out log file if set -x is defined within the script. The
exit code from the script is monitored by PowerHA.
Stop script This script must be able to successfully stop the application.
Output is also logged and the exit code monitored.
Application monitors To keep applications highly available, PowerHA can monitor
the application itself, not just the resources that it requires.
Application startup mode Introduced in PowerHA v7.1.1, this mode specifies how the
application controller startup script is called. Select background,
the default value, if you want the start script to be called as a
background process. This allows event processing to continue
even if the start script has not completed. Select foreground if
you need event processing to wait until the start script exits.
The full path name of the script must be the same on all nodes; however, the contents of the
script itself can differ from node to node. If the scripts differ between nodes, you cannot use
the file collections feature for them. This is why we generally suggest that you write an
intelligent script that can determine which node it is running on and then start the application
appropriately, as in the following sketch.
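This is a minimal sketch only; the application paths, instance names, and node names are hypothetical, and a production script also needs logging and more robust error handling:
#!/bin/ksh
# Hypothetical PowerHA application controller start script.
set -x     # command tracing is captured in hacmp.out

NODE=$(/usr/es/sbin/cluster/utilities/get_local_nodename)

case "$NODE" in
    node01) INSTANCE=app1 ;;     # instance that normally runs on node01
    node02) INSTANCE=app2 ;;     # instance that normally runs on node02
    *)      INSTANCE=app1 ;;     # reasonable default for any other node
esac

/opt/myapp/bin/startup.sh "$INSTANCE"
exit $?     # PowerHA treats a non-zero exit code as a start failure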
As the exit codes from the application scripts are monitored, PowerHA assumes that a
non-zero return code from the script means that the script failed and therefore starting or
stopping the application was not successful. If this is the case, the resource group will go into
ERROR state and a config_too_long message is recorded in the hacmp.out log.
Consider the following factors when configuring the application for PowerHA:
The application is compatible with the AIX version.
The storage environment is compatible with a highly available cluster.
The application and platform interdependencies must be well understood. The location of
the application code, data, temporary files, sockets, pipes, and other components of the
system such as printers must be replicated across all nodes that will host the application.
As previously described, the application must be able to be started and stopped without
any operator intervention, particularly after a node unexpectedly halts. The application
start and stop scripts must be thoroughly tested before implementation and with every
change in the environment.
The resource group that contains the application must contain all the resources required
by the application, or be the child of one that does.
Application licensing must be accounted for. Many applications have licenses that depend
on the CPU ID; careful planning must be done to ensure that the application can start on
any node in the resource group node list. Also be careful with the numbers of CPUs and
other items on each node because some licensing is sensitive to these amounts.
Application availability
PowerHA also offers an application availability analysis tool, which is useful for auditing the
overall application availability, and for assessing the cluster environment. For more details,
see 7.7.10, “Measuring application availability” on page 335.
A user-defined resource type lets you define a customized resource that can be added to a
resource group. A user-defined resource type contains several attributes that describe the
properties of the instances of the resource type.
Ensure that the user-defined resource type management scripts exist on all nodes that
participate as possible owners of the resource group where the user-defined resource
resides. Full details and configuration options on user defined resources and types can be
found in 11.3, “User-defined resources and types” on page 463.
PowerHA ensures that resource groups remain highly available by moving them from node to
node as conditions within the cluster change. The main states of the cluster and the
associated resource group actions are as follows:
Cluster startup The nodes in the cluster are up and then the resource groups
are distributed according to their startup policy.
Resource failure/recovery When a particular resource that is part of a resource group
becomes unavailable, the resource group can be moved to
another node. Similarly, it can be moved back when the resource
becomes available.
PowerHA shutdown There are several ways to stop PowerHA on a node. One method
causes the node’s resource groups to fall over to other nodes.
Another method takes the resource groups offline. Under some
circumstances, it is possible to stop the cluster services on the
node while leaving the resources active.
Node failure/recovery If a node fails, the resource groups that were active on that node
are distributed among the other nodes in the cluster, depending
on their fallover distribution policies. When a node recovers and
is reintegrated into the cluster, resource groups can be
reacquired depending on their fallback policies.
Cluster shutdown When the cluster is shut down, all resource groups are taken
offline. However for some configurations, the resources can be
left active, but the cluster services are stopped.
Before learning about the types of behavior and attributes that can be configured for resource
groups, you need to understand the following terms:
Node list The list of nodes that can host a particular resource group. Each
node must be able to access the resources that make up the resource
group.
Default node priority The order in which the nodes are defined in the resource group. A
resource group with default attributes will move from node to node in
this order as each node fails.
Home node The highest priority node in the default node list. By default this is the
node on which a resource group will initially be activated. This does
not specify the node that the resource group is currently active on.
Startup The process of bringing a resource group into an online state.
Fallover The process of moving a resource group that is online on one node to
another node in the cluster in response to an event.
Fallback The process of moving a resource group that is currently online on a
node that is not its home node, to a re-integrating node.
Startup options
These options control the behavior of the resource group on initial startup:
Online on home node only The resource group is brought online when its home node joins
the cluster. If the home node is not available, it stays in an offline
state. This is shown in Figure 2-5 on page 54.
Online on first available node Shown in Figure 2-6 on page 54, the resource group is brought
online when the first node in its node list joins the cluster.
Online on all available nodes As seen in Figure 2-7 on page 55, the resource group is brought
online on all nodes in its node list as they join the cluster.
Online using distribution policy The resource group is brought online only if the node has no other
resource group of this type already online. This is shown in
Figure 2-8 on page 55. If more than one resource group of this type
exists when a node joins the cluster, PowerHA selects the resource
group with fewer nodes in its node list. However, if one node has a
dependent resource group (that is it is a parent in a dependency
relationship), it is given preference.
Fallover options
These options control the behavior of the resource group if PowerHA must move it to another
node in response to an event:
Fall over to next priority node in list
The resource group falls over to the next node in the resource group node list. See
Figure 2-9.
Fall over using dynamic node priority (DNP)
The resource group falls over to the node that is selected by the configured dynamic node
priority policy rather than by the default node priority order.
When you select one of these attributes, you must also provide values for the DNP
script path and DNP time-out attributes for the resource group. When the DNP script path
attribute is specified, that script is invoked on all nodes and the return values are collected
from all nodes. The fallover node decision is made by using these values and the specified
criteria. If you select the cl_highest_udscript_rc attribute, the collected values are sorted
and the node that returned the highest value is selected as the candidate node for fallover. If
you select the cl_lowest_nonzero_udscript_rc attribute, the collected values are sorted and
the node that returned the lowest nonzero positive value is selected as the candidate node for
fallover. If the return values of the script from all nodes are the same or zero, the default
node priority is used instead. PowerHA verifies the script existence and the execution
permissions during verification.
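As a hedged sketch, a user-defined DNP script conveys this node's ranking value through its exit code (0 to 255), which PowerHA collects from every node and compares. The following hypothetical example ranks candidate nodes by the percentage of free real memory; verify the vmstat column positions on your AIX level before using anything like it:
#!/bin/ksh
# Hypothetical dynamic node priority (DNP) script.
# The node's ranking value is returned through the exit code (0-255).

FREEFRAMES=$(vmstat 1 1 | tail -1 | awk '{print $4}')     # free 4 KB frames (fre column)
REALMEM_KB=$(lsattr -El sys0 -a realmem -F value)         # real memory in KB

PCT=$(( (FREEFRAMES * 4 * 100) / REALMEM_KB ))            # percentage of real memory that is free

exit $PCT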
Bring offline (on error node only)
The resource group is brought offline only on the node where the error occurred; it is not
acquired by another node.
Fallback options
These options control the behavior of an online resource group when a node joins the cluster:
Fall back to higher priority node in list
The resource group falls back to a higher priority node when it joins the cluster as seen in
Figure 2-12.
Never fall back
The resource group does not fall back when a higher priority node joins the cluster; it
remains online on the node where it is currently active.
Full details, how to configure, and scenarios can be found in Chapter 10, “Extending resource
group capabilities” on page 425.
If a node fails to bring a resource group online when it joins the cluster, the resource group will
be left in the ERROR state. If the resource group is not configured as online on all available
nodes, PowerHA will attempt to bring the resource group online on the other active nodes in
the resource group’s node list.
Each node that joins the cluster automatically attempts to bring online any of the resource
groups that are in the ERROR state.
If a node fails to acquire a resource group during fallover, the resource group is marked
as “recoverable” and PowerHA attempts to bring the resource group online on the other nodes
in the resource group's node list. If this fails on all nodes, the resource group is left in the
ERROR state.
If a network fails on a particular node, PowerHA determines which resource groups are
affected (those that had service IP labels on that network) and then attempts to bring
them online on another node. If no other node has the required network resources, the
resource groups remain in the ERROR state. If any interfaces become available again, PowerHA
determines which resource groups in the ERROR state can be brought online and then attempts to do so.
Tip: If you want to override the automatic behavior of bringing a resource group in ERROR
state back online, specify that it must remain offline on a node.
Selective fallovers
The following failures are categorized as selective fallover events, and they are all enabled by
default:
Interface failure:
– PowerHA swaps interfaces if possible.
– If not possible, RG is moved to the highest priority node with an available interface, and
if not successful, RG will be brought into the ERROR state.
Network failure:
– If the failure is local, affected RGs are moved to another node.
– If the failure is global, the result is node_down for all nodes.
Application failure:
– If an application monitor indicates an application has failed, depending on the
configuration, PowerHA first attempts to restart the application on the same node
(usually three times).
– If restart is not possible, PowerHA moves the RG to another node, and if this fails also,
the RG is brought into the ERROR state.
In addition to the list of provided Smart Assists in Table 2-2, you can build a customized Smart
Assist program to manage other applications that are not in Table 2-2. The General Application
Smart Assist (GASA) is a preinstalled Smart Assist that comes with PowerHA SystemMirror. Its
intended purpose is to configure applications that do not already have a target Smart
Assist, but that can be easily managed by using start and stop scripts. For more details about
the process, see Smart Assist development concepts.
2.6.1 Notifications
This section shows notification options in your PowerHA SystemMirror cluster. Notifications
can be customized to meet your business requirements.
Error notification
This uses the AIX error notification facility. This allows you to trap on any specific error logged
in the error report and to run a custom notify method that the user provides.
You can use the verification automatic monitoring cluster_notify event to configure a
PowerHA SystemMirror remote notification method to send a message in case of detected
errors in cluster configuration. The output of this event is logged in the hacmp.out file
throughout the cluster on each node that is running cluster services.
You can configure any number of notification methods, for different events and with different
text or numeric messages and telephone numbers to dial. The same notification method can
be used for several different events, as long as the associated text message conveys enough
information to respond to all of the possible events that trigger the notification. This includes
SMS message support.
After configuring the notification method, you can send a test message to be sure
configurations are correct and that the expected message will be sent for an event.
As described previously, event monitoring is now at the kernel level. The following kernel
extension, which is loaded by the clevmgrdES subsystem, monitors these events for loss of
rootvg:
/usr/lib/drivers/phakernmgr
Details on how to check and change this option and its behavior can be found at 11.2,
“System events” on page 462.
The extra processors and memory, while physically present, are not used until you decide that
the additional capacity that you need is worth the cost. This provides you with a fast and easy
upgrade in capacity to meet peak or unexpected loads.
PowerHA SystemMirror integrates with the DLPAR and CoD functions. You can configure
cluster resources so that the logical partition with minimally allocated resources serves
as a standby node, and the application resides on another LPAR node that has more
resources than the standby node.
When it is necessary to run the application on the standby node, PowerHA SystemMirror
ensures that the node has sufficient resources to successfully run the application and
allocates the necessary resources.
For more information about using this feature, see 9.3, “Resource Optimized High Availability
(ROHA)” on page 368.
By using the PowerHA SystemMirror file collection function, you can request that a list of files
be automatically kept in sync across the cluster. You no longer have to manually copy an
updated file to every cluster node, confirm that the file is properly copied, and confirm that
each node has the same version of it. With PowerHA SystemMirror file collections enabled,
PowerHA SystemMirror can detect and warn you if one or more files in a collection is deleted
or has a zero value on one or more cluster nodes.
2.7 Limits
This section lists several common PowerHA limits, at the time of writing. These limits are
presented in Table 2-3.
Nodes: 16
Volume groups in a resource group: 512 (minus any other resources in the resource group)
File systems in a resource group: 512 (minus any other resources in the resource group)
Networks: 48
Service IP labels in a resource group: 256 (minus the rest of the total IP addresses in the cluster)
Sites: 2
Application controllers in a resource group: 512 (minus any other resources in the resource group)
GLVM devices: All disks that are supported by AIX; they can be different types of disks
Subnet requirements
The AIX kernel routing table supports multiple routes for the same destination. If multiple
matching routes have the same weight, each subnet route will be used alternately. The
problem that this poses for PowerHA is that if one node has multiple interfaces that share the
same route, PowerHA has no means to determine the health of each individual interface.
Therefore, we suggest that each interface on a node belong to a unique subnet so that each
interface can be monitored.
A system administrator of a PowerHA cluster may be asked to perform any of the following
LVM-related maintenance tasks:
Create a new shared volume group.
Extend, reduce, change, or remove an existing volume group.
Create a new shared logical volume.
Extend, reduce, change, or remove an existing logical volume.
Create a new shared file system.
Extend, change, or remove an existing file system.
Add and remove physical volumes.
When performing any of these maintenance tasks on shared LVM components, be aware
that ownership and permissions are reset when a volume group is exported and then
reimported. More details about performing these tasks are available in 7.4, “Shared storage
management” on page 267.
After exporting and importing, a volume group is owned by root and accessible by the system
group.
Note: Applications, such as some database servers, that use raw logical volumes might be
affected by this change if they change the ownership of the raw logical volume device. You
must restore the ownership and permissions back to what is needed after this sequence.
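For example, after an export and import sequence, you might restore the device ownership that a hypothetical database instance expects on its raw logical volume:
chown dbadm:dbgroup /dev/rdata_lv01    # hypothetical owner, group, and raw logical volume device
chmod 660 /dev/rdata_lv01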
There are also third-party (OEM) storage devices and subsystems that can be used, although
most of them are not directly certified by IBM for PowerHA usage. For these devices, check
the manufacturer’s respective websites.
You can configure OEM volume groups in AIX and use PowerHA SystemMirror to manage
such volume groups, their corresponding file systems, and application controllers. In
particular, PowerHA SystemMirror automatically detects and provides the methods for volume
groups created with the Veritas Volume Manager (VxVM) using Veritas Foundation Suite
(VFS) v.4.0. For other OEM filesystems, depending on the type of OEM volume, custom
methods in PowerHA SystemMirror allow you (or an OEM vendor) to tell PowerHA
SystemMirror that a file system unknown to AIX LVM should be treated the same way as a
known and supported file system, or to specify the custom methods that provide the file
systems processing functions supported by PowerHA SystemMirror.
PowerHA also supports shared tape drives (SCSI or Fibre Channel). The shared tapes can
be connected using SCSI or Fibre Channel (FC). Concurrent mode tape access is not
supported.
Storage configuration is one of the most important tasks you must perform before starting the
PowerHA cluster configuration. Storage configuration can be considered a part of PowerHA
configuration.
Depending on the application needs, and on the type of storage, you decide how many nodes
in a cluster will have shared storage access, and which resource groups will use which disks.
Note: PowerHA does not provide data storage protection. Storage protection is provided
by using these items:
AIX (LVM mirroring)
GLVM
Hardware RAID
In this section, we provide information about data protection methods at the storage level, and
also talk about the LVM shared disk access modes:
Non-concurrent
Enhanced concurrent mode (ECM)
Both access methods actually use enhanced concurrent volume groups. In a non-concurrent
access configuration, only one cluster node can access the shared data at a time. If the
resource group containing the shared disk space moves to another node, the new node will
activate the disks, and check the current state of the volume groups, logical volumes, and file
systems.
In a concurrent access configuration, data on the disks is available to all nodes concurrently.
This access mode does not support file systems (either JFS or JFS2).
LVM requirements
The LVM component of AIX manages the storage by coordinating data mapping between
physical and logical storage. Logical storage can be expanded and replicated, and can span
multiple physical disks and enclosures.
By forcing a volume group to vary on, you can bring and keep a volume group online (as part
of a resource group) with one copy of the data available. Use a forced varyon option only for
volume groups that have mirrored logical volumes. However, be cautious when using this
facility to avoid creating a partitioned cluster.
Note: You should also specify the superstrict allocation policy for all logical volumes in
volume groups that are used with the forced varyon option. In this way, the LVM ensures that
the copies of a logical volume are always on separate disks, which increases the chances that
a forced varyon will be successful after a failure of one or more disks.
This option is useful in a takeover situation in case a volume group that is part of that
resource group loses one or more disks (VGDAs). If this option is not used, the resource
group will not be activated on the takeover node, thus rendering the application unavailable.
When using the forced varyon of volume groups option in a takeover situation, PowerHA first
tries a normal varyonvg command. If this attempt fails because of lack of quorum, PowerHA
checks the integrity of the data to ensure that at least one available copy of all data is in the
volume group before trying to force the volume online. If there is, it runs the varyonvg -f
command. If not, the volume group remains offline and the resource group action results in an
error state.
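As a hedged illustration, the following commands approximate those checks and the forced activation; the volume group name app1vg and logical volume name app1lv are hypothetical:
varyonvg app1vg          # normal activation attempt; fails if quorum is lost
lsvg -p app1vg           # check which physical volumes are still active or missing
lslv -m app1lv           # confirm that at least one copy of each logical partition is on an available disk
varyonvg -f app1vg       # force the volume group online if the data checks succeed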
Note: The forced varyon feature is usually specific to cross-site LVM and GLVM
configurations.
PowerHA allows customization of predefined cluster events and also allows the creation of new
events. When you create new events, an important step is to check whether any standard
event exists that covers the action or situation that you want.
All standard cluster events have their own meaning and functioning behavior. Some of the
most common examples of cluster events are listed in Table 2-4.
node_up (nodes joining or leaving the cluster): The node_up event starts when a node joins or
rejoins the cluster.
node_down (nodes joining or leaving the cluster): The node_down event starts when the cluster
is not receiving heartbeats from a node; it considers the node gone and starts a node_down event.
network_up (network-related events): The network_up event starts when the cluster detects
that a network is available and ready for cluster usage (for example, for a service IP
address activation).
network_down (network-related events): The network_down event starts when a specific network
becomes unreachable. It can be a network_down_local, when only a specific node has lost its
connectivity to a network, or a network_down_global, when all nodes have lost connectivity.
swap_adapter (network-related events): The swap_adapter event starts when the interface that
hosts a service IP address experiences a failure. If other boot interfaces are available on
the same node, the swap_adapter event moves the service IP address to another boot interface
and refreshes the network routing table.
fail_interface (interface-related issues): The fail_interface event starts when any node
interface experiences a failure. If the interface has no service IP defined, only the
fail_interface event runs. If the failing interface hosts a service IP address and there is
no other boot interface available to host it, then an rg_move event is triggered.
join_interface (interface-related issues): The join_interface event starts when a boot
interface becomes available or recovers from a failure.
fail_standby (interface-related issues): The fail_standby event starts when a boot interface
that hosts no service IP address experiences a failure.
join_standby (interface-related issues): The join_standby event starts when a boot interface
becomes available or recovers from a failure.
rg_move (resource group changes): The rg_move event starts when a resource group operation
from one node to another starts.
rg_up (resource group changes): The rg_up event starts when a resource group is successfully
brought online on a node.
rg_down (resource group changes): The rg_down event starts when a resource group is brought
offline.
Note: All events have detailed usage description in the script file. All standard events are in
the /usr/es/sbin/cluster/events directory.
Part 2
Chapter 3. Planning
In this chapter, we discuss the planning aspects for a PowerHA 7.2.7 cluster. Proper planning
and preparation are necessary to successfully install and maintain a PowerHA cluster. Time
spent properly planning your cluster configuration and preparing your environment will result
in a cluster that is easier to install and maintain and one that provides higher application
availability.
Before you begin planning the cluster, you must have a good understanding of your current
environment, your application, and your expected behavior for PowerHA. Building on this
information, you can develop an implementation plan that helps you to more easily integrate
PowerHA into your environment, and more important, have PowerHA manage your
application availability to your expectations.
PowerHA can be configured to monitor server hardware, operating system, and application
components. In the event of a failure, PowerHA can take corrective actions, such as moving
specified resources (service IP addresses, storage, and applications) to surviving cluster
components to restore application availability as quickly as possible.
Because PowerHA is an extremely flexible product, designing a cluster to fit your organization
requires thorough planning. Knowing your application requirements and behavior provides
important input to your PowerHA plan and will be primary factors in determining the cluster
design. Ask yourself the following questions while developing your cluster design:
Which application services are required to be highly available?
What are the service level requirements for these application services (24/7, 8/5) and how
quickly must service be restored if a failure occurs?
What are the potential points of failure in the environment and how can they be
addressed?
Which points of failure can be automatically detected by PowerHA and which require
custom code to be written to trigger an event?
What is the skill level within the group implementing and maintaining the cluster?
Although the AIX system administrators are typically responsible for the implementation of
PowerHA, they usually cannot do it alone. A team consisting of the following representatives
should be assembled to assist with the PowerHA planning; each will play a role in the success
of the cluster:
Network administrator
AIX system administrator
Database administrator
Application programmer
Support personnel
Application users
Using the concepts described in Chapter 1, “Introduction to PowerHA SystemMirror for AIX”
on page 3, begin the PowerHA implementation by developing a detailed PowerHA cluster
configuration and implementation plan.
For simplicity we use the planning of a simple two-node mutual takeover cluster as an
example. Sample planning worksheets are included as we work through this chapter so you
can see how the cluster planning is developed.
Both the cluster diagram and the paper planning worksheets provide a manual method of
recording your cluster information. A set of planning worksheets is in Appendix A, “Paper
planning worksheets” on page 579.
Important: Each application to be integrated into the cluster must run in stand-alone
mode. You also must be able to fully control the application (start, stop, and validation test).
The intention is to make use of the two nodes in a mutual takeover configuration where app1
normally resides on Node01, and app2 normally resides on Node02. In the event of a failure,
we want both applications to run on the surviving server. As you can see from the diagram,
we need to prepare the environment to allow each node to run both applications.
Note: Each application to be integrated into the cluster must be able to run in stand-alone
mode on any node that it might have to run on (under both normal and fallover situations).
Analyzing PowerHA cluster requirements, we have three key focus areas as illustrated in
Figure 3-1 on page 75: network, application, and storage. All planning activities are in support
of one of these three items to some extent:
Network: How clients connect to the application (the service address). The service address floats between all designated cluster nodes.
Application: What resources are required by the application. The application must have all it needs to run on a fallover node, including CPU and memory resources, licensing, runtime binaries, and configuration data. It should have robust start and stop scripts and a tool to monitor its status.
Storage: What type of shared disks will be used. The application data must reside on shared disks that are available to all cluster nodes.
This is a good time to create a diagram of the PowerHA cluster. Start simply and gradually
increase the level of details as you go through the planning process. The diagram can help
identify single points of failure, application requirements, and guide you along the planning
process.
Also use the paper or online planning worksheets to record the configuration and cluster
details as you go.
Figure 3-2 illustrates the initial cluster diagram used in our example. At this point, the focus is
on high level cluster functionality. Cluster details are developed as we move through the
planning phase.
We begin to make design decisions for the cluster topology and behavior based on our
requirements. For example, based on our requirements, the initial cluster design for our
example includes the following considerations:
The cluster is a two-node mutual takeover cluster.
Although host names can be used as cluster node names, we choose to specify cluster
node names instead.
Note: A key configuration requirement is that the LPAR partition name, the cluster node name, and the AIX host name must match. This is the assumption that PowerHA makes.
Each node contains one application but is capable of running both (consider network,
storage, memory, CPU, software).
Each node has one logical Ethernet interface that is protected using Shared Ethernet
Adapter (SEA) in a Virtual I/O Server (VIOS).
IP Address Takeover (IPAT) using aliasing is used.
Each node has a persistent IP address (an IP alias that is always available while the node
is up) and one service IP (aliased to one of the adapters under PowerHA control). The
base Ethernet adapter addresses are on separate subnets.
Shared disks are virtual SCSI devices provided by a VIOS and reside on a SAN and are
available on both nodes.
All volume groups on the shared disks are created in Enhanced Concurrent Mode (ECM)
as required in PowerHA.
Each node has enough CPU and memory resources to run both applications.
Each node has redundant hardware and mirrored internal disks.
AIX 7.2 TL3 SP3 is installed.
PowerHA 7.2.7 is used.
This list captures the basic components of the cluster design. Each item will be investigated in
further detail as we progress through the planning stage.
Reserved words
The list of reserved words can be found in /usr/es/sbin/cluster/etc/reserved_words and is shown below:
– adapter
– cluster
– command
– custom
– daemon
– event
– group
– network
– node
– resource
– name
– grep
– subnet
– nim
– ip
– IP
– ether
– token
– rs232
– socc
– fddi
– slip
– tmscsi
– fcs
– hps
– atm
– tmssa
– serial
– public
– private
– diskhb
– diskhbmulti
– alias
– disk
– volume
– vpath
– tty
– scsi
– fscsi
– vscsi
– nodename
– OHN
– OFAN
– OUDP
– OAAN
– FNPN
– FUDNP
– BO
– FBHPN
– NFB
– ipv6
– IPv6
– IPV6
– IW
– ALL
– all
PowerHA supports virtually any AIX supported node, from desktop systems to high end
servers. When choosing a type of node, consider this information:
Ensure that sufficient CPU and memory resources are available on all nodes to allow the
system to behave as you want it to in a fallover situation. The CPU and memory resources
must be capable of sustaining the selected applications during fallover, otherwise clients
might experience performance problems. If you are using LPARs, you might want to use
the DLPAR capabilities to increase resources during fallover. If you are using stand-alone
servers, you do not have this option and so you might have to look at using a standby
server.
Make use of highly available hardware and redundant components where possible in each
server. For example, use redundant power supplies and connect them to separate power
sources.
Protect each node’s rootvg (local operating system copy) through the use of mirroring or
RAID.
Allocate at least two Ethernet adapters per node and connect them to separate switches
to protect from a single adapter or switch failure. Commonly this is done using a single or
dual Virtual I/O Server.
Allocate two SAN adapters per node to protect from a single SAN adapter failure.
Commonly this is done using a single or dual Virtual I/O Server.
Although not mandatory, we suggest using cluster nodes with similar hardware configurations
so that you can more easily distribute the resources and perform administrative operations.
That is, do not try to fallover from a high-end enterprise class server to a scale-out model and
expect everything to work properly.
Tip: For a list of supported devices by PowerHA, find the Hardware Support Matrix in the
following web page:
https://www.ibm.com/support/pages/powerha-hardware-support-matrix
SAN switches (2): IBM 2498 SAN24B-5, zoned for NPIV client WWPNs
SAN storage: IBM 2076-624 V7000, switch attached (but not shown in the diagram)
cluster.es.server
– cluster.es.server.diag
– cluster.es.server.events
– cluster.es.server.rte
– cluster.es.server.testtool
– cluster.es.server.utils
cluster.es.smui
– cluster.es.smui.agent
– cluster.es.smui.common
cluster.es.smui.server
cluster.license
cluster.man.en_US.es
– cluster.man.en_US.es.data
cluster.msg.Fr_FR.assist
cluster.msg.Fr_FR.es
– cluster.msg.Fr_FR.es.client
– cluster.msg.Fr_FR.es.server
cluster.msg.Ja_JP.assist
cluster.msg.Ja_JP.es
– cluster.msg.Ja_JP.es.client
– cluster.msg.Ja_JP.es.server
cluster.msg.en_US.assist
cluster.msg.en_US.es
– cluster.msg.en_US.es.client
– cluster.msg.en_US.es.server
cluster.msg.fr_FR.assist
cluster.msg.fr_FR.es
– cluster.msg.fr_FR.es.client
– cluster.msg.fr_FR.es.server
cluster.msg.ja_JP.assist
cluster.msg.ja_JP.es
– cluster.msg.ja_JP.es.client
– cluster.msg.ja_JP.es.server
If using the installation media of PowerHA V7.2.7 Enterprise Edition, the following additional
file sets are available:
cluster.es.cgpprc
– cluster.es.cgpprc.cmds
– cluster.es.cgpprc.rte
cluster.es.genxd
– cluster.es.genxd.cmds
– cluster.es.genxd.rte
cluster.es.pprc
– cluster.es.pprc.cmds
– cluster.es.pprc.rte
cluster.es.spprc
– cluster.es.spprc.cmds
– cluster.es.spprc.rte
cluster.es.sr
– cluster.es.sr.cmds
– cluster.es.sr.rte
cluster.es.svcpprc
– cluster.es.svcpprc.cmds
– cluster.es.svcpprc.rte
cluster.es.tc
– cluster.es.tc.cmds
– cluster.es.tc.rte
cluster.msg.En_US.cgpprc
cluster.msg.En_US.genxd
cluster.msg.En_US.pprc
cluster.msg.En_US.sr
cluster.msg.En_US.svcpprc
cluster.msg.En_US.tc
cluster.msg.Fr_FR.assist
cluster.msg.Fr_FR.cgpprc
cluster.msg.Fr_FR.genxd
cluster.msg.Fr_FR.glvm
cluster.msg.Fr_FR.pprc
cluster.msg.Fr_FR.sr
cluster.msg.Fr_FR.svcpprc
cluster.msg.Fr_FR.tc
cluster.msg.Ja_JP.cgpprc
cluster.msg.Ja_JP.genxd
cluster.msg.Ja_JP.glvm
cluster.msg.Ja_JP.pprc
cluster.msg.Ja_JP.sr
cluster.msg.Ja_JP.svcpprc
cluster.msg.Ja_JP.tc
cluster.msg.en_US.cgpprc
cluster.msg.en_US.genxd
cluster.msg.en_US.glvm
cluster.msg.en_US.pprc
cluster.msg.en_US.sr
cluster.msg.en_US.svcpprc
cluster.msg.en_US.tc
cluster.msg.fr_FR.cgpprc
cluster.msg.fr_FR.genxd
cluster.msg.fr_FR.glvm
cluster.msg.fr_FR.pprc
cluster.msg.fr_FR.sr
cluster.msg.fr_FR.svcpprc
cluster.msg.ja_JP.cgpprc
cluster.msg.ja_JP.genxd
cluster.msg.ja_JP.glvm
cluster.msg.ja_JP.pprc
cluster.msg.ja_JP.svcpprc
cluster.msg.ja_JP.sr
cluster.msg.ja_JP.tc
cluster.xd.base
cluster.xd.glvm
cluster.xd.license
/etc/hosts
The cluster event scripts use the /etc/hosts file for name resolution. All cluster node IP
interfaces must be added to this file on each node. PowerHA can modify this file to ensure
that all nodes have the necessary information in their /etc/hosts file, for proper PowerHA
operations.
If you delete service IP labels from the cluster configuration by using SMIT, we suggest that
you also remove them from /etc/hosts.
/etc/inittab
The /etc/inittab file is modified in each of the following cases:
PowerHA is installed:
The following lines are added when you initially install PowerHA. They start the clcomd and clstrmgrES subsystems if they are not already running.
clcomd:23456789:once:/usr/bin/startsrc -s clcomd
hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1
Important: This PowerHA entry is used to start the following daemons with the
startsrc command if they are not already running:
startsrc -s syslogd
startsrc -s snmpd
startsrc -s clstrmgrES
If PowerHA is set to start at system restart, add the following line to the /etc/inittab file:
hacmp6000:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot -b -A # Bring up Cluster
Notes:
Although starting cluster services from the inittab file is possible, we suggest that
you do not use this option. The better approach is to manually control the starting of
PowerHA. For example, in the case of a node failure, investigate the cause of the
failure before restarting PowerHA on the node.
ha_star is also found as an entry in the inittab file. This entry is delivered with the bos.rte.control file set, not PowerHA.
/etc/rc.net
The /etc/rc.net file is called by cfgmgr, which is the AIX utility that configures devices and
optionally installs device software into the system, to configure and start TCP/IP during the
boot process. It sets host name, default gateway, and static routes.
/etc/services
PowerHA makes use of the following network ports for communication between cluster
nodes. These are all listed in the /etc/services file as shown in Example 3-1.
Note: If you install PowerHA Enterprise Edition for GLVM, the following entry for the port
number and connection protocol is automatically added to the /etc/services file in each
node on the local and remote sites on which you installed the software:
rpv 6192/tcp
/etc/snmpd.conf
The default version of the file for versions of AIX later than V5.1 is snmpdv3.conf.
The SNMP daemon reads the /etc/snmpd.conf configuration file when it starts and when a
refresh or kill -1 signal is issued. This file specifies the community names and associated
access privileges and views, hosts for trap notification, logging attributes, snmpd-specific
parameter configurations, and SNMP multiplexing (SMUX) configurations for snmpd. The
PowerHA installation process adds a clsmuxpd password to this file.
The following entry is added to the end of the file, to include the PowerHA MIB, supervised by
the Cluster Manager:
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password # PowerHA SystemMirror clsmuxpd
/etc/snmpd.peers
The /etc/snmpd.peers file configures snmpd SMUX peers. During installation, PowerHA adds
the following entry to include the clsmuxpd password to this file:
clsmuxpd 1.3.6.1.4.1.2.3.1.2.1.5 "clsmuxpd_password" # PowerHA SystemMirror clsmuxpd
/etc/syslog.conf
The /etc/syslog.conf configuration file controls output of the syslogd daemon, which logs
system messages. During installation, PowerHA adds entries to this file that direct the output
from problems related to PowerHA to certain files.
CAA also adds a line as shown at the beginning. See Example 3-2.
/etc/trcfmt
The /etc/trcfmt file is the template file for the system trace logging and report utility, trcrpt.
The installation process adds PowerHA tracing to the trace format file. PowerHA tracing is
performed for the clstrmgrES and clinfo daemons.
/var/spool/cron/crontabs/root
The PowerHA installation process adds PowerHA log file rotation to the
/var/spool/cron/crontabs/root file as shown in Example 3-3.
Check with the application vendor to ensure that no issues, such as licensing, exist with the
use of PowerHA.
3.4.7 Licensing
The two aspects of licensing are as follows:
PowerHA 7.2.7 (features) licensing
Application licensing
PowerHA 7.2.7
PowerHA licensing is core-based, which means that PowerHA must be licensed for each core
that is used by the cluster nodes. The licensing is enforced by proper entitlement of the LPARs.
Because they are core-based licenses for both PowerHA Standard and Enterprise Editions,
the licenses depend on the Power Systems servers on which the cluster nodes run. The
Power Systems servers can be divided into the following categories:
Small Tier: IBM Power Systems S914 and S1014, S922 and S1022, S924 and S1024, and S950 and S1050.
Medium Tier: IBM Power Systems E980 and E1080.
In environments that are considered hot-standby, the total licensing is often N+1, where N is the total number of CPUs in the production environment and the +1 is for the running standby node. So, in the first two bullets listed previously, the licensing would be five and three respectively.
If the cluster is a mutual takeover (active-active) configuration, then all CPUs in the cluster node LPARs must be licensed. Assuming the LPARs are equally sized, the licensing for the first two bullets above would be eight and four respectively.
Licensing also covers entire cores, so sub-core licensing is not available. Always add up all cores involved, and if a partial core is left over, round up. For example, if your cluster ends up with 9.4 cores, you should license 10 cores.
If there is ever any doubt or questions about licensing your environment always contact your
IBM sales representative or IBM Business Partner for assistance.
Applications
Some applications have specific licensing requirements, such as a unique license for each
processor that runs an application, which means that you must be sure that the application is
properly licensed to allow it to run on more than one system. To license-protect an application, vendors incorporate processor-specific information into the application when it is installed. As a result, even though the PowerHA 7.2.7 software processes a node failure
correctly, it might be unable to restart the application on the fallover node because of a
restriction on the number of licenses for that application available within the cluster.
Important: To avoid this problem, be sure that you have a license for each system unit in
the cluster that might potentially run an application. Check with your application vendor for
any license issues for when you use PowerHA 7.2.7.
Another good practice is to allow approximately 100 MB free space in /var and /tmp for
PowerHA 7.2.7 logs. This depends on the number of nodes in the cluster, which dictates the
size of the messages stored in the various PowerHA 7.2.7 logs.
Time synchronization
Time synchronization is important between cluster nodes for both application and PowerHA
log issues. This is standard system administration practice, and we suggest that you make
use of an NTP server or other procedure to keep the cluster nodes time in sync.
Note: Maintaining time synchronization between the nodes is especially useful for auditing
and debugging cluster problems.
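As a minimal sketch (the time server address 192.168.100.10 is hypothetical), NTP can be enabled on each AIX node along these lines:
# Point xntpd at the time server and start it
echo "server 192.168.100.10" >> /etc/ntp.conf
startsrc -s xntpd
# Uncomment the xntpd entry in /etc/rc.tcpip so that it also starts at boot
# Verify that the node is synchronized
lssrc -ls xntpd
ntpq -p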
This is particularly important in a fallover (takeover) situation. Application users must be able
to access the shared files from any required node in the cluster. This usually means that the
application-related user and group identifiers (UID and GID) must be the same on all nodes.
In preparation for a cluster configuration, be sure to consider and correct this, otherwise you
might experience service problems during a fallover.
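For example, a quick way to compare and align the application user and group IDs across the nodes is shown in the following sketch (the appuser user, the appgrp group, and the ID values are hypothetical):
# On each node, check the current IDs
lsuser -a id pgrp appuser
lsgroup -a id appgrp
# If the user or group does not exist yet on a node, create it with matching IDs
mkgroup id=2000 appgrp
mkuser id=2001 pgrp=appgrp appuser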
After PowerHA is installed, it contains facilities to let you manage AIX user and group
accounts across the cluster. It also provides a utility to authorize specified users to change
their own password across nodes in the cluster.
Attention: If you manage user accounts with a utility such as Network Information Service
(NIS), PSSP user management, or Distributed Computing Environment (DCE) Manager,
do not use PowerHA user management. Using PowerHA user management in this
environment might cause serious system inconsistencies in the user authentication
databases.
For more information about user administration, see 7.3.1, “C-SPOC user and group
administration” on page 254.
Before installation: If you prefer to control the GID of the hacmp group, we suggest that
you create the hacmp group before installing the PowerHA file sets.
In addition to the ports identified in the /etc/services file, the following services also require
ports. However, these ports are selected randomly when the processes start. Currently, there
is no way to indicate specific ports, so be aware of their presence. Typical ports are shown for
illustration, but these ports can be altered if you need to do so:
#clstrmgr 870/udp
#clstrmgr 871/udp
#linfo 32790/udp
These file collections can be managed through SMIT menus. You can add, delete, and modify
file collections to meet your needs.
Configuration_Files
The Configuration_Files collection is a container for the following essential system files:
/etc/hosts
/etc/services
/etc/snmpd.conf
/etc/snmpdv3.conf
/etc/rc.net
/etc/inetd.conf
/usr/es/sbin/cluster/netmon.cf
/usr/es/sbin/cluster/etc/clhosts
/usr/es/sbin/cluster/etc/rhosts
/usr/es/sbin/cluster/etc/clinfo.rc
You can alter the propagation options for this file collection. You can also add and remove files
to/from this file collection.
HACMP_Files
The HACMP_Files collection is a container that typically holds user-configurable files of the
PowerHA configuration such as application start and stop scripts, customized events, and so
on. This file collection cannot be removed or modified, and you cannot add files to or delete
files from it.
Example: When you define an application server to PowerHA (start, stop, and optionally monitoring scripts), PowerHA automatically includes these files in the HACMP_Files collection.
Unlike the Configuration_Files file collection, you cannot directly modify the files in this
collection. For more information, see 7.2, “File collections” on page 247.
In a typical clustering environment, clients access the applications through a TCP/IP network
(usually Ethernet) using a service address. This service address is made highly available by
PowerHA and moves between communication interfaces on the same network as required.
PowerHA sends heartbeat packets between all communication interfaces (adapters) in the
network to determine the status of the adapters and nodes and takes remedial actions as
required.
To eliminate the TCP/IP protocol as a single point of failure and prevent cluster partitioning,
PowerHA also uses non-IP networks for heartbeating. This assists PowerHA with identifying
the failure boundary, such as a TCP/IP failure or a node failure.
An Ethernet network is used for public access and has multiple adapters connected from
each node. This network will hold the base IP addresses, the persistent IP addresses, and
the service IP addresses. You can have more than one network; however, for simplicity, we use only one.
The cluster repository is also shown. This provides another path of communications across
the disk. Multipath devices can be configured whenever there are multiple disk adapters in a
node, multiple storage adapters, or both.
PowerHA, through CAA, also can use the SAN HBAs for communications. This is often
referred to as SAN heartbeating, or sancomm. The device that enables it is sfwcomm.
All network connections are used by PowerHA to monitor the status of the network, adapters,
and nodes in the cluster by default. In our example, we plan for an Ethernet and repository
disk but not sancomm network.
3.7.1 Terminology
This section presents a quick summary of the terminology used in describing PowerHA 7.2.7
networking.
IP label
A name that is associated with an IP address, and is resolvable by the system (/etc/hosts,
BIND, and so on).
Service IP label/address
An IP label or IP address over which a service is provided. Typically this is the address
used by clients to access an application. It can be bound to a node or shared by nodes
and is kept highly available by PowerHA.
Persistent IP label/address
A node-bound IP alias that is managed by PowerHA 7.2.7 (the persistent alias never
moves to another node).
Communication interface
A physical interface that supports the TCP/IP Protocol (for example an Ethernet adapter).
Network interface card (NIC)
A physical adapter that is used to provide access to a network (for example an Ethernet
adapter is referred to as a NIC).
Network connections
PowerHA 7.2.7 requires that each node in the cluster have at least one direct, non-routed
network connection with every other node. These network connections pass heartbeat
messages among the cluster nodes to determine the state of all cluster nodes, networks and
network interfaces.
PowerHA 7.2.7 also requires that all communication interfaces for a cluster network be
defined on the same physical network, route packets, and receive responses from each
other without interference by any network equipment.
Do not use intelligent switches, routers, or other network equipment that do not transparently pass UDP broadcasts and other packets between all cluster nodes.
Bridges, hubs, and other passive devices that do not modify the packet flow can be safely
placed between cluster nodes, and between nodes and clients.
Figure 3-4 illustrates a physical Ethernet configuration, showing dual Ethernet adapters on
each node connected across two switches but all configured in the same physical network
(VLAN). This is sometimes referred to as being in the same MAC collision domain.
EtherChannel
PowerHA supports the use of EtherChannel (or Link Aggregation) for connection to an
Ethernet network. EtherChannel can be useful if you want to use several Ethernet adapters
for both extra network bandwidth and fallover, but also want to keep the PowerHA
configuration simple. With EtherChannel, you can specify the EtherChannel interface as the
communication interface. Any Ethernet failures, with the exception of the Ethernet network
itself, can be handled without PowerHA being aware or involved.
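As an illustration only (the adapter names are assumptions, and SMIT is normally used for this task), a Network Interface Backup EtherChannel can be created from the command line; the resulting pseudo-adapter, for example ent2, is then used as the PowerHA communication interface:
# Create an EtherChannel with ent0 as the primary adapter and ent1 as the backup adapter
mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0 -a backup_adapter=ent1
# Display the new EtherChannel device and its attributes
lsdev -Cc adapter | grep -i etherchannel
lsattr -El ent2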
Important:
The host name cannot be an alias in the /etc/hosts file.
Name resolution for the host name must work both ways (forward and reverse); therefore, a limited set of characters can be used.
The IP address that belongs to the host name must be reachable on the server, even
when PowerHA is down.
The host name cannot be a service address.
The host name cannot be an address located on a network which is defined as private
in PowerHA.
The host name, the CAA node name, and the “communication path to a node” must be
the same.
By default, the PowerHA node name, the CAA node name, and the “communication path to a node” are set to the same name.
The host name and the PowerHA nodename can differ.
The rules leave the base addresses and the persistent address as candidates for the host
name. You can use the persistent address as the host name only if you set up the persistent
alias manually before you configure the cluster topology.
Starting with PowerHA 7.1.3, PowerHA (through CAA) now offers the ability to change the
cluster node host name dynamically as needed. For more information about this capability,
see Chapter 11 of the Guide to IBM PowerHA SystemMirror for AIX Version 7.1.3,
SG24-8167.
/etc/hosts
An IP address and its associated label (name) must be present in the /etc/hosts file. We
suggest that you choose one of the cluster nodes to perform all changes to this file and then
use SCP or file collections to propagate the /etc/hosts file to the other nodes. However, in an
inactive cluster, the auto-corrective actions during cluster verification can at least keep the IP
addresses that are associated with the cluster in sync.
Note: Be sure that you test the direct and reverse name resolution on all nodes in the
cluster and the associated Hardware Management Consoles (HMCs). All these must
resolve names identically, otherwise you might run into security issues and other problems
related to name resolution.
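For illustration, the /etc/hosts entries for one node of our example cluster might look similar to the following (all addresses and labels are hypothetical):
# Base (boot) addresses, one per adapter, each on its own subnet
10.1.1.11      node01b1
10.1.2.11      node01b2
# Persistent (node-bound) alias
192.168.100.11 node01
# Service address used by clients of app1
192.168.100.51 app1svc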
IP aliases
An IP alias is an IP address configured onto a NIC in addition to the base IP address of the
NIC. The use of IP aliases is an AIX function that is supported by PowerHA 7.2.7. AIX
supports multiple IP aliases on a NIC, each on the same or different subnets.
Note: While AIX allows IP aliases with different subnet masks to be configured for an
interface, PowerHA 7.2.7 uses the subnet mask of the base IP address for all IP aliases
configured on this network interface.
Important: If the persistent IP address exists on the node, it must be an alias, not the base
address of an adapter.
Figure 3-5 illustrates the concept of the persistent address. Notice that this is simply another
IP address, configured on one of the base interfaces. The netstat command shows it as an
additional IP address on an adapter.
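To illustrate the concept (the interface, address, and mask are hypothetical; PowerHA manages its own aliases, so this is only for exploration on a test system), an alias can be added and removed manually:
# Add an alias to en0 and display the configured addresses
ifconfig en0 192.168.100.11 netmask 255.255.255.0 alias
netstat -in
# Remove the alias again
ifconfig en0 192.168.100.11 netmask 255.255.255.0 delete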
Subnetting
All the communication interfaces that are configured in the same PowerHA network must
have the same subnet mask. Interfaces that belong to a different PowerHA network can have
either the same or different network mask.
Note: When using a single adapter per network configuration, the base (or boot) IP and
the service IP can be on the same subnet. This is common today and when used
eliminates the need for a persistent IP alias.
To prevent this situation, if not using a single adapter configuration with a boot IP on the
routable subnet, then be sure to use a persistent address and link the default route to this
subnet. The persistent address will be active while the node is active and therefore so will the
default route. If you choose not to use a persistent address, then you can create a custom
event or post-event script to reestablish the default route if this becomes an issue.
That said, this issue typically applies to configurations with multiple interfaces (multiple boot adapters) per node, which overall are rare. That is because the physical adapter redundancy is commonly provided at a layer outside the operating system, so PowerHA is unaware of it. This is why most configurations appear to be single adapter configurations, yet truly are still redundant.
Note: Not all adapters must contain addresses that are routable outside the VLAN. Only
the service and persistent addresses must be routable. The base adapter addresses and
any aliases used for heartbeating do not need to be routed outside the VLAN because they
are not known to the client side.
Ensure that the switch provides a timely response to Address Resolution Protocol (ARP)
requests. For many brands of switches, this means turning off the following functions:
The spanning tree algorithm
portfast
uplinkfast
backbonefast
If having the spanning tree algorithm turned on is necessary, then the portfast function should
also be turned on.
Multicast
Although multicast is rarely used in PowerHA, it still is a valid option that may be desired. To
use multicast, see 12.1, “Multicast considerations” on page 478.
Figure 3-6 shows the basic format for global unicast IPv6 addresses.
For PowerHA, you can have your boot IP addresses configured to the link-local address if that
is suitable. However, for configurations involving sites, it will be more suitable for configuring
boot IP addresses with global unicast addresses that can communicate with each other. The
benefit is that you can have extra heartbeating paths, which helps prevent cluster partitions.
In general, automatic IPv6 addresses are suggested for unmanaged devices such as client
PCs and mobile devices. Manual IPv6 addresses are suggested for managed devices such
as servers.
For PowerHA, you are allowed to have either automatic or manual IPv6 addresses. However, consider that automatic IP addresses are not guaranteed to persist. CAA requires the host name to resolve to a configured IP address, and it also does not allow you to change the IP addresses while the cluster services are active.
PowerHA allows you to mix different IP address families on the same adapter (for example,
IPv6 service label in the network with IPv4 boot, IPv4 persistent label in the network with IPv6
boot). However, the preferred practice is to use the same family as the underlying network for
simplifying planning and maintenance.
To determine the IPv6 multicast address, a standard prefix of 0xFF05 is combined by using the
logical OR operator with the hexadecimal equivalent of the IPv4 address. For example, the
IPv4 multicast address is 228.8.16.129 or 0xE4081081. The transformation by the logical OR
operation with the standard prefix is 0xFF05:: | 0xE4081081. Thus, the resulting IPv6
multicast address is 0xFF05::E408:1081.
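The same transformation can be reproduced with a few shell commands, using the IPv4 multicast address from the example above:
# Convert each IPv4 octet to hexadecimal and append the result to the FF05:: prefix
IP=228.8.16.129
set -- $(echo $IP | tr '.' ' ')
printf "FF05::%02X%02X:%02X%02X\n" $1 $2 $3 $4
# Output: FF05::E408:1081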
IPAT through aliasing is easy to implement and flexible. You can have multiple service
addresses on the same adapter at any one time, and some time can be saved during fallovers
because PowerHA adds an alias rather than reconfigures the base IP address of an adapter.
PowerHA allows the use of IPAT through IP aliases with the following network types that
support gratuitous ARP (in AIX):
Ethernet
XD_data
XD_ip
By default, when PowerHA 7.2.7 starts, it automatically configures the service IP as an alias with the firstalias option. This means that it removes the boot IP from the interface, puts the service IP in its place, and then aliases the boot IP on top of it.
In a multiple interface per network configuration, using a persistent alias and including it in the
same subnet as your default route is common. This typically means that the persistent
address is included in the same subnet as the service addresses. The persistent alias can be
used to access the node when PowerHA 7.2.7 is down and also overcome the default route
issue.
You can configure a distribution preference for the placement of service IP labels that are
configured in PowerHA 7.2.7. The placement of the alias is configurable through SMIT menus
as follows:
Anti-collocation
This is the default, and PowerHA distributes the service IP labels across all boot IP
interfaces in the same PowerHA network on the node. It also uses firstalias in its behavior
for each interface.
Collocation
PowerHA allocates all service IP addresses on the same boot IP interface.
Collocation with persistent label
PowerHA allocates all service IP addresses on the boot IP interface that is hosting the
persistent IP label. This can be useful in environments with VPN and firewall configuration,
where only one interface is granted external connectivity. The persistent label will be the
source address.
For more information and examples of using service distribution policies, see 12.2,
“Distribution preference for service IP aliases” on page 481.
We suggest that you make the following entry in the /etc/netsvc.conf file to assure that the
/etc/hosts file is read before a DNS lookup is attempted:
hosts = local, bind4
/usr/es/sbin/cluster/netmon.cf
If a virtualized network environment, such as provided by VIOS, is being used for one or more
interfaces, PowerHA can have difficulty accurately determining a particular adapter failure.
For these situations, the netmon.cf file is important to use. For more information see 12.5,
“Understanding the netmon.cf file” on page 495.
The first worksheet (Table 3-6) shows the specifications for the Ethernet network used in our
example.
Node01: hdisk2
Node02: hdisk2
After the networks are recorded, document the interfaces and IP addresses that are used by
PowerHA, as shown in Table 3-8 on page 106.
Comments: Each node contains two base adapters, each in its own subnet. Each node also contains a persistent (node-bound) address and a service address. IPAT through aliases is used.
– Volume group major numbers are unique. Though only required for NFS, it is a good common practice to have them match across nodes.
– Determine whether mirroring of data is required.
When planning for a repository disk in a multi-site cluster solution, understand these cluster types:
Stretched cluster
Requires and shares only one repository disk. When implementing the cluster
configuration with multiple storage subsystems in different sites, consider allocating the CAA repository and the backup repositories on different storage subsystems across the sites to increase the availability of the repository disk in case of a storage failure. As an example, when using a
cross-site LVM mirroring configuration with a storage subsystem in each site, you can
allocate the primary disk repository in site 1 and the backup repository on the storage in
site 2.
Linked clusters
Requires a repository disk to be allocated to each site. If there is no other storage at a site,
plan to allocate the backup repository disk on a different set of disks (other arrays) within
the same storage for increasing the repository disk availability in case of disk failures.
Update your administrative procedures and documentation with the backup disk information. You
can also replace a working repository disk with a new one to increase the size or to
change to a different storage subsystem. To replace a repository disk, you can use the
SMIT interface or clmgr command line. The cluster ahaFS event REP_UP occurs upon
replacement.
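As a rough sketch (the disk names are hypothetical, and the exact clmgr syntax should be confirmed with clmgr -h or the manual page for your level), adding a backup repository and replacing the active repository look similar to the following:
# Add hdisk3 as a backup repository disk
clmgr add repository hdisk3
# Replace the active repository disk with hdisk4
clmgr replace repository hdisk4
# Display the repository disks that are known to the cluster
clmgr query repository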
Varied on: All shared disks must be “zoned” to any cluster nodes requiring access to the
specific volumes. That is, the shared disks must be able to be varied on and accessed by
any node that has to run a specific application.
Be sure to verify that shared volume groups can be manually varied on each node.
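A minimal manual check (using the app1vg volume group from our example) is to activate and release the volume group on each node in turn, while it is varied off everywhere else:
# On Node01: activate, inspect, and release the shared volume group
varyonvg app1vg
lsvg -l app1vg
varyoffvg app1vg
# Repeat the same sequence on Node02 to confirm that it can also access app1vg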
In a PowerHA cluster, shared disks are connected to more than one cluster node. In a
non-concurrent configuration, only one node at a time owns the disks. If the owner node fails
to restore service to clients, another cluster node in the resource group node list acquires
ownership of the shared disks and restarts applications.
When working with a shared volume group, be sure not to perform any of the following actions:
Do not use any internal disk in a shared volume group because it will not be accessible by
other nodes.
Do not auto-varyon the shared volume groups in a PowerHA cluster at system boot.
Ensure that the automatic varyon attribute in the AIX ODM is set to No for shared volume groups that are part of resource groups. You can use the cluster verification utility to auto-change this attribute, or check it manually as shown in the sketch after this list.
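The following is a minimal sketch of checking and disabling the automatic varyon attribute for the example app1vg volume group:
# Check the AUTO ON setting and disable automatic varyon at boot
lsvg app1vg | grep -i "auto on"
chvg -a n app1vg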
Important: If you define a volume group to PowerHA, do not manage it manually on any
node outside of PowerHA while PowerHA is running. This can lead to unpredictable
results. Always use C-SPOC to maintain the shared volume groups.
PowerHA requires all shared volume groups be a type enhanced concurrent regardless of
how it is to be used. For typical non-concurrent configurations this enables the fast disk
takeover feature. When the volume group is activated in enhanced concurrent mode, the LVM
allows access to the volume group on all nodes. However, it restricts the higher-level
connections, such as JFS mounts and NFS mounts, on all nodes, and allows them only on
the node that currently owns the volume group.
Note: Although you must define enhanced concurrent mode volume groups, this does not
necessarily mean that you will use them for concurrent access. For example, you can still
define and use these volume groups as normal shared file system access. However, you
cannot define file systems on volume groups that are intended for concurrent access.
The following operations are not allowed when a volume group is varied on in the passive
state:
Operations on file systems, such as mount
Any open or write operation on logical volumes
Synchronizing volume groups
Active mode is similar to a non-concurrent volume group being varied online with the
varyonvg command. It provides full read/write access to all logical volumes and file systems,
and it supports all LVM operations.
Passive mode is the LVM equivalent of disk fencing. Passive mode allows readability only of
the VGDA and the first 4 KB of each logical volume. It does not allow read/write access to file
systems or logical volumes. It also does not support LVM operations.
When a resource group, containing an enhanced concurrent volume group, is brought online,
the volume group is first varied on in passive mode and then it is varied on in active mode.
The active mode state applies only to the current resource group owning node. As any other
resource group member node joins the cluster, the volume group is varied on in passive
mode.
When the owning/home node fails, the fallover node changes the volume group state from
passive mode to active mode through the LVM. This change takes approximately 10 seconds
and is at the volume group level. It can take longer with multiple volume groups with multiple
disks per volume group. However, the time impact is minimal compared to the previous
method of breaking SCSI reserves.
The active and passive mode flags to the varyonvg command are not documented because
they should not be used outside a PowerHA environment. However, you can find them in the hacmp.out log.
Active mode varyon command:
varyonvg -n -c -A -O app2vg
Passive mode varyon command:
varyonvg -n -c -P app2vg
Important: Also, do not run these commands unless directed to do so from IBM support
and not without the cluster services running.
To determine if the volume group is online in active or passive mode, verify the VG PERMISSION
field from the lsvg command output, as shown in Figure 3-8 on page 111.
There are other distinguishing LVM status features that you will notice for volume groups that
are being used in a fast disk takeover configuration. For example, the volume group will show
online in concurrent mode on each active cluster member node by using the lspv command.
However, the lsvg -o command reports only the volume group online to the node that has it
varied on in active mode. An example of how passive mode status is reported is shown in
Figure 3-8 on page 111.
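For example, the mode can be checked from the command line; on the node that currently owns the resource group the permission is read/write, and on the other nodes it is passive-only (the output shown is an illustrative sketch):
# On the owning node
lsvg app2vg | grep "VG PERMISSION"
#   VG PERMISSION:  read/write
# On a passive node
lsvg app2vg | grep "VG PERMISSION"
#   VG PERMISSION:  passive-only
# Only the owning node reports the volume group with lsvg -o
lsvg -o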
When a non-concurrent style resource group is brought online, PowerHA checks one of the
volume group member disks to determine whether it is an enhanced concurrent volume
group. PowerHA determines this with the lqueryvg -p devicename -X command. A return
output of 0 (zero) indicates a regular non-concurrent volume group. A return output of 32
indicates an enhanced concurrent volume group.
In Figure 3-9, hdisk0 is a rootvg member disk that is non-concurrent. The hdisk6 instance is
an enhanced concurrent volume group member disk.
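An illustrative check, using the disk names from the text, looks like this:
# rootvg member disk: regular (non-concurrent) volume group
lqueryvg -p hdisk0 -X
# 0
# Shared disk that belongs to an enhanced concurrent volume group
lqueryvg -p hdisk6 -X
# 32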
PowerHA relies on LVM and storage mechanisms (RAID) to protect against disk failures; therefore, it is
imperative that you make the disk infrastructure highly available.
After you establish a highly available disk infrastructure, also consider the following items
when designing your shared volume groups:
All shared volume groups have unique logical volume and file system names. This
includes the jfs and jfs2 log files.
PowerHA 7.2.7 also supports JFS2 with INLINE logs and this is generally recommended
for use instead of dedicated log devices.
Major numbers for each volume group are unique within a node. Though optional, unless
using NFS, it is generally recommended to make the major numbers for each volume
group match on all nodes.
JFS2 encrypted file systems (EFS) are supported. For more information about using EFS
with PowerHA, see 8.5, “Federated security for cluster-wide security management” on
page 346.
Figure 3-10 on page 113 shows the basic components in the external storage. Notice that all
logical volumes and file system names are unique, as is the major number for each volume
group. The data is made highly available through the use of SAN disk and redundant paths to
the devices.
Enhanced concurrent volume groups: app1vg (major number 90, on vpath0) and app2vg (major number 91, on vpath1).
Document the shared volume groups and physical disks as shown in Table 3-9.
Comments: All disks are seen by both nodes. app1vg normally resides on Node01, and app2vg normally resides on Node02.
Resource group AESPArg, volume group app1vg: Major Number = 90, log = app1vglog, Logical Volume 1 = app1lv1, Filesystem 1 = /app1 (20 GB)
Resource group NMIXXrg, volume group app2vg: Major Number = 91, log = app2vglog, Logical Volume 1 = app2lv1, Filesystem 1 = /app2 (20 GB)
Comments: Create the shared volume groups using C-SPOC after ensuring that PVIDs exist on each node.
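As a sketch of the PVID preparation step (hdisk3 is a hypothetical shared disk), confirm that the same PVID is visible on both nodes before you create the volume group through C-SPOC:
# On the node where the disk was configured first, assign a PVID if it has none
chdev -l hdisk3 -a pv=yes
# On both nodes, list the disks and compare the PVID column
lspv
# If the other node does not show the PVID, rescan its devices
cfgmgr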
When planning for an application to be highly available, be sure you understand the resources
required by the application and the location of these resources in the cluster. This helps you
provide a solution that allows them to be handled correctly by PowerHA if a node fails.
You must thoroughly understand how the application behaves in a single-node and multi-node
environment. Be sure that, as part of preparing the application for PowerHA, you test the
execution of the application manually on both nodes before turning it over to PowerHA to
manage. Do not make assumptions about the application’s behavior under fallover conditions.
Note: The key prerequisite to making an application highly available is that it first must run
correctly in stand-alone mode on each node on which it can reside.
Be sure that the application runs on all required nodes properly before configuring it to be
managed by PowerHA.
When you plan for an application to be protected in a PowerHA 7.2.7 cluster, consider the
following actions:
Ensure that the application is compatible with the version of AIX used.
Ensure that the application is compatible with the shared storage solution, because this is
where its data will reside.
Have adequate system resources (CPU, memory), especially for the case when the same node will be hosting all the applications that are part of the cluster.
Ensure that the application runs successfully in a single-node environment. Debugging an
application in a cluster is more difficult than debugging it on a single server.
Lay out the application and its data so that only the data resides on shared external disks.
This arrangement not only prevents software license violations, but it also simplifies failure
recovery.
If you plan to include multitiered applications in parent/child-dependent resource groups in
your cluster, such as a database and application server, PowerHA provides a SMIT menu
where you can specify this relationship.
Write robust scripts to both start and stop the application on the cluster nodes. The startup
script must be able to recover the application from an abnormal termination. Ensure that
they run properly in a single-node environment before including them in PowerHA.
Confirm application licensing requirements. Some vendors require a unique license for
each processor that runs an application, which means that you must license-protect the
application by incorporating processor-specific information into the application when it is
installed.
As a result, even though the PowerHA software processes a node failure correctly, it might
be unable to restart the application on the fallover node because of a restriction on the
number of licenses for that application available within the cluster. To avoid this problem,
be sure that you have a license for each system unit in the cluster that might potentially
run an application.
Verify that the application uses a proprietary locking mechanism if you need concurrent
access.
Tip: When you plan the application, remember that if the application requires any manual intervention, it is not suitable for a PowerHA cluster.
After you create an application controller, associate it with a resource group (RG). PowerHA
then uses this information to control the application. See 2.4.4, “Application controller scripts”
on page 50 for more details.
When defining your custom monitoring method, consider the following points:
You can configure multiple application monitors, each with unique names, and associate
them with one or more application servers.
The monitor method must be an executable program, such as a shell script, that tests the
application and exits, returning an integer value that indicates the application’s status. The
return value must be zero if the application is healthy, and must be a non-zero value if the
application failed.
PowerHA does not pass arguments to the monitor method.
The monitoring method logs messages to the following monitor log file:
/var/hacmp/log/clappmond.application_name.resource_group_name.monitor.log
Also, by default, each time the application runs, the monitor log file is overwritten.
Do not over complicate the method. The monitor method is terminated if it does not return
within the specified polling interval.
Important: Because the monitoring process is time-sensitive, always test your monitor
method under different workloads to arrive at the best polling interval value.
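The following is a minimal custom monitor sketch; the app1_server process name is hypothetical. It returns 0 when the application looks healthy and 1 otherwise, which is all that PowerHA requires from the method:
#!/bin/ksh
# Custom application monitor for APP1 (illustrative only)
# Return 0 if the application process is running, non-zero otherwise
if ps -ef | grep "app1_server" | grep -v grep > /dev/null 2>&1
then
    exit 0
else
    exit 1
fi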
For more information, see 7.7.10, “Measuring application availability” on page 335.
Update the application worksheet to include all required information, as shown in Table 3-11.
APP1 verification commands and procedures: Run the following command and ensure that APP1 is active. If not, send notification.
APP2 verification commands and procedures: Run the following command and ensure that APP2 is active. If not, send notification.
Update the application monitoring worksheet to include all the information required for the
application monitoring tools (Table 3-12).
APP1: Instance Count = 1, Stabilization Interval = 30, Restart Count = 3, Restart Interval = 95
APP2: Instance Count = 1, Stabilization Interval = 30, Restart Count = 3, Restart Interval = 95
The following rules and restrictions apply to resources and resource groups:
To be made highly available by PowerHA, a cluster resource must be part of a resource group. If you want a resource to be kept separate, you can define a group for that resource alone. A resource group can have one or more resources defined.
A resource cannot be included in more than one resource group.
Put the application server and its required resources in the same resource group (unless
otherwise needed).
If you include a node in participating node lists for more than one resource group, make
sure the node can sustain all resource groups simultaneously.
After you decide what components to group into a resource group, plan the behavior of the
resource group.
Table 3-13 summarizes the basic startup, fallover, and fallback behaviors for resource groups
in PowerHA.
Startup: Online on home node only (OHNO) for the resource group
  Fallover: Fallover to next priority node in the list; Fallover using Dynamic Node Priority
  Fallback: Never fall back; Fall back to higher priority node in the list
Startup: Online using node distribution policy
  Fallover: Fallover to next priority node in the list; Fallover using Dynamic Node Priority
  Fallback: Never fall back
Startup: Online on first available node (OFAN)
  Fallover: Fallover to next priority node in the list; Fallover using Dynamic Node Priority; Bring offline (on error node only)
  Fallback: Never fall back; Fall back to higher priority node in the list
Startup: Online on all available nodes
  Fallover: Bring offline (on error node only)
  Fallback: Never fall back
If the node that is starting is a home node for this resource group, the settling time period is
skipped and PowerHA immediately attempts to acquire the resource group on this node. More
details and this feature including how to modify it can be found at 10.2, “Settling time attribute”
on page 426.
Note: This is a cluster-wide setting and will be set for all OFAN resource groups.
If you decide to define dynamic node priority policies using RMC resource variables to
determine the fallover node for a resource group, consider the following points about dynamic
node priority policy:
It is most useful in a cluster where all nodes have equal processing power and memory.
It is irrelevant for clusters of fewer than three nodes.
It is irrelevant for concurrent resource groups.
Remember that selecting a takeover node also depends on conditions such as the availability
of a network interface on that node. For more information about configuring DNP with
PowerHA, see 10.5, “Dynamic node priority (DNP)” on page 436.
With a delayed fallback timer, a resource group that currently resides on a non-home node falls back to the higher priority node at the specified time. More details about this feature are in 10.6, “Delayed fallback timer” on page 446.
By default, all resource groups are processed in parallel. However, PowerHA processes dependent resource groups according to the order dictated by the dependency, and not necessarily in
parallel. Resource group dependencies are honored cluster-wide and override any
customization for serial order of processing of any resource groups included in the
dependency. Dependencies between resource groups offer a predictable and reliable way of
building clusters with multi-tiered applications.
For more information about resource group dependences, see 10.7, “Resource group
dependencies” on page 449.
Startup Policy (both resource groups): Online on Home Node Only (OHNO)
Fallover Policy (both resource groups): Fallover to Next Priority Node in List (FONP)
Fallback Policy (both resource groups): Fallback to Higher Priority Node (FBHP)
Settling Time
Runtime Policies
Tape Resources
Miscellaneous Data
Table 3-15 outlines a sample test plan that can be used to test our cluster.
The Cluster Test Tool uses the PowerHA Cluster Communications daemon to communicate
between cluster nodes to protect the security of your PowerHA cluster.
Full details about using the Cluster Test Tool, and details about the tests it can run, can be
found in 6.8, “Cluster Test Tool” on page 222.
If this is a new installation, allow time to configure and test the basic cluster. After the cluster
is configured and tested, you can integrate the required applications during a scheduled
maintenance window.
Referring back to Figure 3-1 on page 75, you can see that there is a preparation step before
installing PowerHA. This step is intended to ensure the infrastructure is ready for PowerHA.
This typically involves using your planning worksheets and cluster diagram to prepare the
nodes for installation. Ensure that these items are in place:
The node software and operating system prerequisites are installed.
The network connectivity is properly configured.
The shared disks are properly configured.
The chosen applications are able to run on either node.
The preparation step can take some time, depending on the complexity of your environment
and the number of resource groups and nodes to be used. Take your time preparing the
environment because there is no purpose in trying to install PowerHA in an environment that
is not ready; you will spend your time troubleshooting a poor installation. Remember that a
well configured cluster is built upon solid infrastructure.
After the cluster planning is complete and environment is prepared, the nodes are ready for
PowerHA to be installed.
The installation of PowerHA 7.2.7 code is straightforward. If you use the installation CD, use
SMIT to install the required file sets. If you use a software repository, you can use NFS to
mount the directory and use SMIT to install from this directory. You can also install through
NIM.
Ensure you have licenses for any features you install, such as PowerHA 7.2.7 Enterprise
Edition.
After you install the required file sets on all cluster nodes, use the previously completed
planning worksheets to configure your cluster. Here you have a few tools available to use to
configure the cluster:
The PowerHA SMUI.
The ASCII SMIT menus.
The clmgr command line.
Note: When you configure the cluster, be sure to start by configuring the cluster topology.
This consists of the nodes, repository disk, and heartbeat type. After the cluster topology is
configured, verify and synchronize the cluster. This will create the CAA cluster.
After the topology is successfully verified and synchronized, start the cluster services and
verify that all is running as expected. This will allow you to identify any networking issues
before moving forward to configuring the cluster resources.
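As a hedged illustration of this topology-first flow with clmgr (the cluster name is hypothetical, the node and repository names follow our example, and the exact syntax should be confirmed with clmgr -h), the sequence is roughly:
# Define the cluster topology: cluster name, nodes, and repository disk
clmgr add cluster itso_cluster NODES=Node01,Node02 REPOSITORIES=hdisk2
# Verify and synchronize the topology; this also creates the CAA cluster
clmgr sync cluster
# Start cluster services on all nodes and check the result
clmgr start cluster
clmgr query cluster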
After you configure, verify, and synchronize the cluster, run the automated Cluster Test Tool to
validate cluster functionality. Review the results of the test tool and if it was successful, run
any custom tests you want to perform further verification.
After successful testing, take a mksysb of each node and a cluster snapshot from one of the
cluster nodes. The cluster will be ready for production. Standard change and problem
management processes now apply to maintain application availability.
The cluster snapshot does not save any user-customized scripts, applications, or other non-PowerHA
PowerHA configuration parameters. For example, the names of application servers and the
locations of their start and stop scripts are stored in the HACMPserver Configuration
Database object class. However, the scripts themselves and also any applications they might
call are not saved.
The cluster snapshot utility stores the data it saves in two separate files:
ODM data file (.odm):
This file contains all the data stored in the HACMP Configuration Database object classes
for the cluster. This file is given a user-defined basename with the .odm file extension.
Because the Configuration Database information is largely the same on every cluster
node, the cluster snapshot saves the values from only one node.
Cluster state information file (.info):
This file contains the output from standard AIX and PowerHA commands. This file is given
the same user-defined base name with the .info file extension. By default, this file no
longer contains cluster log information. Note that you can specify in SMIT that PowerHA
collect cluster logs in this file when cluster snapshot is created.
For a complete backup, take a mksysb of each cluster node according to your standard
practices. Pick one node to perform a cluster snapshot and save the snapshot to a safe
location for disaster recovery purposes. It is recommended to create the snapshot before
taking the mksysb of the node so that it is included in the system backup.
Important: You can take a snapshot from any node in the cluster, even if PowerHA is
down. However, you can apply a snapshot to a cluster only if all nodes are running the
same version of PowerHA and all are available. Details on creating a snapshot can be
found at:
https://www.ibm.com/docs/en/powerha-aix/7.2?topic=configurations-creating-snapshot-cluster-configuration
Though not related to PowerHA configuration data specifically, PowerHA 7.2.3 and later offer
the capability to perform cloud backups to the IBM Cloud and AWS. More details about this
capability can be found at:
https://www.ibm.com/docs/en/powerha-aix/7.2?topic=data-planning-backing-up
We suggest that you maintain an accurate cluster diagram that can be used for change and
problem management. In addition, PowerHA provides updates to the clmgr command that
enable creating an HTML-based report of the cluster configuration.
The output can be generated for the whole cluster configuration or limited to special
configuration items such as these:
nodeinfo
rginfo
lvinfo
fsinfo
vginfo
dependencies
Figure 3-12 shows the generated report. The report is far longer than depicted. On a real
report, you can scroll through the report page for further details.
Tip: For a full list of available options, use the clmgr built-in help:
clmgr view report -h
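As an illustration only (the exact report names and options can vary by PowerHA level, so
confirm them with clmgr view report -h), generating an HTML report might look like this:

# Full cluster configuration report in HTML format
clmgr view report cluster TYPE=html FILE=/tmp/cluster_report.html
# Report limited to resource group information only
clmgr view report rginfo TYPE=html FILE=/tmp/rginfo_report.html

The resulting file can be copied off the cluster and kept with your cluster documentation.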
Effective change and problem management processes are imperative to maintaining cluster
availability. To be effective, you must have a current cluster configuration handy. You can use
the clmgr HTML tool to create an HTML version of the configuration and, as we also suggest,
a current cluster diagram.
Any changes to the cluster should be fully investigated as to their effect on the cluster
functionality. Even changes that do not directly affect PowerHA, such as the addition of extra
non PowerHA workload, can affect the cluster. The changes should be planned, scheduled,
documented, and then tested on a test cluster before ever implementing in production.
To ease your implementation of changes to the cluster, PowerHA provides the Cluster Single
Point of Control (C-SPOC) SMIT menus. Whenever possible, use the C-SPOC menus to
make changes. With C-SPOC, you can make changes from one node and the change will be
propagated to the other cluster nodes.
Problems with the cluster should be quickly investigated and corrected. Because the primary
job of PowerHA is to mask any errors from applications, it is quite possible that unless you
have monitoring tools in place, you might be unaware of a fallover. Ensure that you make use
of error notification to notify the appropriate staff of failures.
Note: Besides the cluster configuration, the planning phase should also provide a
cluster testing plan. Use this testing plan in the final implementation phase, and also
during periodic cluster validations.
– Configure an additional 1 GB shared LUN between all cluster nodes for the cluster
repository disk.
6. Installing and configuring application software.
The application software must be configured and tested to run as a stand-alone system.
Also perform a manual movement and testing of the application on all nodes designated
for the application in the HA cluster, as follows:
a. Create and test the application start and stop scripts (before integrating them into
PowerHA). Make sure the application is able to recover from unexpected failures, and
that the start and stop scripts function as expected on all nodes designated for
running this application. Also check the time for the start and stop execution, for the
event time-out tunables.
b. Create and test the application monitoring scripts (if wanted) on all nodes designated
to run the application. This can be a simple script that uses ps -ef to check whether
the application process is still running (see the sketch after this list).
7. Installing the PowerHA software.
Installing PowerHA can be performed using SMIT, installp, or NIM. A reboot is no longer
required by PowerHA. However, it might be required by RSCT prerequisites. Example 4-1
shows a list of installed PowerHA filesets.
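For illustration of the monitoring script mentioned in step 6b, a minimal example might look
like the following sketch. The process name, script path, and exit handling are hypothetical;
a real monitor should check the application by a means appropriate to it.

#!/bin/ksh
# Hypothetical sample monitor script (for example, /usr/local/ha/appmon.sh).
# Exit 0 if the application process is found, non-zero otherwise; PowerHA
# treats a non-zero exit code from a custom monitor method as a failure.
APP="myappd"        # hypothetical application process name
if ps -ef | grep -w "$APP" | grep -v grep > /dev/null 2>&1
then
    exit 0          # application process is running
else
    exit 1          # application process is not running
fi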
Note: A cluster test tool is included (see 6.8, “Cluster Test Tool” on page 222) to
simulate cluster component failures, but we suggest that you also perform manual testing of
the cluster.
Before you decide which approach to use, make sure that you have done the necessary
planning, and that the documentation for your cluster is available for use. See Chapter 3,
“Planning” on page 73 for more information.
The clmgr command line utility can be used for configuring a cluster. The commands for
creating a basic cluster are shared throughout this chapter. However for more details about
using clmgr see the PowerHA SystemMirror for AIX guide.
In the following scenario we configure a typical two-node hot standby cluster using the
standard method within SMIT.
The following prerequisites, assumptions, and defaults apply for the Standard Configuration
Path:
PowerHA software must be installed on all nodes of the cluster. See Example 4-1 on
page 133.
All network interfaces must be completely configured at the operating system level. You
must be able to communicate from one node to each of the other nodes and vice versa.
Check cluster node communication on both nodes as follows:
#clrsh -n <node1_name> date
#clrsh -n <node2_name> date
All boot and service IP addresses must be configured in /etc/hosts.
When you use the standard configuration path and information that is required for
configuration resides on remote nodes, PowerHA automatically discovers the necessary
cluster information for you. Cluster discovery runs automatically on all nodes, not just the
local node, while you use the standard configuration path.
PowerHA assumes that all network interfaces on a physical network belong to the same
PowerHA network. If any interface is not desired to be part of the cluster it must be listed in
the /etc/cluster/ifrestrict file, or added to a private network.
The host name must resolve to an interface and by default will be the same as the cluster
node names.
One free shared disk, of at least 1 GB, to be specifically used for cluster repository disk.
CAA services checks to be performed on all nodes as follows:
# egrep "caa|clusterconf" /etc/services /etc/inetd.conf /etc/inittab
/etc/services:clcomd_caa 16191/tcp
/etc/services:caa_cfg 6181/tcp
/etc/inetd.conf:caa_cfg stream tcp6 nowait root /usr/sbin/clusterconf
clusterconf >>/var/adm/ras/clusterconf.log 2>&1
/etc/inittab:clusterconf:23456789:once:/usr/sbin/clusterconf
To check the PVIDs, volume group state, UUIDs, and UDIDs of the physical volumes,
execute lspv -u.
Verify that all shared disks, including the repository, have the ODM reservation policy set to
no_reserve on all nodes, as shown in Example 4-2 on page 137. If any of them show
anything else, change the attribute where needed by using this command:
chdev -l hdisk# -a reserve_policy=no_reserve -P
Note that the -P is needed only if the disk is busy in an active volume group. It sets the value
to become active on the next reboot.
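For example, you can verify the current setting of each candidate shared disk with lsattr
before deciding whether chdev is needed (the disk names below are examples):

# Check the current reservation policy of each candidate shared disk
for d in hdisk6 hdisk7
do
    echo "$d: $(lsattr -El $d -a reserve_policy | awk '{print $2}')"
done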
Using the options under the Custom Configuration menu you can add the basic components
of a cluster to the PowerHA configuration database, and also other types of behaviors and
resources. Use the custom configuration path to customize the cluster components such as
policies and options that are not included in the standard menu.
Use the custom configuration path if you plan to use any of the following options:
Custom cluster initial setup
Custom cluster Tunables and Heartbeat
Custom Disk Methods
Custom Volume Group Methods
Custom File System Methods
Customize Resource Recovery
Customize Inter-Site Resource Group Recovery
Create User Defined Events
Modify Pre/Post Event commands
Remote Notification Warnings
Change Warning Notification time
Change System Events (rootvg)
Advanced method of Cluster Verification and Synchronization
Before configuring the cluster, ensure that /etc/cluster/rhosts is populated with the host
names or IP addresses of each node in the cluster, on every node in the cluster. Also, the
clcomd daemon must be running and refreshed on each node, as shown in Example 4-3 on page 138.
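For example, with the node names used in this chapter, the file and the daemon refresh on
each node would look similar to this:

# /etc/cluster/rhosts: one host name or IP address per line, on every node
cat /etc/cluster/rhosts
hacmp59
hacmp60

# Refresh the cluster communications daemon after editing the file
refresh -s clcomd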
When you use the standard configuration path, the node name and system host name are
expected to be the same. If you want them to differ, change them manually.
After you select the options and press Enter, the discovery process runs. This discovery
process automatically configures the networks so you do not have to do it manually.
[Entry Fields]
* Cluster Name [r-testcluster]
New Nodes (via selected communication paths) [hacmp60] +
Currently Configured Node(s) hacmp59
Figure 4-1 Add cluster and nodes
To create the base cluster with a repository disk from the clmgr command line, execute:
clmgr add cluster <clustername> repository=hdiskX
nodes=<node1_hostname>,<node2_hostname> HEARTBEAT_TYPE={unicast|multicast}
TYPE={NSC|SC|LC}
For the TYPE field, if using SC (Stretched Cluster) or LC (Linked Cluster) then sites must also
be defined. If using LC then a repository disk at each site must also be defined.
[Entry Fields]
* Cluster Name r-testcluster
* Heartbeat Mechanism Unicast +
* Repository Disk [(00f87c4bf90dad3b)] +
Cluster Multicast Address []
(Used only for multicast heartbeat)
Figure 4-2 Add repository and heartbeat method
In our example, we use unicast instead of multicast. For the repository disk, we highlight the
field and press F4 to see a list of all free shared disks between the two nodes. The disk size
must be at least 1 GB, however PowerHA discovery does not verify that the size is adequate.
We run smitty sysmirror, select Cluster Nodes and Networks → Verify and Synchronize
Cluster Configuration, and press Enter three times for synchronization to begin. This can
take several minutes while it creates the CAA cluster. Alternatively, you can use the
clmgr sync cluster command.
NODE hacmp59:
Network net_ether_01
hacmp59 10.1.1.59
Network net_ether_02
hacmp59_hb 192.168.100.59
NODE hacmp60:
Network net_ether_01
hacmp60 10.1.1.60
Network net_ether_02
hacmp60 192.168.100.60
Same PVID: Historically, assigning the same PVID has been a generally recommended practice.
However, many versions of PowerHA, both in C-SPOC and even during initial cluster creation
when choosing a repository disk, scan all disks without a PVID, match them up by UUID, and
then automatically assign PVIDs to them.
To create a service IP label, run smitty sysmirror, select Cluster Applications and
Resources → Resources → Configure Service IP Labels/Addresses → Add a
Service IP Label/Address, and then press Enter. Then choose the network from the list. The
final SMIT menu is displayed, as shown in Figure 4-3.
[Entry Fields]
* IP Label/Address Austinserv +
Netmask(IPv4)/Prefix Length(IPv6) []
* Network Name net_ether_01
Figure 4-3 Add a Service IP Label
For the IP Label/Address field, press F4; a picklist is generated from the entries in the
/etc/hosts file that are not already defined to the cluster.
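The equivalent clmgr command is roughly as follows; the option names shown here are our
assumption, so verify them with clmgr add service_ip -h on your level:

clmgr add service_ip Austinserv NETWORK=net_ether_01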
[Entry Fields]
* Resource Group Name [demoRG]
* Participating Nodes (Default Node Priority) [hacmp59 hacmp60] +
Complete the fields as shown in Figure 4-4. To complete the Participating Nodes field, enter
the information, separated by a space, or select from a picklist by first highlighting the field
and then pressing the F4 key.
Important: When selecting nodes from the picklist, they are displayed, and listed in the
Participating Nodes field, in alphanumeric order. This could lead to an unintended result.
For the example in Figure 4-4, if we chose both nodes from the picklist, they would be in
the order shown. If we really wanted hacmp60 to be listed first, we would have to type it
manually in the field.
For more information about resource group policy options, see 2.4.8, “Resource groups” on
page 52.
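As a command-line alternative, a clmgr sketch for creating the same resource group follows.
The policy keywords shown (OHN, FNPN, NFB) correspond to the default startup, fallover, and
fallback policies discussed in this chapter, but confirm the exact values with
clmgr add resource_group -h:

clmgr add resource_group demoRG NODES=hacmp59,hacmp60 \
    STARTUP=OHN FALLOVER=FNPN FALLBACK=NFB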
We run smitty cspoc, select Storage → Volume Groups → Create a Volume Group,
choose both nodes, choose one or more disks from the picklist, and choose a volume group
type from the picklist. The final SMIT menu is displayed, as shown in Figure 4-5.
Notice the Resource Group Name field. This gives the option to automatically create the
resource group and put the volume group resource into the resource group.
Important: When you choose to create a new resource group from C-SPOC, the resource
group will be created with the following default policies:
Startup: Online on home node only
Fallover: Fallover to next priority node in the list
Fallback: Never fallback
Repeat this procedure for all volume groups that will be configured in the cluster.
To create a shared volume group from the clmgr command line execute:
clmgr add volume_group [ <vgname> ] \
[ NODES="<node#1>,<node#2>[,...>]" ]\
[ PHYSICAL_VOLUMES="<hdisk#1>[,<hdisk#2>,...]" ]\
[ TYPE={original|big|scalable|legacy} ] \
[ RESOURCE_GROUP=<RESOURCE_GROUP> ] \
[ PPART_SIZE={4|1|2|8|16|32|64|128|256|512|1024} ] \
[ MAJOR_NUMBER=## ] \
[ MAX_PHYSICAL_PARTITIONS={32|64|128|256|512|768|1024} ] \
[ MAX_LOGICAL_VOLUMES={256|512|1024|2048} ] \
[ CRITICAL={false|true} ] \
[ ENABLE_LV_ENCRYPTION={yes|no} ]
Example 4-5 C-SPOC creating new logical volume disk pick list
+--------------------------------------------------------------------------+
| Select the Physical Volumes to hold the new Logical Volume |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| # Reference node Physical Volume Name |
| hacmp59 hdisk6 |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
F1| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
The new logical volume, karimlv, is created and information is propagated on the other
cluster nodes. Repeat this step as needed for each logical volume.
LOGICAL_PARTITIONS=## \
[ DISKS="<hdisk#1>[,<hdisk#2>,...]" ] \
[ TYPE={jfs|jfs2|sysdump|paging|jfslog|jfs2log|aio_cache|boot} ] \
[ POSITION={outer_middle|outer_edge|center|inner_middle|inner_edge } ] \
[ ENABLE_LV_ENCRYPTION={yes|no} ] \
[ AUTH_METHOD={keyserv|pks} ] \
[ METHOD_DETAILS=<key server ID> ] \
[ AUTH_METHOD_NAME=<Alias name for auth method>
Tip: If intending to use inline logs this step can be skipped. That option is specified at the
time of file system creation.
Important: If logical volume type jfs2log is created, C-SPOC automatically runs the logform
command so that the type can be used.
Example 4-8 C-SPOC creating jfs2 file system on an existing logical volume
Add an Enhanced Journaled File System on a Previously Defined Logical Volume
Important: File systems are not allowed on volume groups that are a resource in an
“Online on All Available Nodes” type resource group.
The /demofs file system is now created. The contents of /etc/filesystems on both nodes are
now updated with the correct jfs2log. If the resource group and volume group are online the
file system is mounted automatically after creation.
Tip: With JFS2, you also have the option to use inline logs that can be configured from the
options in the previous example.
Make sure the mount point name is unique across the cluster. Repeat this procedure as
needed for each file system.
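For reference, a clmgr sketch for creating the same file system on the existing logical volume
might look like the following; the option names are our assumption, so confirm them with
clmgr add file_system -h on your level:

clmgr add file_system /demofs TYPE=enhanced LOGICAL_VOLUME=karimlv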
To add an application controller, run smitty sysmirror, select Cluster Applications and
Resources → Resources → Configure User Applications (Scripts and Monitors) →
Application Controller Scripts → Add Application Controller Scripts, and then press
Enter.
[Entry Fields]
* Application Controller Name [bannerapp]
* Start Script [/usr/bin/banner start]
* Stop Script [/usr/bin/banner stop]
Application Monitor Name(s) +
Application startup mode [background] +
Figure 4-6 Create application controller
In our case, we do not have a real application so we use the banner command instead.
Repeat as needed for each application. To create the application controller from the
command line execute:
clmgr add application_controller bannerapp startscript="/usr/bin/banner start"
stopscript="/usr/bin/banner stop"•
Choose the resource group from the pop-up list. The final SMIT menu opens and is shown in
Figure 4-7 on page 147.
You can press F4 on each of the resource types and choose from the generated picklist of
previously created resources. This ensures that they indeed exist and minimizes the chance
of errors, such as a typographical error.
Run smitty sysmirror, select Custom Cluster Configuration → Verify and Synchronize
Cluster Configuration (Advanced), and then press Enter. The menu of options are listed as
shown in Figure 4-8 on page 148.
Although most options are self-explanatory, one needs further explanation: Automatically
correct errors found during verification. This option is useful and is available only from this
advanced path. It can correct certain problems automatically, or you can have it run
interactively, in which case it prompts for approval before correcting.
For more information about this option, see 7.6.6, “Running automatic corrective actions
during verification” on page 302.
[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Include custom verification library checks [Yes] +
* Automatically correct errors found during [Interactively] +
verification?
To perform the equivalent operation from the command line, including verification and
automatic correction of errors, use the command clmgr sync cluster FIX=yes
VERIFY=yes. After successful synchronization, you can start testing the cluster. For more
information about cluster testing, see 6.8, “Cluster Test Tool” on page 222.
The SMUI server does not need to be a cluster node member, and often it is a stand-alone AIX
virtual server. It does not have to be dedicated as a SMUI server. For example, in our
environment it is also running on the same system as our NIM server.
Browser Support
PowerHA SystemMirror GUI is supported in the following web browsers:
Google Chrome Version 57, or later
Firefox Version 54, or later
AIX requirements
For the cluster nodes, where the cluster.es.smui.agent and cluster.es.smui.common filesets
are installed, one of the following versions of the AIX operating system is required:
AIX Version 7.1 Service Pack 6, or later
AIX Version 7.2 Service Pack 1, or later
AIX 7.3 Service Pack 1, or later
SMUI Filesets
The SMUI filesets are within the PowerHA installation images. The filesets and their details
are as follows:
cluster.es.smui.agent: The SMUI agent fileset should be installed on all nodes to be
managed by the PowerHA SystemMirror GUI.
cluster.es.smui.common: As the name implies, it is common to both and should be
installed on both the cluster nodes and the SMUI server.
cluster.es.smui.server: This fileset only needs to be installed on the designated SMUI
server system.
If the SMUI filesets are already installed on all desired cluster nodes, skip this section and
install the SMUI server. If not, go to the PowerHA installation media path and run the following:
1. smitty install_all
2. Enter the full path of the PowerHA installation images.
3. Press F4 to get the list of filesets.
4. Find cluster.es.smui, highlight it, and press F7, as shown in Figure 4-9.
5. Press Enter twice to finish the selection and start the installation.
After the installation is completed, you can verify the filesets are installed as shown in
Example 4-10. Unlike the clients, there is additional work required on the server. You must
execute the smuiinst.ksh script. By default, this script requires internet access to download
additional required packages. However it is not uncommon for this server to not have internet
access so we will provide the steps of how to perform an offline install in “Offline installation”
on page 152.
Additional packages are required and are shown in Example 4-11 on page 151. The levels
ultimately installed will vary based on the exact PowerHA and AIX level. For example, at time
of writing the script was not downloading AIX v7.3 specific rpms and we had to download
some of them manually.
Online installation
If the designated SMUI server does have internet access, then the smuiinst.ksh script can be
executed locally. It automatically downloads and installs the required packages, starts the
SMUI server (phauiserver), and provides the URL information needed to access the SMUI,
similar to the output shown later in this section.
After SMUI is installed and you access the URL provided, a login window is displayed as
shown in Figure 4-11. The initial login credentials are root and its password. After that,
additional users can be configured if desired.
Offline installation
To perform an offline install using the smuiinst.ksh script execute the following:
1. Copy the /usr/es/sbin/cluster/ui/server/bin/smuiinst.ksh file from the PowerHA
SystemMirror GUI server to a system that is running the same operating system and has
internet access.
2. From the system that has internet access run the smuiinst.ksh -d /directory command
where /directory is the location where you want to download the files listed in
Example 4-11 on page 151.
3. Copy the downloaded files from /directory to a directory on the PowerHA SystemMirror
GUI server.
In our example scenario it is /home/sbodily/smui727.
4. From the SMUI server, run the smuiinst.ksh -i /directory command where /directory is
the location where you copied the downloaded files.
Demo: A demo of performing an offline installation of the PowerHA SMUI is available at:
https://youtu.be/cPclpxyzNG4
During the smuiinst.ksh execution, the rpms are installed and the SMUI server service is
started. It also displays a URL for the PowerHA SystemMirror GUI server similar to what is
shown in Example .
https://sbodilysmui.labsys.com:8080/#/login
After you log in, you can add existing clusters in your environment to the
PowerHA SystemMirror GUI.
Chapter 5. Migration
This chapter covers the most common migration scenarios to PowerHA 7.2.7.
Before the migration, always have a backout plan in case any problems are encountered.
Some general suggestions are as follows:
Create a backup of rootvg.
In most cases of upgrading PowerHA, updating or upgrading AIX is also required. So
always save your existing rootvg. Our preferred method is to create a clone by using
alt_disk_copy to another free disk on the system. That way, a simple change to the
bootlist and a reboot can easily return the system to the beginning state.
Other options are available, such as mksysb, alt_disk_install, and multibos.
Save the existing cluster configuration.
Create a cluster snapshot before the migration. By default, it is stored in the following
directory; make a copy of it and also save a copy off of the cluster nodes for additional
insurance (a command sketch follows this list).
/usr/es/sbin/cluster/snapshots
Save any user-provided scripts.
This most commonly refers to custom events, pre- and post-events, application controller,
and application monitoring scripts.
Verify, by using the lslpp -h cluster.* command, that the current version of PowerHA is in
the COMMIT state and not in the APPLY state. If not, run smit install_commit before you
install the most recent software version.
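As a sketch of the snapshot suggestion in the list above, the snapshot can be created and
copied off the cluster as follows. The snapshot name, description, and backup host are
examples, and the exact clmgr options can vary by level (clmgr add snapshot -h lists them):

# Create a pre-migration snapshot from any one node
clmgr add snapshot pre727_migration DESCRIPTION="Before PowerHA 7.2.7 migration"

# Copy the snapshot files off the cluster for safekeeping
scp /usr/es/sbin/cluster/snapshots/pre727_migration.* backupserver:/backups/powerha/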
Software requirements
The AIX requirement is one of the following as the bare minimum:
AIX 7.1 TL05 SP10
AIX 7.2 TL01 SP6
AIX 7.2 TL02 SP6
AIX 7.2 TL03 SP7
AIX 7.2 TL04 SP6
AIX 7.2 TL05 SP5
AIX 7.3 TL00 SP2
AIX 7.3 TL01 SP1
Important: Additional APARs are also recommended but the list does change on
occasion. Details can be found at:
https://www.ibm.com/docs/en/powerha-aix/7.2?topic=powerha-systemmirror-aix-code-level-reference-table
Hardware requirements
Use IBM systems that run IBM POWER8®, IBM POWER9, or IBM POWER10
technology-based processors.
Note: The TME attribute is not supported on 16 Gb or faster Fibre Channel adapters. For
the most current list of supported Fibre Channel adapters, contact your IBM
representative.
Note: In our experience, even a three-node cluster does not use more than 500 MB; it
uses 448 MB of the repository disk. However, we suggest simply using a 1 GB disk for
clusters of up to four nodes, and adding roughly 256 MB for each additional node.
Multicast or unicast
PowerHA v7.2.x offers the choice of using either multicast or unicast for heartbeating.
However, if you want to use multicast, ensure that your network both supports and has
multicasting enabled. For more information, see 12.1, “Multicast considerations” on page 478.
Path: /etc/objrepos
cluster.es.assist.db2 7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for DB2
cluster.es.assist.dhcp 7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for DHCP
cluster.es.assist.dns 7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for DNS
cluster.es.assist.domino 7.2.7.0 COMMITTED PowerHA SystemMirror
SmartAssist for IBM Lotus
domino server
cluster.es.assist.filenet 7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for FileNet P8
cluster.es.assist.ihs 7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for IBM HTTP Server
cluster.es.assist.maxdb 7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for SAP MaxDB
cluster.es.assist.oraappsrv
7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for Oracle Application
Server
cluster.es.assist.oracle 7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for Oracle
cluster.es.assist.printServer
7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for Print Subsystem
cluster.es.assist.sap 7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for SAP
cluster.es.assist.tds 7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for IBM Tivoli
Directory Server
cluster.es.assist.tsmadmin
7.2.7.0 COMMITTED PowerHA SystemMirror
SmartAssist for IBM TSM Admin
center
cluster.es.assist.tsmclient
7.2.7.0 COMMITTED PowerHA SystemMirror
SmartAssist for IBM TSM Client
cluster.es.assist.tsmserver
7.2.7.0 COMMITTED PowerHA SystemMirror
SmartAssist for IBM TSM Server
cluster.es.assist.websphere
7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for WebSphere
cluster.es.assist.wmq 7.2.7.0 COMMITTED PowerHA SystemMirror Smart
Assist for MQ Series
cluster.es.client.clcomd 7.2.7.0 COMMITTED Cluster Communication
Infrastructure
cluster.es.client.lib 7.2.7.0 COMMITTED PowerHA SystemMirror Client
Libraries
cluster.es.client.rte 7.2.7.0 COMMITTED PowerHA SystemMirror Client
Runtime
cluster.es.cspoc.rte 7.2.7.0 COMMITTED CSPOC Runtime Commands
cluster.es.migcheck 7.2.7.0 COMMITTED PowerHA SystemMirror Migration
support
Path: /usr/share/lib/objrepos
cluster.man.en_US.es.data 7.2.7.0 COMMITTED Man Pages - U.S. English
10.Verify with halevel -s that the correct level is displayed as shown in Example 5-2.
Exiting cl_convert.
--------- end of log file for cl_convert: Tue Nov 1 17:22:16 CDT 2022
12.Start cluster services on node jordan (smitty clstart or clmgr start jordan).
Also, upon startup, because the cluster versions are mixed, cluster verification is skipped
and the startup information states as such, as shown in Example 5-15 on page 172.
Output of the lssrc -ls clstrmgrES command on node jordan is shown in Example 5-4.
13.On jessica, stop cluster services with the “Move Resource Groups” option (smitty
clstop). This results in the resource group becoming active on node jordan as shown in
Example 5-5.
14.If your environment requires updating or upgrading AIX, perform that step now but do not
upgrade PowerHA at this point.
15.Post reboot, verify caavg_private is active (lspv or lscluster -i).
16.Verify that clcomd is active using the command lssrc -s clcomd.
If not, activate it using the command startsrc -s clcomd.
17.Verify contents of /etc/cluster/rhosts.
Enter either cluster node host names or IP addresses – one per line.
18.If rhosts file was updated, then refresh clcomd using command refresh -s clcomd.
19.Install all PowerHA 7.2.x file sets (use smitty update_all). Be sure to accept new license
agreements.
20.Verify all cluster filesets are at the new level by lslpp -l cluster.* as shown in
Example 5-1 on page 160.
21.Verify with halevel -s the correct level is displayed as shown in Example 5-2 on
page 163.
22.Verify no errors reported in /tmp/clconvert.log as shown in Example 5-3 on page 163.
23.Start cluster services on node jessica (smitty clstart or clmgr start node jessica).
24.Verify that the cluster has completed the migration on both nodes as shown in
Example 5-6.
Note: Both nodes must show the same CLversion otherwise, the migration did not
complete successfully. Call IBM support.
26.Since node jessica was the original hosting (primary) node, it might be desirable to move
the resource group back. This can be accomplished by executing clmgr move rg bdbrg
node=jessica.
27.Verify cluster is stable and resource group is online as shown in Example 5-22 on
page 175.
Upon completing migration, if additional service packs are available and need to be installed
they can be done via 5.3.5, “Nondisruptive migration” on page 170 or 5.3.6, “Migration using
cl_ezupdate” on page 175 without any downtime required.
[Entry Fields]
* Cluster Snapshot Name [itzysnapshot] /
Custom-Defined Snapshot Methods [] +
* Cluster Snapshot Description [see name]
2. Stop cluster services on all nodes and bring resource groups offline (smitty clstop or
clmgr stop cluster).
3. Upgrade AIX (if needed). See “Software requirements” on page 156. If not using live
update then a reboot will be required after updating AIX.
4. Verify caavg_private is active (lspv or lscluster -i).
5. Verify that clcomd is active by running lssrc -s clcomd.
If not, activate it via startsrc -s clcomd
6. Verify contents of /etc/cluster/rhosts.
Enter either cluster node host names or IP addresses; only one per line.
7. If rhosts file was updated, then also refresh clcomd by running refresh -s clcomd.
8. Uninstall the current version of PowerHA by using smitty remove and specify the
cluster.* option.
9. Remove the CAA cluster (optional) by executing rmcluster -r reposhdiskname
10.Install the new PowerHA version including service packs (smitty install_all). Be sure
to accept new license agreements.
11.Verify all cluster filesets are at the new level by lslpp -l cluster.* as shown in
Example 5-1 on page 160.
12.Verify with halevel -s the correct level is displayed as shown in Example 5-2 on
page 163.
Setup:
Create temporary directory: /tmp/tmpodmdir
Original directory: /tmp/tmpsnapshotdir
Copy odm's from original to temporary directory.
Changing ODMDIR to /tmp/tmpodmdir.
********************************
*** ODM Manager version 0.2 ***
********************************
***************************
*** End of ODM Manager ***
***************************
********************************
*** ODM Manager version 0.2 ***
********************************
***************************
*** End of ODM Manager ***
***************************
********************************
*** ODM Manager version 0.2 ***
********************************
Cleanup:
Restoring original ODMDIR to /etc/objrepos.
Removing temporary directory /tmp/tmpsnapshotdir.
--------- end of log file for clconvert_snapshot: Wed Nov 2 18:31:28 CDT 2022
15.Restore the cluster configuration from the converted snapshot by running the smitty
sysmirror command and then selecting Cluster Nodes and Networks → Manage the
Cluster → Snapshot Configuration → Restore the Cluster Configuration From a
Snapshot. Choose the previously created snapshot from the picklist and verify the SMIT panel.
[Entry Fields]
Cluster Snapshot Name itzysnapshot
Cluster Snapshot Description see name
Un/Configure Cluster Resources? [Yes] +
Force apply if verify fails? [No] +
The restore process automatically synchronizes the cluster as shown in the output in
Example 5-13.
Retrieving data from available cluster nodes. This could take a few minutes. Start data
collection on node jessica
Start data collection on node jordan
Collector on node jordan completed
Collector on node jessica completed
Data collection complete
WARNING: No backup repository disk is UP and not already part of a VG for nodes:
- jessica
- jordan
..............
cldare: Configuring a 2 node cluster in AIX may take up to 2 minutes. Please wait.
1 tunable updated on cluster redbook_cluster.
Adding any necessary PowerHA SystemMirror for AIX entries to /etc/inittab and /etc/rc.net
for IP Address Takeover on node jordan.
Verification has completed normally.
16.Verify cluster version is the same on all nodes as shown in Example 5-6 on page 164.
17.Restart cluster services (smitty clstart or clmgr start cluster).
18.Verify cluster is stable and resource group still online as shown in Example 5-22 on
page 175.
19.Perform cluster validation testing.
1. For possible recovery purposes, create a cluster snapshot if you have not previously
created one, and save copies of it off of the cluster nodes. A snapshot can be created at
any time, whether or not the cluster is up, but the nodes themselves should be running.
Upon completing migration, if additional service packs are available and need to be installed
they can be done via non-disruptive or cl_ezupdate method without any downtime required.
Note: To use the nondisruptive option, the AIX levels must already be at the supported
levels that are required for the version of PowerHA you are migrating to.
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
bdbrg UNMANAGED jessica
UNMANAGED jordan
3. Install all PowerHA 7.2.x file sets (use smitty update_all). Be sure to accept new license
agreements.
4. Verify all cluster filesets are at the new level by lslpp -l cluster.* as shown in
Example 5-1 on page 160.
5. Verify with halevel -s the correct level is displayed as shown in Example 5-2 on
page 163.
6. Verify no errors reported in /tmp/clconvert.log as shown in Example 5-3 on page 163.
7. Start cluster services on node jessica (use smitty clstart or clmgr start node
jessica).
Important: If you are not using the application monitoring tool within PowerHA and you
stop and unmanage a node with a resource group containing an application controller, then
starting the cluster with the default of AUTO manage for the resource group invokes the
application start script. This can lead to undesirable results.
Ideally, the application controller should have a smart start script that checks whether the
application is running and exits accordingly. Alternatively, you can edit the script and simply
insert an “exit 0” at the top of the script. When the cluster stabilizes, you can remove this line.
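As an illustration only, the check at the top of such a smart start script might look like this
sketch (the process name and start command are hypothetical):

#!/bin/ksh
# Hypothetical smart start script: do nothing if the application is already up
APP="myappd"                                  # hypothetical process name
if ps -ef | grep -w "$APP" | grep -v grep > /dev/null 2>&1
then
    exit 0                                    # already running; nothing to do
fi
/usr/local/bin/start_myapp                    # hypothetical start command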
Also upon startup, since the cluster versions are mixed, cluster verification will be skipped
and the startup information will state as such as shown in Example 5-15.
The output of the lssrc -ls clstrmgrES and clRGinfo commands on node jessica is
shown in Example 5-16. Notice the vrmf has changed but the CLversion has NOT. That is
because all nodes in the cluster have not been upgraded yet.
# clRGinfo
-----------------------------------------------------------------------------
Important: While the cluster is in a mixed state do not make any cluster changes,
including CSPOC, or synchronize the cluster.
8. On node jordan (the second and last node to be migrated), stop cluster services (smitty
clstop) with the Unmanage Resource Groups option. This can also be executed from the
command line via clmgr stop node jordan manage=unmanage. After it completes, node
jordan is listed in the Forced down list and the resource group is unmanaged, as shown in
Example 5-17. However, because it is the standby node and is not hosting any resource
groups, it could have been simply stopped normally without the unmanage option.
9. Install all PowerHA 7.2.x file sets (smitty update_all). Be sure to accept new license
agreements.
10.Verify all cluster filesets are at the new level by lslpp -l cluster.* as shown in
Example 5-1 on page 160.
11.Verify with halevel -s the correct level is displayed as shown in Example 5-2 on
page 163.
12.Verify no errors reported in /tmp/clconvert.log as shown in Example 5-18.
Exiting cl_convert.
---------- end of log file for cl_convert: Wed Nov 2 09:15:55 CDT 2022
13.Start cluster services on secondary node, jordan (smitty clstart or clmgr start
jordan). Note upon node startup that the cluster verification is skipped as shown in
Example 5-19.
14.Check for the updated CLversion info from clstrmgrES info as shown in Example 5-20.
15.Verify that the cluster has completed migration on both nodes as shown in Example 5-21.
Note: Both nodes must show CLversion: 23, otherwise the migration has not
completed successfully. Call IBM support if necessary.
# clcmd halevel -s
-------------------------------
NODE jordan
-------------------------------
7.2.7 GA
-------------------------------
NODE jessica
-------------------------------
7.2.7 GA
16.Verify cluster is stable and resource group still online as shown in Example 5-22.
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
bdbrg ONLINE jessica
OFFLINE jordan
Capabilities of cl_ezupdate
The cl_ezupdate command can be used to perform the following tasks:
Query information about the cluster, nodes, NIM server, or service packs and interim fixes
that are located in a specified installation location. The query can be run on the entire
cluster or on a specific subset of nodes in the cluster.
Apply and reject updates for AIX service packs or interim fixes. The cl_ezupdate
command cannot be used to update the cluster to newer AIX technology levels.
Apply and reject updates for PowerHA SystemMirror service packs and technology levels,
and interim fixes located in a specified installation location. This process is performed on
the entire cluster or on a specific subset of nodes in the cluster. You can also apply
updates in preview mode. When you use preview mode, all the prerequisites for
installation process are checked, but the cluster updates are not installed on the system.
Reject AIX service packs, PowerHA service packs, and interim fixes that were already
installed on the system. This task is performed on the entire cluster or on a specific subset
of cluster nodes.
The initial and most common use case is to perform a non-disruptive upgrade (NDU) across
the cluster. Though it can be used for AIX updates, to utilize it for an NDU the AIX levels must
already be at the required levels to support the planned update or upgrade. Just like an NDU,
it performs the following:
If node is hosting a resource group, it stops cluster in “Unmanaged” state.
If node is NOT hosting a resource group it gracefully stops cluster on that node.
Performs the update (update_all).
If node was gracefully stopped, it is restarted in Manual mode.
If node was stopped Unmanaged (forced) it is restarted in Automatic mode.
Important: If you are not using the application monitoring tool within PowerHA
and you stop and unmanage a node with a resource group containing an application
controller, then starting the cluster with the default of AUTO manage for the resource group
invokes the application start script. This can lead to undesirable results.
Ideally, the application controller should have a smart start script that checks whether the
application is running and exits accordingly. Alternatively, you can edit the script and simply
insert an “exit 0” at the top of the script. When the cluster stabilizes, you can remove this line.
Pre-update checks
Similar to when updating AIX, a preview install is available. Using it is encouraged to
maximize the chance of success. During a preview, and also in an actual update,
numerous checks are performed. These include, but are not limited to, the following:
PowerHA images are supported on the AIX levels installed in the cluster.
Clcomd communications is functional.
Cluster, node, and resource group state.
NIM server communications are functional, and the NIM resource exists and is usable.
Tests NFS mounting from the NIM server.
Compares and validates installed PowerHA filesets are the same on all nodes.
Ensures no current PowerHA filesets need to be Committed or Rejected.
Performs a preview installation of the update package.
If you run the cl_ezupdate tool and if an error occurs in a node during an installation or
uninstallation process, you can use the rollback feature of the cl_ezupdate tool to return the
node to the previous state. When you use the rollback feature, you can choose to rollback
only the node that encountered the error or roll back all nodes that were updated.
The rollback process creates a copy of the rootvg volume group on each node using the
alt_disk_copy command and reboots the copy of the rootvg volume group when an error
occurs during the installation or removal of service images. For the rollback process to work,
one hdisk must be present on each node that is capable of containing a copy of the rootvg
volume group.
Limitations of cl_ezupdate
The cl_ezupdate command has the following limitations:
If you have previously installed any interim fixes, those fixes might be overwritten or
removed when you apply a new service pack. If the previously installed interim fix has
locked the fileset, you can override that lock and install the service pack by using the -F
flag.
You cannot install a new PowerHA SystemMirror technology level (TL) in the applied state.
Filesets installed as part of new TL are automatically moved into committed state. This
means that the installation image cannot be rejected. The cl_ezupdate tool cannot be
used to uninstall technology levels.
If you want to update the software by using a NIM resource, the NIM client must be
configured first and must be available to all nodes where you want to use the cl_ezupdate
tool.
The cl_ezupdate tool requires an existing PowerHA SystemMirror and Cluster Aware AIX
(CAA) cluster definition.
The Cluster Communications daemon (clcomd) must be enabled to communicate with all
nodes in the cluster. The cl_ezupdate tool attempts to verify clcomd communications
before installing any updates.
If a cluster node update operation fails, the cl_ezupdate script ends immediately and exits
with an error. To troubleshoot the issue, an administrator must restart the update operation
or undo the completed update operations.
You must place any interim fixes in the emgr/ppc directory of the NIM lpp_source resource.
The cl_ezupdate tool runs only on AIX version 7, or later.
The cl_ezupdate tool cannot be used with the AIX multibos utility.
If you are running the cl_ezupdate tool on a cluster node that is not included as an option
of the –N flag and if the –S flag specifies the file system path as an option, the cluster node
on which you are running the command is the source node for install image propagation.
This cluster node must have the file system path specified in the –S option.
Syntax of cl_ezupdate
The following are the flags and options available from the cl_ezupdate command in PowerHA
7.2.7. If using a different version consult the man page on your cluster.
-A Applies the updates that are available in the location that is specified
by the -S flag.
-C Commits software updates to the latest installed version of PowerHA
SystemMirror or the AIX operating system.
-F Forces installation of the service pack. If an interim fix has locked a
fileset and if the updates are halted from installation, this flag removes
the lock and installs the service pack. Note: This flag must always be
used with the -A flag.
-H Displays the help information for the cl_ezupdate command.
-I Specifies an interactive mode. If you specify the value as yes, you
must specify whether the rollback feature must continue to run when
an error is shown. The interactive mode is active by default. If you
specify the value as no, the interactive mode is turned off and you are
not prompted before you start the rollback operation.
-N Specifies the node names where you want to install updates. If you
specify multiple node names, you must separate each node name with
a comma. By default, updates are installed on all nodes in a cluster. If
the -U or -u flag is specified to enable the rollback feature, the -N flag
specifies a <node name>:hdisk pair. If a node has multiple hdisks for
rootvg volume group, multiple N arguments are required to map the
node to each of the hdisks. As an example:
-N node1:hdisk1 -N node1:hdisk2 -N node1:hdisk3 -N node2:hdisk1
-P Runs the cluster installation in preview mode. When you use preview
mode, all of the installation prerequisites are checked, but updates are
not installed on the system.
-Q Queries the status of the Network Installation Management (NIM)
setup, cluster software, or available updates. The value option is
cluster, node, nim, or lpp.
-R Rejects non-committed service pack that is installed and stored in the
location that is specified by the -S flag.
-S Specifies the location of the update images that are to be installed. If you
specify a file system name, the path must begin with a Forward Slash
key (/). If you do not specify a Forward Slash key (/), the lpp_source
location of the NIM server will be used for installing updates.
-T Specifies the timeout value for the backup operation of the rootvg
volume group in minutes. If the rootvg volume group was not copied
before the specified timeout value, the operation exits. The default
value of this flag is infinite.
-U Enables rollback of all modified nodes when an error occurs during an
Apply or Reject operation.
-u Enables rollback of only the node that encountered an error during an
Apply or Reject operation.
-V Displays extended help information.
-X Exits after creating a copy of rootvg volume group by using the
alt_disk_copy command on each node. You must use the -x
argument to use the alternative copies of rootvg volume group for
rollback operation on subsequent runs.
-x Specifies not to create the copy of rootvg volume group by using the
alt_disk_copy command on each node for rollback operation. If the
rootvg volume group fails, you can use disks that are specified in the
-N argument for the rollback operation.
In this example, the cluster is active on both nodes and the resources are hosted on node jessica,
as shown in Figure 5-1. The updated PowerHA filesets are located in an NFS mount, /mnt,
from the NIM server. Another option is to use an lpp_source resource defined on the NIM server.
First execute a preview install to find any existing issues via cl_ezupdate -PS /mnt as shown
in Example 5-23.
Before to install filesets and or Ifixes, the node: jordan will be stopped in
unmanage mode.
There is nothing to commit or reject on node: jordan from source: /mnt
Installing fileset updates in preview mode on node: jessica...
Succeeded to install preview updates on node: jessica.
Installing fileset updates in preview mode on node: jordan...
Succeeded to install preview updates on node: jordan.
After reviewing the output and confirming that no errors were reported, the migration can be
initiated by executing cl_ezupdate -AS /mnt. It performs all the same checks as the preview
install did; however, these checks were omitted from the output in Example 5-24 for clarity of
the actions being performed. First, node jessica is stopped unmanaged, upgraded, and
restarted. Then node jordan is stopped fully, not unmanaged, upgraded, and restarted. After
the cluster stabilizes, the cluster migration is complete. In this example, the entire process
took approximately ten minutes.
jordan: 0513-059 The clevmgrdES Subsystem has been started. Subsystem PID is
16908768.
jordan: PowerHA: Cluster services started on Fri Nov 4 01:51:21 CDT 2022
jordan: event serial number 22791
..
"jordan" is now online.
Once completed, perform the same migration level verification checks as with any
other migration type, as follows:
Verify all cluster filesets are at the new level by lslpp -l cluster.* as shown in
Example 5-1 on page 160.
Verify with halevel -s the correct level is displayed as shown in Example 5-2 on
page 163.
Verify no errors reported in /tmp/clconvert.log as shown in Example 5-18 on page 173.
Verify that the cluster has completed the migration on both nodes as shown in
Example 5-21 on page 174.
Verify cluster is stable and resource group still online as shown in Example 5-22 on
page 175.
Perform cluster validation testing.
Important: Though this procedure can work, it is as-is. This is not an officially supported
method for performing a PowerHA upgrade, so if you have any problems do not expect any
help from IBM support.
One key benefit of utilizing this method is that the total amount of time, or downtime, involved in
performing the upgrade is reduced. The main reason is that the majority of the work
and time involved is performed before ever stopping any node. This does require an additional
free disk large enough to accommodate rootvg and the additional updates to be installed.
More information on
This migration scenario will closely mimic the steps of an offline migration. The test
environment consisted of the following levels:
Beginning
– AIX 7.1 TL5 SP1
– PowerHA 7.2.4 SP5
Ending
– AIX 7.1 TL5 SP6
– PowerHA 7.2.7
1. Create cloned rootvg and apply AIX updates. All updates are located in an NFS mount of
/mnt/aixupdates. This can be performed while the cluster is still active. In this scenario, we
performed the step shown for each node in Example 5-25 (a sketch of such an invocation
follows this list).
2. Validate the bootlist is now set to the newly created cloned and updated alternate rootvg
as shown in Example 5-26.
3. Stop cluster services on all nodes (smitty clstop) and choose to bring resource groups
offline. This can also be executed from the command line via clmgr stop cluster.
4. Reboot from newly cloned and updated rootvg disk (shutdown -Fr)
5. Verify caavg_private is active (lspv or lscluster -i)
6. Verify that clcomd is active using lssrc -s clcomd. If not, activate it using startsrc -s
clcomd.
7. Install all PowerHA 7.2.x file sets (use smitty update_all). Be sure to accept new license
agreements.
8. Verify all cluster filesets are at the new level by lslpp -l cluster.* as shown in
Example 5-1 on page 160.
9. Verify with halevel -s the correct level is displayed as shown in Example 5-2 on
page 163.
10.Verify no errors reported in /tmp/clconvert.log as shown in Example 5-3 on page 163.
11.Restart cluster services (smitty clstart or clmgr start cluster).
12.Verify that the cluster has completed migration on both nodes as shown in Example 5-21
on page 174
13.Verify cluster is stable and resource group still online as shown in Example 5-22 on
page 175.
14.Perform cluster validation testing.
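As referenced in step 1, a sketch of the clone-and-update invocation follows. The disk name is
an example, and the behavior of the -b and -l flags should be confirmed for your AIX level;
Example 5-25 shows the actual output from our scenario.

# Clone rootvg to a free disk and apply the AIX updates from the NFS mount
alt_disk_copy -d hdisk1 -b update_all -l /mnt/aixupdates

# Confirm that the bootlist now points to the cloned disk (compare with Example 5-26)
bootlist -m normal -o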
Important: Though this procedure can work, it is as-is. This is not an officially supported
method for performing a PowerHA upgrade, so if you have any problems do not expect any
help from IBM support.
This migration scenario will closely mimic the steps of a rolling migration. The test
environment consisted of the following levels:
Beginning
– AIX 7.1 TL5 SP1
– PowerHA 7.2.4 SP5
Ending
– AIX 7.3 TL0 SP2
– PowerHA 7.2.7
1. Perform AIX upgrades by using nimadm. This can be performed while the cluster is still active.
In this scenario, we performed the step shown in Example 5-27 (a sketch of such a nimadm
invocation follows this list).
2. Validate the bootlist is now set to the newly created cloned and upgraded alternate rootvg
as shown in Example 5-28.
3. Stop cluster services on the first node, in this scenario the standby node jordan (smitty
clstop), and choose to bring resource groups offline. This can also be executed from the
command line via clmgr stop node jordan.
4. Reboot from newly cloned and upgraded rootvg disk (shutdown -Fr)
5. Verify caavg_private is active (lspv or lscluster -i)
6. Verify that clcomd is active using lssrc -s clcomd. If not, activate it using startsrc -s
clcomd.
7. Install all PowerHA 7.2.x file sets (use smitty update_all). Be sure to accept new license
agreements.
8. Verify all cluster filesets are at the new level by lslpp -l cluster.* as shown in
Example 5-1 on page 160.
9. Verify with halevel -s the correct level is displayed as shown in Example 5-2 on
page 163.
10.Verify no errors reported in /tmp/clconvert.log as shown in Example 5-3 on page 163.
11.Restart cluster services (smitty clstart or clmgr start node jordan)
12.Make sure cluster stabilizes.
13.If not already previously performed in parallel with the first node, perform AIX upgrades by
using nimadm on second node, in this scenario primary node jessica. This can be
performed while cluster is still active. In this scenario we performed the step shown in
Example 5-29.
14.Validate the bootlist is now set to the newly created cloned and upgraded alternate rootvg
as shown in Example 5-30.
15.On the second node, jessica, stop cluster services with the “Move Resource Groups” option
(smitty clstop). This results in the resource group becoming active on node jordan as shown in
Example 5-5 on page 164.
16.Reboot from newly cloned and upgraded rootvg disk (shutdown -Fr)
17.Verify caavg_private is active (lspv or lscluster -i)
18.Verify that clcomd is active:
lssrc -s clcomd
If not, activate it via startsrc -s clcomd
19.Install all PowerHA 7.2.x file sets (use smitty update_all). Be sure to accept new license
agreements.
20.Verify all cluster filesets are at the new level by lslpp -l cluster.* as shown in
Example 5-1 on page 160.
21.Verify with halevel -s the correct level is displayed as shown in Example 5-2 on
page 163.
22.Verify no errors reported in /tmp/clconvert.log as shown in Example 5-3 on page 163.
23.Restart cluster services (smitty clstart or clmgr start node jessica)
24.Verify that the cluster has completed migration on both nodes as shown in Example 5-21
on page 174
25.Verify cluster is stable and resource group still online as shown in Example 5-22 on
page 175.
26.Perform cluster validation testing.
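As referenced in step 1, a sketch of the kind of nimadm invocation used follows. It is run on
the NIM master, and the cache volume group, SPOT, lpp_source, and target disk names are all
examples; Example 5-27 shows the actual output from our scenario.

# Clone and upgrade the client rootvg from the NIM master while the node stays active
nimadm -j nimadmvg -c jordan -s spot_7300 -l lpp_7300 -d hdisk1 -Y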
Important: Though this procedure can work, it is as-is. This is not an officially supported
method for performing a PowerHA upgrade, so if you have any problems do not expect any
help from IBM support.
In this scenario a Live Update of AIX is performed on each node, then either a non-disruptive
update or cl_ezupdate is performed to complete the PowerHA migration.
First, both the AIX and PowerHA levels must support Live Update. PowerHA added the
integrated support for it in v7.2.0. However, even though it is enabled by default on new
installations, it might not be on upgrades, so verify that it is enabled before using it.
If not enabled, enable by setting the value to true using clmgr modify node jordan
ENABLE_LIVE_UPDATE=true. Repeat for each node as necessary and synchronize the
cluster.
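A quick way to check the setting on each node before relying on it is sketched below; we
assume the attribute appears under the same name in the clmgr query node output, so verify
this on your level:

# Check whether Live Update integration is enabled on each node
clmgr query node jordan | grep -i ENABLE_LIVE_UPDATE
clmgr query node jessica | grep -i ENABLE_LIVE_UPDATE

# Enable it where needed, then synchronize the cluster
clmgr modify node jordan ENABLE_LIVE_UPDATE=true
clmgr modify node jessica ENABLE_LIVE_UPDATE=true
clmgr sync cluster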
3. Pick a node, any node, but only one node, and perform an AIX Live Update.
Notice that during the live update the node will go unmanaged immediately prior to the
upgrade, then back to auto managed post upgrade. This is normal and exactly the function
that PowerHA provides during a Live Update operation.
4. Upon successful AIX upgrade, verify caavg_private is active (lspv or lscluster -i)
5. Verify that clcomd is active:
lssrc -s clcomd
If not, activate it via startsrc -s clcomd
6. Repeat steps 3-5 as needed for each node.
7. Upon successful AIX upgrade of all nodes, perform Migration using cl_ezupdate for
PowerHA.
8. Verify all cluster filesets are at the new level by lslpp -l cluster.* as shown in
Example 5-1 on page 160.
9. Verify with halevel -s the correct level is displayed as shown in Example 5-2 on
page 163.
10.Verify no errors reported in /tmp/clconvert.log as shown in Example 5-3 on page 163.
11.Verify that the cluster has completed migration on both nodes as shown in Example 5-21
on page 174
12.Verify cluster is stable and resource group still online as shown in Example 5-22 on
page 175.
13.Perform cluster validation testing.
Note: Usually a complete stop, sync and verify, and restart of the cluster completes the
migration. But a sync might not be possible. You can modify the ODM manually, but the
preferred action is to contact IBM support.
In this chapter, we assume AIX best practices for troubleshooting, including monitoring the error log. However, we do not cover how to determine what problem exists, whether you are dealing with problems after they are discovered or performing preventive maintenance.
6.1.1 Scope
Change control is not within the scope of the documented procedures in this book. It
encompasses several aspects and is not optional. Change control includes, but is not limited
to these items:
Limited root access
Thoroughly documented and tested procedures
Proper planning and approval of all changes
Although many current PowerHA customers have a test cluster, or at least begin with a test
cluster, over time these cluster nodes become used within the company in some form. Using
these systems requires a scheduled maintenance window much like the production cluster. If
that is the case, do not be fooled, because it truly is not a test cluster.
A test cluster, ideally, is at least the same AIX, PowerHA, and application level as the
production cluster. The hardware should also be as similar as possible. In most cases, fully
mirroring the production environment is not practical, especially when there are multiple
production clusters. Several approaches exist to maximize a test cluster when multiple
clusters have varying levels of software.
Using logical partitioning (LPAR), Virtual I/O Servers (VIOS), and multiple rootvg images, created by using alt_disk_install or multibos, is common practice. Virtualization allows a
test cluster to be easily created with few physical resources and can even be within the same
physical machine. With the multi-boot option, you can easily change cluster environments by
simply booting the partition from another image. This also allows testing of many software
procedures such as these:
Applying AIX maintenance.
Applying PowerHA fixes.
Applying application maintenance.
This type of test cluster requires at least one disk, per image, per LPAR. For example, if the
test cluster has two nodes and three different rootvg images, it requires a minimum of six hard
drives. This is still easier than having six separate nodes in three separate test clusters.
A test cluster also allows testing of hardware maintenance procedures. These procedures
include, but are not limited to the following updates and replacement:
System firmware updates
Adapter firmware updates
Adapter replacement
Disk replacement
More testing can be accomplished by using the Cluster Test Tool and error log emulation. See
6.8, “Cluster Test Tool” on page 222 for more information.
After PowerHA is installed, the cluster manager process (clstrmgrES) is always running,
regardless of whether the cluster is online. It can be in one of the following states as displayed
by running the lssrc -ls clstrmgrES command:
NOT_CONFIGURED The cluster is not configured or node is not synchronized.
ST_INIT The cluster is configured but not active on this node.
ST_STABLE The cluster services are running with resources online.
ST_JOINING The cluster node is joining the cluster.
ST_VOTING The cluster nodes are voting to decide event execution.
ST_RP_RUNNING The cluster is running a recovery program.
RP_FAILED A recovery program event script failed.
ST_BARRIER The clstrmgr process is between events, waiting at the barrier.
ST_CBARRIER The clstrmgr process is exiting a recovery program.
ST_UNSTABLE The cluster is unstable, usually due to an event error.
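For example, to check the current state on the local node, run the following command; the exact output wording varies slightly by PowerHA level, but it includes a line with the current state:
lssrc -ls clstrmgrES | grep -i state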
Changes in the state of the cluster are referred to as cluster events. The Cluster Manager
monitors local hardware and software subsystems on each node for events such as an
application failure event. In response to such events, the Cluster Manager runs one or more
event scripts such as a restart application script. Cluster Managers running on all nodes
exchange messages to coordinate required actions in response to an event.
During maintenance periods, you might need to stop and start cluster services. But before
you do that, be sure to understand the node interactions it causes and the impact on your
system’s availability. The cluster must be synchronized and verification should detect no
errors. The following section briefly describes the processes themselves and then the
processing involved in startup or shutdown of these services. In 6.2.2, “Starting cluster
services” on page 194, we describe the procedures necessary to start cluster services on a
node and for shutting down services see 6.2.3, “Stopping cluster services” on page 197.
[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [Maddi,Patty] +
* Manage Resource Groups Automatically +
BROADCAST message at startup? true +
Startup Cluster Information Daemon? true +
Ignore verification errors? false +
Automatically correct errors found during Interactively +
cluster start?
The reason for this is directly related to what happens after a system failure. If a system that owns a resource group crashes, and AIX is set to reboot after a crash, it can restart cluster services in the middle of a takeover that is in progress. Depending on the cluster configuration, this might cause resource group contention, resource group processing errors, or even a fallback to occur, all of which can extend an outage. However, during test and maintenance periods, and even on dedicated standby nodes, using this option might be convenient.
Note: There are situations when choosing Interactively will correct some errors.
More details are in 7.6.6, “Running automatic corrective actions during verification”
on page 302.
After you complete the fields and press Enter, the system starts the cluster services on the
nodes specified, activating the cluster configuration that you defined. The time that it takes the
commands and scripts to run depends on your configuration (that is, the number of disks, the
number of interfaces to configure, the number of file systems to mount, and the number of
applications being started).
During the node_up event, resource groups are acquired. The time it takes to run each
node_up event is dependent on the resource processing during the event. The node_up
events for the joining nodes are processed sequentially.
When the command completes running and PowerHA cluster services are started on all
specified nodes, SMIT displays a command status window. Note that when the SMIT panel
indicates the completion of the cluster startup, event processing in most cases has not yet
completed. To verify that the nodes are up, you can use clstat or tail the hacmp.out file on any node. More information about this is in 7.7.1, “Cluster status checking utilities” on page 305.
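For example, either of the following can be used on any node (the hacmp.out path shown is the default log location):
clstat -a
tail -f /var/hacmp/log/hacmp.out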
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
Warning: "MANAGE" must be specified. Since it was not, a default of "auto" will
be used.
jordan: 0513-059 The clevmgrdES Subsystem has been started. Subsystem PID is
11338064.
jordan: PowerHA: Cluster services started on Sat Nov 26 14:15:26 CST 2022
jordan: event serial number 25911
[Entry Fields]
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [Maddi,Patty]+
BROADCAST cluster shutdown? true +
* Select an Action on Resource Groups Bring Resource Group>+
Warning: "WHEN" must be specified. Since it was not, a default of "now" will be
used.
Understanding each of these actions is important, along with stopping and starting cluster
services, because they are often used during maintenance periods.
In the following topics, we assume that cluster services are running, the resource groups are
online, the applications are running, and the cluster is stable. If the cluster is not in the stable
state, then resource group operations are not possible.
All three resource group options we describe can be done by using the clRGmove command.
However, in our examples, we use C-SPOC. They also all have similar SMIT panels and pick
lists. In an effort to streamline this documentation, we show only one SMIT panel in each of
the following sections.
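For reference, the approximate clmgr equivalents of the three operations are shown in the following sketch; the resource group and node names are taken from the examples in this section, and the exact attribute names should be confirmed against the clmgr man page for your level:
clmgr offline resource_group Maddi_rg
clmgr online resource_group Maddi_rg
clmgr move resource_group xsiteGLVMRG NODE=jordan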
2. Select a resource group from the list and press Enter. Another pick list is displayed (Select
an Online Node). The pick list contains only the nodes that are currently active in the
cluster and that currently are hosting the previously selected resource group.
3. Select an online node from the pick list and press Enter.
4. The final SMIT menu opens with the information that was selected in the previous pick
lists, as shown in Figure 6-4. Verify the entries you previously specified and then press
Enter to start the processing of the resource group to be brought offline.
[Entry Fields]
Resource Group to Bring Offline Maddi_rg
Node On Which to Bring Resource Group Offline Maddi
After processing is completed, the resource group is offline, but cluster services remain active on the node. The standby node does not acquire the resource group.
This option is also available by using either the clRGmove or clmgr command. For more information about these commands, see the man pages.
Upon successful completion, PowerHA displays a message and the status, location, and the type of location of the resource group that was successfully started on the specified node.
+--------------------------------------------------------------------------+
| Select a Destination Node |
| |
| Move cursor to desired item and press Enter. |
| |
| jessica |
| jordan |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
PowerHA also has the ability to move a resource group to another site. The concept is the
same as moving it between local nodes. For our example, we use the option to move to
another node rather than to another site. As is the case with most operations, this can be
performed from the command line, SMUI, or SMIT.
[Entry Fields]
Resource Group(s) to be Moved xsiteGLVMRG
Destination Node jordan
4. Verify the entries that you previously specified and then press Enter to start the move of the resource group.
Upon successful completion, PowerHA displays a message and the status, location, and the type of location of the resource group that was successfully moved to the specified node, as shown in Figure 6-7 on page 204.
COMMAND STATUS
[MORE...7]
Resource group xsiteGLVMRG is online on node jordan.
[BOTTOM]
This option is also available by using either the clRGmove or clmgr command. For more information about these commands, see the man pages.
Any time that a resource group is moved to another node, application monitoring for its applications is suspended while the application is stopped. After the application restarts on the destination node, application monitoring resumes. Additional information can be found in 6.3.4, “Suspending and resuming application monitoring” on page 205.
[Entry Fields]
* Application Controller Name demoapp
* Resource Group []
+
+--------------------------------------------------------------------------+
| Resource Group |
| |
| Move cursor to desired item and press Enter. |
| |
| redbookrg |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
| /=Find n=Find Next |
+--------------------------------------------------------------------------+
The monitoring remains suspended until either manually resumed or until the resource group
is stopped and restarted.
[Entry Fields]
* Application Controller Name demoapp
* Resource Group [redbookrg]
Figure 6-9 Resume application monitoring via SMIT
Application monitoring continues to stay active until either manually suspended or until the
resource group is brought offline.
6.4 Scenarios
In this section, we cover the following common scenarios:
PCI hot-plug replacement of a NIC
Installing AIX and PowerHA fixes
Replacing an LVM mirrored disk
Application maintenance
Note: Although PowerHA continues to provide this facility, the procedure is rarely needed now that virtualization is primarily used.
Special considerations
Consider the following factors before you replace a PCI hot-pluggable network interface card:
You should manually record the IP address settings of the network interface being
replaced to prepare for unplanned failures.
Be aware that if a network interface you are hot-replacing is the only available keepalive
path on the node where it resides, you must shut down PowerHA on this node to prevent a
partitioned cluster while the interface is being replaced.
This situation is easily avoidable by having a working non-IP network between the cluster
nodes.
SMIT gives you the option of doing a graceful shutdown on this node. From this point, you
can manually hot-replace the network interface card.
Hot-replacement of Ethernet network interface cards is supported.
Do not attempt to change any configuration settings while hot-replacement is in progress.
The SMIT interface simplifies the process of replacing a PCI hot-pluggable network interface
card. PowerHA supports only one PCI hot-pluggable network interface card replacement
using SMIT at one time per node.
Note: If the network interface was alive before the replacement process began, then between the initiation and completion of the hot-replacement, the interface being replaced is in maintenance mode. During this time, network connectivity monitoring is suspended on the interface.
Go to the node on which you want to replace a hot-pluggable PCI network interface card and
use the following steps.
1. Run smitty cspoc and then select Communication Interfaces → PCI Hot Plug Replace
a Network Interface Card. Press Enter.
Tip: You can also get to this panel with the fast path smitty cl_pcihp.
SMIT displays a list of available PCI network interfaces that are hot-pluggable.
2. Select the network interface you want to hot-replace. Press Enter. The service address of
the PCI interface is moved to the available non-service interface.
3. SMIT prompts you to physically replace the network interface card. After you replace the
card, confirm that replacement occurred.
– If you select Yes, the service address is moved back to the network interface that was
hot-replaced. On aliased networks, the service address does not move back to the
original network interface, but remains as an alias on the same network interface. The
hot-replacement is complete.
– If you select No, you must manually reconfigure the interface settings to their original
values:
i. Run the drslot command to take the PCI slot out of the removed state.
ii. Run mkdev on the physical interface.
iii. Use ifconfig manually (rather than smitty chinet, cfgmgr, or mkdev) to avoid
configuring duplicate IP addresses or an unwanted boot address.
– If you choose not to move the resource group to another node, it will be offline for the
duration of the replacement process.
3. SMIT prompts you to physically replace the network interface card. After you replace the
card, confirm that replacement occurred.
– If you select Yes, the hot-replacement is complete.
– If you select No, you must manually reconfigure the interface settings to their original
values:
i. Run the drslot command to take the PCI slot out of the removed state.
ii. Run mkdev on the physical interface.
iii. Use ifconfig manually (rather than smitty chinet, cfgmgr, or mkdev) to avoid
configuring duplicate IP addresses or an unwanted boot address.
iv. If applicable, move the resource group back to the node from which you moved it in
Step 2.
We begin again from the fast path of smitty cl_pcihp as in the previous scenario:
1. Select the network interface that you want to hot-replace and press Enter.
SMIT prompts you to physically replace the network interface card. After you replace it,
confirm that replacement occurred.
– If you select Yes, the hot-replacement is complete.
– If you select No, you must manually reconfigure the interface settings to their original
values:
i. Run the drslot command to take the PCI slot out of the removed state.
ii. Run mkdev on the physical interface.
iii. Use ifconfig manually (rather than smitty chinet, cfgmgr, or mkdev) to avoid
configuring duplicate IP addresses or an unwanted boot address.
Some AIX fixes can be loaded dynamically without a reboot. Kernel and device driver updates
often require a reboot because installing updates to them runs a bosboot. One way to
determine if a reboot is required is to check the .toc file that is created by using the inutoc
command before installing the fixes. The file contains file set information similar to
Example 6-8 on page 210.
In the example, the file set bos.64bit requires a reboot, as indicated by the “b” character in the fourth column. The “N” character indicates that a reboot is not required.
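For example, the flag can be inspected from the command line; the fix directory name here is hypothetical, and the grep simply locates the bos.64bit entry so that you can read its flag column:
cd /tmp/aix_fixes
inutoc .
grep bos.64bit .toc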
Applying PowerHA fixes is similar to AIX fixes, but rebooting after installing base file sets is
not required. However, other base prerequisites like RSCT might require it. Always check with
the support line if you are unsure about the effects of loading certain fixes.
When you update AIX or PowerHA software, be sure to perform the following tasks:
Take a cluster snapshot and save it somewhere off the cluster.
Back up the operating system and data before performing any upgrade. Prepare a backout
plan in case you encounter problems with the upgrade.
Always perform procedures on a test cluster before running them in production.
Use alt_disk update or Live Update if possible.
Note: Follow this same general rule for fixes to the application; follow specific instructions
for the application.
The general procedure for applying AIX fixes that require a reboot is as follows (a clmgr command sketch follows the list):
1. Stop cluster services on standby node.
2. Apply, do not commit, TL or SP to the standby node (and reboot as needed).
3. Start cluster services on the standby node.
4. Stop cluster services on the production node using Move Resource Group option to the
standby machine.
5. Apply TL or SP to the primary node (and reboot as needed).
6. Start cluster services on the primary node.
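Assuming a two-node cluster with a standby node named backupnode and a production node named prodnode (hypothetical names), the cluster service portions of these steps can be driven with clmgr roughly as follows; the MANAGE values shown are the standard stop options (offline, move, unmanage):
clmgr stop node backupnode WHEN=now MANAGE=offline
# apply, do not commit, the TL or SP on the standby node; reboot if needed
clmgr start node backupnode WHEN=now MANAGE=auto
clmgr stop node prodnode WHEN=now MANAGE=move
# apply the TL or SP on the production node; reboot if needed
clmgr start node prodnode WHEN=now MANAGE=auto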
If you install either AIX or PowerHA fixes that do not require a reboot, you can use the Unmanage Resource Groups option when stopping cluster services, as described in 6.2.3, “Stopping cluster services” on page 197. The general procedure for doing this for a two-node hot-standby cluster is as follows (a clmgr sketch follows the list):
1. Stop cluster services on standby by using the Unmanage option.
2. Apply, do not commit, SP to the standby node.
3. Start cluster services on the standby node.
4. Stop cluster services on the production node by using the Unmanage option.
5. Apply SP to the primary node.
6. Start cluster services on the primary node.
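For the nondisruptive variant, the same sketch applies with the Unmanage option; again, backupnode and prodnode are hypothetical node names:
clmgr stop node backupnode WHEN=now MANAGE=unmanage
# apply the SP on the standby node
clmgr start node backupnode WHEN=now MANAGE=auto
clmgr stop node prodnode WHEN=now MANAGE=unmanage
# apply the SP on the production node
clmgr start node prodnode WHEN=now MANAGE=auto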
Important: Never unmanage more than one node at a time. Complete the procedures
thoroughly on one node before beginning on another node. Of course, be sure to test these
procedures in a test environment before ever attempting them in production.
Similarly, the cl_ezupdate tool can also be used to perform nondisruptive updates for both
AIX and PowerHA in a semi-automated fashion. More information can be found in 5.3.6,
“Migration using cl_ezupdate” on page 175.
6.4.3 Storage
Most shared storage environments today use some level of RAID for data protection and
redundancy. In those cases, individual disk failures normally do not require AIX LVM
maintenance to be performed. Any procedures required are often external to cluster nodes
and do not affect the cluster itself. However, if protection is provided by using LVM mirroring,
then LVM maintenance procedures are required.
C-SPOC provides the Cluster Disk Replacement facility to help in the replacement of failed
LVM mirrored disk. This facility does all the necessary LVM operations of replacing an LVM
mirrored disk. To use this facility, ensure that the following conditions are met:
You have root privilege.
The affected disk, and preferably the entire volume group, is mirrored.
The replacement disk is available to each node, already has a PVID assigned to it, and is shown on each node by the lspv command.
To physically replace an existing disk, remove the old disk and insert the new one in its place. This assumes, of course, that the drive is hot-plug replaceable, which is common.
The replacepv command updates the volume group in use in the disk replacement
process (on the reference node only).
Note: During the command execution, SMIT tells you the name of the recovery
directory to use should replacepv fail. Make note of this information as it is required in
the recovery process.
Configuration of the destination disk on all nodes in the resource group occurs at this time.
If a node in the resource group fails to import the updated volume group, you can use the
C-SPOC Import a Shared Volume Group facility as shown in “Importing volume groups using
C-SPOC” on page 269.
C-SPOC does not remove the failed disk device information from the cluster nodes. This must be done manually by running the rmdev -dl <devicename> command.
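For example, assuming that the failed disk is still shown as hdisk9 (a hypothetical device name; confirm the correct one first with lspv or lsdev), run the following on each cluster node that still lists it:
lsdev -Cc disk | grep hdisk9
rmdev -dl hdisk9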
6.4.4 Applications
Each application varies; however, most application maintenance requires the application to be brought offline. This can be done in several ways. The most appropriate method for any particular environment depends on the overall cluster configuration.
It is common to minimize the overall downtime of the application by performing the application maintenance first on the non-production nodes for that application. Traditionally, this means a standby node; however, it is not common for a backup/fallover node to be truly standby only. If it is not a true standby node, then any workload or applications currently running on that node must be accounted for to minimize any adverse effects of installing the maintenance. This should all have been tested previously in a test cluster.
In most cases, stopping cluster services is not needed. You can bring the resource group
offline as described in 6.3.1, “Bringing a resource group offline” on page 199. If the shared
volume group must be online during the maintenance, you can suspend application
monitoring and start the application stop-server script to bring the application offline.
However, this will keep the service IP address online, which might not be desirable.
In a multiple resource group or multiple application environment, all running on the same
node, stopping cluster services on the local node might not be feasible. Be aware of the
possible effects caused by not stopping cluster services on the node in which application
maintenance is being performed.
If during the maintenance period, the system encounters a catastrophic error resulting in a
crash, a fallover will occur. This might be undesirable if the maintenance was not performed
on the fallover candidates first and if the maintenance is incomplete on the local node.
Although this might be a rare occurrence, the possibility exists and must be understood.
Another possibility is that if another production node fails during the maintenance period, a
fallover can occur successfully on the local node without adverse effects. If this is not an
acceptable result and if there are multiple resource groups, then you might want to move the
other resource groups to another node first and stop cluster services on the local node.
If you use persistent addresses, and you stop cluster services, local adapter swap protection
is no longer provided. Although again rare, the possibility then exists that when using the
persistent address to do maintenance and the hosting NIC fails, your connection will be
dropped.
After application maintenance, always test the cluster again. Depending on the actions that you took to stop the application, you must reverse them: restart cluster services, bring the resource group back online through C-SPOC, or manually run the application start server script and resume application monitoring as needed.
Beginning in PowerHA 7.1.3 SP1, clmgr was updated to allow CAA to be stopped at the node, cluster, or site level. These steps can be done on one node at a time or on the entire cluster. Our scenarios show how to do it either way.
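Although the step 1 command itself is not repeated here, the cluster-wide stop that produces the state in Example 6-9 would look similar to the following sketch; the STOP_CAA and MANAGE values are assumptions that mirror the start command shown in Example 6-10:
clmgr offline cluster WHEN=now MANAGE=offline STOP_CAA=yes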
The results of step 1 are shown in Example 6-9. Notice that CAA is inactive, and that the CAA
cluster and caavg_private no longer exist. This result is the same for all nodes in the cluster.
[cassidy:root] / # lspv
hdisk0 00f70c99013e28ca rootvg active
hdisk1 00f6f5d015a4310b None
hdisk2 00f6f5d015a44307 None
hdisk3 00f6f5d01660fbd1 None
hdisk4 00f6f5d0166106fa xsitevg
hdisk5 00f6f5d0166114f3 xsitevg
hdisk6 00f6f5d029906df4 xsitevg
hdisk7 00f6f5d0596beebf xsitevg
hdisk8 00f70c995a1bc94a None
Note: In some cases, this step might change the device numbering. This does not
cause a problem because PowerHA and CAA know the repository disk by the PVID.
However, also check the disk device attributes (such as reserve_policy, queue_depth, and others) to be sure they are still what you want.
3. Start cluster services by running the following command on any node in the cluster:
clmgr online cluster WHEN=now MANAGE=auto START_CAA=yes
Important: If you use third-party storage multipathing device drivers, contact the vendor
for support assistance. Consult IBM only if you use native AIX MPIO.
4. After you perform the desired maintenance, restart the cluster services as shown in
Example 6-10. Notice that the CAA cluster and caavg_private are back and active.
Example 6-10 Starting cluster services cluster wide after maintenance performed
[cassidy:root] / # clmgr online cluster WHEN=now MANAGE=auto START_CAA=yes
jessica:
jessica: Nov 22 2022 12:31:26 Checking for srcmstr active...
jessica: Nov 22 2022 12:31:26 complete.
jessica: Nov 22 2022 12:31:27
jessica: /usr/es/sbin/cluster/utilities/clstart: called with flags -m -G -b -P
cl_rc_cluster -B -A
jessica:
jessica: Nov 26 2022 17:16:42
jessica: Completed execution of /usr/es/sbin/cluster/etc/rc.cluster
jessica: with parameters: -boot -N -A -b -P cl_rc_cluster.
jessica: Exit status = 0
jessica:
[cassidy:root] / # lspv
hdisk0 00f70c99013e28ca rootvg active
hdisk1 00f6f5d015a4310b caavg_private active
hdisk2 00f6f5d015a44307 None
hdisk3 00f6f5d01660fbd1 None
hdisk4 00f6f5d0166106fa xsitevg concurrent
hdisk5 00f6f5d0166114f3 xsitevg concurrent
hdisk6 00f6f5d029906df4 xsitevg concurrent
hdisk7 00f6f5d0596beebf xsitevg concurrent
hdisk8 00f70c995a1bc94a None
Note: In some cases, this step might change the device numbering. This does not
cause a problem because PowerHA and CAA know the repository disk by the PVID.
However, also check the disk device attributes (such as reserve_policy, queue_depth,
and others) to be sure they are still what you want.
3. Start cluster services on the selected node by running the following command:
clmgr online node <nodename> WHEN=now MANAGE=auto START_CAA=yes
Important: If you use third-party storage multipathing device drivers, contact the
vendor for support assistance. Consult IBM support only if you use IBM device drivers.
4. Repeat these steps as needed from start to finish on one node at a time.
The results of step 1 are shown in Example 6-11. Notice that CAA is inactive, and the CAA cluster and caavg_private no longer exist on node cassidy. This applies only to the individual node in this case. Also as shown, the cluster exists and is still active on node jessica.
[jessica:root] / # lspv
hdisk0 00f6f5d00146570c rootvg active
hdisk1 00f6f5d015a4310b caavg_private active
hdisk2 00f6f5d01660fbd1 amyvg
hdisk3 00f6f5d015a44307 amyvg
hdisk4 00f6f5d0166106fa xsitevg concurrent
hdisk5 00f6f5d0166114f3 xsitevg concurrent
hdisk6 00f6f5d029906df4 xsitevg concurrent
hdisk7 00f6f5d0596beebf xsitevg concurrent
5. Then, after performing the maintenance, restart the cluster services on node cassidy as shown in Example 6-12. Notice that the CAA cluster and caavg_private are back and active.
Example 6-12 Starting cluster services on individual node after maintenance performed
[cassidy:root] / # clmgr start node cassidy WHEN=now MANAGE=auto START_CAA=yes
....
"cassidy" is now online.
[cassidy:root] / # lspv
hdisk0 00f70c99013e28ca rootvg active
hdisk1 00f6f5d015a4310b caavg_private active
hdisk2 00f6f5d015a44307 None
hdisk3 00f6f5d01660fbd1 None
hdisk4 00f6f5d0166106fa xsitevg concurrent
hdisk5 00f6f5d0166114f3 xsitevg concurrent
hdisk6 00f6f5d029906df4 xsitevg concurrent
hdisk7 00f6f5d0596beebf xsitevg concurrent
hdisk8 00f70c995a1bc94a None
Refer to the following website for the PowerHA repository disk requirements:
https://www.ibm.com/docs/en/powerha-aix/7.2?topic=planning-repository-disk-cluster-multicast-ip-addressps
[Entry Fields]
Site Name fortworth
* Repository Disk [(00f61ab216646614] +
+--------------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press Enter. |
| |
| hdisk2 (00f61ab216646614) on all nodes at site fortworth |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 6-10 Add a repository disk
5. Select the new repository disk by pressing F4. See Figure 6-11.
6. Synchronize the cluster.
This procedure of replacing a repository disk can also be accomplished by using the clmgr
command, as shown in 6.7, “Critical volume groups” on page 220. Of course if you are not
using sites, you can exclude the site option from the syntax.
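As a sketch only, assuming the new repository disk is hdisk2 at the site fortworth from this example, the clmgr form is roughly the following; confirm the exact action and attribute names against the clmgr man page for your level:
clmgr replace repository hdisk2 SITE=fortworth
clmgr sync cluster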
[Entry Fields]
Site Name fortworth
* Repository Disk [00f61ab216646614] +
+--------------------------------------------------------------------------+
| Repository Disk |
| |
| Move cursor to desired item and press Enter. |
| |
| 00f61ab216646614 |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Then of course, synchronize the cluster as stated in the output, via clmgr sync cluster.
Critical volume groups safeguard the Oracle RAC voting disks. PowerHA continuously
monitors the read-write accessibility of the voting disks. You can set up one of the following
recovery actions if you lose access to a volume group:
Notify only.
Halt the node.
Fence the node so that the node remains up but it cannot access the Oracle database.
Shut down cluster services and bring all resource groups offline.
Important: The critical volume groups and the Multi-Node Disk Heart Beat do not replace
the SAN-based disk heartbeat. These technologies are used for separate purposes.
If you have Oracle RAC, you must have at least one designated volume group for voting.
Follow these steps to configure a critical volume group:
1. Set up a concurrent resource group for two or more nodes:
– Startup policy: Online on all available nodes
– Fallover policy: Bring offline
– Fallback policy: Never fallback
2. Create an enhanced concurrent volume group, which is accessible for all nodes in the
resource group. This volume group stores the Oracle RAC voting files.
3. Add the volume group to the concurrent resource group.
4. Synchronize your cluster.
5. Start smitty cspoc and select Storage → Volume Groups → Manage Critical Volume
Groups → Mark a Volume Group as Critical.
6. Select the volume group from the pick list as shown in Figure 6-12 on page 221.
+--------------------------------------------------------------------------+
| Select the Volume Group to mark Critical |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| #Volume Group Resource Group Node List |
| leevg redbookrg jessica,jordan |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
| Enter=Do /=Find n=Find Next |
+--------------------------------------------------------------------------+
7. Upon pressing Enter, verification of the creation is displayed, along with a reminder to synchronize the cluster and a statement of the default action, as shown in Figure 6-13.
COMMAND STATUS
cl_chvg: The HACMP configuration has been changed - CRITICAL Volume Group leevg has been
added. The configuration must be synchronized to make this
change effective across the cluster
cl_chvg: The default action for Critical Volume Group leevg on loss of quorum is to halt
the node.
The SMIT panel to 'Configure failure action for CRITICAL Volume Groups' can be used to
change this action
8. Configure the failure action: Start smitty cspoc and select Storage → Volume Groups →
Manage Critical Volume Groups → Configure failure actions for Critical Volume
Groups.
9. Select the volume group from the pop-up picklist.
10.Select a recovery action on the loss of disk access:
– Notify Only
– Halt the node
– Fence the node
– Shutdown Cluster Services and bring all Resource Groups Offline
11.Optionally you can also specify a notification method as shown in Figure 6-14.
12.Synchronize the cluster via clmgr sync cluster.
[Entry Fields]
On loss of access Notify Only +
Optional notification method []
Volume Group leevg
+--------------------------------------------------------------------------+
| On loss of access |
| |
| Move cursor to desired item and press Enter. |
| |
| Notify Only |
| Halt the node |
| Fence the node |
| Shutdown Cluster Services and bring all Resource Groups Offline |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
| /=Find n=Find Next |
+--------------------------------------------------------------------------+
You can start a test, let it run unattended, and return later to evaluate the results of your
testing. You should run the utility under both low load and high load conditions to observe how
system load affects your PowerHA cluster.
You run the Cluster Test Tool from SMIT on one node in a PowerHA cluster. For testing
purposes, this node is referred to as the control node. From the control node, the tool runs a
series of specified tests, some on other cluster nodes, gathers information about the success
or failure of the tests processed, and stores this information in the Cluster Test Tool log file for
evaluation or future reference.
Important: If you uninstall PowerHA, the program removes any files that you might have
customized for the Cluster Test Tool. If you want to retain these files, copy them before you
uninstall PowerHA.
time spent is restarting the applications. In our test, without a real application, the entire series of automated tests completed in about seven minutes.
Individual tests can take approximately three minutes to run. The following conditions affect
the length of time to run the tests:
Cluster complexity.
Testing in complex environments takes considerably longer.
Network latency.
Cluster testing relies on network communication between the nodes. Any degradation in
network performance slows the performance of the Cluster Test Tool.
Use of verbose logging for the tool.
If you customize verbose logging to run additional commands from which to capture output, testing takes longer to complete. In general, the more commands you add for verbose logging, the longer a test procedure takes to complete.
Custom user-defined resources or events.
Manual intervention on the control node.
At some points in the test, you might need to intervene.
Running custom tests.
If you run a custom test plan, the number of tests run also affects the time required to run
the test procedure. If you run a long list of tests, or if any of the tests require a substantial
amount of time to complete, then the time to process the test plan increases.
6.8.2 Considerations
The Cluster Test Tool has several considerations. It does not support testing of the following
PowerHA cluster-related components:
Resource groups with dependencies
Replicated resources
You can perform general cluster testing for clusters that support sites, but not testing that is
specific to PowerHA sites or any of the PowerHA/EE products. Here are some situations
regarding cluster testing:
Replicated resources:
You can perform general cluster testing for clusters that include replicated resources, but
not testing specific to replicated resources, including GLVM, or any of the PowerHA/EE
products.
Dynamic cluster reconfiguration:
You cannot run dynamic reconfiguration while the tool is running.
Pre-events and post-events:
These events run in the usual way, but the tool does not verify that they were run or that
the correct action was taken.
In addition, the Cluster Test Tool might not recover from the following situations:
A node that fails unexpectedly, that is, a failure not initiated by testing.
The cluster does not stabilize.
The automated test procedure runs a predefined set of tests on a node that the tool randomly
selects. The tool ensures that the node selected for testing varies from one test to another.
You can run the automated test procedure on any PowerHA cluster that is not currently in
service.
The automated test procedure runs sets of predefined tests in this order:
1. General topology tests
2. Resource group tests on non-concurrent resource groups
3. Resource group tests on concurrent resource groups
4. IP-type network tests for each network
5. Non-IP network tests for each network
6. Volume group tests for each resource group
7. Site-specific tests
8. Catastrophic failure test
The Cluster Test Tool discovers information about the cluster configuration and randomly
selects cluster components, such as nodes and networks, to be used in the testing.
Which nodes are used in testing varies from one test to another. The Cluster Test Tool can
select some nodes for the initial battery of tests, and then, for subsequent tests it can
intentionally select the same nodes, or choose from nodes on which no tests were run
previously. In general, the logic in the automated test sequence ensures that all components
are sufficiently tested in all necessary combinations.
The automated test procedure runs a node_up event at the beginning of the test to ensure
that all cluster nodes are up and available for testing.
The Cluster Test Tool uses legacy terminology (Graceful, Takeover, and Forced) for stopping cluster services.
When the automated test procedure starts, the tool runs each of the following tests in the
order shown:
1. NODE_UP, ALL, Start cluster services on all available nodes.
2. NODE_DOWN_GRACEFUL, node1, Stop cluster services gracefully on a node.
3. NODE_UP, node1, Restart cluster services on the node that was stopped.
4. NODE_DOWN_TAKEOVER, node2, Stop cluster services with takeover on a node.
5. NODE_UP, node2, Restart cluster services on the node that was stopped.
6. NODE_DOWN_FORCED, node2, Stop cluster services forced on a node.
7. NODE_UP, node3, Restart cluster services on the node that was stopped.
The Cluster Test Tool runs each of the following tests, in the order listed here, for each
resource group:
1. Bring a resource group offline and online on a node:
RG_OFFLINE, RG_ONLINE
2. Bring a local network down on a node to produce a resource group fallover:
NETWORK_DOWN_LOCAL, rg_owner, svc1_net, Selective fallover on local network
down
3. Recover the previously failed network:
NETWORK_UP_LOCAL, prev_rg_owner, svc1_net, Recover previously failed network
4. Move a resource group to another node:
RG_MOVE
5. Bring an application server down and recover from the application failure:
SERVER_DOWN, ANY, app1, /app/stop/script, Recover from application failure
Network tests
The tool runs tests for IP networks and for non-IP networks. For each IP network, the tool
runs these tests:
Bring a network down and up:
NETWORK_DOWN_GLOBAL, NETWORK_UP_GLOBAL
Fail a network interface, join a network interface. This test is run for the service interface
on the network. If no service interface is configured, the test uses a random interface
defined on the network:
FAIL_LABEL, JOIN_LABEL
Site-specific tests
If sites are present in the cluster, the tool runs tests for them. The automated testing
sequence that the Cluster Test Tool uses contains two site-specific tests:
auto_site: This sequence of tests runs if you have any cluster configuration with sites. For
example, this sequence is used for clusters with cross-site LVM mirroring configured that
does not use XD_data networks. The tests in this sequence include:
SITE_DOWN_GRACEFUL Stops the cluster services on all nodes in a site while taking
resources offline.
SITE_UP Restarts the cluster services on the nodes in a site.
SITE_DOWN_TAKEOVER Stops the cluster services on all nodes in a site and moves the resources to nodes at another site.
SITE_UP Restarts the cluster services on the nodes at a site.
RG_MOVE_SITE Moves a resource group to a node at another site.
auto_site_isolation: This sequence of tests runs only if you configured sites and an
XD-type network. The tests in this sequence include:
SITE_ISOLATION Isolates sites by failing XD_data networks.
SITE_MERGE Merges sites by bringing up XD_data networks.
When the tool terminates the Cluster Manager on the control node, you most likely will need
to reactivate the node.
You create a custom test plan – a file that lists a series of tests to be run – to meet
requirements specific to your environment and apply that test plan to any number of clusters.
You specify the order in which tests run and the specific components to be tested. After you
set up your custom test environment, you run the test procedure from SMIT and view test
results in SMIT and in the Cluster Test Tool log file.
Your test procedure should bring each component offline then online, or cause a resource
group fallover, to ensure that the cluster recovers from each failure. Start your test by running
a node_up event on each cluster node to ensure that all cluster nodes are up and available for
testing.
Note: The Cluster Test Tool uses existing terminology for stopping cluster services as
follows:
Graceful = Bring Resource Groups Offline
Takeover = Move Resource Groups
Forced = Unmanage Resource Groups
NODE_DOWN_TAKEOVER Stops cluster services with the resources acquired by another node.
NODE_DOWN_FORCED Stops cluster services on the specified node with the Unmanage
Resource Group option.
CLSTRMGR_KILL Terminates the Cluster Manager on the specified node.
RG_MOVE Moves a resource group that is already online to a specific node.
RG_MOVE_SITE Moves a resource group that is already online to an available node at a
specific site.
RG_OFFLINE Brings a resource group offline that is already online.
RG_ONLINE Brings a resource group online that is already offline.
SERVER_DOWN Brings a monitored application server down.
SITE_ISOLATION Brings down all XD_data networks in the cluster at which the tool is
running, thereby causing a site isolation.
SITE_MERGE Brings up all XD_data networks in the cluster at which the tool is
running, thereby simulating a site merge. Run the SITE_MERGE test
after running the SITE_ISOLATION test.
SITE_UP Starts cluster services on all nodes at the specified site that are
currently stopped.
SITE_DOWN_TAKEOVER Stops cluster services on all nodes at the specified site and moves the
resources to nodes at another site by launching automatic rg_move
events.
SITE_DOWN_GRACEFUL Stops cluster services on all nodes at the specified site and takes the
resources offline.
VG_DOWN Emulates an error condition for a specified disk that contains a volume
group in a resource group.
WAIT Generates a wait period for the Cluster Test Tool.
Note: One of the success indicators for each test is that the cluster becomes stable.
Test syntax
The syntax for a test is as follows:
TEST_NAME, parameter1, parametern|PARAMETER, comments
Where:
The test name is in uppercase letters.
Parameters follow the test name.
Italic text indicates parameters expressed as variables.
Commas separate the test name from the parameters and the parameters from each
other. A space around the commas is also supported.
The syntax line shows parameters as parameter1 and parametern with n representing the
next parameter. Tests typically have 2 - 4 parameters.
The vertical bar, or pipe character (|), indicates parameters that are mutually exclusive
alternatives.
Optional: The comments part of the syntax is user-defined text that appears at the end of
the line. The Cluster Test Tool displays the text string when the Cluster Test Tool runs.
Node tests
The node tests start and stop cluster services on specified nodes.
The following command starts the cluster services on a specified node that is offline or on
all nodes that are offline:
NODE_UP, node | ALL, comments
Where:
node The name of a node on which cluster services start
ALL Any nodes that are offline have cluster services start
comments User-defined text to describe the configured test
Example
NODE_UP, node1, Bring up node1
Entrance criteria
Any node to be started is inactive.
Success indicators
The following conditions indicate success for this test:
– The cluster becomes stable.
– The cluster services successfully start on all specified nodes.
– No resource group enters the error state.
– No resource group moves from online to offline.
The following command stops cluster services on a specified node and brings resource
groups offline:
NODE_DOWN_GRACEFUL, node | ALL, comments
Where:
node The name of a node on which cluster services stop.
ALL All nodes that are online to have cluster services stop. At least
one node in the cluster must be online.
comments User-defined text to describe the configured test.
Example
NODE_DOWN_GRACEFUL, node3, Bring down node3 gracefully
Entrance criteria
Any node to be stopped is active.
Success indicators
The following conditions indicate success for this test:
– The cluster becomes stable.
– Cluster services stop on the specified nodes.
Entrance criteria
The specified node is active and has at least one active interface on the specified network.
Success indicators
The following conditions indicate success for this test:
– The cluster becomes stable.
– Cluster services continue to run on the cluster nodes where they were active before
the test.
– Resource groups on other nodes remain in the same state; however, some might be
hosted on a different node.
– If the node hosts a resource group for which the recovery method is set to notify, the
resource group does not move.
The following command brings up the specified network on all nodes that have interfaces
on the network. The specified network can be an IP network or a serial network.
The only time you can have a resource group online and the service label hosted on an
inactive interface is when the service interface fails but there was no place to move the
resource group, in which case it stays online.
Example
JOIN_LABEL, app_serv_address, Bring up app_serv_address on node 2
Entrance criteria
The specified interface is currently inactive on the specified node.
Success indicators
The following conditions indicate success for this test:
– The cluster becomes stable.
– Specified interface comes up on specified node.
– Cluster services continue to run on the cluster nodes where they were active before the
test.
– Resource groups that are in the ERROR state on the specified node and that have a
service IP label available on the network can go online, but should not enter the
ERROR state.
– Resource groups on other nodes remain in the same state.
The following command brings down a network interface that is associated with a
specified label on a specified node:
FAIL_LABEL, iplabel, comments
Where:
iplabel The IP label of the interface.
comments User-defined text to describe the configured test.
Example
FAIL_LABEL, app_serv_label, Bring down app_serv_label on node 2
Entrance criteria
The specified interface is currently active on the specified node.
Success indicators
The following conditions indicate success for this test:
– The cluster becomes stable.
– Any service labels that were hosted by the interface are recovered.
– Resource groups that are in the ERROR state on the specified node and that have a
service IP label available in the network can go online, but should not enter the
ERROR state.
– Resource groups remain in the same state; however, the resource group can be hosted
by another node.
Site tests
These tests are for the site.
The following command fails all the XD_data networks, causing the site_isolation event:
SITE_ISOLATION, comments
Where:
comments User-defined text to describe the configured test.
Example
SITE_ISOLATION, Fail all the XD_data networks
Entrance criteria
At least one XD_data network is configured and is up on any node in the cluster.
Success indicators
The following conditions indicate success for this test:
– The XD_data network fails, no resource groups change state.
– The cluster becomes stable.
The following command runs when at least one XD_data network is up to restore
connections between the sites, and remove site isolation. Run this test after running the
SITE_ISOLATION test:
SITE_MERGE, comments
Where:
comments User-defined text to describe the configured test.
Example
SITE_MERGE, Heal the XD_data networks
Entrance criteria
At least one node must be online.
Success indicators
The following conditions indicate success for this test:
– No resource groups change state.
– The cluster becomes stable.
The following command stops cluster services and moves the resource groups to other
nodes, on all nodes at the specified site:
SITE_DOWN_TAKEOVER, site, comments:
Where:
site The site that contains the nodes on which cluster services will
be stopped.
comments User-defined text to describe the configured test.
Example
SITE_DOWN_TAKEOVER, site_1, Stop cluster services on all nodes at site_1,
bringing the resource groups offline and moving the resource groups.
Entrance criteria
At least one node at the site must be online.
Success indicators
The following conditions indicate success for this test:
– Cluster services are stopped on all nodes at the specified site. All primary instance
resource groups move to the other site.
– All secondary instance resource groups go offline.
– The cluster becomes stable.
The following command starts cluster services on all nodes at the specified site:
SITE_UP, site, comments
Where:
site Site that contains the nodes on which cluster services will be
started
comments User-defined text to describe the configured test
Example
SITE_UP, site_1, Start cluster services on all nodes at site_1.
Entrance criteria
At least one node at the site must be offline.
Success indicators
The following conditions indicate success for this test:
– Cluster services are started on all nodes at the specified site.
– Resource groups remain in the same state.
– The cluster becomes stable.
General tests
Other tests that are available to use in PowerHA cluster testing are as follows:
Bring down an application server
Terminate the Cluster Manager on a node
Add a wait time for test processing.
Note: If CLSTRMGR_KILL is run on the local node, you might need to reboot the node. On
startup, the Cluster Test Tool automatically starts again. You can avoid manual
intervention to reboot the control node during testing by doing these tasks:
Editing the /etc/cluster/hacmp.term file to change the default action after an
abnormal exit. The clexit.rc script checks for the presence of this file and, if the file
is executable, the script calls it instead of halting the system automatically.
Configuring the node to perform an automatic Initial Program Load (IPL) before running the Cluster Test Tool.
For the Cluster Test Tool to accurately assess the success or failure of a CLSTRMGR_KILL
test, do not do other activities in the cluster while the Cluster Test Tool is running.
Example
CLSTRMGR_KILL, node5, Bring down node5 hard
Entrance criteria
The specified node is active.
Success indicators
The following conditions indicate success for this test:
– The cluster becomes stable.
– Cluster services stop on the specified node.
– Cluster services continue to run on other nodes.
– Resource groups that were online on the node where the Cluster Manager fails move
to other nodes.
– All resource groups on other nodes remain in the same state.
The following command generates a wait period for the Cluster Test Tool for a specified
number of seconds:
WAIT, seconds, comments
Where:
seconds Number of seconds that the Cluster Test Tool waits before
proceeding with processing
comments User-defined text to describe the configured test
Example
WAIT, 300, We need to wait for five minutes before the next test
Entrance criteria
Not applicable
Success indicators
Not applicable
It also includes a WAIT interval. The comment text at the end of the line describes the action
that the test will do.
Variables File Optional. This field contains the full path to the variables file for the
Cluster Test Tool. This file specifies the variable definitions used in
processing the test plan.
Verbose Logging When set to Yes (default), the log file includes extra information
that might help you to judge the success or failure of some tests.
Select No to decrease the amount of information logged by the
Cluster Test Tool.
Cycle Log File When set to Yes (default), uses a new log file to store output from
the Cluster Test Tool. Select No to append messages to the current
log file.
Abort On Error When set to No (default), the Cluster Test Tool continues to run
tests after some of the tests fail. This might cause subsequent tests
to fail because the cluster state differs from the state expected by
one of those tests. Select Yes to stop processing after the first test
fails.
Note: The tool stops running and issues an error if a test fails and Abort On Error is set
to Yes.
[Entry Fields]
* Test Plan [/cluster/custom] /
Variables File [/cluster/testvars] /
Verbose Logging [Yes] +
Cycle Log File [Yes] +
Abort On Error [No] +
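As an illustration only, a test plan such as /cluster/custom might contain entries like the following; the node, network, application controller, and script names are hypothetical and must match your own configuration:
NODE_UP, ALL, Start cluster services on all available nodes
FAIL_LABEL, app_serv_label, Fail the interface that hosts app_serv_label
JOIN_LABEL, app_serv_label, Recover the interface that hosts app_serv_label
NETWORK_DOWN_LOCAL, jordan, net_ether_01, Fail the local network on node jordan
NETWORK_UP_LOCAL, jordan, net_ether_01, Recover the local network on node jordan
SERVER_DOWN, ANY, app1, /app/stop/script, Recover from application failure
WAIT, 300, Wait five minutes for the cluster to settle
NODE_DOWN_GRACEFUL, jordan, Stop cluster services gracefully on node jordan
NODE_UP, jordan, Restart cluster services on node jordan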
Important: If you uninstall PowerHA, the program removes any files that you might have
customized for the Cluster Test Tool. If you want to retain these files, copy them before you
uninstall PowerHA.
Log files
If a test fails, the Cluster Test Tool collects information in the automatically created log files.
You evaluate the success or failure of tests by reviewing the contents of the Cluster Test Tool
log file, /var/hacmp/log/cl_testtool.log. PowerHA never deletes the files in this directory.
For each test plan that has any failures, the tool creates a new directory under
/var/hacmp/log/. If the test plan has no failures, the tool does not create a log directory. The
directory name is unique and consists of the name of the Cluster Test Tool plan file, and the
time stamp when the test plan was run.
Note: Detailed output from an automated cluster test is in Appendix B, “Cluster Test Tool
log” on page 591.
The tool also rotates the files: the oldest file is overwritten. If you do not want the tool to rotate
the log files, you can disable this feature from SMIT.
Highly available environments require special consideration when you plan changes to the
environment. Be sure to follow a strict change management discipline.
Before we describe cluster management in more detail, we emphasize the following general
preferred practices for cluster administration:
Where possible, use the PowerHA C-SPOC facility to make changes to the cluster.
Document routine operational procedures (for example, shutdown, startup, and increasing
the size of a file system).
Restrict access to the root password to trained PowerHA administrators. Utilize the SMUI
and create users allowed to perform specific tasks as shown at
https://youtu.be/NYBUa5bWIK4
Always take a snapshot of your existing configuration before making any changes. If
performing live changes, known as DARE, a cluster snapshot of the active configuration is
automatically created.
Monitor your cluster regularly. More information on monitoring can be found in 7.7,
“Monitoring PowerHA” on page 304.
The C-SPOC function is provided through its own set of cluster administration commands,
accessible through SMIT menus. The commands are in the /usr/es/sbin/cluster/cspoc
directory. C-SPOC uses the Cluster Communications daemon (clcomdES) to run commands
on remote nodes. If this daemon is not running, the command might not be run and C-SPOC
operation might fail.
Note: After PowerHA is installed, clstrmgrES is started from inittab, so it is always running whether cluster services are started or not.
C-SPOC operations fail if any target node is down at the time of execution or if the selected resource is not available. C-SPOC requires a correctly configured cluster in the sense that all nodes within the cluster can communicate.
If node failure occurs during a C-SPOC operation, an error is displayed to the SMIT panel and
the error output is recorded in the C-SPOC log file (cspoc.log). Check this log if any C-SPOC
problem occurs. For more information about PowerHA logs, see 7.7.5, “Log files” on
page 316.
Storage
This option contains utilities to assist with the cluster-wide administration of shared volume
groups, logical volumes, file systems, physical volumes, and mirror pools. For more details
about this topic, see 7.4.4, “C-SPOC Storage” on page 273.
PowerHA SystemMirror Services
This option contains utilities to start and stop cluster services on selected nodes and also
the function to show running cluster services on the local node. For more details, see 6.2,
“Starting and stopping the cluster” on page 193 and “Checking cluster subsystem status”
on page 308.
Communication Interfaces
This option contains utilities to manage the configuration of communication interfaces in AIX and to update PowerHA with these settings.
Resource Groups and Applications
This option contains utilities to manipulate resource groups in addition to application
monitoring and application availability measurement tools. For more information about
application monitoring, see 7.7.9, “Application monitoring” on page 325. For more
information about the application availability analysis tool, see 7.7.10, “Measuring
application availability” on page 335.
PowerHA SystemMirror logs
This option contains utilities to display the contents of some log files and change the debug level and format of log files (standard or HTML). You can also change the location of
cluster log files in this menu. For more details about these topics, see 7.7.5, “Log files” on
page 316.
File Collections
This option contains utilities to assist with file synchronization throughout the cluster. A file
collection is a user defined set of files. For more details about file collections, see 7.2,
“File collections” on page 247.
Security and Users
This option contains menus and utilities for various security settings and also users,
groups, and password management within a cluster. For more details about security, see
8.1, “Cluster security” on page 338. For details about user management, see 7.3, “User
administration” on page 254.
LDAP
This option is used to configure an LDAP server and client for the PowerHA SystemMirror cluster environment. This LDAP server is used as a central repository for implementing most of the security features.
Open a SMIT Session on a Node
This option allows you to run a remote SMIT session on another node in the cluster. However, it is directed to the standard AIX default SMIT menu and not to a PowerHA-specific or C-SPOC-specific menu.
You can add files to a file collection or remove files from it, and you can specify the frequency at which PowerHA synchronizes these files.
PowerHA retains the permissions, ownership, and time stamp of the file on the local node and
propagates this to the remote nodes. You can specify ordinary files for a file collection. You
can also specify a directory and wild card file names. You cannot add the following items:
Symbolic links
Wild card directory names
Pipes
Sockets
Device files (/dev/*)
Files from the /proc directory
ODM files from /etc/objrepos/* and /etc/es/objrepos/*
Always use full path names. Each file can be added to only one file collection, except for the files that are automatically added to the HACMP_Files collection. The files do not need to exist on the remote nodes; PowerHA creates them during the first synchronization. Zero-length or nonexistent files are not propagated from the local node.
PowerHA creates a backup copy of the modified files during synchronization on all nodes. These backups are stored in the /var/hacmp/filebackup directory. Only one previous version is retained, and you can restore it only manually.
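For example, to see which backup copies exist on a node, you can list the backup directory (a minimal sketch):
ls -l /var/hacmp/filebackup          # at most one previous version per file overwritten by synchronization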
Important: You are responsible for ensuring that files on the local node (where you start
the propagation) are the most recent and are not corrupted.
Configuration_Files
This collection contains the essential AIX configuration files:
/etc/hosts
/etc/services
/etc/snmpd.conf
/etc/snmpdv3.conf
/etc/rc.net
/etc/inetd.conf
/usr/es/sbin/cluster/netmon.cf
/usr/es/sbin/cluster/etc/clhosts
/usr/es/sbin/cluster/etc/rhosts
/usr/es/sbin/cluster/etc/clinfo.rc
You can add to or remove files from the file collections. See “Adding files to a file collection”
on page 252 for more information.
HACMP_Files
If you add any of the following user-defined files to your cluster configuration, then they are
automatically included in the HACMP_Files file collection:
Application server start script
Application server stop script
Event notify script
Pre-event script
Post-event script
Event error recovery script
Application monitor notify script
Application monitor cleanup script
Application monitor restart script
Pager text message file
HA Tape support start script
HA Tape support stop script
User-defined event recovery program
Custom snapshot method script
For an example of how this works, our cluster has an application server, app_server_1, which has the following three files:
A start script: /usr/app_scripts/app_start
A stop script: /usr/app_scripts/app_stop
A custom post-event script to the PowerHA node_up event:
/usr/app_scripts/post_node_up
These three files were automatically added to the HACMP_Files file collection when we
defined them during PowerHA configuration.
Note: You cannot manually add or remove files from this file collection, and it cannot be renamed. When using the HACMP_Files collection, be sure that all scripts work as designed on all nodes.
If you do not want to synchronize all of your user-defined scripts or if they are not the same on
all nodes, then disable this file collection and create another one, which includes only the
required files.
Example 7-1 How to list which files are included in an existing file collection
Change/Show a File Collection
[Entry Fields]
File Collection Name HACMP_Files
New File Collection Name []
File Collection Description [User-defined scripts >
+--------------------------------------------------------------------------+
| Collection files |
| |
| The value for this entry field must be in the |
| range shown below. |
| Press Enter or Cancel to return to the entry field, |
| and enter the desired value. |
| |
| /tmp/app_scripts/app_start |
| /tmp/app_scripts/app_stop |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
If both “Propagate files” options are kept as No, no automatic synchronization will occur.
[Entry Fields]
* File Collection Name [application_files]
File Collection Description [Application config fi>
Propagate files during cluster synchronization? yes +
Propagate files automatically when changes are det no +
ected?
[Entry Fields]
File Collection Name Configuration_Files
New File Collection Name []
File Collection Description [AIX and HACMP configu>
Propagate files during cluster synchronization? no +
Propagate files automatically when changes are det no +
ected?
Collection files +
[Entry Fields]
File Collection Name app_files
File Collection Description Application configura>
Propagate files during cluster synchronization? no
Propagate files automatically when changes are det no
ected?
Collection files
* New File [/usr/app/config_file]/
+--------------------------------------------------------------------------+
| Select one or more files to remove from this File Collection |
| |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| /usr/app/data.conf |
| /usr/app/app.conf |
| /usr/app/config_file |
| |
| F1=Help F2=Refresh F3=Cancel |
| F7=Select F8=Image F10=Exit |
| Enter=Do /=Find n=Find Next |
+--------------------------------------------------------------------------
Figure 7-4 Removing files from a file collection
Here are a couple of options to consider for user and password synchronization:
Using C-SPOC: PowerHA provides utilities in C-SPOC for easy user administration. See 7.3.1, “C-SPOC user and group administration” on page 254.
LDAP is the best solution for managing a large number of users in a complex environment.
LDAP can be set up to work together with PowerHA. For more information about LDAP,
see Understanding LDAP - Design and Implementation, SG24-4986.
Note: PowerHA C-SPOC does provide SMIT panels for configuring both LDAP servers
and clients. The fast path is smitty cl_ldap. However, we do not provide additional details
about that topic.
Adding a user
To add a user on all nodes in the cluster, follow these steps:
1. Start SMIT: Run smitty cspoc and then select Security and Users.
Or you can use the fast path by entering smitty cl_usergroup.
2. Select Users in a PowerHA SystemMirror cluster.
3. Select Add a User to the Cluster.
4. Select either LOCAL(FILES) or LDAP.
5. Select a resource group in which you want to create users. If the Select Nodes by
Resource Group field is kept empty, the user will be created on all nodes in the cluster. If
you select a resource group here, the user will be created only on the subset of nodes on
which that resource group is configured to run. In the case of a two-node cluster, leave this
field blank.
If you have more than two nodes in your cluster, you can create users that are related to
specific resource groups. If you want to create a user for these nodes only (for example,
user can log in to jordan and jessica, but user is not allowed to log in to harper or athena),
select the appropriate resource group name from the pick list. See Figure 7-6.
[Entry Fields]
Select nodes by Resource Group [] +
*** No selection means all nodes! ***
+-----------------------------------------------------------------------+
¦ Select nodes by Resource Group ¦
¦ *** No selection means all nodes! *** ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ bdbrg ¦
¦ concrg ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
F1¦ F8=Image F10=Exit Enter=Do ¦
F5¦ /=Find n=Find Next ¦
F9+-----------------------------------------------------------------------+
Figure 7-6 Select nodes by resource group
6. Create the user. Supply the user name and other relevant information just as you do when creating any typical user. You can specify the user ID here; however, if the user ID is already in use on a node, the command fails. If you leave the User ID field blank, the user is created with the first available ID on all nodes. See the SMIT panel in Figure 7-7 on page 256.
Note: When you create a user’s home directory and it is to reside on a shared file system, C-SPOC does not check whether the file system is mounted or whether the volume group is varied on. In this case, C-SPOC creates the user home directory under the empty mount point of the shared file system. You can correct this by moving the home directory to the shared file system.
If a user’s home directory is on a shared file system, the user can log in only on the node where the file system is mounted.
jordan root 0 /
jordan daemon 1 /etc
jordan bin 2 /bin
jordan sys 3 /usr/sys
jordan adm 4 /var/adm
jordan sshd 207 /var/empty
jordan sbodily249 /home/sbodily
jordan killer 303 /home/killer
jordan jerryc 305 /home/jerryc
jessica root 0 /
jessica daemon 1 /etc
jessica bin 2 /bin
jessica sys 3 /usr/sys
jessica adm 4 /var/adm
jessica sshd 207 /var/empty
jessica sbodily249 /home/sbodily
jessica killer 303 /home/killer
jessica jerryc 305 /home/jerryc
Removing a user
To remove a user, follow these steps:
1. Start C-SPOC Security and Users by entering smitty cl_usergroup → Users in an
PowerHA SystemMirror Cluster → Remove a User from the Cluster.
2. Select either LOCAL(FILES) or LDAP from pop-up list.
3. Select the nodes from which you want to remove the user. If you leave the Select Nodes by Resource Group field empty, the user is removed from all nodes.
If you select a resource group here, C-SPOC removes the user from only the nodes that belong to the specified resource group.
4. Enter the user name to remove or press F4 to select a user from the pick list.
5. For Remove AUTHENTICATION information, select Yes (the default) to delete the user
password and other authentication information. Select No to leave the user password in
the /etc/security/passwd file. See Figure 7-8.
[Entry Fields]
Select nodes by resource group
*** No selection means all nodes! ***
Table 7-1 is a cross-reference between resource groups, nodes, and groups. It shows
“support” present on all nodes (leave the Select Nodes by Resource Group field empty),
while groups such as dbadmin will be created only on jordan and jessica (select bdbrg in
the Select Nodes by Resource Group field).
4. Create the group. See SMIT panel in Figure 7-9 on page 260. Supply the group name,
user list, and other relevant information just as when you create any normal group. Press
F4 for the list of the available users to include in the group.
You can specify the group ID here. However, if it is already used on a node, the command
will fail. If you leave the Group ID field blank, the group will be created with the first
available ID on all cluster nodes.
[Entry Fields]
Select nodes by resource group
*** No selection means all nodes! ***
[Entry Fields]
Select nodes by resource group
Removing a group
To remove a group from a cluster, follow these steps:
1. Start C-SPOC Security and User by entering smitty cl_usergroup and selecting
Groups in a PowerHA SystemMirror Cluster → Remove a Group from the Cluster.
2. Select either LOCAL(FILES) or LDAP from pop-up menu list.
3. Select the nodes whose groups you want to change. If you leave the Select Nodes by
Resource Group option empty, C-SPOC will remove the selected group from all cluster
nodes. If you select a resource group here, C-SPOC will remove the group from only the
nodes which belong to the specified resource group. Select the group to remove.
4. Enter the name of the group that you want to remove or press F4 to select it from the pick list.
c. Select the nodes where you want to change the password utility. Leave this field blank
for all nodes. We suggest that you set up the cluster password utility on all nodes.
[Entry Fields]
* /bin/passwd utility is [Link to Cluster Passw> +
+-----------------------------------------------------------------------+
¦ /bin/passwd utility is ¦
¦ ¦
¦ Move cursor to desired item and press Enter. ¦
¦ ¦
¦ Original AIX System Command ¦
¦ Link to Cluster Password Utility ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
F1¦ F8=Image F10=Exit Enter=Do ¦
F5¦ /=Find n=Find Next ¦
F9+-----------------------------------------------------------------------+
2. Create a list of users who can change their own password from any cluster node:
a. Start C-SPOC Security and Users by entering smitty cl_usergroup and selecting
Passwords in an PowerHA SystemMirror cluster → Manage List of Users Allowed
to Change Password.
b. SMIT shows the users who are allowed to change their password cluster-wide (see
Figure 7-12).
[Entry Fields]
Users allowed to change password [logan longr toby] +
cluster-wide
Figure 7-12 Managing list of users allowed to change their password cluster-wide
c. To modify the list of the users who are allowed to change their password cluster-wide,
press F4 and select the user names from the pop-up list. Choose ALL_USERS to
enable all current and future cluster users to use C-SPOC password management. See
Figure 7-13.
We suggest that you include only real named users here, and manually change the
password for the technical users.
Manage List of Users Allowed to Change Password
[Entry Fields]
Users allowed to change password [logan longr toby] +
+-----------------------------------------------------------------------+
¦ Users allowed to change password ¦
¦ cluster-wide ¦
¦ Move cursor to desired item and press F7. ¦
¦ ONE OR MORE items can be selected. ¦
¦ Press Enter AFTER making all selections. ¦
¦ ¦
¦ ALL_USERS ¦
¦ logan ¦
¦ sbodily ¦
¦ killer ¦
¦ longr ¦
¦ toby ¦
¦ ¦
¦ F1=Help F2=Refresh F3=Cancel ¦
F1¦ F7=Select F8=Image F10=Exit ¦
F5¦ Enter=Do /=Find n=Find Next ¦
F9+-----------------------------------------------------------------------+
Note: If you enable C-SPOC password utilities for all users in the cluster, but
you have users who only exist on one node, an error message occurs similar
to this example:
# passwd shane
Changing password for "shane"
shane’s New password:
Enter the new password again:
jessica: clpasswdremote: User shane does not exist on node jessica
jessica: cl_rsh had exit code = 1, see cspoc.log or clcomd.log for
more information
4. Type the user name or press F4 to select a user from the pop-up list.
5. Set User must change password on first login to either true or false as you prefer.
See Figure 7-14.
[Entry Fields]
Selection nodes by resource group
*** No selection means all nodes! ***
Tip: You can still use the AIX passwd command to change a specific user’s password on all nodes if the linked password utility was previously enabled. Otherwise, it must be run manually on each node.
[Entry Fields]
Selection nodes by resource group [] +
*** No selection means all nodes! ***
Tip: You can use the passwd command to change your password on all nodes if the linked password utility has been enabled. Otherwise, it must be run manually on each node.
[Entry Fields]
* EFS keystore mode LDAP +
EFS admin password [redbook]
Volume group for Keystore [keyvg] +
Service IP [ashleysvcip] +
EFS Management
If you use C-SPOC to make LVM changes within a PowerHA cluster, the changes are
propagated automatically to all nodes selected for the LVM operation.
Note: Ownership and permissions on logical volume devices are reset when a volume group is exported and then reimported. After exporting and importing, a volume group is owned by root:system. Some applications that use raw logical volumes might be affected by this. Check the ownership and permissions before you export a volume group, and restore them manually afterward if they are not the default root:system.
Instead of the export and import commands, you can use the importvg -L vgname hdisk command on the remote nodes, but be aware that the -L option requires that the volume group has not been exported on the remote nodes. The importvg -L command preserves the ownership of the logical volume devices.
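For example, to refresh the local definition of a shared volume group on a remote node without exporting it first, you can run the learning import. This is a minimal sketch that uses the volume group and disk names from our example environment:
importvg -L leevg hdisk6             # refresh the ODM copy of leevg from the VGDA on hdisk6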
Lazy update
In a cluster, PowerHA controls when volume groups are activated. PowerHA implements a function called lazy update. With the LVM enhancement that added the learning flag (-L) to importvg, this mechanism became known as “better than lazy update”.
This function examines the volume group time stamp, which is maintained in both the volume group’s VGDA and the local ODM. AIX updates both of these time stamps whenever a change is made to the volume group. When PowerHA is going to vary on a volume group, it compares the copy of the time stamp in the local ODM with the one in the VGDA. If the values differ, PowerHA refreshes the local ODM information about the volume group from the information in the VGDA before it activates the volume group.
If a volume group under PowerHA control is updated directly (that is, without going through C-SPOC), the information that the other nodes hold about that volume group is updated when PowerHA brings the volume group online on those nodes, but not before. The actual operations that PowerHA performs depend on the state of the volume group at the time of activation.
Note: Use C-SPOC to make LVM changes rather than relying on lazy update. C-SPOC imports these changes to all nodes at the time the C-SPOC operation is executed, unless a node is powered off. Also consider using the C-SPOC CLI. See 7.4.6, “C-SPOC command-line interface (CLI)” on page 291 for more information.
To use this feature, run smitty sysmirror and select Cluster Applications and Resources → Resource Groups → Change/Show Resources and Attributes for a Resource Group. Then, select the resource group and set the Automatically Import Volume Groups option to true, as shown in Figure 7-18.
This operation runs after you press Enter. It also automatically switches the setting back to false, which prevents unwanted future imports until you specifically set the option again.
The following guidelines must be met for PowerHA to import available volume groups:
Logical volumes and file systems must have unique names cluster-wide.
All physical disks must be known to AIX and have appropriate PVIDs assigned.
The physical disks on which the volume group resides must be available to all of the nodes in the resource group.
To increase the size of a shared LUN allocated to your cluster, use the following steps:
1. Verify that the volume group is active in concurrent mode on each node in the cluster.
2. Increase the size of the LUNs.
3. On each node, run the cfgmgr command. If you use vSCSI, run cfgmgr on each VIOS first.
Note: This step might not be required because both VIOS and AIX are good at
automatically detecting the change. However, doing this step is a good practice.
4. Verify that the disk size is what you want by running the bootinfo -s hdisk# command.
5. Run the chvg -g vgname command on only the node that has the volume group in full
active, read/write mode.
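The following is a minimal command-line sketch of steps 3 - 5, using the disk and volume group names from the example that follows:
cfgmgr                               # run on each VIOS first (if using vSCSI), then on each AIX node
bootinfo -s hdisk6                   # confirm that the new size (in MB) is reported
chvg -g leevg                        # run only on the node with the volume group in full read/write mode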
DVE example
In this scenario, we have two disks, hdisk6 and hdisk7, that are originally 30 GB each in size
as shown in Example 7-6. They are both members of the leevg volume group.
Demonstration: See the demonstration about DVE in an active PowerHA v7.1.3 cluster:
http://youtu.be/iUB7rUG1nkw
-------------------------------
NODE jordan
-------------------------------
30624
-------------------------------
NODE jessica
-------------------------------
30624
-------------------------------
NODE jessica
-------------------------------
30624
We begin with the cluster active on both nodes and the volume group online in concurrent, albeit active/passive, mode as shown in Example 7-7. We removed the other irrelevant fields to show the differences more easily after the changes are made. Notice that the volume group is in active full read/write mode on node jessica. Also, notice that the total volume group size is approximately 122 GB.
-------------------------------
NODE jordan
-------------------------------
hdisk4 00f6f5d0166106fa leevg concurrent
hdisk5 00f6f5d0166114f3 leevg concurrent
hdisk6 00f6f5d029906df4 leevg concurrent
hdisk7 00f6f5d0596beebf leevg concurrent
-------------------------------
NODE jessica
-------------------------------
hdisk4 00f6f5d0166106fa leevg concurrent
hdisk5 00f6f5d0166114f3 leevg concurrent
hdisk6 00f6f5d029906df4 leevg concurrent
hdisk7 00f6f5d0596beebf leevg concurrent
-------------------------------
NODE jordan
-------------------------------
VOLUME GROUP: leevg VG IDENTIFIER: 0f6f5d000004c00000001466765fb16
VG STATE: active PP SIZE: 32 megabyte(s)
VG PERMISSION: passive-only TOTAL PPs: 3828 (122496 megabytes)
MAX LVs: 256 FREE PPs: 3762 (120384 megabytes)
Concurrent: Enhanced-Capable Auto-Concurrent: Disabled
VG Mode: Concurrent
-------------------------------
NODE jessica
-------------------------------
VOLUME GROUP: leevg VG IDENTIFIER: 0f6f5d000004c00000001466765fb16
VG STATE: active PP SIZE: 32 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 3828 (122496 megabytes)
MAX LVs: 256 FREE PPs: 3762 (120384 megabytes)
Concurrent: Enhanced-Capable Auto-Concurrent: Disabled
VG Mode: Concurrent
We provision more space onto the disks (LUNs) by adding 9 GB to hdisk6 and 7 GB to
hdisk7. Next, we run cfgmgr on both nodes. Then, we use bootinfo -s to verify that the new
sizes are being reported properly, as shown in Example 7-8.
-------------------------------
NODE jordan
-------------------------------
39936
-------------------------------
NODE jessica
-------------------------------
39936
-------------------------------
NODE jordan
-------------------------------
37888
-------------------------------
NODE jessica
-------------------------------
37888
Now we need to update the volume group to be aware of the new space. We do so by running
chvg -g leevg on node jessica, which has the volume group active. Then, we verify the
results of the new hdisk size and the new total space to the volume group as shown in
Example 7-9. Notice that hdisk6 is now reporting 39 GB, hdisk7 is 37 GB, and the total
volume group size is now 138 GB.
-------------------------------
NODE jordan
-------------------------------
VOLUME GROUP: leevg VG IDENTIFIER: 0f6f5d000004c00000001466765fb16
VG STATE: active PP SIZE: 32 megabyte(s)
VG PERMISSION: passive-only TOTAL PPs: 4340 (138880 megabytes)
MAX LVs: 256 FREE PPs: 4274 (136768 megabytes)
LVs: 2 USED PPs: 66 (2112 megabytes)
Concurrent: Enhanced-Capable Auto-Concurrent: Disabled
VG Mode: Concurrent
-------------------------------
NODE jessica
-------------------------------
VOLUME GROUP: leevg VG IDENTIFIER: 0f6f5d000004c00000001466765fb16
VG STATE: active PP SIZE: 32 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 4340 (138880 megabytes)
MAX LVs: 256 FREE PPs: 4274 (136768 megabytes)
LVs: 2 USED PPs: 66 (2112 megabytes)
Concurrent: Enhanced-Capable Auto-Concurrent: Disabled
VG Mode: Concurrent
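The per-node output in these examples can be collected by running the same command on every node. One convenient way, assuming the CAA clcmd utility is available, is shown in this sketch; clcmd runs the command on all cluster nodes and prefixes each node's output with a NODE header, as seen above:
clcmd lsvg leevg                     # volume group details from every node
clcmd bootinfo -s hdisk6             # reported disk size from every node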
To select the LVM C-SPOC menu for logical volume management, run smitty cspoc and then
select Storage. The following menu options are available:
Volume Groups:
– List All Volume Groups
– Create a Volume Group
– Create a Volume Group with Data Path Devices
– Set Characteristics of a Volume Group
– Import a Volume Group
– Mirror a Volume Group
– Unmirror a Volume Group
– Manage Critical Volume Groups
– Synchronize LVM Mirrors
– Synchronize a Volume Group Definition
– Remove a Volume Group
– Manage Mirror Pools for Volume Groups
Logical Volumes:
– List All Logical Volumes by Volume Group
– Add a Logical Volume
– Show Characteristics of a Logical Volume
– Set Characteristics of a Logical Volume
– Change a Logical Volume
– Remove a Logical Volume
File Systems:
– List All File Systems by Volume Group
– Add a File System
– Change / Show Characteristics of a File System
– Remove a File System
Physical Volumes
– Remove a Disk From the Cluster
– Cluster Disk Replacement
– Cluster Data Path Device Management
– List all shared Physical Volumes
– Change/Show Characteristics of a Physical Volume
– Rename a Physical Volume
– Show UUID for a Physical Volume
– Manage Mirror Pools for Volume Groups
For more details about the specific tasks, see 7.4.5, “Examples” on page 274.
7.4.5 Examples
In this section, we present some scenarios of C-SPOC storage options to administer your
cluster. We show the following examples:
Adding a scalable enhanced concurrent volume group to the existing cluster.
Adding a concurrent volume group and new concurrent resource group to existing cluster.
Creating a new logical volume.
Creating a new jfs2log logical volume.
Creating a new file system.
Extending a file system for volume groups using cross-site LVM.
Increasing the size of a file system.
Mirroring a logical volume.
In our examples, we used (2) two-node clusters based on VIO clients, one Ethernet network
using IPAT through aliasing, and of course a shared repository disk, along with other shared
volumes. The storage is Storwize V7000 presented through VIO servers. Figure 7-20 shows
our test cluster setup.
Before creating a shared VG for the cluster using C-SPOC, we check that the following conditions are true:
All disk devices are properly configured and in the available state on all cluster nodes.
Disks have a PVID.
Historically, it was a common and recommended practice to manually add PVIDs to the disks. However, PowerHA can now determine shared disks by UUID and automatically creates the PVIDs that are shown in the pick list of disks to choose from.
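A quick way to confirm on each node that the shared disks are available and have PVIDs is to list the physical volumes (a minimal sketch):
lspv                                 # disks that show "none" in the PVID column have no PVID assigned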
We add the enhanced concurrent capable volume group by using these steps:
1. Run smitty cspoc and then select Storage → Volume Groups → Create a Volume
Group.
2. Press F7, select nodes, and press Enter.
3. Press F7, select disk or disks, and press Enter.
4. Select a volume group type from the pick list.
As a result of the volume group type that we chose, we create a scalable volume group as
shown in Example 7-10 on page 276. From here, if we also want to add this new volume
group to a resource group, we can either select an existing resource group from the pick
list or we can create a new resource group. In this example we are adding to an existing
resource group.
Important: When you choose to create a new resource group from the C-SPOC
Logical Volume Management menu, the resource group will be created with the
following default policies. After the group is created, you may change the policies in the
Resource Group Configuration:
Startup: Online On Home Node Only
Fallover: Fallover To Next Priority Node In The List
Fallback: Never Fallback
After completion, the cluster must be synchronized for the change to take effect, as the output in Example 7-11 indicates.
[TOP]
harper: mkvg: This concurrent capable volume group must be varied on manually.
harper: leevg2
harper: synclvodm: No logical volumes in volume group leevg2.
harper: Volume group leevg2 has been updated.
athena: synclvodm: No logical volumes in volume group leevg2.
athena: 0516-783 importvg: This imported volume group is concurrent capable.
athena: Therefore, the volume group must be varied on manually.
athena: 0516-1804 chvg: The quorum change takes effect immediately.
athena: Volume group leevg2 has been imported.
cl_mkvg: The PowerHA SystemMirror configuration has been changed - Volume Group
leevg2 has been added. The configuration must be synchronized to make this chan
ge effective across the cluster
[MORE...3]
Before creating a shared volume group for the cluster using C-SPOC, we check that the
following conditions are true:
All disk devices are properly configured on all cluster nodes and the devices are listed as
available on all nodes.
Disks have a PVID.
We add the concurrent volume group and resource group by using these steps:
1. Run smitty cspoc and then select Storage → Volume Groups → Create a Volume
Group.
2. Press F7, select nodes, and then press Enter.
3. Press F7, select disks, and then press Enter.
4. Select a volume group type from the pick list.
As a result of the volume group type that we chose, we created a big, concurrent volume group, as displayed in Example 7-12.
Example 7-12 Create a new concurrent volume group and concurrent resource group
Create a Scalable Volume Group
Warning:
Changing the volume group major number may result
in the command being unable to execute
successfully on a node that does not have the
major number currently available. Please check
for a commonly available major number on all nodes
before changing this setting.
Example 7-13 shows the output from the command we used to create this volume group and
resource group. The cluster must now be synchronized for the resource group changes to
take effect, however, the volume group information was imported to all cluster nodes selected
for the operation immediately upon creation.
Important: When creating a new concurrent resource group from the C-SPOC Concurrent
Logical Volume Management menu, the resource group will be created with the following
default policies:
Startup: Online On All Available Nodes
Fallover: Bring Offline (On Error Node Only)
Fallback: Never Fallback
If the cluster is active at the time of creation, the cluster synchronization performs a dynamic
reconfiguration event (DARE) and brings the newly created concurrent resource group online
automatically. If other resources, like an application controller for example, are desired for the
resource group, add those manually to the resource group prior to synchronizing the cluster.
[TOP]
jordan: mkvg: This concurrent capable volume group must be varied on manually.
jordan: leeconcrg
jordan: synclvodm: No logical volumes in volume group leeconcrg.
jordan: Volume group leeconcrg has been updated.
jessica: synclvodm: No logical volumes in volume group leeconcrg.
jessica: 0516-783 importvg: This imported volume group is concurrent capable.
jessica: Therefore, the volume group must be varied on manually.
jessica: 0516-1804 chvg: The quorum change takes effect immediately.
jessica: Volume group leeconcrg has been imported.
INFO: Following default policies are used for resource group during volume group creation.
You can change the policies using modify resource group policy option.
Startup Policy as 'Online On All Available Nodes'.
Fallover Policy as 'Bring Offline (On Error Node Only)'.
Fallback Policy as 'Never Fallback'.
cl_mkvg: The PowerHA SystemMirror configuration has been changed - Resource Group bdbconcrg
has been added. The configuration must be synchronized to make this change effective
across the cluster
We add the jerryclv logical volume to the leeconcvg volume group by performing these steps:
1. Run smitty cspoc and then select Storage → Logical Volumes → Add a Logical Volume.
2. Select the volume group leeconcvg from the pick list.
3. On the subsequent panel, select devices for allocation, as shown in Example 7-14.
Method Details []
Auth Method Name [] +
Serialize I/O? no
The new logical volume, jerryclv, is created and information is propagated on the
other cluster nodes. Output from this execution is shown in Example 7-16.
jordan: jerryclv
WARNING: Encryption for volume group "leeconcvg" is enabled, but the logical volume
"jerryclv" is not encrypted.
To enable the encryption for logical volume,
You can run "clmgr modify lv jerryclv [...]" or
"use Change a Logical Volume from smitty cl_lv menu".
Note: LVM Encryption is NOT enabled by default for the logical volume, unlike when
creating the volume group.
Important: If a logical volume of type jfs2log is created, C-SPOC automatically runs the logform command so that the volume can be used. Also, although file systems are not allowed in “Online on All Available Nodes” resource groups, C-SPOC does allow the creation of a jfs2log logical volume because it is just a logical volume. However, it would never be used in that case.
Important: File systems are not allowed on volume groups that are a resource in an
“Online on All Available Nodes” type resource group.
Figure 7-21 C-SPOC creating JFS2 file system on an existing Logical Volume
Note: This JFS2 file system was created using an inline log which is the generally
recommended best practice.
2. Choose a volume group from the pop-up list (leevg, in our case).
3. Choose the type of file system (Enhanced, Standard, Compressed, or Large File Enabled). Select the previously created logical volume, jerryclv, from the pick list. Complete the necessary fields.
Example 7-17 C-SPOC creating JFS2 file system on an existing Logical Volume results
COMMAND STATUS
The /jerrycfs file system is now created. The contents of /etc/filesystems are updated on both nodes. If the resource group and volume group are online, the file system is mounted automatically after creation. Mountguard is also enabled automatically on the file system, as shown in Example 7-17 on page 281.
[Entry Fields]
Volume Group Name leevg
Resource Group Name xsitelvmRG
* LOGICAL VOLUME name xsitelv1
Reference node
* NEW TOTAL number of logical partition 2 +
copies
PHYSICAL VOLUME names
POSITION on physical volume outer_middle +
RANGE of physical volumes minimum +
MAXIMUM NUMBER of PHYSICAL VOLUMES [] #
to use for allocation
Allocate each logical partition copy yes +
on a SEPARATE physical volume?
SYNCHRONIZE the data in the new no +
logical partition copies?
[Entry Fields]
* VOLUME GROUP name leevg
Resource Group Name xsitelvmRG
Node List jessica,jordan
Reference node
PHYSICAL VOLUME names
[Entry Fields]
VOLUME GROUP name leevg
Resource Group Name xsitelvmRG
* Node List jessica,jordan
[Entry Fields]
Volume Group Name leevg
Resource Group Name xsitelvmRG
* LOGICAL VOLUME name xsitelv1
Reference node
[Entry Fields]
VOLUME GROUP name leevg
Resource Group Name xsitelvmRG
Node List jessica,jordan
Reference node jessica
PHYSICAL VOLUME names hdisk3
Important: Always add more space to a file system by adding more space to the logical volume first. Never add the extra space to the JFS first when using cross-site LVM mirroring because the mirroring might not be maintained properly.
Similar to creating a logical volume, be sure to allocate the extra space properly to maintain the mirrored copies at each site. To add more space, complete the following steps:
1. Run smitty cl_lvsc, select Increase the Size of a Shared Logical Volume, and press
Enter.
2. Choose a volume group and resource group from the pop-up list (Figure 7-24 on
page 286).
+--------------------------------------------------------------------------+
| Select the Volume Group that holds the Logical Volume to Extend |
| |
| Move cursor to desired item and press Enter. |
| |
| #Volume Group Resource Group Node List |
| leevg xsitelvmRG jordan,jessica |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
3. Then choose the logical volume from the next pop-up list (Example 7-21). A list of disks is displayed that belong to the same volume group as the logical volume previously chosen. The list is similar to the list displayed when you create a new logical volume. Press F7, choose the disks, and press Enter.
Example 7-21 Logical volume pop-up list selection
Set Characteristics of a Logical Volume
+--------------------------------------------------------------------------+
| Select the Logical Volume to Extend |
| |
| Move cursor to desired item and press Enter. Use arrow keys to scroll. |
| |
| #leevg: |
| # LV NAME TYPE LPs PPs PVs LV STATE MO |
| xsitelv1 jfs2 25 50 2 closed/syncd N/ |
| |
| F1=Help F2=Refresh F3=Cancel |
| F8=Image F10=Exit Enter=Do |
F1| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Important: Do not use the Auto-select option, which is at the top of the pop-up list.
4. After selecting the target disks, the final menu opens (shown in Figure 7-25). Set the
following options:
– RANGE of physical volumes: minimum
– Allocate each logical partition copy on a SEPARATE physical volume: superstrict
This is already set correctly if the logical volume was originally created correctly.
5. After adding extra space, verify that the partition mapping is correct by running the lslv -m lvname command again, as shown in Example 7-22. Highlighted in bold are the two new partitions just added to the logical volume.
Figure 7-26 Increase the size of Shared Enhanced Journaled File System
9. Ensure that the size of the file system matches the size of the logical volume. If you are
unsure, use the lsfs -q mountpoint command as shown in Example 7-23.
[Entry Fields]
Volume group name concdbvg
Resource Group Name newconcRG
* File system name /jerrycfs
* Node Names jordan,jessica
Remove Mount Point yes
Upon completion, the file system is removed from all nodes in the cluster. Because the volume group is a resource in the resource group and the volume group still exists, there is no need to synchronize the cluster; technically, the resources have not changed. However, if you are removing the last file system from the volume group, you might also want to delete the volume group, as described in “Removing a volume group” on page 290.
To remove a shared logical volume with C-SPOC, complete the following steps:
1. Run smitty cspoc and then select Storage → Logical Volumes → Remove a Logical
Volume.
2. Select the volume group from the pick list.
3. Select the logical volume from the pick list.
4. Confirm the options selected on the next SMIT panel, shown in Figure 7-28, and then
press Enter.
[Entry Fields]
Volume Group Name leeconcvg
Resource Group Name bdbconcrg
* LOGICAL VOLUME name jerryclv
Upon completion, the logical volume is removed from all nodes in the cluster. Because the volume group is a resource in the resource group and the volume group still exists, there is no need to synchronize the cluster; technically, the resources have not changed. However, if you are removing the last logical volume from the volume group, you might also want to delete the volume group, as described in “Removing a volume group” on page 290.
Note: If the volume group is currently a resource in a resource group, it is NOT required to remove it from the resource group first because this procedure does so automatically at the end.
4. Run smitty cspoc and then select Storage → Volume Groups → Remove a Volume
Group.
5. Choose desired volume group from pop-up pick list and press Enter.
6. Press Enter again at the ARE YOU SURE confirmation dialog. Once completed, a confirmation screen is displayed as shown in Figure 7-29 on page 291. Notice that a cluster synchronization is required if the volume group was removed from a resource group.
The CLI is oriented for root users who need to run certain tasks with shell scripts rather than
through a SMIT menu. The C-SPOC CLI commands are located in the
/usr/es/sbin/cluster/cspoc directory and they all have a name with the cli_ prefix.
Similar to the C-SPOC SMIT menus, the CLI commands log their operations in the cspoc.log
file on the node where the CLI command was run.
A list of the commands is shown in Example 7-24. Although the names are descriptive regarding what function each one offers, full descriptions of each command are available in PowerHA SystemMirror Commands.
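For example, to see which C-SPOC CLI commands are installed at your level, you can list the directory (a minimal sketch):
ls /usr/es/sbin/cluster/cspoc/cli_*  # lists the available cli_ commands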
PowerHA stores the information about all cluster resources, the cluster topology, and several other parameters in PowerHA-specific object classes in the ODM. The PowerHA ODM files must be consistent across all cluster nodes so that cluster behavior works as designed. Cluster verification checks the consistency of the PowerHA ODM files across all nodes and also verifies whether the PowerHA ODM information is consistent with the required AIX ODM information. If verification is successful, the cluster configuration can be synchronized across all the nodes. Synchronization is effective immediately in an active cluster. Cluster synchronization copies the PowerHA ODM from the local node to all remote nodes.
Note: If the cluster is not synchronized and failure of a cluster topology or resource
component occurs, the cluster may be unable to fallover as designed. PowerHA provides
the capability to run cluster verification on a regular/daily basis. More information on this
feature can be found in 7.6.6, “Running automatic corrective actions during verification” on
page 302.
When you use the Cluster Nodes and Networks path, synchronization takes place automatically after successful verification of the cluster configuration. There are no additional options in this menu. This option does NOT use the feature that automatically corrects errors that are found during verification. For information about automatically correcting errors that are found during verification, see 7.6.6, “Running automatic corrective actions during verification” on page 302.
The Custom Cluster Configuration Verification and Synchronization path parameters depend on the state of cluster services. If any node in the cluster is active, the cluster is considered active. Only if ALL nodes are inactive is the cluster considered inactive.
Figure 7-30 shows the SMIT panel that is displayed when cluster services are active.
Performing a synchronization in an active cluster is also called Dynamic Reconfiguration
(DARE).
[Entry Fields]
* Verify changes only? [No] +
* Logging [Standard] +
In an active cluster, the SMIT panel parameters are as follows and also shown in Figure 7-30:
Verify changes only:
– Select No to run the full check of topology and resources
– Select Yes to verify only the changes made to the cluster configuration (PowerHA
ODM) since the last verification.
Logging:
– Select Verbose to send full output to the console, which otherwise is directed to
the clverify.log file.
Figure 7-31 on page 294 shows the SMIT panel that is displayed when cluster services
are not active. Change the field parameters (the default option is both verification and
synchronization, as shown in the example following) and press Enter.
[Entry Fields]
* Verify, Synchronize or Both [Both] +
* Include custom verification library checks [Yes] +
* Automatically correct errors found during [No] +
verification?
Logging:
– Select Verbose to send full output to the console, which is otherwise directed to the
clverify.log file.
Note: Synchronization can be initiated on either an active or inactive cluster. If some nodes
in the cluster are inactive, synchronization can be initiated only from an active node, using
DARE (Dynamic Reconfiguration). For more information about DARE, see 7.6.3, “Dynamic
cluster reconfiguration with DARE” on page 298.
If you are using the Problem Determination Tools path, you have more options for verification,
such as defining custom verification methods. However, synchronizing the cluster from here is
not possible. The SMIT panel of the Problem Determination Tools verification path is shown in
Figure 7-32.
Note: Verification, by using the Problem Determination Tools path, can be initiated either
from active or inactive nodes.
If verification fails, correct the errors and repeat verification to ensure that the problems are
resolved as soon as possible. The messages that are output from verification indicate where
the error occurred (for example, on a node, a device, or a command). In 7.6.5, “Verification log
files” on page 301, we describe the location and purpose of the verification logs.
Verify cluster
A simple verification can be executed from any node in the cluster by executing clmgr verify
cluster. Output from its execution, even with an existing problem, is shown in Example 7-25.
Retrieving data from available cluster nodes. This could take a few minutes.
Verifying XD Solutions...
Synchronize cluster
Anytime a cluster change has been made, it is often necessary to synchronize the cluster. A cluster verification reports whether there are any existing cluster problems. Also, a check for any unsynchronized changes can be performed by executing clmgr -a UNSYNCED_CHANGES query cluster, as shown in Example 7-26 on page 297. The possible values are true or false and are self-explanatory. This command can be executed on any node in the cluster and should yield the same result regardless of which node it is executed on.
Performing a cluster verification often indicates which node the problem exists on and implies that synchronizing from the other node is needed. To perform a cluster synchronization from the command line, simply execute clmgr sync cluster, as shown in Example 7-27. Then we verify again that no unsynchronized changes exist.
Retrieving data from available cluster nodes. This could take a few minutes.
Verifying XD Solutions...
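Putting these commands together, a typical command-line check-and-synchronize sequence looks like the following sketch, using the commands shown in the preceding examples:
clmgr -a UNSYNCED_CHANGES query cluster   # true means unsynchronized changes exist
clmgr sync cluster                        # synchronize the cluster configuration
clmgr -a UNSYNCED_CHANGES query cluster   # confirm that the value is now false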
Considerations:
Be aware that when the cluster synchronization (DARE) takes place, action is taken immediately on any resource or topology component that is to be changed or removed.
Running a DARE operation on a cluster that has nodes running different versions of the PowerHA code (for example, during a cluster migration) is not supported.
You cannot perform a DARE operation while any node in the cluster is in the unmanaged state.
The following changes can be made to topology in an active cluster using DARE:
Add or remove nodes.
Add or remove network interfaces.
The following changes can be made to resources in an active cluster using DARE:
Add, remove, or change an application server.
Add, remove, or change application monitoring.
Add or remove the contents of one or more resource groups.
Add, remove, or change a tape resource.
Add or remove resource groups.
Add, remove, or change the order of participating nodes in a resource group.
Change the node relationship of the resource group.
Change resource group processing order.
Add, remove, or change the fallback timer policy that is associated with a resource group. The new fallback timer does not have any effect until the resource group is brought online on another node.
Add, remove, or change the settling time for resource groups.
Add or remove the node distribution policy for resource groups.
Add, change, or remove parent/child or location dependencies for resource groups (some
limitations apply here).
Add, change, or remove inter-site management policy for resource groups.
Add, remove, or change pre-events or post-events.
A dynamic reconfiguration can be initiated only from an active cluster node, that is, from a node that has cluster services running. The change must be made from a node that is active so that the cluster can be synchronized.
Before making changes to a cluster definition, ensure that these items are true:
The same version of PowerHA is installed on all nodes.
Some nodes are up and running PowerHA and they are able to communicate with each
other. No node should be in an UNMANAGED state.
The cluster is stable and the hacmp.out log file does not contain recent event errors or
config_too_long events.
Depending on the cluster configuration and on the specific changes that you want to make in an active cluster environment, there are many options and limitations when performing a dynamic reconfiguration event. These must all be understood, including the consequences of changing an active cluster configuration. Be sure to read the Administering PowerHA SystemMirror guide for further details before making dynamic changes in an active PowerHA environment.
Important: If you switch to multicast, the physical network must already have multicast enabled to allow its use; otherwise, this change could produce undesired results, including an outage.
1. Verify that the existing CAA communication mode is set to multicast as shown in
Example 7-28.
[Entry Fields]
* Cluster Name xsite_cluster
* Heartbeat Mechanism Unicast +
Repository Disk 00f6f5d015a4310b
Cluster Multicast Address 228.168.100.51
(Used only for multicast heartbeat)
4. Check that the new CAA communication mode is now set to unicast, as shown in
Example 7-30.
Example 7-31 shows the /var/hacmp/clverify directory contents with verification log files.
On the local node, where you initiate the cluster verification command, detailed information is
collected in the log files, which contain a record of all data collected, the tasks performed, and
any errors. These log files are written to the following directories and are used by a service
technician to determine the location of errors:
If verification succeeds: /var/hacmp/clverify/pass/nodename/
If verification fails: /var/hacmp/clverify/fail/nodename/
Notes:
To be able to run, verification requires 4 MB of free space per node in the /var file
system. Typically, the /var/hacmp/clverify/clverify.log files require an extra
1 - 2 MB of disk space. At least 42 MB of free space is suggested for a four-node
cluster.
The default log file location for most PowerHA log files is now /var/hacmp, however there
are some exceptions. For more details, see the PowerHA Administration Guide.
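For example, to check the free space in /var on a node before running verification, you can use the df command (a minimal sketch):
df -m /var                           # free space is shown in MB in the Free column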
The automatic corrective action feature can correct only some types of errors, which are
detected during the cluster verification. The following errors can be addressed:
PowerHA shared volume group time stamps are outdated on a node.
The /etc/hosts file on a node does not contain all PowerHA-managed IP addresses.
A file system is not created on a node, although disks are available.
A file system’s automatic mount option is enabled.
Disks are available, but the volume group has not been imported to a node.
Shared volume groups that are configured as part of a PowerHA resource group have their automatic varyon attribute set to Yes.
Required /etc/services entries are missing on a node.
Required PowerHA snmpd entries are missing on a node.
Required PowerHA network options setting.
Corrective actions when using IPv6.
With no prompting:
Correct error conditions that appear in /etc/hosts.
Correct error conditions that appear in /usr/es/sbin/cluster/etc/clhosts.client.
Update /etc/services with missing entries.
Update /etc/snmpd.peers and /etc/snmp.conf files with missing entries.
With prompting:
Update auto-varyon on this volume group.
Update volume group definitions for this volume group.
Keep PowerHA volume group timestamps in sync with the VGDA.
Auto-import volume groups.
Reimport volume groups with missing file systems and mount points.
File system automount flag is set in /etc/filesystems.
Set network option.
Set inoperative cluster nodes interfaces to the boot time interfaces.
For the clmgr CLI interface, the option for auto-corrective actions is an attribute simply known
as FIX. The options available for use with this attribute vary based on what exact action is
being invoked as shown in Example 7-32.
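For example, assuming that your level accepts the FIX attribute on the synchronization action (check Example 7-32 or the clmgr built-in help for the options at your level), a corrective synchronization might look like this sketch:
clmgr sync cluster FIX=yes           # request automatic corrective actions during the verification that precedes synchronization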
During automatic verification, PowerHA detects and, if cluster services are started with the auto-correct option, corrects several common configuration issues. This automatic behavior ensures that if you did not manually verify and synchronize a node in your cluster before starting cluster services, PowerHA does so for you.
Using the SMIT menus, you can set the parameters for the periodic automatic cluster verification checking utility by running smitty sysmirror and then selecting Problem Determination Tools → PowerHA SystemMirror Verification → Automatic Cluster Configuration Monitoring.
Figure 7-33 on page 304 shows the SMIT panel for setting the Automatic Cluster Configuration Monitoring parameters, which you can also reach by running smitty clautover.dialog.
[Entry Fields]
* Automatic cluster configuration verification Enabled +
Node name Default +
* HOUR (00 - 23) [00] +
Debug yes +
You can check the verification result of automatic cluster verification in the autoverify.log file
located in the default /var/hacmp/log directory. For more about general verification log files,
see 7.6.5, “Verification log files” on page 301.
As a result, it is possible that a component in the cluster has failed and that you are unaware
of the fact. The danger here is that, while PowerHA can survive one or possibly several
failures, each failure that escapes your notice threatens the cluster’s ability to provide a highly
available environment, as the redundancy of cluster components is diminished.
To avoid this situation, we suggest that you regularly check and monitor the cluster. PowerHA
offers various utilities to help you with cluster monitoring and other items:
Automatic cluster verification. See 7.6.7, “Automatic cluster verification” on page 303.
Cluster status checking utilities
Resource group information commands
Topology information commands
Log files
Error notification methods
Application monitoring
Measuring application availability
Monitoring clusters from the enterprise system administration and monitoring tools
You can use ASCII SMIT, the PowerHA SMUI, or the clmgr command line to configure and manage cluster environments.
SMUI
The PowerHA SystemMirror User Interface (SMUI) provides a single pane of glass to monitor the status of multiple clusters in the enterprise. Additional demonstrations, including installing and using the PowerHA SMUI, can be found at https://www.youtube.com/PowerHAguy
The clstat utility requires the clinfoES subsystem to be active on the nodes where the clstat command is initiated.
The clstat command is supported in two modes: ASCII mode and X Window mode. ASCII
mode can run on any physical or virtual ASCII terminal, including xterm or aixterm windows. If
the cluster node runs graphical X Window mode, clstat displays the output in a graphical
window. Before running the command, ensure that the DISPLAY variable is exported to the X
server and that X client access is allowed.
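For example, to display clstat in X Window mode on your workstation's X server, export the DISPLAY variable before starting the utility. This is a sketch with a hypothetical workstation name; adjust the host name for your environment:
export DISPLAY=mydesktop:0           # mydesktop is a hypothetical X server host
clstat                               # displays in a graphical window when an X display is available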
Consider the following information about the clstat command in the figure:
clstat -a runs the program in ASCII mode.
clstat -o runs the program once in ASCII mode and exits (useful for capturing output
from a shell script or cron job).
clstat -s displays service labels that are both up and down; otherwise, only the active service labels are displayed.
Example 7-33 shows the clstat -o command output from our test cluster.
The cldump command does not have any arguments, so you simply run cldump from the
command line.
Tip: Common issues with possible resolutions are available in the help output of both
clstat and cldump via the -h flag of each.
Access permission
Check for access permission to the PowerHA portion of the SNMP Management Information
Base (MIB) in the SNMP configuration file:
1. Find the defaultView entries in the /etc/snmpdv3.conf file, shown in Example 7-34 on
page 307.
Beginning with AIX 7.1, as a security precaution, the snmpdv3.conf file is included with the
Internet access commented out (#). The preceding example shows the unmodified
configuration file; the Internet descriptor is commented out, which means that there is no
access to most of the MIB, including the PowerHA information. Other included entries
provide access to other limited parts of the MIB. By default, in AIX 7.1 and later, the
PowerHA SNMP-based status commands do not work, unless you edit the snmpdv3.conf
file. The two ways to provide access to the PowerHA MIB are by modifying the
snmpdv3.conf file as follows:
– Uncomment (remove the number sign, #, from) the following Internet line, which will
give you access to the entire MIB:
VACM_VIEW defaultView internet - included -
– If you do not want to provide access to the entire MIB, add the following line, which
gives you access to only the PowerHA MIB:
VACM_VIEW defaultView risc6000clsmuxpd - included -
2. After editing the SNMP configuration file, stop and restart snmpd, and then refresh the
cluster manager, by using the following commands:
stopsrc -s snmpd
startsrc -s snmpd
refresh -s clstrmgrES
3. Test the SNMP-based status commands again. If the commands work, you do not need to
go through the rest of the section.
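As a hedged illustration of these steps, the following sequence grants access to only the PowerHA MIB, restarts snmpd, refreshes the cluster manager, and then uses clstat -o as a quick test. It assumes the default /etc/snmpdv3.conf location:
echo "VACM_VIEW defaultView        risc6000clsmuxpd       - included -" >> /etc/snmpdv3.conf
stopsrc -s snmpd
startsrc -s snmpd
refresh -s clstrmgrES
clstat -o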
IPv6 entries
If you use PowerHA SystemMirror 7.1.2 or later, check for the correct IPv6 entries in the
configuration files for clinfoES and snmpd. In PowerHA 7.1.2, an entry is added to the
/usr/es/sbin/cluster/etc/clhosts file to support IPv6. However, the required corresponding
entry is not added to the /etc/snmpdv3.conf file. This causes intermittent problems with the
clstat command.
b. Try the SNMP-based status commands again. If the commands work, you do not need
to go through the remainder of this section.
If you plan to use IPv6 in the future:
a. Add the following line to the /etc/snmpdv3.conf file:
COMMUNITY public public noAuthNoPriv :: 0 -
b. If you are using a different community (other than public), substitute the name of that
community for the word public.
c. After editing the SNMP configuration file, stop and restart snmpd, and then refresh the
cluster manager, by using the following commands:
stopsrc -s snmpd
startsrc -s snmpd
refresh -s clstrmgrES
d. Try the SNMP-based status commands again.
Tip: Information on how to customize SNMP from the default public community can be
found at:
https://www.ibm.com/support/pages/node/6416107
Note: After PowerHA is installed and configured, the Cluster Manager daemon
(clstrmgrES) starts automatically at boot time. The Cluster Manager must be running
before any cluster services can start on a node. Because the clstrmgrES daemon is now a
long-running process, you cannot use the lssrc -s clstrmgrES command to determine
the state of the cluster. Use the following command to check the clstrmgr state:
lssrc -ls clstrmgrES
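Because clstrmgrES is always active, the useful information is the internal state reported in the long status listing. As a hedged check (on a stable, running cluster the state is typically ST_STABLE):
lssrc -ls clstrmgrES | grep -i "Current state"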
Example 7-35 shows output of the clshowsrv -v command from our test cluster when cluster
services are running.
You can also view the clshowsrv -v output through the SMIT menus by running smitty
sysmirror and then selecting System Management (C-SPOC) → PowerHA SystemMirror
Services → Show Cluster Services.
Enterprise monitoring solutions such as IBM Tivoli Monitoring are often complex, have cost
implications, and might not provide the information you require in the format you require. An
effective alternative is to write your own custom monitoring scripts tailored for your environment.
The following tool is publicly available but is not included with the PowerHA software:
Query HA (qha)
Note: Custom examples of qha and other tools are in the Guide to IBM PowerHA
SystemMirror for AIX Version 7.1.3, SG24-8167.
Query HA (qha)
The Query HA tool, qha, was created in approximately 2001, has been updated over time, and
continues to work on levels up to version 7.2.7 (at the time of writing). It primarily provides an
in-cluster status view, which is not reliant on the SNMP protocol and clinfo infrastructure.
Query HA can also be easily customized.
Rather than reporting about whether the cluster is running or unstable, the focus is on the
internal status of the cluster manager. Although not officially documented, see Chapter 6,
“Cluster maintenance” on page 191 for a list of the internal clstrmgr states. This status
information helps you understand what is happening within the cluster, especially during
event processing (cluster changes such as start, stop, resource groups moves, application
failures, and more). When viewed next to other information, such as the running event, the
resource group status, online network interfaces, and the varied on volume groups, it provides
an excellent overall status view of the cluster. It also helps with problem determination as to
understanding PowerHA event flow during, for example, node_up or fallover events and when
searching through cluster and hacmp.out files.
Note: The qha tool is usually available for download from the following site:
https://www.cleartechnologies.net/qha
Example 7-36 shows sample status output from the qha -nevmc command.
You can also use SMIT menus to display various formats of the topology information:
Display by cluster as shown in Example 7-37:
To display the same output as shown from the default cltopinfo command, run smitty
sysmirror and then select Cluster Nodes and Networks → Manage the Cluster →
Display PowerHA SystemMirror Configuration
NODE jordan:
Network net_ether_01
hasvc 10.2.30.183
jordan 10.2.30.83
NODE jessica:
Network net_ether_01
hasvc 10.2.30.183
jessica 10.2.30.84
----------------------------------------------------------------------------
clRGinfo -p Displays the node that temporarily has the highest priority for this
instance.
clRGinfo -m Displays the status of application monitors in the cluster.
clRGinfo -i Displays any administrator directed online or offline operations.
If cluster services are not running on the local node, the command determines a node where
the cluster services are active and obtains the resource group information from the active
cluster manager.
The default locations of log files are used in this section. If you redirected any logs, check the
appropriate location.
/usr/es/sbin/cluster/ui/agent/logs/agent_deploy.log
The agent_deploy.log file contains information about the deployment configuration of the
SMUI agent on the local node.
/usr/es/sbin/cluster/ui/agent/logs/uiagent.log
The uiagent.log file contains information about the startup log of the agent on that node.
/usr/es/sbin/cluster/ui/server/logs/smui-server.log
The smui-server.log file contains information about the PowerHA SystemMirror GUI
server.
/usr/es/sbin/cluster/ui/server/logs/uiserver.log
The uiserver.log file contains information about the startup log of the SMUI server on that
node.
/var/hacmp/adm/cluster.log
The cluster.log file is the main PowerHA log file. PowerHA error messages and messages
about events related to PowerHA are appended to this log with the time and date at which
they occurred.
/var/hacmp/adm/history/cluster.mmddyyyy
The cluster.mmddyyyy file contains time-stamped, formatted messages that are
generated by PowerHA scripts. The system creates a cluster history file whenever cluster
events occur, identifying each file by the file name extension mmddyyyy, (where mm indicates
the month, dd indicates the day, and yyyy indicates the year).
/var/log/clcomd/clcomd.log clcomd.log.n (n indicates a number 1-6)
The clcomd.log file contains time-stamped, formatted messages generated by the CAA
communication daemon. This log file contains an entry for every connect request made to
another node and the return status of the request.
/var/log/clcomd/clcomddiag.log clcomddiag.log.n (n indicates a number 1-6)
The clcomddiag.log file contains time-stamped, formatted messages that are generated
by the CAA communication daemon when tracing is enabled. This log file is typically used
by IBM support personnel for troubleshooting.
/var/hacmp/clverify/clverify.log clverify.log.n (n indicates a number 1-5)
The clverify.log file contains verbose messages, output during verification. Cluster
verification consists of a series of checks performed against various PowerHA
configurations. Each check attempts to detect either a cluster consistency issue or an
error. The verification messages follow a common, standardized format, where feasible,
indicating such information as the nodes, devices, and command in which the error
occurred.
/var/hacmp/clverify/ver_collect_dlpar.log ver_collect_dlpar.log.n (n indicates a number
1-10)
The ver_collect_dlpar.log file contains information gathered for and from ROHA/DLPAR
operations and is generated from the /usr/es/sbin/cluster/diag/ver_collect_dlpar script.
/var/hacmp/clverify/ver_odmclean_dlpar.log ver_odmclean_dlpar.log.n (n indicates a
number 1-10)
The ver_odmclean_dlpar.log file contains output from a data collection cleaning request
from the execution of /usr/es/sbin/cluster/diag/ver_collect_dlpar -c.
/var/hacmp/availability/clavailability.log
The clavailability.log contains detailed information of statistics used by availability metrics
tool.
/var/hacmp/log/autoverify.log
The autoverify.log file contains logging for auto-verify and auto-synchronize.
/var/hacmp/log/async_release.log
The async_release.log is created from the asynchronous process of a DLPAR operation
during an acquire or release resource group event.
/var/hacmp/log/clavan.log
The clavan.log file keeps track of when each application that is managed by PowerHA is
started or stopped, and when the node on which an application is running stops. By
collecting the records in the clavan.log file from every node in the cluster, a utility program
can determine how long each application has been up, and also compute other statistics
describing application availability time.
/var/hacmp/log/cl_event_summaries.txt
The cl_event_summaries.txt file contains event summaries pulled from the hacmp.out log
file when the logs are cycled by the clcycle cron job. However, it is not automatically
truncated and can grow quite large over time. It can be cleared by running smitty
cm_dsp_evs and then selecting Delete Event Summary History. You might want to save a
copy of it first, which can also be done from the same initial menu.
/var/hacmp/log/clconfigassist.log
The clconfigassist.log contains detailed information generated by the Two-Node Cluster
Configuration Assistant.
/var/hacmp/log/cl2siteconfig_assist.log
The cl2siteconfig_assist.log contains detailed information generated by the Two-Site
Cluster Configuration Assistant.
/var/hacmp/log/clinfo.log clinfo.log.n (n indicates a number 1-7)
The clinfo.log file is typically installed on both client and server systems. Client systems do
not have the infrastructure to support log file cycling or redirection. The clinfo.log file
records the activity of the clinfo daemon.
/var/hacmp/log/cl_testtool.log
The cl_testtool.log file stores output from the test when you run the Cluster Test Tool from
SMIT, which also displays the status messages.
/var/hacmp/log/cloudroha.log
This log contains any cloud related authentication issues. It also has data on cloud
operations such as query, acquires, or release performed on logical partitions (LPARs) of
Power Systems Virtual Servers.
/var/hacmp/log/clpasswd.log
This contains debug information from the /usr/es/sbin/cluster/utilities/cl_chpasswdutil
utility.
/var/hacmp/log/clstrmgr.debug clstrmgr.debug.n (n indicates a number 1 - 7)
The clstrmgr.debug log file contains time-stamped, formatted messages generated by
Cluster Manager activity. This file is typically used only by IBM support personnel.
/var/hacmp/log/clstrmgr.debug.long clstrmgr.debug.long.n (n indicates a number 1 - 7)
The clstrmgr.debug.long file contains high-level logging of cluster manager activity, in
particular its interaction with other components of PowerHA and with RSCT, which event is
currently being run, and information about resource groups (for example, state and actions
to be performed, such as acquiring or releasing them during an event).
/var/hacmp/log/clutils.log
The clutils.log file contains the results of the automatic verification that runs on one
user-selectable PowerHA cluster node once every 24 hours.
When cluster verification completes on the selected cluster node, this node notifies the
other cluster nodes with the following information:
– The name of the node where verification had been run.
– The date and time of the last verification.
– Results of the verification.
The clutils.log file also contains messages about any errors found and actions taken by
PowerHA for the following utilities:
– The PowerHA File Collections utility
– The Two-Node Cluster Configuration Assistant
– The Cluster Test Tool
– The OLPW conversion tool
/var/hacmp/log/cspoc.log
The cspoc.log file contains logging of the execution of C-SPOC commands on the local
node.
/var/hacmp/log/cspoc.log.long
The cspoc.log.long file contains logging of the execution of C-SPOC commands with verbose
logging enabled.
/var/hacmp/log/cspoc.log.remote
The cspoc.log.remote file contains logging of the execution of C-SPOC commands on
remote nodes with ksh option xtrace enabled (set -x). To enable this logging, you must set
the following environment variable on the local node where the C-SPOC operation is being
run:
VERBOSE_LOGGING_REMOTE=high
This creates a log file named cspoc.log.remote on the remote node that contains set -x
output from the operations run there. This file is useful in debugging failed LVM operations
on the remote node.
/var/hacmp/log/hacmp.out hacmp.out.n (n indicates a number 1 - 7)
The hacmp.out file records the output generated by the event scripts as they run. This
information supplements and expands upon the information in the
/var/hacmp/adm/cluster.log file. To receive verbose output, set the debug level runtime
parameter to high (the default).
/var/hacmp/log/loganalyzer/loganalyzer.log
This is the output log file from the clanalyze tool.
/var/hacmp/log/migration.log
The migration.log file contains a high level of logging of cluster activity while the cluster
manager on the local node operates in a migration state.
/var/hacmp/log/clevents.log
The clevents.log file contains logging of IBM Systems Director interface.
/var/hacmp/log/clver_collect_gmvg_data.log
This log is only used with GLVM configurations to gather disk and gmvg data.
/var/hacmp/log/dnssa.log
The dnssa.log file contains logging of the Smart Assist for DNS.
/var/hacmp/log/dhcpsa.log
The dhcpsa.log file contains logging of the Smart Assist for DHCP.
/var/hacmp/log/domino_server.log
The domino_server.log file contains a logging of the Smart Assist for Domino server.
/var/hacmp/log/filenetsa.log
The filenetsa.log file contains logging of the Smart Assist for FileNet P8.
/var/hacmp/log/memory_statistics.log
This log contains output from the /usr/es/sbin/cluster/utilities/cl_memory_statistics utility to
collect the memory statistics and is invoked during cluster verification and
synchronization.
/var/hacmp/log/oraclesa.log
The oraclesa.log file contains logging of the Smart Assist for Oracle database facility.
/var/hacmp/log/oraappsa.log
The oraappsa.log file contains logging of the Smart Assist for Oracle application facility.
/var/hacmp/log/printServersa.log
The printServersa.log file contains logging of the Smart Assist for Print Subsystem facility.
/var/hacmp/log/sa.log
The sa.log file contains logging generated by application discovery of Smart Assist.
/var/hacmp/log/sapsa.log
The sapsa.log file contains logging generated by the Smart Assist for SAP NetWeaver.
/var/hacmp/log/ihssa.log
The ihssa.log file contains logging of the Smart Assist for IBM HTTP Server.
/var/hacmp/log/sax.log
The sax.log file contains logging of the IBM Systems Director Smart Assist facility.
/var/hacmp/log/tsm_admin.log
The tsm_admin.log file contains logging of the Smart Assist for Tivoli Storage Manager
admin center.
/var/hacmp/log/tsm_client.log
The tsm_client.log file contains logging of the Smart Assist for Tivoli Storage Manager
client.
/var/hacmp/log/tsm_server.log
The tsm_server.log file contains logging of the Smart Assist for Tivoli Storage Manager
server.
/var/hacmp/log/emuhacmp.log
This is a legacy log for event emulation. Although the log is still created upon
installation, the event emulation capability has long been deprecated.
/var/hacmp/log/maxdbsa.log
The maxdbsa.log file contains logging of the Smart Assist for MaxDB.
/var/hacmp/log/hswizard.log
The hswizard.log file contains logging of the Smart Assist for SAP LiveCache Hot Standby.
/var/hacmp/log/wmqsa.log
The wmqsa.log file contains logging of the Smart Assist for MQ Series.
/tmp/clconvert.log
This file contains a record of the conversion progress when upgrading PowerHA and is
created by the cl_convert utility.
For example, for a four-node cluster, you need the following amount of space in the /var
file system:
2 + (4x4) + 20 + (4x1) = 42 MB
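Generalizing this arithmetic (assuming the per-node terms in the example are 4 MB and 1 MB for each of the four nodes), an N-node cluster needs approximately 2 + (4 x N) + 20 + (1 x N) MB in /var, which gives 42 MB when N = 4.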
Some additional log files that gather debug data might require further additional space in the
/var file system. This depends on other factors such as number of shared volume groups and
file system. Cluster verification will issue a warning if not enough space is allocated to the
/var file system. As of the time of writing, with version 7.2.7, the log verification check is as
shown in Example 7-43.
The SMIT fast path is smitty clusterlog_redir.select. The default log directory is changed
for all nodes in the cluster. Synchronize the cluster after changing the log parameters.
Note: If you change the default log locations, we suggest using only local file systems rather
than shared or NFS file systems. Having logs on shared or NFS file systems can cause
problems if the file system needs to unmount during a fallover event. Redirecting logs to
shared or NFS file systems can also prevent cluster services from starting during node
reintegration.
– Last error.
– All errors.
Progress indicator for long running analysis:
• Analysis can take some time if there is a lot of data.
• Analysis process writes progress information to a file, progress indicator process
reads and displays it.
• Granularity is limited but achieves the goal of demonstrating that the process is not
hung. The progress indicator message looks like:
49% analysis is completed. 150sec elapsed.
Sorted report for time line comparison.
The tool also provides recommendations wherever possible. There are options to analyze
either a live cluster or stored log files. Additional information can be found at:
http://www.ibm.com/docs/en/powerha-aix/7.2?topic=commands-clanalyze-command
Examples
The following example shows the output from clanalyze -a -p "Nodefailure". It also shows
an example of when the tool cannot provide recommendations, as shown in Example 7-44.
Each log file is color coded for easy identification and comparison. Also, log files that have
multiple iterations (hacmp.out, hacmp.out.1, hacmp.out.2, and so on) are consolidated into
one large file, again for ease of use.
To view the logs, choose a cluster, click the Logs header, choose a specific log file, and then
choose a node; a window opens. Repeat as needed and use the search function as desired.
The SMUI allows multiple log windows to be viewed simultaneously, as shown in
Figure 7-39 on page 325.
For more information about automatic error notification, with examples of using and
configuring it, see 11.6, “Automatic error notification” on page 470.
In addition, the introduction of the Unmanaged Resource Groups option, while stopping
cluster services (which leaves the applications running without cluster services), makes
application monitors a crucial factor in maintaining application availability.
When cluster services are restarted to begin managing the resource groups again, the
process of acquiring resources will check each resource to determine if it is online. If it is
running, acquiring that resource is skipped.
For the application, for example running the server start script, this check is done by using an
application monitor. The application monitor’s returned status determines whether the
application server start script will be run.
What if no application monitor is defined? In that case, the cluster manager runs the
application server start script. This might cause problems for applications that cannot deal
with another instance being started, which could happen if the start script is run again while
the application is already running.
For each PowerHA application server configured in the cluster, you can configure up to
128 application monitors, but the total number of application monitors in a cluster cannot
exceed 128.
In long-running mode, the monitor periodically checks that the application is running
successfully. The checking frequency is set through the Monitor Interval. The checking begins
after the stabilization interval expires, the Resource Group that owns the application server is
marked online, and the cluster has stabilized.
In startup mode, PowerHA checks the process (or calls the custom monitor), at an interval
equal to one-twentieth of the stabilization interval of the startup monitor. The monitoring
continues until one of the following events occurs:
– The process is active.
– The custom monitor returns a 0.
– The stabilization interval expires.
If successful, the resource group is put into the online state; otherwise, the cleanup method is
invoked. In both modes, the monitor checks for the successful startup of the application
server and periodically checks that the application is running successfully.
2. Select from either the Configure Process Application Monitors menu or the Configure
Custom Application Monitors menu.
Tip: The SMIT fast path for application monitor configuration is smitty cm_appmon.
When PowerHA finds that the monitored application process (or processes) are terminated, it
tries to restart the application on the current node until a specified retry count is exhausted.
To add a new process application monitor using SMIT, use one of the following steps:
Run smitty sysmirror and then select Cluster Applications and Resources →
Resources → Configure User Applications (Scripts and Monitors) → Application
Monitors → Configure Process Application Monitors → Add a Process Application
Monitor.
Use the smitty cm_appmon fast path.
Figure 7-40 shows the SMIT panel with field entries for configuring an example process
application monitor.
In our example, the application monitor is called APP1_monitor and is configured to monitor
the APP1 application server. The default monitor mode, Long-running monitoring, was
selected. A stabilization interval of 120 seconds was selected.
Note: The stabilization interval is one of the most critical values in the monitor
configuration. It must be set to a value that is determined to be long enough that if it
expires, the application has definitely failed to start. If the application is in the process of a
successful start and the stabilization interval expires, cleanup will be attempted and the
resource group will be placed into ERROR state. The consequences of the cleanup
process will vary by application and the method might provide undesirable results.
The application processes being monitored are app1d and app1testd. To be monitored
through a process monitor, these processes must be present in the output of a ps -el
command when the application is running. They are owned by root, and only one instance of
each is expected to be running, as determined by the Process Owner and Instance Count values.
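A quick hedged check (app1d and app1testd are the example process names used here) is to confirm that both processes appear in the ps -el output while the application is running:
ps -el | grep -E "app1d|app1testd"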
If the application fails, the Restart Method is run to recover the application. If the application
fails to recover to a running state after the number of restart attempts exceeds the Retry
Count, the Action on Application Failure is taken. The action can be notify or fallover. If
notify is selected, no further action is taken after running the Notify Method. If fallover is
selected, the resource group containing the monitored application moves to the next available
node in the resource group.
The Cleanup Method and Restart Method define the scripts for stopping and restarting the
application after failure is detected. The default values are the start and stop scripts as
defined in the application server configuration.
To add a new custom application monitor using SMIT, use one of the following steps:
Run smitty sysmirror and then select Cluster Applications and Resources →
Resources → Configure User Applications (Scripts and Monitors) → Application
Monitors → Configure Custom Application Monitors → Add a Custom Application
Monitor.
Run smitty cm_cfg_custom_appmon fast path.
The SMIT panel and its entries for adding this method into the cluster configuration are similar
to the process application monitor add SMIT panel, as shown in Figure 7-40 on page 327.
The only fields that differ in the custom application monitor SMIT menu are as follows:
Monitor Method Defines the full path name for the script that provides a method to
check the application status. If the application is a database, this
script could connect to the database and run an SQL SELECT
statement against a specific table. If the result of the SELECT
statement is correct, the database is working normally.
Monitor Interval Defines the time (in seconds) between each occurrence of Monitor
Method being run.
Hung Monitor Signal Defines the signal that is sent to stop the Monitor Method if it
does not return within Monitor Interval seconds. The default action
is SIGKILL(9).
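As an illustration only (not taken from our test cluster), the following ksh sketch shows the general shape of a custom Monitor Method; appsrv is a hypothetical process name, and PowerHA treats an exit code of 0 as healthy:
#!/usr/bin/ksh
# Hypothetical custom Monitor Method sketch: exit 0 if the application is
# healthy, nonzero otherwise. Replace the ps check with an application-level
# probe, for example a database connection and SELECT test.
if ps -ef | grep -w appsrv | grep -v grep > /dev/null 2>&1
then
    exit 0   # application process found: report healthy
else
    exit 1   # application process not found: report failure
fi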
If only one application monitor is defined for an application server, the process is as simple as
stated previously.
If more than one application monitor is defined, the selection priority is based on the Monitor
Type (custom or process) and the Invocation (both, long-running or startup). The ranking of
the combinations of these two monitor characteristics is as follows:
– Both, Process
– Long-running, Process
– Both, Custom
– Long-running, Custom
– Startup, Process
– Startup, Custom
The highest priority application monitor found is used to test the state of the application.
When creating multiple application monitors for an application, be sure that your highest
ranking monitor according to the foregoing list returns a status that can be used by the cluster
manager to decide whether to invoke the application server start script.
When more than one application monitor meets the criteria as the highest ranking, the
selection order is not defined (qsort is used internally), although it does consistently produce
the same result.
Fortunately, there is a way to test which monitor will be used. The routine that is used by the
cluster manager to determine the highest ranking monitor is as follows:
/usr/es/sbin/cluster/utilities/cl_app_startup_monitor
An example of using this utility for the application server called testmonApp, which has three
monitors configured, is as follows:
/usr/es/sbin/cluster/utilities/cl_app_startup_monitor -s testmonApp -a
The output for this command, shown in Example 7-46 on page 330, shows three monitors:
– Mon: Custom, Long-running
– bothuser_testmon: Both, Custom
– longproctestmon: Process, Long-running
application = [testmonApp]
monitor_name = [bothuser_testmon]
resourceGroup = [NULL]
MONITOR_TYPE = [user]
PROCESSES = [NULL]
PROCESS_OWNER = [NULL]
MONITOR_METHOD = [/tmp/Bothtest]
INSTANCE_COUNT = [NULL]
MONITOR_INTERVAL = [10]
HUNG_MONITOR_SIGNAL = [9]
STABILIZATION_INTERVAL = [20]
INVOCATION = [both]
application = [testmonApp]
monitor_name = [longproctestmon]
resourceGroup = [NULL]
MONITOR_TYPE = [process]
PROCESSES = [httpd]
PROCESS_OWNER = [root]
MONITOR_METHOD = [NULL]
INSTANCE_COUNT = [4]
MONITOR_INTERVAL = [NULL]
HUNG_MONITOR_SIGNAL = [9]
STABILIZATION_INTERVAL = [60]
INVOCATION = [longrunning]
In the example, three monitors can be used for initial status checking. The highest ranking is
the long-running process monitor, longproctestmon. Recall that the Monitor Type for custom
monitors is user.
Note: A startup monitor will be used for initial application status checking only if no
long-running (or both) monitor is found.
If necessary, the application server start script is invoked. Simultaneously, all startup monitors
are invoked. Only when all the startup monitors indicate that the application has started, by
returning successful status, is the application considered online (and can lead to the resource
group going to the ONLINE state).
The stabilization interval is the timeout period for the startup monitor. If the startup monitor
fails to return a successful status, the application's resource group goes to the ERROR state.
After the startup monitor returns a successful status, there is a short time during which the
resource group state transitions to ONLINE, usually from ACQUIRING.
For each long-running monitor, the stabilization interval is allowed to elapse and then the
long-running monitor is invoked. The long-running monitor continues to run until a problem is
encountered with the application.
If the long-running monitor returns a failure status, the retry count is examined. If it is
non-zero, it is decremented, the Cleanup Method is invoked, and then the Restart Method is
invoked. If the retry count is zero, the cluster manager will process either a fallover event or a
notify event. This is determined by the Action on Application Failure setting for the monitor.
After the Restart Interval expires, the retry count is reset to the configured value.
Both /tmp/longR and /tmp/start-up methods check for /tmp/App in the process table. If
/tmp/App is found in the process table, the return code (RC) is 0; if not found, the RC is 1.
The /tmp/Stop method finds and kills the /tmp/App process in the process table to cause a
failure.
To see what happened in more detail, the failure as logged in /var/hacmp/log/hacmp.out (on
the final RC=1 from the startup monitor) is shown in Example 7-49.
2009-03-11T11:33:13.358195
2009-03-11T11:33:13.410297
Reference string: Wed.Mar.11.11:33:13.EDT.2009.start_server.testmonApp.testmon.ref
+testmon:start_server[start_and_monitor_server+110] echo ERROR: Application
Startup did not succeed.
ERROR: Application Startup did not succeed.
+testmon:start_server[start_and_monitor_server+114] echo testmonApp 1
+testmon:start_server[start_and_monitor_server+114] 1>>
/var/hacmp/log/.start_server.700610
+testmon:start_server[start_and_monitor_server+116] return 1
+testmon:start_server[+258] awk {
if ($2 == 0) {
exit 1
}
}
+testmon:start_server[+258] cat /var/hacmp/log/.start_server.700610
+testmon:start_server[+264] SUCCESS=0
+testmon:start_server[+266] [[ REAL = EMUL ]]
+testmon:start_server[+266] [[ 0 = 1 ]]
+testmon:start_server[+284] awk {
if ($2 == 1) {
exit 1
}
if ($2 == 11) {
exit 11
}
}
+testmon:start_server[+284] cat /var/hacmp/log/.start_server.700610
+testmon:start_server[+293] SUCCESS=1
+testmon:start_server[+295] [[ 1 = 0 ]]
+testmon:start_server[+299] exit 1
Mar 11 11:33:13 EVENT FAILED: 1: start_server testmonApp 1
+testmon:node_up_local_complete[+148] RC=1
+testmon:node_up_local_complete[+149] : exit status of start_server testmonApp is:
1
This can be done through the SMIT C-SPOC menus, by running smitty cspoc, selecting
Resource Group and Applications → Suspend/Resume Application Monitoring →
Suspend Application Monitoring, and then selecting the application server that is
associated with the monitor you want to suspend.
Use the same SMIT path to resume the application monitor. The output of resuming the
application monitor associated with the application server APP1 is shown in Example 7-50.
Based on the information collected by the application availability analysis tool, you can
select a measurement period, and the tool displays uptime and downtime statistics for
a specific application during that period. Using SMIT, you can display this information:
Percentage of uptime
Amount of uptime
Longest period of uptime
Percentage of downtime
Amount of downtime
Longest period of downtime
Percentage of time application monitoring was suspended
The application availability analysis tool reports application availability from the PowerHA
cluster perspective. It can analyze only those applications that were correctly configured in
the cluster configuration.
This tool shows only the statistics that reflect the availability of the PowerHA application
server, resource group, and the application monitor (if configured). It cannot measure any
internal failure in the application that can be detected by the user, if it is not detected by the
application monitor.
You can display the specific application statistics, generated from the Application Availability
Analysis tool with SMIT menus by running smitty sysmirror and then selecting System
Management (C-SPOC) → Resource Group and Applications → Application Availability
Analysis.
Figure 7-41 shows the SMIT panel displayed for the application availability analysis tool in our
test cluster. You can use the smitty cl_app_AAA.dialog fast path to get to the SMIT panel.
[Entry Fields]
* Select an Application [App1] +
* Begin analysis on YEAR (1970-2038) [2012] #
* MONTH (01-12) [03] #
* DAY (1-31) [24] #
* Begin analysis at HOUR (00-23) [16] #
* MINUTES (00-59) [20] #
* SECONDS (00-59) [00] #
* End analysis on YEAR (1970-2038) [2012] #
* MONTH (01-12) [03] #
* DAY (1-31) [24] #
* End analysis at HOUR (00-23) [17] #
* MINUTES (00-59) [42] #
* SECONDS (00-59) [00] #
Figure 7-41 Adding Application Availability Analysis SMIT panel
In the SMIT menu of the Application Availability Analysis tool, enter the selected application
server, enter start and stop time for statistics, and run the tool. Example 7-51 shows the
Application Availability Analysis tool output from our test cluster.
Log records terminated before the specified ending time was reached.
Application monitoring was suspended for 75.87% of the time period analyzed.
Application monitoring state was manually changed during the time period analyzed.
Cluster services were manually restarted during the time period analyzed.
Typically, data center clusters are deployed in trusted environments and therefore might not
need any security to protect cluster packets (which are custom to begin with and carry no
user-related data).
Additionally, in version 7, the use of the repository disk provides CAA with inherent security. The
repository disk is a shared disk across all the nodes of the CAA cluster and is used
extensively and continuously by the CAA for health monitoring and configuration purposes.
The expectation is that individual nodes have connectivity to the repository disk through the
SAN fabric and pass all security controls of the SAN fabric, regarding host access, to the disk.
Hosts can join the CAA cluster and become a member only if they have access to the shared
repository disk. As a result, any other node trying to spoof and join the cluster cannot
succeed unless it has an enabled physical connection to the repository disk.
The repository disk does not host any file system. This disk is accessed by the clustering
components in a raw format to maintain their internal data structures. These structures are
internal to the clustering software and are not published anywhere.
For these reasons, most customers might choose to deploy clusters without enabling
any encryption or decryption for the cluster. However, an administrator can choose to deploy
CAA security; the supported configuration modes are described in later sections.
All cluster utilities intended for public use have the hacmp group setgid bit turned on so that
they can read the PowerHA ODM files. The hacmp group is created during PowerHA
installation, if it does not already exist.
Message authentication and encryption rely on Cluster Security (CtSec) Services in AIX, and
use the encryption keys available from Cluster Security Services. PowerHA SystemMirror
message authentication uses message digest version 5 (MD5) to create the digital signatures
for the message digest. CAA encrypts packets exchanged between the nodes using a
symmetric key. This symmetric key can be one of the types listed in Table 8-1.
CAA exchanges the symmetric key for certain configuration methods with host-specific
certificate and private key pairs using asymmetric encryption and decryption.
The PowerHA SystemMirror product does not include encryption libraries. Before you can use
message authentication and encryption, the following AIX filesets must be installed on each
cluster node and can be found on the AIX Expansion Pack:
For data encryption with DES message authentication: rsct.crypt.des
For data encryption with Triple DES message authentication: rsct.crypt.3des
For data encryption with Advanced Encryption Standard (AES) message authentication:
rsct.crypt.aes256
If you install the AIX encryption filesets after you have PowerHA SystemMirror running, restart
the Cluster Communications daemon to enable PowerHA SystemMirror to use these filesets.
To restart the Cluster Communications daemon, execute the commands that are shown in
Example 8-1.
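Example 8-1 shows the commands. As a hedged sketch, on current AIX levels the Cluster Communications daemon is the clcomd SRC subsystem, so the restart is typically:
stopsrc -s clcomd
startsrc -s clcomd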
CAA provides these methods of security setup regarding asymmetric or symmetric keys:
Self-signed certificate-private key pair
The administrator can choose this option for easy setup. When the administrator uses this
option, CAA generates a certificate and private key pair. The asymmetric key pair
generated will be of the type RSA (1024 bits). In this case, the administrator also provides
a symmetric key algorithm to be used (key size is determined by the symmetric algorithm
selected, as shown in Table 8-1 on page 339).
User-provided certificate private key pair
With this option, administrators provide their own certificate and private key pair for each
host. The administrator must store the pair in a particular directory (same directory) on
each host in the cluster and then invoke the security configuration interface. The certificate
and private key pair must be of type RSA and be 1024 bits. The key pair must be in the
Distinguished Encoding Rules (DER) format. The user also provides the symmetric key
algorithm to be used (key size is determined by the symmetric algorithm selected, as
shown in Table 8-1 on page 339).
Fixed symmetric
For this option, administrators can choose not to set up certificate and private key pair per
node and instead provide a fixed symmetric key of their own to be used for security. In this
case, the administrator creates the key and stores the information in a directory
(/etc/security/cluster/SymKey), provides that as input to the clctrl CAA command,
and then chooses the symmetric algorithm to be used.
CAA security also supports the following levels of security. Currently these levels are not
differentiated at a fine granular level.
Medium or High All cluster packets will be encrypted and decrypted.
Low or Disable CAA security will be disabled.
Various CAA security keys are stored in the /etc/security/cluster/ directory. Default files
are as follows (the location and file names are internal to CAA and should not be assumed):
Certificate: /etc/security/cluster/cacert.der
Private key: /etc/security/cluster/cakey.der
Symmetric key: /etc/security/cluster/SymKey
Enabling security
Example 8-2 shows how to enable security by using clmgr.
Enabling security through SMIT requires more than one step as follows:
1. Run smitty cspoc and then select Security and Users → PowerHA SystemMirror
Cluster Security → Cluster Security Level.
2. Choose your preferred security level through the F4 picklist and press Enter. In our
example, we select High as shown in Figure 8-1.
[Entry Fields]
* Security Level [High] +
+--------------------------------------------------------------------------+
| Security Level |
| |
| Move cursor to desired item and press Enter. |
| |
| Disable |
| Low |
| Medium |
| High |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 8-1 SMIT setting security level to high
3. Either back up one menu or use the fast path of smitty clustsec, select Advanced
Cluster Security Configuration → Setup Node Configuration Setup, and then select
an algorithm from the F4 pick list. We selected the options shown in Figure 8-2 on
page 342.
4. Upon successful completion, verify that the settings exist on the other node, jordan, as
shown in Example 8-3 on page 342.
[Entry Fields]
* Symmetric Algorithm [AES] +
+--------------------------------------------------------------------------+
| Symmetric Algorithm |
| |
| Move cursor to desired item and press Enter. |
| |
| DES |
| 3DES |
| AES |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 8-2 Set symmetric algorithm through SMIT
Notice that the mechanism automatically defaults to self-sign. This can be set in SMIT by
running smitty clustersec and selecting Advanced Cluster Security Configuration →
Choose Security Mechanism, as shown in Figure 8-3.
[Entry Fields]
* Security Level [High] +
* Auto create & distribute Security Configuration [Self Signed Certifica> +
Certificate Location []
Private Key Location []
Figure 8-3 Set security mechanism through SMIT
Important: The settings are effective immediately and dynamically. Synchronizing the
cluster is not required.
Disabling security
Disabling security can also be done with the clmgr command or with SMIT. Example 8-4
shows disabling by using the clmgr command.
To disable through SMIT, run smitty clustersec, and select Cluster Security Level, and
then select Disable from the F4 pick list, as shown in Figure 8-4.
[Entry Fields]
* Security Level [Disable] +
Figure 8-4 Disabling security through SMIT
The -c and -f flags are optional. Example 8-5 shows two samples of the command: the first
one uses the bare minimum requirements, and the other one specifies the exact file paths.
In either case the changes are effective immediately and are automatically updated across
the cluster. There is no need for cluster synchronization.
If administrators want to replace the existing symmetric key with a new one, they can do so
by updating CAA with the new key and algorithm. When security is already enabled on the
cluster, the user requests to enable security with a different symmetric key or algorithm (the
same algorithm can also be reused). The CAA security mechanism first disables the existing
security and then enables it with the requested symmetric key and algorithm.
Note: An easy method to generate a symmetric key is to collect a random set of bytes and
store it in a file. The example shows a key being generated for AES 256 algorithm use.
To generate a 256-bit key from the random device to a symmetric key (SymKey) file, use
these steps:
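As a hedged sketch, one way to collect 32 random bytes (a 256-bit key) into the default SymKey location is with the dd command:
dd if=/dev/random of=/etc/security/cluster/SymKey bs=32 count=1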
We suggest using SSH. DLPAR operations require SSH also. SSH and Secure Sockets Layer
(SSL) together provide authentication, confidentiality, and data integrity. The SSH
authentication scheme is based on public and private key infrastructure; SSL encrypts
network traffic.
Through federated security, cluster administrators can manage roles and the
encryption of data across the cluster.
EFS
Enable EFS to encrypt cluster data at the file system level, with the EFS keystore stored
either in LDAP or in a shared file system.
LDAP
The LDAP method is used by cluster nodes to allow centralized security authentication and
access to user and group information.
The following supported LDAP servers can be configured for federated security:
IBM Tivoli Directory Server
Windows Active Directory server
All cluster nodes must be configured with the LDAP server and the client file sets. PowerHA
provides options to configure the LDAP server and client across all cluster nodes.
SSL: Secure Sockets Layer (SSL) connection is mandatory for binding LDAP clients to
servers. Remember to configure SSL in the cluster nodes.
The LDAP server and client configuration is provided through the PowerHA smitty option and
the IBM Systems Director PowerHA plug-in.
For the LDAP server and client setup, SSL must be configured. The SSL connection is
mandatory for binding LDAP clients to servers.
LDAP server: The LDAP server must be configured on all cluster nodes. If an LDAP
server exists, it can be incorporated into PowerHA for federated security usage.
Details of the LDAP server and client configuration are explained in “Configuring LDAP” on
page 349.
RBAC
Cluster administration is an important aspect of high availability operations, and security in
the cluster is an inherent part of most system administration functions. Federated security
integrates the AIX RBAC features to enhance the operational security.
During LDAP client configuration, four PowerHA defined roles are created in LDAP. These
roles can be assigned to the user to provide restricted access to the cluster functionality
based on the role.
ha_admin Provides administrator authorization for the relevant cluster
functionality. For example, taking a cluster snapshot is under
administrator authorization.
ha_op Provides operator authorization for the relevant cluster functionality.
For example, “move cluster resource group” is under operator
authorization.
ha_mon Provides monitor authorization for the relevant cluster functionality.
For example, the command clRGinfo is under monitor authorization.
ha_view Provides viewer authorization. It has all read permissions for the
cluster functionality.
Role creation: PowerHA roles are created when you configure the LDAP client in the
cluster nodes.
From the federated security perspective, the EFS keystores are stored in LDAP. There is an
option to store the keystores through a shared file system in the cluster environment if LDAP
is not configured in the cluster.
Tip: Store the EFS keystore in LDAP. As an option, if the LDAP environment is not
configured, the keystore can be stored in a Network File System (NFS) mounted file
system.
The file sets for RBAC and EFS are available by default in AIX 6.1 and later versions, and no
specific prerequisites are required. The challenge is to configure LDAP.
More information: For complete DB2 and LDAP configuration details, see this website:
https://www.ibm.com/docs/en/db2/10.5?topic=support-configuring-transparent-ldap
-aix
Configuring LDAP
Use the following steps to install and configure LDAP:
1. Install and configure DB2.
2. Install the GSkit file sets.
3. Install Tivoli Directory Server (LDAP server and client) file sets.
Installing DB2
The DB2 installation steps are shown in Example 8-7.
Ensure that the SSL file sets are configured as shown in Example 8-9.
LDAP configuration
LDAP configuration using the SMIT panel can be reached through System Management
(C-SPOC) → LDAP as shown in Figure 8-6 on page 351.
Storage
PowerHA SystemMirror Services
Communication Interfaces
Resource Group and Applications
PowerHA SystemMirror Logs
PowerHA SystemMirror File Collection Management
Security and Users
LDAP
Configure GPFS
Under the LDAP server configuration, two options are provided (Figure 8-7):
Configure a new LDAP server.
Add an existing LDAP server.
If an LDAP server is already configured, the cluster nodes can use the existing LDAP server
or configure a new LDAP server.
(Figure: in option 1, the LDAP server and LDAP client run on the cluster nodes nodeA and
nodeB; in option 2, an external LDAP server is used by the cluster nodes.)
[Entry Fields]
* Hostname(s) +
* LDAP Administrator DN [cn=admin]
* LDAP Administrator password []
Schema type rfc2307aix
* Suffix / Base DN [cn=aixdata,o=ibm]
* Server port number [636] #
* SSL Key path [] /
* SSL Key password []
* Version +
* DB2 instance password []
* Encryption seed for Key stash files []
The success of the LDAP configuration can be verified by using the ODM command that is
shown in Example 8-11.
Example 8-11 ODM command to verify LDAP configuration for federated security
# odmget -q "group=LDAPServer and name=ServerList" HACMPLDAP
HACMPLDAP:
group = "LDAPServer"
type = "IBMExisting"
name = "ServerList"
value = "selma06,selma07"
[Entry Fields]
* LDAP server(s) []
* Bind DN [cn=admin]
* Bind password []
* Suffix / Base DN [cn=aixdata,o=ibm]
* Server port number [636] #
* SSL Key path [] /
* SSL Key password []
The success of adding an existing LDAP server is verified with the ODM command that is
shown in Example 8-12.
Example 8-12 ODM command to verify the existing LDAP configuration for federated security
# odmget -q "group=LDAPServer and name=ServerList" HACMPLDAP
HACMPLDAP:
group = "LDAPServer"
type = "IBMExisting"
name = "ServerList"
value = "selma06,selma07"
[Entry Fields]
* LDAP server(s) [] +
* Bind DN [cn=admin]
* Bind password []
Authentication type ldap_auth
* Suffix / Base DN [cn=aixdata,o=ibm]
* +--------------------------------------------------------------------------+#
* | LDAP server(s) |/
* | |
| Move cursor to desired item and press F7. |
| ONE OR MORE items can be selected. |
| Press Enter AFTER making all selections. |
| |
| quimby06 |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F7=Select F8=Image F10=Exit |
F5| Enter=Do /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Figure 8-10 LDAP client configuration parameters
Verify the client configuration by using the ODM command that is shown in Example 8-13.
HACMPLDAP:
group = "LDAPClient"
type = "ITDSClinet"
name = "ServerList"
value = "selma06,selma07"
You can also verify the client configuration by checking the LDAP client daemon status, by
using the command that is shown in Example 8-14 on page 355.
Example 8-14 Verify the client daemon status after LDAP client configuration
# ps -eaf | grep secldapclntd
root 4194478 1 2 04:30:09 - 0:10 /usr/sbin/secldapclntd
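In addition to checking the process table, the AIX ls-secldapclntd command (a hedged alternative; part of the AIX LDAP client support) reports the daemon status and the configured LDAP servers:
# /usr/sbin/ls-secldapclntd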
RBAC Configuration
During the LDAP client configuration, the PowerHA defined roles are created in the LDAP
server.
Verify the configuration of the RBAC roles in the LDAP server by using the ODM command
that is shown in Example 8-15.
Example 8-15 ODM command to verify RBAC configuration into LDAP server
# odmget -q "group=LDAPClient and name=RBACConfig" HACMPLDAP
HACMPLDAP:
group = "LDAPClient"
type = "RBAC"
name = "RBACConfig"
value = "YES"
Verify the four PowerHA defined roles that are created in LDAP, as shown in Example 8-16.
Example 8-16 shows that the RBAC is configured and can be used by the cluster users and
groups. The usage scenario of roles by cluster users and groups is described in “EFS
Configuration” on page 355.
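A hedged spot check from any cluster node is to list the roles with the AIX lsrole command; after the LDAP client configuration, the four ha_ roles should be present:
# lsrole ALL | grep ha_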
EFS Configuration
The EFS management scenario is shown in the flow chart in Figure 8-12 on page 356.
To configure EFS management (Figure 8-11), run smitty sysmirror and
select System Management (C-SPOC) → Security and Users → EFS Management.
(Figure 8-12 flow chart: if LDAP is not configured, create a shared file system for the EFS
keystore and store the keystore there.)
Under EFS management, the options are provided to enable EFS and to store keystores
either in LDAP or a shared file system.
Important: Federated security mandates that the LDAP configuration creates roles and
stores EFS keystores. You can store EFS keystores under the shared file system only if
LDAP is not configured.
Important: The volume group and service IP are invalid and ignored in LDAP mode.
[Entry Fields]
* EFS keystore mode LDAP +
EFS admin password []
Volume group for EFS Keystore [] +
Service IP [] +
+--------------------------------------------------------------------------+
| EFS keystore mode |
| |
| Move cursor to desired item and press Enter. |
| |
| LDAP |
| Shared Filesystem |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
Verify the EFS enablement as understood by the cluster, by using the command that is shown
in Example 8-17.
HACMPLDAP:
group = "EFSKeyStore"
type = "EFS"
name = "mode"
value = "1"
As shown in Figure 8-13, to enable EFS and to store the EFS keystore in the shared file
system, provide the volume group and service IP details:
The volume group to store the EFS keystore in a file system.
The service IP to mount the file system where the keystore is stored so that it is highly
available to cluster nodes.
Important: The file system creation, mount point, and NFS export are performed internally
when you use the EFS keystore in a shared file system option.
Verify the configuration by using the ODM command that is shown in Example 8-18.
Example 8-18 ODM command to verify EFS configuration in shared file system mode
# odmget -q "group=EFSKeyStore AND name=mode" HACMPLDAP
HACMPLDAP:
group = "EFSKeyStore"
type = "EFS"
name = "mode"
value = "2"
Part 4
9.1 Virtualization
Virtualization is now common in the configuration of a POWER systems environment. Any
environment, whether virtual or not, requires detailed planning and documentation, enabling
administrators to effectively maintain and manage these environments.
When planning a virtual environment in which to run a PowerHA cluster, we must focus on
improving hardware concurrency at the Virtual I/O Server level and also in the PowerHA
cluster nodes. Typically, the Virtual I/O Server hosts the physical hardware being presented to
the cluster nodes, so a critical question to address is: What would happen to your cluster if
any of those devices were to fail?
The Virtual I/O Server is considered a single point of failure, so you should consider
presenting shared disk and virtual Ethernet to your cluster nodes from additional Virtual I/O
Server partitions. Figure 9-1 shows an example of considerations for PowerHA clusters in a
virtualized environment.
For more information about configuring Virtual I/O Servers, see IBM PowerVM Virtualization
Introduction and Configuration, SG24-7940.
Management of active cluster shared storage is done at the cluster node level. The Virtual
I/O Server presents this storage only to the cluster nodes.
Be sure that the reservation policy of all shared disks presented through the Virtual I/O
Server is set to no_reserve.
All volume groups that are created on VIO clients, used for PowerHA clusters, must be
enhanced concurrent-capable, whether they are to be used in concurrent mode or not.
Use of an HMC is required only if you want to use DLPAR with PowerHA.
Integrated Virtualization Manager (IVM) contains a restriction on the number of VLANs
that a Virtual I/O Server can have. The maximum number is 4 VLANs.
Several ways are available to configure AIX client partitions and resources for extra high
availability with PowerHA. We suggest that you use at least two Virtual I/O Servers for
maintenance tasks at that level. An example of a PowerHA configuration that is based on VIO
clients is shown in Figure 9-2.
If file systems are used on the standby nodes, they are not mounted until the point of fallover,
because the volume groups are in full active read/write mode only on the home node; the
standby nodes have the volume groups in passive mode, which does not allow access to the
logical volumes or file systems. If shared volumes (raw logical volumes) are accessed directly
in enhanced concurrent mode, these volumes are accessible from multiple nodes, so access
and locking must be controlled at a higher layer, such as databases.
All volume group creation and maintenance is done using the C-SPOC function of PowerHA
and the bos.clvm.enh file set must be installed.
Example configuration
The following steps describe an example of how to set up concurrent disk access for a SAN
disk that is assigned to two client partitions. Each client partition sees the disk through two
Virtual I/O Servers. On the disk, an enhanced concurrent volume group is created. This kind
of configuration can be used to build a two-node PowerHA test cluster on a single POWER
machine:
1. Create the disk on the storage device.
2. Assign the disk to the Virtual I/O Servers.
3. On the first Virtual I/O Server, do the following tasks:
a. Scan for the newly assigned disk:
$ cfgdev
b. Change the SCSI reservation of that disk to no_reserve so that the SCSI reservation
bit on that disk is not set if the disk is accessed:
$ chdev -dev hdiskN -attr reserve_policy=no_reserve
Where N is the number of the disk. Reservation commands are specific to the
multipathing disk driver in use.
c. Assign the disk to the first partition:
$ mkvdev -vdev hdiskN -vadapter vhostN [ -dev Name ]
Where N is the number of the disk; the vhost and the device name can be set to
whatever you want. The device name can also be left out entirely, in which case the
system creates a name automatically.
d. Assign the disk to the second partition:
$ mkvdev -f -vdev hdiskN -vadapter vhostN [ -dev Name ]
4. On the second Virtual I/O Server, do the following tasks:
a. Scan for the disk:
$ cfgdev
b. Change the SCSI reservation of that disk:
$ chdev -dev hdiskN -attr reserve_policy=no_reserve
c. Assign the disk to the first cluster node:
$ mkvdev -vdev hdiskN -vadapter vhostN [ -dev Name ]
d. Assign the disk to the second cluster node:
$ mkvdev -f -vdev hdiskN -vadapter vhostN [ -dev Name ]
5. On the first cluster node, do the following tasks:
a. Scan for that disk:
# cfgmgr
b. Create an enhanced concurrent-capable volume group and a file system by using
C-SPOC.
You now see the volume groups and file systems on the second cluster node.
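A hedged verification pass might look like the following, using hypothetical device names (vhost0 and hdisk4 on the Virtual I/O Servers, and hdisk1 on the cluster nodes). On each Virtual I/O Server, confirm the mapping and the reservation policy:
$ lsmap -vadapter vhost0
$ lsdev -dev hdisk4 -attr reserve_policy
On each cluster node, confirm that the shared disk is visible and that no reservation is set:
# lspv
# lsattr -El hdisk1 -a reserve_policy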
However, when using virtual Ethernet with a Shared Ethernet Adapter (SEA) backed
configuration, another option is available on the virtual interface, called poll_uplink. More
details can be found in 12.6, “Using poll_uplink” on page 498.
For cluster nodes that use virtual Ethernet adapters, multiple configurations are possible for
maintaining high availability at network layer. Consider the following factors:
Configure dual VIOS to ensure high availability of virtualized network paths.
Use the servers that are already configured with virtual Ethernet settings because no
special modification is required. For a VLAN-tagged network, the preferred solution is to
use SEA fallover; otherwise, consider using the network interface backup (NIB).
One client-side virtual Ethernet interface simplifies the configuration; however, PowerHA might miss network events. For a more comprehensive cluster configuration, configure two virtual Ethernet interfaces on the cluster LPAR. PowerHA requires two network interfaces to track network changes, just as it does with physical network cards. Be sure that the two client-side virtual Ethernet adapters use different SEAs so that any change in the physical network environment is reported to the PowerHA cluster through the virtual Ethernet adapters, as it would be in a cluster with physical network adapters.
With two such adapters for each network, dual redundancy is provided by SEA fallover and by PowerHA, and PowerHA can also track network events.
Unlike virtual Ethernet, poll_uplink does not apply to SR-IOV or VNIC devices. This is primarily because the VIOS does not own the physical adapters or ports; they are assigned to the PowerVM hypervisor. Additional information about configuring VNIC devices can be found here.
Though not required, it is generally recommended to use the netmon.cf file, as covered in 12.5, “Understanding the netmon.cf file” on page 495.
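For reference, a minimal netmon.cf sketch for a VIO client follows; the interface names and target IP addresses are examples only, and 12.5 describes the full syntax:
!REQD en0 192.168.100.1
!REQD en1 192.168.101.1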
Important: You can perform LPM on a PowerHA SystemMirror LPAR that is configured
with SAN communication. However, when you use LPM, the SAN communication is not
automatically migrated to the destination system. You must configure the SAN
communication on the destination system before you use LPM. Full details can be found
here.
PowerHA SystemMirror 7.2.x supports SAN-based heartbeat within a site. The SAN
heartbeating infrastructure can be accomplished in many ways:
Using real or physical adapters on cluster nodes and enabling the storage framework
capability (sfwcomm device) of the HBAs. Currently FC and SAS technologies are
supported. For more details about the HBAs and the steps to set up the storage
framework communication, see this IBM Knowledge Center topic.
In a virtual environment that uses NPIV or vSCSI with a Virtual I/O Server. Enabling the
sfwcomm interface requires enabling the target mode (tme) attribute on the real adapters
in the VIOS and defining a private virtual LAN (VLAN) with VLAN ID 3358 for
communication between the VIOS and the partitions that contain the sfwcomm interface
(the VLAN acts as a control channel, similar to SEA fallover). The real adapter on the VIOS
must be a supported HBA.
Using FC or SAN heartbeat requires zoning of corresponding FC adapter ports (real FC
adapters or virtual FC adapters on VIOS).
Storage zones:
– Contains the LPAR’s virtual WWPNs.
– Contains the storage controller’s WWPNs.
After the zoning is complete, the next step is to enable the target mode (tme) attribute. The
tme attribute for a supported adapter is available only when the minimum AIX level for CAA is
installed (AIX 6.1 TL6 or later, or AIX 7.1 TL0 or later). This must be done on all Virtual I/O
Servers. The configuration steps are as follows:
1. Configure the FC adapters for SAN heartbeating on VIOS:
# chdev -l fcsX -a tme=yes -P
2. Repeat step 1 for all FC adapters.
3. Set dynamic tracking to yes and FC error recovery to fast_fail:
# chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail -P
4. Reboot the VIOS.
5. Repeat steps 1 - 4 for every VIOS that serves the cluster LPARs.
6. On the HMC, create a new virtual Ethernet adapter for each cluster LPAR and VIOS. Set
the VLAN ID to 3358. Do not put other VLAN ID or any other traffic on this interface. Save
the LPAR profile.
7. On the VIOS, run the cfgmgr command and check for the virtual Ethernet and sfwcomm
device by using the lsdev command:
# lsdev -C | grep sfwcomm
sfwcomm0 Available 01-00-02-FF Fibre Channel Storage Framework Communication
sfwcomm1 Available 01-01-02-FF Fibre Channel Storage Framework Communication
8. On the cluster nodes, run the cfgmgr command, and check for the virtual Ethernet adapter
and sfwcomm with the lsdev command.
9. No other configuration is required at the PowerHA level. When the cluster is configured
and running, you can check the status of the SAN heartbeat by using the lscluster -i
command and looking for the sfwcom interface.
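In addition, after the VIOS reboot in step 4, you can confirm that the adapter attributes took effect (a quick check; the adapter numbers are examples). From the padmin shell, oem_setup_env provides a root shell for the lsattr commands:
$ oem_setup_env
# lsattr -El fcs0 -a tme
# lsattr -El fscsi0 -a dyntrk -a fc_err_recov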
We expect that proper LPAR and DLPAR planning is part of your overall process before
implementing any similar configuration. Understanding the following topics is important:
The requirements and how to implement them.
The overall effects that each decision has on the overall implementation.
By integrating with DLPAR and CoD resources, PowerHA SystemMirror ensures that each
node can support the application with reasonable performance at a minimum cost. This
allows for flexibility in tuning the resource capacity of the LPAR by using the On/Off CoD
function to upgrade the capacity when your application requires more resources, without
having to pay for idle capacity until you need it. Using the Enterprise Pool CoD (EPCoD)
allows for sharing resources across systems in the same enterprise pool whenever needed.
Table 9-1 displays all of the available types of CoD offerings. Only two of them are
dynamically managed and controlled by PowerHA SystemMirror: EPCoD and On/Off CoD.
For example, Utility CoD (a temporary offering for memory and processors) is handled
automatically at the PHYP/system level, so PowerHA plays no role in it.
9.3.2 Planning
To prepare for ROHA configuration, the following cluster information needs to be reviewed:
LPAR resources information and resource group policies information.
– The amount of memory and CPU resources the applications supported by your cluster
require when running on their regular hosting nodes. Under normal running conditions,
check how much memory and what number of CPUs each application uses to run with
optimum performance on the LPAR node on which its resource group resides
(normally, home node for the resource group).
– Determine the startup, fallover, and fallback policies of the resource group that contains
the application controller by using the clRGinfo command. This identifies the LPAR
node to which the resource group will fall over, in case of a failure.
– The amount of memory and CPUs allocated to the LPAR node on which the resource
group will fall over, in case of a failure. This LPAR node is referred to as a standby
node. With these numbers in mind, consider whether the application's performance on
the standby node will be impaired, if the application is running with fewer resources.
– Check on the existing values for the LPAR minimums, maximums, and desired
amounts (resources and memory) specified by using the lshwres or pvmctl command
on the standby node.
In PowerHA SystemMirror Version 7.2.1 SP1 or later, ROHA supports resources that can
be released asynchronously if the source node and the target node are located on
different systems.
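The LPAR profile minimum, desired, and maximum values that are mentioned in this planning list can be checked from the HMC command line with lshwres; the following is a sketch in which the managed system and LPAR names are placeholders:
lshwres -r mem -m <managed_system> --level lpar --filter "lpar_names=<standby_lpar>"
lshwres -r proc -m <managed_system> --level lpar --filter "lpar_names=<standby_lpar>"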
On the HMC, enable remote command execution: in the Remote Command Execution
window, select Enable remote command execution using the ssh facility. The AIX
operating system must have SSH installed to generate the public and private keys.
Note: PowerHA SystemMirror uses the root user on the cluster node to issue the SSH
commands to the HMC. On the HMC system, the commands are run as the hscroot
user.
Note: When using EPCoD or On/Off CoD, PowerHA SystemMirror ROHA does not release
more resources than it previously acquired.
Ensure that all nodes and the HMC are configured identically by checking the following list. All
systems (all PowerHA nodes and the HMC) must meet these requirements:
Resolve the participating host names and IP addresses identically. This includes reverse
name resolution.
Use the same type of name resolution, either short or long name resolution. All systems
should use the same name resolution order, either local or remote.
To ensure these requirements, check the following files:
– /etc/hosts on all systems
– /etc/netsvc.conf on all AIX nodes
– /etc/host.conf, if applicable, on the HMC
We expect that knowing how to check these files on the AIX systems is common knowledge.
However, what is not as well known is how to check the files on the HMC which is covered in
the following sections.
With different versions of SSH and HMC code, these steps might differ slightly. We
documented the processes which we used to successfully implement our environment.
You can now install the SSH file sets by using smitty install_all. The core file sets that are
required and the results of our installation are shown in Example 9-1.
Tip: Be sure to choose yes in the field to accept the license agreement.
Now that SSH is installed, we need to configure the PowerHA nodes to access the HMC
without passwords for remote DLPAR operations.
Note: PowerHA SystemMirror Version 7.2.5 for AIX or later supports a non-root user to
connect with the HMC by using SSH communications. PowerHA SystemMirror Version
7.2.4 for AIX and earlier supports only hscroot as the HMC user name. The following
configuration steps assume that hscroot is being used.
You must create the SSH directory $HOME/.ssh for the root user to store the authentication
keys. PowerHA runs the SSH remote DLPAR operations as the root user. By default this is
/.ssh. This is what we used in the following steps:
To generate public and private keys, run the following command on each PowerHA node:
/usr/bin/ssh-keygen -t rsa
This will create the following files in /.ssh:
private key: id_rsa
public key: id_rsa.pub
The write bits for both group and other are turned off. Ensure that the private key has a
permission of 600.
The HMC's host key must be in the known_hosts file on each PowerHA node, and the
public key of each node must be in the authorized_keys2 file on the HMC.
The first part is easily accomplished by running ssh to the HMC from each PowerHA node.
The first time that you run it, you are prompted to add the HMC key to the known_hosts
file. Answer yes to continue.
You are then prompted to enter a password, which is necessary now because we have not
completed the setup yet to allow non-password SSH access, as shown in Example 9-2.
When using two HMCs, you must repeat this process for each HMC. You should also do this
between all member nodes to allow SSH based operations between them – scp, sftp, and
ssh.
To allow non-password SSH access, we put each PowerHA node's public key into the
authorized_keys2 file on the HMC. This can be done in more than one way; you can consult
the HMC documentation for information about using mkauthkeys. However, here is an
overview of the steps that we used:
Copy (scp) the authorized_keys2 file from the HMC to the local node.
Concatenate (cat) the public key for each node into the authorized_keys2 file.
Repeat on each node.
Copy (scp) the concatenated file to the HMC /home/hscroot/.ssh.
1. In the /.ssh directory, we first copied the authorized_keys2 file from the HMC to the local node by running this command:
scp hscroot@itsohmc:~/.ssh/authorized_keys2 ./authorized_keys2.hmc
2. Next, from /.ssh on each AIX LPAR, we made a copy of the public key and renamed it to
include the local node name as part of the file name. We then copied, through scp, the
public key of each machine (jessica and shanley) to one node (cassidy).
We then ran the cat command to create an authorized_keys2 file that contains the public
key information for all PowerHA nodes. Then we used scp to copy the combined file to the
HMC. The commands that were run on each node are shown in Example 9-4.
As you can see in Example 9-4 on page 375, when running the scp command to the HMC,
you are prompted to enter the password for the hscroot user in order to copy the combined
authorized_key2 file to the HMC.
3. You can then test whether the no-password access is working from each node by using
the ssh command, as shown in Example 9-2 on page 374. However, this time, you should
arrive at the HMC shell prompt, as shown in Example 9-5.
When each node can ssh to the HMC without a password, this step is complete and
PowerHA verification of the HMC communications will succeed.
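A simple way to confirm this (a sketch that uses the itsohmc HMC name from our environment) is to run a harmless HMC query over SSH from each node; it should return without prompting for a password:
# ssh hscroot@itsohmc lshmc -V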
Figure 9-4 shows a summary of the SMIT menu navigation for all ROHA panels.
HMC configuration
To define the HMC configuration to the cluster perform the following steps:
1. Enter smit sysmirror → Cluster Applications and Resources → Resources →
Configure User Applications (Scripts and Monitors) → Resource Optimized High
Availability → HMC Configuration.
2. The next panel is a menu panel with a title line and seven menu options as shown in
Figure 9-5. Its fast path is smitty cm_cfg_hmc.
HMC Configuration
Table 9-2 shows the help information for the HMC Configuration menu.
Change/Show HMC Definition (# smitty cm_cfg_ch_hmc): Select this option to modify or view an HMC host name and communication parameters.
Remove HMC Definition (# smitty cm_cfg_rm_hmc): Select this option to remove an HMC, and then remove it from the default list.
Change/Show HMC List for a Node (# smitty cm_cfg_hmcs_node): Select this option to modify or view the list of HMCs for a node.
Change/Show Default HMC Tunables (# smitty cm_cfg_def_hmc_tun): Select this option to modify or view the HMC default communication tunables.
Change/Show Default HMC List (# smitty cm_cfg_def_hmcs): Select this option to modify or view the default HMC list that is used by default by all nodes of the cluster. Nodes that define their own HMC list do not use this default HMC list.
Note: Before you add an HMC, you must set up password-less communication from AIX
nodes to the HMC. For more information, see “HMC configuration” on page 377.
To add an HMC, select Add HMC Definition. The next panel shown is a dialog panel with a
title dialog header and several dialog command options. Its fast path is smitty
cm_cfg_add_hmc. Each item has a context-sensitive help window that you can access by
pressing F1. You also can see any associated list for each item by pressing F4.
Figure 9-6 shows the menu to add the HMC definition and its entry fields.
[Entry Fields]
* HMC name [] +
If the DNS is configured in your environment and can resolve the HMC IP and host name,
then you can press F4 to select an HMC to be added.
Figure 9-7 shows an example of selecting one HMC from the list to perform the add
operation.
HMC name
e16hmc1 is 9.3.207.130
e16hmc3 is 9.3.207.133
Table 9-3 on page 379 shows the help panel describing all of the options that are available
when adding the HMC definition.
PowerHA SystemMirror also supports entering the HMC IP address to add the HMC.
Figure 9-8 on page 379 shows an example of entering an HMC IP address to add the HMC.
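If you prefer the command line, the clmgr interface can also manage HMC definitions. The following is a minimal sketch (the HMC name is from our environment, and the supported attributes vary by PowerHA level):
# clmgr add hmc e16hmc1
# clmgr query hmc e16hmc1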
Table 9-3 Context-sensitive help and associated list for the Add HMC Definition menu
HMC name
Context-sensitive help (F1): Enter the host name for the HMC. An IP address is also accepted here. Both IPv4 and IPv6 addresses are supported.
Associated list (F4): Yes (single-selection). The list is obtained by running the following command: /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc -a IP
Number of retries
Context-sensitive help (F1): Enter the number of times one HMC command is retried before the HMC is considered as non-responding. The next HMC in the list is used after this number of retries fails. Setting no value means that you use the default value, which is defined in the Change/Show Default HMC Tunables panel. When -1 is displayed in this field, it indicates that the default value is used.
Associated list (F4): None.
Nodes
Context-sensitive help (F1): Enter the list of nodes that use this HMC.
Associated list (F4): Yes (multiple-selection). A list of nodes to be proposed can be obtained by running the following command: odmget HACMPnode
Sites
Context-sensitive help (F1): Enter the sites that use this HMC. All nodes of the sites then use this HMC by default, unless the node defines an HMC at its own level.
Associated list (F4): Yes (multiple-selection). A list of sites to be proposed can be obtained by running the following command: odmget HACMPsite
Check connectivity between the HMC and nodes
Context-sensitive help (F1): Select Yes to check communication links between the nodes and the HMC.
Associated list (F4): <Yes>|<No>. The default is Yes.
| HMC name
|
| Move cursor to desired item and press Enter.
|
| e16hmc1
| e16hmc3
|
| F1=Help F2=Refresh F3=Cancel
| Esc+8=Image Esc+0=Exit Enter=Do
| /=Find n=Find Next
Figure 9-9 Selecting an HMC from a list to change or show an HMC configuration
To modify an existing HMC, select it and press Enter. The next panel is the one that is shown
in Figure 9-10. Note that the HMC name cannot be changed. To change the name, the HMC
definition must be deleted and re-added.
[Entry Fields]
* HMC name e16hmc1
HMC name
e16hmc1
e16hmc3
The next panel is shown in Figure 9-12. Press Enter to remove the HMC definition.
[Entry Fields]
* HMC name e16hmc1
Select a Node
ITSO_rar1m3_Node1
ITSO_r1r9m1_Node1
To modify an existing node, select it and press Enter. The next panel (Figure 9-14) is a dialog
panel with a title dialog header and two dialog command options.
[Entry Fields]
* Node name ITSO_rar1m3_Node1
HMC list [e16hmc1 e16hmc3]
Note that you cannot add or remove an HMC from this list. You can only order the list of the
HMCs that are used by the node in the correct precedence order.
Table 9-4 shows the help information for the Change/Show HMC List for a Node menu.
Table 9-4 Context-sensitive help for the Change/Show HMC List for a Node menu
Name and fast path Context-sensitive help (F1)
Node name This is the node name to associate with one or more HMCs.
HMC list The precedence order of the HMCs that are used by this node. The first in
the list is tried first, then the second, and so on. You cannot add or remove
any HMC. You can modify only the order of the already set HMCs.
Select a Site
site1
site2
To modify an existing site, select it and press Enter. The next panel (Figure 9-16) is a dialog
panel with a title dialog header and two dialog command options.
[Entry Fields]
* Site Name site1
HMC list [e16hmc1 e16hmc3]
Figure 9-16 Change/Show HMC List for a Site menu
Again, you cannot add or remove an HMC from the list. You can only reorder the HMCs that
are used by the site. Table 9-5 shows the help information for the Change/Show HMC List
for a Site menu.
Site name This is the site name to associate with one or more HMCs.
HMC list The precedence order of the HMCs that are used by this site. The first in the list is
tried first, then the second, and so on. You cannot add or remove any HMC. You can
modify only the order of the already set HMCs.
[Entry Fields]
Resources Optimized High Availability management No +
can take advantage of On/Off CoD resources.
On/Off CoD use would incur additional costs.
Do you agree to use On/Off CoD and be billed
for extra costs?
App1
App2
To add hardware resource provisioning for an application controller, the list displays only
application controllers that do not already have hardware resource provisioning, as shown in
Figure 9-22.
To modify an existing application controller, select it and press Enter. Each item has a
context-sensitive help window that you can access by pressing F1. Any existing selectable
items can be listed by pressing F4 in the desired field.
[Entry Fields]
* Application Controller Name App1
To modify or remove hardware resource provisioning for an application controller, the popup
picklist displays only application controllers that already have hardware resource provisioning.
Application Controller Name: This is the application controller for which you configure DLPAR and CoD resource provisioning.
Use desired level from the LPAR profile: There is no default value. You must make one of the following choices:
Enter Yes if you want the LPAR hosting your node to reach only the level of resources that is indicated by the desired level of the LPAR's profile. By selecting Yes, you trust the desired level of the LPAR profile to fit the needs of your application controller.
Enter No if you prefer to enter exact optimal values for memory, processor (CPU), or both. These optimal values match the needs of your application controller, and you have better control of the level of resources that are allocated to your application controller.
Enter nothing if you do not need to provision any resources for your application controller.
For all application controllers that have this tunable set to Yes, the allocation that is performed lets the LPAR reach the desired value of the profile. Suppose that you have a mixed configuration, in which some application controllers have this tunable set to Yes, and other application controllers have this tunable set to No with some optimal level of resources specified. In this case, the allocation that is performed lets the LPAR reach the desired value of the profile added to the optimal values.
Optimal number of gigabytes of memory: Enter the amount of memory that PowerHA SystemMirror attempts to acquire for the node before starting this application controller. This value can be set only if Use desired level from the LPAR profile is set to No. Enter the value in multiples of ¼, ½, ¾, or 1 GB. For example, 1 represents 1 GB or 1024 MB, 1.25 represents 1.25 GB or 1280 MB, 1.50 represents 1.50 GB or 1536 MB, and 1.75 represents 1.75 GB or 1792 MB. If this amount of memory is not satisfied, PowerHA SystemMirror takes resource group (RG) recovery actions to move the RG with this application to another node. Alternatively, PowerHA SystemMirror can allocate less memory, depending on the Start RG even if resources are insufficient cluster tunable.
Optimal number of dedicated processors: Enter the number of processors that PowerHA SystemMirror attempts to allocate to the node before starting this application controller. This attribute is only for nodes running on an LPAR with Dedicated Processing Mode. This value can be set only if Use desired level from the LPAR profile is set to No. If this number of CPUs is not satisfied, PowerHA SystemMirror takes RG recovery actions to move the RG with this application to another node. Alternatively, PowerHA SystemMirror can allocate fewer CPUs, depending on the Start RG even if resources are insufficient cluster tunable.
Optimal number of processing units: Enter the number of processing units that PowerHA SystemMirror attempts to allocate to the node before starting this application controller. This attribute is only for nodes running on an LPAR with Shared Processing Mode. This value can be set only if Use desired level from the LPAR profile is set to No. Processing units are specified as a decimal number with two decimal places, 0.01 - 255.99. This value is used only on nodes that support allocation of processing units. If this number of CPUs is not satisfied, PowerHA SystemMirror takes RG recovery actions to move the RG with this application to another node. Alternatively, PowerHA SystemMirror can allocate fewer CPUs, depending on the Start RG even if resources are insufficient cluster tunable.
Optimal number of virtual processors: Enter the number of virtual processors that PowerHA SystemMirror attempts to allocate to the node before starting this application controller. This attribute is only for nodes running on an LPAR with Shared Processing Mode. This value can be set only if Use desired level from the LPAR profile is set to No. If this number of virtual processors is not satisfied, PowerHA SystemMirror takes RG recovery actions to move the RG with this application to another node. Alternatively, PowerHA SystemMirror can allocate fewer CPUs, depending on the Start RG even if resources are insufficient cluster tunable.
If Use desired level from the LPAR profile is set to No, then at least the memory (Optimal
number of gigabytes of memory) or CPU (Optimal number of dedicated or virtual
processors) setting is mandatory.
[Entry Fields]
Dynamic LPAR
Always Start Resource Groups Yes +
Adjust Shared Processor Pool size if required No +
Force synchronous release of DLPAR resources No +
Enterprise Pool
Resource Allocation order Free Pool First +
On/Off CoD
I agree to use On/Off CoD and be billed for No +
extra costs
Number of activating days for On/Off CoD requests [30] #
Table 9-7 Context-sensitive help for the Change/Show Default Cluster Tunables menu
Always Start Resource Groups: Enter Yes to have PowerHA SystemMirror start RGs even if there are errors during ROHA resource activation. Errors can occur when the total requested resources exceed the LPAR profile's maximum or the combined available resources, or if there is a total loss of HMC connectivity. In that case, a best-can-do allocation is performed. Enter No to prevent starting RGs if any errors occur during ROHA resource acquisition. The default is Yes.
Adjust Shared Processor Pool size if required: Enter Yes to authorize PowerHA SystemMirror to dynamically change the user-defined Shared Processor Pool boundaries, if necessary. This change can occur only at takeover, and only if CoD resources are activated for the CEC, so that changing the maximum size of a particular Shared Processor Pool is not done to the detriment of other Shared Processor Pools. The default is No.
Force synchronous release of DLPAR resources: Enter Yes to have PowerHA SystemMirror release CPU and memory resources synchronously, for example, if the client must free resources on one side before they can be used on the other side. By default, PowerHA SystemMirror automatically detects the resource release mode by checking whether the active and backup nodes are on the same or different CECs. A best practice is to use asynchronous release so that the takeover is not delayed. The default is No.
I agree to use On/Off CoD and be billed for extra costs: Enter Yes to have PowerHA SystemMirror use On/Off CoD to obtain enough resources to fulfill the optimal amount that is requested. Using On/Off CoD requires an activation code to be entered on the HMC and can result in extra costs due to the usage of the On/Off CoD license. The default is No.
Number of activating days for On/Off CoD requests: Enter the number of activating days for On/Off CoD requests. If the requested resources are not available for this duration, a longest-can-do allocation is performed: PowerHA tries to allocate the requested amount of resources for the longest possible duration by considering the overall available resources, which is the sum of the On/Off CoD resources that are already activated but not yet used and the On/Off CoD resources that are not yet activated. The default is 30.
The verification tool can be used to ensure that your environment is correct for a ROHA
setup. Discrepancies are called out by PowerHA SystemMirror, and the tool assists in
correcting the configuration if possible.
The user is actively notified of critical errors. A distinction can be made between errors that
are raised during configuration and errors that are raised during cluster synchronization.
As a general principle, any problems that are detected at configuration time are presented as
warnings instead of errors. Another general principle is that PowerHA SystemMirror checks
only what is being configured at configuration time and not the whole configuration. PowerHA
SystemMirror checks the whole configuration at verification time.
For example, when adding an HMC, you check only the new HMC (verify that it is pingable, at
an appropriate software level, and so on) and not all of the HMCs. Checking the whole
configuration can take some time and is done at verify and sync time rather than each
individual configuration step.
General verification
Table 9-8 shows the general verification list.
Each check is listed with the severity that is reported at configuration time and at verification time:
Check that all RG active and standby nodes are on different CECs, which enables the asynchronous mode of releasing resources (Info / Warning).
This code cannot run on an IBM POWER4 processor-based system (Error / Error).
Only one HMC with password-less SSH communication exists per node (Warning / Warning).
Check that all HMCs share the same level (the same version of HMC) (Warning / Warning).
Check that all HMCs administer the CEC hosting the current node. Configure two HMCs administering the CEC hosting the current node; if not, PowerHA gives a warning message (Warning / Warning).
Check whether the HMC level supports FSP Lock Queuing (Info / Info).
CoD verification
Table 9-10 shows the CoD verification.
Determine which HMC is the master, and which HMC is the non-master (Info / Info).
Check that the nodes of the cluster are on different pools, which enables the asynchronous mode of releasing resources (Info / Info).
Check that all HMCs are at level 7.8 or later (Info / Warning).
Check that for one given node, the total of optimal memory (of RGs on this node) that is added to the profile's minimum does not exceed the profile's maximum (Warning / Error).
Check that for one given node, the total of optimal CPU (of RGs on this node) that is added to the profile's minimum does not exceed the profile's maximum (Warning / Error).
Check that for one given node, the total of optimal PU (of RGs on this node) that is added to the profile's minimum does not exceed the profile's maximum (Warning / Error).
Check that the total processing units do not break the minimum processing units per virtual processor ratio (Error / Error).
Example 9-6 shows an error message that gives good probable causes for a problem.
However, the following two actions can help you to discover the source of the problem:
Ping the HMC IP address.
Use the ssh hscroot@hmcip command to the HMC.
If ssh is unsuccessful or prompts for a password, this is an indication that SSH was not
correctly configured.
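For example, using the e16hmc1 address from our environment (a sketch; substitute your own HMC address):
# ping -c 2 9.3.207.130
# ssh hscroot@9.3.207.130 lssyscfg -r sys -F name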
If the message in Example 9-7 on page 391 appears by itself, it is normally an indication that
access to the HMC is working, however the particular node’s matching LPAR definition is not
reporting that it is DLPAR-capable.
This might be caused by RMC not updating properly. Generally, this is rare, and usually
applies only to POWER4 systems. You can verify this manually from the HMC command line,
as shown in Example 9-8.
Note: The HMC command syntax can vary by HMC code levels and type.
Also, be sure that RMC communication to the HMC (port 657) is working, and restart the
RSCT daemons on the partitions by running these commands in this order on the cluster node:
1. /usr/sbin/rsct/install/bin/recfgct
2. /usr/sbin/rsct/bin/rmcctrl -z
3. /usr/sbin/rsct/bin/rmcctrl -A
4. /usr/sbin/rsct/bin/rmcctrl -p
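Before and after restarting the daemons, you can confirm that the RMC subsystem is active on the node and look for connections on port 657 (a quick check):
# lssrc -s ctrmc
# netstat -an | grep 657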
Also, restart the RSCT daemons on the HMC in the same way, but you must first become root
through the product engineering shell (pesh) and the hscpe user profile to do so. This often
requires getting a pesh password from IBM support.
During our testing, we ran several events within short periods of time. At certain points, our
LPAR reported that it was no longer DLPAR-capable. Then after a short period, it reported
normally again. We believe that this occurred because RMC information became out-of-sync
between the LPARs and the HMC and ultimately was a timing issue.
Requirements
– Two (2) IBM Power Systems 770 D model servers, both in one Power Enterprise Pool.
– One (1) PowerHA SystemMirror cluster with two nodes that are in different servers.
– The PowerHA SystemMirror cluster will manage the server’s free resources and
EPCoD mobile resource to automatically satisfy the application’s hardware
requirements before it is started.
Hardware topology
The hardware topology is shown in Figure 9-24.
There are two HMCs to manage the EPCoD, which are named e16hmc1 and e16hmc3. Here,
e16hmc1 is the master and e16hmc3 is the backup. There are two applications in this cluster
and the related resource requirement.
Cluster configuration
The cluster configuration is shown in Table 9-13.
CAA: Unicast; primary disk: repdisk1; backup disk: repdisk2.
ROHA configuration
The ROHA configuration includes the HMC, hardware resource provisioning, and the
cluster-wide tunable configuration.
HMC configuration
There are two HMCs to add, as shown in Table 9-14 and Table 9-15 on page 394. For both
HMCs, the Number of retries is 2 and no Sites are defined (N/A).
Additionally, in /etc/hosts, there are resolution details between the HMC IP and the HMC
host name, as shown in Example 9-9.
I agree to use On/Off CoD and be billed for extra costs: No (default)
Cluster-wide tunables
All the tunables use the default values, as shown in Table 9-18.
In particular, I agree to use On/Off CoD and be billed for extra costs remains No (the default).
Perform the PowerHA SystemMirror Verify and Synchronize Cluster Configuration process
after finishing the previous configuration by executing clmgr sync cluster.
9.4.2 Test scenario of Example1: Setting up one ROHA cluster without On/Off
CoD
Based on the cluster configuration in 9.4.1, “Example1: Setting up a ROHA cluster without
On/Off CoD” on page 391, this section introduces several testing scenarios:
Bringing two resource groups online
Moving a resource group to another node
Restarting with the current configuration after the primary node crashes
There are four steps for PowerHA SystemMirror to acquire resources. These steps are:
Query
Compute
Identify
Acquire
Query step
PowerHA SystemMirror queries the server, the EPCoD, the LPARs, and the current RG
information. The data is shown in yellow in Figure 9-25 on page 398.
Compute step
In this step, PowerHA SystemMirror computes how many resources must be added by using
DLPAR. It needs 7C and 46 GB. The purple table in Figure 9-25 on page 398 shows the
process. For example:
The expected total CPU number is as follows: 1 (Min) + 2 (RG1 requires) + 6 (RG2
requires) + 0 (running RGs require; there is no running RG) = 9C.
Compare this value with the LPAR's profile: it must be less than or equal to the Maximum
value and greater than or equal to the Minimum value.
If the requirement is satisfied, subtract the current running CPUs from this value (9 - 2 = 7)
to get the number of CPUs to add through DLPAR.
Figure 9-25 Resource acquisition procedure to bring two resource groups online
Note: During this process, PowerHA SystemMirror adds mobile resources from EPCoD to
the server’s free pool first, then adds all the free pool’s resources to the LPAR by using
DLPAR. To describe the process clearly, the free pool means only the available resources
of one server before adding the EPCoD resources to it.
The orange tables in Figure 9-25 show the result after the resource acquisition, including the
LPAR's running resources, EPCoD, and the server's resource status.
Example 9-11 The hacmp.out log shows the resource acquisition process for example 1
# egrep "ROHALOG|Close session|Open session" /var/hacmp/log/hacmp.out
+RG1 RG2:clmanageroha[roha_session_open:162] roha_session_log 'Open session
Open session 22937664 at Sun Nov 8 09:11:39 CST 2015
INFO: acquisition is always synchronous.
=== HACMProhaparam ODM ====
--> Cluster-wide tunables display
ALWAYS_START_RG = 0
ADJUST_SPP_SIZE = 0
FORCE_SYNC_RELEASE = 0
AGREE_TO_COD_COSTS = 0
ONOFF_DAYS = 30
===========================
------------------+----------------+
HMC | Version |
------------------+----------------+
9.3.207.130 | V8R8.3.0.1 |
9.3.207.133 | V8R8.3.0.1 |
------------------+----------------+
------------------+----------------+----------------+
MANAGED SYSTEM | Memory (GB) | Proc Unit(s) |
------------------+----------------+----------------+
Name | rar1m3-9117-MMD-1016AAP | --> Server name
State | Operating |
Region Size | 0.25 | / |
VP/PU Ratio | / | 0.05 |
Installed | 192.00 | 12.00 |
Configurable | 52.00 | 8.00 |
Reserved | 5.00 | / |
Available | 5.00 | 4.00 |
Free (computed) | 5.00 | 4.00 | --> Free pool resource
------------------+----------------+----------------+
------------------+----------------+----------------+
LPAR (dedicated) | Memory (GB) | CPU(s) |
------------------+----------------+----------------+
Name | ITSO_S1Node1 |
State | Running |
Minimum | 8.00 | 1 |
Desired | 32.00 | 2 |
Assigned | 32.00 | 2 |
Maximum | 96.00 | 12 |
------------------+----------------+----------------+
+------------------+----------------+----------------+
| ENTERPRISE POOL | Memory (GB) | CPU(s) |
+------------------+----------------+----------------+
| Name | DEC_2CEC | --> Enterprise Pool Name
| State | In compliance |
| Master HMC | e16hmc1 |
| Backup HMC | e16hmc3 |
| Available | 100.00 | 4 | --> Available resource
| Unreturned (MS) | 0.00 | 0 |
| Mobile (MS) | 0.00 | 0 |
| Inactive (MS) | 140.00 | 4 | --> Maximum number to add
+------------------+----------------+----------------+
+------------------+----------------+----------------+
INIT_ONOFF_MEM_DAYS = 0
INIT_ONOFF_CPU = 4
INIT_ONOFF_CPU_DAYS = 20
SPP_SIZE_MAX = 0
DLPAR_MEM = 0
DLPAR_PROCS = 0
DLPAR_PROC_UNITS = 0
CODPOOL_MEM = 0
CODPOOL_CPU = 0
ONOFF_MEM = 0
ONOFF_MEM_DAYS = 0
ONOFF_CPU = 0
ONOFF_CPU_DAYS = 0
PARTITION = 0
MANAGED_SYSTEM = 0
ENTERPRISE_POOL = 0
PREFERRED_HMC_LIST = 0
OTHER_LPAR = 0
INIT_SPP_SIZE_MAX = 0
INIT_DLPAR_MEM = 0
INIT_DLPAR_PROCS = 0
INIT_DLPAR_PROC_UNITS = 0
INIT_CODPOOL_MEM = 0
INIT_CODPOOL_CPU = 0
INIT_ONOFF_MEM = 0
INIT_ONOFF_MEM_DAYS = 0
INIT_ONOFF_CPU = 0
INIT_ONOFF_CPU_DAYS = 0
SPP_SIZE_MAX = 0
DLPAR_MEM = 46
DLPAR_PROCS = 7
DLPAR_PROC_UNITS = 0
CODPOOL_MEM = 41
CODPOOL_CPU = 3
ONOFF_MEM = 0
ONOFF_MEM_DAYS = 0
ONOFF_CPU = 0
ONOFF_CPU_DAYS = 0
============================
Session_close:313] roha_session_log 'Close session 22937664 at Sun Nov 8 09:12:32 CST
2015'
Example 9-12 The update in the ROHA report shows the resource acquisition process for example 1
# clmgr view report roha
...
Managed System 'rar1m3-9117-MMD-1016AAP' --> this is P770D-01 server
Hardware resources of managed system
Installed: memory '192' GB processing units '12.00'
Configurable: memory '93' GB processing units '11.00'
Inactive: memory '99' GB processing units '1.00'
Available: memory '0' GB processing units '0.00'
...
Enterprise pool 'DEC_2CEC'
State: 'In compliance'
Master HMC: 'e16hmc1'
Backup HMC: 'e16hmc3'
Enterprise pool memory
Activated memory: '100' GB
Available memory: '59' GB
Unreturned memory: '0' GB
Enterprise pool processor
Activated CPU(s): '4'
Testing summary
The total time to bring the two resource groups online is 68 seconds (from 09:11:27 to
09:12:35), and it includes the resource acquisition time, as shown in Example 9-13.
In this case, we split this move into two parts: One is the RG offline at the primary node, and
the other is the RG online at the standby node.
Figure 9-26 Resource group offline procedure at the primary node during the resource group move
Query step
PowerHA SystemMirror queries the server, EPCoD, the LPARs, and the current RG
information. The data is shown in the yellow tables in Figure 9-26.
Compute step
In this step, PowerHA SystemMirror computes how many resources must be removed by
using the DLPAR. PowerHA SystemMirror needs 2C and 30 GB. The purple tables show the
process, as shown in Figure 9-26:
In this case, RG1 is released and RG2 is still running. PowerHA calculates how many
resources it can release based on whether RG2 has enough resources to run. So, the
formula is: 9 (current running) - 1 (Min) - 6 (RG2 still running) = 2C. Two CPUs can be
released.
PowerHA accounts for the fact that sometimes you adjust the current running resources
by using a manual DLPAR operation, for example, when you add resources to satisfy
another application that was not started by PowerHA. To avoid removing this kind of
resource, PowerHA checks how many resources it allocated before and never releases
more than that amount.
So in this case, PowerHA compares the value from the compute step with the resources
that it actually allocated to this LPAR before. That value is stored in an ODM object class
(HACMPdryresop), and in this case it is 7. PowerHA SystemMirror selects the smaller
value (2).
Figure 9-27 HMC message shows that there are unreturned resources that are generated
The unreturned resources can be viewed by using the clmgr view report roha command on
any of the cluster nodes as shown in Example 9-14.
From the HMC CLI, you can see the unreturned resources that are generated, as shown in
Example 9-15.
Example 9-15 Showing the unreturned resources and the status from the HMC CLI
hscroot@e16hmc1:~> lscodpool -p DEC_2CEC --level sys
name=rar1m3-9117-MMD-1016AAP,mtms=9117-MMD*1016AAP,mobile_procs=1,non_mobile_procs=8,unreturned_mobile
_procs=2,inactive_procs=1,installed_procs=12,mobile_mem=11264,non_mobile_mem=53248,unreturned_mobile_m
em=30720,inactive_mem=101376,installed_mem=196608
name=r1r9m1-9117-MMD-1038B9P,mtms=9117-MMD*1038B9P,mobile_procs=0,non_mobile_procs=16,unreturned_mobil
e_procs=0,inactive_procs=16,installed_procs=32,mobile_mem=0,non_mobile_mem=97280,unreturned_mobile_mem
=0,inactive_mem=230400,installed_mem=327680
When the DLPAR operation completes, the unreturned resources are reclaimed immediately,
and some messages are shown on the HMC in Figure 9-28. The Enterprise Pool’s status is
changed back to In compliance.
Figure 9-28 The unreturned resources are reclaimed after the DLPAR operation
You can see the changes from HMC CLI, as shown in Example 9-16.
Example 9-16 Showing the unreturned resource that is reclaimed from the HMC CLI
hscroot@e16hmc1:~> lscodpool -p DEC_2CEC --level sys
name=rar1m3-9117-MMD-1016AAP,mtms=9117-MMD*1016AAP,mobile_procs=1,non_mobile_procs
=8,unreturned_mobile_procs=0,inactive_procs=3,installed_procs=12,mobile_mem=11264,
non_mobile_mem=53248,unreturned_mobile_mem=0,inactive_mem=132096,installed_mem=196
608
name=r1r9m1-9117-MMD-1038B9P,mtms=9117-MMD*1038B9P,mobile_procs=0,non_mobile_procs
=16,unreturned_mobile_procs=0,inactive_procs=16,installed_procs=32,mobile_mem=0,no
n_mobile_mem=97280,unreturned_mobile_mem=0,inactive_mem=230400,installed_mem=32768
0
hscroot@e16hmc1:~> lscodpool -p DEC_2CEC --level pool
name=DEC_2CEC,id=026F,state=In compliance,sequence_num=41,master_mc_name=e16hmc1,
master_mc_mtms=7042-CR5*06K0040,backup_master_mc_name=e16hmc3,backup_master_mc_mtm
s=7042-CR5*06K0036,mobile_procs=4,avail_mobile_procs=3,unreturned_mobile_procs=0,m
obile_mem=102400,avail_mobile_mem=91136,unreturned_mobile_mem=0
Note: The Approaching out of compliance status is a normal status in the Enterprise
Pool, and it is useful when you need extra resources temporarily. The PowerHA
SystemMirror RG takeover scenario is one of those cases.
Example 9-17 The hacmp.out log file information about the resource group offline process
#egrep "ROHALOG|Close session|Open session" /var/hacmp/log/hacmp.out
...
===== Compute ROHA Memory =====
minimum + running = total <=> current <=> optimal <=> saved
8.00 + 40.00 = 48.00 <=> 78.00 <=> 30.00 <=> 46.00 : => 30.00 GB
============ End ==============
===== Compute ROHA CPU(s) =====
minimal + running = total <=> current <=> optimal <=> saved
1 + 6 = 7 <=> 9 <=> 2 <=> 7 : => 2 CPU(s)
============ End ==============
===== Identify ROHA Memory ====
Total Enterprise Pool memory to return back: 30.00 GB
Total On/Off CoD memory to de-activate: 0.00 GB
Total DLPAR memory to release: 30.00 GB
============ End ==============
=== Identify ROHA Processor ===
Total Enterprise Pool CPU(s) to return back: 2.00 CPU(s)
Total On/Off CoD CPU(s) to de-activate: 0.00 CPU(s)
Total DLPAR CPU(s) to release: 2.00 CPU(s)
============ End ==============
clhmccmd: 30.00 GB of Enterprise Pool CoD have been returned.
clhmccmd: 2 CPU(s) of Enterprise Pool CoD have been returned.
The following resources were released for application controllers App1Controller.
DLPAR memory: 30.00 GB On/Off CoD memory: 0.00 GB Enterprise Pool
memory: 30.00 GB.
DLPAR processor: 2.00 CPU(s) On/Off CoD processor: 0.00 CPU(s)
Enterprise Pool processor: 2.00 CPU(s)Close session 22937664 at Sun Nov 8
09:12:32 CST 2015
..
During the release process, the deallocation order is EPCoD, and then the local server’s free
pool. Because EPCoD is shared between different servers, the standby node running on
other servers always needs this resource to bring the RG online in a takeover scenario.
Note: Before the process of acquiring resources started, the resources – 2C and 30 GB –
were available in the Enterprise Pool, so they could be used by the standby node.
Figure 9-29 describes the resource acquisition process on the standby node ITSO_S2Node1.
This acquisition process differs from the scenario that is described in “Bringing two resource
groups online” on page 397. The resources that need to be added to the LPAR are one core
and six GB, which the system’s free pool can satisfy. The server does not need to acquire any
resources from EPCoD.
Removing the resource (2C and 30 GB) from the LPAR to a free pool on the primary node
took 257 seconds (10:52:51 - 10:57:08). However, there is no real concern with this time
because it is an asynchronous process.
Example 9-18 The key time stamp in hacmp.out on the primary node (ITSO_S1Node1)
# egrep "EVENT START|EVENT COMPLETED" hacmp.out
Nov 8 10:52:27 EVENT START: external_resource_state_change ITSO_S2Node1
Example 9-20 The key time stamps in hacmp.out on the standby node (ITSO_S2Node1)
#egrep "EVENT START|EVENT COMPLETED" hacmp.out
Nov 8 10:52:24 EVENT START: rg_move_release ITSO_S1Node1 1
Nov 8 10:52:24 EVENT START: rg_move ITSO_S1Node1 1 RELEASE
Nov 8 10:52:25 EVENT COMPLETED: rg_move ITSO_S1Node1 1 RELEASE 0
Nov 8 10:52:25 EVENT COMPLETED: rg_move_release ITSO_S1Node1 1 0
Nov 8 10:52:55 EVENT START: rg_move_fence ITSO_S1Node1 1
Nov 8 10:52:55 EVENT COMPLETED: rg_move_fence ITSO_S1Node1 1 0
Nov 8 10:52:57 EVENT START: rg_move_fence ITSO_S1Node1 1
Nov 8 10:52:57 EVENT COMPLETED: rg_move_fence ITSO_S1Node1 1 0
Nov 8 10:52:57 EVENT START: rg_move_acquire ITSO_S1Node1 1
Nov 8 10:52:57 EVENT START: rg_move ITSO_S1Node1 1 ACQUIRE
Nov 8 10:52:57 EVENT START: acquire_takeover_addr
Nov 8 10:52:58 EVENT COMPLETED: acquire_takeover_addr 0
Nov 8 10:53:15 EVENT COMPLETED: rg_move ITSO_S1Node1 1 ACQUIRE 0
Nov 8 10:53:15 EVENT COMPLETED: rg_move_acquire ITSO_S1Node1 1 0
Nov 8 10:53:15 EVENT START: rg_move_complete ITSO_S1Node1 1
Nov 8 10:53:43 EVENT START: start_server App1Controller
Nov 8 10:53:43 EVENT COMPLETED: start_server App1Controller 0
Nov 8 10:53:45 EVENT COMPLETED: rg_move_complete ITSO_S1Node1 1 0
Nov 8 10:53:47 EVENT START: external_resource_state_change_complete ITSO_S2Node1
Nov 8 10:53:47 EVENT COMPLETED: external_resource_state_change_complete ITSO_S2Node1 0
Restarting with the current configuration after the primary node crashes
This case introduces the Automatic Release After a Failure (ARAF) process. We simulate a
primary node that failed immediately. We do not describe how the RG is online on standby
node; we describe only what PowerHA SystemMirror does after the primary node restarts.
Assume that we activate this node with the current configuration, which means that this LPAR
still can hold the same amount of resources as before the crash.
The process is similar to “Resource group offline primary node ITSO_S1Node1” on page 403.
In this process, PowerHA SystemMirror tries to release all the resources that were held by the
two resource groups before.
Testing summary
If a resource was not released because of a PowerHA SystemMirror service crash or an AIX
operating system crash, PowerHA SystemMirror can perform the release operation
automatically after the node starts. This operation occurs before you start the PowerHA
SystemMirror service by using the smitty clstart or the clmgr start cluster commands.
Requirements
– Two (2) IBM Power Systems 770 D model servers, both in one Power Enterprise Pool
and each server has an On/Off CoD license.
– One (1) PowerHA SystemMirror cluster with two nodes that are in different servers.
– The PowerHA SystemMirror cluster will manage the server’s free resources and
EPCoD mobile resources, and On/Off CoD resources to automatically satisfy the
application’s hardware requirements before it is started.
Hardware topology
Figure 9-31 on page 412 shows the server and LPAR information for example 2.
There are two HMCs to manage the EPCoD, which are named e16hmc1 and e16hmc3. Here,
e16hmc1 is the master and e16hmc3 is the backup. There are two applications in this cluster
and related resource requirements.
If the tunable is set to 30, for example, it means that we want to activate the On/Off CoD
resources for 30 days. So, with 600 GB.Days available, only 20 GB of memory can be
activated (600 / 30 = 20), even though many more GB.Days remain in the license.
Cluster configuration
The topology, RG configuration, and HMC configuration are the same as shown in Table 9-13
on page 393.
I agree to use On/Off CoD and be billed for extra costs: Yes
Cluster-wide tunables
All the tunables are at the default values, as shown in Table 9-21, except that I agree to use
On/Off CoD and be billed for extra costs is set to Yes.
This configuration requires that you perform a Verify and Synchronize Cluster Configuration
action after changing the previous configuration via clmgr sync cluster.
Example 9-21 Showing the ROHA data with the clmgr view report roha command
# clmgr view report roha
Cluster: ITSO_ROHA_cluster of NSC type
Cluster tunables --> Following is the cluster tunables
Dynamic LPAR
Start Resource Groups even if resources are insufficient: '0'
Adjust Shared Processor Pool size if required: '0'
Query step
PowerHA SystemMirror queries the server, EPCoD, the On/Off CoD, the LPARs, and the
current RG information. The data is shown in the yellow tables in Figure 9-32.
For the On/Off CoD resources, we do not display the available resources because there are
enough resources in our testing environment:
P770D-01 has 9959 CPU.days and 9917 GB.days.
P770D-02 has 9976 CPU.days and 9889 GB.days.
Compute step
In this step, PowerHA SystemMirror computes how many resources must be added through
DLPAR. It needs 7C and 126 GB. The purple tables in Figure 9-32 show this process. Take
the CPU resources as an example:
The expected total processor unit number is 0.5 (Min) + 3.5 (RG1 requires) + 4.5 (RG2
requires) + 0 (running RGs require; there is no running RG) = 8.5C.
Compare this value with the LPAR's profile: it must be less than or equal to the Maximum
value and greater than or equal to the Minimum value.
If the requirement is satisfied, subtract the current running processor units from this value
(8.5 - 1.5 = 7) to get the number of processing units to add to the LPAR through DLPAR.
PowerHA SystemMirror gets the remaining 5 GB of this server, all 100 GB from EPCoD, and
21 GB from the On/Off CoD. The process is shown in the green table in Figure 9-32 on
page 416.
Note: During this process, PowerHA SystemMirror adds mobile resources from EPCoD to
the server’s free pool first, then adds all the free pool’s resources to the LPAR through
DLPAR. To describe this clearly, the free pool means the available resources of only one
server before adding the EPCoD resources to it.
The orange table in Figure 9-32 on page 416 shows the result of this scenario, including the
LPAR’s running resources, EPCoD, On/Off CoD, and the server’s resource status.
Example 9-22 The hacmp.out log shows the resource acquisition of example 2
===== Compute ROHA Memory =====
minimal + optimal + running = total <=> current <=> maximum
8.00 + 150.00 + 0.00 = 158.00 <=> 32.00 <=> 160.00 : => 126.00 GB
============ End ==============
=== Compute ROHA PU(s)/VP(s) ==
minimal + optimal + running = total <=> current <=> maximum
1 + 16 + 0 = 17 <=> 3 <=> 18 : => 14 Virtual Processor(s)
minimal + optimal + running = total <=> current <=> maximum
0.50 + 8.00 + 0.00 = 8.50 <=> 1.50 <=> 9.00 : => 7.00 Processing Unit(s)
============ End ==============
===== Identify ROHA Memory ====
Remaining available memory for partition: 5.00 GB
Total Enterprise Pool memory to allocate: 100.00 GB
Total Enterprise Pool memory to yank: 0.00 GB
Total On/Off CoD memory to activate: 21.00 GB for 30 days
Total DLPAR memory to acquire: 126.00 GB
============ End ==============
=== Identify ROHA Processor ===
Remaining available PU(s) for partition: 0.50 Processing Unit(s)
Total Enterprise Pool CPU(s) to allocate: 4.00 CPU(s)
Total Enterprise Pool CPU(s) to yank: 0.00 CPU(s)
Total On/Off CoD CPU(s) to activate: 3.00 CPU(s) for 30 days
Total DLPAR PU(s)/VP(s) to acquire: 7.00 Processing Unit(s) and 14.00
Virtual Processor(s)
============ End ==============
clhmccmd: 100.00 GB of Enterprise Pool CoD have been allocated.
clhmccmd: 4 CPU(s) of Enterprise Pool CoD have been allocated.
clhmccmd: 21.00 GB of On/Off CoD resources have been activated for 30 days.
clhmccmd: 3 CPU(s) of On/Off CoD resources have been activated for 30 days.
clhmccmd: 126.00 GB of DLPAR resources have been acquired.
clhmccmd: 14 VP(s) or CPU(s) and 7.00 PU(s) of DLPAR resources have been acquired.
...
After the RG acquisition is complete, the status of the On/Off CoD resource is shown in
Example 9-25.
For processors, PowerHA SystemMirror assigns three processors and the activation duration
is 30 days, so the total is 90 CPU.Days (3 * 30 = 90), and the remaining available CPU.Days
in the On/Off CoD is 9869 (9959 - 90 = 9869).
For memory, PowerHA SystemMirror assigns 21 GB and the activation duration is 30 days, so
the total is 630 GB.Days (21 * 30 = 630), and the remaining available GB.Days in the On/Off
CoD is 9277 (9907 - 630 = 9277).
The process is similar to the one that is shown in “Moving a resource group to another node”
on page 403. In the release process, the deallocation order is:
1. On/Off CoD
2. EPCoD
3. Server’s free pool
After the release process completes, you can find the detailed information about compute,
identify, and release processes in the hacmp.out file, as shown in Example 9-26 on page 421.
Example 9-26 The hacmp.out log information in the release process of example 2
===== Compute ROHA Memory =====
minimum + running = total <=> current <=> optimal <=> saved
8.00 + 80.00 = 88.00 <=> 158.00 <=> 70.00 <=> 126.00 : => 70.00 GB
============ End ==============
=== Compute ROHA PU(s)/VP(s) ==
minimal + running = total <=> current <=> optimal <=> saved
1 + 9 = 10 <=> 17 <=> 7 <=> 14 : => 7 Virtual Processor(s)
minimal + running = total <=> current <=> optimal <=> saved
0.50 + 4.50 = 5.00 <=> 8.50 <=> 3.50 <=> 7.00 : => 3.50 Processing
Unit(s)
============ End ==============
===== Identify ROHA Memory ====
Total Enterprise Pool memory to return back: 49.00 GB
Total On/Off CoD memory to de-activate: 21.00 GB
Total DLPAR memory to release: 70.00 GB
============ End ==============
=== Identify ROHA Processor ===
Total Enterprise Pool CPU(s) to return back: 1.00 CPU(s)
Total On/Off CoD CPU(s) to de-activate: 3.00 CPU(s)
Total DLPAR PU(s)/VP(s) to release: 7.00 Virtual Processor(s) and 3.50
Processing Unit(s)
============ End ==============
clhmccmd: 49.00 GB of Enterprise Pool CoD have been returned.
clhmccmd: 1 CPU(s) of Enterprise Pool CoD have been returned.
The following resources were released for application controllers App1Controller.
DLPAR memory: 70.00 GB On/Off CoD memory: 21.00 GB Enterprise Pool memory: 49.00
GB.
DLPAR processor: 3.50 PU/7.00 VP On/Off CoD processor: 3.00 CPU(s) Enterprise
Pool processor: 1.00 CPU(s)
LPM allows you to eliminate downtime for planned hardware maintenance. For other
downtime, such as required software maintenance or unplanned outages, PowerHA provides
you with the ability to minimize the outage.
PowerHA can be used within a partition that is capable of being moved with Live Partition
Mobility. The combination has been supported for many years, as noted in the IBM support
page Support for LPM.
This does not mean that PowerHA uses Live Partition Mobility in any way; PowerHA is treated
as just another application within the partition. Before PowerHA v7.2.0, performing LPM on a
PowerHA SystemMirror node in a cluster was a multistep manual process. An overview of the
process consisted of the following steps:
1. Stop cluster services on desired node with Unmanage option.
2. Disable Dead Man Switch monitoring in the cluster.
3. Perform the LPM.
4. Enable Dead Man Switch monitoring in the cluster.
5. Start cluster services on desired node with Auto manage option.
However, with PowerHA v7.2.x, LPM support is integrated to simplify this process by
performing similar steps automatically. Details about this feature, which is known as the LPM
Node Policy, can be found in 12.3, “Cluster tunables” on page 484.
If your environment has SANComm defined, then additional actions are required to perform
LPM. See 9.5.1, “Performing LPM with SANcomm defined” on page 422 for details.
Important: You can perform LPM on a PowerHA SystemMirror LPAR that is configured
with SAN communication. However, when you use LPM, the SAN communication is not
automatically migrated to the destination system. You must configure SAN communication
on the destination system before you use LPM. Full details can be found at:
https://www.ibm.com/docs/en/powerha-aix/7.2?topic=mobility-configuring-san-communication-lpm
In our scenario, we have a two-node cluster, jessica and shanley, each on its own managed system, named p750_4 and p750_2 respectively. Both systems and nodes are using SANComm. We have a third managed system, p750_3, in which SANComm is not configured. However, its VIOS adapters are target-mode capable and target mode is currently enabled.
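Before the migration, we can confirm that the target VIOS is ready for SANComm. The following sketch shows the kind of checks involved (run from the root shell on the VIOS; the adapter name fcs0 is an example only):

# Confirm that target mode is enabled on the Fibre Channel adapter.
lsattr -El fcs0 -a tme          # expect: tme yes
# Confirm that the storage framework communication device is present.
lsdev -C | grep sfwcomm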
In both scenarios, when we first start running the migration during the verification process, a
warning is displayed, as shown in Figure 9-34. Because this is only a warning, we can
continue.
The LPM process completes and node jessica is active and running on p750_3. However, SANComm is no longer functioning, as shown by the lack of output from the lscluster command in Example 9-27.
Example 9-27
[jessica:root] /utilities # lscluster -m |grep sfw
[jessica:root] /utilities #
Next, we add new virtual Ethernet adapters that use VLAN3358 to each VIOS. We then run cfgmgr on each VIOS to configure the sfwcomm device. No further action is required on node jessica because its profile already contains the proper virtual adapter.
The sfwcom device shows up automatically on jessica, as Example 9-28 indicates.
Example 9-28
[jessica:root] /utilities # lscluster -m |grep sfw
sfwcom UP none none none
In the second scenario, we repeat the LPM. However, this time the target system already has both SANComm devices configured on its VIOS and the appropriate virtual Ethernet adapters. During the LPM, we did notice a couple of seconds in which sfwcom registered as being down, but it automatically came back online.
The attributes listed in Table 10-1 can influence the behavior of resource groups during
startup, fallover, and fallback. These are described further in the following sections.
With the settling time attribute, you can delay the acquisition of a resource group so that, in
the event of a higher priority node joining the cluster during the settling period, the resource
group will be brought online on the higher priority node instead of being activated on the first
available node.
[Entry Fields]
* Settling Time (in Seconds) [120] #
If this value is set and the node that joins the cluster is not the highest priority node, the
resource group will wait the duration of the settling time interval. When this time expires, the
resource group is acquired on the node that has the highest priority among the list of nodes
that joined the cluster during the settling time interval.
Remember that this is valid only for resource groups that use the startup policy Online on First Available Node.
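For reference, the following minimal sketch sets and checks the settling time with clmgr. It assumes that the RG_SETTLING_TIME cluster attribute, which is queried later in this section, can also be set through clmgr modify cluster; otherwise, use the SMIT panel shown above.

# Set the settling time to 360 seconds and verify it.
clmgr modify cluster RG_SETTLING_TIME=360
clmgr -a RG_SETTLING_TIME query cluster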
During the acquisition of the resource groups on cluster startup, you can also see the settling
time value by running the clRGinfo -t command as shown in Example 10-1.
Note: A settling time with a non-zero value will be displayed only during the acquisition of
the resource group. The value will be set to 0 after the settling time expires and the
resource group is acquired by the appropriate node.
We specified a settling time of six minutes and configured a resource group named SettleRG1 to use the startup policy Online on First Available Node. We set the node list for the resource group so that node jessica would fall over to node maddi.
For the first test, the following steps demonstrate how we let the settling time expire and how
the secondary node acquires the resource group:
1. With cluster services inactive on all nodes, define a settling time value of 360 seconds.
2. Synchronize the cluster via clmgr sync cluster.
3. Validate the settling time by running clmgr as follows:
[jessica:root] / # clmgr -a RG_SETTLING_TIME query cluster
RG_SETTLING_TIME="360"
4. Start cluster services on node maddi.
We started cluster services on this node because it was the last node in the list for the resource group. After starting cluster services, the resource group was not acquired by node maddi. Running the clRGinfo -t command displays the 360-second settling time
as shown in Example 10-2.
For the next test scenario, we demonstrate how the primary node will start the resource group
when the settling time does not expire.
1. Repeat the previous step 1 on page 428 through step 4 on page 428.
2. Start cluster services on node jessica.
After about two minutes, once the cluster stabilized on node maddi, we started cluster services on node jessica. This results in the resource group being brought online on node jessica, as shown in Figure 10-3.
Note: This feature is effective only when cluster services on a node are started. This is not
enforced when C-SPOC is used to bring a resource group online.
Any pair of resource groups that do not have the following attributes might be processed in
any order, even if one of the resource groups of the pair has a relationship (serial order or
dependency) with another resource group.
Attention: In PowerHA v7.2.7 the following warning is displayed and should be noted.
WARNING: The resource group serial acquisition and serial release ordering will be
removed in a future PowerHA SystemMirror release.
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
bdbrg ONLINE db2
OFFLINE web
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
bdbrg ONLINE db2
OFFLINE web
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
bdbrg ONLINE db2
OFFLINE web
Now, upon stopping the cluster, the inverse occurs, based on the configuration, as shown in Example 10-4 from repeated execution of clRGinfo.
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
bdbrg ONLINE db2
OFFLINE web
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
bdbrg RELEASING db2
OFFLINE web
-----------------------------------------------------------------------------
Group Name State Node
-----------------------------------------------------------------------------
bdbrg OFFLINE db2
OFFLINE web
This policy spreads the resource groups that use it across the cluster nodes so that only one such resource group is acquired by any node during startup. This can be used, for instance, to distribute CPU-intensive applications across different nodes.
If two or more resource groups are offline when a particular node joins the cluster, this
policy determines which resource group is brought online based on the following criteria and
order of precedence:
1. The resource group with the least number of participating nodes will be acquired.
2. A parent resource group is preferred over a resource group that does not have any child
resource group.
Restriction: When utilizing the node distribution startup policy, it is required that the fallback policy be set to Never Fallback. Otherwise, the following error is displayed:
ERROR: Invalid configuration.
Resource Groups with Startup Policy 'Online Using Distribution Policy' can have
Only 'Never Fallback' as Fallback Policy.
[Entry Fields]
* Resource Group Name [bdbrg]
* Participating Nodes (Default Node Priority) [jessica maddi]
2. Start cluster services on node maddi. The navyrg resource group was acquired, as shown in Example 10-7.
The airforcerg resource group stays offline but can be brought online manually through C-SPOC. This is done by
running smitty cspoc, selecting Resource Group and Applications → Bring a Resource
Group Online, choosing airforcerg, and then choosing the node on which you want to start
it.
Important: Dynamic node priority is an available option only to clusters with three or more
nodes participating in the resource group.
The cluster manager queries the RMC subsystem every three minutes to obtain the current
value of these attributes on each node and distributes them cluster wide. The interval at which
the queries of the RMC subsystem are performed is not user-configurable. During a fallover
event of a resource group with dynamic node priority configured, the most recently collected
values are used in the determination of the best node to acquire the resource group.
For dynamic node priority (DNP) to be effective, consider the following information:
DNP cannot be used with fewer than three nodes.
DNP cannot be used for Online on All Available Nodes resource groups.
DNP is most useful in a cluster where all nodes have equal processing power and
memory.
Important: The highest free memory calculation is performed based on the amount of
paging activity taking place. It does not consider whether one cluster node has less real
physical memory than another.
For more details about how predefined DNP values are used, see 10.5.2, “How predefined
RMC based dynamic node priority functions” on page 439.
When you select one of these criteria, you must also provide values for the DNP script path and DNP time-out attributes for the resource group. When the DNP script path attribute is specified, the given script is invoked on all nodes and the return values are collected from all nodes. The fallover node decision is made by using these values and the specified criteria. If you choose the cl_highest_udscript_rc attribute, the collected values are sorted and the node that returned the highest value is selected as the candidate node to fall over to. Similarly, if you choose the cl_lowest_nonzero_udscript_rc attribute, the collected values are sorted and the node that returned the lowest nonzero positive value is selected as the candidate node to fall over to. If the return values of the script from all nodes are the same, or are zero, the default node priority is used instead. PowerHA verifies that the script exists and has execution permissions during verification.
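As an illustration, the following is a minimal sketch of a user-defined DNP script. The metric chosen here (the number of available processors) is an assumption for illustration only; any script that returns a suitable positive integer exit code on every node can be used.

#!/bin/ksh
# Hypothetical DNP script: return the number of available processors as the
# exit code, so that cl_highest_udscript_rc prefers the node with the most
# processors online. An exit code of 0 causes the default node priority to
# be used instead.
ncpus=$(lsdev -Cc processor -S Available | wc -l)
exit $ncpus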
Demonstration: See the demonstration about user-defined adaptive fallover node priority:
https://www.youtube.com/watch?v=ajsIpeMkf38
[Entry Fields]
* Resource Group Name [DNP_test1]
* Participating Nodes (Default Node Priority) [ashley jessica maddi] +
3. Assign the resources to the resource group via smitty sysmirror fast path.
Select Cluster Applications and Resources → Resource Groups → Change/Show
Resources and Attributes for a Resource Group, choose the newly created resource
group from the pick list and press Enter.
Select one of the available policies by pressing F4 in the Dynamic Node Priority Policy field, as shown in Figure 10-6 on page 438.
Continue selecting the resources that will be part of the resource group as shown in
Figure 10-7 on page 438 and press Enter.
You can display the current DNP policy for an existing resource group as shown in
Example 10-9.
Notes:
Using the information retrieved directly from the ODM is for informational purposes only
because the format within the stanzas may differ between versions.
Hardcoding ODM queries within user-defined applications is not supported and should
be avoided.
The following resource monitors contain the information for each policy:
IBM.PhysicalVolume
IBM.Host
Each of these monitors can be queried during normal operation by running the commands
shown in Example 10-10.
resource 3:
Name = "hdisk6"
PVId = "0x00c472c0 0x6f48ceb0 0x00000000 0x00000000"
ActivePeerDomain = "redbook_cluster"
NodeNameList = {"jessica"}
resource 4:
Name = "hdisk5"
PVId = "0x00f92db1 0xbabc3344 0x00000000 0x00000000"
ActivePeerDomain = "redbook_cluster"
NodeNameList = {"jessica"}
resource 5:
Name = "hdisk2"
PVId = "0x00c472c0 0xde92e337 0x00000000 0x00000000"
ActivePeerDomain = "redbook_cluster"
NodeNameList = {"jessica"}
resource 6:
Name = "hdisk1"
PVId = "0x00c472c0 0x7bcbeb08 0x00000000 0x00000000"
ActivePeerDomain = "redbook_cluster"
NodeNameList = {"jessica"}
resource 7:
Name = "hdisk4"
PVId = "0x00c472c0 0x9f3e94c2 0x00000000 0x00000000"
ActivePeerDomain = "redbook_cluster"
NodeNameList = {"jessica"}
root@maddi[] lsrsrc -Ad IBM.PhysicalVolume
Resource Dynamic Attributes for IBM.PhysicalVolume
resource 1:
PctBusy = 1
RdBlkRate = 0
WrBlkRate = 930
XferRate = 4
resource 2:
PctBusy = 0
RdBlkRate = 75
WrBlkRate = 1
XferRate = 1
resource 3:
PctBusy = 0
RdBlkRate = 120
WrBlkRate = 1284
XferRate = 6
resource 4:
PctBusy = 0
RdBlkRate = 0
WrBlkRate = 39
XferRate = 4
resource 5:
PctBusy = 0
RdBlkRate = 0
WrBlkRate = 0
XferRate = 0
resource 6:
PctBusy = 0
RdBlkRate = 0
WrBlkRate = 0
XferRate = 0
resource 7:
PctBusy = 0
RdBlkRate = 0
WrBlkRate = 0
XferRate = 0
You can display the current table maintained by clstrmgrES in an active cluster by running the
command shown in Example 10-11.
The values in the table are used for the DNP calculation in the event of a fallover. If
clstrmgrES is in the middle of polling the current state when a fallover occurs, then the value
last taken when the cluster was in a stable state is used to determine the DNP.
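One way to look at this table from the command line is through the cluster manager subsystem status. This is a sketch only; the exact section heading in the output varies by PowerHA level, and Example 10-11 shows the authoritative output.

# Dump the clstrmgrES state, which includes the most recently collected
# DNP values for each node.
lssrc -ls clstrmgrES | grep -p -i dnp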
[Entry Fields]
* Resource Group Name [DNPrg]
* Participating Nodes (Default Node Priority) [ashley jessica maddi] +
3. Assign the resources to the resource group via smitty sysmirror fast path, select Cluster
Applications and Resources → Resource Groups → Change/Show Resources and
Attributes for a Resource Group, choose the newly created resource group from the
pick list and press Enter.
4. Select one of the two available adaptive policies from the pick list by pressing F4 in the Dynamic Node Priority Policy field, as shown in Figure 10-9:
– cl_highest_udscript_rc
– cl_lowest_nonzero_udscript_rc
When using one of the two user-defined policies, cl_highest_udscript_rc and cl_lowest_nonzero_udscript_rc, you must also assign a DNP Script path and a DNP Script timeout value, as shown in Figure 10-10 on page 444.
Continue selecting the resources that will be part of the resource group and press Enter.
5. Synchronize the cluster via clmgr sync cluster.
Figure 10-10 Assigning DNP script path and timeout attributes to resource group
You can display the current DNP policy for an existing resource group as shown in
Example 10-12.
HACMPresource:
group = "DNPrg"
type = ""
name = "NODE_PRIORITY_POLICY"
value = "cl_lowest_nonzero_udscript_rc"
id = 1
monitor_method = ""
HACMPresource:
group = "DNPrg"
type = ""
name = "SDNP_SCRIPT_PATH"
value = "/HA727/DNP.sh"
id = 2
monitor_method = ""
HACMPresource:
group = "DNPrg"
type = ""
name = "SDNP_SCRIPT_TIMEOUT"
value = "60"
id = 3
-------------------------------
NODE maddi
-------------------------------
#!/bin/ksh
exit 1
-------------------------------
NODE ashley
-------------------------------
#!/bin/ksh
exit 3
-------------------------------
NODE jessica
-------------------------------
#!/bin/ksh
exit 2
In our test we started the cluster and the resource group came online on node ashley as
designated by the startup policy. The starting resource group state is shown in
Example 10-14.
Although our default node priority list has jessica listed next, because we are using DNP with the lowest return code, when a fallover occurs the resource group will actually fall over to node maddi. So we perform a hard stop on ashley, via reboot -q, and the resource group activates on node maddi, as shown in Example 10-15.
Demonstration: See the demonstration about this exact fallover, albeit with different
PowerHA version and node names, at:
http://youtu.be/oP60-8nFstU
Upon successful reintegration of the original failed node, ashley, the resource group stays put
on node maddi because the fallback policy was set to Never Fallback. We now repeat the
previous test by failing the current hosting node, maddi, via reboot -q. This time the resource group fails over to node jessica, instead of node ashley, as shown in Example 10-16, because the DNP value for jessica is lower.
Fallback Timer:
Sunday 12:00PM
Consider a simple scenario with a cluster having two nodes and a resource group. In the
event of a node failure, the resource group will fallover to the standby node. The resource
group remains on that node until the fallback timer expires. If cluster services are active on
the primary node at that time, the resource group will fallback to the primary node. If the
primary node is not available at that moment, the fallback timer is reset and the fallback will be
postponed until the fallback timer expires again.
[Entry Fields]
* Name of the Fallback Policy [daily515]
* HOUR (0-23) [17] #
* MINUTES (0-59) [15]
To assign a fallback timer policy to a resource group, complete the following steps:
1. Use the smitty sysmirror fast path and select Cluster Applications and Resources →
Resource Groups → Change/Show Resources and Attributes for a Resource Group.
Select a resource group from the list and press Enter.
2. Press F4 to select one of the policies that were configured in the previous steps. The display is similar to Example 10-18.
3. Select a fallback timer policy from the pick list and press Enter.
4. Add any extra resources to the resource group and press Enter.
5. Run verification and synchronization on the cluster to propagate the changes to all
cluster nodes.
HACMPtimer:
policy_name = "daily515"
recurrence = "daily"
year = -3800
month = 0
day_of_month = 1
week_day = 0
hour = 17
minutes = 30
For instance, a database must be online before the application server is started. If the
database goes down and falls over to a different node, the resource group that contains the
application server will also be brought down and back up on any of the available cluster
nodes. If the fallover of the database resource group is not successful, then both resource
groups (database and application) will be put offline.
When you plan to use Online on Different Nodes dependencies, consider these factors:
Only one Online On Different Nodes dependency is allowed per cluster.
Each resource group must have a different home node for startup.
When using this policy, a higher priority resource group takes precedence over a lower
priority resource group during startup, fallover, and fallback:
– If a resource group with High priority is online on a node, no other resource group that
is part of the Online On Different Nodes dependency can be put online on that node.
– If a resource group that is part of the Online On Different Nodes dependency is online
on a cluster node and a resource group that is part of the Online On Different Nodes
dependency and has a higher priority falls over or falls back to the same cluster node,
the resource group with a higher priority will be brought online. The resource group
with a lower priority resource group is taken offline or migrated to another cluster node
if available.
– Resource groups that are part of the Online On Different Nodes dependency and have
the same priority cannot be brought online on the same cluster node. The precedence
of resource groups that are part of the Online On Different Nodes dependency and
have the same priority is determined by alphabetical order.
– Resource groups that are part of the Online On Different Nodes dependency and have
the same priority do not cause each other to be moved from a cluster node after a
fallover or fallback.
– If a parent/child dependency is being used, the child resource group cannot have a
priority higher than its parent.
same priority level, can remain on the same node. The highest relative priority within
this set is the resource group that is listed first.
– Low Priority Resource Groups
Select the resource groups that will be part of the Online On Different Nodes
dependency and that should be acquired and brought online after all other resource groups. On fallback and fallover, these resource groups are brought online on different target nodes after all higher priority resource groups are processed.
Higher priority resource groups moving to a cluster node can cause these resource
groups to be moved to another cluster node or be taken offline.
3. Continue configuring runtime policies for other resource groups or verify and synchronize
the cluster via clmgr sync cluster.
that are part of the Online on Same Node dependency will have the same priority as the common resource group.
Only resource groups having the same priority and being part of an Online on Different
Nodes dependency relationship can be part of an Online on the Same Site dependency
relationship.
HACMPrgdependency:
id = 0
group_parent = "rg_parent"
group_child = "rg_child"
dependency_type = "PARENT_CHILD"
dep_type = 0
group_name = ""
root@maddi[] odmget HACMPrg_loc_dependency
HACMPrg_loc_dependency:
id = 1
set_id = 1
group_name = "rg_same_node2"
priority = 0
loc_dep_type = "NODECOLLOCATION"
loc_dep_sub_type = "STRICT"
HACMPrg_loc_dependency:
id = 2
set_id = 1
group_name = "rg_same_node_1"
priority = 0
loc_dep_type = "NODECOLLOCATION"
loc_dep_sub_type = "STRICT"
HACMPrg_loc_dependency:
id = 4
set_id = 2
group_name = "rg_different_node1"
priority = 1
loc_dep_type = "ANTICOLLOCATION"
loc_dep_sub_type = "STRICT"
HACMPrg_loc_dependency:
id = 5
set_id = 2
group_name = "rg_different_node2"
priority = 2
loc_dep_type = "ANTICOLLOCATION"
loc_dep_sub_type = "STRICT"
Note: Using the information retrieved directly from the ODM is for informational purposes
only, because the format within the stanzas might change with updates or new versions.
Hardcoding ODM queries within user-defined applications is not supported and should be
avoided.
We configure the first parent resource group as PCrg3, with PCrg2 as its child. We also make PCrg2 a parent resource group, with PCrg1 as its child. Notice that the PCrg2 resource group is both a parent and a child resource group, as shown in Example 10-24 on page 458. This is a three-level nested relationship, which is as deep as is allowed.
When using the Online on Different Nodes dependency, it is required to assign a priority to each resource group. With this combination, ideally, to prevent any resource group from becoming orphaned because of a failure, there should be one more node than resource groups. The priorities are shown in Figure 10-12.
We start node jessica first. Because PCrg3 is the top-level parent resource group, it comes online, but none of the others do because of the different node dependency. Then we start node ashley, which acquires the PCrg2 resource group because it is the child of PCrg3 and the parent of PCrg1. Finally, we start node maddi, which acquires PCrg1. All three of these states are shown in Example 10-25.
[Entry Fields]
High Priority Resource Group(s) [PCrg3] +
Intermediate Priority Resource Group(s) [PCrg1 PCrg2] +
Low Priority Resource Group(s) [] +
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
PCrg1 OFFLINE due to l ashley
OFFLINE due to l jessica
OFFLINE maddi
# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
PCrg1 OFFLINE ashley
OFFLINE jessica
ONLINE maddi
Upon failing node jessica via reboot -q, there is a cascading effect because of the combination of dependencies that is used. Essentially, all resource groups come offline at some point. PCrg1 is taken offline completely because it is both the lowest-level child and has the lowest priority. PCrg2 temporarily comes offline because it is a child of PCrg3. PCrg3 is acquired by node ashley, which was previously hosting the lowest resource group, PCrg1. PCrg2 is then reacquired by node maddi. Although the restart of PCrg2 is not clearly depicted in the output, the end result is shown in Example 10-26.
To expand on the test scenario, we restart cluster services on jessica, which acquires the previously orphaned resource group PCrg1, as shown in Example 10-27 on page 460.
We now fail node maddi. This results in PCrg1 coming offline, PCrg2 being acquired by node jessica, and PCrg3 being left in place, as shown in Example 10-28.
To check or change the response to losing access to rootvg, perform the following steps:
1. Enter smitty sysmirror, select Custom Cluster Configuration → Events → System Events → Change/Show Event Response, choose ROOTVG from the pop-up pick list, and press Enter.
2. Highlight Response and press F4 to display a pop-up pick list with the following options:
Log event and reboot As the description implies, this both logs the event and initiates a node reboot. This is the default value.
Only log the event This only logs the event, as implied.
3. Highlight Active and press F4 to display a pop-up pick list with the following options:
Yes The system event is monitored and reacted to based on the response criteria selected. This is the default value and is generally recommended.
No The system event is not monitored, and the response selected is irrelevant.
4. Upon completing the selections shown in Figure 11-1, press Enter.
5. Synchronize the cluster via clmgr sync cluster.
[Entry Fields]
* Event Name ROOTVG +
* Response Only log the event +
* Active Yes +
Figure 11-1 The rootvg system event
Note: The options of when and where to process the resource are not as granular as using custom events. However, they are suitable for most requirements.
[Entry Fields]
* Resource Type Name [specialrestype]
* Processing Order [FIRST] +
Verification Method []
Verification Type [Script] +
* Start Method [/HA727/custom.sh star>
* Stop Method [/HA727/custom.sh stop]
Monitor Method []
Cleanup Method []
Restart Method []
Failure Notification Method []
Required Attributes []
Optional Attributes []
Description [Test for Redbook]
Figure 11-2 Create user-defined resource type
Start Method
Enter the name of the script and its full path name (followed by arguments) to be called by
the cluster event scripts to start the user-defined resource. Use a maximum of 256
characters. This script must be in the same location on each cluster node that might start
the server. The contents of the script, however, might differ.
Stop Method
Enter the full path name of the script to be called by the cluster event scripts to stop the
user-defined resource. Use a maximum of 256 characters. This script must be in the same
location on each cluster node that might stop the resource. The contents of the script,
however, might differ.
Monitor Method
Enter the full path name of the script to be called by the cluster event scripts to monitor the user-defined resource. Use a maximum of 256 characters. This script must be in the same location on each cluster node that might monitor the resource. The contents of the script, however, might differ. (A sketch of a combined start, stop, and monitor method script follows these field descriptions.)
Cleanup Method
Optional: Specify a resource cleanup script to be called when a failed user-defined
resource is detected, before calling the restart method. The default for the cleanup script is
the stop script defined when the user-defined resource type was set up. If you are
changing the monitor mode to be used only in the startup monitoring mode, the method
specified in this field does not apply, and PowerHA SystemMirror ignores values entered in
this field.
Note: With monitoring, the resource stop script may fail because the resource is
already stopped when this script is called.
Restart Method
The default restart method is the resource start script defined previously. You can specify
a different method here, if desired. If you change the monitor mode to be used only in the
startup monitoring mode, the method specified in this field does not apply, and PowerHA
SystemMirror ignores values entered in this field.
Failure Notification Method
Define a notify method to run when the user-defined resource fails. This custom method
runs during the restart process and during notify activity. If you are changing the monitor
mode to be used only in the startup monitoring mode, the method specified in this field
does not apply, and PowerHA SystemMirror ignores values entered in this field.
Required Attributes
Specify a list of attribute names, with each name separated by a comma. These attributes
must be assigned with values when you create the user-defined resource, for example,
Rattr1,Rattr2. The purpose of the attributes is to store resource-specific attributes, which
can be used in the different methods specified in the resource type configuration.
Optional Attributes
Specify a list of attribute names, with each name separated by a comma. These attributes
might or might not be assigned with values when creating the user-defined resource, for
example, Oattr1, Oattr2. The purpose of the attributes is to store resource-specific
attributes which can be used in the different methods specified in the resource type
configuration.
Description
Provide a description of the user-defined resource type.
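As an illustration, the following sketch shows what a combined method script for a user-defined resource might look like. The application daemon, PID file, and paths are hypothetical; the real script must implement whatever start, stop, and monitor logic your resource requires.

#!/bin/ksh
# Hypothetical combined start/stop/monitor method script for a user-defined
# resource. PowerHA calls the configured Start, Stop, and Monitor methods;
# here one script handles all three based on its first argument.
action=$1
APP_CMD=/opt/myapp/bin/myappd        # hypothetical application daemon
PIDFILE=/var/run/myappd.pid          # hypothetical PID file

case "$action" in
start)
    $APP_CMD &
    print $! > $PIDFILE
    ;;
stop)
    [ -f $PIDFILE ] && kill $(cat $PIDFILE)
    ;;
monitor)
    # Return 0 if the daemon is running, non-zero otherwise.
    ps -p $(cat $PIDFILE 2>/dev/null) > /dev/null 2>&1
    exit $?
    ;;
*)
    print "Usage: $0 {start|stop|monitor}" >&2
    exit 1
    ;;
esac
exit 0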
[Entry Fields]
* Resource Type Name specialrestype
* Resource Name [shawnsresource]
Attribute data []
Note: The resource name must be unique across the cluster. When you define a
volume group as a user-defined resource for a Peer-to-Peer Remote Copy (PPRC)
configuration or a HyperSwap configuration, the resource name must match the volume
group.
Attribute data
Specify a list of attributes and values in the form of attribute=value, with each pair
separated by a space as in the following example:
Rattr1="value1" Rattr2="value2" Oattr1="value3"
Once you are done, you must add the resource to a resource group for it to be used.
Tape Resources [] +
Raw Disk PVIDs [] +
Raw Disk UUIDs/hdisks [] +
Disk Error Management? no +
Miscellaneous Data []
WPAR Name [] +
User Defined Resources [shawnsresource] +
Figure 11-4 Add user-defined resource into resource group
5. Upon completion, synchronize the cluster for the new resource to be used via clmgr sync
cluster.
The name and location of scripts must be identical on all cluster nodes. However, the
content of the scripts might be different.
Thoroughly document your scripts.
Remember to set the execute bit for all scripts.
Remember that synchronization does not copy pre-event and post-event script content
from one node to another. You need to copy pre-event and post-event scripts on all cluster
nodes. You could also utilize file collections to keep them in sync, assuming the scripts are
intended to be identical on each node.
Important: The cluster will not continue processing events until the custom pre-event or post-event script finishes running. If a problem with the scripts is encountered, this can lead to a CONFIG_TOO_LONG condition and a resource group ERROR state.
Only the following events occur during parallel processing of resource groups:
node_up
node_down
acquire_svc_addr
acquire_takeover_addr
release_svc_addr
release_takeover_addr
start_server
stop_server
The following events do not occur during parallel processing of resource groups:
get_disk_vg_fs
release_vg_fs
node_up_local
node_up_remote
node_down_local
node_down_remote
node_up_local_complete
node_up_remote_complete
node_down_local_complete
node_down_remote_complete
Always be attentive to the list of events when you upgrade from an older version and choose
parallel processing for some of the pre-existing resource groups in your configuration.
Note: When trying to adjust the default behavior of an event script, always use pre-event or
post-event scripts. Do not modify the built-in event script files. This option is neither
supported nor safe because these files can be modified without notice when applying fixes
or performing upgrades.
To define a pre-event or post-event script, you must create a custom event and then associate
the custom event with a cluster event as follows:
1. Write and test your event script carefully. Ensure that you copy the file to all cluster nodes
under the same path and name.
2. Define the custom event:
a. Run smitty sysmirror fast path and select Custom Cluster Configuration →
Events → Cluster Events → Configure Pre/Post-Event Commands → Add a
Custom Cluster Event.
b. Complete the following information:
• Cluster Event Name: The name of the event.
• Cluster Event Description: A short description of the event.
• Cluster Event Script Filename: The full path of the event script.
3. Connect the custom event with pre/post-event cluster event:
a. Run smitty sysmirror fast path and select Custom Cluster Configuration →
Events → Cluster Events → Change/Show Pre-Defined Events.
b. Select the event that you want to adjust.
c. Enter the following values:
• Notify Command (optional): The full path name of the notification command, if any.
• Pre-event Command (optional): The name of a previously created custom cluster event that you want to run as a pre-event. You can choose from the list of custom cluster events that were previously defined.
• Post-event Command (optional): The name of a previously created custom cluster event that you want to run as a post-event. You can choose from the list of custom cluster events that were previously defined.
• Fail event if pre or post event fails (Yes or No): By default, the exit status returned by these commands is ignored and does not affect the exit status of the main event. If you select yes for this option, any non-zero exit status from a pre-event or post-event command is treated like a failure of the main event. Further, if the pre-event command fails, the main event and the post-event will not be called. If the main event fails, the post-event will not be called. In all cases, the notify command is called after a failure.
4. Verify and synchronize the cluster via clmgr sync cluster.
Tips:
You can use cluster file collection feature to ensure that custom event files will be
propagated automatically to all cluster nodes.
If you use pre-event and post-event scripts to ensure proper sequencing and correlation
of resources used by applications running on the cluster,
you can consider simplifying or even eliminating them by specifying parent/child
dependencies between resource groups.
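For example, a pre-event script can be as simple as the following sketch. The application command and log location are hypothetical; PowerHA typically passes the main event's arguments to the pre-event command.

#!/bin/ksh
# Hypothetical pre-event script: quiesce an application queue before the
# main event runs, and log what was done.
LOG=/var/hacmp/log/custom_events.log       # hypothetical log file
print "$(date) pre-event started for: $*" >> $LOG
/opt/myapp/bin/quiesce_queue               # hypothetical application command
rc=$?
print "$(date) pre-event finished rc=$rc" >> $LOG
exit $rc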
PowerHA provides a SMIT interface to the AIX error notification function. Use this function to
detect an event that is not specifically monitored by the PowerHA (for example, a disk adapter
failure) and to trigger a response to this event.
Before you configure automatic error notification, a valid cluster configuration must be in
place.
Automatic error notification applies to selected hard, non-recoverable error types such as
those that are related to disks or disk adapters. This utility does not support media errors,
recovered errors, or temporary errors.
Enabling automatic error notification assigns one of two error notification methods for all error
types as follows:
The non-recoverable errors pertaining to resources that have been determined to
represent a single point of failure are assigned the cl_failover method and will trigger a
fallover.
All other non-critical errors are assigned the cl_logerror method and an error entry will be
logged against the hacmp.out file.
PowerHA automatically configures error notifications and recovery actions for several
resources and error types including these items:
All disks in the rootvg volume group.
All disks in cluster volume groups, concurrent volume groups, and file systems.
All disks defined as cluster resources.
To set up automatic error notifications, use the smitty sysmirror fast path and select
Problem Determination Tools → PowerHA SystemMirror Error Notification →
Configure Automatic Error Notification → Add Error Notify Methods for Cluster
Resources.
Note: You cannot configure automatic error notification while the cluster is running as
shown by the following output:
jessica :
jessica : HACMP Resource Error Notify Method
jessica :
jessica : hdisk0 /usr/es/sbin/cluster/diag/cl_failover
jessica : hdisk1 /usr/es/sbin/cluster/diag/cl_logerror
jessica : hdisk2 /usr/es/sbin/cluster/diag/cl_logerror
jordan:
jordan: HACMP Resource Error Notify Method
jordan:
With PowerHA, you can customize the error notification method for other devices and error
types and define a specific notification method, rather than using one of the two automatic
error notification methods.
After an error notification is defined, PowerHA offers the means to emulate it. You can
emulate an error log entry with a selected error label. The error label is listed in the error log
and the notification method is run by errdemon.
[Entry Fields]
Error Label Name EPOW_RES_CHRP
Notification Object Name diagela
Notify Method /usr/lpp/diagnostics/bi
Figure 11-5 Error log emulation
To change the total event duration time before receiving a config_too_long warning
message, complete these steps:
1. Use the smitty sysmirror fast path and select Custom Cluster Configuration →
Events → Cluster Events → Change/Show Time Until Warning.
2. Complete these fields:
– Max. Event-only Duration (in seconds)
The maximum time (in seconds) to run a cluster event. The default is 180 seconds.
– Max. Resource Group Processing Time (in seconds)
The maximum time (in seconds) to acquire or release a resource group. The default is
180 seconds.
– Total time to process a Resource Group event before a warning is displayed
The total time for the Cluster Manager to wait before running the config_too_long
script. The default is six minutes. This field is the sum of the two other fields and is not
editable.
3. Press Enter to complete the change.
4. Verify and synchronize the cluster to propagate the changes via clmgr sync cluster.
For more details regarding resources and RMC see the IBM RSCT website.
Recovery programs
A recovery program consists of a sequence of recovery command specifications that has the
following format:
:node_set recovery_command expected_status NULL
Where:
node_set: The set of nodes on which the recovery program will run and can take one of
the following values:
– all: The recovery command runs on all nodes.
– event: The node on which the event occurred.
– other: All nodes except the one on which the event occurred.
recovery_command: String (delimited by quotation marks) that specifies a full path to the
executable program. The command cannot include any arguments. Any executable
program that requires arguments must be a separate script. The recovery program must
have the same path on all cluster nodes. The program must specify an exit status.
expected_status: Integer status to be returned when the recovery command completes
successfully. The Cluster Manager compares the actual status returned against the
expected status. A mismatch indicates unsuccessful recovery. If you specify the character
X in the expected status field, Cluster Manager will skip the comparison.
NULL: Not used, included for future functions.
Multiple recovery command specifications can be separated by the barrier command. All
recovery command specifications before a barrier start in parallel. When a node encounters a
barrier command, all nodes must reach the same barrier before the recovery program
resumes.
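A hypothetical recovery program that follows this format is sketched below; the script paths are placeholders only.

# Phase 1: run a cleanup script on the node where the event occurred and
# expect it to return 0.
event "/usr/local/ha/clean_var.sh" 0 NULL
# All nodes must reach the barrier before the program continues.
barrier
# Phase 2: run a notification script on all nodes; X skips the status check.
all "/usr/local/ha/notify.sh" X NULL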
– Selection String: An SQL expression that includes attributes of the resource instance.
– Expression: Relational expression between dynamic resource attributes. When the
expression evaluates true it generates an event.
– Rearm expression: An expression that is used after the event expression evaluates as true. The two expressions alternate: the event expression is used until it is true, then the rearm expression is used until it is true, then the event expression is used again, and so on. The rearm expression is usually the logical inverse or complement of the event expression.
3. Press Enter to create the error notification object.
[Entry Fields]
* Event name [user_defined_event]
* Recovery program path [/user_defined.rp]
* Resource name [IBM.FileSystem]
* Selection string [name = "/var"]
* Expression [PercentTotUsed > 70]
Rearm expression [PercentTotUsed < 50]
For additional details regarding user-defined events, see the Planning PowerHA
SystemMirror guide.
With PowerHA SystemMirror 7.2, unicast communications are always used between sites in a
linked cluster. Within a site, you can select unicast (the default) or multicast communications.
Note: PowerHA v7.1.0 through v7.1.2 require the use of multicast within a site. PowerHA
v7.1.3 introduced unicast as an option, making multicast optional.
PowerHA uses a cluster health management layer that is embedded in the operating system, called Cluster Aware AIX (CAA). CAA uses kernel-level code to exchange heartbeats over the network, over the SAN fabric (when suitable Fibre Channel adapters are deployed), and through disk-based messaging on the central repository.
If multicast is selected during the initial cluster configuration, an important factor is that the
multicast traffic be able to flow between the cluster hosts in the data center before the cluster
formation can be attempted. Plan to test and verify the multicast traffic flow between the
“would-be” cluster nodes before attempting to create the cluster. Review the guidelines in the
following sections to test the multicast packet flow between the hosts.
Network switches
Hosts communicate over the network fabric that might consist of many switches and routers.
A switch connects separate hosts and network segments and allows for network traffic to be
sent to the correct place. A switch refers to a multiport network bridge that processes and
routes data at the data link layer (Layer 2) of the OSI model. Some switches can also process
data at the network layer (Layer 3).
IGMP communication protocol is used by the hosts and the adjacent routers on IP networks
to interact and establish rules for multicast communication, in particular establish multicast
group membership. Switches that feature IGMP snooping derive useful information by
observing these IGMP transactions between the hosts and routers. This enables the switches
to correctly forward the multicast packets when needed to the next switch in the network path.
IGMP snooping
IGMP snooping is an activity that switches perform to track the IGMP packet exchanges and to adapt their filtering of multicast packets accordingly. Switches monitor the IGMP traffic and forward multicast packets only when necessary. The switch
typically builds an IGMP snooping table that has a list of all the ports that have requested a
particular multicast group and uses this table to allow or disallow the multicast packets to flow.
Multicast routing
The network entities that forward multicast packets by using special routing algorithms are referred to as mrouters. Hosts and other network elements implement mrouters to allow the multicast network traffic to flow appropriately. Some traditional routers also support multicast packet routing; refer to the router vendor's documentation and guidance.
When switches are cascaded, or chained, setting up the switches to forward the packets (mrouting) might be necessary. This is one possible approach to solving multicast traffic flow issues in the environment. See the switch vendor's documentation and guidance regarding setting up the switches for multicast traffic.
Multicast testing
Do not attempt to create the cluster by using multicast until you verify that multicast traffic
flows without interruption between the nodes that will be part of the cluster. Clustering will not
continue if the mping test fails. If problems occur with the multicast communication in your
network environment, contact the network administrator and review the switches involved and
the setup needed. After the setup is complete, retest the multicast communication.
The mping test: One of the simplest methods to test end-to-end multicast communication
is to use the mping command available on AIX. On one node, start the mping command in
receive mode and then use mping command to send packets from another host. If multiple
hosts will be part of the cluster, test end-to-end mping communication from each host to the
other.
The mping command can be invoked with a particular multicast address or it chooses a
default multicast address. A test for our cluster is shown in Example 12-1.
As the address input to mping, use the actual multicast address that will be used during
clustering. CAA creates a default multicast address if one is not specified during cluster
creation. This default multicast address is formed by combining (using OR) 228.0.0.0 with the
lower 24 bits of the IP address of the host. As an example, in our case the host IP address is
192.168.100.51, so the default multicast address is 228.168.100.51.
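As a quick illustration, the default multicast address can be derived and tested as follows. This is a sketch only: the host IP address is the one from our example, and the mping flags can vary by AIX level.

#!/bin/ksh
# Derive the default CAA multicast address for this host (228.0.0.0 combined
# with the lower 24 bits of the host IP address).
IP=192.168.100.51
MCAST=$(print "$IP" | awk -F. '{ printf "228.%s.%s.%s", $2, $3, $4 }')
print "Default multicast address: $MCAST"    # 228.168.100.51
# On the receiving node:  mping -r -v -a $MCAST
# On the sending node:    mping -s -v -a $MCAST -c 5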
Troubleshooting
If mping fails to receive packets from host to host in the network environment, an issue exists in the network path regarding multicast packet flow.
Disable IGMP snooping on the switches. Most switches allow for disabling IGMP
snooping. If your network environment allows, disable the IGMP snooping and allow all
multicast traffic to flow without any problems across switches.
If your network requirements do not allow snooping to be disabled, debug the problem by
disabling IGMP snooping and then adding network components, one at a time, for
snooping.
Debug, if necessary, by eliminating any cascaded switch configurations by having only one
switch between the hosts.
This network-wide attribute can be used to customize the load balancing of PowerHA service
IP labels, taking into consideration any persistent IP labels that are already configured. The
distribution selected is maintained during cluster startup and subsequent cluster events. The
distribution preference will be maintained, if acceptable network interfaces are available in the
cluster. However, PowerHA will always keep service IP labels active, even if the preference
cannot be satisfied.
The placement of the service IP labels can be specified with these distribution preferences:
Anti-Collocation
This is the default, and PowerHA distributes the service IP labels across all boot IP
interfaces in the same PowerHA network on the node. The first service label placed on the
interface will be the source address for all outgoing communication on that interface.
Collocation
PowerHA allocates all service IP addresses on the same boot IP interface.
Collocation with persistent label
PowerHA allocates all service IP addresses on the boot IP interface that is hosting the
persistent IP label. This can be useful in environments with VPN and firewall configuration,
where only one interface is granted external connectivity. The persistent label will be the
source address.
Collocation with source
Service labels are mapped using the Collocation preference. You can choose one service
label as a source for outgoing communication. The service label chosen in the next field is
the source address.
Collocation with Persistent Label and Source
Service labels are mapped to the same physical interface that currently has the persistent IP label for this network. This choice allows you to choose one service label as the source for outgoing communication. The service label chosen in the next field is the source address.
Anti-Collocation with source
Service labels are mapped using the Anti-Collocation preference. If not enough adapters
are available, more than one service label can be placed on one adapter. This choice
allows one label to be selected as the source address for outgoing communication.
[Entry Fields]
* Network Name net_ether_01
* Distribution Preference Collocation with Persistent Label +
Source IP Label for outgoing packets +
+--------------------------------------------------------------------------+
| Source IP Label for outgoing packets |
| |
| Move cursor to desired item and press Enter. |
| |
| dallasserv |
| ftwserv |
| |
| F1=Help F2=Refresh F3=Cancel |
F1| F8=Image F10=Exit Enter=Do |
F5| /=Find n=Find Next |
F9+--------------------------------------------------------------------------+
5. Press Enter to accept your selection and update the PowerHA ODM on the local node.
6. Verify and synchronize the cluster via clmgr sync cluster. If the cluster is active, it will
initiate a dynamic reconfiguration. See note below.
Network net_ether_01
NODE maddi:
ftwserv 10.10.10.52
dallasserv 10.10.10.51
maddi_xd 192.168.150.52
NODE jessica:
ftwserv 10.10.10.52
dallasserv 10.10.10.51
jessica_xd 192.168.150.51
NODE ashley:
ftwserv 10.10.10.52
dallasserv 10.10.10.51
ashley_xd 192.168.150.53
Network net_ether_01 is using the following distribution preference for service labels:
Collocation with persistent - service label(s) will be mapped to the same interface as the
persistent label.
Network net_ether_010
NODE maddi:
maddi 192.168.100.52
NODE jessica:
jessica 192.168.100.51
NODE ashley:
ashley 192.168.100.53
This was visible in the output of the netstat -i command, as shown in Example 12-3 on
page 484.
jessica-# netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 0.2.55.4f.c4.ab 5044669 0 1828909 0 0
en0 1500 10.10.31 jessicaa 5044669 0 1828909 0 0
en0 1500 192.168.100 jessicapers 5044669 0 1828909 0 0
en0 1500 192.168.100 maddisvc 5044669 0 1828909 0 0
en0 1500 192.168.100 ashleysvc 5044669 0 1828909 0 0
en3 1500 link#3 0.20.35.e2.7f.8d 3191047 0 1410806 0 0
en3 1500 10.10.32 jessicab 3191047 0 1410806 0 0
lo0 16896 link#1 1952676 0 1957548 0 0
lo0 16896 127 localhost 1952676 0 1957548 0 0
lo0 16896 localhost 1952676 0 1957548 0 0
Note: In this output, the node jessica had the resource groups for nodes maddi and ashley
and their corresponding service IP addresses. The distribution policy was set to
Collocation with persistent.
Our testing of the dynamic change of this policy resulted in no move of any of the labels after
a synchronization. The following message was logged during the synchronization of the
cluster after making the service IP distribution policy change:
Verifying additional pre-requisites for Dynamic Reconfiguration...
Note: For this instance, the message logged is generic and gets reported only because a
change was detected. As long as that was the only change made, no actual resources will
be brought offline.
A change to the service IP distribution policy is only enforced when we manually invoke a
swap event or stop and restart PowerHA on a node. This is the intended behavior of the
feature to avoid any potential disruption of connectivity to those IP addresses. The remaining
cluster nodes will not enforce the policy unless cluster services are also stopped and
restarted on them.
automatically replied to by the other nodes in the cluster. The packet exchanges are used to
calculate the round-trip time.
The round-trip time (rtt) value is shown in the output of the lscluster -i and lscluster -m commands, along with the mean deviation in network rtt; both values are managed automatically by CAA.
Node failure detection timeout: The time, in seconds, that the health management layer waits before it starts preparing to declare a node failure. The minimum and maximum values are enforced by the CAA layer and can be displayed by running clctrl -tune -L node_timeout. Note that the MIN and MAX values in the command output are in milliseconds, whereas the value on this screen is entered in seconds.
Node failure detection grace period: The time, in seconds, that the node waits after the node failure detection timeout before actually declaring that a node has failed. The minimum and maximum values are enforced by the CAA layer and can be displayed by running clctrl -tune -L node_down_delay. Again, the command output is in milliseconds, whereas the value on this screen is entered in seconds. (A command sketch follows this list.)
Node failure detection timeout during LPM: If specified, this timeout value (in
seconds) will be used during a Live Partition Mobility (LPM) instead of the Node Failure
Detection Timeout value. You can use this option to increase the Node Failure Detection
Timeout during the LPM duration to ensure it will be greater than the LPM freeze duration
in order to avoid any risk of unwanted cluster events. Enter a value between 10 and 600.
LPM Node Policy: Specifies the action to be taken on the node during a Live Partition
Mobility operation. If “unmanage” is selected, the cluster services are stopped with the
'Unmanage Resource Groups' option during the duration of the LPM operation. Otherwise
PowerHA SystemMirror will continue to monitor the resource group(s) and application
availability.
Repository Mode: Controls the node behavior when cluster repository disk access is lost.
Valid values are Assert or Event with the latter being the default. When the value is set to
Assert, the node will crash upon losing access to the cluster primary repository without
moving to backup repositories. When the value is set to Event, an AHAFS event is
generated.
Config Timeout: Specifies the CAA config timeout for configuration change. A positive
value indicates the maximum number of seconds CAA will wait on the execution of client
side callouts including scripts and CAA configuration code. A value of zero disables the
timeout. The default value is 240 seconds. The valid range is 0-2147483647.
Disaster Recovery: Enables or disables the CAA PVID-based identification when UUID-based authentication fails. A value of 1 (the default) enables it; a value of 0 disables it.
PVM Watchdog Timer: Controls the behavior of the CAA PVM Watchdog timer. Valid
values are:
– DISABLE: As the name implies when chosen the tunable is disabled.
– DUMP_RESTART: CAA will dump and restart the LPAR when VM fails to reset the timer.
– HARD_RESET: CAA will hard reset the LPAR when VM fails to reset the timer.
– HARD_POWER_OFF: CAA will hard power off the LPAR when the VM fails to reset the timer; the user must bring up the LPAR by using HMC options.
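The following commands, already referenced in the field descriptions above, display the CAA limits that back the node failure detection fields. Remember that the command output is in milliseconds, whereas the SMIT fields are entered in seconds.

clctrl -tune -L node_timeout
clctrl -tune -L node_down_delay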
When cluster sites are used, specifically linked sites, the options differ.
Most changes require a cluster synchronization; however, this specific change is dynamic, so a cluster restart is not required. Even so, there is rarely a good reason not to perform at least a verification after a change has been made.
Unstable Period (seconds): Network instability occurs when an excessive number of events is received over a period of time. The unstable period defines the period of time that is used to determine instability. If the threshold number of events is received inside the unstable period, the network is declared unstable. Provide an integer value between 1 and 120 seconds.
4. Upon completion synchronize the cluster.
Change/Show a Network
[Entry Fields]
* Network Name net_ether_010
New Network Name [bdb_net]
* Network Type [XD_data] +
* Netmask(IPv4)/Prefix Length(IPv6) [255.255.255.0]
* Network attribute public +
* Unstable Threshold [3] #
* Unstable Period (seconds) [90] #
It also can be used in combination with regular service IP labels and persistent IP labels. In
general, use persistent IP labels, especially one that is node-bound with XD_data networks,
because no communication occurs through the service IP label that is configurable on
multiple nodes.
To configure and use site-specific service IP labels, obviously sites must be defined to the
cluster. After you add a cluster and add nodes to the cluster, complete these steps:
1. Add sites.
2. Add more networks as needed (ether, XD_data, or XD_ip):
– Add interfaces to each network
3. Add service IP labels:
– Configurable on multiple nodes
– Specify the associated site
In our test scenario, we have a two-node cluster (maddi and jessica nodes) that currently has
a single ether network with a single interface defined to it. We also have a volume group
available on each node named xsitevg. Our starting topology is shown in Example 12-4.
NODE maddi:
Network net_ether_010
maddi 192.168.100.52
NODE jessica:
Network net_ether_010
jessica 192.168.100.51
Adding sites
To define the sites, complete these steps:
1. Run smitty sysmirror fast path, select Cluster Nodes and Networks → Manage Site →
Add a Site, and press Enter.
2. We add the two sites dallas and fortworth. Node jessica is a part of the dallas site; node
maddi is a part of the fortworth site. The Add a Site menu is shown in Figure 12-5 on
page 490.
Add a Site
[Entry Fields]
* Site Name [dallas]
* Site Nodes jessica +
Cluster Type Standard
Also note that after adding sites, the cluster type automatically changes from Standard to Stretched. The next site added will automatically show the new cluster type. This is fine for cross-site LVM configurations, but it will not be if you plan to use PowerHA Enterprise Edition and a linked cluster. In that case, you must delete and re-create the cluster. This is all shown in the output from adding the site (Figure 12-6).
Adding a network
To define the additional network, complete these steps (see Figure 12-7 on page 491):
1. Using the smitty sysmirror fast path, select Cluster Nodes and Networks → Manage
Networks and Network Interfaces → Networks → Add a Network, and press Enter.
Choose the network type, in our case we select XD_ip, and press Enter.
2. You can keep the default network name, as we did, or specify one as desired.
[Entry Fields]
* Network Name [net_XD_ip_01]
* Network Type XD_ip
* Netmask(IPv4)/Prefix Length(IPv6) [255.255.255.0]
* Network attribute public +
Figure 12-7 Add network
[Entry Fields]
* IP Label/Address [jessica_xd] +
* Network Type XD_ip
* Network Name net_XD_ip_01
* Node Name [jessica] +
Network Interface []
[Entry Fields]
* IP Label/Address dallasserv +
Netmask(IPv4)/Prefix Length(IPv6) []
* Network Name net_XD_ip_01
Associated Site dallas +
Example 12-5 New topology after adding network and service IP addresses
[jessica:root] / # cltopinfo
Cluster Name: xsite_cluster
Cluster Type: Stretched
Heartbeat Type: Unicast
Repository Disk: hdisk1 (00f6f5d015a4310b)
Cluster Nodes:
Site 1 (dallas):
jessica
Site 2 (fortworth):
maddi
NODE maddi:
Network net_XD_ip_01
ftwserv 10.10.10.52
dallasserv 10.10.10.51
maddi_xd 192.168.150.52
Network net_ether_010
maddi 192.168.100.52
NODE jessica:
Network net_XD_ip_01
ftwserv 10.10.10.52
dallasserv 10.10.10.51
jessica_xd 192.168.150.51
Network net_ether_010
jessica 192.168.100.51
[jessica:root] / # cllssite
---------------------------------------------------
Sitename Site Nodes Dominance Protection Type
---------------------------------------------------
dallas jessica NONE
fortworth maddi NONE
Note: The additional options, shown in blue in Figure 12-10 on page 493, are available
only if sites are defined.
[Entry Fields]
* Resource Group Name [xsiteRG]
Inter-Site Management Policy [ignore] +
* Participating Nodes from Primary Site [jessica] +
Participating Nodes from Secondary Site [maddi] +
Startup Policy Online On Home Node Onl>+
Fallover Policy Fallover To Next Priori>+
Fallback Policy Never Fallback +
-------------------------------
NODE maddi
-------------------------------
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 7a.40.c8.b3.15.2 42474 0 41723 0 0
en0 1500 192.168.100 maddi 42474 0 41723 0 0
en1 1500 link#3 7a.40.c8.b3.15.3 5917 0 4802 0 0
en1 1500 192.168.150 maddi_xd 5917 0 4802 0 0
lo0 16896 link#1 5965 0 5965 0 0
lo0 16896 127 loopback 5965 0 5965 0 0
lo0 16896 loopback 5965 0 5965 0 0
-------------------------------
NODE jessica
-------------------------------
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 ee.af.1.71.78.2 45506 0 44229 0 0
en0 1500 192.168.100 jessica 45506 0 44229 0 0
en1 1500 link#3 ee.af.1.71.78.3 5685 0 5040 0 0
en1 1500 192.168.150 jessica_xd 5685 0 5040 0 0
lo0 16896 link#1 15647 0 15647 0 0
lo0 16896 127 loopback 15647 0 15647 0 0
lo0 16896 loopback 15647 0 15647 0 0
The bold text indicates that en1 is currently configured with only the boot IP
address on both nodes. Upon starting cluster services, because jessica is the primary node, it
acquires the dallas site-specific service address dallasserv. The secondary
node, maddi, remains unchanged, as shown in Example 12-7.
-------------------------------
NODE maddi
-------------------------------
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 7a.40.c8.b3.15.2 52447 0 51059 0 0
en0 1500 192.168.100 maddi 52447 0 51059 0 0
en1 1500 link#3 7a.40.c8.b3.15.3 9622 0 9213 0 0
-------------------------------
NODE jessica
-------------------------------
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 ee.af.1.71.78.2 55556 0 54709 0 0
en0 1500 192.168.100 jessica 55556 0 54709 0 0
en1 1500 link#3 ee.af.1.71.78.3 10161 0 8680 0 0
en1 1500 10.10.10 dallasserv 10161 0 8680 0 0
en1 1500 192.168.150 jessica_xd 10161 0 8680 0 0
lo0 16896 link#1 24361 0 24361 0 0
lo0 16896 127 loopback 24361 0 24361 0 0
lo0 16896 loopback 24361 0 24361 0 0
We then move the resource group to the fortworth site through the SMIT fast path, smitty
cl_resgrp_move_node.select. Upon success, the primary site service IP, dallasserv, is
removed and the secondary site service IP, ftwserv, is brought online, as shown in Example 12-8.
-------------------------------
NODE maddi
-------------------------------
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 7a.40.c8.b3.15.2 54634 0 53164 0 0
en0 1500 192.168.100 maddi 54634 0 53164 0 0
en1 1500 link#3 7a.40.c8.b3.15.3 9722 0 9264 0 0
en1 1500 10.10.10 ftwserv 9722 0 9264 0 0
en1 1500 192.168.150 maddi_xd 9722 0 9264 0 0
lo0 16896 link#1 8332 0 8332 0 0
lo0 16896 127 loopback 8332 0 8332 0 0
lo0 16896 loopback 8332 0 8332 0 0
-------------------------------
NODE jessica
-------------------------------
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 ee.af.1.71.78.2 57916 0 57041 0 0
en0 1500 192.168.100 jessica 57916 0 57041 0 0
en1 1500 link#3 ee.af.1.71.78.3 10262 0 8730 0 0
en1 1500 192.168.150 jessica_xd 10262 0 8730 0 0
lo0 16896 link#1 25069 0 25069 0 0
lo0 16896 127 loopback 25069 0 25069 0 0
lo0 16896 loopback 25069 0 25069 0 0
pinged to determine if the adapter is available. The new format of entries in the file is
discussed in this section.
There is a newer option available – poll_uplink – which is recommended to address this issue.
It is discussed in 12.6, “Using poll_uplink” on page 498.
There is already a general guideline against having two PowerHA nodes in the same cluster
using the same VIOS, because this can mean that heartbeats can be passed between the
nodes through the server even when no real network connectivity exists. The problem
addressed by this netmon.cf format is not the same as that issue, although similarities exist.
The decision of which adapter is bad, local or remote, is made based on whether any network
traffic can be seen on the local adapter, by using the inbound byte count of the interface. Where
VIO is involved, this test becomes unreliable because there is no way to distinguish whether
inbound traffic came in from the Virtual I/O Server's connection to the outside world or from a
neighboring virtual I/O client. It is a design point of VIO that its virtual adapters be
indistinguishable to the LPAR from real adapters.
Problem resolution
The netmon.cf format was added to help in virtual environments. This new format allows
customers to declare that a given adapter should be considered active only if it can ping a set
of specified targets.
Important: For this fix to be effective, the customer must select targets that are outside the
VIO environment and not reachable simply by hopping from one Virtual I/O Server to
another. Cluster verification does not determine whether the targets are valid.
Configuring netmon.cf
The netmon.cf file must be placed in the /usr/es/sbin/cluster directory on all cluster nodes.
Up to 32 targets can be provided for each interface. If any specific target is pingable, the
adapter will be considered “up.”
Targets are specified by using the existing netmon.cf configuration file with this new format,
as shown in Example 12-9 on page 497.
Parameters:
----------
!REQD : An explicit string; it *must* be at the beginning of the line (no leading spaces).
<owner> : The interface this line is intended to be used by; that is, the code monitoring
the adapter specified here will determine its own up/down status by whether it can ping any
of the targets (below) specified in these lines. The owner can be specified as a hostname,
IP address, or interface name. In the case of hostname or IP address, it *must* refer to
the boot name/IP (no service aliases). In the case of a hostname, it must be resolvable to
an IP address or the line will be ignored. The string "!ALL" will specify all adapters.
<target> : The IP address or hostname you want the owner to try to ping. As with normal
netmon.cf entries, a hostname target must be resolvable to an IP address in order to be
usable.
Attention: The traditional format of the netmon.cf file is not valid in PowerHA v7, and
later, and is ignored. Only the !REQD lines are used.
The order of the lines is unimportant. Comments (lines that begin with the number sign
character (#)) are allowed on or between lines and are ignored. With IBM AIX 7.1
with Technology Level 4, or earlier, you can specify the same owner entry on up to 32 different
lines in the netmon.cf file; any lines beyond those 32 are ignored. In IBM AIX 7.1 with a
Technology Level later than 4, and in AIX Version 7.2 or later, only the last five entries for an
owner are considered. For an owning adapter that is listed on more than one line, the adapter is
considered available if it can ping any of the provided targets.
Example 12-10 Multiple netmon.cf targets for one owning interface
!REQD en2 100.12.7.9
!REQD en2 100.12.7.10
In Example 12-11, the adapter owning host1.ibm is considered “up” only if it can ping
100.12.7.9 or whatever host4.ibm resolves to. The adapter owning 100.12.7.20 is
considered “up” only if it can ping 100.12.7.10 or whatever host5.ibm resolves to. It is
possible that 100.12.7.20 is the IP address for host1.ibm.
Example 12-11 netmon.cf owners specified by hostname and by IP address
!REQD host1.ibm 100.12.7.9
!REQD host1.ibm host4.ibm
!REQD 100.12.7.20 100.12.7.10
!REQD 100.12.7.20 host5.ibm
In Example 12-12 on page 498, all adapters are available only if they can ping the
100.12.7.9, 110.12.7.9, or 111.100.1.10 IP addresses. The en1 owner entry has an
additional target of 9.12.11.10.
Example 12-12 netmon.cf entries that use !ALL plus an interface-specific target
!REQD !ALL 100.12.7.9
!REQD !ALL 110.12.7.9
!REQD !ALL 111.100.1.10
!REQD en1 9.12.11.10
In this example, having any traditional-format lines is pointless because all of the adapters
are already defined to use the new method.
12.5.3 Implications
The following implications of the new format should be considered:
Any interfaces that are not included as an owner of one of the !REQD lines in the netmon.cf
will continue to behave in the old manner, even if you are using this new function for other
interfaces.
This format does not change heartbeating behavior in any way. It changes only how the
decision is made regarding whether a local adapter is up or down. This new logic is
used in these situations:
– Upon startup, before heartbeating rings are formed.
– During heartbeat failure, when contact with a neighbor is initially lost.
– During periods when heartbeating is not possible, such as when a node is the only one
up in the cluster.
Invoking the format changes the definition of a good adapter – from “Am I able to receive
any network traffic?” to “Can I successfully ping certain addresses?” – regardless of how
much traffic is seen.
– Because of this, an adapter is inherently more likely to be falsely considered down
because the second definition is more restrictive.
– For this same reason, if you find that you must take advantage of this new functionality,
be as generous as possible with the number of targets you provide for each interface.
To use the poll_uplink option, you must have the following versions and settings:
VIOS 2.2.3.4 or later installed in all related VIO servers.
The LPAR must be at AIX 7.1 TL3 SP3 or later.
The option poll_uplink needs to be set in the LPAR on the virtual entX interfaces.
The option poll_uplink can be defined directly on the virtual interface that is used for Shared
Ethernet Adapter (SEA) failover or on the Etherchannel device.
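To enable poll_uplink, a chdev command similar to the following can be used. This is a hedged sketch: entX is a placeholder for the virtual Ethernet adapter, and the -P flag defers the change until the adapter is reconfigured or the LPAR is restarted because the interface is normally in use:
# Enable physical link polling on the virtual Ethernet adapter
chdev -l entX -a poll_uplink=yes -P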
To display the settings, use the lsattr -El entX command. Example 12-13 shows the default
settings for poll_uplink.
The entstat command can be used to check for the poll_uplink status and if it is enabled.
Example 12-14 shows an excerpt of the entstat command output in an LPAR where
poll_uplink is set to no.
Example 12-15 shows the entstat command output on a system where poll_uplink is
enabled and where all physical links that are related to this virtual interface are up. The text in
bold shows the additional displayed content:
– VIRTUAL_PORT
– PHYS_LINK_UP
– Bridge Status: Up
Example 12-16 shows the entstat command output on a system where poll_uplink is
enabled and where all physical links that are related to this virtual interface are down. Notice
that the PHYS_LINK_UP no longer displays, and the Bridge Status changes from Up to
Unknown.
Example 12-16 Poll_uplink enabled entstat output when physical link is down
# entstat -d ent0
--------------------------------------------------
ETHERNET STATISTICS (en0) :
Device Type: Virtual I/O Ethernet Adapter (l-lan)
...
General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 20000
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
DataRateSet VIOENT VIRTUAL_PORT
...
LAN State: Operational
Bridge Status: Unknown <------
192.168.150.51 #jessica_xd
192.168.150.52 #maddi_xd
192.168.150.53 #ashley_xd
192.168.100.53 #ashley
Notice that all of the addresses are pulled in, including the boot, service, and persistent IP
labels. Before using any of the monitor utilities from a client node, the clhosts.client file
must be copied to all clients as /usr/es/sbin/cluster/etc/clhosts. Remember to
remove the .client extension when you copy the file to the client nodes.
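A hedged sketch of that copy step, assuming a client LPAR named client1 with the PowerHA client file sets installed:
# Copy the generated file to the client, dropping the .client extension
scp /usr/es/sbin/cluster/etc/clhosts.client \
    client1:/usr/es/sbin/cluster/etc/clhosts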
Important: The clhosts file on a client must never contain 127.0.0.1, loopback, or
localhost.
In this type of environment, implementing a clhosts file on the client is critical. This file gives
the clinfoES daemon the addresses to attempt communication with the SNMP process
running on the PowerHA cluster nodes.
13
Remote disks can be combined into a volume group via the AIX Logical Volume Manager
(LVM) and this volume group can be imported to the nodes located at different sites. You can
create logical volumes and set up a LVM mirror with a copy at each site. Although LVM
mirroring supports up to three copies, PowerHA only supports two sites. It is still possible to
have three LVM copies with two LVM copies (even using two servers) at one site and one
remote copy at another site. This would be an uncommon situation.
Though it is common to have the same storage type at each location, it is not a requirement.
This is an advantage of this type of configuration because it is storage-type agnostic. As long
as the storage is supported for SAN attachment to AIX and provides adequate performance, it
most likely is a valid candidate for this configuration.
The main difference between local clusters and cluster solutions with cross-site mirroring is
as follows:
In local clusters, all nodes and storage subsystems are located in the same location.
With cross-site mirrored clusters, nodes and storage subsystems reside at different sites.
Each site has at least one cluster node and one storage subsystem with all necessary IP
and SAN connectivity, similar to a local cluster.
Use ignore for the resource group inter-site management policy.
The increased availability of metropolitan area networks (MAN) in recent years has made this
solution more feasible and popular.
This solution offers automation of AIX LVM mirroring within SAN disk subsystems between
different sites. It also provides automatic LVM mirroring synchronization and disk device
activation when, after a disk or site failure, a node or disk becomes available again.
Each node in a cross-site LVM cluster accesses all storage subsystems. The data availability
is ensured through the LVM mirroring between the volumes residing on different storage
subsystems on different sites.
In case of a complete site failure, PowerHA performs a takeover of the resources to the
secondary site according to the cluster policy configuration. It activates all defined volume
groups from the surviving mirrored copy. If one storage subsystem fails, I/O might
experience a temporary delay, but it continues to access data from the active mirror copy
on the surviving disk subsystem.
PowerHA drives automatic LVM mirror synchronization: after the failed site rejoins the
cluster, it automatically fixes removed and missing physical volumes (PV states removed and
missing) and synchronizes the data. Automatic synchronization is not possible in all
cases, but C-SPOC can be used to synchronize the data from the surviving mirrors to stale
mirrors after a disk or site failure as needed.
13.1.1 Requirements
The following requirements must be met to assure data integrity and appropriate PowerHA
reaction in case of site or disk subsystem failure:
A server and storage unit at each of two sites.
SAN and LAN connectivity across/between sites. Redundant infrastructure both within and
across sites is also recommended.
PowerHA Standard Edition is supported (allows stretched clusters and supports site
creation).
A two site stretched cluster must be configured.
The force varyon attribute for the resource group must be set to true.
The logical volume allocation policy must be set to superstrict (this ensures that LV
copies are allocated on different volumes, and that the primary and secondary copy of each LP
are allocated on disks located at different sites).
The LV mirrored copies must be allocated on separate volumes that reside on different
disk subsystems in the different sites.
Mirror pools should be used to help define the disks at each site.
When adding additional storage space, for example increasing the size of the mirrored file
system, it is necessary to assure that the new logical partitions will be allocated on different
volumes and different disk subsystems according to the requirements above. For this task it is
required to increase the logical volume first with the appropriate volume selections, then
increase the file system, preferably by using C-SPOC.
The SAN network can be expanded beyond the original site, by way of advanced technology.
Here is an example of what kind of technology could be used for expansion. This list is not
exhaustive:
FCIP router.
Wave division multiplexing (WDM) devices. This technology includes:
– CWDM stands for Coarse Wavelength Division Multiplexing, which is the less
expensive component of the WDM technology.
– DWDM stands for Dense Wavelength Division Multiplexing.
Repository disk(s)
Similar to a local cluster, a stretched cross-site LVM mirrored cluster has only a single
repository disk, so a decision must be made about which site should contain the repository
disk. An argument can be made for having it at either site. If it resides at the primary site, and
the primary site goes down, a fallover can and should still succeed. However, it is also strongly
recommended to define a backup repository disk to the cluster at the opposite site from the
primary repository disk. In the event of a primary site failure, the repository disk is replaced
by the backup repository disk via the automatic repository replacement feature within
PowerHA.
Mirror pools
Though technically not a requirement, it is also strongly recommended to use the AIX LVM
capability of mirror pools. Using mirror pools correctly helps to both create and maintain
copies across separate storage subsystems ensuring a separate and complete copy of all
data at each site.
Mirror pools make it possible to divide the physical volumes of a volume group into separate
pools. A mirror pool is made up of one or more physical volumes. Each physical volume can
only belong to one mirror pool at a time. When creating a logical volume, each copy of the
logical volume being created can be assigned to a mirror pool. Logical volume copies that are
assigned to a mirror pool will only allocate partitions from the physical volumes in that mirror
pool. This provides the ability to restrict the disks that a logical volume copy can use. Without
mirror pools, the only way to restrict which physical volume is used for allocation when
creating or extending a logical volume is to use a map file.
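The following is a minimal command-line sketch of how mirror pools are typically defined and used, assuming the volume group and disk names from this chapter (xsitevg on hdisk4 and hdisk8) and the mirror pool names used later in this scenario; in this chapter, the equivalent steps are performed through the SMUI and C-SPOC:
# Create a scalable volume group on the disks from both sites
mkvg -S -y xsitevg hdisk4 hdisk8
# Enforce super-strict mirror pools on the volume group
chvg -M s xsitevg
# Assign each disk to its site-specific mirror pool
chpv -p saginawmp hdisk4
chpv -p hasletmp hdisk8
# Create a mirrored logical volume with one copy in each mirror pool
mklv -y lv01 -t jfs2 -c 2 -p copy1=saginawmp -p copy2=hasletmp xsitevg 128
# Verify the mirror pool assignments
lspv -P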
Table 13-1 Device type, device name, device attributes and settings
Device type                 Device name   Device attribute   Setting
FC SCSI protocol device     fscsiX        fc_err_recover     fast_fail
Disk                        hdiskX        hcheck_interval    30
Disk                        hdiskX        timeout_policy     fail_path
Tip: It is important to understand and utilize MPIO best practices, which can be found at
https://developer.ibm.com/articles/au-aix-mpio
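A hedged sketch of setting the attributes from Table 13-1, assuming fscsi0 and hdisk4 as example device names; the -P flag defers the change until the device is reconfigured or the system is restarted if the device is busy:
# Fibre Channel protocol device: fail I/O quickly instead of retrying
chdev -l fscsi0 -a fc_err_recover=fast_fail -P
# Disk device: enable path health checking and fail the path on timeout
chdev -l hdisk4 -a hcheck_interval=30 -a timeout_policy=fail_path -P
# Verify the settings
lsattr -El hdisk4 | egrep "hcheck_interval|timeout_policy"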
It is important to note that AIX disk storage subsystem failure detection is prolonged in a
redundant virtual SCSI environment. This is one of many reasons why NPIV is a much better
choice than virtual SCSI to minimize the time required for AIX to detect and recover from a
disk storage subsystem failure in a virtualized environment.
The number of paths to each hdisk/LUN affects the time required to detect and recover from a
disk storage subsystem failure. The more paths, the longer recovery will take. We generally
recommend no more than two paths per server port through which a given LUN can be
accessed. Two server ports per LUN are generally sufficient unless I/O bandwidth
requirements are extremely high.
When a disk storage subsystem fails, AIX must detect failure of each LUN presented by the
subsystem. The more LUNs, the more potential application write I/O stalls while AIX detects a
LUN failure. The number of LUNs is dictated by the LUN size, and the disk storage capacity
required. Since many I/O requests can be driven simultaneously to a single hdisk/LUN
(limited by the hdisk's queue_depth attribute), we generally recommend fewer larger LUNs as
compared to more smaller LUNs.
But too few LUNs can sometimes lead to a performance bottleneck. The AIX hdisk device
driver is single threaded, which limits the number of IOPS which AIX can drive to a single
hdisk. It is therefore inadvisable to drive more than 5,000 IOPS to a single hdisk.
Please note that in most cases IOPS to a single hdisk/LUN will be constrained by disk storage
subsystem performance well before the hdisk IOPS limit is reached, but the limit can easily be
exceeded on LUNs residing on FlashSystem or on solid-state disk drives.
If the anticipated workload on a volume group is such that the hdisk IOPS limit might be
exceeded, create the volume group on more smaller hdisks/LUNs.
If an anticipated workload is such that it tends to drive I/O to only one file or file system at a
time – especially when I/O requests are sequential – then to help AIX process failures of
multiple LUNs simultaneously you should stripe logical volumes (including those underlying
file systems) across hdisks/LUNs using the -S flag on the mklv command. It will not help to
configure logical volumes with physical partition striping by specifying -e x on the mklv
command. If physical partition striping is used with such a workload, AIX might still process a
disk storage subsystem failure one LUN at a time.
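A hedged sketch of the difference, using hypothetical names (datavg, applv, and two local disks):
# Striped logical volume (strip size 64 KB) spread across both disks
mklv -y applv -t jfs2 -S 64K datavg 64 hdisk10 hdisk11
# Physical partition striping (maximum inter-disk allocation), which does not
# help AIX process multiple LUN failures in parallel for this workload
mklv -y applv2 -t jfs2 -e x datavg 64 hdisk10 hdisk11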
The following is a list of reasons why implementing cross-site LVM mirroring may not be
desirable:
It is specific to AIX, no heterogeneous operating system support.
It provides synchronous replication only.
It cannot copy raw disks.
It presents potential system performance implications because it doubles the write I/Os.
There is no way to define primary (source) or secondary (target) copies, although you can
specify a preferred read option for storage such as flash.
Due to AIX I/O hang times in the event of a copy loss, it still might not prevent application
failure.
Quorum is usually disabled, and forced varyon of the volume group enabled. This can lead
to data inconsistencies and must be carefully planned for recovery.
Like most data replication, it is also good at copying bad data. This means it does not
eliminate the need for backup and back out procedures.
Important: Even with redundant components, most environments, including cross-site LVM
mirroring, can handle only one failure at a time. That failure must be corrected before
another failure occurs. A series of rolling failures can still result in an outage or, worse,
data corruption.
3. Enter hostname and login credentials of one of the cluster nodes. In our scenario it was
node shawn as shown in Figure 13-3 and click on Continue.
4. Enter cluster name and choose Stretched for cluster type. In our scenario cluster name is
xlvm_cluster as shown in Figure 13-4 and click on Continue.
5. Enter site names and nodes per site as shown in Figure 13-5 and click on Continue.
6. Choose the desired repository disk by clicking the box next to the hdisk name as shown in
Figure 13-6. A list of free shared disks is displayed. You can hover the mouse pointer
over each disk to get the PVID and size to verify that you are picking the correct one. When
you are done, click Continue.
7. Now a cluster summary is presented as shown in Figure 13-8 on page 512. Review the
details and once all is confirmed click on Submit to complete.
After successful creation, the cluster is displayed in the SMUI as unassigned and offline.
This is shown in Figure 13-7. At this point, only the base CAA cluster has been created. The
resources (service IP, volume group, and resource group) still need to be created.
2. On the next screen fill out all the options and click on Continue. For our scenario they are
as follows and also shown in Figure 13-10:
a. Resource Group Name: xsiterg
b. Make Primary Site: Saginaw
i. Add nodes in order of: shawn and jordan
c. For site Haslet add node: jessica
d. Inter-Site management: ignore
Note: For inter-site management we choose ignore as that is what should be used with
cross-site LVM configurations.
3. Choose the desired Startup, Fallover and Failback policies as shown in Figure 13-11 and
then click on Create. For our scenario they are as follows:
a. Startup: Online on Home Node
b. Fallover: Move the resource group to the next available node in order of
priority.
c. Fallback: Avoid another outage and never return to a previous node.
4. After clicking Create, the resource group is created and you are presented with Figure 13-12.
In our case we will choose Add Resource and continue on in 13.3.3, “Defining resources”
on page 515.
You are then presented with Figure 13-13 with four basic resource options.
We are presented with the screen to define an IP address as shown in Figure 13-14. We
chose Existing IP Address because we already added it into the /etc/hosts file and were
able to select bdbsvc from the pull-down list. Otherwise, you could choose Define IP
Address and manually enter it. In either case, after completing the fields, choose Continue.
Next, we are presented with the screen to choose the Network Name, via a pull-down list, and
the subnet mask, as shown in Figure 13-15, and click Continue. The subnet mask is
optional because it is inherited from the network.
Following is where you specify if the service IP should be associated with a specific site. In
our scenario we will allow the same service IP to traverse both sites so we chose the No Site
option as shown in Figure 13-16 and then click on Create to complete.
Since we chose No here, the remaining two options do not apply so we leave them blank.
After completing the fields we click on Continue as shown in Figure 13-18. This completes
creating the resource and automatically adds it into the resource group.
Now we will choose Add Resource to create the volume group, as shown in 13.3.5, “Defining
and creating volume group(s)” on page 518.
This time we chose the volume group option and are presented with the Add Volume Group
screen as shown in Figure 13-20.
Fill out the fields as desired and click on Continue. In our scenario we completed them as
follows:
Select volume group type: Scalable
Volume group name: xsitevg
Partition Size: 256MB
Maximum physical partitions: 32
Maximum logical volumes: 256
Next, we are presented with the screen to choose the specific disks that we want to be members of
the volume group. In our scenario, we have only two free disks, hdisk4 and hdisk8, as shown in
Figure 13-21, and we chose them both.
We then click on Continue to advance to the mirror pools tab as shown in Figure 13-22.
In our scenario we will continue to use the SMUI to create the file system. This will create a
generically named logical volume and create a jfsloglv. It also auto-detects the fact that the
volume group is already defined with mirror pools and will mirror the logical volume properly.
We begin by choosing Add Resource from the successful volume group creation status
window. We then choose Journaled File System from the Add Resource screen, similar to the
one shown in Figure 13-13 on page 515. Once chosen we are presented with the Add
Journaled File System screen as shown Figure 13-23.
We mainly chose the defaults, except for changing the permissions to Read and Write and the
size to 1 GB. Had any existing logical volume without a file system been detected, it would have
been available for selection in the drop-down list. In our case we have none, so we allow the
SMUI to auto-create the logical volume for us. We then click Continue and are presented
with the secondary screen for Mount Options, as shown in Figure 13-24. We enter the mount
point name, /smuijfstest, and choose Create.
After clicking Create, we get status feedback showing successful creation, and we can either
click Done or Add Resource. In our case, we are done, as shown in Figure 13-25. However, if more
resources need to be added, simply repeat the process by choosing Add Resource.
mirror pools. As these validations are not visible within the same SMUI screen, we chose to
use the command line.
Like most options this can be changed either with SMIT or the command line. We use the
command line and execute the following:
clmgr modify resource_group 'xsiterg' FORCED_VARYON='true'
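To confirm the change, the resource group attributes can be queried; a hedged sketch:
# Display the resource group attributes and check the forced varyon setting
clmgr query resource_group xsiterg | grep -i varyon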
We also specifically check the logical volume attributes that are highlighted in blue in
Example 13-3. They correspond to the upper bound, strictness, and preferred read. We need
to restrict the upper bound and enable superstrictness, as is generally recommended best
practice, even though superstrict is already enabled for the mirror pools and should be the
highest determining factor.
To change these logical volume attributes, upper bound and strictness, we utilize the C-SPOC
command line:
/usr/es/sbin/cluster/cspoc/cli_chlv -s s -u 1 lv01
This should be repeated for every logical volume in the volume group. In our case we also
have loglv01. We verify the attributes are set as desired as shown in Example 13-4.
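The attributes can also be checked directly with lslv; a hedged sketch (field names can vary slightly by AIX level):
# Check upper bound, strictness (superstrict), and preferred read for the LV
lslv lv01 | egrep -i "upper bound|separate|preferred read"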
Notice that PREFERRED READ has not changed. Technically, it is not required to utilize this
option, but for the best performance it is generally recommended to always read from the
disk that is local to each site. This can be intelligently controlled by PowerHA by using a
combination of LVM attributes at both the volume group level and the mirror pool level:
Preferred Read and Storage Location, respectively. These are covered in the following
sections.
The only attribute that may require further explanation is the last one: LVM Preferred Read. It
has the following options to choose from by pressing F4 to get a pop-up picklist:
roundrobin This is the default policy for the LVM preferred read copy; LVM
decides which mirror copy is used to read the data.
favorcopy Choose this option if you want to read from flash storage irrespective of
where the resource group is online.
siteaffinity Choose this option if you want to read from the storage at the local site
where the resource group is online. The siteaffinity option is available
only for site-based clusters.
As is often the case, after any change the cluster needs to be synchronized. This can be done
after every change or after a group of changes. The benefit of doing so after every change is
that it often makes it easier to troubleshoot any problems that are encountered.
For example, after setting siteaffinity and synchronizing the cluster we see a warning as
shown in Example 13-6. This is a reminder we still need to assign the mirror pools to the sites.
To check this cluster setting from the command line see Example 13-7.
In our case hdisk4 is local to the saginawmp mirror pool, and hdisk8 is local to the hasletmp as
shown in Example 13-8.
# lspv -P
Physical Volume   Volume Group    Mirror Pool
hdisk2            rootvg
hdisk3            caavg_private
hdisk4            xsitevg         saginawmp
hdisk8            xsitevg         hasletmp
Again we will utilize C-SPOC to assign each mirror pool to their respective site as follows:
1. smitty cl_mirrorpool_mgt → Change/Show Characteristics of a Mirror Pool.
2. Choose the mirror pool from the pop-up picklist, in our case saginawmp.
3. Go down to Storage Location, press F4 and select the desired site. In our case it is
Saginaw.
4. Press Enter twice to execute.
5. Repeat as needed for each mirror pool.
In our case we repeated with hasletmp and assigned it to site Haslet.
After all creations and changes have been completed the cluster still needs to be
synchronized to push those changes across all cluster nodes.
In the left-hand navigation panel, we go to the cluster and expand all the sites and nodes.
After doing so, you can see that the node shawn shows a yellow warning indicator and clearly
states that there are unsynchronized changes. This is important because it validates that the
cluster changes we made were performed on that specific node, and we need to
run the synchronization from that node. We click the three dots next to node shawn and
choose the option to synchronize the cluster, as shown in Figure 13-26.
13.4 Testing
In this section, we cover some of the most common major failures, such as:
Local node failure within primary site
Rolling node failures promoted to site failure
Primary site local storage failure
Primary site remote storage failure
Primary site all storage failure
The first two test scenarios are not specifically any different than in a typical local shared cluster.
However, we think it is important to show the most likely failures that are encountered. Also,
unless otherwise stated, every test begins with all nodes active in the cluster as shown in
Figure 13-28 on page 528. We also validate, before and after fallover, that PREFERRED READ
is set as desired for each site.
We simply fail the node by executing reboot -q. PowerHA detects the node failure and
acquires the resource group on the next available node within the same site, jordan, as
shown in Figure 13-29. We can see the resource group is online, the primary node state is
unknown and the primary site has an error because of the previously created failure.
We check the logical volume settings again and it is still the same as shown in Example 13-9
on page 528.
This time when checking the logical volume, we see that PREFERRED READ has changed to
2, as shown in Example 13-10, which is expected and desired because it is the hasletmp mirror pool.
In this scenario we simulate lost access of the primary storage at the primary site. This can be
done in a number of ways but we chose to simply unmap the volume, hdisk4, at the primary
site.
The disk loss is detected by AIX and reported in the error report, as shown in Example 13-11.
There is no fallover or any other action taken because the remote copy is still available. This
allows for, and actually provides, continuous operation.
The logical volume does report that it is stale, and the physical volume is marked missing, as
expected. This is shown in Example 13-12.
To recover we simply remap the volume back to the primary node shawn and execute
varyonvg -c xsitevg. This brings the disk active and auto syncs the stale mirrored copies.
AIX also marks the disk missing and also logs the logical volume as stale as shown in
Example 13-14.
# lsvg -p xsitevg
xsitevg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk4 active 374 245 75..00..20..75..75
hdisk8 missing 374 245 75..00..20..75..75
To recover we simply remap the volume back to the primary node shawn and execute
varyonvg -c xsitevg. This brings the disk active and auto syncs the stale mirrored copies.
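A hedged sketch of that recovery sequence on node shawn, using the names from our scenario:
# Rediscover the remapped LUN if it is not yet visible
cfgmgr
# Reactivate the volume group; this returns the missing PV to active
# and resynchronizes the stale mirror copies
varyonvg -c xsitevg
# Verify that both disks are active and the LVs return to open/syncd
lsvg -p xsitevg
lsvg -l xsitevg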
Now the secondary node, jordan, does attempt to acquire the resource group. It is able to
bring up the service IP but because it also lost access to the storage, the acquisition fails
trying to activate the volume group. This forces the resource group to error out and then
fallover to the secondary site, Haslet, onto node jessica. That succeeds as shown in
Figure 13-31. We also verify the correct PREFERRED READ setting and it is set as shown in
Example 13-10 on page 530.
However the overall cluster status looks normal because all nodes are still active in the
cluster. There is clearly a SAN problem that still needs to be resolved, but at a glance at the
cluster status, you may not easily know. You most likely would have to sift through the error
report and the cluster logs to ultimately determine the problem.
To recover we stop cluster services on both nodes at the primary site. We also remap both
volumes back to both nodes through the storage subsystem. Once completed we verify we
can query the disks as shown in Example 13-16. We ran this test multiple times and on one
occasion we could not query the disks. We simply ran cfgmgr and then the query was
successful.
We then restart cluster services on both nodes. Once the nodes join and stabilize we then
move the resource group back to the primary node and site via the clmgr command as shown
in Example 13-17.
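A hedged sketch of such a move with clmgr, using the resource group, node, and site names from our scenario (the NODE and SITE attribute names are assumptions, so verify them against the clmgr documentation for your PowerHA level):
# Move the resource group back to the primary node
clmgr move resource_group xsiterg NODE=shawn
# Alternatively, move it back at the site level
clmgr move resource_group xsiterg SITE=Saginaw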
We can actually see in the hacmp.out log file that the disk is detected as missing and, because of
that, the mirrored copies cannot be synchronized, as shown in Example 13-18.
The missing disk, logical volume stale and stale partitions are all shown in Example 13-19.
# lsvg -l xsitevg
xsitevg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv01 jfs2log 1 2 2 open/stale N/A
lv01 jfs2 128 256 2 open/stale /smuijfstest
In a real-world scenario like this, recovering back to the primary node and site requires
solving the original site failure and deciding whether or when to move the resource group back.
It is possible to move it back with one copy, assuming that the primary storage is still unavailable.
However, if the storage is available, you probably want to get the copies back in sync before
moving the resource group back to the primary site. That is our course of action.
Once the primary storage is available again, validate the missing disk is now accessible –
ultimately on all nodes, but initially on the secondary site node jessica. This is shown in
Example 13-16 on page 533. If not, run cfgmgr and try again. Once querying the disk is
successful we need to reactivate the volume group to change hdisk4 from missing to active
using varyonvg -c xsitevg. This will also auto synchronize the stale partitions. Once back in
sync, as shown in Example 13-20, the cluster nodes at the primary site can be restarted.
Lastly the resource group can be moved back to the primary as shown in Example 13-17 on
page 534.
# lsvg -l xsitevg
xsitevg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv01 jfs2log 1 2 2 open/syncd N/A
lv01 jfs2 128 256 2 open/syncd /smuijfstest
14
14.1 Introduction
The following gives an overview of an IBM Power Systems Virtual Server and the HADR
options available for cloud and hybrid cloud combinations.
With the Power Systems Virtual Server, you can quickly create and deploy one or more virtual
servers (that are running either the AIX, IBM i, or Linux operating systems). After you
provision the Power Systems Virtual Server, you get access to infrastructure and physical
computing resources without the need to manage or operate them. However, you must
manage the operating system and the software applications and data.
14.1.2 IBM Power and IBM Power Systems Virtual Server HADR options
IBM Power systems are one of the most reliable platforms in the industry, often approaching
levels similar to what you experience with IBM zSystems. IBM Power servers are designed to
match the requirements of the most critical data-intensive workloads. For the 13th straight
year, IBM Power servers achieved the highest server reliability rankings in the ITIC 2021
Global Server Hardware and Server OS Reliability survey.1
To provide even higher levels of availability, IBM Power Systems and IBM PowerVS provide a
range of HADR solutions. There are three broad categories of HADR solutions to consider:
If that virtual machine (VM) fails, it can be restarted on another server in the cluster by
utilizing Simplified Remote Restart (SRR) capabilities. The resulting outage time can be
shortened as the LPAR can be quickly restarted on another server, eliminating problem
determination and repair time on the failed server. For a fully automated remote restart
function, IBM provides the IBM VM Recovery Manager product (VMRM). IBM PowerVS
utilizes this function to allow your PowerVS virtual instance to be restarted on another IBM
PowerVS server.
With the addition of replicated storage, IBM VMRM can provide DR operations, allowing your
workloads to be quickly restarted at a remote data center in case your primary data center
1 https://www.ibm.com/downloads/cas/A856LOWK
experiences a failure. For more information, see Implementing IBM VM Recovery Manager
for IBM Power Systems, SG24-8426.
While active/passive solutions provide you a way of moving your workloads to avoid planned
outages, they do not eliminate unplanned outages. They only allow you a quicker method of
recovering your applications and shortening the length of the outage. VM recovery options
are operating system agnostic – they support all of the operating systems that run on IBM
Power servers or IBM PowerVS.
Table 14-1 highlights the differences between IBM Power HADR solutions.
Outage types:
– Active-active solutions: Software, hardware, HA, planned, and unplanned. The RTO is 0 with limited distance.
– Active-passive solutions: Software, hardware, HA, DR, planned, and unplanned. The RTO is greater than 0 with multiple sites.
– VM restart solutions: Hardware, HA, DR, planned, and unplanned. The RTO is greater than 0 with multiple sites.
IBM solution:
– Active-active solutions: Db2 pureScale®, Db2 Mirror, and IBM Storage Scale.
– Active-passive solutions: PowerHA SystemMirror, Red Hat HA, and Linux HA.
– VM restart solutions: VMRM HA, LPM, and VMRM DR.
a. The number of licensed processor cores on each system in the cluster.
The benefits of Global Replication on Power Virtual Server include the following:
– Maintain a consistent and recoverable copy of the data at the remote site, created with
minimal impact to applications at your local site.
– Efficiently synchronize the local and remote sites with support for fallover and fallback
modes, helping to reduce the time that is required to switch back to the local site after a
planned or unplanned outage.
– Replicate more data in less time to remote locations.
– Maintain redundant data centers in distant geographies for rapid recovery from
disasters.
– Eliminate costly dedicated networks for replication and avoid bandwidth upgrades.
Restriction: At the time of the writing of this book GRS is currently enabled in two data
centers – DAL12 and WDC06. Additional sites will be added over time. See this GRS link
for further information.
GRS aims to automate the complete DR solution and provide the API and CLI interfaces to
create the recipe for the DR solution. GRS currently does not have any user interface. IBM
provides an automation toolkit for GRS. The IBM Toolkit for AIX from Technology Services
enables clients to automate disaster recovery (DR) functions and capabilities by integrating
Power Virtual Server with the capabilities of GRS. With the Toolkit, clients can manage their
DR environment using their existing AIX skills. The benefits include the following:
– Simplify and automate operations of your multi-site disaster recovery solution in Power
Virtual Server.
– Provide a secondary IBM Cloud site as a host for your business application as a
disaster recovery solution.
At the time of writing, public cloud deployments do not support storage-based replication, so you must
convert to GLVM.
GLVM uses caching in memory and backup on disk, so system capacity sizing is critical.
Likewise, source and target system throughput must be closely matched. If you want to
ensure that sufficient capacity is available in the public cloud, you must license as many
processor cores as needed to conduct production operation at the required performance.
Figure 14-1 provides an overview of PowerHA SystemMirror Enterprise Edition with GLVM on
the IBM PowerVS architecture. It consists of the following:
Traditional AIX LVM native mirroring is replicated over IP addresses to the secondary
system to maintain two (or three) identical copies in sync mode, and near identical copies
in async mode.
The architecture is disk subsystem neutral, and it is implemented through RPVs, which
virtualize the remote disk to appear local. This architecture differs from logical unit
numbers (LUNs), which are used through SAN storage.
Easily managed by the administrator.
You use term licensing for public cloud deployments because it is registered to the
customer and not the serial number.
Reserve sufficient capacity for running your production on a DR virtual server by licensing
the number of cores that is required. N+1 licensing does not apply because expanded
capacity on demand cannot be ensured.
Figure 14-1 PowerHA SystemMirror for AIX on the IBM PowerVS architecture
When using PowerHA SystemMirror for AIX with GLVM on IBM PowerVS, consider the
following items:
There is no limit to the scalability from a production point of view if the systems are
properly configured with sufficient disk, memory, and quality bandwidth. Source and target
systems must be matched for equal throughput.
For existing customers, you match the bandwidth and target configuration throughput to
the on-premises deployments.
For more information, see IBM Power Systems High Availability and Disaster Recovery
Updates: Planning for a Multicloud Environment, REDP-5663 and AIX Disaster Recovery with
IBM PowerVS: An IBM Systems Lab Services Tutorial.2
The distance between the sites is limited only by the acceptable latency (for synchronous
configurations) or by the size of the cache (for asynchronous configurations). For
asynchronous replication, the size of the cache represents the maximum acceptable amount
of data that can be lost in a disaster.
In this architecture, it is the pseudo-local representation of the remote physical volume (RPV)
that allows the LVM to consider the physical volume at the remote site as another local, albeit
slow, physical volume. The actual I/O operations are performed at the remote site.
2 https://cloud.ibm.com/media/docs/downloads/power-iaas-tutorials/PowerVS_AIX_DR_Tutorial_v1.pdf
There is one RPV server for each replicated physical volume, and it is named rpvserver#.
You can mirror your data across two sites by configuring VGs that contain both local physical
disks and RPVs. With the RPV device driver, the LVM does not distinguish between local PVs
and RPVs; it maintains mirror copies of the data across the attached disks. The LVM is usually
unaware that some disks are at a remote site.
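GLVM also ships command-line utilities for observing the remote physical volumes and GMVGs; a minimal sketch (both commands are provided by the GLVM file sets):
# Show RPV client statistics, including errors and pending remote I/O
rpvstat
# Show a summary of each GMVG: PVs, RPVs, stale partitions, and sync percentage
gmvgstat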
For PowerHA SystemMirror installations, GMVGs can be added to resource groups and
managed and monitored by PowerHA SystemMirror. Defining the GLVM volume group is
discussed in Chapter 15, “GLVM wizard” on page 549.
For more information about GLVM, see the PowerHA SystemMirror documentation and
Asynchronous Geographic Logical Volume Mirroring Best Practices for Cloud Deployment,
REDP-5665.
In a non-concurrent configuration, only one node owns the disk at a time. If the owner node
fails, the next highest priority cluster node in the resource group node list takes ownership of
the shared disk and restarts the application to restore critical services to the client, which
provides client applications access to the data that is stored on disk.
The takeover usually occurs within 30 – 300 seconds. This range depends on the number and
types of disks that are used, the number of VGs and file systems (shared or cross-attached to
a Network File System) and the number of mission-critical applications in the cluster
configuration.
When planning a shared external disk for a cluster, the goal is to eliminate SPOFs in the disk
storage subsystem.
For more information, see PowerHA SystemMirror 7.2 for AIX planning.
Important: File-based replication and clustering is independent of PowerHA. They can be
used in a complementary fashion to achieve a desired end goal, but it is considered a
customized solution with no formal integrated support.
In addition to general data access, IBM Storage Scale includes many other features, such as
data replication, policy-based storage management, and multi-site operations.
IBM Storage Scale can be run on virtualized instances, LPARs, or other hypervisors to enable
common data access in scenarios. Multiple IBM Storage Scale clusters can share data over a
local area network (LAN) or wide area network (WAN).
An IBM Storage Scale cluster with DR capabilities consists of two or three geographically
different sites that work together in a coordinated manner. Two of the sites consist of
IBM Storage Scale nodes and storage resources that hold complete copies of the file system.
If the third site is active, it consists of a single node and a single disk that are used as the IBM
Storage Scale arbitration tie-breaker. If a hardware failure causes an entire site to become
inoperable, the file system service fails over to the remaining subset of the cluster and uses the
copy of the file system that survived the disaster to continue to provide data access, assuming
that the arbitration site still exists.
IBM Storage Scale also supports asynchronous replication by using Active File Management
(AFM), which is primarily designed for a head office or remote offices configuration. It is
available in IBM Storage Scale Standard Edition.
AFM provides a scalable, high-performance file system caching layer that is integrated with
the IBM Storage Scale cluster file system. With AFM, you can create associations from a
local IBM Storage Scale cluster to a remote cluster or storage, and define the location and
flow of file data to automate the management of the data. You can implement a single
namespace view across sites around the world.
The primary site is a read/write file set where the applications are running and have
read/write access to the data. The secondary site is read-only. All the data from the primary
site is asynchronously synchronized with the secondary site. The primary and secondary
sites can be independently created in a storage and network configuration. After the sites are
created, you can establish a relationship between the two file sets. The primary site remains
available for the applications even when communication with the secondary site fails or the
secondary site fails. When the connection with the secondary site is restored, the primary site
detects the restored connection and asynchronously updates the secondary site.
For more information, see the IBM Storage Scale 5.1.2 documentation.
The advantage of this model is cost savings because the enterprises can provision minimal
resources and scale up as required during a HADR situation. In this scenario, the enterprises
run their production workloads on-premises and use the resources in a public cloud for
HADR, as shown in Figure 14-3.
The main advantage of this model is enhanced resiliency. Outages can happen at any time for
a cloud provider, which makes it risky for enterprises to rely on single cloud vendor.
In this scenario, the enterprises run their production workloads on one public cloud and
HADR on another public cloud, as shown in Figure 14-4.
15
15.1 Introduction
The following gives an overview of using the PowerHA 7.2.7 SMUI GLVM configuration
wizard. GLVM can be configured by either of the following methods:
Cluster Creation
During cluster creation, the cluster creation wizard is used for GLVM configuration. As a
prerequisite for the cluster creation wizard, you must have a volume group with its physical
volumes shared on all local site nodes. Physical volumes of the same size must exist in
the remote site and will be automatically detected.
Add to existing Cluster
The GLVM configuration wizard is used for linked clusters that are already being managed
by the PowerHA SystemMirror GUI. You can create and manage multiple GLVM
configurations in the same cluster by using the GLVM configuration wizard. Physical
volumes of the same size must exist in the remote site and will be automatically detected.
15.1.1 Prerequisites
Before attempting to use the SMUI to configure GLVM, ensure that the following prerequisites are
met (a quick verification sketch follows the list).
Ensure cluster.es.assist.common is installed.
The volume group must be available in all local site nodes and must not use any physical
volumes that are shared on any nodes in the remote site.
IP addresses and labels are in /etc/hosts of all nodes.
An adequate number of same-sized free disks and space on the remote site to accommodate the
mirroring. The space needed for the AIO cache logical volume must also be accounted for.
A free disk of at least 512 MB at each site for the cluster repository disk.
Ensure XD_data networks are defined in the cluster (when adding to an existing cluster).
Python version 2.0.x, or later, must be installed on all the cluster nodes.
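A quick verification sketch for some of these prerequisites (the python path is an assumption and can differ depending on how it was installed):
# Confirm that the Smart Assist common file set is installed
lslpp -l cluster.es.assist.common
# Confirm that a python interpreter is available (path can vary)
python -V || python3 -V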
Note: To delete a GLVM configuration from the PowerHA SystemMirror GUI, you must
bring the cluster offline first.
15.1.2 Limitations
The PowerHA SystemMirror GUI GLVM configuration wizard has the following limitations:
Only supports asynchronous mirroring for GLVM.
After you configure GLVM, you cannot modify the asynchronous cache size.
The asynchronous cache utilization graph is updated only when the corresponding
resource group is active and when the application controller is active.
The asynchronous cache utilization graph will not be visible when GLVM is configured
outside of the GLVM wizard.
Service IP addresses must be created and added to the resource group after the cluster and
resource group are created, as desired.
Split/Merge policy defaults to majority. You might want to change this.
15.1.3 Logs
The following logs are useful to utilize to both observe and troubleshoot as needed when
utilizing the GLVM wizard through the SMUI:
/var/hacmp/log/cl_glvm_configuration.log
The primary log of the GLVM wizard located on the primary node where the wizard is
executing.
/usr/es/sbin/cluster/ui/agent/logs/smui-agent.log
This is the SMUI log on the cluster node where the tasks are executing.
/usr/es/sbin/cluster/ui/server/logs/smui-server.log
This is the SMUI log on the SMUI server itself.
Specify node and login credentials and click Continue as shown in Figure 15-2 on
page 552.
Enter cluster name and check box the option to create GLVM cluster and click Continue as
shown in Figure 15-3.
Specify the XD_data network name, site names, nodes per site, and persistent alias. Then
click Continue as shown in Figure 15-5.
Choose a repository disk for each site and click Continue as shown in Figure 15-6.
Select volume group, specify AIO cache LV size, compression, IO Group Latency, and
number to sync in parallel and click Continue as shown in Figure 15-7 on page 555.
Descriptions for some of the options are:
Async I/O Cache LV A logical volume of type aio_cache that is used to store
asynchronous writes. Instead of waiting for the write to be
completed on the remote physical volume, the write is recorded in
the local cache, and then an acknowledgment is returned to the
application. The I/Os that are recorded in the cache are replayed in
order against the remote disks and then deleted from the cache
after a successful write acknowledgment is received.
I/O Group Latency This parameter indicates the maximum expected delay (in
milliseconds) before receiving the I/O acknowledgment in a mirror
pool. You specify the volume group that is associated with
the mirror pool by using the -v flag. The default value is 10
milliseconds. If you specify lower values, I/O performance might
improve at the cost of higher CPU consumption. This is a
volume group-specific attribute.
Number of Parallel LPs The number of logical partitions to be synchronized in parallel. The
valid range is 1 to 32. The number of parallel logical partitions must
be tailored to the machine, disks in the volume group, system
resources, and the volume group mode.
Figure 15-7 AIO cache LV size, compression, IO Group Latency, Parallel Sync
Note: During the writing of this book, the SMUI summary screen always displays the
asynchronous I/O cache size in GB regardless of what was chosen on previous screens.
However, it does create the cache at the specified size in GB, MB, or KB.
The results of this creation can be found in 15.4, “GLVM Cluster configuration” on page 566.
The SMIT version creates the GMVG entirely new, whereas the SMUI requires an existing
volume group (preferably with all needed logical volumes and file systems already created).
The SMIT version also creates, in addition to the async I/O cache logical volumes, a JFS2
file system and JFS2 log, as shown in Example 15-2 on page 559.
Note: The SMIT panel shown in Figure 15-10, does not ask about the use of compression,
I/O group latency, or the number of logical partitions to sync in parallel like the SMUI does.
They will be set to defaults of 0 (disabled), 1 and 32 respectively.
To add an asynchronous GMVG to an existing cluster by using the GLVM wizard, run
smitty sysmirror and select Cluster Applications and Resources → Make Applications Highly
Available (Use Smart Assists) → GLVM Configuration Assistant → Configure
Asynchronous GMVG.
[Entry Fields]
* Enter the name of the VG [GLVMvg]
* Select disks to be mirrored from the local site (00c472c092322efd) +
* Select disks to be mirrored from the remote site (00c472c0de143bdf) +
* Enter the size of the ASYNC cache [64] #
* Enter the unit of ASYNC cache size M +
Upon completion, an overview of all the tasks that were executed is displayed in the SMIT
screen, as shown in Example 15-2.
The overall configuration that is created is nearly identical to the SMUI results shown in 15.4, “GLVM
Cluster configuration” on page 566, except that a JFS2 file system and a JFS2 log are also
created, as shown in Example 15-3.
Example 15-3 Logical volumes and JFS2 created from asynchronous GLVM wizard
GLVMvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
GLVMvgLV jfs2 10 20 2 closed/syncd /GLVMvgfs0
GLVMvgLV1 jfs2log 10 20 2 closed/syncd N/A
GLVMvgALV aio_cache 8 8 1 open/syncd N/A
GLVMvgALV1 aio_cache 8 8 1 closed/syncd N/A
Note: The full command syntax executed from the SMIT panel for this example was:
Specify the local site, XD_data network, and persistent IP aliases if they are not already
defined in the current cluster topology. Our cluster already has the persistent aliases, so no
further selection is required. Then click Continue, as shown in Figure 15-13 on page 563.
Select the volume group; specify the AIO cache LV size, compression, I/O group latency, and
the number of logical partitions to synchronize in parallel; and click Continue, as shown in
Figure 15-14 on page 564.
The descriptions for these options are the same as those provided earlier in this chapter for
the new cluster scenario.
Note: Even though the last option is labeled “Number of Parallel Logical Volumes”, it
technically is the number of logical partitions to synchronize in parallel. It relates directly to
the syncvg command and is specified either with the -P flag or with the
NUM_PARALLEL_LPS environment variable.
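For reference, the following minimal sketch shows how this parallelism can be specified when the volume group is synchronized manually (GLVMvg is the volume group from this scenario, and 32 is the maximum allowed value):
# syncvg -P 32 -v GLVMvg
Alternatively, the environment variable can be used:
# export NUM_PARALLEL_LPS=32
# syncvg -v GLVMvg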
Next, a summary is displayed. Review and click Submit as shown in Figure 15-15.
Note: During the writing of this book, the SMUI summary screen always displays the
asynchronous I/O cache size in GB regardless of what was chosen on previous screens.
However, it does create the cache at the specified size in GB, MB, or KB.
15.4.1 Topology
In our scenario, we have two networks. When creating a new cluster, the default settings
make both of them type XD_data. There is also a persistent IP address associated with one of the
XD_data networks, as specified previously in the GLVM wizard. It might be desirable to
change which interface the persistent alias is located on, or to change one of the networks to
type ether, as in the sketch that follows.
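One way to make such a change is with the clmgr command. The following is a sketch only; the network name net_XD_01 is hypothetical, and the ability to modify the TYPE attribute should be verified against the clmgr documentation for your PowerHA level (the same change can also be made through the SMIT topology panels). A cluster synchronization is required afterward:
# clmgr modify network net_XD_01 TYPE=ether
# clmgr sync cluster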
# cllsif -p
Adapter Type Network Net Type Attribute Node IP Address Hardware Address Interface Name Global Name Netmask
-------------------------------
NODE jessica
-------------------------------
Name Mtu Network Address Ipkts Ierrs Opkts
en0 1500 link#2 96.d7.54.3b.2d.4 196703 0 838865
en0 1500 192.168.100 192.168.100.90 196703 0 838865
en1 1500 link#3 96.d7.54.3b.2d.2 912643 0 54204
en1 1500 10.2 10.2.30.190 912643 0 54204
en1 1500 192.168.0 192.168.100.190 912643 0 54204
lo0 16896 link#1 20920 0 20920
lo0 16896 127 127.0.0.1 20920 0 20920
lo0 16896 ::1%1 20920 0 20920
-------------------------------
NODE jordan
-------------------------------
Name Mtu Network Address Ipkts Ierrs Opkts
en0 1500 link#2 96.d7.58.9d.39.4 197120 0 2320881
en0 1500 192.168.100 192.168.100.91 197120 0 2320881
en1 1500 link#3 96.d7.58.9d.39.2 824644 0 90585
en1 1500 10.2 10.2.30.191 824644 0 90585
en1 1500 192.168.0 192.168.100.191 824644 0 90585
lo0 16896 link#1 26676 0 26676
lo0 16896 127 127.0.0.1 26676 0 26676
lo0 16896 ::1%1 26676 0 26676
While all settings are important, the most noteworthy are Site Relationship, Fallback Policy,
and Service IP Label. For the site relationship, the setting Prefer Primary Site moves the
resource group back when the site rejoins the cluster. This contrasts with the Fallback Policy
of Never Fallback, which is an intra-site policy, so this inter-site fallback behavior must be
understood. The options for site relationship are as follows (a command sketch for reviewing
these policies follows the list).
Ignore
As the name implies, no site relationship is specified. This option should not be used with
replicated resources, including GLVM configurations.
Prefer Primary Site
Upon startup only a node within the primary site will activate the resource group. When a
site fails, the active site with the highest priority acquires the resource. When the failed site
rejoins, the site with the highest priority acquires the resource.
Either Site
Upon startup a participating node in either site may activate the resource group. When a
site fails, the resource will be acquired by the highest priority standby site. When the failed
site rejoins, the resource remains with its new owner.
Online on Both Sites
Resources are acquired by both sites. This is for concurrent capable resource groups and
does not apply to GLVM configurations.
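The current policies of a resource group can be reviewed, and optionally changed, with clmgr. The following is a sketch only: the resource group name GLVMvg_RG is assumed from the file collection name used later in this chapter, and the SITE_POLICY attribute name and its value are assumptions to verify against the clmgr documentation for your PowerHA level:
# clmgr query resource_group GLVMvg_RG
# clmgr modify resource_group GLVMvg_RG SITE_POLICY=primary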
For service IP addresses it is common to define two, one for each site, and add both into the
resource group. This is referred to as Site-Specific Service IP Labels. More details on
configuring and using this type of service IP addresses can be found in 12.4, “Site-specific
service IP labels” on page 488.
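As a sketch of the command-line approach, site-specific service IP labels might be added with clmgr as shown below. All names are hypothetical, and the SITE attribute should be verified for your PowerHA level; 12.4 covers the configuration in detail:
# clmgr add service_ip svcip_siteA NETWORK=net_ether_01 SITE=siteA
# clmgr add service_ip svcip_siteB NETWORK=net_ether_01 SITE=siteB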
Both scripts are plain-text ksh shell scripts and are used primarily to export environment
variables and, for the application monitor, to gather GLVM statistics.
Note: The Action on Application Failure is set to fallover. The monitor script should
always exit 0 and never fail. However, it may be desirable to set it to notify along with a
notification method. This provides an additional level of assurance that a fallover
would NOT occur in the event of a monitor failure.
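The generated scripts are not reproduced here, but the following ksh sketch illustrates the general idea of a statistics-gathering monitor that never returns a failure. The log file location is hypothetical, and gmvgstat and rpvstat are the GLVM statistics commands (verify that the GLVM utilities are installed on your nodes):
#!/bin/ksh
# Hypothetical monitor sketch: record GLVM statistics and always exit 0 so that
# a problem in statistics gathering can never cause the application monitor to fail.
VG=GLVMvg                              # the GMVG from this scenario
LOG=/var/hacmp/log/glvm_monitor.log    # hypothetical log file location
{
   print "$(date) GLVM statistics for $VG"
   gmvgstat $VG 2>&1                   # geographically mirrored VG status summary
   rpvstat 2>&1                        # RPV client statistics
} >> $LOG
exit 0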
The file collection is created with the attributes as shown in Example 15-8. The example
contains a combination of two different SMIT panels to show all in one view.
[Entry Fields]
File Collection Name GLVM_GLVMvg_RG_Json_fc
File Collection Description
[Entry Fields]
* Automatic File Update Time (in minutes) [10]
# clcmd lsrpvclient
-------------------------------
NODE jessica
-------------------------------
hdisk2 00c472c092322efd Unknown
-------------------------------
NODE jordan
-------------------------------
hdisk2 00c472c0de143bdf
Scalable: Yes
Quorum: Disabled/No
The mirror pool names are initially created by simply appending “MP” and “MP1” to the end
of the volume group name. The wizard also sets the async cache high water mark to 80.
The high water mark is the percentage of the I/O cache size that can be used before new write
requests must wait for mirroring to catch up. For reference, the default is 100. All the mirror
pool detailed attributes can be seen in Example 15-11.
Example 15-11 GLVM Async Mirror Pools Attributes and Member Disks
# lsmp -A GLVMvg
VOLUME GROUP: GLVMvg Mirror Pool Super Strict: yes
The AIO logical volume names are initially created by simply appending “ALV” and “ALV1” to
the end of the volume group name. The names and attributes are shown in Example 15-13.
# lslv GLVMvgALV
LOGICAL VOLUME: GLVMvgALV VOLUME GROUP: GLVMvg
LV IDENTIFIER: 00c472c000004b00000001858b9bc6b7.2 PERMISSION: read/write
VG STATE: active/complete LV STATE: closed/syncd
TYPE: aio_cache WRITE VERIFY: off
MAX LPs: 512 PP SIZE: 8 megabyte(s)
COPIES: 1 SCHED POLICY: parallel
LPs: 8 PPs: 8
STALE PPs: 0 BB POLICY: non-relocatable
INTER-POLICY: minimum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 1
MOUNT POINT: N/A LABEL: None
DEVICE UID: 0 DEVICE GID: 0
DEVICE PERMISSIONS: 432
MIRROR WRITE CONSISTENCY: on/PASSIVE
EACH LP COPY ON A SEPARATE PV ?: yes (superstrict)
Serialize IO ?: NO
INFINITE RETRY: no PREFERRED READ: 0
DEVICESUBTYPE: DS_LVZ
COPY 1 MIRROR POOL: GLVMvgMP
COPY 2 MIRROR POOL: None
COPY 3 MIRROR POOL: None
ENCRYPTION: no
# lslv GLVMvgALV1
LOGICAL VOLUME: GLVMvgALV1 VOLUME GROUP: GLVMvg
LV IDENTIFIER: 00c472c000004b00000001858b9bc6b7.3 PERMISSION: read/write
VG STATE: active/complete LV STATE: closed/syncd
TYPE: aio_cache WRITE VERIFY: off
MAX LPs: 512 PP SIZE: 8 megabyte(s)
COPIES: 1 SCHED POLICY: parallel
LPs: 8 PPs: 8
STALE PPs: 0 BB POLICY: non-relocatable
INTER-POLICY: minimum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 1
MOUNT POINT: N/A LABEL: None
DEVICE UID: 0 DEVICE GID: 0
DEVICE PERMISSIONS: 432
MIRROR WRITE CONSISTENCY: on/PASSIVE
EACH LP COPY ON A SEPARATE PV ?: yes (superstrict)
Serialize IO ?: NO
INFINITE RETRY: no PREFERRED READ: 0
DEVICESUBTYPE: DS_LVZ
COPY 1 MIRROR POOL: GLVMvgMP1
COPY 2 MIRROR POOL: None
If this does not match your desired cluster configuration, additional details about changing
these policies and their possible values can be found in Table 2-1 on page 37, and a short
command sketch follows.
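The split and merge policies can be changed through SMIT or with clmgr. The following is a sketch only; the SPLIT_POLICY and MERGE_POLICY attribute names and the value shown are assumptions that should be checked against the clmgr manual for your PowerHA level:
# clmgr query cluster | egrep -i "split|merge"
# clmgr modify cluster SPLIT_POLICY=manual MERGE_POLICY=manual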
Part 5
Part 5 Appendixes
This part includes the following appendixes:
Appendix A, “Paper planning worksheets” on page 579
Appendix B, “Cluster Test Tool log” on page 591
For additional information on planning, see the Planning PowerHA SystemMirror web page:
Worksheet fields: Node Names, Major Number, Physical Volumes, Size, and Export Options.
Application worksheet
Use the worksheet in Table A-7 to record information about applications in the cluster.
The worksheet fields include the following:
Directory, Executable Files, Configuration Files, Cluster Name, Node, Strategy, Verification Commands, Node A, Node B
Server Name, Start Script, Stop Script (repeated for each application controller)
Monitor Method, Monitor Interval, Stabilization Interval, Restart Count, Restart Interval, Notify Method, Cleanup Method, Restart Method
Startup Policy, Fallover Policy, Fallback Policy, Settling Time, Runtime Policies, Service IP Label, File Systems, Volume Groups, Tape Resources, Application Servers, Cluster Name, Miscellaneous Data, WPAR Name, Comments
Event Command, Notify Command, Pre-Event Command, Post-Event Command, Recovery Counter
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: | Initializing Variable Table
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: Using Process Environment for Variable Table
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: | Reading Static Configuration Data
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: Cluster Name: jessica_cluster
26/11/2022_21:07:14: Cluster Version: 23
26/11/2022_21:07:14: Local Node Name: jessica
26/11/2022_21:07:14: Cluster Nodes: jessica jordan
26/11/2022_21:07:14: Found 1 Cluster Networks
26/11/2022_21:07:14: Found 4 Cluster Interfaces/Device/Labels
26/11/2022_21:07:14: Found 1 Cluster Resource Groups
26/11/2022_21:07:14: Found 10 Cluster Resources
26/11/2022_21:07:14: Event Timeout Value: 720
26/11/2022_21:07:14: Maximum Timeout Value: 2880
26/11/2022_21:07:14: Found 2 Cluster Sites
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: | Building Test Queue
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: Test Plan: /usr/es/sbin/cluster/cl_testtool/auto_topology
26/11/2022_21:07:14: Event 1: NODE_UP: NODE_UP,ALL,Start cluster services on all available
nodes
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: | Validate NODE_UP
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: Event node: ALL
26/11/2022_21:07:14: Configured nodes: jessica jordan
26/11/2022_21:07:14: Event 2: NODE_DOWN_GRACEFUL: NODE_DOWN_GRACEFUL,node1,Stop cluster
services gracefully on a node
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: | Validate NODE_DOWN_GRACEFUL
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: Event node: jessica
26/11/2022_21:07:14: Configured nodes: jessica jordan
26/11/2022_21:07:14: Event 3: NODE_UP: NODE_UP,node1,Restart cluster services on the node
that was stopped
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: | Validate NODE_UP
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: Event node: jessica
26/11/2022_21:07:14: Configured nodes: jessica jordan
26/11/2022_21:07:14: Event 4: NODE_DOWN_TAKEOVER: NODE_DOWN_TAKEOVER,node2,Stop cluster
services with takeover on a node
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: | Validate NODE_DOWN_TAKEOVER
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:07:14: Event node: jordan
26/11/2022_21:07:14: Configured nodes: jessica jordan
26/11/2022_21:07:14: Event 5: NODE_UP: NODE_UP,node2,Restart cluster services on the node
that was stopped
26/11/2022_21:07:14: -------------------------------------------------------
26/11/2022_21:09:39: ||
|| Test 3 Complete - NODE_UP: Restart cluster services on the node that was stopped
||
26/11/2022_21:09:39: || Test Completion Status: PASSED
||
26/11/2022_21:09:39:
===========================================================================
26/11/2022_21:09:39:
===========================================================================
26/11/2022_21:09:39: ||
|| Starting Test 4 - NODE_DOWN_TAKEOVER,jordan,Stop cluster services with takeover on a
node
||
26/11/2022_21:09:39:
===========================================================================
26/11/2022_21:09:39: -------------------------------------------------------
26/11/2022_21:09:39: | is_rational NODE_DOWN_TAKEOVER
26/11/2022_21:09:39: -------------------------------------------------------
26/11/2022_21:09:39: Checking cluster stability
26/11/2022_21:09:39: jessica: ST_STABLE
26/11/2022_21:09:39: jordan: ST_STABLE
26/11/2022_21:09:39: Cluster is stable
26/11/2022_21:09:39: -------------------------------------------------------
26/11/2022_21:09:39: | Executing Command for NODE_DOWN_TAKEOVER
26/11/2022_21:09:39: -------------------------------------------------------
26/11/2022_21:09:39: /usr/es/sbin/cluster/utilities/cl_rsh -n jordan
/usr/es/sbin/cluster/cl_testtool/cl_testtool_ctrl -e NODE_DOWN_TAKEOVER -m execute 'jordan'
26/11/2022_21:09:42: -------------------------------------------------------
26/11/2022_21:09:42: | Entering wait_for_stable
26/11/2022_21:09:42: -------------------------------------------------------
26/11/2022_21:09:42: Waiting 30 seconds for cluster to stabilize.
26/11/2022_21:10:13: Checking Node States:
26/11/2022_21:10:13: Node jessica: ST_STABLE
26/11/2022_21:10:13: Active Timers: None
26/11/2022_21:10:13: Node jordan: ST_INIT
26/11/2022_21:10:13: -------------------------------------------------------
26/11/2022_21:10:13: | NODE_DOWN_TAKEOVER: Checking post-event status
26/11/2022_21:10:13: -------------------------------------------------------
26/11/2022_21:10:13: pre-event online nodes: jessica jordan
26/11/2022_21:10:13: post-event online nodes: jessica
26/11/2022_21:10:13: Checking node states
26/11/2022_21:10:13: jessica: Preevent state: ST_STABLE, Postevent state: ST_STABLE
26/11/2022_21:10:13: jordan: Preevent state: ST_STABLE, Postevent state: ST_INIT
26/11/2022_21:10:13: Checking RG states
26/11/2022_21:10:13: Resource Group: redbookrg
26/11/2022_21:10:13: Node: jessica Pre Event State: ONLINE, Post Event State: ONLINE
26/11/2022_21:10:13: Node: jordan Pre Event State: OFFLINE, Post Event State: OFFLINE
26/11/2022_21:10:13: Checking event history
26/11/2022_21:10:13: Begin Event History records:
26/11/2022_21:10:13: NODE: jessica
Nov 26 2022 21:09:42 EVENT COMPLETED: site_down_remote FortWorth 0
<LAT>|2022-11-26T21:09:42|24439|EVENT COMPLETED: site_down_remote FortWorth
0|</LAT>
Nov 26 2022 21:09:42 EVENT COMPLETED: site_down FortWorth 0
<LAT>|2022-11-26T21:09:42|24439|EVENT COMPLETED: site_down FortWorth 0|</LAT>
Nov 26 2022 21:09:43 EVENT COMPLETED: node_down jordan 0
<LAT>|2022-11-26T21:09:43|24439|EVENT COMPLETED: node_down jordan 0|</LAT>
Nov 26 2022 21:09:45 EVENT COMPLETED: node_down_complete jordan 0
<LAT>|2022-11-26T21:09:45|24440|EVENT COMPLETED: node_down_complete jordan
0|</LAT>
||
26/11/2022_21:11:08:
===========================================================================
26/11/2022_21:11:08:
===========================================================================
26/11/2022_21:11:08: ||
|| Starting Test 6 - NODE_DOWN_FORCED,jessica,Stop cluster services forced on a node
||
26/11/2022_21:11:08:
===========================================================================
26/11/2022_21:11:09: -------------------------------------------------------
26/11/2022_21:11:09: | is_rational NODE_DOWN_FORCED
26/11/2022_21:11:09: -------------------------------------------------------
26/11/2022_21:11:09: Checking cluster stability
26/11/2022_21:11:09: jessica: ST_STABLE
26/11/2022_21:11:09: jordan: ST_STABLE
26/11/2022_21:11:09: Cluster is stable
26/11/2022_21:11:09: Node: jessica, Force Down:
26/11/2022_21:11:09: Node: jordan, Force Down:
26/11/2022_21:11:09: -------------------------------------------------------
26/11/2022_21:11:09: | Executing Command for NODE_DOWN_FORCED
26/11/2022_21:11:09: -------------------------------------------------------
26/11/2022_21:11:09: /usr/es/sbin/cluster/cl_testtool/cl_testtool_ctrl -e NODE_DOWN_FORCED
-m execute 'jessica'
26/11/2022_21:11:11: -------------------------------------------------------
26/11/2022_21:11:11: | Entering wait_for_stable
26/11/2022_21:11:11: -------------------------------------------------------
26/11/2022_21:11:11: Waiting 30 seconds for cluster to stabilize.
26/11/2022_21:11:41: Checking Node States:
26/11/2022_21:11:41: Node jessica: ST_STABLE
26/11/2022_21:11:41: Active Timers: None
26/11/2022_21:11:41: Node jordan: ST_STABLE
26/11/2022_21:11:41: Active Timers: None
26/11/2022_21:11:41: -------------------------------------------------------
26/11/2022_21:11:41: | NODE_DOWN_FORCED: Checking post-event status
26/11/2022_21:11:41: -------------------------------------------------------
26/11/2022_21:11:42: pre-event online nodes: jessica jordan
26/11/2022_21:11:42: post-event online nodes: jessica jordan
26/11/2022_21:11:42: Checking forced down node lists
26/11/2022_21:11:42: Node: jessica, Force Down: jessica
26/11/2022_21:11:42: Checking node states
26/11/2022_21:11:42: jessica: Preevent state: ST_STABLE, Postevent state: ST_STABLE
26/11/2022_21:11:42: jordan: Preevent state: ST_STABLE, Postevent state: ST_STABLE
26/11/2022_21:11:42: Checking RG states
26/11/2022_21:11:42: Resource Group: redbookrg
26/11/2022_21:11:42: Node: jessica Pre Event State: ONLINE, Post Event State: UNMANAGED
26/11/2022_21:11:42: Node: jordan Pre Event State: OFFLINE, Post Event State: UNMANAGED
26/11/2022_21:11:42: Checking event history
26/11/2022_21:11:42: Begin Event History records:
26/11/2022_21:11:42: NODE: jessica
Nov 26 2022 21:11:10 EVENT COMPLETED: admin_op clrm_stop_request 30965 0 0
<LAT>|2022-11-26T21:11:10|30965|EVENT COMPLETED: admin_op clrm_stop_request 30965
0 0|</LAT>
Nov 26 2022 21:11:13 EVENT COMPLETED: site_down_local 0
<LAT>|2022-11-26T21:11:13|30965|EVENT COMPLETED: site_down_local 0|</LAT>
Nov 26 2022 21:11:13 EVENT COMPLETED: site_down Dallas 0
<LAT>|2022-11-26T21:11:13|30965|EVENT COMPLETED: site_down Dallas 0|</LAT>
Nov 26 2022 21:11:14 EVENT COMPLETED: node_down jessica forced 0
<LAT>|2022-11-26T21:11:14|30965|EVENT COMPLETED: node_down jessica forced
0|</LAT>
26/11/2022_21:12:18: -------------------------------------------------------
26/11/2022_21:12:18: | Validate VG_DOWN
26/11/2022_21:12:18: -------------------------------------------------------
26/11/2022_21:12:18: Event node: ANY
26/11/2022_21:12:18: Configured nodes: jessica jordan
26/11/2022_21:12:18: VG: leevg, RG Name: redbookrg
26/11/2022_21:12:18:
###########################################################################
26/11/2022_21:12:18: ##
## Starting Cluster Test Tool: -c -e /usr/es/sbin/cluster/cl_testtool/auto_vg
##
26/11/2022_21:12:18:
###########################################################################
26/11/2022_21:12:18:
===========================================================================
26/11/2022_21:12:18: ||
|| Starting Test 1 - VG_DOWN,ANY,leevg
||
26/11/2022_21:12:18:
===========================================================================
26/11/2022_21:12:18: -------------------------------------------------------
26/11/2022_21:12:18: | is_rational VG_DOWN
26/11/2022_21:12:18: -------------------------------------------------------
26/11/2022_21:12:18: Checking cluster stability
26/11/2022_21:12:18: jessica: ST_STABLE
26/11/2022_21:12:18: jordan: ST_STABLE
26/11/2022_21:12:18: Cluster is stable
26/11/2022_21:12:18: VG: leevg, RG: redbookrg, ONLINE NODES: jessica
26/11/2022_21:12:18: -------------------------------------------------------
26/11/2022_21:12:18: | Executing Command for VG_DOWN
26/11/2022_21:12:18: -------------------------------------------------------
26/11/2022_21:12:18: /usr/es/sbin/cluster/cl_testtool/cl_testtool_ctrl -e VG_DOWN -m
execute 'leevg'
26/11/2022_21:12:19: -------------------------------------------------------
26/11/2022_21:12:19: | Entering wait_for_stable
26/11/2022_21:12:19: -------------------------------------------------------
26/11/2022_21:12:19: Waiting 30 seconds for cluster to stabilize.
26/11/2022_21:12:49: Checking Node States:
26/11/2022_21:12:49: Node jessica: ST_STABLE
26/11/2022_21:12:49: Active Timers: None
26/11/2022_21:12:49: Node jordan: ST_STABLE
26/11/2022_21:12:49: Active Timers: None
26/11/2022_21:12:49: -------------------------------------------------------
26/11/2022_21:12:49: | VG_DOWN: Checking post-event status
26/11/2022_21:12:49: -------------------------------------------------------
26/11/2022_21:12:50: RESID: 11, RG: redbookrg, RGID: 1, TYPE: 0
26/11/2022_21:12:50: Checking node states
26/11/2022_21:12:50: jessica: Preevent state: ST_STABLE, Postevent state: ST_STABLE
26/11/2022_21:12:50: jordan: Preevent state: ST_STABLE, Postevent state: ST_STABLE
26/11/2022_21:12:50: Volume Group: leevg Failure Action: fallover
26/11/2022_21:12:50: Checking RG states
26/11/2022_21:12:50: Resource Group: redbookrg
26/11/2022_21:12:50: Node: jessica Pre Event State: ONLINE, Post Event State: OFFLINE
26/11/2022_21:12:50: Node: jordan Pre Event State: OFFLINE, Post Event State: ONLINE
26/11/2022_21:12:50: Checking event history
26/11/2022_21:12:50: Begin Event History records:
26/11/2022_21:12:50: NODE: jessica
Nov 26 2022 21:12:20 EVENT COMPLETED: resource_state_change jessica 0
<LAT>|2022-11-26T21:12:20|30970|EVENT COMPLETED: resource_state_change jessica
0|</LAT>
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topics in this
document. Some publications in this list might be available in softcopy only.
Guide to IBM PowerHA SystemMirror for AIX Version 7.1.3, SG24-8167
IBM PowerHA SystemMirror 7.1.2 Enterprise Edition for AIX, SG24-8106
IBM PowerHA SystemMirror Standard Edition 7.1.1 for AIX Update, SG24-8030
IBM PowerVM Virtualization Introduction and Configuration, SG24-7940
Understanding LDAP - Design and Implementation, SG24-4986
Asynchronous Geographic Logical Volume Mirroring Best Practices for Cloud
Deployment, REDP-5665
IBM Power Systems Private Cloud with Shared Utility Capacity: Featuring Power
Enterprise Pools 2.0, SG24-8478
Implementing High Availability Cluster Multi-Processing (HACMP) Cookbook, SG24-6769
IBM System Storage DS8000 Copy Services for Open Systems, SG24-6788
ILM Library: Information Lifecycle Management Best Practices Guide, SG24-7251
IBM System Storage SAN Volume Controller and Storwize V7000 Replication Family
Services, SG24-7574
IBM Power Systems High Availability and Disaster Recovery Updates: Planning for a
Multicloud Environment, REDP-5663
You can search for, view, download or order these documents and other Redbooks,
Redpapers, Web Docs, draft and additional materials, at the following website:
ibm.com/redbooks
Online resources
These websites are also relevant as further information sources:
IBM PowerHA SystemMirror for AIX documentation
https://www.ibm.com/docs/en/powerha-aix
List of current service packs for PowerHA:
http://www14.software.ibm.com/webapp/set2/sas/f/hacmp/home.html
PowerHA frequently asked questions:
http://www-03.ibm.com/systems/power/software/availability/aix/faq/index.html
List of supported devices by PowerHA:
http://ibm.co/1EvK8cG
SG24-7739-02
Printed in U.S.A.
ibm.com/redbooks