AIX for System Administrators: HA - CAA

The document provides a comprehensive guide on Cluster Aware AIX (CAA), detailing its features, commands, and configurations for creating and managing high availability clusters. It explains the role of CAA in conjunction with PowerHA, the requirements for the cluster repository disk, and the functionality of the AHAFS for event notifications. Additionally, it covers the deadman switch mechanism to protect data integrity during node isolation and includes various commands for cluster management.

Uploaded by

raj0000kaml
Copyright © All Rights Reserved

AIX for System Administrators


Practical Guide to AIX (and PowerVM, PowerHA, PowerVC, HMC, DevOps ...)


HA - CAA
CAA (Cluster Aware AIX)

CAA is an AIX feature that was introduced in AIX 6.1 TL6 and AIX 7.1 to make it easy to create a cluster. CAA is not used as a stand-alone
package; it is used by PowerHA or by Shared Storage Pool (SSP). It can be seen as a set of commands and services that other
applications (like PowerHA and SSP) can exploit to provide high availability and disaster recovery support. CAA does not provide application
monitoring or resource failover capabilities; those are provided by products such as PowerHA. IBM PowerHA, SSP and even RSCT (Reliable
Scalable Cluster Technology) use these built-in AIX clustering capabilities, which were added to simplify the configuration and
management of high availability clusters.

CAA provides specific events that applications can monitor from any node in the cluster:
Node UP and node DOWN
Network adapter UP and DOWN
Network address change
Disk UP and DOWN
Predefined and user-defined events

CAA needs the following ports on all nodes for network communication:
4098 (for multicast)
6181
16191
42112

Checking CAA related daemons (services):


# lssrc -g caa
Subsystem         Group            PID          Status
 clcomd           caa              6553806      active
 clconfd          caa              6619352      active

clcomd: the cluster communications daemon. Since PowerHA 7.1 it is a CAA service; before PowerHA 7.1 it was part of PowerHA
(clcomdES). clcomd uses the /etc/cluster/rhosts file; the old clcomdES rhosts file in the /usr/es/sbin/cluster/etc directory is
no longer used.
clconfd: the clconfd subsystem runs on each node of the cluster. The clconfd daemon wakes up every 10 minutes to synchronize any
necessary cluster changes.
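The two subsystems above can also be checked from a script. The function below is a minimal sketch (caa_daemons_ok is a hypothetical helper, not an AIX command): it reads the output of lssrc -g caa on standard input, assumes the Subsystem/Group/PID/Status layout shown above, and prints every CAA daemon that is not active, returning a non-zero status in that case.

```sh
# Hypothetical helper: report clcomd/clconfd entries from "lssrc -g caa"
# output (read on stdin) whose status is not "active".
# Assumption: an inoperative subsystem has no PID, so the status is
# always taken from the last field of the line.
caa_daemons_ok() {
    awk '
        # Record the status of the two CAA daemons described above.
        $1 == "clcomd" || $1 == "clconfd" { status[$1] = $NF }
        END {
            rc = 0
            n = split("clcomd clconfd", want, " ")
            for (i = 1; i <= n; i++) {
                s = (want[i] in status) ? status[want[i]] : "missing"
                if (s != "active") { print want[i] ": " s; rc = 1 }
            }
            exit rc
        }'
}
```

On a node it would be used as lssrc -g caa | caa_daemons_ok; reading stdin instead of calling lssrc itself keeps the parsing testable with saved output.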

Starting with AIX 7.1 TL2, upgrading the cluster nodes no longer requires a total cluster outage:
A rolling upgrade of a cluster is done by taking one node offline and upgrading it to a new AIX technology level while the other nodes
remain active. After the node is upgraded, it is rebooted and brought back online with the clctrl command. This process is repeated
until all nodes are upgraded. In a mixed cluster environment, CAA maintains compatibility with nodes that are still running earlier AIX
levels. New features are not enabled until all cluster nodes are upgraded to the new technology level.
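The one-node-at-a-time order of a rolling upgrade can be sketched as a dry run that only prints the commands. This is a sketch under assumptions: rolling_upgrade_plan is a hypothetical helper, the clctrl -stop/-start and lscluster -m forms are taken from the Commands list on this page, and the actual upgrade and reboot step is site-specific, so it appears only as a comment line. On a PowerHA cluster, the PowerHA upgrade procedure takes precedence over this plain CAA sequence.

```sh
# Hypothetical dry run: print (do not execute) a rolling-upgrade sequence.
# $1 = cluster name, remaining arguments = nodes in upgrade order.
rolling_upgrade_plan() {
    cluster=$1
    shift
    for node in "$@"; do
        echo "clctrl -stop -n $cluster -m $node"
        echo "# upgrade $node to the new AIX TL and reboot it (site-specific)"
        echo "clctrl -start -n $cluster -m $node"
        echo "lscluster -m    # check that all nodes are UP before the next node"
    done
}
```

For example, rolling_upgrade_plan mycluster nodeA nodeB prints eight lines, finishing nodeA completely before nodeB is stopped; printing rather than running keeps a human in the loop between nodes.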

=============================

CAA Repository disk

The cluster repository disk stores the cluster configuration data. It provides a central repository, so this disk must be accessible from
all nodes in the cluster. A minimum disk size of 10 GB is recommended (1 GB may also be enough). This disk cannot be used for application
storage or any other purpose, and LVM commands (mkvg, mklv, ...) are not supported on the cluster repository disk: the AIX LVM
commands are single-node administrative commands and do not apply in a clustered configuration. The cluster repository disk must
support the 512-byte block size.

CAA stores the repository disk related information in the ODM CuAt, as part of the cluster information.

# odmget CuAt | grep -p cluster


CuAt:
name = "cluster0"
attribute = "node_uuid"
value = "52a6b8be-fff8-11e5-8e37-56a1a7627864"
type = "R"
generic = "DU"
rep = "s"
nls_index = 3

CuAt:
name = "cluster0"
attribute = "clvdisk"
value = "d7063c81-3f64-b5f7-d82b-fa8ed99bfe61"
type = "R"
generic = "DU"
rep = "s"
nls_index = 2

If this ODM entry is missing (which can cause the node to fail to join the cluster), it can be repopulated, and the node forced
to join the cluster, with the clusterconf command: clusterconf -r hdisk# (where hdisk# is the repository disk)
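A quick way to tell whether the clvdisk entry is present is to parse the stanza output shown above. The sketch below assumes the name/attribute/value stanza layout of odmget output; caa_repo_uuid is a hypothetical helper that prints the repository disk UUID and returns a non-zero status when the entry is missing.

```sh
# Hypothetical helper: read "odmget CuAt" stanzas on stdin and print the
# value of the clvdisk attribute of cluster0 (the repository disk UUID).
caa_repo_uuid() {
    awk '
        /^CuAt:/          { name = ""; attr = "" }   # a new stanza starts
        $1 == "name"      { name = $3; gsub(/"/, "", name) }
        $1 == "attribute" { attr = $3; gsub(/"/, "", attr) }
        $1 == "value" && name == "cluster0" && attr == "clvdisk" {
            v = $3; gsub(/"/, "", v); print v; found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

A possible use is odmget -q "name=cluster0" CuAt | caa_repo_uuid || echo "clvdisk missing: see clusterconf -r"; the helper deliberately only reports, leaving the clusterconf -r decision to the administrator.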

=============================

/aha (AHAFS)

Nodes that belong to a CAA cluster use the common Autonomic Health Advisor File System (AHAFS) for event notification. AHAFS is a pseudo file
system used for synchronized information exchange, and it is implemented as an AIX kernel extension. AHAFS is mounted on /aha. It can monitor
predefined and user-defined system events and automatically notifies registered users or processes when the following
types of events occur:
- Modification of content of a file
- Usage of a file system that exceeds a user-defined threshold
- Death of a process
- Change in the value of a kernel tunable parameter

=============================

Creating a cluster

The command mkcluster is used for creating a CAA cluster:


# mkcluster -n mycluster -m nodeA,nodeB -r hdisk1 -d hdisk2,hdisk3

The name of the cluster is mycluster, the nodes are nodeA and nodeB, the repository disk is hdisk1 and the shared disks are hdisk2 and
hdisk3. When the cluster is created, a special volume group (caavg_private) with new logical volumes and file systems is created.

The following happens after issuing the mkcluster command:


- The cluster configuration is written to the cluster repository disk.
- Special volume groups, logical volumes, filesystems are created on the cluster repository disk. (caavg_private)
- Cluster services are made available to other applications like RSCT or PowerHA.
- Additional storage-related tasks...
- A clusterwide multicast address is established.
- The node discovers and monitors the available communication interfaces.
- The cluster interacts with the Autonomic Health Advisor File System (AHAFS) for clusterwide event distribution and makes messages
available to PowerHA, RSCT, ...
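Because the repository disk cannot be used for anything else, it is worth checking that it is not also listed as a shared disk before running mkcluster. The function below is a hypothetical sketch: it validates the arguments and prints the same mkcluster command form used above for review instead of executing it.

```sh
# Hypothetical helper: build (and print) a mkcluster command line.
# $1 = cluster name, $2 = comma-separated nodes, $3 = repository disk,
# $4 = optional comma-separated shared disks.
build_mkcluster() {
    name=$1
    nodes=$2
    repo=$3
    shared=${4-}
    if [ -z "$repo" ]; then
        echo "error: a repository disk is required (-r)" >&2
        return 1
    fi
    # The repository disk cannot be used for any other purpose,
    # so it must not appear in the shared disk list.
    case ",$shared," in
        *",$repo,"*)
            echo "error: $repo is the repository disk; do not list it with -d" >&2
            return 1
            ;;
    esac
    cmd="mkcluster -n $name -m $nodes -r $repo"
    if [ -n "$shared" ]; then
        cmd="$cmd -d $shared"
    fi
    echo "$cmd"
}
```

The comma-wrapped match means hdisk1 is not confused with hdisk10 in the shared list.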

CAA uses IP-based network communication and storage interface communication through Fibre Channel. By using both types of
communication, the nodes in the cluster can keep communicating with each other even if one path fails, which helps prevent "split brain"
incidents. If a node cannot communicate with the others, the DMS (deadman switch) timers are triggered.

A deadman switch is an action taken when Cluster Aware AIX (CAA) detects that a node has become isolated. This occurs when a node can no longer
communicate through either the network or the repository disk. Depending on the deadman switch setting (the deadman_mode tunable), the AIX
operating system reacts differently. DMS monitors I/O traffic, process health, etc. for a specific time (node_timeout), and after the
timeout it can force a system crash or generate an Autonomic Health Advisor File System (AHAFS) event.

=============================

Deadman switch (DMS)

A deadman switch is an action taken when CAA detects that a node has become isolated, that is, when the node can no longer communicate
through the network or the repository disk. The purpose of the DMS is to protect the data on the external disks. The AIX operating system
reacts differently depending on the DMS (deadman_mode) tunable, which can be set to force a system crash or to generate an
AHAFS event.

# clctrl -tune -L deadman_mode          <-- check the current setting (clctrl -tune -h deadman_mode gives more details)
NAME                      DEF    MIN    MAX    UNIT           SCOPE
     ENTITY_NAME(UUID)                                                CUR
--------------------------------------------------------------------------------
deadman_mode              a                                   c n
     caa_cl(25ebea90-784a-11e1-a79d-b6fcc11f846f)                     a
--------------------------------------------------------------------------------

When the value is set to "a" (assert), the node will crash upon the deadman timer popping.
When the value is set to "e" (event), an AHAFS event is generated.

By default, the CAA deadman_mode option is “a”. If the deadman timeout is reached, the node crashes immediately to prevent a partitioned
cluster and data corruption.
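To check the mode from a script, the CUR value can be read from the entity row of the clctrl output. This sketch assumes the layout shown above, where the entity row starts with a name followed by a lowercase hexadecimal UUID in parentheses and ends with the current value; deadman_mode_current is a hypothetical helper.

```sh
# Hypothetical helper: read "clctrl -tune -L deadman_mode" output on stdin
# and describe the current deadman_mode value.
deadman_mode_current() {
    awk '
        # Entity rows look like caa_cl(<uuid>) <value>; the header row
        # ENTITY_NAME(UUID) does not match the lowercase hex UUID pattern.
        $1 ~ /[(][0-9a-f-]+[)]$/ { mode = $NF }
        END {
            if (mode == "") { print "deadman_mode value not found"; exit 1 }
            if (mode == "a") print "a: assert (node crashes when the deadman timer pops)"
            else if (mode == "e") print "e: event (AHAFS event is generated)"
            else print mode ": value not described on this page"
        }'
}
```

Usage: clctrl -tune -L deadman_mode | deadman_mode_current. Changing the mode uses the clctrl -tune -o form from the Commands list (for example deadman_mode=e), and the new value is active at the next start.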

=============================

Commands:

/var/adm/ras/syslog.caa caa log (/etc/syslog.conf entry: caa.info /var/adm/ras/syslog.caa rotate size 1m files 10)

lscluster -i lists the interface configuration of the cluster
lscluster -i | egrep 'Node|Interface' overview of the cluster, all interfaces (network and disk heartbeat)
lscluster -m lists info about the nodes in the cluster
lscluster -d lists the disks in the cluster and their status
lscluster -s lists network statistics of the cluster
lscluster -c shows info about the cluster configuration

mkcluster create a cluster
chcluster change a cluster configuration
rmcluster remove a cluster configuration
clcmd run a command on all nodes of a cluster (clcmd date: it shows the date on all nodes)

lsattr -El cluster0 lists IDs of cluster, disks

clctrl -stop -n mycluster -a stop cluster on all nodes (stop cluster on 1 node: clctrl -stop -n mycluster -m myserver1)
clctrl -start -n mycluster -a start cluster on all nodes (after completing maintenance)
clctrl -tune -L lists CAA related tunables (values stored in repository disk)
clctrl -tune -o <tunable>=<value> modifies a tunable across the cluster (the new value becomes active at the next start)

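The node states reported by lscluster -m can be summarized in the same way. This sketch assumes (verify on your AIX level) that each node block contains lines of the form "Node name: nodeA" and "State of node: UP ..."; caa_nodes_not_up is a hypothetical helper that prints every node whose state is not UP.

```sh
# Hypothetical helper: read "lscluster -m" output on stdin and print
# "<node>: <state>" for every node that is not UP.
# Assumption: the output contains "Node name:" and "State of node:" lines.
caa_nodes_not_up() {
    awk '
        /Node name:/ { node = $NF }
        /State of node:/ {
            # Keep only the text after the colon, e.g. "UP  NODE_LOCAL".
            sub(/.*State of node:[ \t]*/, "")
            split($0, f, " ")
            if (f[1] != "UP") { print node ": " f[1]; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }'
}
```

Combined with clcmd or run from any node, lscluster -m | caa_nodes_not_up gives a one-line-per-problem view instead of the full node listing.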
=============================

To use the force option with CAA commands (it is not a -f flag), set the environment variable CAA_FORCE_ENABLED to 1:
# export CAA_FORCE_ENABLED=1
# rmcluster -r hdisk2

(Using force with rmcluster removes the repository disk and ignores all errors.)
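Exporting CAA_FORCE_ENABLED leaves force enabled for every later CAA command in that shell. A more cautious pattern is to set it only for the single command being run. The wrapper below is a hypothetical sketch: caa_force and the CONFIRM variable are illustrative names, not part of AIX.

```sh
# Hypothetical wrapper: run one command with CAA_FORCE_ENABLED=1 in its
# environment, and only when the caller has set CONFIRM=yes.
caa_force() {
    if [ "${CONFIRM-}" != "yes" ]; then
        echo "refusing to run with CAA_FORCE_ENABLED=1 (set CONFIRM=yes): $*" >&2
        return 1
    fi
    # The prefix assignment applies to this command only; the calling
    # shell does not keep CAA_FORCE_ENABLED set afterwards.
    CAA_FORCE_ENABLED=1 "$@"
}
```

For example: CONFIRM=yes caa_force rmcluster -r hdisk2. Because force ignores all errors, requiring an explicit confirmation is a deliberate speed bump.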

=============================

Labels: HACMP

3 comments:
Anonymous said...
One question. How do you varyoff a "caavg_private" VG without messing up the cluster? I need to UPDATE SDDPCM and I am not sure if removing
the cluster would be a good choice.
February 20, 2018 at 11:24 AM

Satpal said...
You can varyoff CAA (caavg_private) using 'clctrl -stop -n CLUSTERNAME -m NODENAME'
Use 'clctrl -start -n CLUSTERNAME -m NODENAME' to varyon. (Try this command from the other node of the cluster if you are facing problems with
varyon on the problematic node.)
July 28, 2018 at 10:42 AM

© aix4admins.blogspot.com (2015) - Unauthorized use of this material is strictly prohibited.
