10/18/24, 2:39 PM SLE HA 15 SP2 | Administration Guide | Executing Maintenance Tasks
Applies to SUSE Linux Enterprise High Availability 15 SP2
27 Executing Maintenance Tasks
To perform maintenance tasks on a cluster node, you might need to stop the resources running on that node, move them, or shut down or reboot the node. It might also be necessary to temporarily take over control of resources from the cluster, or even to stop the cluster services while resources remain running.
This chapter explains how to manually take down a cluster node without negative side effects. It also gives an overview of the different options the cluster stack provides for executing maintenance tasks.
27.1 Preparing and Finishing Maintenance Work
Use the following commands to start, stop, restart, or view the status of the cluster:
crm cluster start [--all]
Start the cluster services on one node or all nodes
crm cluster stop [--all]
Stop the cluster services on one node or all nodes
crm cluster restart [--all]
Restart the cluster services on one node or all nodes
crm cluster status
View the status of the cluster stack
Run the above commands as user root or as a user with the required privileges.
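A minimal sketch of how the commands above could be wrapped in a shell function that accepts only the documented actions and passes --all through for start, stop, and restart. The function name cluster_services is hypothetical and not part of crmsh; the snippet is meant to be sourced into a root shell.

```shell
# Hypothetical wrapper around the crm cluster commands listed above.
# Usage: cluster_services start|stop|restart|status [--all]
cluster_services() {
  action=$1
  case "$action" in
    start|stop|restart|status) ;;
    *)
      echo "usage: cluster_services start|stop|restart|status [--all]" >&2
      return 2
      ;;
  esac
  shift
  if [ "$#" -eq 0 ]; then
    # Act on the local node only.
    crm cluster "$action"
  elif [ "$#" -eq 1 ] && [ "$1" = "--all" ] && [ "$action" != "status" ]; then
    # Act on all nodes; the list above documents --all only for
    # start, stop, and restart.
    crm cluster "$action" --all
  else
    echo "usage: cluster_services $action [--all]" >&2
    return 2
  fi
}
```

For example, cluster_services stop --all runs crm cluster stop --all, while an unknown action or a stray argument is rejected with exit status 2 before crm is called.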
When you shut down or reboot a cluster node (or stop the cluster services
on a node), the following processes will be triggered:
The resources that are running on the node will be stopped or moved
off the node.
If stopping a resource fails or times out, the STONITH mechanism
will fence the node and shut it down.
Warning: Risk of data loss
If you need to do testing or maintenance work, follow the general
steps below.
Otherwise, you risk unwanted side effects, like resources not starting in an orderly fashion, unsynchronized CIBs across the cluster nodes, or even data loss.
1. Before you start, choose the appropriate option from Section 27.2, “Different Options for Maintenance Tasks”.
2. Apply this option with Hawk2 or crmsh.
3. Execute your maintenance task or tests.
4. After you have finished, put the resource, node or cluster back to “normal” operation.
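The general steps can be sketched as a shell function that applies the chosen option, runs the task, and always restores normal operation afterwards. This is a sketch under assumptions: maintenance_window and its two hooks are hypothetical names, and the hooks shown here use cluster maintenance mode (crm maintenance on/off, see Section 27.3); redefine them for another option.

```shell
# Hypothetical hooks for the option chosen in step 1. This example uses
# cluster maintenance mode; replace both for another option.
enter_maintenance() { crm maintenance on; }
leave_maintenance() { crm maintenance off; }

# Step 2: apply the option. Step 3: run the task passed as arguments.
# Step 4: return to normal operation even if the task failed.
# The exit status of the task is passed back to the caller.
maintenance_window() {
  enter_maintenance || return 1
  "$@"
  mw_status=$?
  leave_maintenance || echo "warning: normal operation not restored" >&2
  return "$mw_status"
}
```

For example, maintenance_window /root/my-task.sh (a hypothetical script) switches maintenance mode on, runs the script, and switches maintenance mode off again regardless of the script's result.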
27.2 Different Options for Maintenance Tasks
Pacemaker offers the following options for performing system
maintenance:
Putting the Cluster into Maintenance Mode
The global cluster property maintenance-mode puts all resources into maintenance state at once. The cluster stops monitoring them and becomes oblivious to their status. Note that only the resource management by Pacemaker is disabled. Corosync and SBD are still functional. Use maintenance mode for any tasks involving cluster resources. For any tasks involving infrastructure such as storage or networking, the safest method is to stop the cluster services completely. See Section 27.6, “Stopping the Cluster Services on a Node”.
Putting a Node into Maintenance Mode
This option allows you to put all resources running on a specific node into
maintenance state at once. The cluster will cease monitoring them and
thus become oblivious to their status.
Putting a Node into Standby Mode
A node that is in standby mode can no longer run resources. Any resources running on the node will be moved away or stopped (if no other node is eligible to run the resource). Also, all monitoring operations will be stopped on the node (except for those with role="Stopped").
You can use this option if you need to stop a node in a cluster while continuing to provide the services running on another node.
Stopping the Cluster Services on a Node
This option stops all of the cluster services on a single node. Any resources running on the node will be moved away or stopped (if no other node is eligible to run the resource). If stopping a resource fails or times out, the node will be fenced.
Putting a Resource into Maintenance Mode
When this mode is enabled for a resource, no monitoring operations will
be triggered for the resource.
Use this option if you need to manually touch the service that is managed
by this resource and do not want the cluster to run any monitoring
operations for the resource during that time.
Putting a Resource into Unmanaged Mode
The is-managed meta attribute allows you to temporarily “release” a resource from being managed by the cluster stack. This means you can manually touch the service that is managed by this resource (for example, to adjust any components). However, the cluster will continue to monitor the resource and to report any failures.
If you want the cluster to also cease monitoring the resource, use the per-resource maintenance mode instead (see Putting a Resource into Maintenance Mode).
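As a compact overview, the enable and disable commands for each option above can be collected in a lookup function. This is a sketch only: maintenance_commands and its option labels (cluster, node-maint, node-standby, node-stop, res-maint, res-unmanaged) are invented for illustration, and the commands themselves are the ones given in Sections 27.3 to 27.8.

```shell
# Hypothetical lookup table: prints the crmsh commands that switch each
# maintenance option on and off. TARGET is a node name or a resource ID.
# node-stop has no target because crm cluster stop/start acts on the
# node where it is run.
# Usage: maintenance_commands OPTION [TARGET]
maintenance_commands() {
  option=$1
  target=$2
  # Options that act on a specific node or resource need a TARGET.
  case "$option" in
    cluster|node-stop) ;;
    node-maint|node-standby|res-maint|res-unmanaged)
      if [ -z "$target" ]; then
        echo "$option needs a node name or resource ID" >&2
        return 2
      fi
      ;;
    *)
      echo "unknown option: $option" >&2
      return 2
      ;;
  esac
  case "$option" in
    cluster)       on="crm maintenance on"; off="crm maintenance off" ;;
    node-maint)    on="crm node maintenance $target"; off="crm node ready $target" ;;
    node-standby)  on="crm node standby $target"; off="crm node online $target" ;;
    node-stop)     on="crm cluster stop"; off="crm cluster start" ;;
    res-maint)     on="crm resource maintenance $target true"
                   off="crm resource maintenance $target false" ;;
    res-unmanaged) on="crm resource unmanage $target"; off="crm resource manage $target" ;;
  esac
  printf 'on:  %s\noff: %s\n' "$on" "$off"
}
```

For example, maintenance_commands node-standby bob prints the standby and online commands for node bob. The function only prints commands; it does not run them.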
27.3 Putting the Cluster into Maintenance Mode
Warning: Maintenance mode only disables Pacemaker
When putting a cluster into maintenance mode, only the resource management by Pacemaker is disabled. Corosync and SBD are still functional.
Depending on your maintenance tasks, this might lead to fencing operations.
Use maintenance mode for any tasks involving cluster resources. For any
tasks involving infrastructure such as storage or networking, the safest
method is to stop the cluster services completely. See Section 27.6,
“Stopping the Cluster Services on a Node”.
To put the cluster into maintenance mode on the crm shell, use the
following command:
# crm maintenance on
To put the cluster back to normal mode after your maintenance work is
done, use the following command:
# crm maintenance off
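Because maintenance mode is a single global property, a script that switches it on might later switch off a maintenance window that another administrator opened. The following sketch checks the current value first. It assumes Pacemaker's crm_attribute command (not covered in this chapter) with the --type crm_config, --name, --query, and --quiet options; the function names are hypothetical.

```shell
# Returns success if the cluster property maintenance-mode currently has a
# true value. Assumes Pacemaker's crm_attribute is installed.
cluster_in_maintenance() {
  mm_value=$(crm_attribute --type crm_config --name maintenance-mode \
             --query --quiet 2>/dev/null)
  # Pacemaker accepts several spellings of a true boolean.
  case "$mm_value" in
    [Tt][Rr][Uu][Ee]|[Yy][Ee][Ss]|[Yy]|[Oo][Nn]|1) return 0 ;;
    *) return 1 ;;
  esac
}

# Switches maintenance mode on only if it is not already on, and prints
# "enabled" or "already". A caller that got "already" did not open the
# maintenance window and should not run "crm maintenance off" at the end.
cluster_maintenance_on() {
  if cluster_in_maintenance; then
    echo already
  else
    crm maintenance on >/dev/null && echo enabled
  fi
}
```

For example, a script could store owner=$(cluster_maintenance_on), do its work, and run crm maintenance off only if $owner is enabled.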
PROCEDURE 27.1: PUTTING THE CLUSTER INTO MAINTENANCE MODE WITH HAWK2
1. Start a Web browser and log in to the cluster as described
in Section 5.4.2, “Logging In”.
2. In the left navigation bar, select Configuration › Cluster Configuration.
3. Select the maintenance-mode attribute from the empty drop-down
box.
4. From the maintenance-mode drop-down box, select Yes.
5. Click Apply.
6. After you have finished the maintenance task for the whole cluster,
select
No from the maintenance-mode drop-down box, then click Apply.
From this point on, High Availability will take over cluster
management again.
27.4 Putting a Node into Maintenance Mode
To put a node into maintenance mode on the crm shell, use the
following command:
# crm node maintenance NODENAME
To put the node back to normal mode after your maintenance work is done,
use the following command:
# crm node ready NODENAME
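When several nodes need maintenance at the same time, the two commands above can be applied in a loop. A minimal sketch; the function name nodes_maintenance and its on/off argument are hypothetical.

```shell
# Puts the given nodes into maintenance mode (on) or back into normal
# mode (off). Usage: nodes_maintenance on|off NODENAME...
nodes_maintenance() {
  case "$1" in
    on)  subcmd=maintenance ;;
    off) subcmd=ready ;;
    *)
      echo "usage: nodes_maintenance on|off NODENAME..." >&2
      return 2
      ;;
  esac
  shift
  if [ "$#" -eq 0 ]; then
    echo "no node names given" >&2
    return 2
  fi
  nm_failed=0
  for node in "$@"; do
    # Continue with the remaining nodes if one command fails,
    # but report the failure in the exit status.
    crm node "$subcmd" "$node" || nm_failed=1
  done
  return "$nm_failed"
}
```

For example, nodes_maintenance on alice bob, and after the work nodes_maintenance off alice bob (alice is a hypothetical node name).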
PROCEDURE 27.2: PUTTING A NODE INTO MAINTENANCE MODE WITH HAWK2
1. Start a Web browser and log in to the cluster as described in Section 5.4.2, “Logging In”.
2. In the left navigation bar, select Cluster Status.
3. In one of the individual nodes' views, click the wrench icon next to
the node and select Maintenance.
4. After you have finished your maintenance task, click the wrench icon
next to the node and select Ready.
27.5 Putting a Node into Standby Mode
To put a node into standby mode on the crm shell, use the following
command:
# crm node standby NODENAME
To bring the node back online after your maintenance work is done, use the following command:
# crm node online NODENAME
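Standby mode lends itself to rolling maintenance: one node at a time is put into standby while the remaining nodes keep running the services. In the sketch below, rolling_standby is a hypothetical name, and TASK is assumed to be a shell function of your own that receives the node name and performs the work. The sketch does not wait for the resources to move away; a real task should first confirm with crm status that the node is empty.

```shell
# Usage: rolling_standby TASK NODENAME...
# For each node in turn: standby, run TASK with the node name, online.
rolling_standby() {
  if [ "$#" -lt 2 ]; then
    echo "usage: rolling_standby TASK NODENAME..." >&2
    return 2
  fi
  task=$1
  shift
  for node in "$@"; do
    crm node standby "$node" || return 1
    if ! "$task" "$node"; then
      # Stop here and leave the node in standby so the failure can be
      # inspected before any other node is touched.
      echo "task failed on $node; node left in standby" >&2
      return 1
    fi
    crm node online "$node" || return 1
  done
}
```

Processing one node at a time keeps the other nodes available to run the services, which is the purpose of standby mode described in Section 27.2.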
PROCEDURE 27.3: PUTTING A NODE INTO STANDBY MODE WITH HAWK2
1. Start a Web browser and log in to the cluster as described in Section 5.4.2, “Logging In”.
2. In the left navigation bar, select Cluster Status.
3. In one of the individual nodes' views, click the wrench icon next to
the node and select Standby.
4. Finish the maintenance task for the node.
5. To deactivate the standby mode, click the wrench icon next to the
node and select Ready.
27.6 Stopping the Cluster Services on a Node
You can move the services off the node in an orderly fashion before shutting
down or rebooting the node. This allows services to migrate off the node
without being limited by the shutdown timeout of the cluster services.
PROCEDURE 27.4: MANUALLY REBOOTING A CLUSTER NODE
1. On the node you want to reboot or shut down, log in as root or equivalent.
2. Put the node into standby mode:
# crm node standby
By default, the node will remain in standby mode after rebooting.
Alternatively, you can set the node to come back online automatically with crm node standby reboot.
3. Check the cluster status:
# crm status
It shows the respective node in standby mode:
[...]
Node bob:
standby [...]
4. Stop the cluster services on that node:
# crm cluster stop
5. Reboot the node.
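Steps 2 to 4 of the procedure can be collected in a function that is run as root on the node to be rebooted. This is a sketch: prepare_node_reboot is a hypothetical name, it displays the crm status output but does not evaluate it, and it leaves the reboot itself (step 5) to the administrator.

```shell
# Usage: prepare_node_reboot [reboot]
# With "reboot", the node comes back online automatically after rebooting
# (crm node standby reboot); otherwise it stays in standby mode.
prepare_node_reboot() {
  case "$1" in
    "")     crm node standby || return 1 ;;
    reboot) crm node standby reboot || return 1 ;;
    *)
      echo "usage: prepare_node_reboot [reboot]" >&2
      return 2
      ;;
  esac
  # Step 3: the output should show this node in standby mode.
  # This sketch only displays the status; it does not check it.
  crm status || return 1
  # Step 4: stop the cluster services. Step 5 (the reboot) is left to you.
  crm cluster stop
}
```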
To check if the node joins the cluster again:
1. After the node reboots, log in to it again.
2. Check if the cluster services have started:
# crm cluster status
This might take some time. If the cluster services do not start again on their own, start them manually:
# crm cluster start
3. Check the cluster status:
# crm status
4. If the node is still in standby mode, bring it back online:
# crm node online
27.7 Putting a Resource into Maintenance Mode
To put a resource into maintenance mode on the crm shell, use the
following command:
# crm resource maintenance RESOURCE_ID true
To put the resource back into normal mode after your maintenance work
is done, use the following command:
# crm resource maintenance RESOURCE_ID false
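The two commands above can be combined so that maintenance mode is always switched off again after manual work on the service. A minimal sketch; with_resource_maintenance is a hypothetical name, and the command passed to it is whatever manual step you need.

```shell
# Usage: with_resource_maintenance RESOURCE_ID COMMAND [ARG...]
# Runs COMMAND while RESOURCE_ID is in maintenance mode, then switches
# maintenance mode off even if COMMAND failed. Returns COMMAND's status.
with_resource_maintenance() {
  rsc=$1
  shift
  crm resource maintenance "$rsc" true || return 1
  "$@"
  wrm_status=$?
  crm resource maintenance "$rsc" false ||
    echo "warning: $rsc is still in maintenance mode" >&2
  return "$wrm_status"
}
```

For example, with_resource_maintenance RESOURCE_ID systemctl reload SERVICE, where SERVICE is a placeholder for the systemd unit behind the resource.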
PROCEDURE 27.5: PUTTING A RESOURCE INTO MAINTENANCE MODE WITH HAWK2
1. Start a Web browser and log in to the cluster as described in Section 5.4.2, “Logging In”.
2. In the left navigation bar, select Resources.
3. Select the resource you want to put in maintenance mode or unmanaged mode, click the wrench icon next to the resource and select Edit Resource.
4. Open the Meta Attributes category.
5. From the empty drop-down list, select the maintenance attribute
and click the plus icon to add it.
6. Activate the check box next to maintenance to set the maintenance attribute to yes.
7. Confirm your changes.
8. After you have finished the maintenance task for that resource, deactivate the check box next to the maintenance attribute for that resource.
From this point on, the resource will be managed by the High
Availability software again.
27.8 Putting a Resource into Unmanaged Mode
To put a resource into unmanaged mode on the crm shell, use the following command:
# crm resource unmanage RESOURCE_ID
To put it into managed mode again after your maintenance work is done,
use the following command:
# crm resource manage RESOURCE_ID
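The choice between unmanaged mode and resource maintenance mode depends on whether the cluster should keep monitoring the resource (see Section 27.2). The following sketch makes that choice explicit; release_resource, restore_resource, and the monitored/unmonitored labels are hypothetical.

```shell
# Usage: release_resource RESOURCE_ID monitored|unmonitored
# "monitored": unmanaged mode; the cluster still monitors the resource
# and reports failures. "unmonitored": resource maintenance mode; the
# cluster also stops monitoring it.
release_resource() {
  case "$2" in
    monitored)   crm resource unmanage "$1" ;;
    unmonitored) crm resource maintenance "$1" true ;;
    *)
      echo "usage: release_resource RESOURCE_ID monitored|unmonitored" >&2
      return 2
      ;;
  esac
}

# Undoes release_resource; pass the same mode that was used to release.
restore_resource() {
  case "$2" in
    monitored)   crm resource manage "$1" ;;
    unmonitored) crm resource maintenance "$1" false ;;
    *)
      echo "usage: restore_resource RESOURCE_ID monitored|unmonitored" >&2
      return 2
      ;;
  esac
}
```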
PROCEDURE 27.6: PUTTING A RESOURCE INTO UNMANAGED MODE WITH HAWK2
1. Start a Web browser and log in to the cluster as described in Section 5.4.2, “Logging In”.
2. In the left navigation bar, select Status and go to the Resources list.
3. In the Operations column, click the arrow down icon next to the
resource you want to modify and select Edit.
The resource configuration screen opens.
4. Below Meta Attributes, select the is-managed entry from the
empty drop-down box.
5. Set its value to No and click Apply.
6. After you have finished your maintenance task, set is-managed to
Yes (which is the default value) and apply your changes.
From this point on, the resource will be managed by the High
Availability software again.
27.9 Rebooting a Cluster Node While in Maintenance Mode
Note: Implications
If the cluster or a node is in maintenance mode, you can use tools external to the cluster stack (for example, systemctl) to manually operate the components that are managed by the cluster as resources. The High Availability software will not monitor them or attempt to restart them.
If you stop the cluster services on a node, all daemons and processes (originally started as Pacemaker-managed cluster resources) will continue to run.
If you attempt to start cluster services on a node while the cluster or node is in maintenance mode, Pacemaker will initiate a single one-shot monitor operation (a “probe”) for every resource to evaluate which resources are currently running on that node. However, it will take no further action other than determining the resources' status.
PROCEDURE 27.7: REBOOTING A CLUSTER NODE WHILE THE CLUSTER OR NODE
IS IN MAINTENANCE MODE
1. On the node you want to reboot or shut down, log in as root or equivalent.
2. If you have a DLM resource (or other resources depending on DLM), make sure to explicitly stop those resources before stopping the cluster services:
crm(live)resource# stop RESOURCE_ID
The reason is that stopping Pacemaker also stops the Corosync
service on whose membership and messaging services DLM depends.
If Corosync stops, the DLM resource will assume a split brain scenario
and trigger a fencing operation.
3. Stop the cluster services on that node:
# crm cluster stop
4. Shut down or reboot the node.
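Steps 2 and 3 of this procedure can be sketched as a function run as root on the node. Its arguments are the IDs of the DLM resource and of any resources depending on DLM; stop_node_services is a hypothetical name. crm resource stop RESOURCE_ID is the command-line form of stop RESOURCE_ID at the crm(live)resource# prompt. The sketch does not wait for or verify that the resources have actually stopped before stopping the cluster services, so check that with crm status in practice.

```shell
# Usage: stop_node_services [RESOURCE_ID...]
# Stops the listed resources one by one, then the cluster services.
stop_node_services() {
  for rsc in "$@"; do
    # Same as "stop RESOURCE_ID" at the crm(live)resource# prompt.
    if ! crm resource stop "$rsc"; then
      echo "could not stop $rsc; cluster services left running" >&2
      return 1
    fi
  done
  # Step 3. Step 4 (shutdown or reboot) is left to the administrator.
  crm cluster stop
}
```

Stopping the DLM-dependent resources first avoids the situation described above, in which Corosync stops underneath DLM and DLM triggers a fencing operation.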