Oracle RAC Administration
CHETAN GUPTE
BACKGROUND PROCESSES OF ORACLE 12C RAC
The GCS and GES processes, and the GRD collaborate to enable Cache Fusion.
The Oracle RAC processes and their identifiers are as follows:
CRSd manages cluster resources: it starts and stops services and fails over cluster
resources such as virtual IPs, database instances, listeners, and databases.
The CRS daemon starts in one of two modes. On a planned Clusterware start it runs in
‘reboot’ mode. After an unplanned shutdown it runs in ‘restart’ mode, in which it retains
the previous state and returns resources to the states they were in before the
shutdown.
OCSSD
OCSSd maintains cluster membership through a special file called a voting disk (also
referred to as a quorum disk). It is the first process started in the Oracle
Clusterware stack.
In stand-alone databases that use ASM, OCSS is used for inter-instance communication; in
RAC environments, it identifies a clustered configuration.
OCSS reads the OCR to locate the voting disk (VD), then reads the voting disk to determine
the number and names of cluster members.
CSS verifies the number of nodes already registered as part of the cluster. After verification, if
no MASTER node has been established, CSS authorizes the verifying node to be the MASTER
node. This is the first node that attains the ACTIVE state. Cluster synchronization begins when
the MASTER node synchronizes with the other nodes.
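The master assignment described above can be pictured with a small model. This is only a sketch under stated assumptions: the Cluster class, its join method, and the node names are hypothetical and are not part of Oracle Clusterware. It shows the rule from the text, that the first verifying node becomes MASTER when none exists and later nodes join as ordinary members.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cluster:
    """Toy model of CSS membership; not Oracle's implementation."""
    master: Optional[str] = None
    active: List[str] = field(default_factory=list)

    def join(self, node: str) -> str:
        """Register a node and return the role it is given."""
        if node in self.active:
            # Node is already registered; report its current role.
            return "MASTER" if node == self.master else "MEMBER"
        self.active.append(node)
        if self.master is None:
            # No master yet: the verifying node becomes the master
            # and is the first node to reach the ACTIVE state.
            self.master = node
            return "MASTER"
        # A master already exists: the new node synchronizes with it.
        return "MEMBER"


cluster = Cluster()
print(cluster.join("node1"))  # hypothetical first node
print(cluster.join("node2"))  # hypothetical later node
```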
OCSSd offers Node Membership (NM) and Group Membership (GM) services.
The NM checks the heartbeat of every node in the cluster once per second. If a node does
not respond within 60 seconds, the master (the surviving node that was started first)
starts evicting the unresponsive node(s) from the cluster.
All clients that perform I/O operations (for example, LMON and DBWR) register with the GM.
Reconfiguration of instances (when an instance joins or leaves the cluster) happens through
the GM. When a node fails, the GM notifies the other instances of the change in
status.
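The NM heartbeat check described above can be sketched as follows. This is an illustrative model, not Oracle code: the function names, node names, and timestamps are hypothetical, and the 60-second timeout is simply the figure quoted in the text. The sketch finds nodes whose last heartbeat is older than the timeout, then picks the surviving node that was started first to act as master.

```python
from typing import Dict, List

HEARTBEAT_TIMEOUT_S = 60  # timeout quoted in the text above


def nodes_to_evict(last_heartbeat: Dict[str, float], now: float,
                   timeout: float = HEARTBEAT_TIMEOUT_S) -> List[str]:
    """Return nodes whose last heartbeat is older than the timeout."""
    return sorted(
        node for node, seen in last_heartbeat.items()
        if now - seen > timeout
    )


def pick_master(start_times: Dict[str, float], survivors: List[str]) -> str:
    """The surviving node that was started first acts as master."""
    return min(survivors, key=lambda node: start_times[node])


# Hypothetical data: seconds since an arbitrary epoch.
last_heartbeat = {"node1": 100.0, "node2": 99.0, "node3": 30.0}
start_times = {"node1": 5.0, "node2": 2.0, "node3": 1.0}

evicted = nodes_to_evict(last_heartbeat, now=100.0)
survivors = [n for n in last_heartbeat if n not in evicted]
master = pick_master(start_times, survivors)
print(evicted, master)
```

In this hypothetical run node3 has been silent for 70 seconds, so it is the eviction candidate, and node2 is the master because it started before node1.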
EVMD
EVMd receives the FAN events posted by clients and propagates the information
to the other nodes.
It is spawned by the init.evmd wrapper script. It starts the evmlogger child process, which
scans the callout directory and starts the racgevt process to execute the callouts.
ONS:
It is a publish-and-subscribe service for communicating Fast Application Notification (FAN) events
to clients.
Whenever the state of a resource changes on a cluster node, CRS triggers an HA event and routes
it to the ONS process, which propagates the information to the other cluster nodes.
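The publish-and-subscribe pattern that ONS follows can be illustrated with a minimal in-process sketch. Everything here is hypothetical: the EventBus class, the "ha_event" topic name, and the event fields are illustrative only and do not reflect the actual ONS protocol or API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """Toy in-process publish/subscribe bus; not the ONS wire protocol."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every event on a topic."""
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> int:
        """Deliver an event to all subscribers; return how many got it."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
received: List[Event] = []
bus.subscribe("ha_event", received.append)

# Hypothetical HA event raised when a resource changes state.
delivered = bus.publish("ha_event", {"resource": "ora.prod2.vip", "state": "down"})
print(delivered, received)
```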
OPROCD:
OPROCd serves as the I/O fencing solution for Oracle Clusterware.
It is the process monitor for Oracle Clusterware and uses the hangcheck timer to protect cluster
integrity, so that hung nodes cannot perform any I/O. Failure of the OPROCd process causes
the node to restart.
CLUSTER SYNCHRONIZATION SERVICE (CSS):
Manages the cluster configuration by controlling which nodes are members of the cluster and
by notifying members when a node joins or leaves the cluster.
If you are using certified third-party clusterware, the CSS processes interface with your
clusterware to manage node membership information.
CSS has three separate processes:
the CSS daemon (ocssd),
the CSS Agent (cssdagent), and
the CSS Monitor (cssdmonitor)
The cssdagent process monitors the cluster and provides input/output fencing.
A cssdagent failure results in Oracle Clusterware restarting the node.
DISK MONITOR DAEMON (DISKMON):
Monitors and performs input/output fencing for Oracle Exadata Storage Server.
As Exadata storage can be added to any Oracle RAC node at any point in time, the diskmon
daemon is always started when ocssd is started.
ARCHIVE_LAG_TARGET
CONTROL_MANAGEMENT_PACK_ACCESS
LICENSE_MAX_USERS
LOG_ARCHIVE_FORMAT
SPFILE
UNDO_RETENTION
FLASH RECOVERY AREA:
Oracle recommends that you enable a flash recovery area to simplify your backup management.
Ideally, the flash recovery area should be large enough to contain all the following files:
A copy of all datafiles
Incremental backups
Online redo logs
Archived redo log files that have not yet been backed up
Control files and control file copies
Autobackups of the control file and database initialization parameter file
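The sizing guidance above amounts to adding up the space needed by each file category. The sketch below shows that arithmetic; the function name and every size in it are hypothetical placeholders, not measurements or Oracle recommendations.

```python
from typing import Dict


def recovery_area_size_gb(components_gb: Dict[str, float]) -> float:
    """Sum the space (in GB) needed by each file category."""
    for name, size in components_gb.items():
        if size < 0:
            raise ValueError(f"negative size for {name}")
    return sum(components_gb.values())


# Hypothetical sizes in GB, one entry per category listed above.
components_gb = {
    "datafile_copy": 200.0,
    "incremental_backups": 40.0,
    "online_redo_logs": 6.0,
    "archived_logs_not_backed_up": 30.0,
    "control_files_and_copies": 1.0,
    "control_file_and_spfile_autobackups": 1.0,
}

total_gb = recovery_area_size_gb(components_gb)
print(f"Estimated recovery area size: {total_gb} GB")
```

A total computed this way is best read as a lower bound; extra headroom is an operational choice that the text does not quantify.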
TROUBLESHOOTING ORACLE RAC:
Find status of Clusterware Stack:
./crsctl check crs
A RAC DBA may face issues related to the Clusterware stack, resources, the OCR, the
voting disk, and so on.
For example, an attempt to start the resources may fail with an error such as:
CRS-0215: Could not start resource 'ora.prod2.vip'
Any resource can be debugged with the crsctl command, as follows:
./crsctl debug log res "ora.prod2.vip:2"
The ":2" suffix sets the debugging level, which can range from 1 to 5.
Checking the log files:
$CRS_HOME/log/<hostname>
Debugging Components:
The Clusterware components (CRS, EVM, OCSS, and so on) can also be debugged:
crsctl debug log crs "CRSD:1"
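The "name:level" argument used in the debug commands above is easy to mistype, so a small helper can build and validate it. This is a sketch only: debug_log_command is a hypothetical helper, and applying the 1 to 5 range to components as well as resources is an assumption, because the text states that range only for resources. The helper only builds the argument list and does not run crsctl.

```python
from typing import List


def debug_log_command(module: str, name: str, level: int) -> List[str]:
    """Build the argument list for 'crsctl debug log <module> "<name>:<level>"'.

    module is "res" for a resource or "crs" for the CRS component, matching
    the two commands shown in the text.
    """
    if not 1 <= level <= 5:
        # Range taken from the text (stated there for resources only).
        raise ValueError("debug level must be between 1 and 5")
    return ["crsctl", "debug", "log", module, f"{name}:{level}"]


resource_cmd = debug_log_command("res", "ora.prod2.vip", 2)
component_cmd = debug_log_command("crs", "CRSD", 1)
print(" ".join(resource_cmd))
print(" ".join(component_cmd))
```

Returning a list rather than a single string keeps the "name:level" pair as one argument if the list is later passed to a process launcher.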
DIAGNOSTICS COLLECTION SCRIPT
Every time an Oracle Clusterware error occurs, you should run the diagcollection.pl
script to collect diagnostic information from Oracle Clusterware in trace files.
The diagnostics provide additional information so Oracle Support can resolve problems.
Run this script from the following location:
CRS_home/bin/diagcollection.pl
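Locating the script can be wrapped in a small helper. This is a sketch under assumptions: diagcollection_path is a hypothetical function, reading the Clusterware home from a CRS_HOME environment variable is an assumption about the local setup, and /u01/crs is a made-up example path. The helper only resolves the path and does not run the script.

```python
import os
from pathlib import Path
from typing import Optional


def diagcollection_path(crs_home: Optional[str] = None) -> Path:
    """Return the expected location of diagcollection.pl under the CRS home."""
    home = crs_home or os.environ.get("CRS_HOME")
    if not home:
        raise RuntimeError("CRS home not given and CRS_HOME is not set")
    return Path(home) / "bin" / "diagcollection.pl"


# Hypothetical CRS home, used purely for illustration.
script = diagcollection_path("/u01/crs")
print(script)
```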