RAC System Test Plan Outline

Rac test plan steps

Uploaded by

Rania Ahmed
Copyright © All Rights Reserved

RAC Assurance Team

RAC System Test Plan Outline


Oracle 10gR2 and 11gR1

Purpose
Before a new computer/cluster system is deployed in production, it is important to test the system thoroughly to validate
that it will perform at a satisfactory level, relative to its service level objectives. Testing is also required when
introducing major or minor changes to the system. This document provides an outline consisting of basic guidelines and
recommendations for how to test a new system. It can be used as a framework for building a system test plan specific to
each company’s RAC implementation and the associated service level objectives.

Scope of System Testing


This document provides an outline of basic testing guidelines, in the form of an organized test plan, for validating core
component functionality in RAC environments. Every application exercises the underlying software and hardware
infrastructure differently, and must be tested as part of a component testing strategy. Each new system must be tested
thoroughly, in an environment that is a realistic representation of the production environment in terms of configuration,
capacity, and workload, before going live or after implementing significant architectural or system modifications.
Without a completed system implementation and functional end-user applications, only core component functionality
testing is possible: verifying the behavior of the cluster, the RDBMS, and subcomponents such as networking, the I/O
subsystem, and miscellaneous database administrative functions.

In addition to the specific system testing outlined in this document, additional testing needs to be defined and executed
for RMAN, backup and recovery, and Data Guard (for disaster recovery). Each component area of testing also requires
specific operational procedures to be documented and maintained to address site-specific requirements.

Testing Objectives
In addition to application functionality testing, overall system testing is normally performed for one or more of the
following reasons:
- Verify that the system has been installed and configured correctly. Check that nothing is broken.
- Verify that basic functionality still works in a specific environment and for a specific workload. Vendors normally
  test their products very thoroughly, but it is not possible to test all possible HW/SW combinations and unique
  workloads.
- Make sure that the system will achieve its objectives, in particular its availability and performance objectives. This
  can be very complex and normally requires some form of simulated production environment and workload.
- Test operational procedures. This includes normal operational procedures and recovery procedures.
- Train operations staff.

Planning System Testing


Effective system testing requires careful planning. The service level objectives for the system itself and for the testing
must be clearly understood and a detailed test plan should be documented. The basis for all testing is that the current
best practices for RAC system configuration have been implemented before testing.

Testing should be performed in an environment that mirrors the production environment as much as possible. The
software configuration should be identical but for cost reasons it might be necessary to use a scaled down hardware
configuration. All testing should be performed while running a workload that is as close to production as possible.
Generating a realistic workload can be complex and expensive but it is probably the most important factor for effective
testing.

For each individual test in the plan, a clear understanding of the following is required:
- What is the objective of the test and how does it relate to the overall system objectives?
- Exactly how will the test be performed, and what are the execution steps?
- What are the success/failure criteria, and what are the expected results?
- How will the test result be measured?
- Which tools will be used?
- Which log files and other data will be collected?
- Which operational procedures (normal and recovery) are relevant?

Oracle Support Services RAC Starter Kit

Production Simulation / System Stress Test


The best way to ensure that the system will perform well without any problems is to simulate production workload and
conditions before going live. Ideally the system should be stressed a little more than what is expected in production. In
addition to running the normal user and application workload, all normal operational procedures should also be tested at
the same time. The output from the normal monitoring procedures should be kept and compared with the real data when
going live. Normal maintenance operations such as adding users, adding disk space, reorganizing tables and indexes,
backup, archiving data, etc. must also be tested. A commercial or in-house developed workload generator is essential.

Destructive Testing
The system configuration and operational procedures must also be tested to make sure that component failures and other
problems can be dealt with as efficiently as possible and with minimum impact on system availability. This section
provides some examples of tests that can be used as part of a system test plan. The idea is to test the system’s robustness
against various failures. Depending on the overall architecture and objectives, only some of the tests might be used
and/or additional tests might have to be constructed. Introducing multiple failures at the same time should also be
considered.

This list only covers testing for RAC-related components and procedures. Additional tests are required for other parts of
the system. These tests should be performed with a realistic workload on the system. Procedures for detecting and
recovering from these failures must also be tested.

In some worst-case scenarios it might not be possible to recover the system within an acceptable time frame and a
disaster recovery plan should specify how to switch to an alternative system or location. This should also be tested.

The result of a test should initially be measured at a business or user level to see if the result is within the service level
agreement. If a test fails it will be necessary to gather and analyze the relevant log and trace files. The analysis can result
in system tuning, changing the system architecture or possibly reporting component problems to the appropriate vendor.
Also, if the system objectives turn out to be unrealistic, they might have to be changed.

In a multi-machine environment, it is important to be able to compare the times of events happening on different
machines. To facilitate comparison and analysis of log and trace files from multiple servers, NTP or a similar
time-synchronization facility should be implemented.
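
The following is a minimal Python sketch of the cross-node comparison that synchronized clocks make possible: it merges per-node event lists into one cluster-wide timeline ordered by timestamp. It assumes the events have already been extracted from each server's alert, CRS and system logs into (timestamp, node, message) tuples sorted by time; that parsing step, the node names and the sample events are hypothetical. The merged order is only meaningful if the node clocks are synchronized.

```python
import heapq
from datetime import datetime
from typing import List, Tuple

# Hypothetical event record: (timestamp, node name, log message).
Event = Tuple[datetime, str, str]


def merge_timelines(*per_node_events: List[Event]) -> List[Event]:
    """Merge per-node event lists (each already sorted by timestamp) into a
    single cluster-wide timeline. Ordering across nodes is only trustworthy
    when the node clocks are synchronized, e.g. with NTP."""
    return list(heapq.merge(*per_node_events, key=lambda event: event[0]))


if __name__ == "__main__":
    # Made-up sample events for illustration only.
    node1 = [(datetime(2008, 5, 1, 10, 0, 0), "node1", "reboot issued")]
    node2 = [
        (datetime(2008, 5, 1, 10, 0, 31), "node2", "node1 evicted"),
        (datetime(2008, 5, 1, 10, 1, 5), "node2", "instance recovery complete"),
    ]
    for ts, node, message in merge_timelines(node1, node2):
        print(ts.isoformat(), node, message)
```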

System Testing Scenarios

Test 1: Node Failure
Procedure:
- Start the client workload.
- Identify the instance with the most client connections.
- Reboot the node where the most loaded instance is running:
  - For AIX, HPUX, Windows: 'shutdown -r'
  - For Linux: 'shutdown -r now'
  - For Solaris: 'reboot'
Expected results:
- The instances and other CRS resources that were running on that node go offline (no value for the 'Host' field of
  crs_stat output).
- The VIP becomes owned by one of the surviving nodes.
- One other instance performs instance recovery.
- Services are moved to available instances, if the downed instance is specified as a preferred instance.
- Client connections are moved / reconnected to surviving instances (procedure and timings will depend on client types
  and configuration).
- After a short freeze, surviving instances continue processing their workload.
- The workload from the failed node will not get started automatically. With TAF and select failover configured, select
  sessions will continue; update/delete/insert sessions will not.
Measures:
- Time to detect node or instance failure.
- Time to complete instance recovery. Check the alert log for the instance performing the recovery.
- Time to restore client activity to the same level (assuming remaining nodes have sufficient capacity to run the
  workload).
- Duration of database freeze during failover.
- Time before the failed instance is restarted automatically by CRS and is accepting new connections.

Test 2: Node Failure
Procedure: As above, but simulate a hard / uncontrolled node failure. On many servers the power-off switch will perform
a controlled shutdown, and it might be necessary to cut the power supply.
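
The measures for Tests 1 and 2 are intervals between events that the tester records (from client monitoring, alert logs and CRS logs). A minimal Python sketch of that arithmetic follows; the event labels are hypothetical names chosen for this example, not Oracle log keywords, and the timestamps are assumed to come from synchronized clocks.

```python
from datetime import datetime, timedelta
from typing import Dict


def failover_measures(events: Dict[str, datetime]) -> Dict[str, timedelta]:
    """Compute the Test 1 / Test 2 measures from recorded event times.

    All keys are hypothetical labels for moments the tester records."""
    start = events["failure_injected"]
    return {
        "time_to_detect_failure": events["failure_detected"] - start,
        "time_to_complete_instance_recovery": events["recovery_complete"] - start,
        "time_to_restore_client_activity": events["client_activity_restored"] - start,
        "database_freeze_duration": events["freeze_end"] - events["freeze_start"],
        "time_to_restart_failed_instance": events["instance_accepting_connections"] - start,
    }


if __name__ == "__main__":
    t0 = datetime(2008, 5, 1, 10, 0, 0)  # made-up sample times
    recorded = {
        "failure_injected": t0,
        "failure_detected": t0 + timedelta(seconds=35),
        "freeze_start": t0 + timedelta(seconds=35),
        "freeze_end": t0 + timedelta(seconds=50),
        "recovery_complete": t0 + timedelta(seconds=70),
        "client_activity_restored": t0 + timedelta(seconds=120),
        "instance_accepting_connections": t0 + timedelta(seconds=360),
    }
    for name, value in failover_measures(recorded).items():
        print(f"{name}: {value.total_seconds():.0f} s")
```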

Test 3: Restart Failed Node
Expected results: Failing nodeapps / instance / listener will be restarted by CRS, unless this feature has been disabled.
Measures: Time for all resources to become available again. Check with 'crs_stat -t'.

Test 4 Reboot all For AIX, HPUX, Windows: All nodes, instances and resources Time for all resources to
nodes at the ‘shutdown –r’ are restarted without problems become available again,
same time For Linux: ‘shutdown –r now’ Check with crs_stat –t.
For Solaris: ‘reboot’
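
Tests 3 and 4 (and the 'Host' check in Test 1) rely on reading 'crs_stat -t'. As a sketch, the Python below parses that tabular output and lists resources whose State is not ONLINE. It assumes the Name / Type / Target / State / Host column layout and an empty Host field for offline resources; the sample text is illustrative, not captured from a real cluster.

```python
from typing import List, NamedTuple, Optional


class CrsResource(NamedTuple):
    name: str
    rtype: str
    target: str
    state: str
    host: Optional[str]  # None when crs_stat shows no Host value


def parse_crs_stat_t(output: str) -> List[CrsResource]:
    """Parse 'crs_stat -t' text into one record per resource line."""
    resources = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the dashed separator (fewer than 4 fields)
        # as well as the column header line.
        if len(fields) < 4 or fields[0] == "Name":
            continue
        host = fields[4] if len(fields) > 4 else None
        resources.append(CrsResource(fields[0], fields[1], fields[2], fields[3], host))
    return resources


def not_online(output: str) -> List[str]:
    """Names of resources whose State column is not ONLINE."""
    return [r.name for r in parse_crs_stat_t(output) if r.state != "ONLINE"]


SAMPLE = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora.node1.vip  application    ONLINE    ONLINE    node2
ora....11.inst application    ONLINE    OFFLINE
ora....12.inst application    ONLINE    ONLINE    node2
"""

if __name__ == "__main__":
    print(not_online(SAMPLE))
```

Re-running the check until the list is empty gives the "time for all resources to become available again" measure.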
Test 5 Instance Start client workload One other instance performs Time to detect instance
Failure Identify instance with the most client instance recovery failure
connections: Services are moved to available Time to complete
- For AIX, HPUX, Linux, Solaris: instances, if a preferred instance instance recovery. Check
Find pid of the process with the most failed alert log for recovering
client connections in this Client connections are moved / instance
instance and issue: reconnected to surviving instances Time to restore client
‘kill –9 <pid for loaded process>’ (Procedure and timings will activity to same level
- For Windows: find the thread ID of the depend on client types and (assuming remaining
thread with the most client connections: configuration) nodes have sufficient
After a short freeze, surviving capacity to run workload)
select b.name, p.spid from v$bgprocess instances continue processing the Duration of database
b, v$process p where b.paddr=p.addr; workload freeze during failover.
Failing instance will be restarted Time before failed
Then run orakill reference Note 69882.1 by CRS, unless this feature has instance is restarted
been disabled automatically by CRS and
is accepting new
connections

Test 6: Instance Failure
Procedure: As above, but use 'shutdown abort'.

Test 7 Restart Automatic restart by CRS or Node rejoins CRS cluster without Time before services and 
Failed Manual restart when the "Auto Start" option for any problems (review CRS and workload are rebalanced
Instance the related instance has been disabled. system logs) across all instances
Instance rejoins RAC cluster (including any manual
without any problems (review alert steps)
logs etc.) Duration of database
Short database freeze when failed freeze when instance
instance rejoins cluster rejoins cluster
Client connections and workload
will be load balanced across the
new instance (Manual procedure
might be required to redistribute
workload if long running /
permanent connections)

Test 8: ASM Instance Failure
Procedure: 'shutdown abort' (ASM instance).
Expected results:
- One other instance performs instance recovery.
- Services are moved to available instances, if a preferred instance failed.
- Client connections are moved / reconnected to surviving instances (procedure and timings will depend on client types
  and configuration).
- After a short freeze, surviving instances continue processing the workload.
- The failing instance will be restarted by CRS, unless this feature has been disabled.
- Related ASM and instance CRS resources will no longer be running on the designated host in 'crs_stat -t'.
Measures:
- Time to detect instance failure.
- Time to complete instance recovery. Check the alert log for the recovering instance.
- Time to restore client activity to the same level (assuming remaining nodes have sufficient capacity to run the
  workload).
- Duration of database freeze during failover.
- Time before the failed instance is restarted automatically by CRS and is accepting new connections.

Test 9: Multiple Instance Failure
Procedure: Kill the PMON process for two different instances on different nodes.
- For AIX, HPUX, Linux, Solaris: find the pid of the PMON process in this instance and issue the kill command:
    ps -ef|grep pmon
    kill -9 <pid for PMON process>
- For Windows: find the thread ID of the PMON background process:
    select b.name, p.spid from v$bgprocess b, v$process p where b.paddr=p.addr;
  Then run orakill (reference Note 69882.1).
Expected results: Both instances should be recovered and restarted without problems.
Measures: As for instance failure.
Actual results/notes: N/A in Sandbox (2 node cluster)
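
A common pitfall with 'ps -ef|grep pmon' is that the grep command itself also matches. The Python sketch below parses 'ps -ef' output and returns the PIDs whose command name matches exactly, e.g. ora_pmon_<SID> for the PMON of one instance. It assumes Linux-style 'ps -ef' columns (UID PID PPID C STIME TTY TIME CMD) with no spaces inside STIME, which does not hold on every platform (Solaris, for example, can print a date with a space there); the sample output and the SID V1021 are made up.

```python
import posixpath
from typing import List

SAMPLE_PS = """\
UID        PID  PPID  C STIME TTY          TIME CMD
oracle    4711     1  0 10:02 ?        00:00:01 ora_pmon_V1021
oracle    4713     1  0 10:02 ?        00:00:00 ora_psp0_V1021
root      3120     1  0 09:58 ?        00:00:05 /u01/crs/bin/crsd.bin reboot
oracle    5001  4990  0 10:15 pts/0    00:00:00 grep pmon
"""


def find_pids(ps_ef_output: str, program: str) -> List[int]:
    """Return PIDs whose command name (basename of the first CMD token)
    equals `program`, e.g. 'ora_pmon_V1021' or 'crsd.bin'."""
    pids = []
    for line in ps_ef_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 7)  # CMD is everything after 7 columns
        if len(fields) < 8:
            continue
        command_name = posixpath.basename(fields[7].split()[0])
        if command_name == program:
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    print(find_pids(SAMPLE_PS, "ora_pmon_V1021"))  # PID to pass to 'kill -9'
```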


Test 10: Instance Loses Access to Controlfile(s)
Procedure: Remove access to the disks with control files from one node.
Expected results: IMR will evict the instance from the RAC cluster after 10 minutes.
Measures: Actual time before the instance is evicted.

Test 11 Listener For AIX, HPUX, Linux and Solaris: No impact on connected database Time for CRS to detect
Failure ‘kill –9 <pid for listener process on one node>’ sessions. failure and restart listener.
New connections are redirected to
For Windows: listener on other node (depends on
pskill <pid for listener process – tnslsnr.exe – from client configuration)
task manager on one node>’ Local database instance will
receive new connections if you are
Pskill can be acquired from the PSTools suite: using shared server. Local
http://www.sysinternals.com database instance will NOT receive
new connections if dedicated
server is being used.
Listener restarted by CRS
Test 12 CRS Process For AIX, HPUX, Linux and Solaris: CRSD process is restarted. (Check Time to restart CRSD
Failure ps -ef|grep crsd.bin CRS logs) process
kill -9 <pid found from previous command>
For Windows:
pskill <pid for crsd.exe from task manager>

Pskill can be acquired from the PSTools suite on


http://www.sysinternals.com

Test 13 CRS Process For AIX, HPUX, Linux and Solaris: As above As above
Failure ps -ef|grep evmd.bin
kill -9 <pid found from previous command>

For Windows:
pskill <pid for evmd.exe from task manager>

Pskill can be acquired from the PSTools suite on


http://www.sysinternals.com
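
For the "time to restart" measures in Tests 11 to 13, one approach is to record the daemon's PIDs before 'kill -9' and then poll until a PID that was not there before appears. The Python sketch below implements that polling loop. The snapshot function is supplied by the caller (for example, something that runs 'ps -ef' and filters it by program name); it is a hypothetical hook, and the clock and sleep parameters exist only so the loop can be exercised without real waiting. The measured time is only as precise as the polling interval.

```python
import time
from typing import Callable, Optional, Set


def wait_for_restart(
    get_pids: Callable[[], Set[int]],
    old_pids: Set[int],
    timeout: float = 300.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll get_pids() until it returns a PID that is not in old_pids.

    Start calling this right after 'kill -9'. Returns the elapsed seconds
    (resolution = interval), or None if no new PID appeared in time."""
    start = clock()
    while True:
        elapsed = clock() - start
        if get_pids() - old_pids:
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(interval)


class FakeClock:
    """Simulated clock so the loop can be exercised without real waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    # Simulated run: old CRSD PID 3120 is gone; a new PID shows up on the 3rd poll.
    fake = FakeClock()
    snapshots = iter([set(), set(), {3350}])
    elapsed = wait_for_restart(lambda: next(snapshots), {3120}, clock=fake.time, sleep=fake.sleep)
    print(f"daemon restarted after about {elapsed:.0f} s")
```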

Test 14 CRS Process For AIX, HPUX, Linux and Solaris: Node will reboot.
Failure ps -ef|grep ocssd.bin
kill -9 <pid found from previous command>
For Windows:
pskill <pid for ocssd.exe from task manager>

Pskill can be acquired from the PSTools suite on


http://www.sysinternals.com

Test 15: Public Network (VIP) Failure
Procedure: Unplug all network cables for the public network.
Expected results:
- The VIP and the instance should shut down and be deregistered with the surviving listeners.
- If TAF is configured, clients should fail over to an available node. This node can be tracked by querying the VIP
  location from the OCR using 'crs_stat -t'.
- For the 10.2.0.3 patchset and higher, the VIP will fail over to a surviving node and the instance will stay up.
Measures: As for instance failure.

Test 16: Public NIC Failure
Procedure: Assuming dual NICs in a bonding or teamed configuration, remove one NIC.
Expected results: Network traffic should fail over to the other NIC without any impact on the VIP or clients.
Measures: Time to fail over to the other NIC card. With bonding / teaming configured, this should be less than 100 ms.
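
One way to quantify the failover time in Test 16 (and Test 18) is to run a fixed-interval probe across the network (for example a ping to an address on the other side) while the NIC is removed, record when each successful reply arrives, and take the largest gap between consecutive replies. The probe tool and interval are assumptions, not part of the original procedure; the Python sketch below only does the gap calculation, and the sample reply times are invented.

```python
from datetime import datetime, timedelta
from typing import Sequence


def longest_gap(reply_times: Sequence[datetime]) -> timedelta:
    """Largest gap between consecutive successful probe replies.

    With probes sent every `interval`, the failover time is roughly
    longest_gap - interval."""
    times = sorted(reply_times)
    if len(times) < 2:
        raise ValueError("need at least two replies to measure a gap")
    return max(later - earlier for earlier, later in zip(times, times[1:]))


if __name__ == "__main__":
    t0 = datetime(2008, 5, 1, 10, 0, 0)
    # Invented replies every 10 ms, with a hole while the bond fails over.
    offsets_ms = [0, 10, 20, 30, 95, 105, 115]
    replies = [t0 + timedelta(milliseconds=ms) for ms in offsets_ms]
    gap = longest_gap(replies)
    print(gap, "within 100 ms target:", gap < timedelta(milliseconds=100))
```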

Test 17: Interconnect Network Failure
Procedure: Unplug all network cables for the interconnect network.
Expected results: CRS and/or RAC will detect the split-brain situation and evict the node and instance from the CRS
cluster and the RAC cluster.
- In a two-node cluster, the node with the lowest node number will survive.
- In a multiple-node cluster, the biggest subcluster will survive.
- In the case of equal subclusters, the one with the lowest node number will survive.
Measures:
- As for instance failure.
- Time to detect split brain and start eviction.
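
The survival rule stated above can be written down as a small function, which is handy when predicting the expected outcome of each interconnect test in a multi-node cluster. This is a simplified Python model of the rule as stated in this plan, not Oracle's CSS implementation; subclusters are given as sets of node numbers.

```python
from typing import Sequence, Set


def surviving_subcluster(subclusters: Sequence[Set[int]]) -> Set[int]:
    """Return the subcluster expected to survive a split brain:
    the biggest subcluster wins; among equal-sized subclusters, the one
    containing the lowest node number wins. A two-node split is the
    equal-size case, so the node with the lowest number survives."""
    if not subclusters or any(not group for group in subclusters):
        raise ValueError("expected one or more non-empty subclusters")
    return max(subclusters, key=lambda group: (len(group), -min(group)))


if __name__ == "__main__":
    print(surviving_subcluster([{1}, {2}]))           # two-node cluster
    print(surviving_subcluster([{1, 2}, {3, 4, 5}]))  # biggest subcluster wins
    print(surviving_subcluster([{3, 4}, {1, 2}]))     # tie: lowest node number wins
```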
Test 18 Interconnect Assuming dual NICs in a bonding or teamed Network traffic should fail over to Time to fail over to other
NIC Failure configuration other NIC without any impact on NIC card.
Remove one NIC interconnect traffic or instances.

Test 19: Interconnect Switch Failure
Procedure: Assuming dual switches in a failover configuration, power off one switch.
Expected results: Network traffic should fail over to the other switch without any impact on interconnect traffic or
instances.

Test 20: Node Loses Access to Disks with the CRS Voting Device
Procedure: Unplug the external storage cable connection (SCSI, FC or LAN cable) from the node to the disks containing
the CRS Voting Device.
Expected results: The node and instance are evicted from the CRS and RAC clusters within the configured misscount
timeout.
Measures: As for node failure.

Test 21: Node Loses Access to a Single Path of the Disk Subsystem (CRS, Voting Device, Database files)
Procedure: Unplug an external storage cable connection (SCSI, FC or LAN cable) from the node to the disk subsystem.
Expected results:
- If multi-pathing is enabled, the multi-pathing configuration should provide failure transparency.
- No impact to database instances.
Measures: Monitor database status under load to ensure no service interruption occurs.

Test 22 ASM Disk Assuming ASM normal redundancy No impact on database instances Monitor progress: Pass/Functions as designed
Lost Power off / pull out / offline (depending on ASM starts rebalancing select * from
config) one ASM disk. v$asm_operation

Test 23: ASM Disk Repaired
Procedure: Power on / insert / online the ASM disk.
Expected results:
- No impact on database instances.
- ASM starts rebalancing.
Measures: Monitor progress: select * from v$asm_operation;
Actual results/notes: Pass/Functions as designed
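
While monitoring v$asm_operation during Tests 22 and 23, the SOFAR and EST_WORK columns give a rough completion percentage (EST_MINUTES gives ASM's own estimate of the time remaining). The Python sketch below only does that arithmetic on values the tester has already fetched; how the row is fetched is left open, and the sample numbers are invented.

```python
from typing import Optional


def rebalance_percent(sofar: int, est_work: int) -> Optional[float]:
    """Percent complete for one v$asm_operation row (SOFAR / EST_WORK).

    Returns None while EST_WORK is still 0, i.e. before a work estimate
    is available. Capped at 100 in case SOFAR overtakes the estimate."""
    if est_work <= 0:
        return None
    return min(100.0, 100.0 * sofar / est_work)


if __name__ == "__main__":
    # Invented sample row values: SOFAR=1250, EST_WORK=5000.
    print(rebalance_percent(1250, 5000))
```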

Test 24: Lost Access to OCR Device
Procedure: Remove access to the OCR device from one node.
Expected results: CSS detects this and will evict the node.
Measures: Time to reconfigure the cluster and the instances running on the dead node.

Test 25: Lost Access to Voting Device
Procedure: Remove access to the voting device from one node.
Expected results: CSS will detect this and evict the node.
Measures: Time to reconfigure the cluster and the instances running on the dead node.

Test 26: One Copy of Mirrored Voting Device Is Lost
Procedure: Remove access to the mirrored copy of the voting disks from all nodes.
Expected results: The mirroring hardware/software should detect this and fail over to the surviving copy.
Measures: Time to fail over; depends on the mirroring solution and the CSS misscount settings.

Test 27: Lost One Copy of OCR
Procedure: Remove or overwrite one copy of the OCR.
Expected results: Everything should continue to run without problems.
Note: this test assumes that the OCR is mirrored.

Test 28: Restore Lost Copy of OCR
Procedure:
1. Find an OCR backup (from a time prior to when the failure occurred). To do this, run 'ocrconfig -showbackup' from
   $ORA_CRS_HOME/bin.
   If no recent backup can be found, back up the OCR manually using 'dd' with a block size of 4k on AIX, HPUX, Linux
   and Solaris, or using 'cp' if the OCR device is stored on a file system. On Windows, use the 'ocopy' command to back
   up the OCR.
2. Reboot the nodes (in single user mode or runlevel 1).
3. Run 'ocrdump -backupfile <filename>' to validate the backup OCR before restoring.
4. Restore the OCR with ocrconfig using the most recent backup:
    cd $ORA_CRS_HOME/bin
    ocrconfig -restore <path to OCR backup determined from step 1 above>
5. Reboot the nodes.
Expected results: Everything should continue to run without problems.
Measures: Restore is successful.
Actual results/notes: Pass/Functions as designed


Test 29: Lose / Restore One Copy of Voting Device
Procedure: Recovering from a lost voting disk means restoring it from a backup.
- For AIX, HPUX, Linux and Solaris, a voting disk backup is taken with 'dd'. The block size for the 'dd' command should
  be 4k, to ensure that the backup of the voting disk gets complete blocks.
- For Windows, a voting disk backup is taken with the ocopy command line utility.
Expected results: Everything should continue to run without problems.
Measures: Restore is successful.
Actual results/notes: Pass/Functions as designed
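
The point of the 4k block size is that the copy consists only of complete blocks. The Python sketch below mimics a 'dd bs=4k' style copy between two binary streams and refuses to write a partial final block, which makes that requirement explicit. It is illustrative only: the procedure itself uses 'dd' (or 'ocopy' on Windows), the function name is hypothetical, and it assumes regular-file style reads that only come back short at the end of the data (raw devices and pipes may behave differently).

```python
import io
from typing import BinaryIO

BLOCK_SIZE = 4096  # 4k, as recommended for the voting disk 'dd' backup


def copy_whole_blocks(src: BinaryIO, dst: BinaryIO, block_size: int = BLOCK_SIZE) -> int:
    """Copy src to dst in block_size chunks, like 'dd bs=4k'.

    Returns the number of blocks copied. Raises ValueError on a short
    final read, since a partial block would mean an incomplete backup
    (blocks before that point have already been written to dst)."""
    blocks = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:
            return blocks
        if len(chunk) != block_size:
            raise ValueError(f"partial block ({len(chunk)} bytes) after {blocks} blocks")
        dst.write(chunk)
        blocks += 1


if __name__ == "__main__":
    source = io.BytesIO(b"\x00" * (3 * BLOCK_SIZE))  # stand-in for a voting disk
    backup = io.BytesIO()
    print(copy_whole_blocks(source, backup), "blocks copied")
```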

Component Functionality Testing
Normally it should not be necessary to perform additional functionality testing for each individual system component.
However, for some new components in new environments it might be useful to perform additional testing to make sure
that they are configured properly. This testing will also help system and database administrators become familiar with
new technology components.

Cluster Infrastructure
To simplify testing and problem diagnosis it is often very useful to do some basic testing on the cluster infrastructure
without Oracle software or a workload running. Normally this testing will be performed after installing the hardware and
operating system, but before installing any Oracle software. If problems are encountered during System Stress Test or
Destructive Testing, diagnosis and analysis can be facilitated by testing the cluster infrastructure separately. Typically
some of these destructive tests will be used:

- Node Failure (obviously without Oracle software or workload)
- Restart Failed Node
- Reboot all nodes at the same time
- Lost disk access
- HBA failover, assuming multiple HBAs with failover capability
- Disk controller failover, assuming multiple disk controllers with failover capability
- Public NIC Failure
- Interconnect NIC Failure
- NAS (NetApp) storage failure: in case of a complete mirror failure, measure the time needed to complete the storage
  reconfiguration. Check the same when going into maintenance mode.

If using non-Oracle cluster software:
- Interconnect Network Failure
- Lost access to cluster voting/quorum disk

ASM Test and Validation


This test and validation plan is intended to give the customer or engineers a procedural approach to:
- Validating the installation of RAC-ASM
- Functional and operational validation of ASM

Component Testing: ASM Installation and Configuration Tests

Test 30: Verify that candidate disks are available. This should be run on all nodes, with the same output.
Procedure: Log in to ASM:
    select name, group_number, path, state, mode_status, label from v$asm_disk;
Expected results:
    SQL> select NAME, MOUNT_STATUS, STATE from v$asm_disk where GROUP_NUMBER=1;
    NAME                           MOUNT_S STATE
    ------------------------------ ------- -------
    DATA_0000                      CACHED  NORMAL
Actual results/notes: Pass/Functions as designed

Test 31: Create an ASM diskgroup. Creating a diskgroup will validate several aspects of the ASM code, including disk
discovery, metadata creation and disk encapsulation.
Procedure:
- Example for AIX, HPUX, Linux and Solaris:
    create diskgroup dgroup1 external redundancy disk 'ORCL:*';
- Example for Windows:
    create diskgroup DATA2 external redundancy disk '\\.\ORCLDISKDATA2';
Expected results: A successfully created diskgroup. This diskgroup should also be listed in v$asm_diskgroup.
Actual results/notes: Pass/Functions as designed

Test 32: Create a diskgroup using normal and high redundancy.
Procedure: Log in to ASM.
- For AIX, HPUX, Linux and Solaris:
    create diskgroup dbfile_group1 normal redundancy disk 'ORCL:d0001', 'ORCL:0002';
- For Windows:
    create diskgroup DATA2 normal redundancy disk '\\.\ORCLDISKDATA2', '\\.\ORCLDISKDATA3';
- For high redundancy, specify 'high redundancy' and at least three disks.
Expected results: This should create a disk group with normal redundancy and two failure groups. For high redundancy,
it will create three failure groups. Verify with:
    SQL> select name, state, type from v$asm_diskgroup;
Actual results/notes: Pass/Functions as designed
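
The redundancy level determines how many failure groups a diskgroup needs: external redundancy relies on the storage for protection, normal redundancy needs at least two failure groups and high redundancy at least three. When no FAILGROUP clause is given, ASM places each disk in its own failure group, so in statements like those above the disk count is the failure group count. As a sketch, this Python helper (a hypothetical name, used only for illustration) builds the CREATE DISKGROUP statement and rejects disk lists that are too short for the requested level.

```python
from typing import Sequence

# Minimum failure groups per ASM redundancy level.
MIN_FAILURE_GROUPS = {"external": 1, "normal": 2, "high": 3}


def create_diskgroup_sql(name: str, redundancy: str, disks: Sequence[str]) -> str:
    """Build a 'create diskgroup' statement, one failure group per disk."""
    level = redundancy.lower()
    if level not in MIN_FAILURE_GROUPS:
        raise ValueError(f"unknown redundancy level: {redundancy}")
    if len(disks) < MIN_FAILURE_GROUPS[level]:
        raise ValueError(
            f"{level} redundancy needs at least {MIN_FAILURE_GROUPS[level]} disks, got {len(disks)}"
        )
    disk_list = ", ".join(f"'{disk}'" for disk in disks)
    return f"create diskgroup {name} {level} redundancy disk {disk_list};"


if __name__ == "__main__":
    print(create_diskgroup_sql("dbfile_group1", "normal", ["ORCL:d0001", "ORCL:0002"]))
```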

Test 33: Verify CSS-database communication and ASM file access.
Procedure: Start all the database instances.
Expected results: Database instances should start, and the v$asm_client view (in each ASM instance) should list the
database instance.
Actual results/notes: Pass/Functions as designed

Test 34: Check the internal consistency of disk group metadata.
Procedure: alter diskgroup DATA2 check all;
Expected results: If there are no internal inconsistencies, the statement "Diskgroup altered" will be returned. If
inconsistencies are discovered, appropriate messages are displayed describing the problem.
Actual results/notes: Pass/Functions as designed

Component Testing: ASM Functional Tests


Test 35: asmcmd. This is a command line interface into ASM.
Procedure: Use the asmcmd User's Guide to perform commands such as ls -l, cd, rm, and find.
Actual results/notes: Pass/Functions as designed

Test 36: Install ASM into its own ASM_HOME, and install the Database binaries in the ORACLE_HOME.
Procedure: Follow the procedures as defined in the Oracle Installation Guide. OUI will prompt you for a separate
ASM_HOME; if yes, it will install two ORACLE_HOMEs.
Expected results: This will provide two separate and isolated ORACLE_HOMEs for ASM and the Database.
Actual results/notes: Pass/Functions as designed

Test 37: Share ASM storage with a single instance database.
Procedure: Create a single instance database in a diskgroup that is managed by a cluster-enabled ASM instance. Also
create a single instance database on another node, using the same diskgroup as the RAC database.
Expected results: A clustered ASM instance will manage storage for both database instance types.
Actual results/notes: Pass/Functions as designed

Test 38: Rebalance disk with wait option.
Procedure:
- For AIX, HPUX, Linux and Solaris:
    alter diskgroup dgroup1 add disk '/dev/sdd1' rebalance wait;
- For Windows:
    alter diskgroup DATA add disk '\\.\ORCLDISKDATA1' rebalance wait;
Expected results: Control will return to the user when the disk rebalance is completed.
Actual results/notes: Pass/Functions as designed

Test 39: Test dbms_file_transfer by copying files from ASM to a filesystem.
Procedure: Use the dbms_file_transfer.put_file and get_file procedures to copy database files (datafiles, archives,
etc.) into and out of ASM. This requires that a database directory be pre-created and available for the source and
destination directories. See the PL/SQL Guide for dbms_file_transfer details.
Expected results: The put_file and get_file procedures will copy files successfully to/from the filesystem. This provides
an alternate option for migrating to ASM, or for simply copying files out of ASM.

Component Testing: ASM General Functional Tests – Disk Management


Test 40: Add disk. (* Test with cluster nodes down.)
Procedure:
- For AIX, HPUX, Linux and Solaris:
    alter diskgroup DATA add disk 'ORCL:DB0003' NAME DB0003;
- For Windows:
    alter diskgroup data add disk '\\.\ORCLDISKDATA1' name DATA_0001;
Expected results: This will stage and add the disk with no rebalance, since the power level is set to 0. Verify that the
rebalance is staged in v$asm_operation. The next step will complete the add disk with a rebalance.
Actual results/notes: Pass/Functions as designed

Test 41: Diskgroup rebalance
Procedure: alter diskgroup DATA rebalance power 4;
Expected results: This will rebalance the disk using a power setting of 4. Verify the operation by viewing
v$asm_operation.
Actual results/notes: Pass/Functions as designed

Test 42: Drop disk from diskgroup. (* Test with cluster nodes down.)
Procedure:
- For AIX, HPUX, Linux and Solaris:
    alter diskgroup dgroup1 drop disk DISK1;
- For Windows:
    alter diskgroup DATA drop disk DATA_0001;
Do not let the rebalance go to completion: the next test validates the undrop disk command, so perform the next
step/test before the rebalance completes.
Actual results/notes: Pass/Functions as designed

Test 43: Undrop disk. While the drop disk is still underway, perform an undrop disk and verify the operation via
v$asm_operation.
Procedure:
- For AIX, HPUX, Linux and Solaris:
    alter diskgroup dgroup1 undrop disks;
- For Windows:
    alter diskgroup DATA undrop disks;
  (UNDROP DISKS takes no disk name; it cancels all pending disk drops in the diskgroup.)
Expected results: Given that the rebalance had not completed in the previous step, the undrop operation should roll back
the drop operation. Thus, the disk entry will remain in v$asm_disk (select name, state from v$asm_disk).
Actual results/notes: Pass/Functions as designed

Component Testing: ASM General Functional Tests – ASM Objects

Test 44: Create template. (Note: templates are created while connected to an ASM instance.)
Procedure: alter diskgroup data add template unreliable attributes(unprotected fine);
Expected results:
    SQL> select name, stripe, redundancy from v$asm_template where name like 'UNREL%';
    NAME                           STRIPE REDUND
    ------------------------------ ------ ------
    UNRELIABLE                     FINE   UNPROT

Test 45: Apply template. Use the template above and apply it to a new tablespace to be created. (Note: you must
connect to a database instance to run this command.)
Procedure: create tablespace test datafile '+DATA/my_files(unreliable)' size 10M;
Expected results: Select name from v$datafile, and find the newly created tablespace.

Test 46: Drop template
Procedure: alter diskgroup DATA drop template unreliable;
Expected results: This template should be removed from v$asm_template.

Test 47: Create directory
Procedure: alter diskgroup DATA2 add directory '+DATA2/my_files';
Expected results: You can use the asmcmd tool to check that the new directory name was created in the desired
diskgroup. The created directory will have an entry in v$asm_directory.

Test 48: Create alias
Procedure: alter diskgroup DATA add alias '+DATA/my_files/system_datafile' for
'+DATA/V102/DATAFILE/SYSTEM.256.654542113';
Expected results: Verify that the alias exists in v$asm_alias.

Test 49: Drop alias
Procedure: alter diskgroup DATA drop alias '+DATA/my_files/system_datafile';
Expected results: Verify that the alias does not exist in v$asm_alias.

Test 50: Drop (active) file, i.e., a database file that is currently open
Procedure: alter diskgroup data drop file '+DATA/V102/DATAFILE/TEST.269.654602409';
Expected results: This should fail with the following message:
    ERROR at line 1:
    ORA-15032: not all alterations performed
    ORA-15028: ASM file '+DATA/V102/DATAFILE/TEST.269.654602409' not dropped; currently being accessed

Test 51: Drop (inactive) file, i.e., a file that is closed (select name, status from v$datafile; pick a datafile or
create one that is 'offline')
Procedure: alter diskgroup data drop file '+DATA/V102/DATAFILE/test.267.622585735';
Expected results: Observe that file number 267 is now removed from v$asm_file.

ASM general Functional Tests – Tools & Utilities
Test # / Test Procedure Expected Results Actual Results/Notes
Contact
Test 52: Dbverify the database files
  Procedure: Specify each file individually using the dbv utility:
    dbv userid=scott/tiger file='+DATA/DATAFILE/USERS.264.600359309' blocksize=8192
  Expected Results: The output should be similar to the following, with no errors present:
    DBVERIFY - Verification complete

    Total Pages Examined         : 640
    Total Pages Processed (Data) : 45
    Total Pages Failing   (Data) : 0
    Total Pages Processed (Index): 2
    Total Pages Failing   (Index): 0
    Total Pages Processed (Other): 31
    Total Pages Processed (Seg)  : 0
    Total Pages Failing   (Seg)  : 0
    Total Pages Empty            : 562
    Total Pages Marked Corrupt   : 0
    Total Pages Influx           : 0
    Highest block SCN            : 0 (0.0)
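Running dbv file by file produces one report per datafile. A small checker makes the "no errors present" criterion explicit. This is a sketch that assumes each report has been captured as text in the layout shown above. It treats any nonzero "Failing" or "Marked Corrupt" counter as a failure. "Influx" is deliberately not checked, because it counts blocks that were changing while dbv read them rather than corrupt blocks. The function name is hypothetical.

```python
import re

# Counters that must all be zero for Test 52 to pass. The \s* allows for the
# column-alignment padding dbv prints, e.g. "Total Pages Failing   (Data) : 0".
ZERO_COUNTERS = re.compile(
    r"^\s*Total Pages (Failing\s*\((?:Data|Index|Seg)\)|Marked Corrupt)\s*:\s*(\d+)",
    re.MULTILINE,
)


def dbv_is_clean(output: str) -> bool:
    """Return True if dbv finished and every failure/corruption counter is 0."""
    if "DBVERIFY - Verification complete" not in output:
        return False
    counters = ZERO_COUNTERS.findall(output)
    # No matches means the report format was not recognised: treat as a failure.
    return bool(counters) and all(int(value) == 0 for _, value in counters)
```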

Component Testing: Miscellaneous Tests


Test # / Contact | Test | Procedure | Expected Results | Measures | Actual Results/Notes
Test 53: Diagnostics procedure for hang/slowdown
  Procedure: Start the client workload. Execute the automatic and manual procedures that collect database, CRS and operating system diagnostics (hanganalyze, racdiag.sql).
  Expected Results: The diagnostics collection procedures complete normally.
  Measures: Time taken to run the diagnostics procedures. Is it acceptable to wait this long before restarting instances or nodes in a production situation?
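The Measures column asks how long diagnostics collection takes. One way to record this consistently is to time each collection command against an agreed budget. This is a minimal sketch under stated assumptions. The command list and the budget are site-specific placeholders, and the helper name is hypothetical.

```python
import subprocess
import time


def time_collection(command: list[str], budget_seconds: float) -> tuple[float, bool]:
    """Run one diagnostics-collection command; return (elapsed seconds, within budget).

    `command` and `budget_seconds` are site-specific placeholders: the budget is
    the longest wait the business accepts before restarting an instance or node.
    """
    start = time.monotonic()
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    elapsed = time.monotonic() - start
    return elapsed, elapsed <= budget_seconds
```

A hypothetical invocation would be `time_collection(["sqlplus", "-s", "/ as sysdba", "@racdiag.sql"], 300.0)`. This assumes the script ends with `exit` so that SQL*Plus returns control. The 300-second budget is only an example.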

Appendix I: Linux Specific Tests
Test # / Contact | Test | Procedure | Expected Results | Actual Results/Notes
Test L1: If using ASMLIB, all disks shown in the list above should map to the disks listed in the '/etc/init.d/oracleasm listdisks' output
  Procedure: As root: /etc/init.d/oracleasm listdisks
  Expected Results (sample output):
    DB0001
    DB0002
    DB0003
    FR0001
    FR0002
  Actual Results/Notes: [ ] Pass/Functions as designed

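The mapping check in Test L1 is a set comparison: every expected disk should be listed by ASMLIB, and ASMLIB should list nothing unexpected. This is a minimal sketch. It assumes the expected names are supplied by the tester (for example, from the ASM disk list referred to above). The function name is hypothetical.

```python
def compare_asmlib_disks(listdisks_output: str, expected: set[str]) -> tuple[set[str], set[str]]:
    """Compare 'oracleasm listdisks' output with the expected ASM disk names.

    Returns (missing, unexpected): expected disks ASMLIB does not list, and
    ASMLIB disks that are not in the expected set. Both empty means a pass.
    """
    listed = {line.strip() for line in listdisks_output.splitlines() if line.strip()}
    return expected - listed, listed - expected
```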
Test L2: Verify that the filesystem is available and accessible to all cluster nodes. This should be run on all nodes, with the same output.
  Procedure: Run df -k and access files on the filesystem from all nodes.
  Expected Results (sample output):
    oracle@thorasu1 ~ $ df -k
    Filesystem                    1K-blocks     Used Available Use% Mounted on
    /dev/cciss/c0d0p3              16516084  2709584  12967508  18% /
    none                            8127268        0   8127268   0% /dev/shm
    /dev/cciss/c0d0p1                101086    13103     82764  14% /boot
    /dev/cciss/c0d0p6               8254240   194600   7640348   3% /home
    /dev/cciss/c0d0p8               2063504    35952   1922732   2% /tmp
    /dev/cciss/c0d0p5              50394964 18711760  29123248  40% /trvapps
    /dev/cciss/c0d0p7               4127076   217512   3699920   6% /var
    /dev/cciss/c0d0p9              34361136  5269024  27346652  17% /extr
    /dev/mapper/vgxtra200-xtra200  16769024   327040  16441984   2% /xtra200
  Actual Results/Notes: [ ] Pass/Functions as designed

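"Same output on all nodes" can be checked mechanically once each node's `df -k` output has been collected. This is a sketch under stated assumptions, and the helper and node names are hypothetical. Only the total size of the shared mount is compared, because Used and Available can change while other nodes write to it. The parser also handles df wrapping a long device name onto its own line, as happens with /dev/mapper/vgxtra200-xtra200.

```python
def parse_df(output: str) -> dict[str, tuple[str, int]]:
    """Map mount point -> (filesystem, size in 1K blocks) from 'df -k' output.

    df wraps long device names (e.g. /dev/mapper/...) onto their own line, so
    fields are buffered until a full six-column row has been collected.
    """
    mounts: dict[str, tuple[str, int]] = {}
    pending: list[str] = []
    for line in output.splitlines():
        if line.startswith("Filesystem"):
            pending = []  # header row; also discards any shell prompt before it
            continue
        fields = pending + line.split()
        if len(fields) < 6:
            pending = fields
            continue
        pending = []
        mounts[" ".join(fields[5:])] = (fields[0], int(fields[1]))
    return mounts


def check_shared_mount(df_by_node: dict[str, str], mount: str) -> list[str]:
    """Return problems found for `mount`; an empty list means all nodes agree."""
    problems: list[str] = []
    sizes: dict[str, int] = {}
    for node, output in df_by_node.items():
        entry = parse_df(output).get(mount)
        if entry is None:
            problems.append(f"{node}: {mount} is not mounted")
        else:
            sizes[node] = entry[1]
    if len(set(sizes.values())) > 1:
        problems.append(f"{mount} size differs between nodes: {sizes}")
    return problems
```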
Test L3: Verify that the OCFS2 filesystem is available after a system reboot (single and multiple nodes rebooted)
  Procedure: shutdown -r now
  Expected Results: The OCFS2 filesystem should automatically mount and be accessible to all nodes after a reboot.
  Actual Results/Notes: [ ] Pass/Functions as designed

Test L4: Enable archivelog mode and use the OCFS2 filesystem as the log_archive_dest from multiple instances
  Procedure: The archive logs are used for RMAN recovery testing (see below).
  Expected Results: Archivelog files are created and available to all nodes.
  Actual Results/Notes: [ ] Pass/Functions as designed

Test L5: Enable RMAN disk backups and use the OCFS2 filesystem as the backupset destination
  Procedure: Back up ASM-based datafiles to the OCFS2 filesystem. Execute baseline recovery scenarios (full, point-in-time, datafile).
  Expected Results: RMAN backupsets are created and available to all nodes. Recovery scenarios complete with no errors.
  Actual Results/Notes: [ ] Pass/Functions as designed

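Test L6: Data Pump full export of the database
  Procedure: The Data Pump DIRECTORY parameter takes a directory object, not a path, so create one for the OCFS2 location first:
    create directory dumpsets as '/xtra200/dumpsets';
    expdp userid=system/manager full=y directory=dumpsets dumpfile=expdp.dat
  Expected Results: A full export should be created without errors or warnings.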

Test L7: Validate OCFS2 functionality during node failures
  Expected Results: The OCFS2 filesystem should remain available to surviving nodes.

Test L8: Validate OCFS2 functionality during disk/disk subsystem failures
  Expected Results: The OCFS2 filesystem should remain available as long as disk access is maintained.

Test L9: Resize the OCFS2 filesystem
  Procedure: Add disk allocation to the existing logical volume, then resize the OCFS2 filesystem using:
    tunefs.ocfs2 -S /dev/vgxtra200/xtra200
  Expected Results: The filesystem should reflect the additional space allocation.

Test L10: NFS-mount the OCFS2 filesystem on lower environments
  Procedure: mount thorasu1.travt.net:/xtra200 /mnt/xtra200
  Expected Results: Verify that files are accessible on the remote environment, with access for RMAN.

Test L11: Miscellaneous
  Check the OCFS2 cluster status:
    /etc/init.d/o2cb status
  Expected Results:
    Module "configfs": Loaded
    Filesystem "configfs": Mounted
    Module "ocfs2_nodemanager": Loaded
    Module "ocfs2_dlm": Loaded
    Module "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking O2CB cluster ocfs2: Online
    Checking O2CB heartbeat: Active
  Unmount the OCFS2 filesystem:
    umount /xtra200
  Mount the OCFS2 filesystem:
    mount -t ocfs2 -o datavolume,nointr /dev/vgxtra200/xtra200 /xtra200
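The o2cb status check can be turned into a pass/fail result by flagging any status line whose state is not one of the healthy values in the expected output above. This is a sketch that assumes the output format shown in Test L11. It evaluates only lines beginning with Module, Filesystem or Checking, and ignores any other lines a given ocfs2-tools release may print. The function name is hypothetical.

```python
import re

HEALTHY_STATES = {"Loaded", "Mounted", "Online", "Active"}
# Only these status lines are evaluated; any other lines are ignored.
CHECKED_PREFIXES = ("Module", "Filesystem", "Checking")


def o2cb_problems(status_output: str) -> list[str]:
    """Return o2cb status lines whose state is not healthy (empty list = pass)."""
    problems = []
    checked = 0
    for line in status_output.splitlines():
        line = line.strip()
        if not line.startswith(CHECKED_PREFIXES):
            continue
        # The state is whatever follows the last colon, e.g. "Loaded" or "Not active".
        match = re.search(r":\s*([^:]+?)\s*$", line)
        if match is None:
            continue
        checked += 1
        if match.group(1) not in HEALTHY_STATES:
            problems.append(line)
    if checked == 0:
        problems.append("no o2cb status lines found")
    return problems
```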

Appendix II: Windows Specific Tests


Test # / Contact | Test | Procedure | Expected Results | Actual Results/Notes
Test W1: Run the ocfscollect tool
  Procedure: OCFSCollect is available as an attachment to Metalink Note 332872.1.
  Expected Results: A .zap file, which can be used as a baseline for the 'health' of the available OCFS drives.

Test W2: Check that a file created from one node is visible from other nodes in the cluster
  Procedure: Using a tool such as Windows Explorer, create a .TXT file containing any text and store it on an OCFS partition.
  Expected Results: The file should be readable and writable from other nodes in the cluster.

Test W3: Enable archivelog mode and set *.log_archive_dest_n to write to an OCFS drive
  Procedure: Use the alter system set command with sid='*' and scope=both to set the log_archive_dest_n parameter.
  Expected Results: Archivelog files are created and available to all nodes.

Test W4: Enable RMAN disk backups and use the OCFS filesystem as the backupset destination
  Procedure: Back up ASM-based datafiles to the OCFS filesystem. Execute baseline recovery scenarios (full, point-in-time, datafile).
  Expected Results: RMAN backupsets are created and available to all nodes. Recovery scenarios complete with no errors.

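Test W5: Data Pump full export of the database
  Procedure: The Data Pump DIRECTORY parameter takes a directory object, not a path, so create one for the OCFS location first:
    create directory dumpsets as 'c:\dumpsets';
    expdp userid=system/manager full=y directory=dumpsets dumpfile=expdp.dat
  Expected Results: A full export should be created without errors or warnings.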

Test W6: Validate OCFS functionality during node failures
  Expected Results: OCFS partitions should remain available to surviving nodes.

Test W7: Remove a drive letter and ensure that the letter is re-established for that partition
  Procedure: In Windows Disk Management, use the 'Change Drive Letter and Paths …' option to remove a drive letter associated with an OCFS partition.
  Expected Results: OracleClusterVolumeService should restore the drive letter assignment within a short period of time.
