Michael S. Abbey
10 Problems with your RMAN backup script
Presented by: Yury Velikanov
Senior Oracle DBA
The Pythian Group
April 2012
WHY PYTHIAN
Recognized Leader:
Global industry-leader in remote database administration services and consulting for Oracle,
Oracle Applications, MySQL and SQL Server
Work with over 150 multinational companies such as Forbes.com, Fox Sports, Nordion and Western
Union to help manage their complex IT deployments
Expertise:
One of the world's largest concentrations of dedicated, full-time DBA expertise. Employ 7 Oracle
ACEs/ACE Directors
Hold 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle
GoldenGate & Oracle RAC
Global Reach & Scalability:
24/7/365 global remote support for DBA and consulting, systems administration, special projects
or emergency response
FEW WORDS ABOUT YURY
Google "Yury Oracle" [LinkedIn, twitter, blog, email, mobile]
- Sr. Oracle DBA at Pythian, Oracle ACE and OCM
- Started as Oracle DBA with 7.2 (in 1997, 14+ years)
- Education (Masters Degree in Computer Science)
- OCP 7/8/8i/9i/10g/11g + OCM 9i/10g/11g
- First international appearance: 2005 - Hotsos Symposium 2005
- Oracle DBA consultant experience (14+ years)
- Pythian Oracle Clients support (2+ years)
  - 140+ Clients around the world
  - RMAN scripts audit, troubleshooting, recovery
- Email me to get the presentation
MISSION
Give you 10 practical hints on
RMAN script improvements.
Encourage you to think about what can
possibly go wrong before it happens.
Give away some prizes
RIGHT APPROACH ... BE SKEPTICAL!
Just because backups and a trial recovery work doesn't mean you don't have
issues (you must test, document and practice recovery)
Challenge your backup procedures!
- Think about what can possibly go wrong
- Think about it now, as in the middle of an emergency recovery it may be
way too late or too challenging
Prepare everything you may need for a smooth recovery while working on
backup procedures
FEW GENERAL THOUGHTS
NEVER rely on backups stored on the same physical media as the database!
Mark Brinsmead, Sr. Oracle DBA, Pythian
Even if your storage is the fanciest disk array (misnamed "SAN" by many) in the world, there
exist failure modes in which ALL data in the disk array can be lost simultaneously. (Aside
from fire or other disaster, failed firmware upgrades are the most common.) You don't
really have a "backup" until the backup is written to separate physical media!
FEW GENERAL THOUGHTS
Avoid situations where the loss of a single piece of physical media can
destroy more than one backup.
When backing up to tape, for example, if the tape capacity is much larger than your
backups, consider alternating backups between multiple tape pools. ("Self-redundant"
backups are of little value if you are able to lose several consecutive backups simply by
damaging one tape cartridge).
If your backup and recovery procedures violate some of these base
concepts, state the risks clearly and discuss/sign off on them with the
business on a regular basis.
#1 RMAN LOG FILES
Part of a log file ... Do you see any issues?
RMAN>
Starting backup at 18-OCT-11
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK
channel ORA_DISK_1: starting compressed archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=4 RECID=2 STAMP=764855059
input archived log thread=1 sequence=5 RECID=3 STAMP=764855937
...
Finished backup at 18-OCT-11
#1 RMAN LOG FILES
part of a log file ...
RMAN> backup as compressed backupset database
2> include current controlfile
3> plus archivelog delete input;
How about now?
Starting backup at 2011/10/18 12:30:46
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=56 device type=DISK
channel ORA_DISK_1: starting compressed archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=8 RECID=6 STAMP=764856204
input archived log thread=1 sequence=9 RECID=7 STAMP=764857848
...
Finished backup at 2011/10/18 12:33:54
#1 RMAN LOG FILES
Before calling RMAN:
export NLS_DATE_FORMAT="YYYY/MM/DD HH24:MI:SS"
export NLS_LANG="<LANGUAGE>_<TERRITORY>.<CHARSET>"   # only for non-standard character sets
Inside RMAN, before running any commands:
set echo on
Nice to have: total execution time at the end of the log file
c_begin_time_sec=`date +%s`
...
c_end_time_sec=`date +%s`
v_total_execution_time_sec=`expr ${c_end_time_sec} - ${c_begin_time_sec}`
echo "Script execution time is $v_total_execution_time_sec seconds"
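The environment settings and timing lines above can be wrapped into one helper. The following is a minimal sketch that assumes a POSIX-style shell. `run_logged` is a hypothetical name, and the command it runs is whatever starts your RMAN session.

```sh
# Hypothetical helper: run a command with full RMAN timestamps, append all
# of its output to a log file, and end the log with the total execution time.
run_logged() {
  local log_file=$1
  local rc c_begin_time_sec c_end_time_sec
  shift
  # Full date and time in every "Starting/Finished backup at" line
  NLS_DATE_FORMAT="YYYY/MM/DD HH24:MI:SS"
  export NLS_DATE_FORMAT
  c_begin_time_sec=$(date +%s)
  "$@" >>"$log_file" 2>&1
  rc=$?
  c_end_time_sec=$(date +%s)
  echo "Script execution time is $((c_end_time_sec - c_begin_time_sec)) seconds" >>"$log_file"
  return "$rc"   # keep the command's exit code for the failure checks
}

# Typical use (backup.rcv is a hypothetical command file starting with "set echo on"):
#   run_logged "$v_log_file" "$ORACLE_HOME/bin/rman" target / cmdfile=backup.rcv
```

Returning the command's own exit code keeps `$?` meaningful for the failure checks discussed in #8.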
#1 RMAN LOG FILES
Do not overwrite the log file from the previous backup
full_backup_${ORACLE_SID}.`date +%Y%m%d_%H%M%S`.log
Use case: a backup failed. The previous logs (with timestamps) help answer:
- Should I run the backup now?
- Would it interfere with business activities?
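The naming scheme above also makes it easy to look back at earlier runs before deciding whether to re-run a failed backup. This is a minimal sketch. The helper names and the choice of the last N logs are assumptions, and the duration is only readable if NLS_DATE_FORMAT was set as shown above.

```sh
# Hypothetical helpers: one log file per run, plus a quick look back at
# how long earlier runs took. $1 = log directory, $2 = ORACLE_SID.
new_log_name() {
  printf '%s/full_backup_%s.%s.log\n' "$1" "$2" "$(date +%Y%m%d_%H%M%S)"
}

# Print the "Starting/Finished backup at" lines of the N most recent logs
# (newest first, N defaults to 5) to estimate how long a re-run would take.
recent_backup_timings() {
  local log_dir=$1 sid=$2 n=${3:-5}
  ls -1t "$log_dir"/full_backup_"$sid".*.log 2>/dev/null | head -n "$n" |
  while IFS= read -r f; do
    echo "== $f"
    grep -E '^(Starting|Finished) backup at' "$f" || echo "   (no start/finish lines)"
  done
}
```

Names have one-second resolution, so two runs started within the same second would still collide.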
KISS = KEEP IT SIMPLE, STUPID
crosscheck archivelog all;
delete noprompt expired archivelog all;
backup database
include current controlfile
plus archivelog delete input;
delete noprompt obsolete;
Yes!?
& No
#2 DO NOT USE CROSSCHECK (ANTI KISS)
Do not use CROSSCHECK in your day to day backup scripts!
If you do, RMAN silently ignores missing files (CROSSCHECK marks them
EXPIRED and DELETE EXPIRED removes their records), possibly making your
recovery impossible
CROSSCHECK should be a manual activity executed by a DBA to resolve
an issue
#3 BACKUP CONTROL FILE AS THE LAST STEP
backup as compressed backupset database
plus archivelog delete input
include current controlfile;
delete noprompt obsolete;
exit
Is it right?
We are making the controlfile backup inconsistent immediately: the final
PLUS ARCHIVELOG step and DELETE OBSOLETE run after it, so the backed-up
controlfile does not record them
#3 BACKUP CONTROL FILE AS THE LAST STEP
backup as compressed backupset database
plus archivelog delete input;
delete noprompt obsolete;
backup spfile;
backup current controlfile;
exit
Is it right?
#4 DO NOT RELY ON ONE BACKUP ONLY
Do not rely on ONE backup only!
You should always have a second option
This is especially true for ARCHIVE LOGS
If you lose a single ARCHIVE LOG, your recoverability is
compromised
-- ONE COPY ONLY
BACKUP DATABASE ... PLUS ARCHIVELOG DELETE INPUT;
-- SEVERAL COPIES
BACKUP ARCHIVELOG ALL NOT BACKED UP $v_del_arch_copies TIMES;
#5 DO NOT DELETE ARCHIVE LOGS BASED ON TIME ONLY
-- TIMESTAMP
DELETE NOPROMPT BACKUP OF ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-6/24' DEVICE
TYPE DISK;
-- SEVERAL COPIES + TIME
DELETE NOPROMPT ARCHIVELOG ALL BACKED UP $v_del_arch_copies TIMES TO DISK
COMPLETED BEFORE '$p_start_of_last_db_backup';
-- SEVERAL COPIES + TIME + STANDBY
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;
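The shell script has to fill in the two variables used in the commands above. This is a minimal sketch that assumes NLS_DATE_FORMAT from #1 is exported, so RMAN parses the timestamp in that layout. `rman_timestamp` and `arch_rman_commands` are hypothetical helpers that only reproduce the commands shown above.

```sh
# Timestamp in the same layout as NLS_DATE_FORMAT="YYYY/MM/DD HH24:MI:SS".
# Capture it just before the database backup starts.
rman_timestamp() {
  date '+%Y/%m/%d %H:%M:%S'
}

# Print the archive log part of the command file: back up every log that
# has fewer than N backups, then remove from disk only logs that already
# have N backups AND were completed before the database backup started.
arch_rman_commands() {
  local v_del_arch_copies=$1 p_start_of_last_db_backup=$2
  cat <<EOF
BACKUP ARCHIVELOG ALL NOT BACKED UP $v_del_arch_copies TIMES;
DELETE NOPROMPT ARCHIVELOG ALL BACKED UP $v_del_arch_copies TIMES TO DISK
COMPLETED BEFORE '$p_start_of_last_db_backup';
EOF
}

# Typical use (2 copies is only an example value):
#   p_start_of_last_db_backup=$(rman_timestamp)   # before BACKUP DATABASE
#   arch_rman_commands 2 "$p_start_of_last_db_backup" >>backup.rcv
```

Using the start of the current database backup as the cut-off keeps every archive log needed to recover that backup on disk.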
#6 USE CONTROLFILE IF CATALOG DB ISN'T AVAILABLE
[oracle@host01 ~]$ rman target / catalog rdata/xxx
Recovery Manager: Release 11.2.0.2.0 - Production on Tue Oct 18 15:15:25
2011
...
connected to target database: PROD1 (DBID=1973883562)
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04004: error from recovery catalog database: ORA-28000: the account is
locked
[oracle@host01 ~]$
Check in your script whether the catalog DB is available
If it is, connect to the catalog DB
If it isn't, use the controlfile only (flagging it as a warning)
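The check above can be scripted as a throwaway RMAN session whose output decides the mode. This is a minimal sketch under stated assumptions. `CATALOG_CONNECT` (user/password@alias) is a hypothetical variable, `probe_catalog` and `catalog_mode` are hypothetical names, and the connect string is fed on stdin so the password does not appear on the command line.

```sh
# Hypothetical probe: open a short RMAN session and try the catalog.
probe_catalog() {
  printf 'connect target /\nconnect catalog %s\nexit;\n' "$CATALOG_CONNECT" |
    "$ORACLE_HOME/bin/rman" 2>&1
}

# Use the catalog only if the probe connected cleanly; otherwise fall back
# to the target control file and raise a warning on stderr.
catalog_mode() {
  if printf '%s\n' "$1" | grep -q 'connected to recovery catalog database' &&
     ! printf '%s\n' "$1" | grep -qE 'RMAN-[0-9]{5}|ORA-[0-9]{5}'; then
    echo catalog
  else
    echo "WARNING: recovery catalog not available, using control file only" >&2
    echo nocatalog
  fi
}

# Typical use:
#   v_mode=$(catalog_mode "$(probe_catalog)")
#   add "connect catalog ..." to the command file only when v_mode is "catalog"
```

Requiring a positive "connected" message means an empty or unexpected probe output also falls back to the control file.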
#6 USE CONTROLFILE IF CATALOG DB ISN'T AVAILABLE
rman target /
RMAN> echo set on
RMAN> connect target *
connected to target database: PROD1 (DBID=1973883562)
RMAN> connect catalog *
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-04004: error from recovery catalog database: ORA-28000: the account is locked
RMAN> backup as compressed backupset database
2> include current controlfile
3> plus archivelog delete input;
Starting backup at 2011/10/18 15:22:30
current log archived
using target database control file instead of recovery catalog
Special thanks to @pfierens for the discussion on Twitter
#6 USE CONTROLFILE IF CATALOG DB ISN'T AVAILABLE
-- Backup part
rman target / <<!
backup as compressed backupset database
...
!
-- Catalog synchronization part
rman target / <<!
connect catalog rdata/xxx
resync catalog;
!
Special thanks to @martinberx for the discussion on Twitter
#7 DO NOT RELY ON RMAN STORED CONFIGURATION
Do not rely on controlfile autobackup
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/b01/rman/prod/%F';
Oracle creates a controlfile backup copy
- each time you make any changes related to database files
- at the end of each RMAN backup
What would happen if someone switched autobackup off?
#7 DO NOT RELY ON RMAN STORED CONFIGURATION
Document the configuration in the log file (show all;)
If you change the configuration, restore it at the end of your script
show all;
v_init_rman_setup=$($ORACLE_HOME/bin/rman target / <<_EOF_ 2>&1 |
show all;
_EOF_
grep "CONFIGURE " | sed 's/ *# default//g')
...
< script >
...
echo "$v_init_rman_setup" | $ORACLE_HOME/bin/rman target /
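A variant of the same idea keeps the snapshot in a file and restores it from a shell trap. That way the original configuration comes back even when the script fails half way. This is a minimal sketch. `RMAN_BIN`, `save_rman_config` and `restore_rman_config` are hypothetical names, and the default binary path assumes the standard `$ORACLE_HOME` layout.

```sh
# Path to the RMAN binary (assumption: standard $ORACLE_HOME layout).
RMAN_BIN=${RMAN_BIN:-$ORACLE_HOME/bin/rman}

# Keep only CONFIGURE lines and drop the trailing "# default" markers.
filter_configure_lines() {
  grep 'CONFIGURE ' | sed 's/ *# default$//'
}

# Save the current "show all" settings into file $1.
save_rman_config() {
  "$RMAN_BIN" target / <<'_EOF_' 2>&1 | filter_configure_lines >"$1"
show all;
exit;
_EOF_
}

# Replay the saved CONFIGURE commands from file $1 (no-op if it is empty).
restore_rman_config() {
  [ -s "$1" ] || return 0
  "$RMAN_BIN" target / <"$1"
}

# Typical use at the top of the backup script:
#   v_saved_cfg=$(mktemp)
#   save_rman_config "$v_saved_cfg"
#   trap 'restore_rman_config "$v_saved_cfg"; rm -f "$v_saved_cfg"' EXIT
```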
#8 BACKUPS CONSISTENCY CONTROL
Failure Verification and Notification
How do you report backup failures and errors?
We don't report at all
DBA checks logs sometimes
Backup logs are sent to a shared email address (good!)
DBA on duty checks emails (what if no one is available or no email is received?)
We check the RMAN command exit code $? and send an email
#8 BACKUPS CONSISTENCY CONTROL
Failure Verification and Notification
I would suggest:
Run the log file checks within the backup script and page
immediately
Script all checks and combine them with "OR"
echo $?
egrep "ORA-|RMAN-" < log file >
Improve your scripts and re-test previous adjustments on a regular basis
DBA makes a judgment and takes a conscious decision
PAGE the on-call DBA immediately about any failure
PAGE about LONG-running backups
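The OR-combined checks above fit in a few lines. This is a minimal sketch. `backup_failed` and `backup_too_long` are hypothetical names, and `page_oncall` in the usage comment stands in for whatever paging tool you use. An empty or missing log file is treated as a failure too.

```sh
# True (exit 0) if the run should be reported as failed.
# $1 = RMAN exit code ($?), $2 = log file of this run
backup_failed() {
  [ "$1" -ne 0 ] ||
    [ ! -s "$2" ] ||
    grep -qE 'ORA-|RMAN-' "$2"
}

# True if the backup has been running longer than allowed.
# $1 = elapsed seconds so far, $2 = threshold in seconds
backup_too_long() {
  [ "$1" -gt "$2" ]
}

# Typical use (page_oncall is a placeholder):
#   backup_failed "$v_rman_rc" "$v_log_file" && page_oncall "RMAN backup failed: $v_log_file"
```

Any single failing check is enough to page, so a zero exit code alone never hides an error in the log.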
#8 BACKUPS CONSISTENCY CONTROL
Notifications are not enough!
How do you check that your database is safely backed up according to your
business requirements?
Make a separate check that pages you if your backups don't
satisfy recoverability requirements
REPORT NEED BACKUP ...
-- datafiles that weren't backed up in the last 24 hours! a bit excessive (2)
REPORT NEED BACKUP RECOVERY WINDOW OF 1 DAYS;
REPORT NEED BACKUP REDUNDANCY 10;
REPORT NEED BACKUP DAYS 2;
REPORT UNRECOVERABLE;
May not be available in all versions!
IF YOU DON'T HAVE RMAN AND MML (MEDIA MANAGEMENT LAYER) INTEGRATION
You should consider using it!
Otherwise it is extremely difficult to ensure backup
consistency
If you don't use it, your backups are exposed to many issues
At best, your backups will take much more space on
tape than they should
In the worst case you may fail to back up some of the backup
pieces, putting the database recoverability at risk
The next few slides discuss some of these issues
#9 ENSURE SPACE FOR 3 FULL BACKUPS + ARCH
IF you don't have
- smart backup software (incremental/open files)
- sophisticated backup procedures
THEN you need space on a file system for at least 3 FULL
backups plus the ARCHIVE LOGS generated between those 3 backups
If REDUNDANCY 1, then the previous backup and its ARCHIVE LOGS are
removed after the next backup completes. There is no continuous
REDO stream on tape.
If REDUNDANCY 2, then you need space for the third full backup only
while the backup runs (as soon as the third backup completes, you
remove the first one)
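The sizing rule above can be turned into a quick back-of-the-envelope calculation. This is a sketch under assumptions. With REDUNDANCY r, it assumes r backups are kept while one more is being written. It also assumes, to stay on the safe side, that archive logs are counted for the same number of backup intervals. `required_fs_space_gb` is a hypothetical name.

```sh
# $1 = redundancy, $2 = size of one full backup (GB),
# $3 = archive logs generated per backup interval (GB)
required_fs_space_gb() {
  echo $(( ($1 + 1) * ($2 + $3) ))
}
```

For example, REDUNDANCY 2 with a 200 GB full backup and 30 GB of archive logs per interval gives 3 x 230 = 690 GB.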
#9 DON'T USE DELETE OBSOLETE (DISK + NO MML TAPE)
DELETE OBSOLETE wipes out RMAN's memory: there is no way for RMAN to know
about the backups available on tape.
Think about recovery (if you use delete noprompt obsolete)
1. You need to restore a control file (possibly from offsite backups)
2. Find and bring onsite all tapes involved (possibly in several iterations)
3. Restore and recover (possibly restoring more ARCH backups)
backup as compressed backupset database
plus archivelog delete input
include current controlfile;
delete noprompt obsolete;
exit
#9 DON'T USE DELETE OBSOLETE (DISK + NO MML TAPE)
-A- LIST BASED ON DISK RETENTION
report obsolete recovery window of ${DISK_RETENTION_OS} days device type disk;
-B- REMOVE FILES BASED ON DISK RETENTION
! check that each reported file has been backed up to tape, then rm it from the FS !
-C- WIPEOUT FROM REPOSITORY
delete force noprompt obsolete recovery window of ${TAPE_DAY_RETENTION} days device type disk;
-!- AT THE RECOVERY TIME
RUN {
  SET UNTIL SCN 898570;
  RESTORE DATABASE PREVIEW;
}
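Step -B- is the part RMAN cannot do for you. This is a minimal sketch of it. It assumes the obsolete pieces reported in step -A- have already been extracted to one path per line (parsing the report is not shown). `is_on_tape` is a hypothetical hook you would implement with your file system backup tool; the placeholder below always answers "not on tape", so nothing is deleted until a real check is plugged in.

```sh
# Placeholder hook: replace with a real check against your tape catalog.
is_on_tape() { return 1; }

# Reads file paths from stdin. Removes a file only after is_on_tape confirms
# it; files that are not on tape yet are kept and reported on stderr.
# Returns non-zero if anything had to be kept, so the caller can alert.
remove_if_on_tape() {
  local f kept=0
  while IFS= read -r f; do
    [ -n "$f" ] || continue
    if is_on_tape "$f"; then
      rm -f -- "$f" && echo "removed: $f"
    else
      echo "KEPT (not on tape yet): $f" >&2
      kept=$((kept + 1))
    fi
  done
  [ "$kept" -eq 0 ]
}
```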
#9 NEVER KEEP DEFAULT RETENTION POLICY
NEVER leave the RMAN RETENTION POLICY at the default, or at a lower
level than your TAPE retention:
another Oracle DBA could run the DELETE OBSOLETE command and wipe
all catalog records out
-- either
CONFIGURE RETENTION POLICY TO REDUNDANCY 1000;
-- or
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 1000 DAYS;
#10 HALF WAY BACKED UP FILE SYSTEM FILES
Make sure that your file system backup doesn't
back up a half-finished backup set
-A- BACKUP AS TMP
BACKUP DATABASE FORMAT '${file}.tmp_rman';
-B- MOVE TO PERMANENT
mv ${file}.tmp_rman ${file}.rman
-C- MAKE CATALOG AWARE
CHANGE BACKUPPIECE '${file}.tmp_rman' UNCATALOG;
CATALOG BACKUPPIECE '${file}.rman';
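Steps -B- and -C- can be scripted together. This is a minimal sketch. It assumes the backup pieces were written into one directory with the .tmp_rman suffix, and that the file system backup is configured to skip *.tmp_rman files. `finalize_pieces` is a hypothetical helper that renames the pieces and prints the matching RMAN commands.

```sh
# Rename every finished *.tmp_rman piece in directory $1 to *.rman and
# print the CHANGE/CATALOG commands that make the repository aware of it.
finalize_pieces() {
  local tmp_piece final_piece
  for tmp_piece in "$1"/*.tmp_rman; do
    [ -e "$tmp_piece" ] || continue     # no match: the glob stays literal
    final_piece=${tmp_piece%.tmp_rman}.rman
    mv -- "$tmp_piece" "$final_piece" || return 1
    echo "CHANGE BACKUPPIECE '$tmp_piece' UNCATALOG;"
    echo "CATALOG BACKUPPIECE '$final_piece';"
  done
}

# Typical use, once the BACKUP command has finished:
#   finalize_pieces /b01/rman/prod | "$ORACLE_HOME/bin/rman" target /
```

Renaming happens only after RMAN has closed the pieces, so a file system backup that skips *.tmp_rman never sees a half-written file.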
DO WE HAVE A WINNER?
#1 RMAN Log files
#2 Do not use CROSSCHECK
#3 Backup control file as the last step
#4 Do not rely on ONE backup only
#5 Do not delete ARCHIVE LOGS based on time only
#6 Use controlfile if catalog DB isn't available
#7 Do not rely on RMAN stored configuration
#8 Backups consistency control
#9 Don't use delete obsolete (disk + no mml tape)
#10 Half way backed up File System files
CONTACT ME
Yury Oracle
[email protected]
@yvelikanov
www.pythian.com/news/velikanov
THANK YOU AND Q&A
To contact us
[email protected]
1-866-PYTHIAN
To follow us
http://www.pythian.com/news/
http://www.facebook.com/pages/The-Pythian-Group/
http://twitter.com/pythian
http://www.linkedin.com/company/pythian
ADDITIONAL TOPICS
#A DELETE OBSOLETE to be executed at the beginning
#B CATALOG or not to CATALOG
#C Number of backup & recovery processes (# of backup pieces)
#D CATALOG: keep as little information as reasonable
#E rman target / catalog rdata/xxx isn't secure