INTRODUCTION
Most DBAs realize that there is a lot more to Oracle backups with Recovery Manager (RMAN) than "backup database;". We will present a script that can be deployed in under 15 minutes on Linux or Solaris hosts to perform disk-to-disk backups of any Oracle database version 10g or newer. We believe that many DBAs do not take full advantage of the capabilities of shell scripting and work harder than they should. Even worse, many DBAs may actually be setting themselves up for failure when they trust that a critical task is being properly performed by a shell script, only to find out that it is not when it is too late to take corrective action. This paper presents and discusses a few ideas behind script writing and explores common issues and pitfalls associated with automating tasks using shell scripts. We believe this discussion can be applied to any situation that requires the creation of a program to perform a task. We will apply the concepts of task automation as we walk through the steps and ideas behind the development of a script to back up Oracle databases. The script itself is a work in progress, and the latest version can be downloaded from the web site provided later on. This paper focuses on a few major areas of Oracle DBA expertise: shell scripting, UNIX, Linux and Oracle Recovery Manager (RMAN). It does not address Windows environments, even though the concepts can be applied there as well. Every time the word script is used in this paper it refers to either a bash shell script running on a UNIX or Linux host, an RMAN script or a SQL*Plus script. The context should make it clear which kind of script is being mentioned.
THE PROBLEM
Our goal is to find the simplest and fastest way to configure good backups for an Oracle database. When we say good backups we mean backups that we know can be used to restore the whole database when the time comes. Being fairly sure is not good enough; we need to know for sure that our database backups will work when we need them. It can be argued that backups are the most important task in a DBA's hands. No other task that a DBA performs is more critical to the survival of the applications and the business that they support. There is only one skill that is more critical for a DBA: the ability to actually restore the database from backups. But it all starts with good backups. When a new database is deployed, our first priority should be to set up properly configured backups. Oracle Recovery Manager (RMAN) provides a great set of features that covers most of the basics to ensure that a good backup is taken and can be restored when the time comes.
1 Session 475
However, RMAN's extensive set of options can be daunting and confusing for the inexperienced DBA, or even for the experienced DBA who is overworked and just does not have the time to fine-tune and explore all the capabilities RMAN has to offer while still deploying databases for his customers in a timely manner. The complexity involved in implementing good database backups can be addressed if we spend the time to automate the tasks involved and package all of that in the form of scripts. Not all scripts are created equal. The goal here is to provide a clean interface that hides the complexity but still gets the job done, without compromising on the requirement to produce good-quality backups.
On the other hand, there are a few issues that the DBA should keep in mind in order to avoid negating the benefits of automating tasks: scripts can become very long, hard to understand and difficult to support; poorly documented code will prevent other DBAs from helping support the scripts; and the DBA can end up spending more time troubleshooting scripts than doing any real DBA work.
Most DBAs nowadays have a pretty good knowledge of script writing and related tools. We believe, however, that automating tasks is more than writing a script. When looking at automating a task, it is important to keep a few concepts in mind in order to guide the effort and maximize the benefits. The remainder of this section explores some of these concepts.
I suspect the scenario above is all too familiar. We rush to write a quick script to do the job, and two years later the same script is still running, in all its simplicity and potential for problems. We tend to forget Murphy's Law: "If anything can go wrong, it will," and assume that there is no way our script will ever fail. We need to be able to trust our scripts. As we write the code, here are some questions that we should ask ourselves: How critical is this task? What happens if something goes wrong? How will the DBA know if the script fails?
In order to illustrate this concept, let us look at a shell script example, copied from linuxcommand.org and adapted for our use. The two lines of code below are pretty straightforward: change the current directory to a given location and delete all the files in it.
cd $some_directory
rm *
While we would question the use of rm * in any circumstance, it should be obvious that if the cd command fails the results could be catastrophic. In order to ensure that rm * will run only in the right directory, we need to add some code to check for errors returned by the cd command and take proper action:
cd $some_directory
if [ "$?" = "0" ]; then
    rm *
else
    echo "Cannot change directory to $some_directory! Aborting." 1>&2
    exit 1
fi
Now the rm * command is executed only if cd $some_directory succeeds. The error handling code is executed when the cd command fails and returns a non-zero status code. Just one test is enough to handle multiple failure situations, since it does not matter if the error is caused by insufficient file system privileges or because the directory does not exist. The error handling code above still leaves some room for problems. Suppose the variable some_directory is not set for whatever reason. The return status test will not catch the problem; instead the script will change the current directory to the user's home directory and delete all the files in there! In order to address that situation we need to include another test:
if [ "$some_directory" = "" ]; then
    echo "Variable some_directory is not set! Aborting." 1>&2
    exit 1
fi
cd $some_directory
if [ "$?" = "0" ]; then
    rm *
else
    echo "Cannot change directory to $some_directory! Aborting." 1>&2
    exit 1
fi
In order to write scripts that can be trusted, the DBA needs to think about the different ways things can go wrong and determine how to handle the errors. It may be necessary to abort the execution, or just display a message and proceed. It helps to keep in mind that in most cases there is only one way things can go right and multiple ways things can go wrong. Instead of spending time trying to test for all the ways a step can fail, we should test for success and take corrective action only when the step did not succeed. One of the challenges of error handling is that once we are done bulletproofing our script, it will be a lot longer. In our example, the initial script consisted of two lines of code, but once we were done adding all the error handling it was twelve lines long, six times its original size. We are not advocating that every script should provide extensive error handling that covers each and every possible failure scenario, but it is important that we ask the right questions beforehand and make sure we understand the consequences of undetected errors. Remember Murphy's Law.
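The test-for-success pattern above can also be condensed with bash's built-in error handling. The sketch below assumes bash; the die and cleanup_dir names are ours, made up for this illustration:

```shell
#!/bin/bash
# Sketch of defensive scripting, condensing the cd/rm example above.
# die and cleanup_dir are hypothetical helpers, not part of any
# published script.

set -euo pipefail   # abort on errors, unset variables, pipeline failures

# Print an error to stderr and abort with a non-zero exit status.
die() { echo "** $*" 1>&2; exit 1; }

# Delete every file in one directory, refusing to run anywhere else.
cleanup_dir() {
    local dir=${1:-}
    [ -n "$dir" ] || die "no directory given"
    cd "$dir"     || die "cannot change directory to $dir"
    rm -f ./*     # reached only if the cd succeeded
}

# Demonstration against a scratch directory.
scratch=$(mktemp -d)
touch "$scratch/a.log" "$scratch/b.log"
cleanup_dir "$scratch"
```

Note that every failure path prints a message starting with ** and returns a non-zero status, the two conventions that let a wrapper (or cron) detect the problem.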
In our opinion, a script is clean if another professional at the same skill level as the author can look at the script code and readily answer all of the following questions: What are the steps to complete the task? What are the required inputs? What is the expected output? Are there any warnings or problems to keep in mind? Who should I ask questions about the script?
One of the best ways to achieve cleanliness is to use comments in the body of the script. It does not take long to write a line explaining what the code does, and it will save a lot of time in the future when troubleshooting problems or making changes. That does not mean a README file cannot be added with more details and information, but most of the essential information should be in the body of the script itself.
DOCUMENTATION
Use comments. In six months you will not remember many details about the script, like the reason why you copied and then deleted a file instead of using a single mv command to move it. Hopefully one day you will be promoted and someone else will need to maintain the code you wrote; writing code that others can understand and maintain is the right thing to do. Make the script display a help message if the command line parameters are not trivial: all the information needed to run the script should be provided by the script itself, with more details included in a README file. Avoid hard-coding file names, directories, numbers, dates, etc., even when the values do not change. Assign the values at the beginning of the script and write a quick description for each one; it makes it easier to make changes and to understand what the script does.
USER INTERFACE
Keep it simple. The script should do useful work out of the box, with minimal installation effort. Installation should be extremely simple (less than 15 minutes). Provide defaults in order to minimize user input. Use getopts to implement UNIX-like command line options. Use a configuration file for server- and user-specific parameters, or for changing default values.
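The getopts recommendation can be sketched as follows. The tool name (mybackup) and its flags are made up for this illustration; they are not backora's real interface:

```shell
#!/bin/bash
# Sketch of a getopts-driven interface with a built-in help message.
# The mybackup name and its -n/-d flags are illustrative only.

usage() {
    cat <<'EOF'
USAGE: mybackup [ -n ] [ -d days ] db1 ... dbn
  -n        dry run: print actions without executing them
  -d days   retention period in days (default: 7)
  -h        display this help message
EOF
}

main() {
    local dryrun=false days=7 opt
    OPTIND=1                      # reset so main can be called repeatedly
    while getopts "nd:h" opt; do
        case $opt in
            n) dryrun=true ;;
            d) days=$OPTARG ;;
            h) usage; return 0 ;;
            *) usage 1>&2; return 1 ;;   # unknown option: show help, fail
        esac
    done
    shift $((OPTIND - 1))
    # Insist on at least one database name.
    [ $# -gt 0 ] || { usage 1>&2; return 1; }
    echo "dry-run=$dryrun retention=$days databases: $*"
}

main -n -d 14 PRD1 PRD2   # prints: dry-run=true retention=14 databases: PRD1 PRD2
```

Because invalid input produces a help message on stderr and a non-zero exit status, the script stays friendly at the command line and still fails loudly under cron.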
This is a pretty standard RMAN script and there are dozens of variations of the same general idea. It will do a fine job when the developer calls and asks the DBA to take a backup before upgrading the application. Some will say that it would be a lot easier to use flashback database and create a restore point. Well, I was actually in this exact situation just last week and, to be safe, did both. Why skimp? Be safe. Be sure. Should we use this script to back up our critical production database every night? It would be a good starting point, but in our opinion it falls short of the standard of reliability that we should expect for a production database. The sample above leaves out many options that come free with RMAN and would vastly improve the reliability of the backups taken with it. Regardless of what it does or how often it runs, a complete RMAN backup script should fulfill the following requirements:
Implement backup best practices.
Test every backup.
Test for logical corruption with the check logical parameter.
Have each file in a single backup piece.
Backup control files and spfile.
Print a list of backup sets and file copies for reference during restore.
Print a list of archived logs required for recovery.
Print a list of all RMAN parameters for reference during recovery.
Print a list of all the instance parameters for reference during recovery.
CHECK LOGICAL
This parameter tells RMAN to perform a logical integrity check of the database blocks. It has been reported that it may cause backups to slow down, but DBAs should give it a try, since it can detect logical block corruptions that cannot be detected by other means. It can always be turned off if the performance impact becomes significant. We use it for all the Oracle backups in our data center and have not observed any significant performance impact.
FILESPERSET 1
With this parameter, every backup set holds blocks from only one data file. This setup improves performance for partial restore and recovery tasks.
CONFIGURE CONTROLFILE AUTOBACKUP ON
The control file and parameter file should be backed up every time the database is backed up. This should be the default setting for RMAN, but unfortunately it is not. It needs to be set only once after the database is created.
SET ECHO ON
Have you ever looked at the output of an RMAN script, seen a bunch of RMAN errors and had no idea which command caused them? It happened to me many times, and set echo on; made my life a lot easier. It forces RMAN to print each command before executing it, the same behavior as set echo on in SQL*Plus.
RESTORE VALIDATE
It performs a restore of the database from backups without actually overwriting the database. This is as close as it gets to testing our backups short of performing a real restore. It should be done right after the backups complete and will certainly extend the overall run time of the script. It uses system resources to simulate the restore operation, but it does not touch any of the database files. It is well worth it, since it provides the peace of mind of knowing that we have good backups.
RESTORE PREVIEW
This command displays a list of all files needed to restore and recover the database. It does not simulate a restore like restore validate does; it goes to the catalog or control file and obtains a list of the files that would be needed for a restore. If we run it after the backups, we have a list of the needed files right there along with the backup logs.
LIST
The list command is used to print reports of the backup files known to RMAN. If a report is taken and saved right after the backups are taken, the DBA will have a list of all the backup files currently in the repository or control file. It is especially helpful when the tape backups are managed by a different team and the needed files are not available on disk anymore.
SHOW ALL
It is a good idea to record the RMAN configuration parameter values every time backups are taken. If the parameters are changed by accident, backups may fail and having a record will help the troubleshooting process.
SHOW PARAMETERS
This is a SQL*Plus command that should be used more often during backups. There are multiple database parameters that affect backups, and having a record of them can be very helpful for troubleshooting and recovery tasks. It is true that the parameter file is saved when controlfile autobackup is on, but it does not hurt to have another copy somewhere else, just in case. It can even be used to build a new spfile from scratch. Another advantage is that show parameters lists all the parameters, including default values, unlike create pfile from spfile.
NLS_DATE_FORMAT='DD-MON-YY HH24:MI:SS'
Before running an RMAN script, configure the date and time format to display the time, including seconds. Otherwise the default output shows only the date, which is not very useful when you are looking at the backup output trying to figure out how long it took to back up a given file.
$ export NLS_DATE_FORMAT='DD-MON-YY HH24:MI:SS'
$ rman target /

Recovery Manager: Release 10.2.0.4.0 - Production on Fri Feb 25 17:16:08 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.
connected to target database: TEST (DBID=2010915295)

RMAN> backup database;

Starting backup at 25-FEB-11 17:16:14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=158 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u02/oracle/oradata/TEST/system01.dbf
input datafile fno=00003 name=/u02/oracle/oradata/TEST/sysaux01.dbf
input datafile fno=00005 name=/u02/oracle/oradata/TEST/example01.dbf
input datafile fno=00002 name=/u02/oracle/oradata/TEST/undotbs01.dbf
input datafile fno=00004 name=/u02/oracle/oradata/TEST/users01.dbf
channel ORA_DISK_1: starting piece 1 at 25-FEB-11 17:16:15
channel ORA_DISK_1: finished piece 1 at 25-FEB-11 17:16:22
piece handle=/u03/oracle/recovery/TEST/backupset/2011_02_25/o1_mf_nnndf_TAG20110225T171615_6pjbkhcf_.bkp tag=TAG20110225T171615 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
Finished backup at 25-FEB-11 17:16:22

Starting Control File and SPFILE Autobackup at 25-FEB-11 17:16:22
piece handle=/u03/oracle/recovery/TEST/autobackup/2011_02_25/o1_mf_s_744052582_6pjbkphc_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 25-FEB-11 17:16:24

RMAN>
FULL BACKUP
set echo on;
delete noprompt backup of database tag FULL_BACKUP;
run {
    allocate channel dev1 device type disk format '/backups/%U';
    backup check logical filesperset 1 tag FULL_BACKUP archivelog all not backed up 2 times;
    backup check logical filesperset 1 tag FULL_BACKUP keep until time 'sysdate + 8' logs database;
    backup check logical filesperset 1 tag FULL_BACKUP archivelog all not backed up 2 times;
    backup as copy current controlfile tag FULL_BACKUP;
    backup as copy spfile tag FULL_BACKUP;
}
restore preview from tag FULL_BACKUP database;
restore validate from tag FULL_BACKUP database;
list backup of database tag FULL_BACKUP;
list backup of archivelog all tag FULL_BACKUP;
show all;
ARCHIVELOG BACKUP
set echo on;
delete noprompt backup of archivelog until time 'sysdate - 8' tag ARCH_BACKUP;
run {
    allocate channel dev1 device type disk format '/backups/%U';
    backup check logical filesperset 1 tag ARCH_BACKUP archivelog all not backed up 2 times;
    backup as copy current controlfile tag ARCH_BACKUP;
    backup as copy spfile tag ARCH_BACKUP;
}
restore preview from tag ARCH_BACKUP database;
list backup of database tag ARCH_BACKUP;
list backup of archivelog all tag ARCH_BACKUP;
show all;
INCREMENTAL BACKUP
LEVEL 0
set echo on;
delete noprompt backup of database tag INCR_BACKUP;
run {
    allocate channel dev1 device type disk format '/backups/%U';
    backup check logical filesperset 1 tag INCR_BACKUP archivelog all not backed up 2 times;
    backup check logical filesperset 1 tag INCR_BACKUP keep until time 'sysdate + 8' logs incremental level 0 database;
    backup check logical filesperset 1 tag INCR_BACKUP archivelog all not backed up 2 times;
    backup as copy current controlfile tag INCR_BACKUP;
    backup as copy spfile tag INCR_BACKUP;
}
restore preview from tag INCR_BACKUP database;
restore validate from tag INCR_BACKUP database;
list backup of database tag INCR_BACKUP;
show all;
LEVEL 1
set echo on;
run {
    allocate channel dev1 device type disk format '/backups/%U';
    backup check logical filesperset 1 tag INCR_BACKUP archivelog all not backed up 2 times;
    backup check logical filesperset 1 tag INCR_BACKUP incremental level 1 for recover of tag INCR_BACKUP database;
    backup check logical filesperset 1 tag INCR_BACKUP archivelog all not backed up 2 times;
    backup as copy current controlfile tag INCR_BACKUP;
    backup as copy spfile tag INCR_BACKUP;
}
restore preview from tag INCR_BACKUP database;
restore validate from tag INCR_BACKUP database;
list backup of database tag INCR_BACKUP;
show all;
prompt -- look for corrupted blocks:
select 'ORA-99999: CORRUPTED BLOCKS FOUND' corrupted_block_status
  from V\$DATABASE_BLOCK_CORRUPTION where rownum = 1;
prompt -- list of corrupted blocks:
select * from V\$DATABASE_BLOCK_CORRUPTION;
BACKORA
Once we pulled together the previously presented concepts about script writing and database backups, the backora script was born. This script is not meant to address each and every Oracle backup situation; it focuses on the set of requirements that cover the needs of our data center. Those needs are not unlike what is found in other places, and other DBAs may find it useful and take advantage of it. Feel free to download it from the web site provided at the end of this article. In summary, these are the design ideas behind backora:
Executed with cron.
Performs RMAN disk-to-disk backups.
Can be installed and configured in under 15 minutes.
Supported and tested on Red Hat Linux, Oracle Linux and Solaris 9/10.
Supported and tested with Oracle Enterprise Edition 10.2 and 11.2.
Assumes defaults for every required parameter, as far as possible.
Instance-specific customizations are kept in a separate configuration file.
Follows Oracle backup best practices.
Follows script writing best practices.
PRE-INSTALLATION
Before installation, the database and operating system must be prepared:
Install the Oracle software.
Create one or more databases.
Create a directory with sufficient disk space to store the backups. A separate file system is highly recommended.
Configure the UNIX/Linux account used to run backora as follows:
  Able to run 'sqlplus / as sysdba' from the shell prompt.
  Able to run 'sqlplus /' from the shell prompt and log in to a database account with DBA privileges. Only needed in order to perform Data Pump exports.
  Able to set database environment variables with the oraenv script.
These sample steps prepare a host to run backups and place them on the /backups directory.
$ login
login: root
Password:
root> mkdir /backups
root> chown oracle:dba /backups
root> su oracle
oracle> export ORACLE_SID=ORCL
oracle> export ORAENV_ASK=NO
oracle> . oraenv
oracle> sqlplus / as sysdba
SQL> create user ops$oracle identified externally;
SQL> grant dba to ops$oracle;
SQL> conn /
Connected.
SQL> alter system set db_recovery_file_dest_size = 100g;
SQL> alter system set db_recovery_file_dest = '/backups';
SQL> exit
oracle> exit
root> exit
BACKORA INSTALLATION
The script must be installed on the server that hosts the databases that need backup:
Verify that /bin/bash is available.
Create a directory dedicated to backora to hold the script files.
Extract all the contents of the backora.zip file to the new directory ($BINDIR).
Copy the file backora.conf.sample to backora.conf. The config file must be in the same directory as the backora script.
Edit backora.conf and change it to reflect your environment.
Read the README to learn how to use it.
USAGE
backora [ options ] db1 ... dbn   # backup the listed dbs
backora [ options ] all           # backup all the dbs in /etc/oratab with last field = 'Y'
backora                           # displays this help message
OPTIONS
  -e   export database with Datapump (default: true)
  -f   weekly full backup and daily archivelog backup to offsite dir (default: false)
  -i   weekly incremental lvl 0 and daily lvl 1 to offsite dir (default: false)
-s
If no options are given on the command line, backora will perform the Oracle Suggested Strategy backup AND export the database.
CONFIGURATION FILE
The configuration file must be named backora.conf and be placed in the same directory as the backora script ($BINDIR). The parameters listed below can be overridden in the configuration file:
RECOVERY_WINDOW=7            # how many days rman keeps backups
EXPORT_RETENTION=7           # how many days dump files are kept
EXPDIR=/u03/oracle/exports   # directory for dump files, backora creates a subdir for each db
OFFDIR=/u03/oracle/offsite   # directory for backup files, for offsite storage
DPDIR=BACKORA                # directory name for oracle "create directory" cmd
DOEXP=true                   # true = dump database by default, false = just when set in cmd line
DOOSS=true                   # true = do sugg. strat. by default, false = just when set in cmd line
DOINCR=false                 # true = do incr backup by default, false = just when set in cmd line
DOFULL=false                 # true = do full backup by default, false = just when set in cmd line
RMAN_TAG=BACKORA             # string used to tag rman backups
DAYLVL0=7                    # week day to do lvl 0 incr. Mon=1,...,Sun=7
DAYFULL=7                    # week day to do full backup. Mon=1,...,Sun=7
ORATAB=/etc/oratab           # location of oratab file
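The defaults-plus-overrides design above can be sketched in a few lines of bash. The load_config name is ours, not backora's; the variable names follow the table above:

```shell
#!/bin/bash
# Sketch of the defaults-plus-overrides pattern: hard-code safe defaults,
# then let backora.conf (if present in $BINDIR) reassign any of them.
# load_config is our name for the helper, not backora's.

load_config() {
    # Defaults, matching the table above.
    RECOVERY_WINDOW=7          # how many days rman keeps backups
    EXPORT_RETENTION=7         # how many days dump files are kept
    DOEXP=true                 # dump database by default
    RMAN_TAG=BACKORA           # string used to tag rman backups
    ORATAB=/etc/oratab         # location of oratab file

    # Because the config file is plain variable assignments, sourcing it
    # after the defaults overrides exactly what the user chose to set.
    if [ -f "$BINDIR/backora.conf" ]; then
        . "$BINDIR/backora.conf"
    fi
}

BINDIR=${BINDIR:-$(cd "$(dirname "$0")" && pwd)}
load_config
echo "recovery window: $RECOVERY_WINDOW days, rman tag: $RMAN_TAG"
```

This keeps the script itself free of hard-coded, site-specific values while still running usefully with no configuration file at all.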
Once the script is installed, complete and reliable backups can be taken from the command line with little effort. Let us assume that /etc/oratab looks like this:
*:/u01/app/oracle/product/11.2.0/dbhome_1:N PRD1:/u01/app/oracle/product/10.2.0/db_1:Y PRD2:/u01/app/oracle/product/11.2.0/dbhome_1:Y PRD3:/u01/app/oracle/product/11.2.0/dbhome_1:Y
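The 'all' keyword presumably expands to the databases in /etc/oratab whose last field is 'Y'; that loop can be sketched as follows (list_auto_sids is our name for the helper, not backora's):

```shell
#!/bin/bash
# Sketch: list the SIDs in an oratab file whose last field is 'Y'.
# list_auto_sids is a hypothetical helper, not part of backora.

list_auto_sids() {
    local oratab=$1 sid home flag
    # oratab lines look like SID:ORACLE_HOME:Y|N; '#' starts a comment.
    while IFS=: read -r sid home flag; do
        case $sid in ''|\#*|\*) continue ;; esac   # skip blanks, comments, '*'
        [ "$flag" = "Y" ] && echo "$sid"
    done < "$oratab"
}

# Demonstration against a scratch copy of the oratab shown above.
tab=$(mktemp)
cat > "$tab" <<'EOF'
*:/u01/app/oracle/product/11.2.0/dbhome_1:N
PRD1:/u01/app/oracle/product/10.2.0/db_1:Y
PRD2:/u01/app/oracle/product/11.2.0/dbhome_1:Y
PRD3:/u01/app/oracle/product/11.2.0/dbhome_1:Y
EOF
list_auto_sids "$tab"   # prints PRD1, PRD2 and PRD3, one per line
```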
Backup all databases using Oracle Suggested Strategy and perform a full export:
bash> backora all
Take level 0 full backups of all databases every Sunday and incremental level 1 during the week:
bash> backora -i all
Take full backups of PRD1 on Sunday and archive log backups during the week:
bash> backora -f PRD1
Take full backups of PRD2 and PRD3 on Saturday and incrementals during the week. Do these steps only once, assuming that there are no configuration files for PRD2 and PRD3 yet; otherwise edit the files and add this string on a new line: DAYLVL0=6
bash> cd $BINDIR
bash> mkdir conf
bash> cd conf
bash> cat <<CONF > backora.PRD2.conf
DAYLVL0=6
CONF
bash> cp backora.PRD2.conf backora.PRD3.conf
XLE
xle is a companion tool that works with backora to handle output logging and DBA notification. xle stands for eXecute, Log and Email. Every time we write a script and run it using cron, we invariably need to save the output, check for errors and email someone if an error occurs. Instead of reinventing the wheel and rewriting the same code over and over, we created a tool that does it all automatically. Any shell command that returns a non-zero exit status in case of errors will work with xle. That includes all UNIX/Linux commands (ls, cp, rm, etc), and any script that we write following the script writing best practices, like backora. The main features of xle:
Executes any <command> provided in the command line: prompt$ xle <command>
Redirects all the output of <command> to a file tagged with date and time.
If <command> exits with an error, it sends an email to the system administrator.
If <command> output contains any lines starting with **, ORA- or RMAN-, it sends an email to the system administrator.
The system administrator email is assigned in a configuration file.
All output files are stored in one central location that can be changed in the configuration file.
Deletes output files older than one month. Retention can be changed in the configuration file.
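The core of the execute-and-log idea can be sketched in a short bash function. The runlog name is ours, and the real xle adds the email delivery and configuration handling that the sketch only hints at:

```shell
#!/bin/bash
# Stripped-down sketch of the execute-and-log pattern behind xle: run a
# command, keep its output in a timestamped log file, and flag runs that
# failed or printed error markers. The real xle emails at the ALERT
# point; runlog is our name for this sketch, not xle's implementation.

LOGDIR=${LOGDIR:-./logs}

runlog() {
    mkdir -p "$LOGDIR"
    local log="$LOGDIR/$(basename "$1").$(date +%Y%m%d_%H%M%S).log"
    "$@" > "$log" 2>&1
    local status=$?
    # A run needs attention if the command returned non-zero or if its
    # output contains lines starting with **, ORA- or RMAN-.
    if [ "$status" -ne 0 ] || grep -Eq '^(\*\*|ORA-|RMAN-)' "$log"; then
        echo "ALERT: $1 failed (status $status), output in $log"
        return 1
    fi
    echo "OK: $1, output in $log"
}
```

Any command that follows the conventions discussed earlier (non-zero exit status, error lines prefixed with **) plugs into this wrapper with no changes.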
All the output of backora is displayed on the screen, or wherever the standard output is directed. backora is coded so that every error message it generates starts with the string ** and can be caught by xle. When an error is detected and backora aborts, it returns a non-zero exit status that can be caught by xle as well. xle can be downloaded from the web site provided at the end of this document.
XLE QUICK REFERENCE
$ ./xle
USAGE: xle [ options ] script arg1 ... argn
Options:
  -e a   (a)lways sends email (DEFAULT)
  -e n   (n)ever sends email
  -e e   sends email only when there was an (e)rror
  -e s   sends email only when it finishes (s)uccessfully
         script alias to be used when the script name is not descriptive
         execute <script> file at ./scripts
         include log file contents in the report sent by email
         mail to address
         display help message
All output files are placed in the $BINDIR/logs directory, where $BINDIR is the directory where xle is installed. The output files are always created and kept. Notification emails are optional and can be configured on the command line. xle can also email the full output log (-l option) or send a summarized report, which is the default behavior. Edit $BINDIR/xle.conf to change the log directory location and the system administrator emails.
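The one-month retention described above comes down to a single find invocation; a sketch (purge_logs is our name, and the real xle reads its retention and log directory from xle.conf):

```shell
#!/bin/bash
# Sketch: remove log files older than a retention period from $LOGDIR.
# purge_logs is a hypothetical helper illustrating xle's cleanup step.

LOGDIR=${LOGDIR:-./logs}
RETENTION_DAYS=${RETENTION_DAYS:-31}

purge_logs() {
    # -mtime +N matches files last modified more than N days ago;
    # -exec rm {} \; keeps the command portable to older Solaris find.
    find "$LOGDIR" -type f -name '*.log' -mtime +"$RETENTION_DAYS" -exec rm -f {} \;
}
```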
BACKORA CRON EXAMPLE
See below what the crontab of the oracle user account would look like on a Red Hat system. The Solaris crontab syntax is not compatible; do not copy and paste this example on a Solaris system.
Notice that the directory where xle is installed (/home/oracle/scripts/xle) was added to the PATH for cron. backora is installed in /home/oracle/scripts/backups.
### oracle user crontab
#
# 2/24/11 fdesouza - created
#
PATH=/usr/local/bin:/home/oracle/bin:/bin:/usr/bin:/home/oracle/scripts/xle
#
# every night at 7:30 pm
#   take full datapump export of PRD1
#   take incremental backup of PRD1
#   take Oracle suggested strategy backup of PRD1
#   send email to DBA with a status report
#
30 19 * * * xle /home/oracle/scripts/backups/backora -eis PRD1
#
# every night at 8:00 pm
#   export PRD2 database with data pump
#   backup PRD2 using Oracle Suggested strategy
#   send email to DBA only if there are errors
#
DOWNLOADS
This document and all related files can be downloaded at http://backora.posterous.com, using the password collaborate11.
REFERENCES
Oracle Database Backup and Recovery Advanced User's Guide, 10g Release 2 (10.2)
Wikipedia, "Tool" - http://en.wikipedia.org/wiki/Tool
Writing Shell Scripts: Errors and Signals and Traps (Oh My!) - Part 1 - http://linuxcommand.org/wss0150.php
DBAsupport.com Oracle Scripts - http://www.dbasupport.com/oracle/scripts/Detailed/46.shtml
Top 10 Backup and Recovery Best Practices - Metalink Doc ID 388422.1
Oracle "Check Logical" parameter - 4X times faster if not active! - http://www.symantec.com/connect/forums/oracle-check-logicalparameter-4x-times-faster-if-not-active
Incrementally Updated Backup in 10G - Metalink Doc ID 303861.1
Oracle Database Reference, 10g Release 2 (10.2) - http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/dynviews_1074.htm