
Another verification method could be to fail over to the standby database and then test whether it is possible to fail back to the primary database. To activate the standby database as a primary database, use the following statements:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE;

If you have not enabled flashback on the original primary database, it is recommended that you drop the original primary database and recreate it as a standby database.
We recommend that you enable Flashback Database on both the primary and the standby databases. When a failover happens, the original primary database can be flashed back to the time before the failover and quickly converted into a standby database.
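As a minimal sketch of that flashback-and-convert sequence (assuming flashback logging was enabled before the failover; the placeholder SCN must be taken from the new primary):

-- On the new primary (the former standby): find the SCN at which it became primary
SELECT TO_CHAR(STANDBY_BECAME_PRIMARY_SCN) FROM V$DATABASE;
-- On the old primary, mounted: flash back to that SCN and convert it to a standby
FLASHBACK DATABASE TO SCN <standby_became_primary_scn>;
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;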

================================================================================

Arup Nanda ::
Resolving Gaps in Data Guard Apply Using Incremental RMAN Backup
Recently, we had a glitch on a Data Guard (physical standby) database in our infrastructure. This is not a critical database, so the monitoring was relatively lax; the fact that it is handled by an outsourcer does not help either. In any case, the laxness resulted in a failure remaining undetected for quite some time, and it was eventually discovered only when the customer complained. This standby database is usually opened for read-only access from time to time. This time, however, the customer saw that the data was significantly out of sync with the primary and raised a red flag. Unfortunately, by then it had become a rather political issue.

Since the DBA in charge couldn't resolve the problem, I was called in. In this post, I will describe the issue and how it was resolved. In summary, there are two parts to the problem:

(1) What happened


(2) How to fix it

What Happened

Let's look at the first question: what caused the standby to lag behind? First, I looked at the current SCNs of the primary and standby databases. On the primary:

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1447102

On the standby:

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1301571

Clearly there is a difference. But this by itself does not indicate a problem, since the standby is expected to lag behind the primary (this is an asynchronous, non-real-time apply setup). The real question is how far it is lagging in terms of wall-clock time. To find out, I used the scn_to_timestamp function to translate the SCN into a timestamp:

SQL> select scn_to_timestamp(1447102) from dual;

SCN_TO_TIMESTAMP(1447102)
-------------------------------
18-DEC-09 08.54.28.000000000 AM

I ran the same query to find the timestamp associated with the SCN of the standby database as well (note that I ran it on the primary database, since it would fail on the standby, which is only mounted):

SQL> select scn_to_timestamp(1301571) from dual;

SCN_TO_TIMESTAMP(1301571)
-------------------------------
15-DEC-09 07.19.27.000000000 PM

This shows that the standby is lagging by two and a half days! The data at this point is not just stale; it must be rotten.

The next question was why it was lagging so far in the past. This is a 10.2 database, where the FAL server should automatically resolve any gaps in archived logs. Something must have happened that caused the FAL (fetch archived log) process to fail. To get that answer, I first checked the alert log of the standby instance. I found these lines, which showed the issue clearly:


Fri Dec 18 06:12:26 2009
Waiting for all non-current ORLs to be archived...
Media Recovery Waiting for thread 1 sequence 700
Fetching gap sequence in thread 1, gap sequence 700-700
...
Fri Dec 18 06:13:27 2009
FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 700-700
DBID 846390698 branch 697108460
FAL[client]: All defined FAL servers have been attempted.

Going back in the alert log, I found these lines:

Tue Dec 15 17:16:15 2009


Fetching gap sequence in thread 1, gap sequence 700-700
Error 12514 received logging on to the standby
FAL[client, MRP0]: Error 12514 connecting to DEL1 for fetching gap sequence
Tue Dec 15 17:16:15 2009
Errors in file /opt/oracle/admin/DEL2/bdump/del2_mrp0_18308.trc:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor
Tue Dec 15 17:16:45 2009
Error 12514 received logging on to the standby
FAL[client, MRP0]: Error 12514 connecting to DEL1 for fetching gap sequence
Tue Dec 15 17:16:45 2009
Errors in file /opt/oracle/admin/DEL2/bdump/del2_mrp0_18308.trc:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor

This clearly showed the issue. On December 15th at 17:16:15, the Managed Recovery Process encountered an error while receiving the log information from the primary. The error was ORA-12514 "TNS:listener does not currently know of service requested in connect descriptor". This is usually the case when the TNS connect string is incorrectly specified. The primary is called DEL1, and there is a connect string called DEL1 on the standby server.

The connect string works well. In fact, right now there is no issue with the standby getting the archived logs, so the connect string is fine - now. The standby is receiving log information from the primary. There must have been some temporary hiccup that caused that specific archived log not to travel to the standby. If that log was somehow skipped (it could have been an intermittent problem), it should have been picked up by the FAL process later on; but that never happened. Since sequence# 700 was not applied, none of the logs received later - 701, 702 and so on - were applied either. This caused the standby to lag behind ever since.
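Incidentally, such a gap can also be seen from the standby side; a quick check (a hedged aside using the standard v$archive_gap view on the mounted standby) would be:

SQL> SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;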

So, the fundamental question was why FAL did not fetch archived log sequence# 700 from the primary. To get to that, I looked into the alert log of the primary instance. The following lines were of interest:


Tue Dec 15 19:19:58 2009
Thread 1 advanced to log sequence 701 (LGWR switch)
Current log# 2 seq# 701 mem# 0: /u01/oradata/DEL1/onlinelog/o1_mf_2_5bhbkg92_.log
Tue Dec 15 19:20:29 2009
Errors in file /opt/oracle/product/10gR2/db1/admin/DEL1/bdump/del1_arc1_14469.trc:
ORA-00308: cannot open archived log '/u01/oraback/1_700_697108460.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
Tue Dec 15 19:20:29 2009
FAL[server, ARC1]: FAL archive failed, see trace file.
Tue Dec 15 19:20:29 2009
Errors in file /opt/oracle/product/10gR2/db1/admin/DEL1/bdump/del1_arc1_14469.trc:
ORA-16055: FAL request rejected
ARCH: FAL archive failed.
Archiver continuing
Tue Dec 15 19:20:29 2009
ORACLE Instance DEL1 - Archival Error. Archiver continuing.

These lines showed everything clearly. The issue was:

ORA-00308: cannot open archived log '/u01/oraback/1_700_697108460.dbf'


ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory

The archived log simply was not available. The process could not see the file and couldn't get it across to the standby site.

Upon further investigation, I found that the DBA had removed archived logs to make some room in the filesystem, without realizing that his action had removed the most current one, which was yet to be transmitted to the remote site. The mystery of why FAL did not fetch that log was finally solved.

Solution
Now that I knew the cause, the focus shifted to the resolution. If archived log sequence# 700 had still been available on the primary, I could simply have copied it over to the standby, registered the log file and let the managed recovery process pick it up. But unfortunately, the file was gone, and I couldn't just recreate it. Until that log was applied, recovery would not move forward. So, what were my options?

One option is, of course, to recreate the standby - possible, but not really feasible considering the time required. The other option is to apply an incremental backup of the primary taken from that SCN. That's the key: the backup must be from a specific SCN. I have described the process here since it is not very obvious. The following shows the step-by-step approach for resolving this problem. I have indicated where each action must be performed - [Standby] or [Primary].

1. [Standby] Stop the managed standby apply process:

SQL> alter database recover managed standby database cancel;

Database altered.

2. [Standby] Shutdown the standby database
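For example (assuming a clean shutdown is acceptable here):

SQL> shutdown immediate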

3. [Primary] On the primary, take an incremental backup from the SCN number where
the standby has been stuck:

RMAN> run {
2> allocate channel c1 type disk format '/u01/oraback/%U.rmb';
3> backup incremental from scn 1301571 database;
4> }

using target database control file instead of recovery catalog


allocated channel: c1
channel c1: sid=139 devtype=DISK

Starting backup at 18-DEC-09


channel c1: starting full datafile backupset
channel c1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u01/oradata/DEL1/datafile/o1_mf_system_5bhbh59c_.dbf
...
piece handle=/u01/oraback/06l16u1q_1_1.rmb tag=TAG20091218T083619 comment=NONE
channel c1: backup set complete, elapsed time: 00:00:06
Finished backup at 18-DEC-09
released channel: c1

4. [Primary] On the primary, create a new standby controlfile:

SQL> alter database create standby controlfile as '/u01/oraback/DEL1_standby.ctl';

Database altered.

5. [Primary] Copy these files to standby host:

oracle@oradba1 /u01/oraback# scp *.rmb *.ctl oracle@oradba2:/u01/oraback


oracle@oradba2's password:
06l16u1q_1_1.rmb 100% 43MB 10.7MB/s 00:04
DEL1_standby.ctl 100% 43MB 10.7MB/s 00:04

6. [Standby] Bring up the instance in nomount mode:


SQL> startup nomount

7. [Standby] Check the location of the controlfile:

SQL> show parameter control_files

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_files                        string      /u01/oradata/standby_cntfile.ctl

8. [Standby] Replace the controlfile with the one you just created on the primary:

9. $ cp /u01/oraback/DEL1_standby.ctl /u01/oradata/standby_cntfile.ctl

10.[Standby] Mount the standby database:

SQL> alter database mount standby database;

11.[Standby] RMAN does not know about these files yet, so you must let it know through a process called cataloging. Catalog these files:

$ rman target=/

Recovery Manager: Release 10.2.0.4.0 - Production on Fri Dec 18 06:44:25 2009

Copyright (c) 1982, 2007, Oracle. All rights reserved.

connected to target database: DEL1 (DBID=846390698, not open)


RMAN> catalog start with '/u01/oraback';

using target database control file instead of recovery catalog


searching for all files that match the pattern /u01/oraback

List of Files Unknown to the Database


=====================================
File Name: /u01/oraback/DEL1_standby.ctl
File Name: /u01/oraback/06l16u1q_1_1.rmb

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files


=======================
File Name: /u01/oraback/DEL1_standby.ctl
File Name: /u01/oraback/06l16u1q_1_1.rmb

12.Recover these files:

RMAN> recover database;

Starting recover at 18-DEC-09


using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001:
/u01/oradata/DEL2/datafile/o1_mf_system_5lptww3f_.dbf
...
channel ORA_DISK_1: reading from backup piece /u01/oraback/05l16u03_1_1.rmb
channel ORA_DISK_1: restored backup piece 1
piece handle=/u01/oraback/05l16u03_1_1.rmb tag=TAG20091218T083619
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07

starting media recovery

archive log thread 1 sequence 8012 is already on disk as file


/u01/oradata/1_8012_697108460.dbf
archive log thread 1 sequence 8013 is already on disk as file
/u01/oradata/1_8013_697108460.dbf
...

13. After some time, the recovery fails with the message:

archive log filename=/u01/oradata/1_8008_697108460.dbf thread=1 sequence=8009


RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 12/18/2009 06:53:02
RMAN-11003: failure during parse/execution of SQL statement: alter database recover
logfile '/u01/oradata/1_8008_697108460.dbf'
ORA-00310: archived log contains sequence 8008; sequence 8009 required
ORA-00334: archived log: '/u01/oradata/1_8008_697108460.dbf'

This happens because we have come to the end of the available archived logs. The expected archived log with sequence# 8009 has not been generated yet.
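If in doubt, a quick hedged check of how far the logs have actually arrived on the standby is to look at the highest sequence# registered there:

SQL> select max(sequence#) from v$archived_log where thread# = 1;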

14.At this point, exit RMAN and start the managed recovery process:

SQL> alter database recover managed standby database disconnect from session;

Database altered.

15.Check the SCNs on the primary and the standby:

[Standby] SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1447474
[Primary] SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1447478
Now they are very close to each other. The standby has now caught up.
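As an additional check, translating the standby SCN back to a timestamp on the primary (as we did earlier) should now show a lag of only seconds or minutes rather than days:

SQL> select scn_to_timestamp(1447474) from dual;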

================================================================================

Don Burleson ::
Rohit Gupta ::

Tips from the trenches by Rohit Gupta


When you are using Data Guard, there are several scenarios in which the physical standby can go out of sync with the primary database.
Before doing anything to correct the problem, we need to verify why the standby is not in sync with the primary. In this particular article, we are covering the scenario where a log is missing from the standby but, apart from the missing log, all other logs are available.
Verify from v$archived_log that there is a gap in the sequence numbers. All the logs up to that gap should have APPLIED=YES, and all the sequence#s after the missing log sequence# should have APPLIED=NO. This means that, due to the missing log, MRP is not applying the logs on the standby, but the logs are still being transmitted to the standby and are available.
SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;
So, for example, if the missing log sequence# is 400, then the above query should show that up to sequence# 399 all have APPLIED=YES, and starting from 401 all are APPLIED=NO.
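A convenient variation of the same query (a sketch; it simply aggregates v$archived_log) is to ask directly where the apply stopped; in the example above it would return 401, pointing at the gap just below it:

SQL> SELECT MIN(SEQUENCE#) FROM V$ARCHIVED_LOG WHERE APPLIED = 'NO';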
There are a few steps to be performed when the standby is not in sync with the primary because there is a gap of logs on the standby.
These steps are:
STEP #1: Take an incremental backup of the primary from the SCN where the standby is lagging behind and apply it on the standby server.
STEP #2: If step #1 is not able to sync up, re-create the controlfile of the standby database from the primary.
STEP #3: If, after step #2, you still find that logs are not being applied on the standby, check the alert log; you may need to re-register the logs with the standby database.
***********************************************************************************
STEP #1
1. On the STANDBY database, query the v$database view and record the current SCN of the standby database:
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
1.3945E+10
SQL> SELECT to_char(CURRENT_SCN) FROM V$DATABASE;
TO_CHAR(CURRENT_SCN)
----------------------------------------
13945141914
2. Stop Redo Apply on the standby database:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
*
ERROR at line 1:
ORA-16136: Managed Standby Recovery not active
If you see the above error, it means Managed Recovery is already off.
You can also confirm from the view v$managed_standby whether the MRP is running or not:
SQL> SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;
3. Connect to the primary database as the RMAN target and create an incremental backup from the current SCN of the standby database that was recorded in step 1. For example:
BACKUP INCREMENTAL FROM SCN 13945141914 DATABASE FORMAT '/tmp/ForStandby_%U' TAG 'FOR STANDBY';
You can choose a location other than /tmp as well.
4. Do a recovery of the standby database using the incremental backup of the primary taken above.
On the standby server, without connecting to a recovery catalog, catalog the backup piece of the incremental backup taken above. Before this, of course, you need to copy the backup piece to a location accessible to the standby server.
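For example (the hostname and paths below are purely illustrative):

$ scp /tmp/ForStandby_1qjm8jn2_1_1 oracle@standbyhost:/dump/proddb/inc_bkup/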
$ rman nocatalog target /
RMAN> CATALOG BACKUPPIECE '/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1';
Now, in the same session, start the recovery:
RMAN> RECOVER DATABASE NOREDO;
You should see something like:
Starting recover at 2015-09-17 04:59:57
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=309 devtype=DISK
channel ORA_DISK_1: starting incremental datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
....
..
..
.
channel ORA_DISK_1: reading from backup piece
/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1
channel ORA_DISK_1: restored backup piece 1
piece handle=/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1 tag=FOR STANDBY
channel ORA_DISK_1: restore complete, elapsed time: 01:53:08
Finished recover at 2015-07-25 05:20:3
Delete the backup set from standby:
RMAN> DELETE BACKUP TAG 'FOR STANDBY';
using channel ORA_DISK_1
List of Backup Pieces

BP Key  BS Key  Pc# Cp# Status      Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
17713   17713   1   1   AVAILABLE   DISK        /dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1
Do you really want to delete the above objects (enter YES or NO)? YES
deleted backup piece
backup piece handle=/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1 recid=17713
stamp=660972421
Deleted 1 objects
5. Try to start the managed recovery:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
If you get an error here, you need to go to STEP #2 to bring the standby in sync.
If there is no error, then using the view v$managed_standby, verify that the MRP process has started and has the status APPLYING_LOGS.
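A sketch of such a check, using the standard columns of v$managed_standby:

SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE# FROM V$MANAGED_STANDBY WHERE PROCESS LIKE 'MRP%';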
6. After this, check whether the logs are being applied on the standby or not:
SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;
After doing a recovery using the incremental backup, you will no longer see the sequence#s which were visible earlier with APPLIED=NO, because they have been absorbed as part of the incremental backup and applied on the standby during recovery. The APPLIED column starts showing YES for the logs which are being transmitted now; this means the logs are being applied.
Check the status of the MRP process in the view v$managed_standby. The status should be APPLYING_LOGS for the duration that available logs are being applied, and once all available logs have been applied, the status should be WAITING_FOR_LOGS.
7. Another check to verify that the primary and standby are in sync: run the following query on both the standby and the primary:
SQL> select max(sequence#) from v$log_history;
The output should be the same on both databases.
***********************************************************************************
STEP #2
Since managed recovery failed after applying the incremental backup, we need to recreate the controlfile of the standby. The reason for recreating the controlfile is that the database SCN recorded in the controlfile was not updated after the incremental backup was applied, while the SCNs of the datafiles were; consequently, the standby database was still looking for the old file to apply.
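You can see this mismatch for yourself (a hedged illustration using standard views) by comparing the controlfile checkpoint SCN with the datafile header SCNs on the standby; the datafile headers will be well ahead of the controlfile:

SQL> SELECT CHECKPOINT_CHANGE# FROM V$DATABASE;
SQL> SELECT MIN(CHECKPOINT_CHANGE#), MAX(CHECKPOINT_CHANGE#) FROM V$DATAFILE_HEADER;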
A good MOSC note for re-creating the controlfile in such a scenario is 734862.1.
Steps to recreate the standby controlfile and start the managed recovery on
standby:
1. Take the backup of controlfile from primary
rman target sys/oracle@proddb catalog rman/cat@emrep
backup current controlfile for standby;
2. Copy the controlfile backup to the standby system (or, if it is on a common NFS mount, there is no need to transfer or copy it) and restore the controlfile onto the standby database.
Shut down all instances (if the standby is RAC) of the standby:
sqlplus / as sysdba
shutdown immediate
exit
Start up one instance in nomount mode:
sqlplus / as sysdba
startup nomount
exit
Restore the standby control file.
rman nocatalog target /
restore standby controlfile from '/tmp/o1_mf_TAG20070220T151030_.bkp';
exit
3. Start up the standby with the new controlfile:
sqlplus / as sysdba
shutdown immediate
startup mount
exit
4. Restart managed recovery in one instance (if the standby is RAC) of the standby database:
sqlplus / as sysdba
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
The above statement may succeed without errors, but the MRP process will still not start. The reason is that, since the controlfile has been restored from the primary, it is looking for the datafiles at the same locations as on the primary instead of the standby. For example, if the primary datafiles are located at '+DATA/proddb_1/DATAFILE' and the standby datafiles are at '+DATA/proddb_2/DATAFILE', the new controlfile will show the datafiles' location as '+DATA/proddb_1/DATAFILE'. This can be verified with the query "select name from v$datafile" on the standby instance. We need to rename all the datafiles to reflect the correct location.
There are two ways to rename the datafiles:
1. Without using RMAN
Change the parameter standby_file_management to MANUAL in the standby's parameter file, then rename each datafile:
ALTER DATABASE RENAME FILE '+DATA/proddb_1/datafile/users.310.620229743' TO '+DATA/proddb_2/datafile/USERS.1216.648429765';
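The parameter can also be changed dynamically instead of editing the parameter file (a sketch, assuming an spfile is in use):

SQL> ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=MANUAL SCOPE=BOTH;

Repeat the RENAME FILE statement for every datafile whose path differs, and set the parameter back to AUTO when done.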
2. Using RMAN
rman nocatalog target /
Catalog the files; the string specified should refer to the diskgroup/filesystem destination of the standby datafiles:
RMAN> catalog start with '+diskgroup/<dbname>/datafile/';
e.g.:
RMAN> catalog start with '+DATA/proddb_2/datafile/';
This will give the user a list of files and ask if they should all be cataloged. The user should review the list and answer YES if all the datafiles are properly listed.
Once that is done, commit the changes to the controlfile:
RMAN> switch database to copy;
Now start the managed recovery:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
Then check for processes in the view v$managed_standby. The MRP process should be there. It will also start applying all the archived logs that have been missing since the last applied log. This process might take hours.
5. Another check to verify that the primary and standby are in sync: run the following query on both the standby and the primary after all logs in v$archived_log show APPLIED=YES:
SQL> select max(sequence#) from v$log_history;
The output should be the same on both databases.
***********************************************************************************
STEP #3
If, after recreating the controlfile, you still find that logs are being transmitted but not applied on the standby, check the alert log of the standby. For example, see if you find something similar to the snippet below:
Fetching gap sequence in thread 1, gap sequence 74069-74095
Wed Sep 17 06:45:47 2015
RFS[1]: Archived Log:
'+DATA/ipwp_sac1/archivelog/2008_09_17/thread_1_seq_74093.259.665649929'
Wed Sep 17 06:45:55 2015
Fetching gap sequence in thread 1, gap sequence 74069-74092
Wed Sep 17 06:45:57 2015
RFS[1]: Archived Log:
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74094.258.665649947'
Wed Sep 17 06:46:16 2015
RFS[1]: Archived Log:
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74095.256.665649957'
Wed Sep 17 06:46:26 2015
FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 74069-74092
The contents of the alert log show that log sequence#s from 74069 to 74092 may have been transmitted but not applied. The view v$archived_log shows the sequence#s starting from 74093 with APPLIED=NO.
So this situation means that logs up to 74068 were applied as part of the incremental backup, and 74069 to 74092 were transferred to the standby server but must have failed to register with the standby database. Try the following steps:
1. Locate the archived logs for the sequence#s shown in the alert log (for example, 74069 to 74092), e.g. +DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74069.995.665630861
2. Register all these archived logs with the standby database:
alter database register logfile
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74069.995.665630861';
alter database register logfile
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74070.998.665631405';
alter database register logfile
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74071.792.665633755';
alter database register logfile
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74072.263.665633713';
...
... and so on, till the last one.
3. Now check the view v$archived_log, and you should finally see the logs being applied. The status of the MRP should change from ARCHIVE_LOG_GAP to APPLYING_LOGS and eventually to WAITING_FOR_LOGS.
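As a final hedged check on the standby (the v$dataguard_stats view is available from 10.2 onward), the reported lags should shrink toward zero once apply catches up:

SQL> SELECT NAME, VALUE FROM V$DATAGUARD_STATS WHERE NAME IN ('apply lag', 'transport lag');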
***********************************************************************************
