
Log shipping secondary server is out of sync and the transaction log restore job is failing.

Problem
You can see that your log shipping is broken. In
the SQL Server error log, the message below is
displayed:
Error: 14421, Severity: 16, State: 1.
The log shipping secondary database
myDB.logshippingPrimary has restore threshold
of 45 minutes and is out of sync. No restore was
performed for 6258 minutes.
See "Description of error message 14420 and error message 14421 that occur when you use log shipping in SQL Server":
http://support.microsoft.com/default.aspx?scid=329133
Cause

Inside the LSRestore job history, you can find two kinds of
messages:
- The restore job is skipping log backups on the secondary server:
Skipped log backup file. Secondary DB: 'logshippingSecondary',
File:
'\\myDB\logshipping\logshippingPrimary_20090808173803.trn'
- An older log backup is missing:
*** Error 4305: The file
'\\myDB\logshipping\logshippingPrimary_20090808174201.trn'
is too recent to apply to the secondary database
'logshippingSecondary'.
**** Error: The log in this backup set begins at LSN
18000000005000001, which is too recent to apply to the
database. An earlier log backup that includes LSN
18000000004900001 can be restored.

Transaction log backups can only be restored in sequence. If the
LastLSN field of one transaction log backup and the FirstLSN field
of the next do not display the same number, the backups are not
restorable in that sequence. There may be several reasons for
transaction log backups to be out of sequence. Some of the most
common are a redundant transaction log backup job on the primary
server that breaks the sequence, or the recovery model of the
database being toggled between transaction log backups.
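To see this rule in action, here is a quick sketch that flags breaks in the chain directly from msdb (this assumes SQL Server 2012 or later for LAG; the database name is a placeholder, as in the query below):

-- Each log backup's first_lsn should equal the previous backup's last_lsn;
-- any row returned marks a break in the restore sequence.
;WITH log_chain AS (
    SELECT backup_finish_date, first_lsn, last_lsn,
           LAG(last_lsn) OVER (ORDER BY backup_finish_date) AS prev_last_lsn
    FROM msdb..backupset
    WHERE database_name = 'databaseNamePrimaryServer'
      AND [type] = 'L'
)
SELECT backup_finish_date, first_lsn, last_lsn, prev_last_lsn
FROM log_chain
WHERE first_lsn <> prev_last_lsn;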
Resolution
To check whether there is a gap in the restore process, you can run the query below to find out whether a redundant log backup was performed:
SELECT s.database_name, s.backup_finish_date, y.physical_device_name
FROM msdb..backupset AS s
INNER JOIN msdb..backupfile AS f ON f.backup_set_id = s.backup_set_id
INNER JOIN msdb..backupmediaset AS m ON s.media_set_id = m.media_set_id
INNER JOIN msdb..backupmediafamily AS y ON m.media_set_id = y.media_set_id
WHERE s.database_name = 'databaseNamePrimaryServer'
ORDER BY s.backup_finish_date DESC;

You can see that another log backup was taken outside the log shipping process. Now you just have to restore this backup on the secondary and run the LSRestore job.
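As a sketch, the manual catch-up restore on the secondary could look like this (the file path is a placeholder; in practice it is the physical_device_name of the out-of-band backup found by the query above):

-- Apply the out-of-band log backup manually on the secondary.
RESTORE LOG [logshippingSecondary]
FROM DISK = N'\\myDB\logshipping\out_of_band_backup.trn'  -- placeholder path
WITH NORECOVERY;  -- keeps the database restoring so LSRestore can continue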

Log shipping monitor incorrectly raises error number 14420 instead of 14421 when the
secondary database is out of sync

Symptoms
Consider the following configuration of a Log Shipping environment:

Server A hosts the primary server instance and the primary database.

Server B hosts the secondary server instance and the secondary database.

Server C hosts a monitor server instance on which the Log Shipping Monitor job is configured to use
an impersonated proxy account for the connections to Server A and Server B.
When you use this configuration, the Log Shipping Monitor job incorrectly raises error message 14420 instead of
14421 when the secondary database is out of sync. The description of these error messages in SQL Server
2005 and SQL Server 2008 is as follows:
Error: 14420, Severity: 16, State: 1
The log shipping primary database %s.%s has backup threshold of %d minutes and has not performed a
backup log operation for %d minutes. Check agent log and logshipping monitor information.
Error: 14421, Severity: 16, State: 1
The log shipping secondary database %s.%s has restore threshold of %d minutes and is out of sync. No
restore was performed for %d minutes. Restored latency is %d minutes. Check agent log and logshipping
monitor information.
The alert message 14421 indicates that the difference between the current time (UTC) and the
last_restored_date_utc value in the log_shipping_monitor_secondary table on the monitor server is greater
than the value set for the Restore Alert threshold, whereas the alert message 14420 indicates that the
difference between the current time (UTC) and the last_backup_date_utc value in the
log_shipping_monitor_primary table on the monitor server is greater than the value set for the Backup
Alert threshold.

Cause
The issue happens because of a problem in the Log Shipping user interface: when creating the monitor job
for the secondary, 14420 is passed instead of 14421.

Resolution
To resolve the problem, correct the @threshold_alert parameter value for the secondary database by
executing the following statement on the monitor server (Server C):

USE master
GO
EXEC sp_change_log_shipping_secondary_database
    @secondary_database = 'dbname',
    @threshold_alert = 14421
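To confirm the change, you can read the configured alert number back from the monitor table on Server C; a minimal check, using the same 'dbname' placeholder:

-- threshold_alert should now be 14421 for the secondary database.
SELECT secondary_database, restore_threshold, threshold_alert
FROM msdb.dbo.log_shipping_monitor_secondary
WHERE secondary_database = 'dbname';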

SQL SERVER Log Shipping Restore Job Error: The file is too recent to apply to the secondary database
If you are a DBA who has handled log shipping as a high-availability solution, there are a number of common errors that you would, over a period of
time, become a pro at resolving. Here is one of the common errors which you must have seen:

Message
2015-10-13 21:09:05.13 *** Error: The file C:\LS_S\LSDemo_20151013153827.trn is too recent to apply to the secondary
database LSDemo.(Microsoft.SqlServer.Management.LogShipping) ***
2015-10-13 21:09:05.13 *** Error: The log in this backup set begins at LSN 32000000047300001, which is too recent to apply to the
database. An earlier log backup that includes LSN 32000000047000001 can be restored.
RESTORE LOG is terminating abnormally.(.Net SqlClient Data Provider) ***
The above error is shown in the failure history of the restore job. If the failure persists beyond the configured threshold, then we would also start seeing the below error in the SQL ERRORLOG on the secondary:
2015-10-14 06:22:00.240 spid60 Error: 14421, Severity: 16, State: 1.
2015-10-14 06:22:00.240 spid60 The log shipping secondary database PinalServer.LSDemo has restore threshold of 45 minutes
and is out of sync. No restore was performed for 553 minutes. Restored latency is 4 minutes. Check agent log and logshipping
monitor information.

To start troubleshooting, we can look at the Job Activity Monitor on the secondary, where the restore job shows a failed state.

If you know SQL transaction log backup basics, you might be able to guess the cause. If we look closely at the error, it talks about an LSN mismatch. In most
cases, a manual transaction log backup was taken. I remember a few scenarios where a 3rd-party tool had taken a transaction log backup of a
database which was also part of a log shipping configuration.

Since we know the cause now, what we need to figure out is: where is that out-of-band backup? Here is the query which I have written on my earlier
blog.

-- Assign the database name to the variable below
DECLARE @db_name VARCHAR(100)
SELECT @db_name = 'LSDemo'

-- query
SELECT TOP (30) s.database_name
    ,m.physical_device_name
    ,CAST(CAST(s.backup_size / 1000000 AS INT) AS VARCHAR(14)) + ' MB' AS bkSize
    ,CAST(DATEDIFF(second, s.backup_start_date, s.backup_finish_date) AS VARCHAR(4)) + ' Seconds' AS TimeTaken
    ,s.backup_start_date
    ,CAST(s.first_lsn AS VARCHAR(50)) AS first_lsn
    ,CAST(s.last_lsn AS VARCHAR(50)) AS last_lsn
    ,CASE s.[type]
        WHEN 'D' THEN 'Full'
        WHEN 'I' THEN 'Differential'
        WHEN 'L' THEN 'Transaction Log'
     END AS BackupType
    ,s.server_name
    ,s.recovery_model
FROM msdb.dbo.backupset s
INNER JOIN msdb.dbo.backupmediafamily m ON s.media_set_id = m.media_set_id
WHERE s.database_name = @db_name
ORDER BY backup_start_date DESC
    ,backup_finish_date

Once we run the query, we get the list of backups taken against the database. This information is picked up from the msdb database.

In the output, the out-of-band backup stands out: look for a physical_device_name that points outside the log shipping backup folder.

Once we have found the problematic backup, we need to restore it manually on the secondary database. Make sure to use either the NORECOVERY or
STANDBY option so that further logs can be restored. Once the file is restored, the restore job will be able to pick up from the same place and catch up
automatically.
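A minimal sketch of that manual restore, with placeholders for the backup file (take the path from the physical_device_name column in the query output) and the undo file:

-- Apply the out-of-band log backup manually on the secondary.
RESTORE LOG [LSDemo]
FROM DISK = N'C:\LS_S\LSDemo_manual.trn'              -- placeholder path
WITH STANDBY = N'C:\LS_S\LSDemo_ROLLBACK_UNDO.tuf';   -- or WITH NORECOVERY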

Case Study: Troubleshooting a Slow Log Shipping Restore job


Scenario

Consider a scenario where you have a log shipping setup with a STANDBY secondary
database and things are working just fine. One fine day you notice that the
secondary database is not in sync with the primary. The seasoned DBA that you are,
you go ahead and look at the log shipping jobs and identify that the restore is
taking a lot of time.
The obvious question that comes to your mind is whether a lot of transactions have
happened recently, causing the log backup to be much larger. So you check the
folder and see that the .TRN file sizes remain pretty much the same. What next?
I will cover some basic troubleshooting that you can do to identify why the
restore process is so slow.
To give you some perspective, let's say that earlier a restore of a 4 MB transaction log
backup used to take less than a minute. Now it takes approximately 20-25
minutes. Before I get into the troubleshooting, make sure that you have ruled out these
factors:
1. The log backup size (.TRN) is pretty much the same as it was before.
2. The disk is not a bottleneck on the secondary server.
3. The copy job is working just fine and there is no delay there. From the job history
you clearly see that the restore is where the time is being spent.
4. The restore job is not failing and no errors are reported during this time (e.g. out
of memory etc.).
Troubleshooting
The first thing to do to get more information on what the restore is doing is to enable
these trace flags:
DBCC TRACEON (3004, 3605, -1)

3004 - gives extended information on backup and restore operations.
3605 - prints the trace information to the error log.

You can read more about these trace flags here: http://blogs.msdn.com/b/psssql/archive/2008/01/23/how-it-workswhat-is-restore-backup-doing.aspx
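Once the investigation is complete, you can switch the flags off again with the matching call:

DBCC TRACEOFF (3004, 3605, -1)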
Here is a sample output after I enabled these trace flags. Focus on the specific
database which is the secondary database in your log shipping.
<Snippet from SQL Errorlog>
2010-12-29 16:11:19.10 spid64 RestoreLog: Database TESTDB
2010-12-29 16:11:19.10 spid64 X-locking database: TESTDB
2010-12-29 16:11:19.10 spid64 Opening backup set
2010-12-29 16:11:19.10 spid64 Restore: Configuration section loaded
2010-12-29 16:11:19.10 spid64 Restore: Backup set is open
2010-12-29 16:11:19.10 spid64 Restore: Planning begins
2010-12-29 16:11:19.12 spid64 Dismounting FullText catalogs
2010-12-29 16:11:19.12 spid64 Restore: Planning complete
2010-12-29 16:11:19.12 spid64 Restore: BeginRestore (offline) on TESTDB
2010-12-29 16:11:19.12 spid64 Restore: Undoing STANDBY for TESTDB
2010-12-29 16:11:23.46 spid64 SnipEndOfLog from LSN: (296258:29680:1)
2010-12-29 16:11:23.46 spid64 Zeroing D:\SQL\SQLLog\TESTDB.ldf from page 2492695 to 2492738 (0x4c122e000 to 0x4c1284000)
2010-12-29 16:11:23.46 spid64 Starting up database 'TESTDB'.
2010-12-29 16:11:23.46 spid64 The database 'TESTDB' is marked RESTORING and is in a state that does not allow recovery to be run.
2010-12-29 16:11:23.51 spid64 Zeroing completed on D:\SQL\SQLLog\TESTDB.ldf
2010-12-29 16:11:23.51 spid64 Restore: Finished undoing STANDBY for TESTDB
2010-12-29 16:11:23.51 spid64 Restore: PreparingContainers
2010-12-29 16:11:23.51 spid64 Restore: Containers are ready
2010-12-29 16:11:23.51 spid64 Restore: Restoring backup set
2010-12-29 16:11:23.51 spid64 Restore: Transferring data to TESTDB
2010-12-29 16:11:24.24 spid64 Restore: Waiting for log zero on TESTDB
2010-12-29 16:11:24.24 spid64 Restore: LogZero complete
2010-12-29 16:11:24.24 spid64 FileHandleCache: 0 files opened. CacheSize: 10
2010-12-29 16:11:24.24 spid64 Restore: Data transfer complete on TESTDB
2010-12-29 16:11:25.69 spid64 Restore: Backup set restored
2010-12-29 16:11:25.69 spid64 Restore-Redo begins on database TESTDB
2010-12-29 16:11:25.74 spid64 Rollforward complete on database TESTDB
2010-12-29 16:11:26.74 spid64 Restore: Done with fixups
2010-12-29 16:11:26.76 spid64 Transitioning to STANDBY
2010-12-29 16:11:27.63 spid64 FixupLogTail() zeroing S:\SQLServer\SQLLog\TESTDB.ldf from 0x4c1769400 to 0x4c176a000.
2010-12-29 16:11:27.63 spid64 Zeroing D:\SQL\SQLLog\TESTDB.ldf from page 2493365 to 2493425 (0x4c176a000 to 0x4c17e2000)
2010-12-29 16:11:27.65 spid64 Zeroing completed on D:\SQL\SQLLog\TESTDB.ldf
2010-12-29 16:24:30.55 spid64 Recovery is writing a checkpoint in database 'TESTDB' (5). This is an informational message only. No user action is required.
2010-12-29 16:24:35.43 spid64 Starting up database 'TESTDB'.
2010-12-29 16:24:39.10 spid64 CHECKDB for database 'TESTDB' finished without errors on 2010-12-21 23:31:25.493 (local time). This is an informational message only; no user action is required.
2010-12-29 16:24:39.10 spid64 Database is in STANDBY
2010-12-29 16:24:39.10 spid64 Restore: Writing history records
2010-12-29 16:24:39.10 Backup Log was restored. Database: TESTDB, creation date(time): 2008/01/26(09:32:02), first LSN: 296258:29680:1, last LSN: 298258:40394:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'S:\SQL\SQLLogShip\TESTDB\TESTDB_20101229011500.trn'}). This is an informational message. No user action is required.
2010-12-29 16:24:39.12 spid64 Writing backup history records
2010-12-29 16:24:39.21 spid64 Restore: Done with MSDB maintenance
2010-12-29 16:24:39.21 spid64 RestoreLog: Finished
</Snippet>

From the above output we see that the restore took ~13 minutes. If you look closely
at the output, most of that time sits in the gap between 16:11:27 (Transitioning to
STANDBY) and 16:24:30 (Recovery is writing a checkpoint).
Now, when we talk about log restores, the number of VLFs plays a very important
role. More about the effect of VLFs on restore time is given
here: http://blogs.msdn.com/b/psssql/archive/2009/05/21/how-a-log-filestructure-can-affect-database-recovery-time.aspx
Bottom line: a large number of virtual log files (VLFs) can slow down
transaction log restores. To find out if this is the case here, use the following
command:
DBCC LOGINFO (TESTDB) WITH NO_INFOMSGS

The following information can be deciphered from the above output:
1. The number of rows returned by the command is the number of VLFs in the log file.

2. The number of VLFs that had to be processed for this log restore can be calculated
from the LSNs in the "Log was restored" message above:
first LSN: 296258:29680:1, last LSN: 298258:40394:1
The first component of an LSN is the VLF sequence number, so 298258 - 296258 = 2000 VLFs.

3. The size of each VLF can be calculated from the FileSize column (in bytes):
9175040 bytes = 8.75 MB
9437184 bytes = 9 MB
10092544 bytes = 9.62 MB
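As a sketch, you can capture the DBCC LOGINFO output into a temp table to get the VLF count and average size in one shot (the column list matches SQL Server 2012 and later; older versions do not have the RecoveryUnitId column):

-- Capture DBCC LOGINFO output to count VLFs and compute their average size.
CREATE TABLE #loginfo (
    RecoveryUnitId INT,      -- SQL Server 2012+ only
    FileId INT,
    FileSize BIGINT,         -- VLF size in bytes
    StartOffset BIGINT,
    FSeqNo INT,              -- VLF sequence number (first part of the LSN)
    [Status] INT,
    Parity TINYINT,
    CreateLSN NUMERIC(38, 0)
);
INSERT INTO #loginfo
EXEC ('DBCC LOGINFO (TESTDB) WITH NO_INFOMSGS');

SELECT COUNT(*) AS vlf_count,
       CAST(AVG(FileSize) / 1048576.0 AS DECIMAL(10, 2)) AS avg_vlf_size_mb
FROM #loginfo;
DROP TABLE #loginfo;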

Problem(s)
So based on the above, there are two possibilities:
1. The number of VLFs is rather large, which we know will impact restore performance.
2. The size of each VLF is large, which is a cause for concern if STANDBY mode is in effect.

The 2nd problem is aggravated if there are batch jobs or long-running transactions
that span multiple backups (e.g. Rebuild Indexes). In this case the work of
repeatedly rolling back the long-running transaction, writing the rollback work to the
standby file (TUF file), then undoing all the rollback work with the next log restore
just to start the process over again can easily cause a log shipping secondary to get
behind.
While we are talking about the TUF file, I know many people out there are not clear on what this is used for. So here goes,

What is the Transaction Undo File (TUF)?


This file stores the contents of pages modified by transactions that were still incomplete (uncommitted) at the time the
log backup was taken. A transaction undo file is required if a database is brought up in a read-only state (the STANDBY mode option in log shipping). In this state, further transaction log backups may still be applied.

In standby mode (which we have for the secondary database), database recovery is
run when the log is restored, and this mode also creates a file with the extension
.TUF (the transaction undo file) on the destination server. That is why in
this mode we are able to access the database (read-only access). Before the
next transaction log backup is applied, the saved changes in the undo file are reapplied to the
database. Since the database is in STANDBY mode, for any large transactions the restore
process also does the work of writing the rollback to the standby file (TUF), so we
might be spending time initializing the whole virtual log.

Solution 1

You need to reduce the number of VLFs. You can do this by running DBCC
SHRINKFILE to reduce the LDF file(s) to a small size, thereby reducing the number of
VLFs. Note: you need to do this on the primary database.
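A minimal sketch of the shrink, assuming a placeholder logical file name (look up the real one in sys.database_files):

-- Run against the PRIMARY database; 'ldf_logical_name' is a placeholder.
USE DBNAME;
DBCC SHRINKFILE (N'ldf_logical_name', 100);  -- target size in MB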
After the shrink is complete, verify that the number of VLFs has come down by running DBCC
LOGINFO again. A good range would be somewhere between 500 and 1000. Then resize the
log file to the desired size using a single growth operation. You can do this by
tweaking the Initial Size setting; also pay attention to the auto-growth setting for
the LDF file, since setting it to too small a value can lead to too many VLFs.
ALTER DATABASE DBNAME MODIFY FILE (NAME='ldf_logical_name', SIZE=<target>MB)

Also remember that you still have to apply the pending log backups first, up to the
one which holds the shrink operation. Once you reach that point, you can
measure the restore time to see if the changes above had a positive impact.
Solution 2
For problem 2, where the size of the VLFs is causing havoc with the STANDBY mode,
you will have to truncate the transaction log. This means that log shipping has to be
broken.
You can truncate the transaction log on the source database by setting the recovery model to
SIMPLE (using the ALTER DATABASE command). On SQL Server 2005 or lower versions you
can instead use the BACKUP LOG DBNAME WITH TRUNCATE_ONLY command.
Then change the log file auto-grow setting to an appropriate value.
Pay attention to the value you set here: too high, and transactions
will have to wait while the file is being grown; too low, and it creates too many
VLFs. Take a full database backup immediately and use it to re-set up the log
shipping.
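As a sketch of that sequence on the primary (DBNAME and the backup path are placeholders; note that switching to SIMPLE breaks the log backup chain, which is exactly why log shipping must be re-initialized from a new full backup):

-- Run on the primary. DBNAME and the backup path are placeholders.
ALTER DATABASE DBNAME SET RECOVERY SIMPLE;   -- allows the log to truncate
USE DBNAME;
CHECKPOINT;                                  -- triggers the truncation
ALTER DATABASE DBNAME SET RECOVERY FULL;     -- full recovery is required for log shipping
BACKUP DATABASE DBNAME TO DISK = N'X:\Backups\DBNAME_full.bak';  -- new base for re-initializing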
Tip: You can use the DBCC SQLPERF(LOGSPACE) command to find out what percent of
your log file is used.
