
Database Recovery Techniques

Chapter 23

1
What is Database Recovery?

Restoring a database to the most recent consistent state after a failure (crash) is called Database Recovery.

Recovery is necessary to preserve the Atomicity & Durability of a transaction.

The module of a DBMS responsible for database recovery is called the Recovery Manager.

2
Kinds of failures a database system may encounter
• Transaction failure
• System failure
• Disk failure
• Catastrophic failure

3
Transaction failure

A transaction may fail because of
• incorrect input (e.g. wrong OTP),
• deadlock,
• a schedule that is not serializable.

4
System failure

The system may fail because of
• addressing error,
• application error,
• operating system fault,
• RAM failure,
• network failure.

5
Disk failure

A disk block may lose data because of
• a malfunction of the read/write head,
• a read/write head crash.

6
Catastrophic failure

• Power failure
• Air conditioning failure
• Fire
• Theft
• Sabotage
• Overwriting the disk
• Mounting the wrong tape
• Tsunami
• Hurricane

7
State transition diagram illustrating the states for
transaction execution

8
States for transaction execution

• BEGIN_TRANSACTION. This marks the beginning of transaction execution.
• READ or WRITE. These specify read or write operations on the database items that are executed as part of a transaction.
• END_TRANSACTION. This specifies that READ and WRITE transaction operations have ended and marks the end of transaction execution. However, at this point it may be necessary to check whether the changes introduced by the transaction can be permanently applied to the database (committed) or whether the transaction has to be aborted because it violates serializability or for some other reason.

9
States for transaction execution

• COMMIT. This signals a successful end of the transaction so that any changes (updates) executed by the transaction can be safely committed to the database and will not be undone.
• ROLLBACK (or ABORT). This signals that the transaction has ended unsuccessfully, so that any changes or effects that the transaction may have applied to the database must be undone.

10
System log
(Data structure required for recovery)

• Also called the transaction log or DBMS journal.

• A log is a sequential (append-only) file.

• The log file stores records that contain information about operations executed by transactions.

• Every log record contains the transaction id & the operation executed by the transaction.

11
System log

The following records are found in the log:

[start_transaction, T]

[read_item, T, X]

[write_item, T, X, old_value, new_value]

[commit, T]

[abort, T]

The log is stored on disk, with its last (most recent) part residing in the DBMS cache.

The part of the log residing in the DBMS cache is called the log buffer.

12
System log (an example)

[start_transaction, T1]
[read_item, T1, X]
[write_item, T1, X, 12000, 7000]
[read_item, T1, Y]
[write_item, T1, Y, 20000, 25000]
[commit, T1]
[start_transaction, T2]
[read_item, T2, X]
[write_item, T2, X, 7000, 19000]
[commit, T2]

13
System log (another instance)

System log?

14
Commit point of a transaction

• A transaction is said to be at its commit point just before its commit record [commit, T] is written to the disk log.
• Beyond the commit point, the transaction is said to have been committed.
• From then on, all changes made by the transaction must eventually be reflected in the disk database (they are already recorded in the disk log).
• And these changes must not be lost despite failure. (Durability)

15
Data update
(Flushing cached data to disk database)
• Deferred update
All modified data items in the cache are written to the disk either after a transaction commits or after a fixed number of transactions have committed.

• Immediate update
A modified data item can be written to the disk before the transaction commits.

16
Data update

• In-place update
The disk version of the data item is overwritten by its cache version.

• Shadow update
The modified version of a data item does not overwrite its disk copy but is written at a separate disk location.

17
Data caching

• The cache manager manages cached data.

• It keeps a lookup table (directory) with one entry per buffer:

  <DiskBlockAddress, BufferLocation, DirtyBit, PinUnpinBit>

• If DirtyBit = 0, the buffer has not been modified and need not be flushed to the disk;

• else the buffer must be flushed to the disk before it is replaced.

• If PinUnpinBit = 1, the buffer is pinned and cannot be flushed to the disk;

• else the buffer can be flushed to the disk.
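The directory rules above can be sketched in Python. This is a minimal sketch, not a real buffer manager: the class `DirectoryEntry` and the helper `flushable_buffers` are hypothetical names, and the two bits are modeled as plain integers.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """One directory entry: <DiskBlockAddress, BufferLocation, DirtyBit, PinUnpinBit>."""
    disk_block_address: int
    buffer_location: int
    dirty_bit: int = 0  # 1 = buffer modified since it was read from disk
    pin_bit: int = 0    # 1 = buffer pinned, must not be written out yet

    def needs_flush(self) -> bool:
        # A clean buffer (DirtyBit = 0) matches its disk block: no write needed.
        return self.dirty_bit == 1

    def can_flush(self) -> bool:
        # A pinned buffer (PinUnpinBit = 1) cannot be flushed.
        return self.pin_bit == 0


def flushable_buffers(directory):
    """Buffer locations that are dirty and not pinned (hypothetical helper)."""
    return [e.buffer_location for e in directory
            if e.needs_flush() and e.can_flush()]


directory = [
    DirectoryEntry(disk_block_address=100, buffer_location=0, dirty_bit=1),
    DirectoryEntry(disk_block_address=200, buffer_location=1, dirty_bit=1, pin_bit=1),
    DirectoryEntry(disk_block_address=300, buffer_location=2),
]
print(flushable_buffers(directory))  # [0]
```

Buffer 1 is dirty but pinned, so it has to wait; buffer 2 is clean and can simply be discarded when its slot is reused.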

18
Rollback (Undo) & Roll forward (Redo)

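• Undo preserves the Atomicity of a transaction; Redo preserves its Durability.

• Undo restores the BFIM (before image) of each modified data item on the disk, removing all AFIMs (after images) written by the transaction.

• Redo writes the AFIM of each modified data item to the disk.

• Database recovery is achieved using one of the following approaches:
Redo only
Undo only
Redo & Undo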
19
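Rollback (Undo) & Roll forward (Redo)

• Redo & Undo operations are recorded in the log as they happen.

• Redo & Undo operations need to be idempotent: executing one several times must have the same effect as executing it once.

• The entire recovery process needs to be idempotent, because the system may crash again during recovery.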
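A small Python sketch of why redo and undo are idempotent when they copy images rather than recompute changes. It assumes log records shaped like [write_item, T, X, old_value, new_value] (modeled as tuples) and a dict standing in for the disk database; `redo_by_delta` is a hypothetical counterexample, not a technique from the slides.

```python
def redo(db, record):
    """Idempotent redo: overwrite the item with its AFIM."""
    _, _, item, _old, new_value = record
    db[item] = new_value


def undo(db, record):
    """Idempotent undo: overwrite the item with its BFIM."""
    _, _, item, old_value, _new = record
    db[item] = old_value


def redo_by_delta(db, record):
    """NOT idempotent: re-applying it adds the difference a second time."""
    _, _, item, old_value, new_value = record
    db[item] += new_value - old_value


rec = ("write_item", "T1", "X", 12000, 7000)

db = {"X": 12000}
redo(db, rec)
redo(db, rec)            # e.g. a crash during recovery repeats the redo
print(db["X"])           # 7000

undo(db, rec)
undo(db, rec)
print(db["X"])           # 12000

db2 = {"X": 12000}
redo_by_delta(db2, rec)
redo_by_delta(db2, rec)
print(db2["X"])          # 2000 (wrong)
```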

20
Write-ahead logging (WAL)

• When in-place update (immediate or deferred) is used, a log is necessary for recovery.

• The log must be available to the recovery manager.

• A recovery algorithm that uses the log must follow the Write-ahead logging (WAL) protocol.

21
Write-ahead logging (WAL) protocol

• The WAL protocol states that log data must be written to the disk before the modified data items are written (flushed) to the database.
• Transactions whose commit record is NOT found on the disk log must be undone. (Atomicity)
• For an undo operation to be successful, the BFIM of the data item must be written to the disk log.
• Transactions whose commit record is found on the disk log may need a redo operation. (Durability)
• For a redo operation to be successful, the AFIM of the data item must be written to the disk log.
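The WAL rule can be illustrated with a toy buffer manager. This is a sketch under simplifying assumptions (one value per item, in-memory lists standing in for the log buffer and the disk log); `WALBufferManager` and its methods are hypothetical names. The key line is in `flush_item`: the log is forced up to the item's last LSN before the item is written in place.

```python
class WALBufferManager:
    """Toy buffer manager that enforces the WAL rule (hypothetical class)."""

    def __init__(self, disk_db):
        self.disk_db = dict(disk_db)  # item -> value in the disk database
        self.cache = {}               # item -> (value, LSN of its last log record)
        self.log_buffer = []          # log records still in memory
        self.disk_log = []            # log records already forced to disk
        self.next_lsn = 1

    def _append(self, record):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_buffer.append((lsn,) + record)
        return lsn

    def write_item(self, txn, item, new_value):
        old_value = self.cache[item][0] if item in self.cache else self.disk_db[item]
        # The log record carries the BFIM (for undo) and the AFIM (for redo).
        lsn = self._append(("write_item", txn, item, old_value, new_value))
        self.cache[item] = (new_value, lsn)

    def force_log(self, up_to_lsn):
        """Move every buffered log record with LSN <= up_to_lsn to the disk log."""
        while self.log_buffer and self.log_buffer[0][0] <= up_to_lsn:
            self.disk_log.append(self.log_buffer.pop(0))

    def flush_item(self, item):
        value, lsn = self.cache[item]
        self.force_log(lsn)           # WAL: the log record reaches disk first...
        self.disk_db[item] = value    # ...then the data item is written in place

    def commit(self, txn):
        lsn = self._append(("commit", txn))
        self.force_log(lsn)           # commit record and all earlier records reach disk


bm = WALBufferManager({"X": 12000})
bm.write_item("T1", "X", 7000)
bm.flush_item("X")                    # flushed before commit
print(bm.disk_log)   # [(1, 'write_item', 'T1', 'X', 12000, 7000)]
print(bm.disk_db)    # {'X': 7000}
```

Here X reaches the disk before T1 commits, which is exactly the case where the BFIM (12000) already sitting in the disk log makes undo possible after a crash.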

22
WAL protocol

• For a recovery protocol that needs undo as well as redo, both the BFIM & AFIM of a data item need to be written to the disk log before the transaction commits. Such a log entry looks like
[write_item, T, X, old_value, new_value]

• A log entry that contains the BFIM only is called an undo-type log entry & is of the form
[write_item, T, X, old_value].
• A log entry that contains the AFIM only is called a redo-type log entry & is of the form
[write_item, T, X, new_value].
23
When to redo & when to undo?

• A transaction is said to have been committed if its commit record [commit, T] is found on the disk log.
• If the changes made by a committed transaction had not been written to the disk database, then redo is necessary.
• If a transaction does not have its commit record on the disk log, then it is treated as aborted.
• And the changes made by this transaction on the disk database need to be undone.
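The two rules above amount to splitting the transactions in the disk log by the presence of a commit record. A minimal Python sketch, assuming log records are tuples in the slide's format and the whole log is scanned (no checkpoint); `classify` is a hypothetical helper. The redo list holds candidates only: because redo is idempotent, it is safe to redo a committed transaction whose changes already reached the disk.

```python
def classify(disk_log):
    """Return (redo, undo) lists of transaction ids from a disk log."""
    started, committed = [], set()
    for record in disk_log:
        kind, txn = record[0], record[1]
        if kind == "start_transaction":
            started.append(txn)
        elif kind == "commit":
            committed.add(txn)
    redo = [t for t in started if t in committed]      # committed: may need redo
    undo = [t for t in started if t not in committed]  # no commit record: undo
    return redo, undo


disk_log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 12000, 7000),
    ("commit", "T1"),
    ("start_transaction", "T2"),
    ("write_item", "T2", "X", 7000, 19000),
    # crash: [commit, T2] never reached the disk log
]
print(classify(disk_log))  # (['T1'], ['T2'])
```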

24
Checkpoint

• From time to time, the buffers of the DBMS cache that hold modified data items are flushed to the disk.
• According to the WAL protocol, the associated log data are written to the disk ahead of the modified data items.
• This force-writing of modified data items at regular intervals is called checkpointing.
• Checkpointing reduces the work of the recovery manager in case of failure.

25
Checkpoint

The following steps define the checkpoint operation:

Step 1. Suspend execution of all transactions*.
Step 2. Write all modified data items from the DBMS cache to the disk.
Step 3. Write a checkpoint record to the log buffer and write the log buffer to the disk.
Step 4. Resume execution of transactions.

*When all transactions are suspended, the database system is said to be in a quiescent state.
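The four steps can be sketched in Python over a toy in-memory model of the cache, the log and the disk. This is an illustration only: `DBMSState` and `take_checkpoint` are hypothetical names, and the extra log force before Step 2 is the WAL requirement from the previous slide, not a separate numbered step.

```python
from dataclasses import dataclass, field


@dataclass
class DBMSState:
    """Hypothetical in-memory model of the cache, the log and the disk."""
    cache: dict = field(default_factory=dict)       # item -> value in the DBMS cache
    dirty: set = field(default_factory=set)         # items modified in cache, not yet on disk
    disk_db: dict = field(default_factory=dict)     # item -> value in the disk database
    log_buffer: list = field(default_factory=list)  # log records still in memory
    disk_log: list = field(default_factory=list)    # log records on disk
    quiescent: bool = False

    def force_log(self):
        self.disk_log.extend(self.log_buffer)
        self.log_buffer.clear()


def take_checkpoint(s: DBMSState) -> None:
    s.quiescent = True                    # Step 1: suspend all transactions
    s.force_log()                         # WAL: log records before the data they describe
    for item in sorted(s.dirty):          # Step 2: write modified items to disk
        s.disk_db[item] = s.cache[item]
    s.dirty.clear()
    s.log_buffer.append(("checkpoint",))  # Step 3: checkpoint record into the log buffer,
    s.force_log()                         #         then the log buffer to disk
    s.quiescent = False                   # Step 4: resume transactions


s = DBMSState(cache={"X": 7000}, dirty={"X"}, disk_db={"X": 12000},
              log_buffer=[("write_item", "T1", "X", 12000, 7000)])
take_checkpoint(s)
print(s.disk_db, s.disk_log[-1])  # {'X': 7000} ('checkpoint',)
```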
26
Fuzzy checkpoint

• While the modified data items are being written to the disk, execution of transactions can be resumed.
• In this case a special file must hold the location of the last checkpoint record in the log.
• If the checkpoint under construction is not completed successfully, the previous checkpoint remains valid.
• If the checkpoint under construction is completed successfully, the new checkpoint location overwrites the old one in the special file.

27
Checkpoint record & its effect

• A checkpoint record looks like
[checkpoint]
• If the system crashes after a checkpoint has been taken, then transactions whose commit records appear before the checkpoint record need not be redone, because the changes made by these transactions had already been written to the disk.
• During recovery, redo & undo are required only for transactions with log records appearing after the checkpoint record.
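A Python sketch of how a checkpoint shortens the redo work. It assumes the quiescent checkpoint described above (so everything committed before [checkpoint] is already on disk) and tuple log records; `redo_candidates` is a hypothetical helper that returns only the transactions whose commit record follows the last checkpoint.

```python
def redo_candidates(disk_log):
    """Transactions whose [commit, T] record follows the last [checkpoint]."""
    last_cp = -1
    for i, record in enumerate(disk_log):
        if record[0] == "checkpoint":
            last_cp = i
    # Commits at or before last_cp were flushed by the checkpoint: skip them.
    return [r[1] for r in disk_log[last_cp + 1:] if r[0] == "commit"]


log = [
    ("start_transaction", "T1"), ("write_item", "T1", "X", 12000, 7000), ("commit", "T1"),
    ("start_transaction", "T2"), ("write_item", "T2", "Y", 20000, 25000),
    ("checkpoint",),
    ("commit", "T2"),
    ("start_transaction", "T3"), ("write_item", "T3", "X", 7000, 8000), ("commit", "T3"),
]
print(redo_candidates(log))  # ['T2', 'T3']
```

T1 committed before the checkpoint and is skipped; T2 was still running at the checkpoint, so it remains a candidate.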

28
Steal/No-steal & Force/No-force

• Steal/No-steal & Force/No-force are different possible strategies for writing modified data items to disk.

• Steal:
Modified data items can be written to the disk before the transaction commits.

• No-steal:
Modified data items cannot be written to the disk before the transaction commits.

29
Force/No-force

• Force:
All modified data items must be written to the disk before the transaction commits.

• No-force:
Modified data items need not be written to the disk before the transaction commits.

30
Alternative strategies of Recovery Manager

The recovery manager can use one of the following strategies to recover the database after a crash.

• Steal/Force (Undo/No-redo)
• Steal/No-force (Undo/Redo) (used by ARIES)
• No-steal/Force (No-undo/No-redo)
• No-steal/No-force (No-undo/Redo)
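The four combinations follow from two independent questions, which a few lines of Python make explicit: steal decides whether undo may be needed, and no-force decides whether redo may be needed. `recovery_operations` is a hypothetical helper written for illustration.

```python
def recovery_operations(steal: bool, force: bool) -> str:
    # Steal: uncommitted changes may already be on disk -> undo may be needed.
    undo = "Undo" if steal else "No-undo"
    # No-force: committed changes may still be only in the cache -> redo may be needed.
    redo = "No-redo" if force else "Redo"
    return f"{undo}/{redo}"


for steal in (True, False):
    for force in (True, False):
        name = f"{'Steal' if steal else 'No-steal'}/{'Force' if force else 'No-force'}"
        print(name, "->", recovery_operations(steal, force))
# Steal/Force -> Undo/No-redo
# Steal/No-force -> Undo/Redo
# No-steal/Force -> No-undo/No-redo
# No-steal/No-force -> No-undo/Redo
```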

31
Recovery scheme based on deferred update

• This scheme is known as the No-undo/Redo algorithm.

• In this scheme, data update proceeds as follows:
  ◦ Log records of updates made by transactions are written to the log buffer.
  ◦ The log buffer is written to the disk log along with the commit record.
  ◦ Only then are the changes flushed to the disk database.
  ◦ If the changes could not be flushed to the disk before a failure occurred, then only redo is necessary.

32
Deferred update in a single-user system

• In a single-user system, only serial execution of transactions is possible.

• There is no concurrent execution of transactions, and hence a concurrency control protocol (e.g. 2PL) is not necessary.

33
A serial schedule & system log with deferred
update

34
Recovery based on deferred update with
concurrent users & a checkpoint (an example)

35
System log at the time of crash

36
Recovery based on deferred update with
concurrent users & a checkpoint

An example of a recovery timeline to illustrate the effect of checkpointing, based on a schedule different from the previous one (in slide 36).
37
Additional data structures required for recovery
using deferred update
• Active list: contains the list of transactions that are yet to commit.
• Commit list: contains the list of transactions committed since the last checkpoint.
• During recovery, all transactions in the commit list are redone.
• All transactions in the active list are ignored (none of their updates reached the disk database, so no undo is needed).
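A minimal Python sketch of No-undo/Redo recovery with an active list and a commit list. Assumptions: deferred update with redo-type entries [write_item, T, X, new_value] as tuples, and a dict for the disk database; `recover_deferred` is a hypothetical name. Because a transaction may write before a checkpoint and commit after it, the sketch scans the whole log for the write records of committed-list transactions.

```python
def recover_deferred(disk_log, disk_db):
    """No-undo/Redo recovery for deferred update (hypothetical sketch).
    Write records are redo-type: ("write_item", T, X, new_value)."""
    last_cp = -1
    for i, record in enumerate(disk_log):
        if record[0] == "checkpoint":
            last_cp = i
    committed = {r[1] for r in disk_log if r[0] == "commit"}
    # Commit list: transactions committed since the last checkpoint.
    commit_list = [r[1] for r in disk_log[last_cp + 1:] if r[0] == "commit"]
    # Active list: transactions started but never committed.
    active_list = [r[1] for r in disk_log
                   if r[0] == "start_transaction" and r[1] not in committed]
    # Redo, in log order, every write of a transaction in the commit list.
    for record in disk_log:
        if record[0] == "write_item" and record[1] in commit_list:
            _, _, item, new_value = record
            disk_db[item] = new_value
    # Active-list transactions are ignored: nothing of theirs reached the disk.
    return commit_list, active_list


log = [
    ("start_transaction", "T1"), ("write_item", "T1", "A", 10), ("commit", "T1"),
    ("checkpoint",),
    ("start_transaction", "T2"), ("write_item", "T2", "B", 20), ("commit", "T2"),
    ("start_transaction", "T3"), ("write_item", "T3", "C", 30),
]
db = {"A": 10, "B": 0, "C": 0}
result = recover_deferred(log, db)
print(result, db)  # (['T2'], ['T3']) {'A': 10, 'B': 20, 'C': 0}
```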

38
Recovery technique based on immediate update

• This scheme is known as the Undo/No-redo algorithm.

• In this scheme, the AFIM of a data item is flushed to the disk before the transaction commits.
• Before the AFIM of a data item is written, the corresponding log records are written to disk. (WAL protocol)
• The recovery manager needs to perform only undo operations, for transactions whose commit record is NOT found in the log.
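A Python sketch of Undo/No-redo recovery, assuming undo-type log entries [write_item, T, X, old_value] as tuples and a dict for the disk database; `recover_undo_no_redo` is a hypothetical name. The log is read backward so that, when an uncommitted transaction wrote the same item several times, the oldest BFIM is the value finally left on disk.

```python
def recover_undo_no_redo(disk_log, disk_db):
    """Undo/No-redo recovery for immediate update (hypothetical sketch).
    Write records are undo-type: ("write_item", T, X, old_value)."""
    committed = {r[1] for r in disk_log if r[0] == "commit"}
    # Scan backward so that the oldest BFIM of each item is applied last.
    for record in reversed(disk_log):
        if record[0] == "write_item" and record[1] not in committed:
            _, _, item, old_value = record
            disk_db[item] = old_value
    # Committed transactions need no redo: their AFIMs were flushed before commit.


log = [
    ("start_transaction", "T1"), ("write_item", "T1", "X", 12000), ("commit", "T1"),
    ("start_transaction", "T2"), ("write_item", "T2", "Y", 20000),
    ("write_item", "T2", "Y", 25000),
]
db = {"X": 7000, "Y": 30000}   # disk state at the crash
recover_undo_no_redo(log, db)
print(db)  # {'X': 7000, 'Y': 20000}
```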

39
Shadow paging

This recovery scheme does not require the use of a log in a single-user
environment. In a multiuser environment, a log may be needed for the
concurrency control method.
Shadow paging considers the database to be made up of a number of
fixed size disk pages (or disk blocks)—say, n—for recovery purposes.
A directory with n entries is constructed, where the ith entry points to
the ith database page on disk. The directory is kept in main memory if it
is not too large, and all references—reads or writes—to database
pages on disk go through it.
When a transaction begins executing, the current directory — whose
entries point to the most recent or current database pages on disk —
is copied into a shadow directory. The shadow directory is then saved
on disk while the current directory is used by the transaction.

40
Shadow paging

During transaction execution, the shadow directory is
never modified. When a write_item operation is
performed, a new copy of the modified database page is
created, but the old copy of that page is not overwritten.
Instead, the new page is written elsewhere—on some
previously unused disk block. The current directory entry
is modified to point to the new disk block, whereas the
shadow directory is not modified and continues to point
to the old unmodified disk block. For pages updated by
the transaction, two versions are kept. The old version is
referenced by the shadow directory and the new version
by the current directory.

41
Shadow paging

To recover from a failure during transaction execution, it is sufficient to
free the modified database pages and to discard the current directory.
The state of the database before transaction execution is available
through the shadow directory, and that state is recovered by
reinstating the shadow directory. The database thus is returned to its
state prior to the transaction that was executing when the crash
occurred, and any modified pages are discarded. Committing a
transaction corresponds to discarding the previous shadow directory.
Since recovery involves neither undoing nor redoing data items, this
technique can be categorized as a
No-undo/No-redo technique for recovery.
In a multiuser environment with concurrent transactions, logs and
checkpoints must be incorporated into the shadow paging technique.
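A single-user shadow paging sketch in Python. Assumptions: disk blocks are modeled as a growing list, the directories as lists of block numbers, and committing simply drops the shadow directory (a real system would switch directories atomically on disk and reclaim the orphaned blocks). `ShadowPagedDB` is a hypothetical class.

```python
class ShadowPagedDB:
    """Single-user shadow paging sketch (hypothetical class)."""

    def __init__(self, pages):
        self.disk = list(pages)                 # disk blocks, indexed by block number
        self.current = list(range(len(pages)))  # current directory: page i -> block
        self.shadow = None                      # shadow directory (saved on disk)

    def begin(self):
        self.shadow = list(self.current)        # copy current directory into shadow

    def read_item(self, page):
        return self.disk[self.current[page]]

    def write_item(self, page, value):
        self.disk.append(value)                  # new copy in a previously unused block
        self.current[page] = len(self.disk) - 1  # only the current directory changes

    def commit(self):
        self.shadow = None                       # discard the previous shadow directory

    def recover(self):
        if self.shadow is None:
            return                               # no transaction in progress
        self.current = self.shadow               # reinstate the shadow directory
        self.shadow = None                       # blocks written by the txn become garbage


db = ShadowPagedDB(["p0", "p1", "p2"])
db.begin()
db.write_item(1, "p1-new")
print(db.read_item(1))  # p1-new
db.recover()            # crash before commit: no undo, no redo
print(db.read_item(1))  # p1
```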

42
Shadow paging (an example)

43
ARIES recovery algorithm

• ARIES
  ◦ Algorithm for Recovery and Isolation Exploiting Semantics.
• ARIES uses the Steal/No-force approach for writing.
• ARIES is based on:
  ◦ Write-ahead logging (WAL)
  ◦ Repeating history during redo
  ◦ Logging changes during undo

44
Repeating history during redo

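• ARIES performs the redo operation by reading the log in the forward direction, retracing all actions of the database system before the crash (including those of transactions that will later be undone) to reconstruct the database state at the time of the crash.

• ARIES uses fuzzy checkpointing.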

45
Logging changes during undo

• ARIES reads the log in the backward direction to undo the changes made by transactions whose commit record was not found on the disk log when the system crashed.
• During undo, the changes made to the database are logged again so as to avoid repeated undoing of changes on the same data item.
• A log record written for an undo is called a compensation log record (CLR).
• If the system crashes during recovery, data items undone so far need not be undone again.
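A Python sketch of undo with CLRs, showing why a crash during undo does not cause repeated undoing. Assumptions: each CLR stores an `undo_next_lsn` pointing at the next record of the transaction still to be undone (the PrevLSN of the record it compensates), and `max_steps` is a hypothetical knob that simulates a crash mid-undo. `LogRecord` and `undo_transaction` are illustrative names, not an ARIES API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    txn: str
    kind: str                 # "update" or "CLR"
    prev_lsn: Optional[int]   # previous record of the same transaction
    page: str = ""
    before: int = 0           # BFIM
    after: int = 0            # AFIM
    undo_next_lsn: Optional[int] = None  # CLR only: next record still to undo


def undo_transaction(log, db, txn, max_steps=None):
    """Roll back txn, writing a CLR for each update it undoes."""
    by_lsn = {r.lsn: r for r in log}
    last = max((r for r in log if r.txn == txn), key=lambda r: r.lsn)
    # If the last record is a CLR, skip everything it already compensated.
    next_lsn = last.undo_next_lsn if last.kind == "CLR" else last.lsn
    steps = 0
    while next_lsn is not None:
        if max_steps is not None and steps == max_steps:
            return                                  # simulated crash
        rec = by_lsn[next_lsn]
        if rec.kind == "CLR":
            next_lsn = rec.undo_next_lsn
            continue
        db[rec.page] = rec.before                   # restore the BFIM
        clr = LogRecord(lsn=log[-1].lsn + 1, txn=txn, kind="CLR",
                        prev_lsn=last.lsn, page=rec.page, after=rec.before,
                        undo_next_lsn=rec.prev_lsn)
        log.append(clr)                             # log the undo itself
        by_lsn[clr.lsn] = clr
        last = clr
        next_lsn = rec.prev_lsn
        steps += 1


log = [
    LogRecord(1, "T1", "update", None, "P1", before=10, after=11),
    LogRecord(2, "T1", "update", 1, "P2", before=20, after=21),
]
db = {"P1": 11, "P2": 21}
undo_transaction(log, db, "T1", max_steps=1)  # crash after undoing LSN 2
undo_transaction(log, db, "T1")               # restart: resumes at LSN 1
print(db)                                     # {'P1': 10, 'P2': 20}
print([(r.lsn, r.undo_next_lsn) for r in log if r.kind == "CLR"])  # [(3, 1), (4, None)]
```

After the restart, only one more CLR is written: the update at LSN 2 is not undone a second time.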

46
ARIES

• ARIES consists of three phases:

  ◦ Analysis:
During analysis ARIES identifies
i) dirty (modified) pages in the buffer,
ii) active (uncommitted) transactions at the time of the crash,
iii) the appropriate point in the disk log from which redo needs to start.
  ◦ Redo:
Necessary redo operations are applied by reading the log in the forward direction.
  ◦ Undo:
The log is scanned in the backward direction and the changes made by transactions active at the time of the crash are undone.
Recovery completes at the end of the undo phase.

47
Data structures required in ARIES

• Log table

• Transaction table (TT)

• Dirty page table (DPT)

48
Log table, TT & DPT

Log table at system crash

TT & DPT at checkpoint

49

Transaction table & Dirty page table
after analysis phase

Log table at system crash

TT & DPT after Analysis phase

50

Record in log table

• Log sequence number (LSN)
• Previous LSN of the transaction
• Transaction id
• Type of log record (update, commit, end)
• Page id (address of the modified disk block)
• BFIM
• AFIM
• Size of data item
• Offset of data item inside the block
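The fields above map naturally onto a record type. In this Python sketch the class `ARIESLogRecord` and the helper `chain_of` are hypothetical; the helper shows what the "previous LSN" field is for: following it backward visits one transaction's records without touching those of other transactions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ARIESLogRecord:
    lsn: int                        # log sequence number
    prev_lsn: Optional[int]         # previous LSN of the same transaction
    txn_id: str
    kind: str                       # "update", "commit", "end", ...
    page_id: Optional[int] = None   # address of the modified disk block
    bfim: Optional[bytes] = None
    afim: Optional[bytes] = None
    size: int = 0                   # size of the data item in bytes
    offset: int = 0                 # offset of the data item inside the block


def chain_of(log, last_lsn):
    """Follow prev_lsn links backward from a transaction's last record."""
    by_lsn = {r.lsn: r for r in log}
    chain, lsn = [], last_lsn
    while lsn is not None:
        chain.append(lsn)
        lsn = by_lsn[lsn].prev_lsn
    return chain


log = [
    ARIESLogRecord(1, None, "T1", "update", page_id=7, bfim=b"a", afim=b"b", size=1, offset=0),
    ARIESLogRecord(2, None, "T2", "update", page_id=9, bfim=b"x", afim=b"y", size=1, offset=4),
    ARIESLogRecord(3, 2, "T2", "update", page_id=7, bfim=b"c", afim=b"d", size=1, offset=2),
    ARIESLogRecord(4, 1, "T1", "commit"),
]
print(chain_of(log, 4))  # [4, 1]
print(chain_of(log, 3))  # [3, 2]
```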

51
Transaction table & Dirty page table

• The Transaction table (TT) & Dirty page table (DPT) are written to disk during checkpointing for efficient recovery.

• TT contains the transaction id, the last LSN of the transaction (found in the log table) and the status of the transaction (committed / active).

• DPT contains the page id and the earliest LSN, i.e. the LSN of the earliest update that made the page dirty.

52
Checkpointing by ARIES

Checkpointing does the following:

• Write a [begin checkpoint] record to the log.
• Write an [end checkpoint] record to the log; the contents of TT & DPT are appended at the end of the log.
• Write the LSN of the [begin checkpoint] record to a special file.
• ARIES uses fuzzy checkpointing.
• Contents of the DBMS cache do not have to be flushed to the disk during a checkpoint.

53
Analysis phase

• Access the special file to find the location of the [begin checkpoint] record in the log.
• Start scanning the log from the [begin checkpoint] record.
• On reaching the [end checkpoint] record, read TT & DPT.
• Keep reading the log table.
• If the log contains an end record for a transaction T, remove the entry for T from TT.
• If a transaction T’ is not in TT, make an entry for it in TT.
• If a log record updates a page that is not in DPT, make an entry for that page in DPT, with the LSN of the log record as its earliest LSN.
• If a transaction is already in TT, overwrite its Last_LSN entry in TT with the LSN of the log record (and, for a commit record, set its status to committed).
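A Python sketch of the analysis pass. Assumptions: log records are (lsn, txn, kind, page) tuples, TT maps a transaction id to its Last_LSN and status, DPT maps a page id to its earliest LSN, and the tables saved at the checkpoint are passed in before scanning from [begin checkpoint]. `analysis` is a hypothetical function, and the example log is invented for illustration.

```python
def analysis(log, start, tt, dpt):
    """ARIES analysis pass (sketch). log: list of (lsn, txn, kind, page) tuples.
    start: index of the [begin checkpoint] record; tt/dpt: tables saved at checkpoint."""
    for lsn, txn, kind, page in log[start:]:
        if kind in ("begin_checkpoint", "end_checkpoint"):
            continue
        if kind == "end":
            tt.pop(txn, None)                     # T finished: remove it from TT
            continue
        entry = tt.setdefault(txn, {"last_lsn": lsn, "status": "active"})
        entry["last_lsn"] = lsn                   # overwrite Last_LSN
        if kind == "commit":
            entry["status"] = "committed"
        if kind == "update" and page not in dpt:
            dpt[page] = lsn                       # earliest LSN that dirtied the page
    return tt, dpt


log = [
    (1, "T1", "update", "C"),
    (2, "T2", "update", "B"),
    (3, "T1", "commit", None),
    (4, None, "begin_checkpoint", None),
    (5, None, "end_checkpoint", None),
    (6, "T1", "end", None),
    (7, "T3", "update", "A"),
    (8, "T2", "update", "C"),
    (9, "T2", "commit", None),
]
tt = {"T1": {"last_lsn": 3, "status": "committed"},
      "T2": {"last_lsn": 2, "status": "active"}}
dpt = {"C": 1, "B": 2}
analysis(log, 3, tt, dpt)
print(tt)   # {'T2': {'last_lsn': 9, 'status': 'committed'}, 'T3': {'last_lsn': 7, 'status': 'active'}}
print(dpt)  # {'C': 1, 'B': 2, 'A': 7}
```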

54
Redo phase

• M = min(DPT.LSN), the smallest earliest LSN in DPT.

• Start scanning the log table forward from LSN = M.

• For each update log record in the log table:
  ◦ if the page is not in DPT, then NO REDO; or
  ◦ if the LSN of the log record (N) < the LSN of the page in DPT, then NO REDO.
• If neither of these two conditions holds, the redo pass fetches the page from disk, and if the LSN stored on the disk page (the LSN of the last update reflected in it) is less than N, redo is performed.
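A Python sketch of the redo pass. Assumptions: log records are (lsn, txn, kind, page, afim) tuples, and each disk page is modeled as a dict holding its value and the LSN of the last update it reflects (the page LSN, which ARIES compares against N in the final check). `redo_phase` is a hypothetical function; DPT and the log continue the invented analysis example.

```python
def redo_phase(log, dpt, disk_pages):
    """ARIES redo pass (sketch). disk_pages: page -> {"page_lsn": int, "value": ...}."""
    redone = []
    if not dpt:
        return redone
    m = min(dpt.values())                       # M = min(DPT.LSN)
    for lsn, _txn, kind, page, afim in log:
        if lsn < m or kind != "update":
            continue
        if page not in dpt or lsn < dpt[page]:
            continue                            # NO REDO
        disk_page = disk_pages[page]            # fetch the page from disk
        if disk_page["page_lsn"] >= lsn:
            continue                            # update already on the disk page
        disk_page["value"] = afim               # reapply the AFIM
        disk_page["page_lsn"] = lsn
        redone.append(lsn)
    return redone


log = [
    (1, "T1", "update", "C", "c1"),
    (2, "T2", "update", "B", "b2"),
    (3, "T1", "commit", None, None),
    (4, None, "begin_checkpoint", None, None),
    (5, None, "end_checkpoint", None, None),
    (6, "T1", "end", None, None),
    (7, "T3", "update", "A", "a7"),
    (8, "T2", "update", "C", "c8"),
    (9, "T2", "commit", None, None),
]
dpt = {"C": 1, "B": 2, "A": 7}
disk_pages = {"A": {"page_lsn": 0, "value": "a0"},
              "B": {"page_lsn": 0, "value": "b0"},
              "C": {"page_lsn": 1, "value": "c1"}}
redone = redo_phase(log, dpt, disk_pages)
print(redone)  # [2, 7, 8]
```

LSN 7 belongs to T3, which never committed; it is redone anyway (repeating history) and will be removed by the undo phase.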

55
Undo phase

• Undo is performed for active transactions only.

• Find the highest Last_LSN among the active transactions in TT.

• Start from this LSN and move backward in the log table, undoing changes until every action of the set of active transactions has been undone.
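A Python sketch of the undo pass. Assumptions: log records are (lsn, txn, kind, page, bfim, prev_lsn) tuples, TT is shaped as in the analysis sketch, and CLR writing is omitted for brevity (a real ARIES undo logs a CLR for every update it undoes). `undo_phase` is a hypothetical function. Keeping a set of pending LSNs and always taking the largest one interleaves the backward walks of all active transactions.

```python
def undo_phase(log, tt, db):
    """ARIES undo pass (sketch, no CLRs). Returns LSNs undone in order."""
    by_lsn = {rec[0]: rec for rec in log}
    to_undo = {e["last_lsn"] for e in tt.values() if e["status"] == "active"}
    undone = []
    while to_undo:
        lsn = max(to_undo)                 # highest LSN among active transactions
        to_undo.remove(lsn)
        _lsn, _txn, kind, page, bfim, prev_lsn = by_lsn[lsn]
        if kind == "update":
            db[page] = bfim                # restore the BFIM
            undone.append(lsn)
        if prev_lsn is not None:
            to_undo.add(prev_lsn)          # continue with this transaction's earlier record
    return undone


log = [
    (1, "T1", "update", "A", "a0", None),
    (2, "T2", "update", "B", "b0", None),
    (3, "T1", "update", "C", "c0", 1),
    (4, "T2", "commit", None, None, 2),
    (5, "T1", "update", "A", "a1", 3),
    (6, "T3", "update", "D", "d0", None),
]
tt = {"T1": {"last_lsn": 5, "status": "active"},
      "T2": {"last_lsn": 4, "status": "committed"},
      "T3": {"last_lsn": 6, "status": "active"}}
db = {"A": "a2", "B": "b1", "C": "c1", "D": "d1"}
undone = undo_phase(log, tt, db)
print(undone)  # [6, 5, 3, 1]
print(db)      # {'A': 'a0', 'B': 'b1', 'C': 'c0', 'D': 'd0'}
```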

56
Database Recovery
(Summary)
• Types of failure
• Transaction log
• Data update
• Data caching
• Transaction Rollback (undo) & Roll forward (redo)
• Write-ahead logging (WAL) protocol
• Checkpoint
• Steal/No-steal & Force/No-force
• Recovery scheme based on deferred update
• Recovery scheme based on immediate update
• ARIES

57
