
TECHNICAL TERMS

1. Transaction
A transaction is a unit of work performed within a database management system (or
similar system) against a database, treated in a coherent and reliable way independent of
other transactions.
2. Commit
A commit makes a set of tentative changes permanent, typically at the end of
a transaction.
3. Two-phase commit protocol:
Two-phase commit protocol (2PC) is a type of atomic commitment protocol (ACP). It is a
distributed algorithm that coordinates all the processes that participate in a distributed atomic
transaction on whether to commit or abort (roll back) the transaction (it is a specialized type of
consensus protocol).
4. Atomic Commit
An atomic commit is an operation in which a set of distinct changes is applied as a single
operation. If the changes are applied then the atomic commit is said to have succeeded.
5. ACID
ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee that
database transactions are processed reliably. In the context of databases, a single logical
operation on the data is called a transaction.
6. Durability
It means that once a transaction has been committed, it will remain so, even in the event of
power loss, crashes, or errors. In a relational database, for instance, once a group of SQL
statements execute, the results need to be stored permanently (even if the database crashes
immediately thereafter).
7. Consistency
The consistency property ensures that any transaction will bring the database from one valid
state to another. Any data written to the database must be valid according to all defined rules,
including but not limited to constraints, cascades, triggers, and any combination.

8. Save points
Save points are employed either when a game state is too complex to save at any given point or
as a way to manage the difficulty level (e.g. a save point located before a difficult area, or one
single save point before a set of difficult bosses rather than in between).
9. save states
A save state is a form of saved game in arcade and console emulators. A save state is generated
when the emulator stores the contents of random-access memory of an emulated video game to
disk. Save states are comparable to snapshots in hardware virtualization or Hibernation in
computing.
10. Two-phase locking
Two-phase locking (2PL) is a concurrency control method that guarantees serializability. It is
also the name of the resulting set of database transaction schedules (histories).
11. Isolation
Isolation is a property that defines how/when the changes made by one operation become visible
to other concurrent operations. Isolation is one of the ACID (Atomicity, Consistency, Isolation,
and Durability) properties.
12. Serializability
A schedule (history) of concurrent transactions is serializable if it is equivalent to some serial
execution of the same transactions. Serializability theory provides the formal framework to
reason about and analyze serializability and its techniques.
13. Conflict-Serializability
It is defined by equivalence to a serial schedule (no overlapping transactions) with the same
transactions, such that both schedules have the same sets of respective chronologically ordered
pairs of conflicting operations (same precedence relations of respective conflicting operations).
14. Timestamp

Timestamp-based concurrency control algorithm is a non-lock concurrency control method. It is


used in some databases to safely handle transactions, using timestamps.

4.1 TRANSACTION CONCEPTS


A transaction is a logical unit of work. It begins with the execution of a BEGIN
TRANSACTION operation and ends with the execution of a COMMIT or ROLLBACK
operation
A Sample Transaction (Pseudo code)

BEGIN TRANSACTION;
UPDATE ACC123 (BALANCE := BALANCE - $100);
IF any error occurred THEN GOTO UNDO; END IF;
UPDATE ACC456 (BALANCE := BALANCE + $100);
IF any error occurred THEN GOTO UNDO; END IF;
GOTO FINISH;
UNDO:
ROLLBACK;
FINISH:
RETURN;

It is not a single atomic operation; it involves two separate updates on the database.

A transaction involves a sequence of database update operations.

The purpose of this transaction is to transform one correct state of the database into
another correct state, without necessarily preserving correctness at all intermediate points.

Transaction management guarantees a correct transaction and maintains the database
in a correct state.

It guarantees that if the transaction executes some updates and then a failure occurs
before the transaction reaches its planned termination, then those updates will be
undone.

Thus the transaction is either executed in its entirety or totally cancelled.

The system component that provides this atomicity is called the transaction manager,
transaction processing monitor, or TP monitor.

ROLLBACK and COMMIT are key to the way it works.

1. COMMIT:

The COMMIT operation signals successful end of transaction.

It tells the transaction manager that a logical unit of work has been successfully
completed, the database is in a correct state, and the updates can be recorded or saved.

2. ROLLBACK:
a. By contrast, the ROLLBACK operation signals unsuccessful end of transaction.
b. It tells the transaction manager that something has gone wrong: the database might be
in an incorrect state, and all the updates made by the transaction should be undone.
3. IMPLICIT ROLLBACK:

An explicit ROLLBACK cannot be issued in all cases of transaction failure or error, so
the system issues an implicit ROLLBACK for any transaction failure.

If the transaction does not reach its planned termination, it is rolled back; otherwise
it is committed.

4. MESSAGE HANDLING:
A typical transaction will not only update the database; it will also send some kind of message
back to the end user indicating what has happened.
Example: "Transfer done" if the COMMIT is reached, or "Error: transfer not done" otherwise.
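The transfer transaction and its message handling can be sketched in Python using the standard sqlite3 module; the table layout, account names, and message strings here are illustrative, not any particular product's.

```python
import sqlite3

# In-memory database with two illustrative accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acc TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("ACC123", 1000), ("ACC456", 2000)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work:
    COMMIT on success, ROLLBACK (undoing both updates) on any error."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE acc = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE acc = ?",
                     (amount, dst))
        conn.commit()          # successful end of transaction
        return "Transfer done"
    except sqlite3.Error:
        conn.rollback()        # unsuccessful end: undo tentative updates
        return "Error: transfer not done"

print(transfer(conn, "ACC123", "ACC456", 100))              # Transfer done
print(conn.execute(
    "SELECT balance FROM accounts ORDER BY acc").fetchall())  # [(900,), (2100,)]
```

Note that the rollback branch undoes both updates together, matching the UNDO branch of the pseudocode above.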

5. RECOVERY LOG:

The system maintains a log or journal on disk, on which all particulars about the updates
are recorded.

The values before and after an update are called the before and after images.

This log is used to bring the database back to a previous state when an undo
operation is required.

The log consists of two portions:

a. an active or online portion
b. an archive or offline portion.

The online portion is used during normal system operation to record details of updates
as they are performed, and it is normally kept on disk.

When the online portion becomes full, its contents are transferred to the offline portion,
which can be kept on tape.

6. STATEMENT ATOMICITY:
The system should guarantee that individual statement execution must be atomic
7. PROGRAM EXECUTION IS A SEQUENCE OF TRANSACTIONS:

COMMIT and ROLLBACK terminate the transaction, not the application program.

A single program execution will consist of a sequence of several transactions running one
after another.

8. NO NESTED TRANSACTIONS:

An application program can execute a BEGIN TRANSACTION statement only when it
has no transaction currently in progress; i.e., no transaction has other transactions
nested inside itself.

9. CORRECTNESS:

Consistent means not violating any known integrity constraint.

Consistency and correctness of the system should be maintained.

If T is a transaction that transforms the database from state D1 to state D2, and if D1 is
correct, then D2 is correct as well.

10. MULTIPLE ASSIGNMENTS:

Multiple assignment allows any number of individual assignments (i.e., updates) to be
performed simultaneously.

Example:
UPDATE ACC123 {BALANCE := BALANCE - $100},
UPDATE ACC456 {BALANCE := BALANCE + $100}

A multiple assignment would make the statement atomic.

Current products do not support multiple assignment.

4.2. TRANSACTION RECOVERY

A transaction begins by executing a BEGIN TRANSACTION operation and ends by
executing either a COMMIT or a ROLLBACK operation.

COMMIT establishes a commit point or synch point.

A commit point corresponds to the successful end of a transaction and the database will
be in a correct state.

ROLLBACK rolls the database back to the previous commit point.

There will be several transactions executing in parallel in a database.

When a commit point is established:


1. When a program is committed, the changes are made permanent; i.e., they are guaranteed to
be recorded in the database. Prior to the commit point, updates are tentative; i.e., they can
subsequently be undone.
2. All database positioning is lost and all tuple locks are released.
Database positioning means that at the time of execution each program typically has
addressability to certain tuples in the database; this addressability is lost at a COMMIT point.

Transactions are not only a unit of work but also a unit of recovery.

If a transaction successfully commits, then its updates will be permanently
recorded in the database, even if the system crashes the very next moment.

If the system crashes before the updates are written physically to the database, the
system's restart procedure will still record those updates in the database.

The values can be discovered from the relevant records in the log.

The log must be physically written before the COMMIT processing can complete. This is
called write-ahead log rule.

The restart procedure helps in recovering any transactions that completed
successfully but whose updates were not physically written prior to the crash.


Implementation issues

1. Database updates are kept in buffers in main memory and not physically written to disk until
the transaction commits. That way, if the transaction terminates unsuccessfully, there will be no
need to undo any disk updates.
2. Database updates are physically written to the disk after COMMIT operation. That way, if the
system subsequently crashes, there will be no need to redo any disk updates.

If there is not enough buffer space, the system may steal buffer space from
another transaction (writing uncommitted updates to disk). It may also force
updates to be written physically at the time of COMMIT.

The write-ahead log rule is elaborated as follows:


1. The log record for a given database update must be physically written to the log before that
update is physically written to the database.
2. All other log records for a given transaction must be physically written to the log before the
COMMIT log record for that transaction is physically written to the log.
3. COMMIT processing for a given transaction must not complete until the COMMIT log record
for that transaction is physically written to the log.
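The three rules can be illustrated with a minimal sketch; the log, buffer, and record formats here are assumptions for illustration, not a real DBMS log.

```python
# Minimal write-ahead-log sketch: log records are forced before the
# corresponding database write, and the COMMIT record is forced last.
log = []          # stands in for the physical log on stable storage
database = {}     # stands in for the physical database on disk
buffers = {}      # in-memory buffer of uncommitted updates

def update(tid, key, value):
    buffers[(tid, key)] = value

def commit(tid):
    # Rules 1 and 2: write all of the transaction's update records first...
    for (t, key), value in list(buffers.items()):
        if t == tid:
            log.append(("UPDATE", tid, key, value))   # log before data
            database[key] = value                     # ...then the update
            del buffers[(t, key)]
    # Rule 3: COMMIT completes only after its log record is written.
    log.append(("COMMIT", tid))

update("T1", "A", 950)
update("T1", "B", 2050)
commit("T1")
print(log[-1])    # ('COMMIT', 'T1') -- written after all update records
```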
4.3. ACID PROPERTIES
ACID stands for Atomicity, Consistency (also called Correctness), Isolation, and Durability.
ATOMICITY: Transactions are atomic.
Consider the following example
Transaction to transfer $50 from account A to account B:

Read(X), which transfers the data item X from the database to a local buffer
belonging to the transaction that executed the read operation.

Write(X), which transfers the data item X from the local buffer of the transaction that
executed the write back to the database.

Before the execution of transaction Ti the values of accounts A and B are $1000 and
$2000, respectively.

Suppose the transaction fails due to a power failure, hardware failure, or system
error, so that Ti does not execute successfully.

If the failure happens after the write(A) operation but before the write(B) operation,
the database will have the values $950 and $2000.

The system has destroyed $50 as a result of the failure, leaving the database in an
inconsistent state.

The basic idea of atomicity is this: the database system keeps track of the old values of
any data on which a transaction performs a write; if the transaction does not terminate
successfully, the database system restores the old values.

Atomicity is handled by the transaction-management component.
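The before-image idea can be sketched as a toy model (all names are illustrative):

```python
# Toy atomicity sketch: record old values ("before images") on every
# write, and restore them if the transaction fails mid-way.
db = {"A": 1000, "B": 2000}
before_images = {}

def write(key, value):
    before_images.setdefault(key, db[key])   # keep the first old value
    db[key] = value

def rollback():
    db.update(before_images)                 # restore old values
    before_images.clear()

write("A", 950)            # debit A...
rollback()                 # ...failure before write(B): undo everything
print(db)                  # {'A': 1000, 'B': 2000} -- no money destroyed
```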

CORRECTNESS/ CONSISTENCY:

Transactions transform a correct state of the database into another correct state,
without necessarily preserving correctness at all intermediate points. In our example,
the database is in a consistent state if the sum of A and B is unchanged by the
execution of the transaction.

ISOLATION:

Transactions are isolated from one another.

Even though there are many transactions running concurrently, any given transaction's
updates are concealed from all the rest until that transaction commits.

The database will be temporarily inconsistent while the transaction is in progress:
when the amount has been deducted from A but not yet added to B, the database is
inconsistent.

If a second concurrently running transaction reads A and B at this intermediate point and
computes A+B, it will observe an inconsistent value.

If the second transaction performs updates on A and B based on the inconsistent values
that it read, the database will remain inconsistent even after both transactions are
completed.

In order to avoid this problem, serial execution of transactions is preferred.

The concurrency-control component maintains the isolation of transactions.


DURABILITY:

Once a transaction commits, its updates persist in the database, even if there is a
subsequent system crash.

The computer system failure may lead to loss of data in main memory, but data written to
disk are not lost.

Durability is guaranteed by ensuring the following:

The updates carried out by the transaction should be written to disk.

The information stored on disk should be sufficient to enable the database system to
reconstruct the updates when it restarts after a failure.

The recovery-management component is responsible for ensuring durability.

4.4. SYSTEM RECOVERY

The system must be recovered not only from purely local failures, such as the failure of an
individual transaction, but also from global failures.

A local failure affects only the transaction in which the failure has actually occurred.

A global failure affects all of the transactions in progress at the time of the failure.

The failures fall into two broad categories:


1. System failures (e.g., power outage), which affect all transactions currently in progress
but do not physically damage the database. A system failure is sometimes called a soft crash.
2. Media failures (e.g., head crash on the disk), which cause damage to the database or some
portion of it. A media failure is sometimes called a hard crash.
System failure and recovery

During system failures the contents of main memory is lost.

The transactions in progress at the time of the failure will not have completed
successfully, so they must be undone, i.e., rolled back, when the system restarts.

It is also necessary at restart time to redo certain transactions that did complete
successfully prior to the crash but did not manage to get their updates transferred from the
buffers in main memory to the physical database.

Whenever some prescribed number of records has been written to the log the system
automatically takes a checkpoint.
The checkpoint record contains a list of all transactions that were in progress at the time
the checkpoint was taken.

To see how a check point works consider the following

Figure: 4.1 Check Point

A system failure has occurred at time tf.

The most recent checkpoint prior to time tf was taken at time tc.

Transactions of type T1 completed (successfully) prior to time tc.

Transactions of type T2 started prior to time tc and completed (successfully) after time tc
and before time tf.

Transactions of type T3 also started prior to time tc but did not complete by time tf.

Transactions of type T4 started after time tc and completed (successfully) before time tf.

Finally, transactions of type T5 also started after time tc but did not complete by time tf.

The transactions of types T3 and T5 must be undone, and transactions of types T2 and T4
must be redone. At restart time, the system first goes through the following procedure.

1. Start with two lists of transactions, the UNDO list and the REDO list.
2. Set the UNDO list equal to the list of all transactions given in the most recent checkpoint
record and the REDO list to empty.
3. Search forward through the log, starting from the checkpoint record.
4. If a BEGIN TRANSACTION log record is found for transaction T, add T to the UNDO
list.
5. If a COMMIT log record is found for transaction T, move T from the UNDO list to the
REDO list.
6. When the end of the log is reached, the UNDO and REDO lists are identified.
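The six-step restart scan can be sketched as follows (the log-record shapes are assumptions):

```python
# Sketch of the restart procedure: build the UNDO and REDO lists from the
# log, scanning forward from the most recent checkpoint record.
def restart_lists(log, checkpoint_active):
    undo = set(checkpoint_active)   # step 2: transactions active at checkpoint
    redo = set()                    # step 2: REDO starts empty
    for kind, tid in log:           # step 3: scan forward from the checkpoint
        if kind == "BEGIN":         # step 4
            undo.add(tid)
        elif kind == "COMMIT":      # step 5
            undo.discard(tid)
            redo.add(tid)
    return undo, redo               # step 6

# T2 and T3 were active at checkpoint time tc; T4 and T5 start afterwards,
# matching the transaction types of Figure 4.1.
log_after_tc = [("COMMIT", "T2"), ("BEGIN", "T4"),
                ("BEGIN", "T5"), ("COMMIT", "T4")]
undo, redo = restart_lists(log_after_tc, ["T2", "T3"])
print(sorted(undo), sorted(redo))   # ['T3', 'T5'] ['T2', 'T4']
```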

The system now works backward through the log, undoing the transactions in the
UNDO list.

Then works forward, redoing the transactions in the REDO list.

Restoring the database to a correct state by redoing work is sometimes called forward
recovery.

Restoring the database to a correct state by undoing work is called backward


recovery.

When all recovery activity is complete, then the system is ready to accept new work.

ARIES

Earlier recovery systems performed UNDO before REDO operations.

The ARIES scheme performs REDO before UNDO.

ARIES operates in three broad phases:

1. Analysis: Build the REDO and UNDO lists.


2. Redo: Starting from a position in the log determined in the analysis phase, restore the
database to the state it was in at the time of the crash.
3. Undo: Undo the effects of transactions that failed to commit.
The name ARIES stands for Algorithms for Recovery and Isolation Exploiting Semantics.

4.5. TWO PHASE COMMIT


Two-phase commit is important whenever a given transaction can interact with several
independent resource managers.
Example,
Consider a transaction running on an IBM mainframe that updates both an IMS database
and a DB2 database. If the transaction completes successfully, then both IMS data and
DB2 data are committed.

Conversely, if the transaction fails, then both the updates must be rolled back.

It is not possible to commit one database update and roll back the other; if that
were done, atomicity would not be maintained in the system.

Therefore, the transaction issues a single global or system-wide COMMIT or
ROLLBACK. That COMMIT or ROLLBACK is handled by a system component called
the coordinator.

The coordinator's task is to guarantee that the resource managers either all commit or
all roll back.

It must provide that guarantee even if the system fails in the middle of the process.

The two-phase commit protocol is responsible for maintaining such a guarantee.

WORKING

Assume that the transaction has completed and a COMMIT is issued. On receiving the
COMMIT request, the coordinator goes through the following two-phase process:

In the first phase, each resource manager must get ready to go either way on the
transaction.

Each participant in the transaction records all updates performed during the
transaction from temporary storage to permanent storage, so that it can perform
either a COMMIT or a ROLLBACK as necessary.

Each resource manager then replies OK to the coordinator if the writes succeeded,
or NOT OK otherwise.

COMMIT/ROLLBACK:

In the second phase, when the coordinator has received replies from all participants, it
makes a decision regarding the transaction and records it in the physical log.

If all replies were OK, the decision is commit; if any reply was NOT OK,
the decision is rollback.

The coordinator informs its decision to all the participants.

Each participant must then commit or roll back the transaction locally, as instructed by
the coordinator.

If the system fails at some point during the process, the restart procedure looks for the
coordinator's decision.

If the decision is found, the two-phase commit process can resume from where it
left off.

If the decision is not found, the restart procedure assumes the decision was ROLLBACK and
completes accordingly.

If the participants belong to several systems, as in a distributed system, some
participants may have to wait a long time for the coordinator's decision.

The data communications manager (DC manager) can also act as a resource manager in a
two-phase commit process.
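The coordinator's two phases can be sketched as follows; the Participant interface (prepare/commit/rollback) is an assumed simplification of a real resource manager.

```python
# Two-phase commit sketch: phase 1 collects votes, phase 2 broadcasts
# the decision; the decision is logged before phase 2 begins.
def two_phase_commit(participants, log):
    # Phase 1: ask every resource manager to prepare ("get ready").
    votes = [p.prepare() for p in participants]
    # Decision: commit only if every reply was OK.
    decision = "COMMIT" if all(votes) else "ROLLBACK"
    log.append(decision)            # forced to the log before phase 2
    # Phase 2: tell every participant the outcome.
    for p in participants:
        p.commit() if decision == "COMMIT" else p.rollback()
    return decision

class Participant:
    def __init__(self, ok=True):
        self.ok, self.state = ok, "active"
    def prepare(self):
        return self.ok              # OK / NOT OK vote
    def commit(self):
        self.state = "committed"
    def rollback(self):
        self.state = "rolled back"

log = []
ps = [Participant(), Participant()]
print(two_phase_commit(ps, log))            # COMMIT
bad = [Participant(), Participant(ok=False)]
print(two_phase_commit(bad, log))           # ROLLBACK: one vote was NOT OK
```

If the system crashes after the decision is appended to the log, a restart procedure can re-read it and finish phase 2, which is the guarantee described above.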

4.6. SAVE POINTS

Transactions cannot be nested within other transactions.

Transactions cannot be broken down into smaller subtransactions.

However, a transaction can establish intermediate save points while executing.

If a rollback is required, instead of rolling back all the way to the beginning of the
transaction, we can roll back to a previous save point.

Establishing a save point is not the same as performing a COMMIT: updates made by the
transaction are still not visible to other transactions until the transaction successfully
executes a COMMIT.

4.7. MEDIA RECOVERY

Media recovery is different from transaction and system recovery.

A media failure is a failure such as a disk head crash or a disk controller failure in which
some portion of the database has been physically destroyed.

Recovery from such a failure basically involves reloading or restoring the database from
a backup or dump copy and then using the log.

There is no need to undo transactions that were still in progress at the time of the failure.

The dump portion of a dump/restore utility is used to make backup copies of the database
on demand.

Such copies can be kept on tape or other archival storage, it is not necessary that they be
on direct access media.

After a media failure, the restore portion of the utility is used to recreate the database
from a specified backup copy.

4.8. SQL FACILITIES FOR RECOVERY

SQL supports transactions and transaction-based recovery.

All executable SQL statements are atomic except CALL and RETURN.

SQL provides analogs of BEGIN TRANSACTION, COMMIT, and ROLLBACK, called
START TRANSACTION, COMMIT WORK, and ROLLBACK WORK, respectively.

Syntax for START TRANSACTION:


START TRANSACTION <option comma list>;
The <option comma list> specifies an access mode, an isolation level, or both.
The access mode is either READ ONLY or READ WRITE; if neither is specified, READ WRITE
is assumed.
If READ WRITE is specified, the isolation level must not be READ UNCOMMITTED.

The isolation level takes the form ISOLATION LEVEL <isolation>, where <isolation>
can be READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, or
SERIALIZABLE.

The syntax for COMMIT and ROLLBACK is:

COMMIT [WORK] [AND [NO] CHAIN];
ROLLBACK [WORK] [AND [NO] CHAIN];

AND CHAIN causes a START TRANSACTION to be executed automatically after the
COMMIT; AND NO CHAIN is the default.

A CLOSE is executed automatically for every open cursor, except for cursors declared
WITH HOLD.

A cursor declared WITH HOLD is not automatically closed at COMMIT.

SQL also supports save points.

Syntax: SAVEPOINT <save point name>;


This syntax creates a save point with the specified user-chosen name.
Syntax for roll back: ROLLBACK TO <save point name>;
This statement undoes all updates done since the specified save point.
Syntax for releasing save points: RELEASE <save point name>;
This statement drops the specified save point. All save points are automatically dropped
at transaction termination.
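These statements can be tried from Python with the standard sqlite3 module, whose savepoint syntax matches the forms shown above; the table and savepoint names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None         # issue transaction SQL statements ourselves
conn.execute("CREATE TABLE t (v INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SAVEPOINT sp1")       # establish an intermediate save point
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK TO sp1")     # undo only the updates made since sp1
conn.execute("RELEASE sp1")         # drop the save point
conn.execute("COMMIT")

print(conn.execute("SELECT v FROM t").fetchall())   # [(1,)] -- the 2 was undone
```

Note that ROLLBACK TO undoes the second insert but leaves the transaction open, so the first insert survives the final COMMIT.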
4.9 CONCURRENCY CONTROL - TYPES OF LOCKS
The concurrency-control schemes that we discuss in this chapter are all based on the
serializability property. That is, all the schemes presented here ensure that the schedules are
serializable. One way
to ensure serializability is to require that data items be accessed in a mutually exclusive manner;
that is, while one transaction is accessing a data item, no other transaction can modify that data
item. The most common method used to implement this requirement is to allow a transaction to
access a data item only if it is currently holding a lock on that item.
4.10 NEED FOR CONCURRENCY CONTROL
Transaction-processing systems usually allow multiple transactions to run concurrently.
Allowing multiple transactions to update data concurrently causes several complications with
consistency of the data, as we saw earlier. Ensuring consistency in spite of concurrent execution
of transactions requires extra work; it is far easier to insist that transactions run serially-that is,
one at a time, each starting only after the previous one has completed. However, there are two
good reasons for allowing concurrency:
IMPROVED THROUGHPUT AND RESOURCE UTILIZATION
A transaction consists of many steps. Some involve I/O activity; others involve CPU
activity. The CPU and the disks in a computer system can operate in parallel. Therefore, I/O
activity can be done in parallel with processing at the CPU. The parallelism of the CPU and the
I/O system can therefore be exploited to run multiple transactions in parallel. While a read or
write on behalf of one transaction is in progress on one disk, another transaction can be running
in the CPU, while another disk may be executing a read or write on behalf of a third transaction.
All of this increases the throughput of the system-that is, the number of transactions executed in
a given amount of time.

Correspondingly, the processor and disk utilization also increase; in other words, the
processor and disk spend less time idle, or not performing any useful work.
REDUCED WAITING TIME
There may be a mix of transactions running on a system, some short and some long. If
transactions run serially, a short transaction may have to wait for a preceding long transaction to
complete, which can lead to unpredictable delays in running a transaction. If the transactions are
operating on different parts of the database, it is better to let them run concurrently, sharing the
CPU cycles and disk accesses among them. Concurrent execution reduces the unpredictable
delays in
running transactions. Moreover, it also reduces the average response time: the average time for a
transaction to be completed after it has been submitted.
The motivation for using concurrent execution in a database is essentially the same as the
motivation for using multiprogramming in an operating system. When several transactions run
concurrently, database consistency can be destroyed despite the correctness of each individual
transaction. In this section, we present the concept of schedules to help identify those executions
that are guaranteed to ensure consistency.
The database system must control the interaction among the concurrent transactions to prevent
them from destroying the consistency of the database. It does so through a variety of mechanisms
called concurrency-control schemes. Consider again the simplified banking system, which has
several accounts, and a set of transactions that access and update those accounts. Let T1 and T2
be two transactions that transfer funds from one account to another. Transaction T1 transfers $50
from account A to account B. It is defined as
T1: read(A);
A := A-50;
write(A);
read(B);
B := B + 50;
write(B).
Transaction T2 transfers 10 percent of the balance from account A to account B. It is defined as

T2: read(A);
temp:= A * 0.1;
A :=A - temp;
write(A);
read(B);
B := B + temp;
write(B).
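The consistency requirement here is that the sum A + B is preserved. A quick check, with T1 and T2 written as Python functions over a dict of balances (a simplification of the read/write model above):

```python
# T1 and T2 from the text, operating on a shared dict of balances.
def t1(db):                  # transfer $50 from A to B
    a = db["A"] - 50
    db["A"] = a
    db["B"] = db["B"] + 50

def t2(db):                  # transfer 10 percent of A's balance to B
    temp = db["A"] * 0.1
    db["A"] = db["A"] - temp
    db["B"] = db["B"] + temp

db = {"A": 1000, "B": 2000}
total_before = db["A"] + db["B"]
t1(db)                       # serial execution: T1 then T2
t2(db)
print(db["A"] + db["B"] == total_before)   # True -- A + B is unchanged
```

Any serial order of T1 and T2 preserves the sum; it is only certain interleavings of their reads and writes that can destroy it, which is what concurrency control must prevent.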

4.11 LOCKING PROTOCOLS


There are various modes in which a data item may be locked. In this section, we restrict our
attention to two modes:

Shared. If a transaction Ti has obtained a shared-mode lock (denoted by S) on item Q,
then Ti can read, but cannot write, Q.

Exclusive. If a transaction Ti has obtained an exclusive-mode lock (denoted by X) on
item Q, then Ti can both read and write Q.

A transaction may be granted a lock on an item if the requested lock is compatible with
locks already held on the item by other transactions.

Any number of transactions can hold shared locks on an item, but if any transaction holds
an exclusive lock on the item, no other transaction may hold any lock on it.

If a lock cannot be granted, the requesting transaction is made to wait till all incompatible
locks held by other transactions have been released. The lock is then granted.
Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)

Locking as above is not sufficient to guarantee serializability: if A and B get updated
in between the read of A and the read of B, the displayed sum would be wrong.
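The compatibility rule (any number of shared locks may coexist, while an exclusive lock excludes everything) can be sketched as a small table; the names are illustrative.

```python
# Lock compatibility sketch: S is compatible with S; X with nothing.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_modes):
    """A lock is granted only if it is compatible with every lock
    already held on the item by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant("S", ["S", "S"]))   # True  -- shared locks coexist
print(can_grant("X", ["S"]))        # False -- must wait for the S lock
print(can_grant("S", ["X"]))        # False -- item is exclusively locked
```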
PITFALLS OF LOCK-BASED PROTOCOLS
Consider the partial schedule
Table: 4.1 Lock based protocols
T3                        T4
lock-X(B)
read(B)
B := B - 50
write(B)
                          lock-S(A)
                          read(A)
                          lock-S(B)
lock-X(A)

Neither T3 nor T4 can make progress: executing lock-S(B) causes T4 to wait for T3 to release
its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A. Such a
situation is called a deadlock. To handle a deadlock, one of T3 or T4 must be rolled back and its
locks released. The potential for deadlock exists in most locking protocols; deadlocks are a
necessary evil. Starvation is also possible if the concurrency-control manager is badly designed.
For example:

A transaction may be waiting for an X-lock on an item, while a sequence of other
transactions request and are granted an S-lock on the same item.

The same transaction may be repeatedly rolled back due to deadlocks.

The concurrency-control manager can be designed to prevent starvation.

4.12 TWO-PHASE LOCKING


This is a protocol which ensures conflict-serializable schedules.

Phase 1: Growing Phase

transaction may obtain locks

transaction may not release locks

Phase 2: Shrinking Phase

transaction may release locks

transaction may not obtain locks

The protocol assures serializability.

It can be proved that the transactions can be serialized in the order of their lock points (i.e.
the point where a transaction acquired its final lock).
Two-phase locking does not ensure freedom from deadlocks, and cascading rollback is
possible under two-phase locking. To avoid cascading rollback, a modified protocol called
strict two-phase locking can be followed: a transaction must hold all its exclusive locks till it
commits/aborts. Rigorous two-phase locking is even stricter: here all locks are held till
commit/abort. In this protocol transactions can be serialized in the order in which they
commit. There can be conflict-serializable schedules that cannot be obtained if two-phase
locking is used. However, in the absence of extra information (e.g., ordering of access to
data), two-phase locking is needed for conflict serializability in the following sense: given a
transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses
two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.
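The two-phase rule can be enforced mechanically: once a transaction releases any lock, it must never acquire another. A minimal sketch (the class and its methods are illustrative):

```python
# Two-phase locking sketch: a transaction tracks which phase it is in
# and refuses lock requests after its first unlock.
class TwoPhaseTxn:
    def __init__(self):
        self.growing = True     # phase 1 until the first release
        self.locks = set()

    def lock(self, item):
        if not self.growing:
            raise RuntimeError("2PL violation: lock after unlock")
        self.locks.add(item)    # growing phase: may obtain locks

    def unlock(self, item):
        self.growing = False    # lock point passed; shrinking phase begins
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock("A")
t.lock("B")     # still in the growing phase
t.unlock("A")   # shrinking phase starts here (the lock point is passed)
try:
    t.lock("C") # illegal under 2PL
except RuntimeError as e:
    print(e)    # 2PL violation: lock after unlock
```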

LOCK CONVERSIONS
Two-phase locking with lock conversions:

First Phase:

can acquire a lock-S on item

can acquire a lock-X on item

can convert a lock-S to a lock-X (upgrade)

Second Phase:

can release a lock-S

can release a lock-X

can convert a lock-X to a lock-S (downgrade)

This protocol assures serializability. But still relies on the programmer to insert the various
locking instructions.
Automatic Acquisition of Locks
A transaction Ti issues the standard read/write instruction, without explicit locking calls. The operation read(D) is processed as:

    if Ti has a lock on D then
        read(D)
    else begin
        if necessary, wait until no other transaction has a lock-X on D;
        grant Ti a lock-S on D;
        read(D)
    end

write(D) is processed as:

    if Ti has a lock-X on D then
        write(D)
    else begin
        if necessary, wait until no other transaction has any lock on D;
        if Ti has a lock-S on D then
            upgrade the lock on D to lock-X
        else
            grant Ti a lock-X on D;
        write(D)
    end

All locks are released after commit or abort.
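The read(D)/write(D) processing above can be sketched in Python. This is a hedged, single-threaded illustration in which "wait" is modeled by raising an exception; the class and method names are hypothetical:

```python
class SimpleLockManager:
    """Tracks S/X locks per item: item -> {transaction: mode}."""
    def __init__(self):
        self.table = {}

    def read(self, txn, item):
        holders = self.table.setdefault(item, {})
        if txn not in holders:
            # wait (here: fail) if another transaction holds lock-X on the item
            if any(m == "X" for t, m in holders.items() if t != txn):
                raise RuntimeError("would wait: lock-X held by another txn")
            holders[txn] = "S"       # grant Ti a lock-S, then read
        return f"read({item})"

    def write(self, txn, item):
        holders = self.table.setdefault(item, {})
        if holders.get(txn) != "X":
            # wait (here: fail) if any other transaction holds any lock
            if any(t != txn for t in holders):
                raise RuntimeError("would wait: item locked by another txn")
            holders[txn] = "X"       # grant lock-X, or upgrade an existing lock-S
        return f"write({item})"

    def release_all(self, txn):
        # all locks are released after commit or abort
        for holders in self.table.values():
            holders.pop(txn, None)

lm = SimpleLockManager()
lm.read("T1", "D")    # T1 is granted lock-S on D
lm.write("T1", "D")   # the lock-S is upgraded to lock-X
```

A real lock manager would block the requesting transaction instead of raising; the grant/upgrade logic, however, mirrors the pseudocode above step by step.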
IMPLEMENTATION OF LOCKING
A lock manager can be implemented as a separate process to which transactions send lock and unlock requests.

The lock manager replies to a lock request by sending a lock grant message (or a message asking the transaction to roll back, in case of a deadlock).

The requesting transaction waits until its request is answered. The lock manager maintains a data structure called a lock table to record granted locks and pending requests.

The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being locked.

LOCK TABLE
Black rectangles indicate granted locks, white ones indicate waiting requests. The lock table also records the type of lock granted or requested. A new request is added to the end of the queue of requests for the data item, and is granted if it is compatible with all earlier locks.

Figure: 4.2 Lock table

Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted. If a transaction aborts, all waiting or granted requests of the transaction are deleted; the lock manager may keep a list of locks held by each transaction, to implement this efficiently.
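A minimal sketch of such a lock table, assuming a hash table mapping each data item to a FIFO queue of requests (the names and the list encoding of a request are illustrative):

```python
from collections import deque

def compatible(m1, m2):
    # only shared locks are mutually compatible
    return m1 == "S" and m2 == "S"

class LockTable:
    def __init__(self):
        self.table = {}              # item -> deque of [txn, mode, granted]

    def request(self, txn, item, mode):
        q = self.table.setdefault(item, deque())
        # granted only if compatible with all earlier requests in the queue
        granted = all(compatible(mode, r[1]) for r in q)
        q.append([txn, mode, granted])
        return granted               # False means the request waits

    def unlock(self, txn, item):
        q = self.table.get(item, deque())
        # delete this transaction's request, then re-check later requests
        self.table[item] = q = deque(r for r in q if r[0] != txn)
        for i, r in enumerate(q):
            if not r[2] and all(compatible(r[1], p[1]) for p in list(q)[:i]):
                r[2] = True          # a waiting request can now be granted
```

The FIFO discipline matches the description above: new requests go to the end of the queue, and unlocking triggers a re-check of the requests behind the deleted one.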
GRAPH-BASED PROTOCOLS
Graph-based protocols are an alternative to two-phase locking. They impose a partial ordering → on the set D = {d1, d2, ..., dh} of all data items. If di → dj, then any transaction accessing both di and dj must access di before accessing dj. This implies that the set D may now be viewed as a directed acyclic graph, called a database graph. The tree protocol is a simple kind of graph protocol in which only exclusive locks are allowed.
The first lock by Ti may be on any data item. Subsequently, a data item Q can be locked by Ti only if the parent of Q is currently locked by Ti. Data items may be unlocked at any time. The tree protocol ensures conflict serializability as well as freedom from deadlock. Unlocking may occur earlier in the tree-locking protocol than in the two-phase locking protocol.

shorter waiting times, and increase in concurrency

protocol is deadlock-free, no rollbacks are required

the abort of a transaction can still lead to cascading rollbacks.

However, in the tree-locking protocol, a transaction may have to lock data items that it does
not access.

increased locking overhead, and additional waiting time

potential decrease in concurrency

Schedules not possible under two-phase locking are possible under tree protocol, and vice versa.
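The tree-protocol locking rule above can be sketched as a small check (`parent_of` is a hypothetical encoding of the database graph, and `held` is the set of items the transaction currently has locked):

```python
# The first lock may be on any item; each later item Q may be locked
# only while the transaction currently holds a lock on Q's parent.
def tree_can_lock(held, parent_of, q):
    if not held:                        # first lock: any data item
        return True
    return parent_of.get(q) in held     # parent of Q must currently be locked

# A small database graph: A is the root, B and C its children, D and E under B
parent_of = {"B": "A", "C": "A", "D": "B", "E": "B"}
held = set()
assert tree_can_lock(held, parent_of, "B")       # first lock is unrestricted
held.add("B")
assert tree_can_lock(held, parent_of, "D")       # parent B is held
assert not tree_can_lock(held, parent_of, "C")   # parent A is not held
```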
4.13 TIMESTAMP-BASED PROTOCOLS

The locking protocols that we have described thus far determine the order between every pair of conflicting transactions at execution time, by the first lock that both members of the pair request that involves incompatible modes. Another method for determining the serializability order is to select an ordering among transactions in advance. The most common method for doing so is to use a timestamp-ordering scheme.
TIMESTAMPS
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by
TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts
execution. If a
transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system,
then TS(Ti) < TS(Tj ). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction's timestamp is equal to the value of the counter when the transaction enters the system.

The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) < TS(Tj ),
then the system must ensure that the produced schedule is equivalent to a serial schedule in
which transaction Ti appears before transaction Tj . To implement this scheme, we associate with
each data item Q two timestamp values:

W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.

R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.

These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed. The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. Suppose a transaction Ti issues a read(Q):

If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.

If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
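The read(Q) test can be sketched as follows (the dict encoding of the item's two timestamps is illustrative):

```python
# Sketch of the read(Q) rule in the timestamp-ordering protocol.
# item["W"] is W-timestamp(Q); item["R"] is R-timestamp(Q).
def ts_read(ts_ti, item):
    if ts_ti < item["W"]:
        return "rollback"    # Q was already overwritten: reject the read, roll Ti back
    item["R"] = max(item["R"], ts_ti)
    return "read"

q = {"W": 10, "R": 7}
assert ts_read(5, q) == "rollback"            # TS(Ti) < W-timestamp(Q)
assert ts_read(12, q) == "read" and q["R"] == 12
```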

4.14 DEADLOCK HANDLING
A system is in a deadlock state if there exists a set of transactions such that every transaction in
the set is waiting for another transaction in the set. More precisely, there exists a set of waiting
transactions {T0, T1, . . ., Tn} such that T0 is waiting for a data item that T1 holds, T1 is waiting for a data item that T2 holds, and so on, with Tn-1 waiting for a data item that Tn holds, and Tn waiting for a data item that T0 holds. None of the transactions can make progress in such a situation.
The only remedy to this undesirable situation is for the system to invoke some drastic action,
such as rolling back some of the transactions involved in the deadlock. Rollback of a transaction
may be partial: That is, a transaction may be rolled back to the point where it obtained a lock
whose release resolves the deadlock.
There are two principal methods for dealing with the deadlock problem. We can use a deadlock
prevention protocol to ensure that the system will never enter a deadlock state. Alternatively, we

can allow the system to enter a deadlock state, and then try to recover by using a deadlock
detection and deadlock recovery scheme. As we shall see, both methods may result in
transaction rollback. Prevention is commonly used if the probability that the system would enter
a deadlock state is relatively high; otherwise, detection and recovery are more efficient. Note that
a detection and recovery scheme requires overhead that includes not only the run-time cost of
maintaining the necessary information and of executing the detection algorithm, but also the
potential losses inherent in recovery from a deadlock.
DEADLOCK PREVENTION
There are two approaches to deadlock prevention. One approach ensures that no cyclic waits can
occur by ordering the requests for locks, or requiring all locks to be acquired together. The

simplest scheme under the first approach requires that each transaction locks all its data items before it begins execution. Moreover, either all are locked in one step or none are locked. There are two main disadvantages to this protocol:

it is often hard to predict, before the transaction begins, what data items need to be
locked;

Data-item utilization may be very low, since many of the data items may be locked but
unused for a long time.

Another approach for preventing deadlocks is to impose an ordering of all data items, and to
require that a transaction lock data items only in a sequence consistent with the ordering. We
have seen one such scheme in the tree protocol, which uses a partial ordering of data items. A
variation of this approach is to use a total order of data items, in conjunction with two-phase locking. Once a transaction has locked a particular item, it cannot request locks on items that precede that item in the ordering. This scheme is easy to implement, as long as the set of data items accessed by a transaction is known when the transaction starts execution. There is no need to change the underlying concurrency-control system if two-phase locking is used: all that is needed is to ensure that locks are requested in the right order.
The second approach for preventing deadlocks is to use preemption and transaction
rollbacks. In preemption, when a transaction T2 requests a lock that transaction T1 holds, the
lock granted to T1 may be preempted by rolling back of T1, and granting of the lock to T2. To
control

the preemption, we assign a unique timestamp to each transaction. The system uses these
timestamps only to decide whether a transaction should wait or roll back. Locking is still used
for concurrency control. If a transaction is rolled back, it retains its old timestamp when
restarted. Two different deadlock prevention schemes using timestamps have been proposed:
1. The wait-die scheme is a non-preemptive technique. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp smaller than that of Tj (that is, Ti is older than Tj). Otherwise, Ti is rolled back (dies). For example, suppose that transactions T22, T23, and T24 have timestamps 5, 10, and 15, respectively. If T22 requests a data item held by T23, then T22 will wait. If T24 requests a data item held by T23, then T24 will be rolled back.
2. The wound-wait scheme is a preemptive technique. It is a counterpart to the wait-die scheme. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp larger than that of Tj (that is, Ti is younger than Tj). Otherwise, Tj is rolled back (Tj is wounded by Ti). Returning to our example, with transactions T22, T23, and T24: if T22 requests a data item held by T23, then the data item will be preempted from T23, and T23 will be rolled back. If T24 requests a data item held by T23, then T24 will wait.
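Both schemes reduce to a simple timestamp comparison; here is a sketch that reproduces the T22/T23/T24 example above (function names are illustrative):

```python
# Timestamp-based deadlock-prevention decisions.
def wait_die(ts_req, ts_holder):
    # non-preemptive: an older requester waits, a younger requester dies
    return "wait" if ts_req < ts_holder else "die"

def wound_wait(ts_req, ts_holder):
    # preemptive: an older requester wounds the holder, a younger requester waits
    return "wound" if ts_req < ts_holder else "wait"

T22, T23, T24 = 5, 10, 15                # timestamps from the example above
assert wait_die(T22, T23) == "wait"      # T22 (older) waits for T23
assert wait_die(T24, T23) == "die"       # T24 (younger) is rolled back
assert wound_wait(T22, T23) == "wound"   # T23 is wounded and rolled back
assert wound_wait(T24, T23) == "wait"    # T24 (younger) waits
```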
Whenever the system rolls back transactions, it is important to ensure that there is no starvation: that is, no transaction gets rolled back repeatedly and is never allowed to make progress. Both the wound-wait and the wait-die schemes avoid starvation: at any time, there is a transaction with the smallest timestamp. This transaction cannot be required to roll back in either scheme. Since timestamps always increase, and since transactions are not assigned new timestamps when they are rolled back, a transaction that is rolled back repeatedly will eventually have the smallest timestamp, at which point it will not be rolled back again. There are, however, significant differences in the way that the two schemes operate. In the wait-die scheme, an older transaction must wait for a younger one to release its data item. Thus, the older the transaction gets, the more it tends to wait. By contrast, in the wound-wait scheme, an older transaction never waits for a younger transaction. In the wait-die scheme, if a transaction Ti dies and is rolled back because it requested a data item held by transaction Tj, then Ti may reissue the same sequence of requests when it is restarted. If the data item is still held by Tj, then Ti will die again. Thus, Ti may die several times before acquiring the needed data item. Contrast this series of events with what happens in the wound-wait scheme. Transaction Ti is wounded and rolled back because Tj requested a data item that it holds. When Ti is restarted and requests the data item now being held by Tj, Ti waits. Thus, there may be fewer rollbacks in the wound-wait scheme. The major problem with both of these schemes is that unnecessary rollbacks may occur.
Another simple approach to deadlock handling is based on lock timeouts. In this approach, a
transaction that has requested a lock waits for at most a specified amount of time. If the lock
has not been granted within that time, the transaction is said to time out, and it rolls itself
back and restarts. If there was in fact a deadlock, one or more transactions involved in the
deadlock will time out and roll back, allowing the others to proceed. This scheme falls
somewhere between deadlock prevention, where a deadlock will never occur, and deadlock
detection and recovery. The timeout scheme is particularly easy to implement, and works
well if transactions are short and if long waits are likely to be due to deadlocks. However, in
general it is hard to decide how long a transaction must wait before timing out. Too long a
wait results in unnecessary delays once a deadlock has occurred. Too short a wait results in
transaction rollback even when there is
no deadlock, leading to wasted resources. Starvation is also a possibility with this scheme.
Hence, the timeout-based scheme has limited applicability.
DEADLOCK DETECTION AND RECOVERY
If a system does not employ some protocol that ensures deadlock freedom, then a detection and
recovery scheme must be used. An algorithm that examines the state of the system is invoked
periodically to determine whether a deadlock has occurred. If one has, then the system must
attempt to recover from the deadlock. To do so, the system must:

Maintain information about the current allocation of data items to transactions, as well as any outstanding data-item requests.

Provide an algorithm that uses this information to determine whether the system has entered a deadlock state.

Recover from the deadlock when the detection algorithm determines that a deadlock exists.

Deadlocks can be described in terms of a wait-for graph, which consists of a pair G = (V, E), where V is a set of vertices (all the transactions in the system) and E is a set of edges; each element of E is an ordered pair Ti → Tj.

If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.

When Ti requests a data item currently being held by Tj, the edge Ti → Tj is inserted in the wait-for graph.

This edge is removed only when Tj is no longer holding a data item needed by Ti.

The system is in a deadlock state if and only if the wait-for graph contains a cycle. The system must invoke a deadlock-detection algorithm periodically to look for cycles.
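Cycle detection in the wait-for graph can be sketched with a depth-first search (the adjacency-dict encoding is illustrative):

```python
# Deadlock detection: search for a cycle in the wait-for graph,
# given as a dict mapping each transaction to those it waits for.
def has_deadlock(wait_for):
    visiting, done = set(), set()

    def dfs(t):
        visiting.add(t)
        for u in wait_for.get(t, ()):
            if u in visiting:            # back edge: a cycle exists
                return True
            if u not in done and dfs(u):
                return True
        visiting.discard(t)
        done.add(t)
        return False

    return any(t not in done and dfs(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1: a cycle, hence a deadlock
assert has_deadlock({"T1": ["T2"], "T2": ["T1"]})
# A simple waiting chain with no cycle: no deadlock
assert not has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": []})
```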

Figure: 4.3 Wait-for graph without a cycle

Figure: 4.4 Wait-for graph with a cycle

When deadlock is detected, some transaction will have to be rolled back (made a victim) to break the deadlock. The transaction chosen as victim should be the one that incurs minimum cost. Rollback is done by determining how far to roll the transaction back. A total rollback aborts the transaction and then restarts it. It is more effective to roll back the transaction only as far as necessary to break the deadlock.
Starvation happens if the same transaction is always chosen as victim. We can include the number of rollbacks in the cost factor to avoid starvation.
INSERT AND DELETE OPERATIONS
If two-phase locking is used, a delete operation may be performed only if the transaction deleting the tuple has an exclusive lock on the tuple to be deleted. A transaction that inserts a new tuple into the database is given an X-mode lock on the tuple. Insertions and deletions can lead to the phantom phenomenon. A transaction that scans a relation (e.g., find all accounts in Perryridge) and a transaction that inserts a tuple in the relation (e.g., insert a new account at Perryridge) may conflict in spite of not accessing any tuple in common. If only tuple locks are used, non-serializable schedules can result: the scan transaction may not see the new account, yet may be serialized before the insert transaction. The transaction scanning the relation is reading information that indicates what tuples the relation contains, while the transaction inserting a tuple updates the same information. That information should be locked.
One solution could be as follows:

Associate a data item with the relation, to represent the information about what
tuples the relation contains.

Transactions scanning the relation acquire a shared lock on the data item.

Transactions inserting or deleting a tuple acquire an exclusive lock on the data item. (Note: locks on the data item do not conflict with locks on individual tuples.)

The above protocol provides very low concurrency for insertions/deletions. Index locking
protocols provide higher concurrency while preventing the phantom phenomenon, by requiring
locks on certain index buckets.
INDEX LOCKING PROTOCOL
Every relation must have at least one index. Access to a relation must be made only through one
of the indices on the relation. A transaction Ti that performs a lookup must lock all the index
buckets that it accesses, in S-mode. A transaction Ti may not insert a tuple ti into a relation r
without updating all indices to r. Ti must perform a lookup on every index to find all index
buckets that could have possibly contained a pointer to tuple ti, had it existed already, and obtain
locks in X-mode on all these index buckets. Ti must also obtain locks in X-mode on all index
buckets that it modifies. The rules of the two-phase locking protocol must be observed.
WEAK LEVELS OF CONSISTENCY
Serializability is a useful concept because it allows programmers to ignore issues related to
concurrency when they code transactions. If every transaction has the property that it maintains
database consistency if executed alone, then serializability ensures that concurrent executions
maintain consistency. However, the protocols required to ensure serializability may allow too
little concurrency for certain applications. In these cases, weaker levels of consistency are used.

The use of weaker levels of consistency places additional burdens on programmers for ensuring
database correctness.
The purpose of degree-two consistency is to avoid cascading aborts without necessarily
ensuring serializability. The locking protocol for degree-two consistency uses the same two lock
modes that
we used for the two-phase locking protocol: shared (S) and exclusive (X). A transaction must hold the appropriate lock mode when it accesses a data item. In contrast to the situation in two-phase locking, S-locks may be released at any time, and locks may be acquired at any time. Exclusive locks, however, cannot be released until the transaction either commits or aborts. Serializability is not ensured by this protocol.
by this protocol. Indeed, a transaction may read the same data item twice and obtain different
results. In Figure, T3 reads the value of Q before and after that value is written by T4. The
potential for inconsistency due to nonserializable schedules under degree-two consistency makes
this approach undesirable for many applications.
CURSOR STABILITY is a form of degree-two consistency designed for programs written in
host languages, which iterate over tuples of a relation by using cursors. Instead of locking the
entire relation, cursor stability ensures that

The tuple that is currently being processed by the iteration is locked in shared mode.

Any modified tuples are locked in exclusive mode until the transaction commits.

These rules ensure that degree-two consistency is obtained. Two-phase locking is not required.
Serializability is not guaranteed. Cursor stability is used in practice on heavily accessed relations
as a means of increasing concurrency and improving system performance. Applications that use
cursor stability must be coded in a way that ensures database consistency despite the possibility
of nonserializable schedules. Thus, the use of cursor stability is limited to specialized situations with simple consistency constraints. SQL allows non-serializable executions. The levels of consistency specified by SQL-92 are as follows:
1. Serializable: the default.
2. Repeatable read: allows only committed records to be read, and repeating a read should return the same value (so read locks should be retained). However, the phantom phenomenon need not be prevented: T1 may see some records inserted by T2, but may not see others inserted by T2.
3. Read committed: same as degree-two consistency, but most systems implement it as cursor stability.
4. Read uncommitted: allows even uncommitted data to be read.

4.15 SERIALIZABILITY AND SCHEDULES


The database system must control concurrent execution of transactions, to ensure that the
database state remains consistent. Before we examine how the database system can carry out this
task, we must first understand which schedules will ensure consistency, and which schedules will
not. Since transactions are programs, it is computationally difficult to determine exactly what operations a transaction performs and how operations of various transactions interact. For this reason, we shall not interpret the type of operations that a transaction can perform on a data item.
Instead, we consider only two operations: read and write. We thus assume that, between a read
(Q) instruction and a write (Q) instruction on a data item Q, a transaction may perform an
arbitrary sequence of operations on the copy of Q that is residing in the local buffer of the
transaction. Thus, the only significant operations of a transaction, from a scheduling point of
view, are its read and write instructions.
We shall therefore usually show only read and write instructions in schedules, as we do in
schedule 3. In this section, we discuss different forms of schedule equivalence; they lead to the
notions of conflict serializability and view serializability.
CONFLICT SERIALIZABILITY
Let us consider a schedule S in which there are two consecutive instructions Ii and Ij of transactions Ti and Tj, respectively (i ≠ j). If Ii and Ij refer to different data items, then we can swap Ii and Ij without affecting the results of any instruction in the schedule. However, if Ii and Ij refer to the same data item Q, then the order of the two steps may matter. Since we are dealing with only read and write instructions, there are four cases that we need to consider:

1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not matter, since the same value
of Q is read by Ti and Tj, regardless of the order.
2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti does not read the value of Q
that is written by Tj in instruction Ij. If Ij comes before Ii, then Ti reads the value of Q
that is written by Tj. Thus, the order of Ii and Ij matters.
3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters for reasons similar to those
of the previous case.
4. Ii = write(Q), Ij = write(Q). Since both instructions are write operations, the order of these instructions does not affect either Ti or Tj. However, the value obtained by the next read(Q) instruction of S is affected, since the result of only the latter of the two write instructions is preserved in the database. If there is no other write(Q) instruction after Ii and Ij in S, then the order of Ii and Ij directly affects the final value of Q in the database state that results from schedule S.
Thus, only in the case where both Ii and Ij are read instructions does the relative order of their execution not matter. We say that Ii and Ij conflict if they are operations by different transactions on the same data item, and at least one of these instructions is a write operation.
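The conflict test can be written as a small predicate (the tuple encoding of an operation is illustrative):

```python
# Two operations conflict if they belong to different transactions,
# touch the same data item, and at least one of them is a write.
def conflicts(op_i, op_j):
    (ti, ai, di), (tj, aj, dj) = op_i, op_j   # (transaction, 'r'/'w', item)
    return ti != tj and di == dj and "w" in (ai, aj)

# From schedule 3: write(A) of T1 conflicts with read(A) of T2,
# but write(A) of T2 does not conflict with read(B) of T1.
assert conflicts(("T1", "w", "A"), ("T2", "r", "A"))
assert not conflicts(("T2", "w", "A"), ("T1", "r", "B"))
assert not conflicts(("T1", "r", "A"), ("T2", "r", "A"))  # two reads never conflict
```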
To illustrate the concept of conflicting instructions, we consider schedule 3. The write(A) instruction of T1 conflicts with the read(A) instruction of T2. However, the write(A) instruction of T2 does not conflict with the read(B) instruction of T1, because the two instructions access different data items. Let Ii and Ij be consecutive instructions of a schedule S. If Ii and Ij are instructions of different transactions and Ii and Ij do not conflict, then we can swap the order of Ii and Ij to produce a new schedule S'. We expect S to be equivalent to S', since all instructions appear in the same order in both schedules except for Ii and Ij, whose order does not matter.
Since the write(A) instruction of T2 in schedule 3 of Figure 23.7 does not conflict with the read(B) instruction of T1, we can swap these instructions to generate an equivalent schedule, schedule 5. Regardless of the initial system state, schedules 3 and 5 both produce the same final system state.
We continue to swap non-conflicting instructions:

Swap the read(B) instruction of T1 with the read(A) instruction of T2.

Swap the write(B) instruction of T1 with the write(A) instruction of T2.

Swap the write(B) instruction of T1 with the read(A) instruction of T2.

The final result of these swaps, schedule 6 of Figure, is a serial schedule. Thus, we have shown that schedule 3 is equivalent to a serial schedule. This equivalence implies that, regardless of the initial system state, schedule 3 will produce the same final state as will some serial schedule. If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent. In our previous examples, schedule 1 is not conflict equivalent to schedule 2. However, schedule 1 is conflict equivalent to schedule 3, because the read(B) and write(B) instructions of T1 can be swapped with the read(A) and write(A) instructions of T2. The concept of conflict equivalence leads to the concept of conflict serializability. We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule. Thus, schedule 3 is conflict serializable, since it is conflict equivalent to the serial schedule 1. Finally, consider schedule 7 of Figure; it consists of only the significant operations (that is, the read and write) of transactions T3 and T4. This schedule is not conflict serializable, since it is not equivalent to either the serial schedule <T3,T4> or the serial schedule
<T4,T3>. It is possible to have two schedules that produce the same outcome, but that are not conflict equivalent. For example, consider transaction T5, which transfers $10 from account B to account A. Let schedule 8 be as defined in Figure 42.4. We claim that schedule 8 is not conflict equivalent to the serial schedule <T1,T5>, since, in schedule 8, the write(B) instruction of T5 conflicts with the read(B) instruction of T1. Thus, we cannot move all the instructions of T1 before those of T5 by swapping consecutive non-conflicting instructions. However, the final values of accounts A and B after the execution of either schedule 8 or the serial schedule <T1,T5> are the same: $960 and $2040, respectively. We can see from this example that there are less stringent definitions of schedule equivalence than conflict equivalence. For the system to determine that schedule 8 produces the same outcome as the serial schedule <T1,T5>, it must analyze the computation performed by T1 and T5, rather than just the read and write operations. In general, such analysis is hard to implement and is computationally expensive. However, there are other definitions of schedule equivalence based purely on the read and write operations.
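A standard way to test conflict serializability, equivalent to the swap argument above, is to build a precedence graph with an edge Ti → Tj for each conflicting pair in which Ti's instruction appears first; the schedule is conflict serializable iff that graph is acyclic. This technique is not spelled out in the text above, so the following is a hedged sketch:

```python
def conflict_serializable(schedule):
    # schedule: list of (transaction, 'r'/'w', item) in execution order
    edges = {}
    for i, (ti, ai, di) in enumerate(schedule):
        for tj, aj, dj in schedule[i + 1:]:
            if ti != tj and di == dj and "w" in (ai, aj):
                edges.setdefault(ti, set()).add(tj)   # Ti must precede Tj

    # conflict serializable iff the precedence graph is acyclic
    visiting, done = set(), set()
    def cyclic(t):
        visiting.add(t)
        for u in edges.get(t, ()):
            if u in visiting or (u not in done and cyclic(u)):
                return True
        visiting.discard(t)
        done.add(t)
        return False
    return not any(t not in done and cyclic(t) for t in edges)

# The read/write pattern of schedule 7: T3 and T4 interleaved on Q
s7 = [("T3", "r", "Q"), ("T4", "w", "Q"), ("T3", "w", "Q")]
assert not conflict_serializable(s7)   # edges T3 -> T4 and T4 -> T3: a cycle
```

When the graph is acyclic, a topological sort of it yields an equivalent serial order of the transactions.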
VIEW SERIALIZABILITY
In this section, we consider a form of equivalence that is less stringent than conflict
equivalence, but that, like conflict equivalence, is based on only the read and write operations
of transactions. Consider two schedules S and S', where the same set of transactions
participates in both schedules. The schedules S and S' are said to be view equivalent if three
conditions are met:

1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
transaction Ti must, in schedule S', also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that value was produced by a write(Q) operation executed by transaction Tj, then the read(Q) operation of transaction Ti must, in schedule S', also read the value of Q that was produced by the same write(Q) operation of transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in
schedule S must perform the final write(Q) operation in schedule S'.
Conditions 1 and 2 ensure that each transaction reads the same values in both schedules and,
therefore, performs the same computation. Condition 3, coupled with conditions 1 and 2, ensures
that both schedules result in the same final system state. In our previous examples, schedule 1 is

not view equivalent to schedule 2, since, in schedule 1, the value of account A read by transaction T2 was produced by T1, whereas this case does not hold in schedule 2. However, schedule 1 is view equivalent to schedule 3, because the values of accounts A and B read by transaction T2 were produced by T1 in both schedules.
The concept of view equivalence leads to the concept of view serializability. We say that a schedule S is view serializable if it is view equivalent to a serial schedule. As an illustration, suppose that we augment schedule 7 with transaction T6, and obtain schedule 9 in Figure 42.5. Schedule 9 is view serializable. Indeed, it is view equivalent to the serial schedule <T3, T4, T6>, since the one read(Q) instruction reads the initial value of Q in both schedules, and T6 performs the final write of Q in both schedules. Every conflict-serializable schedule is also view serializable, but there are view-serializable schedules that are not conflict serializable. Indeed, schedule 9 is not conflict serializable, since every pair of consecutive instructions conflicts, and, thus, no swapping of instructions is possible. Observe that, in schedule 9, transactions T4 and T6 perform write(Q) operations without having performed a read(Q) operation. Writes of this sort are called blind writes. Blind writes appear in any view-serializable schedule that is not conflict serializable.

4.16 SCHEDULE AND RECOVERABILITY


So far, we have studied what schedules are acceptable from the viewpoint of consistency
of the database, assuming implicitly that there are no transaction failures. We now address the
effect of transaction failures during concurrent execution. If a transaction Ti fails, for whatever
reason, we need to undo the effect of this transaction to ensure the atomicity property of the
transaction. In a system that allows concurrent execution, it is necessary also to ensure that any
transaction Tj that is dependent on Ti (that is, Tj has read data written by Ti) is also aborted. To
ensure this, we need to place restrictions on the type of schedules permitted in the system.
RECOVERABLE SCHEDULES
Consider schedule 11 in Figure, in which T9 is a transaction that performs only one instruction:
read(A). Suppose that the system allows T9 to commit immediately after executing the read(A)

instruction. Thus, T9 commits before T8 does. Now suppose that T8 fails before it commits.
Since T9 has read the value of data item A written by T8, we must abort T9 to ensure transaction
atomicity. However, T9 has already committed and cannot be aborted. Thus, we have a situation
where it is impossible to recover correctly from the failure of T8. Schedule 11, with the commit
happening immediately after the read(A) instruction, is an example of a nonrecoverable
schedule, which should not be allowed. Most database systems require that all schedules be recoverable. A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.
CASCADELESS SCHEDULES
Even if a schedule is recoverable, to recover correctly from the failure of a transaction Ti, we
may have to roll back several transactions. Such situations occur if transactions have read data
written by Ti. As an illustration, consider the partial schedule of Figure. Transaction T10 writes a
value of A that is read by transaction T11. Transaction T11 writes a value of A that is read by
transaction T12. Suppose that, at this point, T10 fails. T10 must be rolled back. Since T11 is
dependent on T10, T11 must be rolled back. Since T12 is dependent on T11, T12 must be rolled
back. This phenomenon, in which a single transaction failure leads to a series of transaction
rollbacks, is called cascading rollback.
Cascading rollback is undesirable, since it leads to the undoing of a significant amount of work.
It is desirable to restrict the schedules to those where cascading rollbacks cannot occur. Such
schedules
are called cascadeless schedules. Formally, a cascadeless schedule is one where, for each pair of
transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit
operation of Ti appears before the read operation of Tj . It is easy to verify that every cascadeless
schedule is also recoverable.
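The cascadeless condition is even simpler to verify, since it only requires that no read ever sees uncommitted data. A sketch, again assuming schedules are lists of (transaction, operation, item) tuples with 'r'/'w'/'c' operations (an illustrative representation, not a standard one):

```python
def is_cascadeless(schedule):
    """Return True if every read of an item written by another
    transaction occurs only after that transaction has committed."""
    last_writer = {}   # item -> transaction that last wrote it
    committed = set()  # transactions committed so far

    for txn, op, item in schedule:
        if op == 'w':
            last_writer[item] = txn
        elif op == 'r':
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                return False   # a dirty read: cascading rollback possible
        elif op == 'c':
            committed.add(txn)
    return True
```

For the partial schedule in the figure (T11 reads A while T10, its writer, is still uncommitted), the check fails, reflecting the cascading-rollback hazard described above.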
IMPLEMENTATION OF ISOLATION
So far, we have seen what properties a schedule must have if it is to leave the database in a
consistent state and allow transaction failures to be handled in a safe manner. Specifically,
schedules that are conflict or view serializable and cascadeless satisfy these requirements. There are
various concurrency-control schemes that we can use to ensure that, even when multiple
transactions are executed concurrently, only acceptable schedules are generated, regardless of
how the operating-system time-shares resources (such as CPU time) among the transactions.
As a trivial example of a concurrency-control scheme, consider this scheme: A transaction
acquires a lock on the entire database before it starts and releases the lock after it has committed.
While a transaction holds a lock, no other transaction is allowed to acquire the lock and all must
therefore wait for the lock to be released. As a result of the locking policy, only one transaction
can execute at a time. Therefore, only serial schedules are generated. These are trivially
serializable, and it is easy to verify that they are cascadeless as well. A concurrency-control
scheme such as this one leads to poor performance, since it forces transactions to wait for
preceding transactions to finish before they can start. In other words, it provides a poor degree of
concurrency.
The goal of concurrency-control schemes is to provide a high degree of concurrency, while
ensuring that all schedules that can be generated are conflict or view serializable, and are
cascadeless.
The schemes have different trade-offs in terms of the amount of concurrency they allow and the
amount of overhead that they incur. Some of them allow only conflict serializable schedules to
be generated; others allow certain view-serializable schedules that are not conflict-serializable to
be generated.
TRANSACTION DEFINITION IN SQL
A data-manipulation language must include a construct for specifying the set of actions that
constitute a transaction. The SQL standard specifies that a transaction begins implicitly.
Transactions are ended by one of these SQL statements:
• Commit work commits the current transaction and begins a new one.
• Rollback work causes the current transaction to abort.
The keyword work is optional in both statements. If a program terminates without either of
these commands, the updates are either committed or rolled back; which of the two happens is not
specified by the standard and depends on the implementation. The standard also specifies that the
system must ensure both serializability and freedom from cascading rollback. The definition of
serializability used by the standard is that a schedule must have the same effect as would some

serial schedule. Thus, conflict and view serializability are both acceptable. The SQL-92 standard also
allows a transaction to specify that it may be executed in a manner that causes it to become
nonserializable with respect to other transactions.
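The commit/rollback pair can be seen concretely through Python's sqlite3 module, whose `conn.commit()` and `conn.rollback()` correspond to SQL's commit work and rollback work; the table and values below are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)')
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.commit()          # commit work: the insert is now permanent

try:
    # A transaction begins implicitly with the first update.
    conn.execute("UPDATE account SET balance = balance - 500 "
                 "WHERE name = 'A'")
    raise RuntimeError('insufficient funds')   # simulated failure
except RuntimeError:
    conn.rollback()    # rollback work: the update is undone

balance = conn.execute(
    "SELECT balance FROM account WHERE name = 'A'").fetchone()[0]
# balance == 100: the aborted transaction left no trace
```

Note that, as the standard says, the transaction begins implicitly: there is no explicit "begin" call before the update.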
TESTING FOR SERIALIZABILITY
When designing concurrency control schemes, we must show that schedules generated by the
scheme are serializable. To do that, we must first understand how to determine, given a particular
schedule S, whether the schedule is serializable. We now present a simple and efficient method
for determining conflict serializability of a schedule. Consider a schedule S. We construct a
directed graph, called a precedence graph, from S. This graph consists of a pair G = (V, E),
where V is a set of vertices and E is a set of edges. The set of vertices consists of all the
transactions participating in the schedule. The set of edges consists of all edges Ti → Tj for which
one of three conditions holds:
1. Ti executes write(Q) before Tj executes read(Q)
2. Ti executes read(Q) before Tj executes write(Q)
3. Ti executes write(Q) before Tj executes write(Q)
Figure shows the precedence graphs for schedules 1 and 2.
If an edge Ti → Tj exists in the precedence graph, then, in any serial schedule S′ equivalent to S,
Ti must appear before Tj. For example, the precedence graph for schedule 1 in (a) contains the
single edge T1 → T2, since all the instructions of T1 are executed before the first instruction of
T2 is executed. Similarly, Figure (b) shows the precedence graph for schedule 2 with the single
edge T2 → T1, since all the instructions of T2 are executed before the first instruction of T1 is
executed. The precedence graph for schedule 4 appears in Figure. It contains the edge T1 → T2,
because T1 executes read(A) before T2 executes write(A). It also contains the edge T2 → T1,
because T2 executes read(B) before T1 executes write(B).
If the precedence graph for S has a cycle, then schedule S is not conflict serializable. If the graph
contains no cycles, then the schedule S is conflict serializable. A serializability order of the
transactions can be obtained through topological sorting, which determines a linear order
consistent with the partial order of the precedence graph. There are, in general, several possible
linear orders that can be obtained through a topological sorting. For example, the graph of Figure
(a) has the two acceptable linear orderings shown in Figures (b) and (c). Thus, to test for conflict
serializability, we need to construct the precedence graph and to invoke a cycle-detection
algorithm. Cycle-detection algorithms can be found in standard textbooks on algorithms. Cycle-detection
algorithms, such as those based on depth-first search, require on the order of n² operations, where n is the number of
vertices in the graph (that is, the number of transactions). Thus, we have a practical scheme for
determining conflict serializability. Returning to our previous examples, note that the precedence
graphs for schedules 1 and 2 indeed do not contain cycles. The precedence graph for schedule 4,
on the other hand, contains a cycle, indicating that this schedule is not conflict serializable.
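The whole test (build the precedence graph, then detect cycles) fits in a short sketch. The code below assumes schedules are lists of (transaction, operation, item) tuples with 'r'/'w' operations, an illustrative representation; it uses Kahn's topological sort, which both detects a cycle and, when there is none, yields one valid serial order.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Add an edge Ti -> Tj for each conflicting pair of operations:
    w-r, r-w, or w-w on the same item, with Ti's operation first."""
    txns, edges = set(), set()
    ops = [(t, op, item) for t, op, item in schedule if op in ('r', 'w')]
    for i, (ti, op_i, item_i) in enumerate(ops):
        txns.add(ti)
        for tj, op_j, item_j in ops[i + 1:]:
            if ti != tj and item_i == item_j and 'w' in (op_i, op_j):
                edges.add((ti, tj))
    return txns, edges

def is_conflict_serializable(schedule):
    """Conflict serializable iff the precedence graph is acyclic."""
    txns, edges = precedence_graph(schedule)
    indeg = {t: 0 for t in txns}
    succ = defaultdict(set)
    for ti, tj in edges:
        succ[ti].add(tj)
        indeg[tj] += 1
    # Kahn's algorithm: repeatedly emit vertices with no incoming edges.
    order = [t for t in txns if indeg[t] == 0]
    for t in order:
        for u in succ[t]:
            indeg[u] -= 1
            if indeg[u] == 0:
                order.append(u)
    # All transactions placed -> acyclic -> serializable;
    # `order` is then one valid serial order of the transactions.
    return len(order) == len(txns), order
```

On schedule 4 (T1 reads A before T2 writes A, and T2 reads B before T1 writes B) the graph has the cycle T1 → T2 → T1, so the test correctly reports the schedule as not conflict serializable.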
Testing for view serializability is rather complicated. In fact, it has been shown that the problem
of testing for view serializability is itself NP-complete. Thus, almost certainly there exists no
efficient algorithm to test for view serializability.
