
Transaction Management and Concurrency control

Definition;
Transaction – refers to an action or series of actions carried out by a single user or
application program which accesses or changes the contents of the database.

It is a logical unit of work on the database.

A transaction can have one of two outcomes;


1. If it completes successfully, it is said to have committed and the database reaches
a new consistent state.
2. On the other hand, if the transaction does not execute successfully, it is said to
have aborted and the database is restored to its original consistent state before the
concerned transaction started.

Such a transaction is said to have been rolled back or undone. A committed transaction
cannot be aborted.

If a transaction is committed by mistake, then another one has to be initiated to reverse
the previous transaction’s effects. This new transaction is sometimes called a
compensating transaction.

Most DBMSs do not have an inbuilt way of determining which actions are grouped
together to form a single transaction.

Instead, the user or application program marks these boundaries explicitly, using
keywords such as BEGIN, END, COMMIT and/or ROLLBACK (or their equivalent).
If these delimiters are not used, the entire program is usually regarded as a single
transaction, with the DBMS performing an automatic COMMIT when the program
terminates correctly and a ROLLBACK if it does not.
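
As a minimal sketch of these boundaries, the following uses Python's built-in sqlite3
module with an in-memory database. The table name account and the amounts are
assumptions made for illustration, and other DBMSs use equivalent but not identical
keywords.

```python
import sqlite3

# isolation_level=None switches off the module's implicit transactions, so the
# BEGIN / COMMIT / ROLLBACK statements below are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 10000)")


def balance():
    """Return the current balance of the (hypothetical) account 1."""
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


# A transaction that commits: the database moves to a new consistent state.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance + 5000 WHERE id = 1")
conn.execute("COMMIT")
after_commit = balance()

# A transaction that aborts: the database returns to the state before BEGIN.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 1000 WHERE id = 1")
conn.execute("ROLLBACK")
after_rollback = balance()

print(after_commit, after_rollback)
```

The withdrawal inside the second transaction leaves no trace, because it was rolled
back before it could commit.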

Properties of transactions
1. Atomicity (the “all or nothing property”)
A transaction is an indivisible unit that is either performed in its entirety or not performed
at all.

2. Consistency
A transaction must transform the database from one consistent state to another consistent
state.

3. Isolation
Transactions execute independently of one another. This means that any partial effects of
one incomplete transaction are not “visible” to other transactions.

4. Durability
The effects of a successfully completed (committed) transaction are permanently
recorded in the database and must not be lost as a result of a subsequent failure.

If a failure occurs during a transaction, then the database could be inconsistent. Different
DBMSs provide different ways of avoiding this. Oracle, for instance, has the
following provisions:

Use of the RECOVERY MANAGER (RMAN) to ensure the database is restored to the state
it was in before the start of the transaction, and therefore a consistent state.

The BUFFER MANAGER is responsible for the transfer of data between disk storage
and main memory.

Concurrency Control
Refers to the process of managing simultaneous operations on the database without
having them interfere with each other.

A key objective of developing a database is to enable many users to access shared data
concurrently.

Concurrent access is relatively easy if all users are only reading, as there is no way that
they can interfere with one another.
However, when two or more transactions are accessing the database simultaneously
and at least one is updating data, there may be interference that can result in
inconsistencies.

Although two transactions may be perfectly correct in themselves, the interleaving of
operations may produce incorrect results, thus compromising the integrity and
consistency of the database.

There are 3 such problems:

1. Lost update problem

This occurs when an apparently successfully completed update operation by one
user/application is overwritten by another user/application.

Example
Assume we have two transactions T1 and T2.
T1 is withdrawing Ksh. 1,000 from an account with a balance of Ksh. 10,000 and T2 is
depositing Ksh. 5,000 into the same account.

If these transactions were executed serially, one after the other without interleaving of
operations, the final balance would be Ksh. 14,000 no matter which transaction is
performed first.


Problem:
Transaction T1 and T2 start at nearly the same time, and both read the balance as Ksh.
10,000. T2 increases the balance by Ksh. 5,000 to have the balance at Ksh. 15,000 and
stores the update in the database. Meanwhile transaction T1 reduces its copy of the
balance by Ksh. 1,000 to Ksh. 9,000 and stores the value in the database, overwriting the
previous update by T2 and thereby “losing” the Ksh. 5,000 previously added to the
balance.
Illustration (Problem).
Time  T1                          T2                          Balance
t1                                Begin transaction           10,000
t2    Begin transaction           Read (Balance)              10,000
t3    Read (Balance)              Balance = Balance + 5,000   10,000
t4    Balance = Balance - 1,000   Write (Balance)             15,000
t5    Write (Balance)             Commit                      9,000
t6    Commit                                                  9,000

The loss of T2’s update can be avoided by preventing T1 from reducing the value of the
Balance until T2’s update has been completed.
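
The interleaving in the illustration can be replayed step by step. This is only a
sketch: the dictionary database and the variables t1_copy and t2_copy are hypothetical
stand-ins for the stored balance and for each transaction's private working copy.

```python
# The stored balance, shared by both transactions.
database = {"balance": 10_000}

t2_copy = database["balance"]      # t2: T2 reads 10,000
t1_copy = database["balance"]      # t3: T1 reads 10,000
t2_copy += 5_000                   # t3: T2 adds the deposit to its own copy
t1_copy -= 1_000                   # t4: T1 subtracts the withdrawal from its copy
database["balance"] = t2_copy      # t4: T2 writes 15,000
after_t2_write = database["balance"]
database["balance"] = t1_copy      # t5: T1 writes 9,000, overwriting T2's update
lost_update_result = database["balance"]

# What any serial execution would have produced.
serial_result = 10_000 + 5_000 - 1_000

print(after_t2_write, lost_update_result, serial_result)
```

The final stored balance differs from the serial result by exactly the Ksh. 5,000
deposit that was lost.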

Solution:
T2 first requests a write_lock on the Balance. It can then proceed to read the value of the
balance from the database, increase it by Ksh. 5,000 and write the new value back to
the database. When T1 starts, it also requests a write_lock on the Balance. However, since
the balance is already write locked by T2, the request is not immediately granted and T1
has to wait until the lock is released by T2. This happens only after T2 commits.

Illustration - Solution
Time  T1                          T2                          Balance
t1                                Begin transaction           10,000
t2    Begin transaction           Write_lock (Balance)        10,000
t3    Write_lock (Balance)        Read (Balance)              10,000
t4    WAIT                        Balance = Balance + 5,000   10,000
t5    WAIT                        Write (Balance)             15,000
t6    WAIT                        Commit/unlock (Balance)     15,000
t7    Read (Balance)                                          15,000
t8    Balance = Balance - 1,000                               15,000
t9    Write (Balance)                                         14,000
t10   Commit/unlock (Balance)                                 14,000
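
The same idea can be sketched with two threads, assuming a threading.Lock as a
stand-in for the DBMS write lock on Balance. The function name run_transaction is
hypothetical, and a real DBMS lock manager is considerably more elaborate.

```python
import threading

balance = 10_000
balance_lock = threading.Lock()    # plays the role of the write lock on Balance


def run_transaction(amount):
    """Read, modify and write the balance while holding the lock."""
    global balance
    with balance_lock:             # Write_lock (Balance); a second caller WAITs here
        local_copy = balance       # Read (Balance)
        local_copy += amount       # Balance = Balance + amount
        balance = local_copy       # Write (Balance)
    # Leaving the with-block corresponds to Commit/unlock (Balance).


t2 = threading.Thread(target=run_transaction, args=(5_000,))    # deposit
t1 = threading.Thread(target=run_transaction, args=(-1_000,))   # withdrawal
t2.start()
t1.start()
t2.join()
t1.join()

print(balance)
```

Whichever thread obtains the lock first, the other waits for the complete
read-modify-write sequence, so neither update can be lost.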

2. The uncommitted dependency problem


This occurs when one transaction is allowed to see the intermediate results of another
transaction before it has committed.

Assume 2 transactions T3 and T4.


Transaction T4 reads the balance as Ksh. 10,000, increases it by Ksh. 5,000 and updates
the figure to Ksh. 15,000, but it then aborts, so that the balance is restored
to the original value of Ksh. 10,000.
However, by this time, transaction T3 has read the new value of the balance (Ksh. 15,000)
and is using it as the basis of a Ksh. 1,000 withdrawal, giving a new incorrect balance of
Ksh. 14,000 instead of Ksh. 9,000.

The reason for the abort is immaterial; the effect is that T3 assumes that T4’s update
completed successfully, even though it was rolled back.
Illustration (Problem).
Time  T3                          T4                          Balance
t1                                Begin transaction           10,000
t2                                Read (Balance)              10,000
t3                                Balance = Balance + 5,000   10,000
t4    Begin transaction           Write (Balance)             15,000
t5    Read (Balance)              .                           15,000
t6    Balance = Balance - 1,000   Rollback                    10,000
t7    Write (Balance)                                         14,000
t8    Commit                                                  14,000
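
The steps in the illustration can be replayed as a short script. This is a sketch only:
before_image is a hypothetical name for the value the DBMS would keep in order to undo
T4's write.

```python
database = {"balance": 10_000}

before_image = database["balance"]   # value to restore if T4 rolls back
t4_copy = database["balance"]        # t2: T4 reads 10,000
t4_copy += 5_000                     # t3: T4 adds 5,000 to its copy
database["balance"] = t4_copy        # t4: T4 writes 15,000 but has not committed
t3_copy = database["balance"]        # t5: T3 reads the uncommitted 15,000
value_seen_by_t3 = t3_copy
database["balance"] = before_image   # t6: T4 rolls back, balance is 10,000 again
t3_copy -= 1_000                     # t6: T3 subtracts 1,000 from its stale copy
database["balance"] = t3_copy        # t7: T3 writes 14,000
dirty_result = database["balance"]

print(value_seen_by_t3, dirty_result)
```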

The solution to the problem is to prevent T3 from reading the Balance till after T4 is
through (whether it completes successfully or not).

T4 first requests a write_lock on the balance. It then proceeds to read the value of the
balance, increments the value by Ksh. 5,000 and writes the value back to the database,
but it does not commit. Instead, a rollback is issued. When the rollback is executed, the
updates of T4 are undone and the value of the Balance is returned to its original value of
Ksh. 10,000. Only then is the lock released and T3’s own lock request granted, so T3 reads
Ksh. 10,000 and produces the correct balance of Ksh. 9,000.

Illustration - solution.
Time  T3                          T4                          Balance
t1                                Begin transaction           10,000
t2    Begin transaction           Write_lock (Balance)        10,000
t3    Write_lock (Balance)        Read (Balance)              10,000
t4    WAIT                        Balance = Balance + 5,000   10,000
t5    WAIT                        Write (Balance)             15,000
t6    WAIT                        Rollback/unlock (Balance)   10,000
t7    Read (Balance)                                          10,000
t8    Balance = Balance - 1,000                               10,000
t9    Write (Balance)                                         9,000
t10   Commit/unlock (Balance)                                 9,000


3. The inconsistent analysis problem


Introduction
The lost update problem and the uncommitted dependency problem mainly concentrate
on transactions that are updating the database and their interference may corrupt the
database.

On the other hand, transactions that only read the database can also produce inaccurate
results, if they are allowed to read partial results of other incomplete transactions that are
simultaneously updating the database, a situation referred to as a dirty read or
unrepeatable read.

The inconsistent analysis problem occurs when a transaction reads several values from
the database but another transaction updates one or more of the values during the
execution of the first.

Assume two transactions T5 and T6.

T6 is calculating the total of balances in 3 accounts X (1,000), Y (700) and Z (500). In the
meantime, transaction T5 transfers Ksh. 100 from account X to account Z, so that
T6 ends up with the wrong result (2,300 instead of 2,200).

Illustration (Problem).
Time  T5                  T6                  Balances            Total
                                              X      Y    Z
t1                        Begin transaction   1,000  700  500     -
t2    Begin transaction   Total = 0           1,000  700  500     0
t3    Read (X)            Read (X)            1,000  700  500     0
t4    X = X - 100         Total = Total + X   1,000  700  500     1,000
t5    Write (X)           Read (Y)            900    700  500     1,000
t6    Read (Z)            Total = Total + Y   900    700  500     1,700
t7    Z = Z + 100         .                   900    700  500     1,700
t8    Write (Z)           .                   900    700  600     1,700
t9    Commit              Read (Z)            900    700  600     1,700
t10                       Total = Total + Z   900    700  600     2,300

The solution to this problem is to prevent transaction T6 from reading balances X, Y and
Z until T5 has completed its updates.
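
The unlocked interleaving from the problem illustration can be replayed as follows.
This is a sketch: accounts, x_copy and z_copy are hypothetical names for the stored
balances and for T5's working copies.

```python
accounts = {"X": 1_000, "Y": 700, "Z": 500}

total = 0                        # t2: T6 initialises its total
x_copy = accounts["X"]           # t3: T5 reads X
total += accounts["X"]           # t3-t4: T6 reads X (1,000) and adds it
accounts["X"] = x_copy - 100     # t4-t5: T5 writes X = 900
total += accounts["Y"]           # t5-t6: T6 reads Y (700) and adds it
z_copy = accounts["Z"]           # t6: T5 reads Z
accounts["Z"] = z_copy + 100     # t7-t8: T5 writes Z = 600 and then commits
total += accounts["Z"]           # t9-t10: T6 reads Z (600) and adds it

# The sum of the balances actually stored once T5 has committed.
true_total = sum(accounts.values())

print(total, true_total)
```

T6 counted the Ksh. 100 twice: once in the old value of X and once in the new value
of Z.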


Illustration (solution).
Time  T5                     T6                         Balances            Total
                                                        X      Y    Z
t1                           Begin transaction          1,000  700  500     -
t2    Begin transaction      Total = 0                  1,000  700  500     0
t3    Write_lock (X)         Read_lock (X)              1,000  700  500     0
t4    Read (X)               WAIT                       1,000  700  500     0
t5    X = X - 100            WAIT                       1,000  700  500     0
t6    Write (X)              WAIT                       900    700  500     0
t7    Write_lock (Z)         WAIT                       900    700  500     0
t8    Read (Z)               WAIT                       900    700  500     0
t9    Z = Z + 100            WAIT                       900    700  500     0
t10   Write (Z)              WAIT                       900    700  600     0
t11   Commit/unlock (X, Z)   WAIT                       900    700  600     0
t12                          Read (X)                   900    700  600     0
t13                          Total = Total + X          900    700  600     900
t14                          Read_lock (Y)              900    700  600     900
t15                          Read (Y)                   900    700  600     900
t16                          Total = Total + Y          900    700  600     1,600
t17                          Read_lock (Z)              900    700  600     1,600
t18                          Read (Z)                   900    700  600     1,600
t19                          Total = Total + Z          900    700  600     2,200
t20                          Commit/unlock (X, Y & Z)   900    700  600     2,200

Serializability and recoverability


We have seen the problems associated with allowing transactions to execute
concurrently.

The aim of concurrency control is to schedule transactions in such a way as to avoid
interference.
One way of doing this is to allow only one transaction to execute at any one given time:
one transaction is committed before the next transaction is allowed to begin. However,
the aim of a multi-user DBMS is to maximize the degree of concurrency or parallelism
within the system, such that transactions that can execute without interfering with each
other can and should be allowed to run in parallel.

Definition:
Schedule – refers to a sequence of operations by a set of concurrent transactions that
preserves the order of the operations in each of the individual transactions.
A transaction is made up of a sequence of operations consisting of reads and writes to the
database, followed by a commit or abort action.

A schedule S could be defined to consist of a sequence of operations from a set of n
transactions T1, T2, T3, …, Tn, subject to the constraint that the order of operations for each
transaction is preserved in the schedule. Therefore, for each transaction Ti in schedule S,
the order of the operations in Ti must be the same in the schedule S.


Serial schedule – refers to a schedule where the operations of each transaction are
executed consecutively without any interleaved operations from other transactions.

In a serial schedule, the transactions are performed in serial order. For instance, if we
have 2 transactions T1 and T2, the serial order would be T1 then T2, or T2 then T1.
Evidently, in serial execution, there is no interference between transactions, since only
one transaction is executing at any given time.
There is no guarantee that all serial executions of a given set of transactions will
produce identical outcomes. For instance, in a bank, it matters a lot whether the interest
is calculated before or after a large deposit is made.

Non-serial schedule – refers to a schedule where the operations from a set of concurrent
transactions are interleaved.

The 3 problems of concurrency control described earlier arise from the mismanagement
of concurrency control, which left the database in an inconsistent state for the first two
problems and presented the user with a wrong result in the last problem (inconsistent
analysis).

Serial execution prevents such problems. Whatever schedule is chosen, serial execution
never leaves the database in an inconsistent state. Thus any serial schedule is considered
correct even though different results may arise.

Serializability aims at finding a non-serial schedule that allows transactions to execute
concurrently without interfering with one another, and thus produce a database state that
could be produced by a serial execution.

If a set of transactions execute concurrently, the non-serial schedule is termed correct if it
produces the same results as some serial execution, and such a schedule is said to be
serializable.
To prevent the inconsistent analysis problem, it is very important to guarantee
serializability of concurrent transactions.

In serializability, the ordering of reads and writes is important:

• If two transactions only read a data item, they do not conflict and order is not
important.
• If two transactions either read or write completely separate data items, they do not
conflict and order is not important.
• If one transaction writes a data item and another either reads or writes the same data
item, the order of execution is important.
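
The three rules above can be condensed into one small predicate. This is a sketch;
the Op tuple and the function name conflicts are hypothetical names chosen for
illustration.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One operation in a schedule."""
    txn: str    # transaction name, e.g. "T7"
    kind: str   # "read" or "write"
    item: str   # data item, e.g. "X"


def conflicts(a: Op, b: Op) -> bool:
    """True if the relative order of a and b matters.

    Two operations conflict only when they come from different transactions,
    touch the same data item, and at least one of them is a write.
    """
    return a.txn != b.txn and a.item == b.item and "write" in (a.kind, b.kind)


print(conflicts(Op("T7", "read", "X"), Op("T8", "read", "X")))    # both only read
print(conflicts(Op("T7", "write", "X"), Op("T8", "read", "Y")))   # separate items
print(conflicts(Op("T7", "write", "X"), Op("T8", "read", "X")))   # write vs read
```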


Consider the following schedule S1 containing operations from two concurrently
executing transactions T7 and T8.

Time  T7                   T8
t1    Begin_transaction
t2    Read (Balance X)
t3    Write (Balance X)
t4                         Begin_transaction
t5                         Read (Balance X)
t6                         Write (Balance X)
t7    Read (Balance Y)
t8    Write (Balance Y)
t9    Commit
t10                        Read (Balance Y)
t11                        Write (Balance Y)
t12                        Commit

Since the write operation on Balance X in T8 does not conflict with the subsequent read
operation on Balance Y in T7, the order of these two operations can be changed to
produce an equivalent schedule S2 shown below.
Time  T7                   T8
t1    Begin_transaction
t2    Read (Balance X)
t3    Write (Balance X)
t4                         Begin_transaction
t5                         Read (Balance X)
t6    Read (Balance Y)
t7                         Write (Balance X)
t8    Write (Balance Y)
t9    Commit
t10                        Read (Balance Y)
t11                        Write (Balance Y)
t12                        Commit


If we also change the order of the following non-conflicting operations, we produce an
equivalent schedule S3 as follows:
Time  T7                   T8
t1    Begin_transaction
t2    Read (Balance X)
t3    Write (Balance X)
t4    Read (Balance Y)
t5    Write (Balance Y)
t6    Commit
t7                         Begin_transaction
t8                         Read (Balance X)
t9                         Write (Balance X)
t10                        Read (Balance Y)
t11                        Write (Balance Y)
t12                        Commit

We have simply:
• Changed the order of the Write (Balance X) of T8 with the Write (Balance Y) of T7.
• Changed the order of the Read (Balance X) of T8 with the Read (Balance Y) of T7.
• Changed the order of the Read (Balance X) of T8 with the Write (Balance Y) of T7.

The schedule S3 is a serial schedule and since S1 and S2 are equivalent to S3, S1 and S2
are serializable schedules.
This type of serializability is known as conflict serializability. A conflict serializable
schedule orders any conflicting operations in the same way as some serial execution.
Under the constrained write rule (i.e. a transaction updates a data item based on its old
value, which is first read by the transaction), a precedence graph can be produced to test
for conflict serializability.

A precedence graph consists of:

• A node for each transaction
• A directed edge Ti → Tj, if Tj reads the value of an item written by Ti.
• A directed edge Ti → Tj, if Tj writes a value into an item after it has been read by Ti.

If the precedence graph contains a cycle then the schedule is not conflict serializable.


Non-conflict serializable schedule.


Consider two transactions T9 and T10. Transaction T9 is transferring Ksh. 1,000 from
one account with balance X to another account with balance Y, whilst T10 is increasing
the balance of these two accounts by 10%. The diagram follows;

Time  T9                                 T10
t1    Begin transaction
t2    Read (Balance X)
t3    Balance X = Balance X - 1,000      .
t4    Write (Balance X)                  Begin transaction
t5                                       Read (Balance X)
t6                                       Balance X = Balance X * 1.1
t7                                       Write (Balance X)
t8                                       Read (Balance Y)
t9                                       Balance Y = Balance Y * 1.1
t10                                      Write (Balance Y)
t11   Read (Balance Y)                   Commit
t12   Balance Y = Balance Y + 1,000
t13   Write (Balance Y)
t14   Commit

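The precedence graph is as follows:

T9 → T10 (on X: T10 reads Balance X after it has been written by T9)
T10 → T9 (on Y: T9 reads Balance Y after it has been written by T10)

As the precedence graph has a cycle, this schedule is not conflict serializable.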
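
The test can be sketched in code using only the two edge rules listed earlier. The
function names and the (transaction, kind, item) tuple layout are assumptions made for
illustration; the sketch adds an edge for every later conflicting read or write on the
same item.

```python
def precedence_graph(schedule):
    """Build the edge set from a list of (txn, kind, item) in execution order."""
    edges = set()
    for i, (ti, kind_i, item_i) in enumerate(schedule):
        for tj, kind_j, item_j in schedule[i + 1:]:
            if ti == tj or item_i != item_j:
                continue
            # Tj reads an item written by Ti, or Tj writes an item read by Ti.
            if (kind_i, kind_j) in {("write", "read"), ("read", "write")}:
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}                          # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True                 # reached a node already on the current path
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        if any(visit(nxt) for nxt in graph[node]):
            return True
        state[node] = "done"
        return False

    return any(visit(node) for node in graph)


# Reads and writes of T9 and T10 in the order shown in the table above.
t9_t10 = [("T9", "read", "X"), ("T9", "write", "X"),
          ("T10", "read", "X"), ("T10", "write", "X"),
          ("T10", "read", "Y"), ("T10", "write", "Y"),
          ("T9", "read", "Y"), ("T9", "write", "Y")]

edges = precedence_graph(t9_t10)
print(sorted(edges), has_cycle(edges))
```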

View serializability.
This is another type of serializability that offers a less stringent definition of schedule
equivalence than that offered by conflict serializability.
Two schedules S1 and S2 consisting of the same operations from n transactions T1, T2, T3,
…, Tn are view equivalent if the following three conditions hold:

• For each data item x, if transaction Ti reads the initial value of x in the schedule S1,
then transaction Ti must also read the initial value of x in the schedule S2.
• For each read operation on data item x by transaction Ti in schedule S1, if the
value read has been written by transaction Tj, then transaction Ti must also
read the value of x produced by transaction Tj in schedule S2.
• For each data item x, if the last write operation on x was performed by transaction
Ti in schedule S1, the same transaction must perform the final write on data item x
in schedule S2.
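
The three conditions can be checked mechanically by recording, for each schedule, which
write every read sees and which transaction writes each item last. The following is a
sketch under assumptions: the function names are hypothetical, and the string "initial"
is used to mark a read of the initial value, so no transaction may be called "initial".

```python
def view_profile(schedule):
    """Return (reads-from relations, final writer of each item) for a schedule.

    schedule: list of (txn, kind, item) in execution order, kind "read" or "write".
    """
    last_writer = {}                    # item -> transaction that wrote it most recently
    reads_from = []                     # (reader, item, writer or "initial")
    for txn, kind, item in schedule:
        if kind == "read":
            reads_from.append((txn, item, last_writer.get(item, "initial")))
        elif kind == "write":
            last_writer[item] = txn
    return sorted(reads_from), last_writer


def view_equivalent(schedule_a, schedule_b):
    """Conditions 1 and 2 compare reads-from; condition 3 compares final writers."""
    return view_profile(schedule_a) == view_profile(schedule_b)


# The interleaved schedule S1 and the serial schedule S3 of T7 and T8.
s1 = [("T7", "read", "X"), ("T7", "write", "X"),
      ("T8", "read", "X"), ("T8", "write", "X"),
      ("T7", "read", "Y"), ("T7", "write", "Y"),
      ("T8", "read", "Y"), ("T8", "write", "Y")]
s3 = [("T7", "read", "X"), ("T7", "write", "X"),
      ("T7", "read", "Y"), ("T7", "write", "Y"),
      ("T8", "read", "X"), ("T8", "write", "X"),
      ("T8", "read", "Y"), ("T8", "write", "Y")]

print(view_equivalent(s1, s3))
```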


Concurrency control techniques

Serializability is achievable in several ways.

The most basic way of attaining it is to use techniques that allow transactions to
proceed safely subject to certain constraints: locking and timestamping methods.

These two methods are conservative (or pessimistic) techniques in that they delay
transactions in case they conflict with other transactions at some future time.
Alternative methods, called optimistic methods, are based on the premise that
conflict is rare, so transactions are allowed to proceed unsynchronized and are only
checked for conflict at the end, when they reach the “commit” stage.

Locking
This is a procedure used to control concurrent access to data. When one transaction is
accessing the database, a lock may deny access by other transactions to prevent incorrect
results.
There are two types of lock:
Read lock: if a transaction has a read lock on a data item, it can read the item but not
update it. A read lock is shared, i.e. many users can be granted a read lock at the same
time without an adverse effect on the database.
Write lock: if a transaction has a write lock on a data item, it can both read and update the
item. A write lock is exclusive, i.e. only one user/application can be granted a write lock
at any one particular time.

Locks work as follows:

• Any transaction that requires access to a data item must first lock the item by
requesting a read lock for read-only access or a write lock for both read and write
access.
• If the item is not already locked by another transaction, the lock will be granted.
• If the item is currently locked, the DBMS determines whether the request is
compatible with the existing lock: if a read lock is requested on an item that already
has a read lock on it, the request is granted; in every other case (the existing lock or
the requested lock is a write lock), the transaction must WAIT until the existing lock
is released.
• A transaction continues to hold a lock until it explicitly releases it either during
execution or when it terminates (aborts or commits). It is only when the write lock is
released that the effects of the write operation will be made visible to other
transactions.
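
The granting rules above can be sketched as a small lock table. The class and method
names are hypothetical, and the sketch deliberately leaves out waiting queues, lock
upgrades and repeated requests by the same transaction.

```python
class LockTable:
    """Tracks which transactions hold a read or write lock on each item."""

    def __init__(self):
        self.locks = {}                         # item -> (mode, set of holders)

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must WAIT."""
        if item not in self.locks:              # item not locked: grant
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = self.locks[item]
        if mode == "read" and held_mode == "read":
            holders.add(txn)                    # read locks are shared
            return True
        return False                            # a write lock is involved: WAIT

    def release(self, txn, item):
        """Release txn's lock on item, freeing the item if no holder remains."""
        if item in self.locks:
            _mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]


table = LockTable()
print(table.request("T1", "X", "read"))     # granted: X is not locked
print(table.request("T2", "X", "read"))     # granted: read locks are compatible
print(table.request("T3", "X", "write"))    # refused: T3 must wait for the readers
```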

Some systems permit a transaction to issue a read lock on an item and then later upgrade
the lock to a write lock. This allows a transaction to examine data first and then decide
whether to update or not. If upgrades are not supported, a transaction must hold write
locks on all data items that it may update at some time during the execution of the
transaction, thereby potentially reducing the level of concurrency in the system.
For similar reasons, some systems also permit a transaction to issue a write lock and then
later downgrade the lock to a read lock.


Two-phase locking (2PL)


A transaction follows the two-phase locking protocol if all locking operations precede the
first unlock operation in the transaction.

According to the rules of this protocol, every transaction can be divided into two phases:
first a growing phase, in which it acquires all the locks required but cannot release any
locks, and then a shrinking phase, in which it releases its locks but cannot acquire any
new locks. It is not mandatory that all locks be acquired simultaneously: a transaction
will normally acquire some locks, do some processing and go on to acquire additional
locks as needed. However, no locks are released until the transaction has reached a stage
at which no new locks are needed.

The rules are:

• A transaction must acquire a lock on an item before operating on that item. The lock
may be read or write, depending on the type of access required.
• Once a transaction releases a lock, it can never acquire any new locks.

If upgrading of locks is supported, then it can only happen during the growing phase and
may dictate that the transaction wait until another transaction releases a read lock on the
item. Downgrading can only take place during the shrinking phase.
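
Whether a single transaction obeys the protocol can be checked by scanning its
operations in order. This is a sketch: the function name and the (action, item) pair
layout are assumptions, and lock upgrades and downgrades are not modelled.

```python
def follows_two_phase_locking(operations):
    """operations: one transaction's steps as (action, item) pairs, in order."""
    shrinking = False                       # becomes True at the first unlock
    for action, _item in operations:
        if action == "unlock":
            shrinking = True
        elif action in ("read_lock", "write_lock") and shrinking:
            return False                    # a lock was acquired after a release
    return True


# All locks are acquired before the first unlock: growing, then shrinking.
two_phase = [("write_lock", "X"), ("write", "X"), ("write_lock", "Y"),
             ("write", "Y"), ("unlock", "X"), ("unlock", "Y")]

# X is released before Y is locked, which breaks the protocol.
not_two_phase = [("write_lock", "X"), ("write", "X"), ("unlock", "X"),
                 ("write_lock", "Y"), ("write", "Y"), ("unlock", "Y")]

print(follows_two_phase_locking(two_phase))
print(follows_two_phase_locking(not_two_phase))
```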

Deadlock
It refers to an impasse that may occur when two (or more) transactions are each waiting
for locks held by the other to be released.

Assume we have two transactions TA and TB:

Time  TA                    TB
t1    Begin transaction
t2    Write_lock (balx)     Begin transaction
t3    Read (balx)           Write_lock (baly)
t4    balx = balx - 1000    Read (baly)
t5    Write (balx)          baly = baly + 2000
t6    Write_lock (baly)     Write (baly)
t7    WAIT                  Write_lock (balx)
t8    WAIT                  WAIT
t9    WAIT                  WAIT
t10   .                     WAIT
t11   .                     .

In the above case, there's only one way to break the deadlock: abort one or more of the
transactions, which involves undoing all the changes made by the aborted transactions.
Assume we abort transaction TB. Once this is done, the locks held by transaction TB are
released and TA is able to proceed. Deadlocks should be transparent to the users, and
therefore the DBMS should automatically restart the aborted transactions.

There are two techniques for handling deadlock: deadlock prevention, and deadlock
detection and recovery.

In deadlock prevention, the DBMS looks ahead to determine if a transaction would
cause deadlock and never allows deadlock to occur.
On the other hand, in deadlock detection and recovery, the DBMS allows deadlock to
occur but recognizes occurrences of the deadlock and breaks it.
Since it is generally easier to test for deadlock and break it when it occurs than to
prevent it, many systems use deadlock detection and recovery.

Deadlock prevention
A common approach used in deadlock prevention is to order transactions using
transaction timestamps.
There are two algorithms used here;
Wait-die - only an older transaction is allowed to wait for a younger one; otherwise the
requesting transaction is aborted (dies) and restarted with the same timestamp, so that
eventually it will become the oldest active transaction and will not die.
Wound-wait - only a younger transaction is allowed to wait for an older one. If an
older transaction requests a lock held by a younger one, the younger one is aborted
(wounded).
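
Both rules reduce to a comparison of two timestamps. In this sketch the function names
and the returned strings are hypothetical; a smaller timestamp means an older
transaction.

```python
def wait_die(requester_ts, holder_ts):
    """Decide what happens when the requester wants a lock the holder has."""
    if requester_ts < holder_ts:
        return "wait"               # an older requester may wait for a younger holder
    return "abort requester"        # a younger requester dies and is restarted


def wound_wait(requester_ts, holder_ts):
    """Decide what happens when the requester wants a lock the holder has."""
    if requester_ts < holder_ts:
        return "abort holder"       # an older requester wounds the younger holder
    return "wait"                   # a younger requester waits for the older holder


# Timestamp 1 is older than timestamp 2.
print(wait_die(1, 2), "|", wait_die(2, 1))
print(wound_wait(1, 2), "|", wound_wait(2, 1))
```

Under either rule it is always the younger transaction that is aborted, which is why
waiting can never form a cycle.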

Deadlock detection
It is usually handled by the construction of a wait-for Graph (WFG), showing
transaction dependencies; i.e. transaction Ti is dependent on Tj, if Tj holds a lock on a data
item that Ti is waiting for.
The WFG is constructed as follows;
Create a node for each transaction.
Create a directed edge Ti → Tj, if transaction Ti is waiting to lock an item that is currently
locked by Tj.

Deadlock exists if and only if the WFG contains a cycle. Since it is a necessary and
sufficient condition to have a cycle in the WFG for a deadlock to exist, the deadlock
detection algorithm generates the WFG regularly and examines it for a cycle.
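
A wait-for graph check can be sketched as follows, using the TA and TB example from the
deadlock section. The function name and the dictionary layout are assumptions, and the
simple recursive search is intended for small graphs only.

```python
def find_deadlock(wait_for):
    """wait_for maps each transaction to the set of transactions it is waiting for.

    Returns one cycle as a list of transactions, or None if there is no cycle.
    """
    def visit(node, path):
        if node in path:
            return path[path.index(node):]      # the path has come back to node
        for nxt in wait_for.get(node, ()):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# TA waits for baly (held by TB) and TB waits for balx (held by TA).
deadlocked = {"TA": {"TB"}, "TB": {"TA"}}
# After TB is aborted, only TA is waiting and nothing blocks TB.
no_deadlock = {"TA": {"TB"}, "TB": set()}

print(find_deadlock(deadlocked))
print(find_deadlock(no_deadlock))
```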

Timestamping
A timestamp is a unique identifier created by the DBMS that indicates the relative
starting time of a transaction.
Timestamping is a concurrency control protocol in which the key
objective is to order transactions globally in such a way that older transactions (those
with smaller timestamps) get priority in the event of conflict.

Optimistic techniques
In some systems, conflicts between transactions are rare, and the additional processing
required by locking or timestamping protocols is unnecessary for many transactions.
In this approach, it is assumed that conflict is rare and that it is more efficient to allow
transactions to proceed unsynchronised. When a transaction wishes to commit, a check is
performed to determine whether conflict has occurred.


If there has been conflict, the transaction must be rolled back and restarted. Since conflict
is rare, rollback is rare too.
The overhead involved in restarting a transaction may be considerable, since it
effectively means redoing the entire transaction. This may be tolerated only if it happens
very infrequently, in which case the majority of transactions will be processed without being
subjected to any delays. This allows for greater concurrency than traditional protocols,
since no locking is needed.

There are two or three phases to an optimistic concurrency control protocol, depending on
whether the transaction is read-only or an update transaction:
Read phase: this extends from the start of the transaction until immediately before the
commit. The transaction reads the values of all the data items it needs from the database
and stores them as local variables. Updates are applied to a local copy of the data, not to
the database.
Validation phase: it follows the read phase. Checks are performed to ensure
serializability is not violated if the transaction updates are applied to the database.
For a read-only transaction, this consists of checking that the data values read are still the
current values for the corresponding data items. If no interference occurred, the
transaction is committed. However, if interference occurred, the transaction is aborted
and restarted.
For an update transaction, validation consists of determining whether the current
transaction leaves the database in a consistent state, with serializability maintained. If not,
the transaction is aborted and restarted.
Write phase: this follows a successful validation phase for an update transaction. During
this phase, the updates made to the local copy are applied to the database.
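
The three phases can be sketched with a small class. Everything here is an assumption
made for illustration: the class name OptimisticTxn, the use of a dictionary as the
database, and in particular the validation test, which applies the "values read are
still current" check to update transactions as well.

```python
class OptimisticTxn:
    """One transaction that works on local copies and validates at commit."""

    def __init__(self, db):
        self.db = db
        self.read_set = {}      # item -> value seen during the read phase
        self.local = {}         # local copies holding this transaction's updates

    def read(self, item):
        """Read phase: take the value from the database unless updated locally."""
        if item in self.local:
            return self.local[item]
        value = self.db[item]
        self.read_set.setdefault(item, value)
        return value

    def write(self, item, value):
        """Read phase: updates go to the local copy, not to the database."""
        self.local[item] = value

    def commit(self):
        """Validation phase followed, on success, by the write phase."""
        for item, seen in self.read_set.items():
            if self.db[item] != seen:
                return False            # interference: caller must restart
        self.db.update(self.local)      # write phase
        return True


db = {"X": 1_000}
first = OptimisticTxn(db)
second = OptimisticTxn(db)
first.write("X", first.read("X") - 100)     # both transactions read X = 1,000
second.write("X", second.read("X") + 500)
second_ok = second.commit()                 # validates, then writes 1,500
first_ok = first.commit()                   # X is no longer 1,000: fails validation

print(second_ok, first_ok, db["X"])
```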

Granularity of data items


This refers to the size of data items chosen as the unit of protection by a concurrency
control protocol.
The granule may be:
• The entire database
• A data file
• A page (sometimes called an area or database space: a section of physical disk in
which relations are stored)
• A record
• A field value of a record.
