
Transaction Processing Concepts

Introduction:
A transaction is an executing program that forms a logical unit of database
processing. A transaction includes one or more database operations, such as insertion,
deletion, modification or retrieval. The transaction boundaries can be specified
by begin transaction and end transaction statements.
Examples of transaction processing systems:
Reservation systems, credit card processing systems, insurance processing systems.
Multiprogramming operating systems execute some commands from one process, then
suspend that process and execute some commands from the next process, and so on. A
suspended process is resumed at the point where it was suspended whenever it gets its turn to
use the CPU. Concurrent execution of processes is therefore interleaved.
Interleaving keeps the CPU busy: while one process waits for disk I/O, the CPU can execute another.
Granularity:
The size of a data item is called its granularity. A data item can be a field of a
database record, a whole record, or a disk block.
BLOCK:
The basic unit of data transfer from disk to main memory is one block. A transaction
includes two basic database access operations:
1. read_item(X):
It reads a database item named X into a program variable. It works as follows,
1. The address of the disk block that contains item X is found.
2. The disk block is copied into a buffer in main memory.
3. The item X is copied from the buffer to the program variable named X.
2. write_item(X):
It writes the value of the program variable X into the database item X. It works as
follows,
1. The address of the disk block that contains item X is found.
2. The disk block is copied into a buffer in main memory.
3. The item X is copied from the program variable named X into its correct location in
the buffer.
4. The updated block from the buffer is stored back to disk, either immediately or at
some later time.
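The read_item and write_item steps above can be sketched in Python. The disk layout, the item-to-block directory and the fetch_block and flush_block helpers are hypothetical stand-ins for the DBMS buffer manager; the sketch only shows that write_item changes the buffered copy of the block, and that the change reaches disk when the block is flushed.

```python
# Hypothetical disk: a list of blocks, each block a dict of item -> value.
disk = [{"X": 80, "Y": 10}, {"Z": 5}]
directory = {"X": 0, "Y": 0, "Z": 1}   # hypothetical: item name -> block number ("address")
buffers = {}                            # block number -> copy of the block in main memory


def fetch_block(block_no):
    """Step 2: copy the disk block into a main-memory buffer (if not already there)."""
    if block_no not in buffers:
        buffers[block_no] = dict(disk[block_no])
    return buffers[block_no]


def read_item(name, program_vars):
    block = fetch_block(directory[name])   # steps 1-2: find the block and buffer it
    program_vars[name] = block[name]       # step 3: copy the item into the program variable


def write_item(name, program_vars):
    block = fetch_block(directory[name])   # steps 1-2
    block[name] = program_vars[name]       # step 3: update the item inside the buffer


def flush_block(block_no):
    """Step 4: store the updated buffer back to disk (immediately or later)."""
    disk[block_no] = dict(buffers[block_no])


t1_vars = {}
read_item("X", t1_vars)
t1_vars["X"] -= 5
write_item("X", t1_vars)
on_disk_before_flush = disk[0]["X"]    # still 80: only the buffer has changed
flush_block(directory["X"])
print(on_disk_before_flush, disk[0]["X"])  # 80 75
```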
Concurrency control and recovery mechanisms are mainly concerned with the database
access commands in a transaction. Problems can occur because transactions execute
concurrently and may access and update the same database items, so the database access
commands of different transactions can be interleaved.

Uncontrolled concurrent execution can leave the database in an inconsistent state.


Problems that can occur when transactions run concurrently are:
The Lost Update Problem:
Suppose that transactions T1 and T2 are submitted at approximately the same
time and that their operations are interleaved so that T2 reads X before T1 writes
its update of X. Then the final value of item X is incorrect: T2 read the value of X
before T1 changed it in the database, so when T2 writes X, the updated value
resulting from T1 is lost. This is the lost update problem.
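The interleaving can be replayed in a few lines of Python. The initial value X = 80 and the amounts N = 5 and M = 4 are hypothetical; T1 subtracts N from X and T2 adds M to X, each working on its own local copy of X.

```python
db = {"X": 80}          # hypothetical initial value of X
N, M = 5, 4             # T1 subtracts N from X, T2 adds M to X

# Interleaved schedule: both transactions read X before either writes it.
t1_x = db["X"]          # T1: read_item(X)
t1_x = t1_x - N         # T1: X := X - N
t2_x = db["X"]          # T2: read_item(X), reads 80 before T1's write
t2_x = t2_x + M         # T2: X := X + M
db["X"] = t1_x          # T1: write_item(X) -> 75
db["X"] = t2_x          # T2: write_item(X) -> 84, overwriting T1's update

print(db["X"])          # 84, but any serial order gives 80 - 5 + 4 = 79
```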
Dirty read problem:
This problem occurs when one transaction T1 updates a database item X and then
fails for some reason, and another transaction T2 reads X before it is changed
back to its original value. T2 has read the temporary value of X, which will not be
recorded permanently in the database. The value of X read by T2 is called dirty data.
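A minimal replay of this scenario, with a hypothetical committed value X = 80 and a before-image dictionary standing in for the recovery mechanism that restores X when T1 fails:

```python
db = {"X": 80}                 # hypothetical committed value of X
before_image = {}              # old values kept so that T1 can be rolled back

before_image["X"] = db["X"]
db["X"] = db["X"] - 5          # T1: write_item(X) -> 75, not yet committed
t2_x = db["X"]                 # T2: read_item(X) sees 75 (dirty data)
db["X"] = before_image["X"]    # T1 fails: X is restored to 80

print(t2_x, db["X"])           # 75 80
```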
Incorrect Summary Problem:
If one transaction is calculating an aggregate summary function on a number of
records while other transactions are updating some of these records, the aggregate
function may calculate some values before they are updated and others after they
are updated. This is the Incorrect summary problem.
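A small replay, assuming hypothetical account balances A, B and C that total 600. T1 transfers 50 from A to C while T3 sums the balances; T3 sees A after T1's debit but C before T1's credit.

```python
accounts = {"A": 100, "B": 200, "C": 300}   # hypothetical balances, total 600

accounts["A"] -= 50            # T1: A := A - 50
total = 0
for name in ("A", "B", "C"):   # T3: sums A (already debited), B, and C (not yet credited)
    total += accounts[name]
accounts["C"] += 50            # T1: C := C + 50

print(total, sum(accounts.values()))   # 550 600
```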

Database failures:
Failures can be classified as transaction failures, system failures or media failures.
Transaction failures can be due to:
1. Transaction error: a logical programming error or an erroneous
input value.
2. Local errors: it may be necessary to cancel a transaction because some
exception condition is detected during its execution.
3. Concurrency control enforcement: the concurrency control method may
require a transaction to be aborted and restarted later.
System failures can be due to:
1. Computer crash: a hardware, software or network
error.
2. Physical problems and catastrophes: power failure, earthquake,
fire, theft, etc.
Media failures can be due to:
1. Disk failure: a read or write malfunction, a disk read/write
head crash, etc.

Transaction Operations:

BEGIN_TRANSACTION: Marks the beginning of transaction execution.

READ or WRITE: Specify read or write operations on the database.

END_TRANSACTION: Marks the end of transaction execution.

COMMIT_TRANSACTION: Marks the successful end of the transaction; its changes are
safely committed to the database.

ROLLBACK: Signals that the transaction has not ended successfully and that its
changes must be undone.

Transaction States:

Active State:
When a transaction begins, it enters the active state, in which it issues READ and
WRITE operations.
Partially Committed State:
When the transaction ends, it moves to this state. At this point, some recovery protocols
need to ensure that a system failure will not result in an inability to record the changes of
the transaction permanently.
Committed State:
Once these checks succeed, the transaction has reached its commit point and its changes
are made permanent in the database.
Failed State:
If one of the checks fails, or if the transaction is aborted during its active state, the
transaction goes to this state.
Terminated State:
The transaction leaves the system.
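The states can be read as a small state machine. In this sketch the event names ("read", "write", "end_transaction", "commit", "abort", "leave") and the step function are hypothetical labels for the transitions described above; an event that is not allowed in the current state raises ValueError.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"


# Hypothetical event names; each maps to the next state.
TRANSITIONS = {
    State.ACTIVE: {"read": State.ACTIVE, "write": State.ACTIVE,
                   "end_transaction": State.PARTIALLY_COMMITTED, "abort": State.FAILED},
    State.PARTIALLY_COMMITTED: {"commit": State.COMMITTED, "abort": State.FAILED},
    State.COMMITTED: {"leave": State.TERMINATED},
    State.FAILED: {"leave": State.TERMINATED},
    State.TERMINATED: {},
}


def step(state, event):
    try:
        return TRANSITIONS[state][event]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.value}") from None


s = State.ACTIVE
for event in ["read", "write", "end_transaction", "commit", "leave"]:
    s = step(s, event)
print(s)  # State.TERMINATED
```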

System Log:
The system log is a journal kept by the DBMS to keep track of all transaction operations that
affect the values of database items. Each transaction is identified by a unique transaction id, T.
The entries in the log are as follows:
[start_transaction,T]: T has started execution.
[write_item,T,X,old_value,new_value]: T has changed the value of X
from old_value to new_value.
[read_item,T,X]: T has read the value of X.
[commit,T]: T has completed successfully.
[abort,T]: T has been aborted.

Commit point of the transaction:

A transaction T reaches its commit point when all its operations that access the database
have been executed successfully and the effect of all its operations on the database has been
recorded in the log. T then writes a [commit,T] entry to the log.
If a system failure occurs, the log is searched for transactions that have a
[start_transaction,T] entry but no [commit,T] entry. All their write operations are rolled
back (undone). The write operations of the transactions that have committed are redone from
the log, because their effects may not yet have been written to disk.
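The undo/redo pass can be sketched in Python with log entries written as tuples in the [type, T, ...] format listed above. The recover function, the sample log and the crashed database contents are hypothetical. The sketch assumes strict schedules (no transaction writes an item last written by a transaction that has not yet committed), so undoing uncommitted writes newest-first and then redoing committed writes oldest-first leaves only committed effects.

```python
def recover(log, db):
    """Undo writes of uncommitted transactions, then redo writes of committed ones."""
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    # Undo: restore old values of uncommitted writes, newest entry first.
    for entry in reversed(log):
        if entry[0] == "write_item" and entry[1] not in committed:
            _, _, item, old_value, _ = entry
            db[item] = old_value
    # Redo: reapply new values of committed writes, oldest entry first.
    for entry in log:
        if entry[0] == "write_item" and entry[1] in committed:
            _, _, item, _, new_value = entry
            db[item] = new_value
    return db


log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 80, 75),
    ("start_transaction", "T2"),
    ("read_item", "T2", "Y"),
    ("write_item", "T2", "Y", 10, 20),
    ("commit", "T1"),
    ("write_item", "T2", "X", 75, 79),   # T2 never commits before the crash
]
crashed_db = {"X": 79, "Y": 10}          # hypothetical disk contents at the crash
print(recover(log, crashed_db))          # {'X': 75, 'Y': 10}
```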

Properties of transactions (ACID properties):

1. Atomicity: A transaction is an atomic unit of processing. Either it is performed
entirely or not performed at all.
2. Consistency preservation: A transaction is consistency preserving if its complete
execution takes the database from one consistent state to another.
3. Isolation: The execution of a transaction should not be interfered with by any other
transactions executing concurrently.
4. Durability or permanency: The changes applied to the database by a
committed transaction must persist in the database and must not be lost because of any failure.
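Atomicity can be observed with SQLite through Python's built-in sqlite3 module: used as a context manager, a connection commits the transaction when the block finishes normally and rolls it back if an exception escapes. The account table and the transfer and balances functions are hypothetical examples. The second transfer debits the source account and then fails, and the rollback undoes the debit as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(src, dst, amount):
    """Hypothetical transfer run as one transaction; returns True if it committed."""
    try:
        with conn:  # commit on normal exit, roll back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute("SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # the debit above is rolled back
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
    except ValueError:
        return False
    return True


def balances():
    return dict(conn.execute("SELECT id, balance FROM account"))


ok1 = transfer(1, 2, 30)    # commits: {1: 70, 2: 80}
ok2 = transfer(1, 2, 500)   # fails part-way: all of its changes are undone
print(ok1, ok2, balances())
```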

Schedules:
When transactions execute concurrently in an interleaved fashion, the order of
execution of operations from the different transactions is called a schedule.
Two operations in a schedule are said to conflict if:
1. They belong to different transactions.
2. They access the same item X.
3. At least one of them is a write operation.
Complete schedule:
A schedule S of transactions T1, T2, ..., Tn is a complete schedule if:
The operations in S are exactly those operations in T1, T2, ..., Tn, including
a commit or abort operation as the last operation of each transaction.

For any pair of operations from the same transaction, their order of appearance
in S is the same as their order in the transaction.

For any two conflicting operations, one of the two must occur before the other in the
schedule.

Partially ordered Schedule:
A schedule is partially ordered when, for two nonconflicting operations, it does not
define which one occurs first.
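Committed projection:
The committed projection C(S) of a schedule S includes only the operations in S that
belong to committed transactions.
Recoverable schedule, Non Recoverable schedule:
A schedule is recoverable if a transaction, once committed, never has to be rolled back.
This holds when no transaction T commits until every transaction that wrote an item that T
reads has committed. A schedule that does not meet this condition is non recoverable.
Cascadeless schedule:
A schedule is cascadeless, that is, it avoids cascading rollback, if every transaction in it
reads only items that were written by committed transactions.
Strict schedule:
In a strict schedule, a transaction can neither read nor write an item X until the last
transaction that wrote X has committed or aborted.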
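The three definitions can be checked mechanically. The sketch below uses a hypothetical function classify, with operations written as ("r", T, X), ("w", T, X), ("c", T) and ("a", T), and walks a complete schedule once. It assumes that an abort restores each item to the value written by the previous writer, so an aborted transaction is dropped from the item's writer history.

```python
from collections import defaultdict


def classify(schedule):
    writers = defaultdict(list)     # item -> transactions that wrote it, most recent last
    reads_from = defaultdict(set)   # T -> transactions whose written values T has read
    committed = set()
    recoverable = cascadeless = strict = True
    for op in schedule:
        kind, t = op[0], op[1]
        if kind in ("r", "w"):
            x = op[2]
            last = writers[x][-1] if writers[x] else None
            if last is not None and last != t:
                if last not in committed:
                    strict = False              # X touched before its last writer finished
                    if kind == "r":
                        cascadeless = False     # reading uncommitted data
                if kind == "r":
                    reads_from[t].add(last)
            if kind == "w":
                writers[x].append(t)
        elif kind == "c":
            if not reads_from[t] <= committed:
                recoverable = False             # T commits before a transaction it read from
            committed.add(t)
        elif kind == "a":
            for history in writers.values():    # the abort restores earlier values
                history[:] = [u for u in history if u != t]
    return {"recoverable": recoverable, "cascadeless": cascadeless, "strict": strict}


s1 = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X"), ("c", 2), ("a", 1)]
s2 = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("c", 1), ("c", 2)]
s3 = [("w", 1, "X"), ("c", 1), ("r", 2, "X"), ("w", 2, "X"), ("c", 2)]
for s in (s1, s2, s3):
    print(classify(s))
```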

Serializability of schedules:
Based on serializability, schedules can be classified into three kinds:
1. Serial schedule:
A schedule is serial if, for every transaction in it, the operations of that transaction
are executed consecutively without any interleaved operations from another transaction.
2. Nonserial schedule:
A schedule is nonserial if it interleaves operations from different
transactions.
3. Conflict serializable schedule:
A schedule S is conflict serializable if it is conflict equivalent to some serial
schedule S'. We can reorder the nonconflicting operations in S until we form the
equivalent serial schedule S'.
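Conflict serializability is usually tested with a precedence graph: an edge Ti -> Tj is added whenever an operation of Ti conflicts with, and comes before, an operation of Tj, and the schedule is conflict serializable exactly when the graph has no cycle. A topological order of the graph then gives an equivalent serial schedule S'. The Python sketch below uses hypothetical function names and considers only read and write operations (commits are omitted).

```python
def precedence_edges(schedule):
    """schedule: list of (op, T, X) with op in {"r", "w"}, in execution order."""
    edges = set()
    for i, (op_a, t_a, x_a) in enumerate(schedule):
        for op_b, t_b, x_b in schedule[i + 1:]:
            # conflict: different transactions, same item, at least one write
            if t_a != t_b and x_a == x_b and "w" in (op_a, op_b):
                edges.add((t_a, t_b))        # T_a's operation comes first
    return edges


def equivalent_serial_order(schedule):
    """Return a conflict-equivalent serial order, or None if the graph has a cycle."""
    txns = sorted({t for _, t, _ in schedule})
    edges = precedence_edges(schedule)
    indegree = {t: 0 for t in txns}
    for _, v in edges:
        indegree[v] += 1
    ready = [t for t in txns if indegree[t] == 0]
    order = []
    while ready:                 # Kahn's topological sort
        ready.sort()
        t = ready.pop(0)
        order.append(t)
        for u, v in edges:
            if u == t:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return order if len(order) == len(txns) else None


lost_update = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"), ("w", 2, "X")]
s = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X"), ("r", 1, "Y"), ("w", 1, "Y")]
print(equivalent_serial_order(lost_update))  # None
print(equivalent_serial_order(s))            # [1, 2]
```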

Transaction support in SQL:

The characteristics of a transaction, specified with the SET TRANSACTION statement, are:
1. Access mode:
It can be READ ONLY or READ WRITE.
2. Diagnostic area size:
DIAGNOSTICS SIZE n specifies an integer value n indicating the number of conditions
(i.e., errors or exceptions) that can be held simultaneously in the diagnostic area.
3. Isolation level:
ISOLATION LEVEL <isolation>, where the <isolation> value can be
READ UNCOMMITTED,
READ COMMITTED,
REPEATABLE READ or
SERIALIZABLE.
Three violations that can occur are:
1. Dirty read:
A transaction T1 reads an update made by a transaction T2 that has not yet
committed; if T2 then fails and is aborted, T1 has read a value that never becomes permanent.
2. Nonrepeatable read:
A transaction T1 reads a value from a table, and another transaction T2 later
updates that value. When T1 reads the value again, it sees a different
value.
3. Phantoms:
A transaction T1 reads a set of rows from a table based on a WHERE-clause
condition, and another transaction T2 then inserts a row that also satisfies the
condition. When T1 repeats the read with the same condition, it sees an additional
row, called a phantom.
READ UNCOMMITTED permits all three violations.
READ COMMITTED prevents dirty reads but permits nonrepeatable reads and phantoms.
REPEATABLE READ prevents dirty and nonrepeatable reads but permits phantoms.
SERIALIZABLE permits none of them and is the default isolation level in the SQL standard.

CONCURRENCY CONTROL TECHNIQUES


Several techniques are used to prevent interference between transactions that execute
concurrently.
Locking techniques for concurrency control:
Concurrent execution of transactions can be controlled by locking the data items. A lock
is a variable associated with a data item that describes the status of the item with respect
to the operations applied to it.
Two types of locks:
1.Binary Lock.
2.Shared/Exclusive lock.
Binary Lock
Binary lock values:
It can have only 2 states or values:
Locked (1)
Unlocked (0)
If LOCK(x) = 1, x cannot be accessed by a database operation that requests the
item.
If LOCK(x) = 0, the item can be accessed when requested.
Binary Lock operations:
Two operations are used with binary locks:
lock_item
unlock_item
When a transaction needs to access item x, it issues a lock_item(x) operation. If
LOCK(x) = 1, the transaction must wait; otherwise it sets LOCK(x) = 1 and accesses the item.
After using the item, it issues an unlock_item(x) operation, which sets LOCK(x) = 0 so that other
transactions can access it.
Implementation of Binary Lock:
Each lock can be a record with three fields, <data_item_name, LOCK, locking_transaction>. The
system stores these records in a lock table, which could be organized as a hash file; items not
in the lock table are considered unlocked. The DBMS lock manager subsystem keeps track of and
controls access to the locks.
The rules followed in the binary locking mechanism:
1. A transaction T must issue lock_item(x) before any read_item(x) or
write_item(x) operation is performed in T.
2. T must issue unlock_item(x) after all read_item(x) and write_item(x)
operations are completed in T.
3. T will not issue lock_item(x) if it already holds the lock on
x.
4. T will not issue unlock_item(x) unless it already holds the
lock on x.

Between the lock_item(x) and unlock_item(x) operations, transaction T is said to hold the
lock on x. At most one transaction can hold the lock on a particular item at a time.
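A minimal sketch of the lock table and the lock_item/unlock_item operations, assuming transactions run as Python threads. The class name BinaryLockManager and the holder helper are hypothetical, and an item absent from the dictionary is treated as unlocked. One threading.Condition is shared by all items, so unlock_item uses notify_all: every waiting transaction wakes up and re-checks whether its own item is free (waking just one could wake a transaction waiting for a different item).

```python
import threading


class BinaryLockManager:
    """Hypothetical lock table: data_item_name -> locking_transaction (absent = LOCK 0)."""

    def __init__(self):
        self._table = {}
        self._cond = threading.Condition()   # waiting transactions block here

    def lock_item(self, x, txn):
        with self._cond:
            if self._table.get(x) == txn:
                raise RuntimeError(f"{txn} already holds the lock on {x}")   # rule 3
            while x in self._table:          # LOCK(x) = 1: wait
                self._cond.wait()
            self._table[x] = txn             # LOCK(x) <- 1

    def unlock_item(self, x, txn):
        with self._cond:
            if self._table.get(x) != txn:
                raise RuntimeError(f"{txn} does not hold the lock on {x}")   # rule 4
            del self._table[x]               # LOCK(x) <- 0
            self._cond.notify_all()          # waiting transactions re-check their item

    def holder(self, x):
        with self._cond:
            return self._table.get(x)


mgr = BinaryLockManager()
mgr.lock_item("X", "T1")
waiter = threading.Thread(target=mgr.lock_item, args=("X", "T2"))
waiter.start()                  # T2 waits while LOCK(X) = 1
mgr.unlock_item("X", "T1")
waiter.join()                   # T2 now holds the lock
print(mgr.holder("X"))          # T2
```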

Shared/Exclusive Locks:
Several transactions should be allowed to access an item x at the same time if they only read it.
If a transaction is to write x, it must be given exclusive access to x. There are three lock
operations:
1. read_lock(x)
2. write_lock(x)
3. unlock(x)
LOCK(x) therefore has three possible states: read-locked, write-locked or unlocked.
The locking operations can be handled by maintaining a lock table that also keeps track of the
number of transactions holding a shared (read) lock on each item.
Each record in the lock table has four fields:
1. Data item name.
2. LOCK.
3. Number of reads.
4. Locking transactions.
For a locked item, the value of LOCK is either write-locked or read-locked. Each lock and unlock
operation is indivisible: once it has started, no interleaving is allowed until the operation
terminates or the transaction is placed in the waiting queue for the item.
Algorithm for read_lock(x):
B: if LOCK(x) = unlocked
     then begin LOCK(x) ← read-locked;
                no_of_reads(x) ← 1
          end
   else if LOCK(x) = read-locked
     then no_of_reads(x) ← no_of_reads(x) + 1
   else begin
          wait (until LOCK(x) = unlocked and the lock manager wakes up the transaction);
          go to B
        end;
Algorithm for write_lock(x):
B: if LOCK(x) = unlocked
     then LOCK(x) ← write-locked
   else begin
          wait (until LOCK(x) = unlocked and the lock manager wakes up the transaction);
          go to B
        end;
Algorithm for unlock(x):
if LOCK(x) = write-locked
  then begin LOCK(x) ← unlocked;
             wake up one of the waiting transactions, if any
       end
else if LOCK(x) = read-locked
  then begin
         no_of_reads(x) ← no_of_reads(x) - 1;
         if no_of_reads(x) = 0
           then begin LOCK(x) ← unlocked;
                      wake up one of the waiting transactions, if any
                end
       end;
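The read_lock, write_lock and unlock algorithms can be turned into a runnable Python sketch, assuming transactions are threads. The class name SXLockManager and the state helper are hypothetical. One threading.Condition plays the role of the waiting queue; notify_all wakes every waiting transaction and each re-checks its condition, rather than waking exactly one as in the pseudocode. The sketch does not record which transactions hold each lock (the fourth lock-table field), so it does not check who issues unlock(x), and it does not implement upgrading.

```python
import threading


class SXLockManager:
    def __init__(self):
        self._lock_state = {}    # item -> "read-locked" or "write-locked" (absent = unlocked)
        self._no_of_reads = {}   # item -> number of transactions holding a read lock
        self._cond = threading.Condition()

    def read_lock(self, x):
        with self._cond:
            while self._lock_state.get(x) == "write-locked":
                self._cond.wait()                    # wait, then re-check (go to B)
            if x not in self._lock_state:
                self._lock_state[x] = "read-locked"
                self._no_of_reads[x] = 1
            else:
                self._no_of_reads[x] += 1            # share the existing read lock

    def write_lock(self, x):
        with self._cond:
            while x in self._lock_state:             # any read or write lock blocks a writer
                self._cond.wait()
            self._lock_state[x] = "write-locked"

    def unlock(self, x):
        with self._cond:
            state = self._lock_state.get(x)
            if state == "write-locked":
                del self._lock_state[x]
                self._cond.notify_all()
            elif state == "read-locked":
                self._no_of_reads[x] -= 1
                if self._no_of_reads[x] == 0:        # last reader releases the item
                    del self._lock_state[x]
                    del self._no_of_reads[x]
                    self._cond.notify_all()
            else:
                raise RuntimeError(f"{x} is not locked")

    def state(self, x):
        with self._cond:
            return self._lock_state.get(x, "unlocked"), self._no_of_reads.get(x, 0)


m = SXLockManager()
m.read_lock("X")
m.read_lock("X")                 # two readers share X
shared = m.state("X")
m.unlock("X")
m.unlock("X")
released = m.state("X")
m.write_lock("X")
exclusive = m.state("X")
print(shared, released, exclusive)
```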
The following rules must be enforced with shared/exclusive locks:
1) A transaction T must issue read_lock(x) or write_lock(x) before any
read_item(x) operation is performed in T.
2) T must issue write_lock(x) before any write_item(x)
operation is performed in T.
3) T must issue unlock(x) after all read_item(x) and
write_item(x) operations are completed in T.
4) T will not issue read_lock(x) if it already holds a read lock or
write lock on item x (this rule may be relaxed for downgrading).
5) T will not issue write_lock(x) if it already holds a read lock or
write lock on item x (this rule may be relaxed for upgrading).
6) T will not issue unlock(x) unless it already holds a read lock or
write lock on item x.
Upgrading the lock:
A transaction T that holds a read lock on x can convert it into a write lock by issuing
write_lock(x). This is called upgrading the lock. The upgrade is granted if T is the only
transaction holding a read lock on x; otherwise T must wait.
Downgrading the lock:
A transaction T that holds a write lock on x can convert it into a read lock by issuing
read_lock(x). This is called downgrading the lock.

Two Phase Locking (2PL):


In two-phase locking, all locking operations (read_lock, write_lock) of a transaction precede
its first unlock operation, so the transaction can be divided into two phases:
1. Expanding or growing phase:
New locks on items can be acquired, but none can be released. Upgrading of
locks, if allowed, must take place in this phase.
2. Shrinking phase:
Existing locks can be released, but no new locks can be acquired. Downgrading of locks,
if allowed, must take place in this phase.
If every transaction in a schedule follows two-phase locking, the schedule is guaranteed to be
serializable.
Basic two phase locking:
A transaction T may be unable to release an item X even after it has finished using it, because
T may still need to lock another item Y; T must lock Y before it can release X. In general, T
cannot release X until it has acquired every lock it needs. Other transactions that need X are
therefore forced to wait, and if T locks Y earlier than it needs it, transactions that need Y are
also forced to wait. This method is known as basic two phase locking.
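Whether a single transaction obeys the two-phase rule can be checked from its sequence of operations. The sketch below (hypothetical function is_two_phase, operations written as (name, item) pairs) treats a write_lock on an item that is already read-locked as an upgrade, which belongs to the growing phase, and a read_lock on an item that is already write-locked as a downgrade, which belongs to the shrinking phase.

```python
def is_two_phase(operations):
    """operations: one transaction's steps in order, e.g. ("write_lock", "X").
    Returns True if no lock is acquired or upgraded after the first release."""
    held = {}          # item -> "read" or "write" lock currently held
    shrinking = False
    for op, item in operations:
        if op == "read_lock" and held.get(item) == "write":
            held[item] = "read"      # downgrade: gives up rights, so it counts as shrinking
            shrinking = True
        elif op in ("read_lock", "write_lock"):
            if shrinking:
                return False         # new lock or upgrade after a release
            held[item] = "read" if op == "read_lock" else "write"
        elif op == "unlock":
            held.pop(item, None)
            shrinking = True
        # read_item / write_item do not affect the phases
    return True


t1 = [("read_lock", "Y"), ("read_item", "Y"), ("write_lock", "X"), ("unlock", "Y"),
      ("read_item", "X"), ("write_item", "X"), ("unlock", "X")]
t1_early_unlock = [("read_lock", "Y"), ("read_item", "Y"), ("unlock", "Y"), ("write_lock", "X"),
                   ("read_item", "X"), ("write_item", "X"), ("unlock", "X")]
print(is_two_phase(t1), is_two_phase(t1_early_unlock))  # True False
```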

Conservative 2PL/static 2PL:


This method requires a transaction to lock all the items it accesses before the transaction
begins execution, by predeclaring its read set and write set. If any of the predeclared items
cannot be locked, the transaction locks none of them and waits until all of them are available.
Strict 2PL:
A transaction T does not release any of its write locks until it commits or aborts.
Therefore no other transaction can read or write an item written by T until T has committed
or aborted.
Rigorous 2PL:
A transaction T does not release any of its locks, read or write, until it commits or
aborts.

Problems caused by Locks:


1. Deadlocks.
2. Starvation.

Deadlock:
Deadlock occurs when each transaction T in a set of two or more transactions is waiting
for some item that is locked by some other transaction T' in the set. Each transaction in
the set is waiting for another transaction in the set to release a lock, so none of them can
proceed. This state is called deadlock.
Deadlock prevention protocols:
The protocol used in conservative 2PL requires that a transaction lock all the items it
needs in advance; if any of the items cannot be obtained, none are locked, and the transaction
waits and then tries again to lock all of them.
Another protocol orders all the items in the database and requires each transaction to lock
the items it needs in that order.
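Other protocols are based on the transaction timestamp TS(T), a unique identifier assigned to
each transaction; if TS(Ti) < TS(Tj), Ti is older than Tj. Suppose Ti tries to lock an item X
that is locked by Tj with a conflicting lock. Two schemes are: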
1. Wait-die scheme:
If TS (Ti)<TS (Tj)then
Ti is older than Tj,
Ti is allowed to wait.
Else
Ti is younger than Tj,
Abort Ti so that Ti dies,
Restart it later with the same timestamp.
In wait-die, an older transaction is allowed to wait for a younger one, whereas a younger
transaction requesting an item held by an older one is aborted and restarted.
2. Wound-wait scheme:
If TS (Ti) <TS (Tj) then

Ti is older than Tj
Abort Tj (Ti wounds Tj)
Restart later with the same timestamp.
Else
Ti is younger than Tj
Ti is allowed to wait.
In wound-wait, a younger transaction is allowed to wait for an older one, whereas an older
transaction requesting an item held by a younger one preempts the younger transaction by
aborting it.
3. No-waiting algorithm:
If a transaction is unable to obtain a lock, it is immediately aborted and
restarted after a certain time delay, without checking whether a deadlock will
actually occur.
4. Cautious waiting algorithm:
Suppose a transaction Ti tries to lock an item X that is locked by another transaction
Tj with a conflicting lock. If Tj is not blocked (not waiting for some other locked item),
then Ti is blocked and allowed to wait; otherwise Ti is aborted.
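The decisions made by wait-die, wound-wait and cautious waiting when Ti requests an item held by Tj can be written as small Python functions. The function names and returned strings are hypothetical; a smaller timestamp means an older transaction, and the actual aborting and restarting of transactions is left out.

```python
def wait_die(ts_i, ts_j):
    """Ti (timestamp ts_i) requests an item held by Tj (timestamp ts_j)."""
    if ts_i < ts_j:            # Ti is older than Tj
        return "Ti waits"
    return "abort Ti"          # Ti is younger: it dies, restarted later with the same timestamp


def wound_wait(ts_i, ts_j):
    if ts_i < ts_j:            # Ti is older: it wounds Tj
        return "abort Tj"      # Tj restarted later with the same timestamp
    return "Ti waits"          # Ti is younger: it may wait


def cautious_waiting(tj_is_blocked):
    """Ti may wait only if the holder Tj is not itself waiting for some item."""
    return "abort Ti" if tj_is_blocked else "Ti waits"


print(wait_die(1, 2), "|", wait_die(2, 1))      # Ti waits | abort Ti
print(wound_wait(1, 2), "|", wound_wait(2, 1))  # abort Tj | Ti waits
```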
Deadlock detection and Timeouts:
Deadlock can be detected with a wait-for graph, in which a node is created for every
transaction currently executing. A directed edge from Ti to Tj exists if Ti is waiting for a
lock held by Tj. If the graph has a cycle, the system is deadlocked. In that case, a transaction
has to be selected for abort; this selection is called victim selection. A timeout method can
also be used: a transaction that waits longer than a system-defined timeout period is aborted,
regardless of whether a deadlock actually exists.
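A wait-for graph can be checked for cycles with a depth-first search. In this hypothetical sketch the graph is a dictionary mapping each transaction to the transactions it is waiting for; find_deadlock returns the transactions on one cycle (the candidates for victim selection) or None. Victim selection itself is not modelled.

```python
def find_deadlock(wait_for):
    """wait_for: dict mapping Ti -> list of Tj that Ti is waiting for."""
    WHITE, GRAY, BLACK = 0, 1, 2    # unvisited, on the current DFS path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GRAY
        path.append(t)
        for u in wait_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GRAY:                       # edge back into the current path: a cycle
                return path[path.index(u):]
            if c == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


print(find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # ['T1', 'T2', 'T3']
print(find_deadlock({"T1": ["T2"], "T2": []}))                    # None
```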
Starvation:
Starvation occurs when a transaction cannot proceed for an indefinite period of time while
other transactions continue normally. This can happen, for example, if the same transaction is
repeatedly chosen as the victim for abort. Solutions include a first-come-first-served waiting
queue, or increasing the priority of a transaction the longer it waits.

Concurrency control based on the timestamp ordering


A timestamp is a unique identifier that the DBMS assigns to each transaction. Timestamps are
assigned in the order in which transactions are submitted to the system. They can be generated
in two ways:
1. A counter is initialized to 0 and incremented each time a transaction starts; its current
value is assigned to the transaction as its timestamp.
2. The current date/time value of the system clock can be used, provided no two timestamp values
are generated during the same tick of the clock.
Timestamp ordering:
Timestamp ordering ensures that a schedule is equivalent to the serial schedule in which
transactions are ordered by their timestamp values. For each item X, the algorithm keeps two
timestamp values:
1. read_TS(X) (read timestamp of item X): read_TS(X) = TS(T), where T is the youngest
transaction (the one with the largest timestamp) that has read X successfully.
2. write_TS(X) (write timestamp of item X): write_TS(X) = TS(T), where T is the youngest
transaction that has written X successfully.
Basic Timestamp Ordering:
When a transaction T tries to issue a read_item(X) or write_item(X) operation, the basic
timestamp ordering algorithm compares TS(T) with read_TS(X) and
write_TS(X) to ensure that the timestamp order of transaction execution is not violated. If
it would be violated, T is aborted and resubmitted later with a new timestamp.
The concurrency control algorithm checks two cases:
1. When transaction T issues a write_item(X) operation:
i) If read_TS(X) > TS(T) or write_TS(X) > TS(T), abort and roll back T and
reject the operation.
ii) Otherwise, execute write_item(X) and set write_TS(X) = TS(T).
2. When transaction T issues a read_item(X) operation:
i) If write_TS(X) > TS(T), abort and roll back T and reject the operation.
ii) Otherwise, execute read_item(X) and set read_TS(X) to the larger of TS(T)
and the current read_TS(X).
Thomas's write rule:
It rejects fewer write operations by modifying the checks for the write_item(X)
operation:
i) If read_TS(X) > TS(T), abort and roll back T and reject the operation.
ii) If write_TS(X) > TS(T), do not execute the write operation but continue
processing, because some transaction with a timestamp greater than TS(T)
has already written the value of X; T's write_item(X) operation is therefore
outdated and obsolete and can be ignored.
iii) If neither of the above conditions holds, execute write_item(X) and set
write_TS(X) = TS(T).
Strict timestamp ordering:
A transaction T that issues a read_item(X) or write_item(X) such that
TS(T) > write_TS(X) has its read or write operation delayed until the transaction
T' that wrote the value of X (so that TS(T') = write_TS(X)) has committed or aborted.
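A compact sketch of the basic timestamp ordering checks, with an optional flag that switches the write check to Thomas's write rule. Timestamps are assumed to be positive integers, and 0 stands for "never read or written". The class name and the returned strings are hypothetical; aborting, rolling back and resubmitting T with a new timestamp is left to the caller, and the delays of strict timestamp ordering are not modelled.

```python
class TimestampOrdering:
    def __init__(self, thomas_write_rule=False):
        self.read_ts = {}     # item -> read_TS(X)
        self.write_ts = {}    # item -> write_TS(X)
        self.thomas = thomas_write_rule

    def read_item(self, ts, x):
        if self.write_ts.get(x, 0) > ts:
            return "abort"    # a younger transaction already wrote X
        self.read_ts[x] = max(self.read_ts.get(x, 0), ts)
        return "ok"

    def write_item(self, ts, x):
        if self.read_ts.get(x, 0) > ts:
            return "abort"    # a younger transaction already read X
        if self.write_ts.get(x, 0) > ts:
            # Basic TO aborts; Thomas's write rule skips the obsolete write instead.
            return "ignore" if self.thomas else "abort"
        self.write_ts[x] = ts
        return "ok"


basic = TimestampOrdering()
print(basic.write_item(2, "X"), basic.read_item(1, "X"), basic.write_item(1, "X"))
# ok abort abort

thomas = TimestampOrdering(thomas_write_rule=True)
print(thomas.write_item(2, "X"), thomas.write_item(1, "X"),
      thomas.read_item(3, "X"), thomas.write_item(2, "X"))
# ok ignore ok abort
```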

Optimistic Concurrency control techniques


In optimistic (validation) concurrency control, no checking is done while the transaction is
executing. The protocol has three phases:
1. Read phase:
A transaction can read values of committed data items from the database.
Updates are applied only to local copies of the data items kept in the
transaction workspace.
2. Validation phase:
Checking is done to ensure that serializability will not be violated if the
transaction's updates are applied to the database.

3. Write phase:
If the validation phase is successful, the transaction's updates are applied to
the database; otherwise the updates are discarded and the transaction is
restarted.
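A minimal Python sketch of the three phases, assuming one simple validation test (a form of what is often called backward validation): T fails validation if any transaction that committed after T started wrote an item that T read. Validation and the write phase run together under a mutex, so transactions are validated one at a time. OptimisticStore and Txn are hypothetical names, and this test can reject some schedules that would in fact be serializable.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Txn:
    start: int                                    # number of commits when the txn began
    read_set: set = field(default_factory=set)
    writes: dict = field(default_factory=dict)    # local copies (transaction workspace)


class OptimisticStore:
    def __init__(self, data):
        self.data = dict(data)                    # committed values only
        self.committed_write_sets = []            # write set of each commit, in order
        self._mutex = threading.Lock()

    def begin(self):
        with self._mutex:
            return Txn(start=len(self.committed_write_sets))

    def read(self, txn, x):                       # read phase
        txn.read_set.add(x)
        return txn.writes.get(x, self.data[x])

    def write(self, txn, x, value):               # read phase: local copy only
        txn.writes[x] = value

    def commit(self, txn):
        with self._mutex:
            # Validation phase: did a later-committed txn write something we read?
            for ws in self.committed_write_sets[txn.start:]:
                if ws & txn.read_set:
                    return False                  # discard updates; caller restarts txn
            # Write phase: apply the local copies to the database.
            self.data.update(txn.writes)
            self.committed_write_sets.append(set(txn.writes))
            return True


store = OptimisticStore({"X": 80})
t1, t2 = store.begin(), store.begin()
store.write(t1, "X", store.read(t1, "X") - 5)
store.write(t2, "X", store.read(t2, "X") + 4)
ok1, ok2 = store.commit(t1), store.commit(t2)
print(ok1, ok2, store.data["X"])   # True False 75: the lost update is prevented
```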

Using Locks for concurrency control in indexes


When an index search is performed, a path in the tree is traversed from the root to a
leaf. Once a lower-level node on the path has been accessed, the higher-level nodes on the path
will not be used again. Therefore, once the lock on a child node has been obtained, the lock on
its parent node can be released.
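Releasing the parent's lock once the child is locked is often called lock coupling (or crabbing). Below is a search sketch over a B+-tree-like structure. Because Python's standard library has no shared/exclusive lock, each node carries a plain threading.Lock as a stand-in for a shared lock; the Node class and the sample tree are hypothetical.

```python
import threading
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                 # separator keys (internal node) or stored keys (leaf)
        self.children = children         # None for a leaf
        self.latch = threading.Lock()    # stand-in for a shared lock on the node


def search(root, key):
    node = root
    node.latch.acquire()
    while node.children is not None:
        child = node.children[bisect_right(node.keys, key)]
        child.latch.acquire()            # lock the child first...
        node.latch.release()             # ...then release the parent, not needed again
        node = child
    found = key in node.keys
    node.latch.release()
    return found


leaves = [Node([1, 3]), Node([5, 7]), Node([9])]
root = Node([5, 9], leaves)              # keys < 5 | 5 <= keys < 9 | keys >= 9
print(search(root, 7), search(root, 4), search(root, 9))  # True False True
```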
When an insertion occurs:
1. The simplest approach is to lock the root node in exclusive mode and then to
access the appropriate child node of the root. If the child node is not full, the lock on
the root node can be released. This can be repeated down the tree to the leaf, so although
exclusive locks are held, they are released soon.
2. An optimistic approach is to request and hold shared locks on the nodes
leading to the leaf node, with an exclusive lock on the leaf. If the insertion causes the leaf
node to split, the insertion propagates to one or more higher-level nodes, and the locks on
those nodes are upgraded to exclusive locks.
3. Another approach uses a B-link tree, in which sibling nodes at the same level are
linked together at every level. This allows shared locks to be used when requesting a page,
and requires that the lock be released before accessing the child node. For an insert
operation, the shared lock on a node is upgraded to an exclusive lock. If a split occurs,
the parent node must be relocked in exclusive mode.
For a deletion that causes two or more nodes of the index tree to merge, locks are held on the
nodes being merged as well as on the parent of the nodes being merged.
