
Transaction Processing Recovery & Concurrency Control

What is a Transaction?
A logical unit of work on a database
It may be an entire program, a portion of a program, or a single command.

The entire series of steps necessary to accomplish a logical unit of work.
Successful transactions change the database from one CONSISTENT STATE to another (one where all data integrity constraints are satisfied).

All database access operations between the Begin Transaction and End Transaction statements are considered one logical transaction.
If the database operations in a transaction do not update the database but only retrieve data, the transaction is called a read-only transaction.

Basic database access operations:
read_item(X): reads a database item X into a program variable.
write_item(X): writes the value of program variable X into the database item X.

Example of a Transaction
Updating a Record
1. Locate the record on disk
2. Bring the record into the buffer
3. Update the data in the buffer
4. Write the data back to disk

What is a transaction
[Figure: before "begin Transaction" the database is in a consistent state (Account A: Rs.1000; Account B, Smith: Rs.0). The transaction transfers Rs.500. During execution of the transaction the database may be temporarily in an inconsistent state. After "end Transaction" the database is again in a consistent state (Account A: Rs.500; Account B, Smith: Rs.500).]

Properties of a Transaction (ACID Properties)


Atomic: all or nothing. All parts of the transaction must be completed and committed, or it must be aborted and rolled back.
Consistent: each user is responsible for ensuring that their transaction (if executed by itself) would leave the database in a consistent state.

Properties of a Transaction
Isolation: the final effects of multiple simultaneous transactions must be the same as if they were executed one right after the other.
Durability: if a transaction has been committed, the DBMS must ensure that its effects are permanently recorded in the database (even if the system crashes).

Example of Fund Transfer


Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

Atomicity requirement: if the transaction fails after step 3 and before step 6, money will be lost, leading to an inconsistent database state.
The failure could be due to software or hardware.
The system should ensure that updates of a partially executed transaction are not reflected in the database.

Durability requirement: once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures.
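The atomicity requirement can be illustrated with a minimal sketch using Python's standard sqlite3 module. The table name `account`, the helper names `transfer` and `balances`, the `fail_midway` flag and the starting balances are all assumptions made for this illustration; they are not part of the slides. The `with conn:` block commits if the body finishes and rolls back if an exception escapes, so a failure after write(A) leaves no partial update behind.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one atomic unit of work."""
    with conn:  # commit on success, roll back if an exception escapes
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        if fail_midway:
            # Hypothetical failure between write(A) and write(B)
            raise RuntimeError("simulated failure after write(A)")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )

def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 0)])
conn.commit()

try:
    transfer(conn, "A", "B", 50, fail_midway=True)
except RuntimeError:
    pass  # the partial debit of A has been rolled back

print(balances(conn))  # both balances are unchanged
```

A later call such as `transfer(conn, "A", "B", 50)` without the simulated failure commits both updates together.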

Example of Fund Transfer (Cont.)


Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

Consistency requirement: in the above example, the sum of A and B is unchanged by the execution of the transaction.
A transaction must see a consistent database.
During transaction execution the database may be temporarily inconsistent.
When the transaction completes successfully the database must be consistent.
Erroneous transaction logic can lead to inconsistency.

Example of Fund Transfer (Cont.)

Isolation requirement: if, between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).

T1                      T2
1. read(A)
2. A := A - 50
3. write(A)
                        read(A), read(B), print(A + B)
4. read(B)
5. B := B + 50
6. write(B)

Isolation can be ensured trivially by running transactions serially, that is, one after the other. However, executing multiple transactions concurrently has significant benefits, as we will see later.

Transaction State

Active: the initial state; the transaction stays in this state while it is executing.
Partially committed: after the final statement has been executed.
Failed: after the discovery that normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
    restart the transaction (can be done only if there is no internal logical error)
    kill the transaction
Committed: after successful completion.
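The states listed above can be sketched as a small state machine. This is only an illustration: the `State` enum, the `ALLOWED` table and the `advance` helper are hypothetical names, and the transition table assumes the usual textbook diagram (active to partially committed or failed, partially committed to committed or failed, failed to aborted).

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"

# Assumed transition table: each state maps to the states it may move to.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),     # terminal: restart or kill happens outside
    State.COMMITTED: set(),   # terminal
}

def advance(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target

# A transaction that executes its final statement and then commits
state = State.ACTIVE
state = advance(state, State.PARTIALLY_COMMITTED)
state = advance(state, State.COMMITTED)
```

Restarting an aborted transaction is modelled here as starting a fresh transaction in the active state rather than as a transition out of aborted.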

Transaction State (Cont.)

System Log (Transaction Log)


The system maintains a log to keep track of all transaction operations that affect the values of database items. This log may be needed to recover from failures.

For every transaction, the system generates a unique transaction-id.

[start_transaction, transaction-id]: the start of execution of the transaction identified by transaction-id.

Entries in the System Log


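Example transaction as a stored procedure (Oracle PL/SQL syntax):

    CREATE OR REPLACE PROCEDURE credit_labmark (sno NUMBER, cno CHAR, credit NUMBER) AS
        old_mark NUMBER;
        new_mark NUMBER;
    BEGIN
        -- Lock the row so no other transaction changes labmark before our update
        SELECT labmark INTO old_mark
          FROM enrol
         WHERE studno = sno AND courseno = cno
           FOR UPDATE OF labmark;
        new_mark := old_mark + credit;
        UPDATE enrol SET labmark = new_mark
         WHERE studno = sno AND courseno = cno;
        COMMIT;
    EXCEPTION
        -- Any error undoes the partial work of the transaction
        WHEN OTHERS THEN
            ROLLBACK;
    END credit_labmark;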

[read_item, transaction-id, X]: the transaction identified by transaction-id reads the value of database item X. Optional in some protocols.
[write_item, transaction-id, X, old_value, new_value]: the transaction identified by transaction-id changes the value of database item X from old_value to new_value.
[commit, transaction-id]: the transaction identified by transaction-id has completed all accesses to the database successfully and its effect can be recorded permanently (committed).
[abort, transaction-id]: the transaction identified by transaction-id has been aborted.

Transaction execution
A transaction reaches its commit point when all operations accessing the database are completed and the result has been recorded in the log. It then writes a [commit, transaction-id] entry to the log.
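[Figure: transaction state diagram. BEGIN TRANSACTION enters the active state, where READ and WRITE operations take place. END TRANSACTION leads to partially committed, and COMMIT leads to committed. ROLLBACK from either active or partially committed leads to failed. Committed and failed both lead to terminated.]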

If a system failure occurs, the recovery process searches the log and rolls back every transaction that has written a [start_transaction, transaction-id] entry and [write_item, transaction-id, X, old_value, new_value] entries into the log but has not recorded a [commit, transaction-id] entry.
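The rollback step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the log is a Python list of tuples mirroring the entry formats above, the database is a dict, every logged write is assumed to have already reached the database (so no redo is needed), and the function name `recover` and the sample values are hypothetical.

```python
def recover(db, log):
    """Undo the writes of every transaction that has no commit entry."""
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    # Scan the log backwards so the oldest old_value is restored last
    for entry in reversed(log):
        if entry[0] == "write_item" and entry[1] not in committed:
            _, _tid, item, old_value, _new_value = entry
            db[item] = old_value
    return db

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "A", 1000, 950),
    ("start_transaction", "T2"),
    ("write_item", "T2", "C", 10, 20),
    ("commit", "T2"),
    ("write_item", "T1", "B", 0, 50),
    # crash happens here: T1 never wrote its commit entry
]

db_after_crash = {"A": 950, "B": 50, "C": 20}
recovered = recover(dict(db_after_crash), log)
```

T2 committed before the failure, so its change to C is kept, while both of T1's writes are reverted to their old values.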

Concurrent Executions
Multiple transactions are allowed to run concurrently in the system. Advantages are:
increased processor and disk utilization, leading to better transaction throughput
E.g. one transaction can be using the CPU while another is reading from or writing to the disk

reduced average response time for transactions: short transactions need not wait behind long ones.

Concurrency control schemes: mechanisms to achieve isolation, that is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database.

Schedules
Schedule: a sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed.
A schedule for a set of transactions must consist of all instructions of those transactions.
It must preserve the order in which the instructions appear in each individual transaction.

A transaction that successfully completes its execution will have a commit instruction as its last statement.
By default, a transaction is assumed to execute a commit instruction as its last step.
A transaction that fails to successfully complete its execution will have an abort instruction as its last statement.

Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. A serial schedule in which T1 is followed by T2:

Schedule 1

A serial schedule where T2 is followed by T1

Schedule 2

Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1.

Schedule 3

In Schedules 1, 2 and 3, the sum A + B is preserved.

The following concurrent schedule does not preserve the value of (A + B).

Schedule 4
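The effect of different schedules on A + B can be checked with a small simulation. This is a sketch under assumptions: the starting balances (A = 1000, B = 2000), the step breakdown of T1 and T2, and the particular interleavings are chosen for illustration and are not claimed to be the exact schedules in the figures. A schedule is written as a list of transaction names; each entry executes that transaction's next step.

```python
def t1_steps():
    # T1: transfer 50 from A to B (v holds T1's local variables)
    return [
        lambda db, v: v.update(A=db["A"]),         # read(A)
        lambda db, v: v.update(A=v["A"] - 50),     # A := A - 50
        lambda db, v: db.update(A=v["A"]),         # write(A)
        lambda db, v: v.update(B=db["B"]),         # read(B)
        lambda db, v: v.update(B=v["B"] + 50),     # B := B + 50
        lambda db, v: db.update(B=v["B"]),         # write(B)
    ]

def t2_steps():
    # T2: transfer 10% of the balance of A to B
    return [
        lambda db, v: v.update(A=db["A"]),                        # read(A)
        lambda db, v: v.update(temp=v["A"] // 10,
                               A=v["A"] - v["A"] // 10),          # temp, A
        lambda db, v: db.update(A=v["A"]),                        # write(A)
        lambda db, v: v.update(B=db["B"]),                        # read(B)
        lambda db, v: v.update(B=v["B"] + v["temp"]),             # B := B + temp
        lambda db, v: db.update(B=v["B"]),                        # write(B)
    ]

def run(schedule, a=1000, b=2000):
    """Execute the steps in the given order and return the final database."""
    db = {"A": a, "B": b}
    steps = {"T1": iter(t1_steps()), "T2": iter(t2_steps())}
    local = {"T1": {}, "T2": {}}
    for name in schedule:
        next(steps[name])(db, local[name])
    return db

serial_t1_t2 = ["T1"] * 6 + ["T2"] * 6
serial_t2_t1 = ["T2"] * 6 + ["T1"] * 6
# Interleaved, but each transaction finishes with A before the other touches it
interleaved_ok = ["T1"] * 3 + ["T2"] * 3 + ["T1"] * 3 + ["T2"] * 3
# T2 reads A before T1 writes it, and T1 reads B before T2 writes it
interleaved_bad = ["T1"] * 2 + ["T2"] * 4 + ["T1"] * 4 + ["T2"] * 2

for name, s in [("T1;T2", serial_t1_t2), ("T2;T1", serial_t2_t1),
                ("ok", interleaved_ok), ("bad", interleaved_bad)]:
    result = run(s)
    print(name, result, "sum =", result["A"] + result["B"])
```

With these assumed values the two serial orders and the first interleaving keep the sum at 3000, while the last interleaving ends with a sum of 3050.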

Serial and Non-serial Schedules


A schedule S is serial if, for every transaction T participating in the schedule, all of T's operations are executed consecutively in the schedule; otherwise it is called non-serial. Non-serial schedules mean that transactions are interleaved. There are many possible orders or schedules. Serialisability theory attempts to determine the 'correctness' of the schedules.

Serializability of Schedules
Serial, Nonserial, and Conflict-Serializable schedules (continued)

The problems of serial schedules:
They limit concurrency or interleaving of operations.
If a transaction waits for an I/O operation to complete, we cannot switch the CPU to another transaction.
If some transaction T is long, the other transactions must wait for T to complete all its operations.

IS 533 - Transactions


Serial, Nonserial, and Conflict-Serializable schedules (continued)

A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.
Two schedules are called result equivalent if they produce the same final state of the database.
Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules.

A schedule S is conflict serializable if it is conflict equivalent to some serial schedule S'.



Serializability

Basic assumption: each transaction preserves database consistency. Thus serial execution of a set of transactions preserves database consistency.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability

Simplified view of transactions:
We ignore operations other than read and write instructions.
We assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes.
Our simplified schedules consist of only read and write instructions.

Read/Write Conflict Scenarios: Conflicting Database Operations Matrix

Transactions T1 and T2 are executing concurrently over the same data. When they both access the data and at least one is executing a WRITE, a conflict can occur.
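The conflict rule can be written as a short predicate. In this sketch an operation is assumed to be a tuple of (transaction, action, item); the tuple layout and the function name `conflicts` are choices made for illustration only.

```python
def conflicts(op1, op2):
    """Two operations conflict if they come from different transactions,
    touch the same data item, and at least one of them is a write."""
    t1, action1, item1 = op1
    t2, action2, item2 = op2
    return t1 != t2 and item1 == item2 and "write" in (action1, action2)

# The four combinations for two transactions on the same item X
for a1 in ("read", "write"):
    for a2 in ("read", "write"):
        print(a1, a2, conflicts(("T1", a1, "X"), ("T2", a2, "X")))
```

Only the read/read pair is free of conflict; read/write, write/read and write/write all conflict.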

Conflict Serializability
If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

Conflict Equivalence - Intuition


If you can transform an interleaved schedule into a serial schedule by swapping consecutive non-conflicting operations of different transactions, then the original schedule is conflict serializable.

[Example figure: an interleaved schedule of R(A), W(A), R(B) and W(B) operations from two transactions is turned into a serial schedule, one swap of adjacent non-conflicting operations at a time.]

Conflict Equivalence (Continued)


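Here's another example:

T1: R(A)                W(A)
T2:       R(A), W(A)

Serializable or not? It is not. T1's R(A) comes before T2's W(A), and T2's W(A) comes before T1's W(A), so the conflicting operations cannot be reordered into either serial order.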

Testing for Serializability

Consider some schedule of a set of transactions T1, T2, ..., Tn.
Precedence graph: a directed graph where the vertices are the transactions (names).
We draw an arc from Ti to Tj if the two transactions conflict and Ti accessed the data item on which the conflict arose earlier.
We may label the arc by the item that was accessed.
[Example 1: figure showing an arc labeled x]

Dependency Graph
Dependency graph: one node per transaction; an edge from Ti to Tj if an operation of Ti conflicts with an operation of Tj and Ti's operation appears earlier in the schedule than the conflicting operation of Tj.
Theorem: a schedule is conflict serializable if and only if its dependency graph is acyclic.

Example
A schedule that is not conflict serializable:
T1: R(A), W(A),                         R(B), W(B)
T2:             R(A), W(A), R(B), W(B)

[Dependency graph: T1 -> T2 labeled A; T2 -> T1 labeled B]

The cycle in the graph reveals the problem. The output of T1 depends on T2, and vice versa.

Testing for Conflict Serializability of a Schedule (Precedence Graph)

Algorithm for Testing conflict serializability of schedule S


1. For each transaction Ti participating in schedule S, create a node labeled Ti in the precedence graph.
2. For each case in S where Tj executes a read_item(X) after Ti executes a write_item(X), create an edge (Ti -> Tj).
3. For each case in S where Tj executes a write_item(X) after Ti executes a read_item(X), create an edge (Ti -> Tj).
4. For each case in S where Tj executes a write_item(X) after Ti executes a write_item(X), create an edge (Ti -> Tj).
5. The schedule S is serializable if and only if the precedence graph has no cycles.

Example
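[Figure: a schedule of transactions T1, T2 and T3 made up of R(X), W(X), R(Y) and W(Y) operations, shown next to its precedence graph over the nodes T1, T2 and T3.]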

Acyclic: serializable


Examples (Precedence Graph)


Assume we have these three transactions:
T1: r1(x); w1(x); r1(y); w1(y)
T2: r2(z); r2(y); w2(y); r2(x); w2(x)
T3: r3(y); r3(z); w3(y); w3(z)

Assume we have this schedule:
S1: r2(z); r2(y); w2(y); r3(y); r3(z); r1(x); w1(x); w3(y); w3(z); r2(x); r1(y); w1(y); w2(x)

No equivalent serial schedule exists:
cycle x(T1 -> T2), y(T2 -> T1)
cycle x(T1 -> T2), y,z(T2 -> T3), y(T3 -> T1)

[Precedence graph: T1 -> T2 labeled x; T2 -> T1 labeled y; T2 -> T3 labeled y,z; T3 -> T1 labeled y]

Examples (Precedence Graph)


Assume we have another schedule for the same transactions:
S2: r3(y); r3(z); r1(x); w1(x); w3(y); w3(z); r2(z); r1(y); w1(y); r2(y); w2(y); r2(x); w2(x)

Equivalent serial schedule: T3 -> T1 -> T2

[Precedence graph: T1 -> T2 labeled x,y; T3 -> T1 labeled y; T3 -> T2 labeled y,z]

Producing the Equivalent Serial Schedule


If the precedence graph for a schedule is acyclic, then an equivalent serial schedule can be found by a topological sort of the graph.
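The precedence-graph test and the topological sort can be combined in a short sketch. The schedule notation r1(x) and w1(x) follows the examples above; the function names `parse`, `precedence_graph` and `equivalent_serial_order` are hypothetical, and the code relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
import re
from graphlib import TopologicalSorter, CycleError

def parse(schedule):
    """Turn 'r2(z);w2(y)' into [('T2', 'r', 'z'), ('T2', 'w', 'y')]."""
    found = re.findall(r"([rw])(\d+)\((\w+)\)", schedule)
    return [("T" + txn, action, item) for action, txn, item in found]

def precedence_graph(ops):
    """Edge (Ti, Tj) whenever an operation of Ti conflicts with a LATER
    operation of Tj: same item, different transactions, one is a write."""
    edges = set()
    for i, (ti, ai, xi) in enumerate(ops):
        for tj, aj, xj in ops[i + 1:]:
            if ti != tj and xi == xj and "w" in (ai, aj):
                edges.add((ti, tj))
    return edges

def equivalent_serial_order(schedule):
    """Return one equivalent serial order, or None if the graph has a cycle."""
    ops = parse(schedule)
    predecessors = {txn: set() for txn, _, _ in ops}
    for ti, tj in precedence_graph(ops):
        predecessors[tj].add(ti)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None

S1 = ("r2(z);r2(y);w2(y);r3(y);r3(z);r1(x);w1(x);"
      "w3(y);w3(z);r2(x);r1(y);w1(y);w2(x)")
S2 = ("r3(y);r3(z);r1(x);w1(x);w3(y);w3(z);r2(z);"
      "r1(y);w1(y);r2(y);w2(y);r2(x);w2(x)")

print(equivalent_serial_order(S1))
print(equivalent_serial_order(S2))
```

For S1 the graph contains a cycle, so no serial order is returned; for S2 the sort yields T3, T1, T2, matching the equivalent serial schedule given for that example.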


View Serializability

Let S and S' be two schedules with the same set of transactions. S and S' are view equivalent if the following three conditions are met, for each data item Q:
1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S' also transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S' also transaction Ti must read the value of Q that was produced by the same write(Q) operation of transaction Tj.
3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S'.

As can be seen, view equivalence is also based purely on reads and writes alone.

View Serializability (Cont.)


A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable.
Below is a schedule which is view serializable but not conflict serializable.
What serial schedule is it equivalent to?


Problems due to the Concurrent Execution of Transactions


The Lost Update Problem
The Incorrect Summary or Unrepeatable Read Problem
The Temporary Update (Dirty Read) Problem

The Lost Update Problem

Two transactions accessing the same database item have their operations interleaved in a way that makes the value of the database item incorrect.

Initial values: X = 4, Y = 8, N = 2, M = 3

T1                    T2                    Value
read_item(X);                               X = 4
X := X - N;                                 X = 2
                      read_item(X);         X = 4
                      X := X + M;           X = 7
write_item(X);                              X = 2
read_item(Y);                               Y = 8
                      write_item(X);        X = 7
Y := Y + N;                                 Y = 10
write_item(Y);                              Y = 10

Item X has an incorrect value because its update from T1 is lost (overwritten). T2 reads the value of X before T1 changes it in the database, and hence the updated database value resulting from T1 is lost. The final value of X is 7, whereas either serial order would give 4 - 2 + 3 = 5.
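The lost update can be replayed with a small simulation that uses the values from the example (X = 4, Y = 8, N = 2, M = 3). The `run` function and the encoding of a schedule as a list of transaction names are assumptions made for this sketch; each list entry executes the next step of the named transaction.

```python
def run(schedule, x=4, y=8, n=2, m=3):
    """Execute T1 and T2 steps in the given order; return the final database."""
    db = {"X": x, "Y": y}
    t1, t2 = {}, {}  # local program variables of each transaction
    steps = {
        "T1": iter([
            lambda: t1.update(X=db["X"]),        # read_item(X)
            lambda: t1.update(X=t1["X"] - n),    # X := X - N
            lambda: db.update(X=t1["X"]),        # write_item(X)
            lambda: t1.update(Y=db["Y"]),        # read_item(Y)
            lambda: t1.update(Y=t1["Y"] + n),    # Y := Y + N
            lambda: db.update(Y=t1["Y"]),        # write_item(Y)
        ]),
        "T2": iter([
            lambda: t2.update(X=db["X"]),        # read_item(X)
            lambda: t2.update(X=t2["X"] + m),    # X := X + M
            lambda: db.update(X=t2["X"]),        # write_item(X)
        ]),
    }
    for name in schedule:
        next(steps[name])()
    return db

# Interleaving from the example: T2 reads X before T1 writes it back
interleaved = ["T1", "T1", "T2", "T2", "T1", "T1", "T2", "T1", "T1"]
serial = ["T1"] * 6 + ["T2"] * 3

print("interleaved:", run(interleaved))
print("serial:     ", run(serial))
```

The interleaved run ends with X = 7 because T2 overwrites T1's update, while the serial run ends with X = 5.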

The Incorrect Summary Problem


One transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records. The aggregate function may calculate some values before they are updated and others after.

Time runs downward:

T1                     T3
                       sum := 0;
                       read_item(A);
                       sum := sum + A;
                       . . .
read_item(X);
X := X - N;
write_item(X);
                       read_item(X);
                       sum := sum + X;
                       read_item(Y);
                       sum := sum + Y;
read_item(Y);
Y := Y + N;
write_item(Y);

T3 reads X after N is subtracted and reads Y before N is added, so the resulting sum is off by N.

Unrepeatable Read problem: a transaction reads the same item twice and gets two different values because the item was changed by another transaction between the two reads.

The temporary Update (or Dirty read) problem


One transaction updates a database item and then the transaction fails. The updated item is accessed by another transaction before it is changed back to its original value.

Time runs downward:

T1                     T2
read_item(X);
X := X - N;
write_item(X);
                       read_item(X);
                       X := X + M;
                       write_item(X);
read_item(Y);

T1 fails and must roll back; meanwhile T2 has read the temporary value of X.
