DB Concepts TRANSPARENCIES
DATABASE
DATABASE (Contd..)
Why Database ?
- Application needs change constantly.
- Rapid access is frequently required to answer ad hoc queries.
- Many data elements must be shared by users throughout the organization.
- Need to communicate data across department boundaries.
- Need to improve the consistency of data.
- Need to control access to data.
DATABASE (Contd..)
What is a Database ?
A shared collection of interrelated data capable of serving multiple applications.
Two important properties:
- Integrated: Distinct data files are logically organized to reduce redundancy.
- Shared: All qualified users in the organization have access to the data.
DATABASE (Contd..)
Benefits (Contd...)
- Maintains links between related data.
- Reduces data redundancy.
- Saves space.
- Improves data consistency.
- Less redundancy and greater sharing lead to less confusion.
- Centralized control over data standards and security.
DATABASE (Contd..)
Benefits (Contd...)
Data Independence
Data organization and access techniques do not have to be built into the code of every application using the data. It can be categorized in two ways:
- Logical Data Independence.
- Physical Data Independence.
DATABASE (Contd..)
The Database Approach - Benefits (Contd...)
Logical Data Independence
Users and user programs are independent of the logical structure of the database, so the database can grow to incorporate new kinds of information, e.g.:
- The expansion of an existing base table to include a new column.
- The inclusion of a new base table in a database.
These kinds of changes normally do not have any effect on existing users or programs.
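The effect can be sketched with a small experiment. This is a minimal illustration only: it uses SQLite through Python's `sqlite3` module as a stand-in for DB2, and the table and column names (`employee`, `project`, `job_code`) are assumptions made for the example.

```python
import sqlite3

def existing_report(conn):
    # An "existing program": it names only the columns it needs.
    return conn.execute(
        "SELECT emp_no, emp_name FROM employee ORDER BY emp_no"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, emp_name TEXT)")
conn.execute("INSERT INTO employee VALUES (120, 'Jones')")
before = existing_report(conn)

# Growth of the database: a new column on an existing base table
# and a completely new base table (both hypothetical).
conn.execute("ALTER TABLE employee ADD COLUMN job_code TEXT")
conn.execute("CREATE TABLE project (project_no TEXT PRIMARY KEY)")
after = existing_report(conn)
```

The program keeps working because it never relied on the full column list. A program written with `SELECT *` would see the new column, which is one reason explicit column lists are usually preferred.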
DATABASE (Contd..)
The Database Approach - Benefits (Contd...)
Physical Data Independence
The degree to which a program is unaffected by changes to the storage structure.
e.g. In DB2 it is possible to remove an index from the database without having to change or recompile the programs that have used that index.
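The index example can be tried out in miniature. The sketch below uses SQLite via Python's `sqlite3` module rather than DB2, and the `part` table and `part_colour_ix` index are hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_no TEXT, colour TEXT)")
conn.executemany("INSERT INTO part VALUES (?, ?)",
                 [("P1", "WHITE"), ("P2", "RED")])
conn.execute("CREATE INDEX part_colour_ix ON part (colour)")

# The program's query text never mentions the index.
QUERY = "SELECT part_no FROM part WHERE colour = ? ORDER BY part_no"

with_index = conn.execute(QUERY, ("WHITE",)).fetchall()
conn.execute("DROP INDEX part_colour_ix")   # change the storage structure
without_index = conn.execute(QUERY, ("WHITE",)).fetchall()
```

The same query text returns the same rows in both cases; only the access path chosen by the DBMS (and possibly the speed) differs.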
DATABASE (Contd..)
Benefits (Contd...)
Data Integrity
Ensuring the correctness of database contents. Values in a database can become invalid due to the following:
- Values are specified incorrectly (mistyping).
- Two or more data modification transactions individually operate correctly, but their interaction produces invalid results. E.g. two agents allocate the same seat to different customers.
DATABASE (Contd..)
Benefits (Contd...)
Concept of Transaction
What is a Transaction?
A logical unit of work.
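As a rough illustration of "a logical unit of work", the sketch below treats a transfer between two accounts as one transaction: either both updates take effect or neither does. It uses SQLite via Python's `sqlite3` module; the `account` table, the `CHECK` constraint and the `transfer` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("ACC1", 40), ("ACC3", 30)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one unit of work; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT acc, balance FROM account"))
```

When the second update violates the constraint, the first update is rolled back as well, so the database never shows a half-finished transfer.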
DATABASE (Contd...)
Benefits (Contd....)
Database Failure and Recovery
Failures can be broadly categorized into two types:
- System Failure
- Media Failure
System Failure: e.g. a power supply failure, which affects all current transactions. It does not physically damage the database. The transactions affected by the failure are therefore either undone or redone at restart. Recovery from such a failure relies on the CHECKPOINT.
DATABASE (Contd...)
Benefits (Contd.....)
Database Failure and Recovery (Contd..)
CHECKPOINT
When a checkpoint occurs on a database, two tasks are performed:
- The database buffers are force-written to physical storage.
- A checkpoint record is written to the physical log.
The checkpoint record consists of a list of all transactions that were in progress when the checkpoint occurred.
DATABASE (Contd....)
CHECKPOINT (Contd.)
[Figure: timeline of transactions T1 to T5 against time, marked with times tc and tf]
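One common textbook way to decide which transactions to undo and which to redo after a system failure can be sketched as follows. The sketch assumes a simplified log of `("BEGIN", txn)` and `("COMMIT", txn)` entries written after the checkpoint. The transaction timings in the usage example are assumptions for illustration, since the original figure did not survive.

```python
def classify_for_restart(active_at_checkpoint, log_after_checkpoint):
    """Return (undo, redo) lists of transaction names.

    Start with every transaction named in the checkpoint record on the
    UNDO list, then scan the log forward from the checkpoint:
    a BEGIN adds the transaction to UNDO, a COMMIT moves it to REDO.
    """
    undo = list(active_at_checkpoint)
    redo = []
    for action, txn in log_after_checkpoint:
        if action == "BEGIN":
            undo.append(txn)
        elif action == "COMMIT":
            undo.remove(txn)
            redo.append(txn)
    return undo, redo

# Hypothetical scenario: T2 and T3 were in progress at the checkpoint.
undo, redo = classify_for_restart(
    ["T2", "T3"],
    [("BEGIN", "T4"), ("COMMIT", "T2"), ("BEGIN", "T5"), ("COMMIT", "T4")],
)
```

Transactions that committed before the checkpoint need no action, because the checkpoint forced their changes to physical storage.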
DATABASE (Contd....)
Benefits (Contd.....)
Database Failure and Recovery (Contd..)
Media Failure: e.g. a hard disk crash, which causes physical damage to the database and affects those transactions currently using that portion. Here there is no need to undo or redo the transactions that were in progress at the time of the media failure. Recovery from such a failure uses backups or a redump/reload utility to restore the whole database.
DATABASE (Contd....)
Benefits (Contd....)
Audit Trail
A special file or database in which the system keeps track of all operations performed by users on the regular database. It is required when:
- The data is sufficiently sensitive.
- The processing performed on the data is very critical.
- The person responsible for data discrepancies needs to be identified.
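One way to approximate an audit trail is a trigger that copies every change into a separate table. The sketch below uses SQLite via Python's `sqlite3` module; the `account` and `audit_trail` tables and the `account_audit` trigger are hypothetical. SQLite has no notion of a logged-in user, so this version records what changed and when, but not who made the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (acc TEXT PRIMARY KEY, balance INTEGER);

-- The audit trail is kept apart from the regular data.
CREATE TABLE audit_trail (
    acc         TEXT,
    old_balance INTEGER,
    new_balance INTEGER,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER account_audit AFTER UPDATE ON account
BEGIN
    INSERT INTO audit_trail (acc, old_balance, new_balance)
    VALUES (OLD.acc, OLD.balance, NEW.balance);
END;

INSERT INTO account VALUES ('ACC1', 40);
UPDATE account SET balance = 50 WHERE acc = 'ACC1';
""")

trail = conn.execute(
    "SELECT acc, old_balance, new_balance FROM audit_trail"
).fetchall()
```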
DATABASE (Contd.....)
Benefits (Contd.....)
Concurrency
Concurrency occurs when many transactions access the same data at the same time. Three common concurrency problems are:
- The lost update problem.
- The uncommitted dependency problem.
- The inconsistent analysis problem.
DATABASE (Contd.....)
Benefits (Contd......)
The Lost Update Problem
Transaction A      TIME    Transaction B
FETCH R            T1
                   T2      FETCH R
UPDATE R           T3
                   T4      UPDATE R
Transaction A loses an update at time T4 because Transaction B overwrites it without even looking at it.
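The interleaving can be replayed step by step in plain Python. This is only a simulation of the schedule, not real concurrent code, and the amounts added by A and B (10 and 5) are made-up values.

```python
def lost_update(initial):
    """Replay the T1..T4 schedule on a single record R."""
    db = {"R": initial}
    a_copy = db["R"]          # T1: A fetches R
    b_copy = db["R"]          # T2: B fetches R (same old value)
    db["R"] = a_copy + 10     # T3: A updates R
    db["R"] = b_copy + 5      # T4: B updates R from its stale copy
    return db["R"]

result = lost_update(100)     # A's +10 is overwritten: 105, not 115
```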
DATABASE (Contd.....)
The Uncommitted Dependency Problem (1)
Transaction A      TIME    Transaction B
                   T1      UPDATE R
FETCH R            T2
                   T3      ROLLBACK
DATABASE (Contd......)
The Uncommitted Dependency Problem (2)
Transaction A      TIME    Transaction B
                   T1      UPDATE R
UPDATE R           T2
                   T3      ROLLBACK
Transaction A updates an uncommitted change at time T2 and loses that update at time T3.
DATABASE (Contd.......)
Three Concurrency Problems (Contd..)
The Inconsistent Analysis Problem
Here transaction A is adding ACC1, ACC2 and ACC3, while transaction B is transferring 10 from ACC3 to ACC1.
Initial values: ACC1 = 40, ACC2 = 50, ACC3 = 30
Transaction A               TIME    Transaction B
FETCH ACC1 (sum = 40)       T1
FETCH ACC2 (sum = 90)       T2
                            T3      FETCH ACC3
                            T4      UPDATE ACC3 (30 --> 20)
                            T5      FETCH ACC1
                            T6      UPDATE ACC1 (40 --> 50)
                            T7      COMMIT
FETCH ACC3                  T8
(sum = 110, NOT 120)
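The arithmetic of this schedule can be checked with a short, single-threaded replay. The account values and the transfer of 10 come from the slide; the function name is an assumption.

```python
def inconsistent_analysis():
    """Replay the T1..T8 schedule; return (A's sum, true total)."""
    acc = {"ACC1": 40, "ACC2": 50, "ACC3": 30}
    total = 0
    total += acc["ACC1"]      # T1, A: sum = 40
    total += acc["ACC2"]      # T2, A: sum = 90
    acc["ACC3"] -= 10         # T3-T4, B: ACC3 30 --> 20
    acc["ACC1"] += 10         # T5-T6, B: ACC1 40 --> 50
    # T7, B: COMMIT
    total += acc["ACC3"]      # T8, A: adds 20, not 30
    return total, sum(acc.values())

seen_by_a, actual = inconsistent_analysis()
```

A sees 110 even though the accounts total 120 both before and after B's transfer, because A read ACC1 before the transfer and ACC3 after it.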
DATABASE (Contd.......)
Benefits (Contd......)
Locking
The standard mechanism for dealing with concurrency is LOCKING. In general, locks can be of two types:
- Shared Lock (S)
- Exclusive Lock (X)
Availability of locks (Y = the requested lock can be granted, N = the request must wait):
                    Lock already held
Lock requested      S       X
S                   Y       N
X                   N       N
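The availability matrix translates directly into a lookup table. The sketch below is illustrative only; `COMPATIBLE` and `can_grant` are hypothetical names, and a real lock manager would also handle queues, lock upgrades and deadlock detection.

```python
# (lock already held, lock requested) -> can the request be granted?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every lock other transactions hold."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)
```

For example, `can_grant("S", ["S", "S"])` is true because any number of readers can share a record, while `can_grant("X", ["S"])` is false, so the writer has to wait.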
DATABASE (Contd.....)
Locking Services(Contd.)
Locking - The Uncommitted Dependency Problem (1)
Transaction A                     TIME    Transaction B
                                  T1      UPDATE R (X lock on R)
FETCH R (request S lock on R)     T2
wait                              T3      SYNCPOINT (release X lock on R)
Resume: FETCH R (S lock on R)     T4
DATABASE (Contd.....)
Locking Services(Contd.)
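Locking - The Lost Update Problem
Transaction A                     TIME    Transaction B
FETCH R (acquire S lock on R)     T1
                                  T2      FETCH R (S lock on R)
UPDATE R (request X lock on R)    T3
wait                              T4      UPDATE R (request X lock on R)
wait                              T5      wait
wait                              T6      wait
No update is lost, but each transaction now waits for the other to release its S lock on R (a deadlock).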
DATABASE (Contd.....)
Locking Services(Contd.)
Locking - The Uncommitted Dependency Problem (2)
Transaction A                     TIME    Transaction B
                                  T1      UPDATE R (X lock on R)
UPDATE R (request X lock on R)    T2
wait                              T3      SYNCPOINT (release X lock on R)
Resume: UPDATE R (X lock on R)    T4
Here A is prevented from updating an uncommitted change at time T2.
DATABASE (Contd.....)
Locking Services (Contd.)
Locking - The Inconsistent Analysis Problem
Initial values: ACC1 = 40, ACC2 = 50, ACC3 = 30
Transaction A                          TIME    Transaction B
FETCH ACC1 (S lock) (sum = 40)         T1
FETCH ACC2 (S lock) (sum = 90)         T2
                                       T3      FETCH ACC3 (S lock on ACC3)
                                       T4      UPDATE ACC3 (X lock on ACC3) 30 --> 20
                                       T5      FETCH ACC1 (S lock on ACC1)
                                       T6      UPDATE ACC1 40 --> 50 (request X lock on ACC1)
FETCH ACC3 (request S lock on ACC3)    T7      wait
wait                                   T8      wait
DBMS (Contd..)
The DBMS stands between the user and the data, insulating the user from the actual physical storage medium. Hence, the user cannot interact with the stored data directly. The user requests the DBMS to obtain a specific record from the database. The DBMS is built upon the foundation provided by existing file access methods.
e.g. DB2 uses VSAM access method as its storage strategy.
The file access services of the DBMS retrieve the requested data.
DB2_Tr Ver. 1.0.0 04/12/1998 Page 45 of 75 45
DBMS (Contd..)
DBMSs are categorized by the manner in which they present the data to the user.
- The hierarchical data model uses a tree structure to represent data. Example: IMS
- The network model is the same as the hierarchical model except that it allows many-to-many relationships, i.e. one child can be linked to many parents. Example: IDMS
- The relational model shows data in the form of tables. Example: DB2
Hierarchical Model
An Example
Consider a database containing records for various courses selected by various students. In the Hierarchical Model, the logical representation would be :
Parent record (student): A102  Amit Singh
Child records (one per course taken):
Course Code    Course Name    Marks
C01            English        86
C03            Science        94
C06            Economics      81
Network Model
An Example
[Figure: network-model diagram linking student records (Student Code, Student Name) to course records (Course Code, Course Name); courses shown include C01 English, Science and C06 Economics]
Relational Model
Example
Representation of student-course database using Relational Model
Student Code    Student Name
A101            Atul Kahate
A102            Amit Singh

Course Code     Course Name
C01             English
C02             Maths
C03             Science
C06             Economics

Marks (one value per student-course pair): 87, 85, 83, 86, 94, 81
Operations
Insert, Update or Delete - None poses a problem as all tables are independent of each other
Links between the associated tables are made by the DBMS. User specifies only what data is to be accessed and not how to access it. Increases the simplicity of the request.
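A small query shows what "what, not how" looks like in practice. The sketch uses SQLite via Python's `sqlite3` module in place of DB2. The table and column names are assumptions, and only the A102 rows are loaded, using the marks from the hierarchical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_code TEXT PRIMARY KEY, student_name TEXT);
CREATE TABLE course  (course_code  TEXT PRIMARY KEY, course_name  TEXT);
CREATE TABLE marks   (student_code TEXT, course_code TEXT, marks INTEGER,
                      PRIMARY KEY (student_code, course_code));
INSERT INTO student VALUES ('A102', 'Amit Singh');
INSERT INTO course  VALUES ('C01', 'English'), ('C03', 'Science'),
                           ('C06', 'Economics');
INSERT INTO marks   VALUES ('A102', 'C01', 86), ('A102', 'C03', 94),
                           ('A102', 'C06', 81);
""")

# The request states which data is wanted; the DBMS works out how to
# link the three tables and which access path to use.
rows = conn.execute("""
    SELECT s.student_name, c.course_name, m.marks
    FROM marks m
    JOIN student s ON s.student_code = m.student_code
    JOIN course  c ON c.course_code  = m.course_code
    ORDER BY c.course_code
""").fetchall()
```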
E. F. Codd's 12 rules
1. The information rule
2. The rule of guaranteed access
3. The systematic treatment of NULL values rule
4. The database description rule
5. The comprehensive data sub-language rule
6. The view updation rule
7. The insert, update and delete rule
8. The physical data independence rule
9. The logical data independence rule
10. Integrity independence rule
11. Distribution independence rule
12. Non-subversion rule
RELATIONAL DATABASE MANAGEMENT SYSTEM (RDBMS)
A DBMS based on the relational model is called a Relational Database Management System (RDBMS). An RDBMS presents the complete information content of the database as a collection of tables. An ideal RDBMS should satisfy Codd's rules.
RDBMS (Contd..)
A table is made up of rows and columns.
- One table for each entity.
- One row for each instance (occurrence) of the entity.
- One column for each attribute of the entity.
RDBMS (Contd..)
All data values are atomic.
Within each row, only one value exists for each column.
Valid table
S#    P#
--    --
S2    P1
S2    P2
S4    P2
S4    P4
S4    P5
Invalid table
S#    P#
--    --
S2    (P1, P2)
S4    (P2, P4, P5)
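Turning the invalid table into the valid one means giving every (S#, P#) pair its own row. A minimal sketch in plain Python follows; the function name `to_atomic_rows` is an assumption.

```python
def to_atomic_rows(grouped):
    """Expand (supplier, group of parts) rows into one row per pair."""
    return [(s_no, p_no) for s_no, parts in grouped for p_no in parts]

# The invalid table from the slide: a set of part numbers in one cell.
invalid = [("S2", ("P1", "P2")), ("S4", ("P2", "P4", "P5"))]
valid = to_atomic_rows(invalid)
```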
RDBMS (Contd..)
- The total collection of all values that may occur within a given column is its domain.
- All data in a given column must be of the same type.
- A table can contain a maximum of 300 columns (a product-specific limit, not part of the relational model).
- A row is the smallest unit of data that can be inserted or deleted.
- Data is presented with values at the intersection of a row and column.
- Rows and columns are not stored in any specific order.
RDBMS (Contd..)
No duplicate rows are allowed.
The columns responsible for ensuring each row's uniqueness form the table's PRIMARY KEY. A primary key may consist of one column or multiple columns.
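The "no duplicate rows" rule can be seen at work with a multi-column primary key. The sketch uses SQLite via Python's `sqlite3` module; the table name `sp`, its columns and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Neither column is unique on its own; the combination is.
conn.execute("CREATE TABLE sp (s_no TEXT, p_no TEXT, PRIMARY KEY (s_no, p_no))")
conn.execute("INSERT INTO sp VALUES ('S2', 'P1')")
conn.execute("INSERT INTO sp VALUES ('S2', 'P2')")  # same S#, different P#

def try_insert(conn, s_no, p_no):
    """Return True if the row was accepted, False if it duplicates a key."""
    try:
        conn.execute("INSERT INTO sp VALUES (?, ?)", (s_no, p_no))
        return True
    except sqlite3.IntegrityError:
        return False
```

Here `try_insert(conn, "S2", "P1")` is rejected because that key already exists, while `try_insert(conn, "S4", "P2")` is accepted.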
RDBMS (Contd..)
Terminologies
------------------------------------------------------------
Relational      RDBMS                Conventional
------------------------------------------------------------
Relation        Table                File
Tuple           Row                  Record Occurrence
Attribute       Column               Field
Degree          Number of Columns    Number of Fields
Cardinality     Number of Rows       Number of Records
NORMALIZATION
The process of successively reducing a given collection of relations to some more desirable form. Data items are combined to form relations in such a manner as to:
- Achieve controlled redundancy.
- Avoid anomalies arising from Insertion, Updation and Deletion.
NORMALIZATION (Contd..)
Advantages of Normalization
- Improves the maintainability of the data structures.
- Adding or dropping an entity is a simple matter of adding or deleting a table.
- Adding an attribute requires only the altering of a table to add a column.
- Reduces complex user views to a set of small, stable data structures.
- Adding a relationship also requires only the addition of a foreign key column.
NORMALIZATION (Contd..)
Normal Forms
Relations are normalized by simplifying them into normal forms. There are mainly three normal forms:
- First Normal Form.
- Second Normal Form.
- Third Normal Form.
NORMALIZATION (Contd..)
[Table: unnormalized project data. Recoverable columns: Project Number; Completion Date (7/17, 1/12, 3/21, 7/17, 1/12, 3/21); Hours Worked (37, 30, 21, 20, 15, 20)]
NORMALIZATION (Contd..)
First Normal Form (1NF or FNF)
The intersection of each row and column should contain only one value. Relational tables are automatically in the First Normal Form.
NORMALIZATION (Contd..)
First Normal Form (Contd..)
FNF may contain highly redundant data, resulting in the following problems:
Insertion: A project cannot be entered unless there are employees working on it.
Deletion: Deleting the record of the only employee working on a project causes the project details to be deleted as well.
Updation: Changing an employee's job involves searching for and updating several records of that employee.
NORMALIZATION (Contd..)
Second Normal Form (2NF or SNF) :
- The relation has to be in First Normal Form.
- Every non-key attribute has to be fully functionally dependent on the primary key.
In the above example:
- Employee Name, Job Code and Job Title are functionally dependent on Employee Number only.
- Hours Worked is functionally dependent on the combination of Employee Number and Project Number.
- Completion Date is functionally dependent on Project Number only.
NORMALIZATION (Contd..)
Second Normal Form (Contd..)
EMPLOYEE
Employee Number    Employee Name    Job Code    Job Title
120                Jones            01          Prog.
121                Thomas           11          Ana.
125                James            01          Prog.
129                Mark             08          Des.
NORMALIZATION (Contd..)
Second Normal Form (Contd..)
Insertion: A Job Title cannot be entered for a Job Code unless there is an employee doing that job.
Deletion: If there is only one employee doing a particular job, deletion of that record causes job details also to be deleted.
Updation: If the Job Code is changed for a particular Job Title, it involves a lot of searching and updating of records for that Job.
NORMALIZATION (Contd..)
Third Normal Form (3NF or TNF) :
No non-key attribute should be transitively dependent on the primary key.
In the above SNF example, Job Title is functionally dependent on Job Code, and Job Code is functionally dependent on Employee Number. This means that Job Title is transitively dependent on Employee Number.
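Whether sample data is consistent with a functional dependency can be checked mechanically. The sketch below uses the four EMPLOYEE rows from the SNF example; the function name `holds` and the column names are assumptions. A passing check only shows that the data does not contradict the dependency. Whether the dependency really holds is a statement about the business rules.

```python
def holds(rows, lhs, rhs):
    """True if, in `rows`, equal values of `lhs` always give equal `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

EMPLOYEE = [
    {"emp_no": 120, "emp_name": "Jones",  "job_code": "01", "job_title": "Prog."},
    {"emp_no": 121, "emp_name": "Thomas", "job_code": "11", "job_title": "Ana."},
    {"emp_no": 125, "emp_name": "James",  "job_code": "01", "job_title": "Prog."},
    {"emp_no": 129, "emp_name": "Mark",   "job_code": "08", "job_title": "Des."},
]

# Employee Number -> Job Code and Job Code -> Job Title both hold,
# which is the transitive chain Employee Number -> Job Code -> Job Title.
chain = (holds(EMPLOYEE, ["emp_no"], ["job_code"]),
         holds(EMPLOYEE, ["job_code"], ["job_title"]))
```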
NORMALIZATION (Contd..)
Third Normal Form (Contd..)
EMPLOYEE
Emp. No.    Emp. Name    Job Code
120         Jones        01
121         Thomas       11
125         James        01
129         Mark         08

JOBS
Job Code    Job Title
01          Prog.
11          Ana.
08          Des.
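The decomposition loses no information: joining EMPLOYEE and JOBS on Job Code gives back the SNF table. The sketch below checks this using SQLite via Python's `sqlite3` module. The column names are assumptions, and the extra job ('15', 'Tester') is a made-up row. It shows that a job can now be entered before any employee holds it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_code TEXT PRIMARY KEY, job_title TEXT);
CREATE TABLE employee (
    emp_no   INTEGER PRIMARY KEY,
    emp_name TEXT,
    job_code TEXT REFERENCES jobs (job_code)
);
INSERT INTO jobs VALUES ('01', 'Prog.'), ('11', 'Ana.'), ('08', 'Des.');
INSERT INTO employee VALUES (120, 'Jones', '01'), (121, 'Thomas', '11'),
                            (125, 'James', '01'), (129, 'Mark', '08');
-- Hypothetical job with no employees yet: no insertion anomaly.
INSERT INTO jobs VALUES ('15', 'Tester');
""")

rejoined = conn.execute("""
    SELECT e.emp_no, e.emp_name, e.job_code, j.job_title
    FROM employee e
    JOIN jobs j ON j.job_code = e.job_code
    ORDER BY e.emp_no
""").fetchall()
```

Each Job Title is now stored once, so renaming a job is a single-row update in JOBS.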
NORMALIZATION (Contd..)
Third Normal Form (Contd..)
HOURS
Employee No.:  12 120 120 120 121 125 125 129 129 129
Project No.:   01 08 02 01 08 02 09 01 09 08
Hours Worked:  37 30 21 20 15 20 43 30 20 25