
Normalization in DBMS, notes prepared by Mahendra Patil

Normalization:

A technique for producing a set of tables with desirable properties that support the requirements of a user or company.

Process of decomposing relations with anomalies to produce smaller, well-structured relations. Normalization is a process for deciding which attributes should be grouped together in a relation.

Used to validate and improve a logical design so that it satisfies certain constraints and avoids unnecessary duplication of data.

Objective of Normalization:

The basic objectives of normalization are:
1) To reduce redundancy, which means that information is stored only once.
2) To reduce the file storage space required by base tables.
3) To reduce the inconsistency caused by redundancy.
4) To make it feasible to represent any relation in the database.
5) To free relations from undesirable insertion, update, and deletion anomalies.

Properties of Normalized Relations:

a. No data value should be duplicated in different rows unnecessarily.
b. A value must be specified (and required) for every attribute in a row.
c. Each relation should be self-contained. In other words, if a row from a relation is deleted, important information should not be accidentally lost.
d. When a row is added to a relation, other relations in the database should not be affected.
e. A value of an attribute in a tuple may be changed independent of other tuples in the relation and other relations.


Data redundancy and update anomalies:


Problems associated with data redundancy are illustrated by comparing the Staff and Branch tables with the StaffBranch table.

Fig: StaffBranch Table

The StaffBranch table has redundant data; the details of a branch are repeated for every member of staff. In contrast, the branch information appears only once for each branch in the Branch table, and only the branch number (branchNo) is repeated in the Staff table, to represent where each member of staff is located.

Tables that contain redundant information may potentially suffer from update anomalies. Types of update anomalies include:
1) insertion
2) deletion
3) modification/update

1) Insert Anomalies:

Try to insert details for a new member of staff into StaffBranch. You also need to insert branch details that are consistent with the existing details for the same branch. It is hard to maintain data consistency with StaffBranch.

2) Delete Anomalies:

Try to delete details for a member of staff from StaffBranch. You also lose the branch details in that row (tuple). If that row was the only one for its branch, the branch details are lost from the database altogether.

3) Update Anomalies:

Try to update the value of one of the attributes of a branch. You also need to update that information in all the rows about the same branch.
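The update and delete anomalies above can be reproduced on a tiny in-memory table. This is only a sketch: the rows, staff names and addresses are made up for illustration, and apart from branchNo the column names are assumptions, since the StaffBranch figure is not reproduced here.

```python
# Hypothetical StaffBranch rows; names and addresses are invented for illustration.
staff_branch = [
    {"staffNo": "S1", "name": "Ann", "branchNo": "B1", "branchAddress": "1 Main St"},
    {"staffNo": "S2", "name": "Bob", "branchNo": "B1", "branchAddress": "1 Main St"},
    {"staffNo": "S3", "name": "Cat", "branchNo": "B2", "branchAddress": "9 High St"},
]


def branch_addresses(rows):
    """Map each branchNo to the set of addresses recorded for it."""
    seen = {}
    for row in rows:
        seen.setdefault(row["branchNo"], set()).add(row["branchAddress"])
    return seen


# Update anomaly: changing the address in only one of the two B1 rows
# leaves the table with two different addresses for the same branch.
staff_branch[0]["branchAddress"] = "5 New Rd"
inconsistent = {
    branch for branch, addrs in branch_addresses(staff_branch).items() if len(addrs) > 1
}

# Delete anomaly: removing the only member of staff at B2
# also removes the only record of branch B2.
remaining = [row for row in staff_branch if row["staffNo"] != "S3"]
branches_left = {row["branchNo"] for row in remaining}

print(inconsistent)
print(branches_left)
```

With separate Staff and Branch tables, the address would be stored in one row only, so neither problem could arise.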

Decomposition of Relations:

Two important properties of decomposition:

1. Lossless-join property: enables us to find any instance of the original relation from the corresponding instances in the smaller relations.
2. Dependency preservation property: enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations.
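The lossless-join property can be checked on a concrete instance by projecting the table onto the two smaller schemas and joining the projections back together. The sketch below assumes hypothetical StaffBranch rows and column names, with project and natural_join as illustrative helpers. A successful check on one instance is only evidence: the property itself is a statement about every legal instance.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, removing duplicate rows."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two projected relations (sets of (attr, value) tuples)."""
    joined = set()
    for l_row in left:
        for r_row in right:
            l_dict, r_dict = dict(l_row), dict(r_row)
            common = l_dict.keys() & r_dict.keys()
            if all(l_dict[a] == r_dict[a] for a in common):
                joined.add(frozenset({**l_dict, **r_dict}.items()))
    return joined


# Hypothetical StaffBranch instance (invented values).
staff_branch = [
    {"staffNo": "S1", "name": "Ann", "branchNo": "B1", "branchAddress": "1 Main St"},
    {"staffNo": "S2", "name": "Bob", "branchNo": "B1", "branchAddress": "1 Main St"},
    {"staffNo": "S3", "name": "Cat", "branchNo": "B2", "branchAddress": "9 High St"},
]

staff = project(staff_branch, ["staffNo", "name", "branchNo"])
branch = project(staff_branch, ["branchNo", "branchAddress"])

original = {frozenset(row.items()) for row in staff_branch}
is_lossless = natural_join(staff, branch) == original
print(is_lossless, len(branch))
```

Note that the branch projection holds each branch address once, which is exactly the redundancy that the decomposition removes.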

Staff and Branch relations, which are obtained by decomposing StaffBranch, do not suffer from these anomalies.

Steps in Normalization:

First normal form: Any multivalued attributes (repeating groups) have been removed.
Second normal form: Any partial functional dependencies have been removed.
Third normal form: Any transitive dependencies have been removed.
Boyce/Codd normal form: Any remaining anomalies that result from functional dependencies have been removed.
Fourth normal form: Any multivalued dependencies have been removed.
Fifth normal form: Any remaining anomalies have been removed.

In practice, usually only the first to third normal forms are applied. The following Fig shows the process:


Relationship of Normal Forms:


The Process of Normalization:

Given a relation, use the following cycle:
1. Find out what normal form it is in.
2. Transform the relation to the next higher form by decomposing it to form simpler relations.
3. You may need to refine the relation further if decomposition resulted in undesirable properties.

First normal form (1NF):

A relation is in 1NF if and only if all underlying domains contain atomic values only. Or: a table in which the intersection of every column and record contains only one value.

Steps from UNF to 1NF:
1. Nominate an attribute or group of attributes to act as the key for the unnormalized table.
2. Identify repeating group(s) in the unnormalized table which repeat for the key attribute(s).
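Once the key and the repeating group are identified, the group has to be removed. One common way, assumed here because the notes list only the first two steps, is to move the repeating values into a separate relation together with a copy of the key. The Branch rows, the telNos column and the to_1nf helper below are hypothetical.

```python
# Hypothetical unnormalized Branch rows: telNos is a repeating group.
unnormalized = [
    {"branchNo": "B1", "branchAddress": "1 Main St", "telNos": ["555-0101", "555-0102"]},
    {"branchNo": "B2", "branchAddress": "9 High St", "telNos": ["555-0201"]},
]


def to_1nf(rows, key, group):
    """Split the repeating group into its own relation, keyed by a copy of `key`."""
    # Base relation: everything except the repeating group.
    base = [{k: v for k, v in row.items() if k != group} for row in rows]
    # Detail relation: one row per (key, single value) pair, so every cell is atomic.
    detail = [{key: row[key], group: value} for row in rows for value in row[group]]
    return base, detail


branch, branch_tel = to_1nf(unnormalized, "branchNo", "telNos")
for row in branch_tel:
    print(row)
```

The alternative is to flatten the table in place, repeating the non-repeating columns on every row, which also yields 1NF but introduces redundancy that the later normal forms then remove.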

Fig: Branch table is not in 1NF


Second normal form (2NF):

A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key of the relation. 2NF is only a concern for tables with composite primary keys: a 1NF table whose primary key is a single attribute is automatically in 2NF.

Functional dependency:

A functional dependency describes a relationship between attributes in a relation (columns in a table). If A and B are columns of table R, B is functionally dependent on A if each value of A in R is associated with exactly one value of B in R. It is represented by A -> B. We are interested in finding such functional dependencies among database relations.


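The determinant of a functional dependency is the attribute or group of attributes on the left-hand side of the arrow. If B is functionally dependent on A but not on any proper subset of A (that is, the determinant contains the minimum number of attributes needed to maintain the dependency), then we call it a full functional dependency. If B also depends on a proper subset of A, the dependency is partial.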
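The definition of A -> B translates directly into a check over the rows of a table: no value of A may appear with two different values of B. The sketch below uses a hypothetical fd_holds helper and invented rows. Keep in mind that an instance can only refute a dependency; whether it holds in general is a property of the business rules, not of one sample of data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        a = tuple(row[col] for col in lhs)
        b = tuple(row[col] for col in rhs)
        # setdefault stores b the first time a is seen, then returns the stored value.
        if seen.setdefault(a, b) != b:
            return False
    return True


# Hypothetical rows (invented values).
rows = [
    {"staffNo": "S1", "branchNo": "B1"},
    {"staffNo": "S2", "branchNo": "B1"},
    {"staffNo": "S3", "branchNo": "B2"},
]

print(fd_holds(rows, ["staffNo"], ["branchNo"]))  # each staffNo has one branchNo
print(fd_holds(rows, ["branchNo"], ["staffNo"]))  # B1 appears with S1 and S2
```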

1NF to 2NF:

Steps:
1. Identify the primary key for the 1NF relation.
2. Identify the functional dependencies in the relation.
3. If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.

For ex:


Fig: TempStaffAllocation table is not in 2NF
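The three steps from 1NF to 2NF can be sketched over a list of functional dependencies. The figure is not reproduced here, so the attributes, key and dependencies below are assumptions chosen only to illustrate a composite key with partial dependencies; partial_dependencies and to_2nf are hypothetical helper names.

```python
def partial_dependencies(primary_key, fds):
    """Return FDs whose determinant is a proper subset of the primary key."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        non_key = frozenset(rhs) - key
        # `lhs < key` is the proper-subset test on frozensets.
        if lhs < key and non_key:
            found.append((lhs, non_key))
    return found


def to_2nf(attributes, primary_key, fds):
    """Move partially dependent attributes into new relations with a copy of their determinant."""
    remaining = set(attributes)
    new_relations = []
    for lhs, dependents in partial_dependencies(primary_key, fds):
        new_relations.append(lhs | dependents)
        remaining -= dependents
    return [frozenset(remaining)] + new_relations


# Assumed schema (not taken from the figure): composite key (staffNo, branchNo).
attributes = {"staffNo", "branchNo", "name", "branchAddress", "hoursPerWeek"}
key = ("staffNo", "branchNo")
fds = [
    (("staffNo", "branchNo"), ("hoursPerWeek",)),  # full dependency on the key
    (("staffNo",), ("name",)),                     # partial dependency
    (("branchNo",), ("branchAddress",)),           # partial dependency
]

relations = to_2nf(attributes, key, fds)
for relation in relations:
    print(sorted(relation))
```

The first relation keeps the composite key with the attribute that fully depends on it; each partial dependency becomes its own relation.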


Third normal form (3NF):

A relation R is in third normal form if it is in 2NF and every non-key attribute of R is non-transitively dependent on the primary key of R.

For example, consider a table with A, B, and C. If B is functionally dependent on A (A -> B) and C is functionally dependent on B (B -> C), then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). If a transitive dependency exists on the primary key, the table is not in 3NF.
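The A -> B, B -> C pattern can be looked for mechanically in a list of functional dependencies. The sketch below does a direct, one-step check only (it does not compute closures), and the StaffBranch attributes other than branchNo are assumptions; transitive_dependencies is a hypothetical helper.

```python
def transitive_dependencies(primary_key, fds):
    """Return FDs B -> C where B consists of non-key attributes that depend on the key."""
    key = frozenset(primary_key)
    # Non-key attributes that depend directly on the primary key (A -> B).
    key_dependents = set()
    for lhs, rhs in fds:
        if frozenset(lhs) == key:
            key_dependents |= set(rhs) - key
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        # B -> C with B made of key-dependent attributes, and B not determining the key.
        if lhs and lhs <= key_dependents and not key <= rhs:
            found.append((lhs, rhs - lhs))
    return found


# Assumed StaffBranch schema with primary key staffNo.
attributes = {"staffNo", "name", "position", "branchNo", "branchAddress"}
key = ("staffNo",)
fds = [
    (("staffNo",), ("name", "position", "branchNo")),
    (("branchNo",), ("branchAddress",)),
]

found = transitive_dependencies(key, fds)

# Remove each transitive dependency into a new relation with a copy of its determinant.
staff = set(attributes)
new_relations = []
for determinant, dependents in found:
    staff -= dependents
    new_relations.append(determinant | dependents)

print(sorted(staff))
print([sorted(relation) for relation in new_relations])
```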

2NF to 3NF:

Steps:
1. Identify the primary key in the 2NF relation.
2. Identify the functional dependencies in the relation.
3. If transitive dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.

For ex:


Fig: StaffBranch table is not in 3NF


Fig: Converting the StaffBranch table to 3NF


Boyce/Codd Normal Form (BCNF):

A relation is in BCNF if and only if every determinant is a candidate key. A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.

For ex: Consider a relation SJT (Student-Subject-Teacher relation) with attributes S (student), J (subject) and T (teacher). The meaning of an SJT tuple is that the specified student is taught the specified subject by the specified teacher.

S       J         T
Smith   Math      Prof. White
Smith   Physics   Prof. Green
Jones   Math      Prof. White
Jones   Physics   Prof. Brown

1. For each subject (J), each student (S) of that subject is taught by only one teacher (T): FD: S, J -> T
2. Each teacher (T) teaches only one subject (J): FD: T -> J
3. Each subject (J) is taught by several teachers, so J does not functionally determine T.

There are two determinants in these functional dependencies: (S, J) and T.

Anomalies in update: If the fact that Jones studies physics is deleted, the fact that Professor Brown teaches physics is also lost. This is because T is a determinant but not a candidate key.
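The claim that T is a determinant but not a candidate key can be verified with the standard attribute-closure computation: a determinant is a superkey exactly when its closure contains every attribute. This is a minimal sketch; closure and bcnf_violations are illustrative helper names, and the dependencies are the two FDs given for SJT.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs
    ]


# The two functional dependencies of SJT.
sjt_fds = [
    (("S", "J"), ("T",)),  # S, J -> T
    (("T",), ("J",)),      # T -> J
]

violations = bcnf_violations(("S", "J", "T"), sjt_fds)
print(violations)
print(sorted(closure(("S", "T"), sjt_fds)))
```

The closure of (S, T) also covers all three attributes, so SJT has two overlapping composite candidate keys, (S, J) and (S, T).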


Fig: relation SJ

Fig: relation TJ

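Relations (S, J) and (T, J) are in BCNF because all determinants are candidate keys. Note, however, that joining (S, J) with (T, J) on J would pair Smith's Physics with both Prof. Green and Prof. Brown; it is the pair (S, T) and (T, J), joined on T, that reconstructs SJT without spurious tuples, because T -> J.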

BCNF vs 3NF:

It should be noted that most relations that are in 3NF are also in BCNF. Infrequently, a 3NF relation is not in BCNF, and this happens only if (a) the candidate keys in the relation are composite keys (that is, they are not single attributes), (b) there is more than one candidate key in the relation, and (c) the keys are not disjoint, that is, some attributes in the keys are common. In short, BCNF differs from 3NF only when there is more than one candidate key and the keys are composite and overlapping.

BCNF: For every functional dependency X -> Y in a set F of functional dependencies over relation R, either:
a. Y is a subset of X, or
b. X is a superkey of R.

3NF: For every functional dependency X -> Y in a set F of functional dependencies over relation R, either:
a. Y is a subset of X, or
b. X is a superkey of R, or
c. Y is a subset of K for some key K of R.

For Example: Consider a 3NF schema which is not in BCNF:




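Functional dependencies:
Client, Office -> Client, Office, Account
Account -> Office

Account   Client   Office
A         Joe      1
B         Mary     1
A         John     1
C         Joe      2

3NF has some redundancy; BCNF does not. Unfortunately, a decomposition into BCNF is not always dependency preserving, whereas a decomposition into 3NF can always be made dependency preserving.

Decomposing into BCNF gives two relations:

Account   Office
A         1
B         1
C         2

Account   Client
A         Joe
B         Mary
A         John
C         Joe

No non-trivial FDs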
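The loss of dependency preservation can be made concrete with the two decomposed tables. In the sketch below, account D is a hypothetical new row: each table still satisfies its own constraint after the insert, yet the join of the two tables violates Client, Office -> Account. The function names are illustrative.

```python
# The two BCNF relations from the example, as (Account, Office) and (Account, Client) pairs.
account_office = [("A", 1), ("B", 1), ("C", 2)]
account_client = [("A", "Joe"), ("B", "Mary"), ("A", "John"), ("C", "Joe")]


def account_determines_office(rows):
    """Check Account -> Office, the only FD that can be enforced on one table alone."""
    offices = {}
    return all(offices.setdefault(acct, office) == office for acct, office in rows)


def join(ao, ac):
    """Join the two tables on Account, giving (Client, Office, Account) rows."""
    office_of = dict(ao)
    return [(client, office_of[acct], acct) for acct, client in ac if acct in office_of]


def client_office_determines_account(rows):
    """Check Client, Office -> Account on the joined rows."""
    accounts = {}
    return all(accounts.setdefault((c, o), a) == a for c, o, a in rows)


# Hypothetical insert: a new account D at office 1, held by Joe.
ao_after = account_office + [("D", 1)]
ac_after = account_client + [("D", "Joe")]

print(account_determines_office(ao_after))
print(client_office_determines_account(join(account_office, account_client)))
print(client_office_determines_account(join(ao_after, ac_after)))
```

Joe now has two accounts, A and D, at office 1, and the only way to detect this is to join the tables first.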



Multi-valued Dependency:

Given a relation R with attributes A, B and C, the multi-valued dependence R.A ->> R.B holds in R if and only if the set of B-values matching a given (A-value, C-value) pair in R depends only on the A-value and is independent of the C-value.
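On a concrete instance, this definition amounts to saying that for each A-value, the (B, C) pairs present are exactly the cross product of the B-values and the C-values seen with that A-value. The sketch below tests this; mvd_holds is a hypothetical helper and the rows are invented placeholder values.

```python
from itertools import product


def mvd_holds(rows, a, b, c):
    """Check A ->> B on an instance, where a, b, c are lists of column names."""
    groups = {}
    for row in rows:
        a_value = tuple(row[col] for col in a)
        pair = (tuple(row[col] for col in b), tuple(row[col] for col in c))
        groups.setdefault(a_value, set()).add(pair)
    for pairs in groups.values():
        b_values = {b_value for b_value, _ in pairs}
        c_values = {c_value for _, c_value in pairs}
        # The B-values must be independent of the C-values within each A-group.
        if pairs != set(product(b_values, c_values)):
            return False
    return True


# Placeholder instance: every combination of b1/b2 with c1/c2 appears for a1.
rows = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c2"},
    {"A": "a1", "B": "b2", "C": "c2"},
]

print(mvd_holds(rows, ["A"], ["B"], ["C"]))
print(mvd_holds(rows[:3], ["A"], ["B"], ["C"]))  # one combination is missing
```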



Fourth Normal Form (4NF):

A relation is in 4NF if, whenever there exists a multivalued dependence (MVD), say A ->> B, then all attributes are also functionally dependent on A, i.e. A -> X for all attributes X of the relation.

For Ex: Relation CTX (not in 4NF)

Course    Teacher       Text
Physics   Prof. Green   Basic Mechanics
Physics   Prof. Green   Principles of Optics
Physics   Prof. Brown   Basic Mechanics
Physics   Prof. Brown   Principles of Optics
Physics   Prof. Black   Basic Mechanics
Physics   Prof. Black   Principles of Optics
Math      Prof. White   Modern Algebra
Math      Prof. White   Projective Geometry

A tuple (C, T, X) appears in CTX if and only if course C can be taught by teacher T and uses X as a reference. For a given course, all possible combinations of teacher and text appear; that is, CTX satisfies the constraint: if tuples (C, T1, X1) and (C, T2, X2) both appear, then tuples (C, T1, X2) and (C, T2, X1) both appear also.

CTX contains redundancy.
CTX is in BCNF, as there are no functional determinants other than the key (the relation is all key).
But CTX is not in 4NF, as it involves an MVD that is not an FD at all, let alone an FD in which the determinant is a candidate key.

Anomalies in insert: For example, to add the information that the physics course uses a new text called Advanced Mechanism, it is necessary to create three new tuples, one for each of the three teachers.

Fig: Relation CT

Fig: Relation CX
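The decomposition of CTX into CT and CX can be checked directly on the data from the CTX table. This sketch projects CTX onto (Course, Teacher) and (Course, Text), joins the projections back on Course, and then repeats the insert from the anomaly example; the variable names are illustrative.

```python
# The eight CTX tuples as (course, teacher, text).
ctx = {
    ("Physics", "Prof. Green", "Basic Mechanics"),
    ("Physics", "Prof. Green", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Physics", "Prof. Black", "Basic Mechanics"),
    ("Physics", "Prof. Black", "Principles of Optics"),
    ("Math", "Prof. White", "Modern Algebra"),
    ("Math", "Prof. White", "Projective Geometry"),
}

# Projections: CT holds (course, teacher), CX holds (course, text).
ct = {(course, teacher) for course, teacher, _ in ctx}
cx = {(course, text) for course, _, text in ctx}


def join_on_course(ct_rows, cx_rows):
    """Join CT and CX on the course attribute."""
    return {
        (course, teacher, text)
        for course, teacher in ct_rows
        for cx_course, text in cx_rows
        if course == cx_course
    }


rejoined = join_on_course(ct, cx)

# Adding a new Physics text is now a single row in CX ...
cx_after = cx | {("Physics", "Advanced Mechanism")}
# ... which corresponds to three tuples in the joined relation, one per Physics teacher.
new_tuples = join_on_course(ct, cx_after) - rejoined

print(rejoined == ctx, len(ct), len(cx), len(new_tuples))
```

The join returns exactly the original eight tuples, so nothing is lost by storing the two smaller relations instead.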


4NF is an improvement over BCNF, in that it eliminates another form of undesirable structure.

Fifth Normal Form (5NF) / Projection-Join Normal Form:

Join dependency: a relation R satisfies the JD (X, Y, Z) if and only if it is the join of its projections on X, Y, Z, where X, Y, Z are subsets of the set of attributes of R.

A relation is in 5NF/PJNF (projection-join normal form) if and only if every join dependency in R is implied by the candidate keys of R.

5NF is the ultimate normal form with respect to projection and join.

For Ex:

Summary:

Relations are categorized into normal forms based on the modification anomalies or other problems to which they are subject:
