NORMALIZATION

Relational Database Design
Relational database design: The grouping of attributes to form "good" relation schemas
Two levels of relation schemas:
The logical "user view" level
The storage "base relation" level
Criteria for "good" base relations:
Discuss informal guidelines for good relational design
Discuss formal concepts of functional dependencies and normal forms (1NF, 2NF, 3NF, BCNF)
There are two popular approaches to designing a database:
- Top-down design
- Bottom-up design
The ER modeling technique is called the top-down approach. It involves:
i) Identifying entities and their attributes
ii) Identifying the relationships between entities
iii) Drawing the ER diagram
iv) Mapping the diagram to tables
Normalization is the bottom-up approach. It is the step-by-step decomposition of complex records into simple records.
Normalization controls redundancy and removes inconsistencies and update anomalies.
Normalization is based on functional dependencies and the primary key.
Normalization: The process of decomposing unsatisfactory
"bad" relations by breaking up their attributes into smaller
relations
Normal form: Condition using keys and FDs of a relation to
certify whether a relation schema is in a particular normal
form
Informal design guidelines for relation schemas

1) Semantics of the relation attributes
2) Reducing the redundant values in tuples
3) Reducing the null values in tuples
4) Disallowing the possibility of generating spurious tuples
Semantics of the Relation Attributes

GUIDELINE 1: Informally, each tuple in a relation
should represent one entity or relationship instance.
Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
Only foreign keys should be used to refer to other
entities
Entity and relationship attributes should be kept
apart as much as possible.
Redundant Information in Tuples and Update Anomalies
GUIDELINE 2:
Mixing attributes of multiple entities may cause
problems
Information is stored redundantly, wasting storage
Problems with update anomalies
Insertion anomalies
Deletion anomalies
Modification anomalies
Insert Anomaly: Cannot insert a project unless an employee is assigned to it.
Conversely, cannot insert an employee unless he/she is assigned to a project.
Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
Update Anomaly: Changing the name of project number P1 from Billing to CustomerAccounting may require this update to be made for all 100 employees working on project P1.
GUIDELINE 2: Design a schema that does not suffer from insertion, deletion and update anomalies. If any are present, note them so that applications can be made to take them into account.
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is next to impossible.
Update anomalies: If data items are scattered and are not linked to each other properly, strange situations can arise. For example, when we try to update one data item whose copies are scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
Deletion anomalies: We try to delete a record, but parts of it are left undeleted because, unknown to us, the data is also saved somewhere else.
Insert anomalies: We try to insert data into a record that does not exist at all.
Null Values in Tuples

GUIDELINE 3: Relations should be designed such
that their tuples will have as few NULL values as
possible
Reasons for nulls:
attribute not applicable or invalid
attribute value unknown (may exist)
value known to exist, but unavailable
Spurious Tuples
GUIDELINE 4: The relations should be designed to
satisfy the lossless join condition. No spurious tuples
should be generated by doing a natural-join of any
relations.
There are two important properties of decompositions:
(a) the non-additive (lossless) property of the corresponding join
(b) preservation of the functional dependencies.
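Property (a) can be tested mechanically when R is decomposed into two relations. The sketch below assumes the standard binary test: the decomposition of R into R1 and R2 is lossless if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is implied by F. The names `closure` and `is_lossless_binary` are hypothetical, FDs are assumed to be (LHS set, RHS set) pairs, and the example reuses the employee schema from the 3NF example in these notes.

```python
# Sketch of the binary lossless-join test (assumption: FDs are (lhs_set, rhs_set) pairs).

def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole LHS is derived, the RHS can be added as well.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if the natural join of R1 and R2 cannot produce spurious tuples."""
    common = set(r1) & set(r2)
    determined = closure(common, fds)
    # (R1 ∩ R2) must functionally determine all of R1 or all of R2.
    return set(r1) <= determined or set(r2) <= determined


F = [({"eno"}, {"ename", "address", "dno"}), ({"dno"}, {"dname"})]
good = is_lossless_binary({"eno", "ename", "address", "dno"}, {"dno", "dname"}, F)
bad = is_lossless_binary({"eno", "ename", "address", "dname"}, {"dno", "dname"}, F)
print(good, bad)  # True False
```

In the second split the shared attribute is dname, which determines nothing, so joining the pieces back can create spurious tuples.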
Functional dependency
A functional dependency is a constraint between two sets of attributes from the database.
An FD, denoted by X -> Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R.
The constraint is that, for any two tuples t1 and t2 in r:
if t1[X] = t2[X], then t1[Y] = t2[Y]
R(X, Y)
      X     Y
t1    10    d1
t2    10    d1
There is an FD from X to Y, or Y is functionally dependent on X.
FD => functional dependency (also written f.d.)
X => left-hand side (L.H.S.)
Y => right-hand side (R.H.S.)
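A single relation state can refute an FD but never prove it, because the FD must hold in every legal state of R. The hypothetical helper below only checks whether a given state violates X -> Y; representing rows as Python dicts is an assumption, and the sample rows are t1 and t2 from the table above.

```python
def fd_holds(rows, X, Y):
    """Return False if two rows agree on X but differ on Y (a violation of X -> Y)."""
    seen = {}  # X-value -> the Y-value first observed with it
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


r = [{"X": 10, "Y": "d1"}, {"X": 10, "Y": "d1"}]  # tuples t1 and t2
print(fd_holds(r, ["X"], ["Y"]))                           # True
print(fd_holds(r + [{"X": 10, "Y": "d2"}], ["X"], ["Y"]))  # False
```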
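Types of functional dependency:
Full Functional dependency
Partial Functional dependency
Transitive dependency

Full Functional dependency
e.g., {Eno, Pno} -> Hours (Hours depends on the whole key; neither Eno nor Pno alone determines it)
Partial Functional dependency
e.g., {Eno, Pno} -> Ename (Ename depends on Eno alone, i.e., Eno -> Ename)
Transitive dependency
e.g., Eno -> Dno and Dno -> Dname, therefore
Eno -> Dname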
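Whether a dependency is full or partial can be decided with attribute closures: X -> Y is full if it holds and no proper subset of X still determines Y. The sketch below assumes FDs are (LHS set, RHS set) pairs; the FD set `F` is an illustrative assumption that combines the three examples above, and `is_full_fd` is a hypothetical name.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_full_fd(X, Y, fds):
    """True if X -> Y holds under fds and no attribute can be dropped from X."""
    X, Y = set(X), set(Y)
    if not Y <= closure(X, fds):
        return False  # X -> Y is not implied at all
    # Closure is monotone, so testing X minus one attribute covers every proper subset.
    return all(not Y <= closure(X - {a}, fds) for a in X)


F = [({"Eno", "Pno"}, {"Hours"}), ({"Eno"}, {"Ename", "Dno"}), ({"Dno"}, {"Dname"})]
print(is_full_fd({"Eno", "Pno"}, {"Hours"}, F))  # True: full dependency
print(is_full_fd({"Eno", "Pno"}, {"Ename"}, F))  # False: partial, Eno alone suffices
print("Dname" in closure({"Eno"}, F))            # True: Eno -> Dno -> Dname
```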
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1. (Reflexive) If Y is a subset of X, then X -> Y
IR2. (Augmentation) If X -> Y, then XZ -> YZ
(Notation: XZ stands for X ∪ Z)
IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
Some additional inference rules that are useful:
IR4. (Decomposition) If X -> YZ, then X -> Y and X -> Z
IR5. (Union) If X -> Y and X -> Z, then X -> YZ
IR6. (Pseudotransitivity) If X -> Y and WY -> Z, then WX -> Z
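Applying these rules exhaustively is expensive, so in practice one computes the attribute closure X+, the set of all attributes that X determines; X -> Y is in F+ exactly when Y is a subset of X+. A minimal sketch, assuming single-letter attribute names so that an FD such as AB -> C can be written as the pair ("AB", "C"); the sample F is the one used in the minimal cover section of these notes, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Compute attrs+ under fds; each character of a string is one attribute."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If every LHS attribute is already derived, the RHS follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure F+ of fds."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("AB", "C"), ("D", "AC"), ("D", "E")]
print("".join(sorted(closure("A", F))))  # ABC
print("".join(sorted(closure("D", F))))  # ABCDE
print(implies(F, "D", "B"))              # True (D -> A and A -> B)
```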
Trivial, Non-trivial
Trivial: If a functional dependency (FD) X -> Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.
Non-trivial: If an FD X -> Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
Completely non-trivial: If an FD X -> Y holds, where X ∩ Y = ∅, it is said to be a completely non-trivial FD.
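These categories depend only on how X and Y overlap, so they can be checked without looking at any other FDs. A small hypothetical helper, again assuming each character is one attribute, that returns the most specific category:

```python
def classify_fd(lhs, rhs):
    """Return the most specific category of the FD lhs -> rhs."""
    X, Y = set(lhs), set(rhs)
    if Y <= X:
        return "trivial"
    if X & Y:
        return "non-trivial"  # Y is not a subset of X, but they still overlap
    return "completely non-trivial"  # X ∩ Y is empty


for fd in [("AB", "A"), ("AB", "AC"), ("AB", "C")]:
    print(fd, classify_fd(*fd))
```

Every completely non-trivial FD is also non-trivial; the helper simply reports the narrower label.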
Candidate key
If a relation schema has more than one key, each is called
a candidate key. One of the candidate keys is arbitrarily
designated to be the primary key, and the others are called
secondary keys.
Prime and Non-prime attribute
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
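For small schemas the candidate keys, and hence the prime attributes, can be found by brute force: test attribute subsets in increasing size and keep each one whose closure is the whole relation and that contains no smaller key. This is exponential in the number of attributes, so treat it as a teaching sketch; the single-letter encoding and the function names are assumptions, and the sample F is the one from the "Find a Minimal Cover" exercise in these notes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # a superset of an existing key is a superkey, not a key
            if closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


F = [("A", "D"), ("BC", "AD"), ("C", "B"), ("E", "A"), ("E", "D")]
keys = candidate_keys("ABCDE", F)
prime = set().union(*keys)
print([sorted(k) for k in keys], sorted(prime))  # [['C', 'E']] ['C', 'E']
```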
Normalization of data is a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
1) Minimizing redundancy
2) Minimizing the insertion, deletion and modification anomalies
Normal forms
1NF, 2NF, 3NF, BCNF (Boyce-Codd Normal Form), 4NF and 5NF
1NF: It is based on the primary key and atomic values. There must be no composite attributes, no multivalued attributes and no relations within relations (nested relations).
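Composite attribute: Ename is composed of Fname and Lname.
Not in 1NF: (Eno, Address, Ename(Fname, Lname))
In 1NF: (Eno, Address, Fname, Lname)
Multivalued attribute: a department can have several values of Dlocation.
Not in 1NF: (Dno, Dname, {Dlocation})
In 1NF: (Dno, Dname) and (Dno, Dlocation)
Relation within relation: each employee tuple (Eno, Ename, Addr) contains a nested relation of projects (Pno, Pname).
In 1NF: the employee attributes stay in one relation, and the nested relation is moved out together with the key Eno as (Eno, Pno, Pname).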
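The multivalued and nested-relation cases are fixed the same way: copy the key into a separate relation with one row per value. A minimal sketch, assuming rows are Python dicts; the department and employee data are hypothetical sample values, and `to_1nf` is an illustrative name, not part of any library.

```python
def to_1nf(rows, key, attr):
    """Split rows on a multivalued or nested attribute.

    Returns (base, split): base keeps every attribute except attr; split has one row
    per value of attr, carrying the key. Nested values given as dicts are merged in.
    """
    base, split = [], []
    for row in rows:
        base.append({k: v for k, v in row.items() if k != attr})
        for value in row[attr]:
            if isinstance(value, dict):  # relation within relation
                split.append({key: row[key], **value})
            else:  # multivalued attribute
                split.append({key: row[key], attr: value})
    return base, split


dept = [  # hypothetical sample data
    {"Dno": 5, "Dname": "Research", "Dlocation": ["Bellaire", "Houston"]},
    {"Dno": 4, "Dname": "Administration", "Dlocation": ["Stafford"]},
]
dept_base, dept_locations = to_1nf(dept, "Dno", "Dlocation")
print(dept_base)       # (Dno, Dname) rows
print(dept_locations)  # (Dno, Dlocation) rows, one per location

emp = [{"Eno": 1, "Ename": "Smith", "Projects": [{"Pno": 10, "Pname": "ProductX"}]}]
print(to_1nf(emp, "Eno", "Projects")[1])  # [{'Eno': 1, 'Pno': 10, 'Pname': 'ProductX'}]
```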
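2NF: There is no partial dependency.
It is based on the concept of full functional dependency: every nonkey attribute should be fully dependent on the key.
An FD X -> Y is a full functional dependency if removing any attribute from X means the dependency no longer holds; otherwise it is a partial dependency.
Def: A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.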
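Eg.
R = {eno, pno, hours, ename, pname, plocation}
Given functional dependencies:
F = {{eno, pno} -> hours,
eno -> ename,
pno -> {pname, plocation}}
ename depends on eno alone, and pname and plocation depend on pno alone, so R is not in 2NF. Decompose it into:
R1 = {eno, pno, hours}
R2 = {eno, ename}
R3 = {pno, pname, plocation}
Now every nonprime attribute in R1, R2 and R3 is fully functionally dependent on the key of its relation, so all three relations are in 2NF.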
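The partial dependencies in this example can be found by computing the closure of each proper subset of the primary key and looking for nonprime attributes in it. A sketch under the assumption that the primary key is the only candidate key (so every other attribute is nonprime); FDs are (LHS set, RHS set) pairs and `partial_dependencies` is a hypothetical name.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(attrs, pk, fds):
    """Nonprime attributes that depend on a proper subset of the primary key.

    Assumption: pk is the only candidate key, so every other attribute is nonprime.
    """
    pk = set(pk)
    nonprime = set(attrs) - pk
    partial = set()
    for a in pk:
        # Closure is monotone, so dropping one key attribute at a time is enough.
        partial |= closure(pk - {a}, fds) & nonprime
    return sorted(partial)


F = [({"eno", "pno"}, {"hours"}), ({"eno"}, {"ename"}), ({"pno"}, {"pname", "plocation"})]
R = {"eno", "pno", "hours", "ename", "pname", "plocation"}
print(partial_dependencies(R, {"eno", "pno"}, F))                        # ['ename', 'plocation', 'pname']
print(partial_dependencies({"eno", "pno", "hours"}, {"eno", "pno"}, F))  # [] so R1 is in 2NF
```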
3NF: It is based on the concept of transitive dependency.

Def: A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key.
Def: A relation schema R is in 3NF if, whenever a non-trivial FD X -> A holds in R, either
a) X is a superkey of R, or
b) A is a prime attribute of R
Eg:
R = {eno, ename, address, dno, dname}
Given functional dependencies:
F = {eno -> {ename, address, dno},
dno -> dname}
dname is transitively dependent on the key eno (eno -> dno and dno -> dname), so R is not in 3NF. Decompose it into:
R1 = {eno, ename, address, dno}    R2 = {dno, dname}
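The second (general) definition can be checked directly. For the FDs given on R itself it is enough to test each FD in F, with its RHS split into single attributes, rather than all of F+; prime attributes come from the candidate keys, found here by brute force. A sketch under these assumptions, with hypothetical function names and FDs as (LHS set, RHS set) pairs; for the decomposed relations, only the FDs that fall inside each relation are passed in.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is all of attrs."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def is_3nf(attrs, fds):
    """General 3NF test: for each non-trivial X -> A, X is a superkey or A is prime."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attrs
        for a in set(rhs) - set(lhs):  # singleton RHS; trivial parts skipped
            if not is_superkey and a not in prime:
                return False
    return True


F = [({"eno"}, {"ename", "address", "dno"}), ({"dno"}, {"dname"})]
R = {"eno", "ename", "address", "dno", "dname"}
print(is_3nf(R, F))                                       # False: dno -> dname
print(is_3nf({"eno", "ename", "address", "dno"}, F[:1]))  # True
print(is_3nf({"dno", "dname"}, F[1:]))                    # True
```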
BCNF (Boyce-Codd Normal Form)

Def:
A relation schema R is in BCNF if, whenever a non-trivial FD X -> A holds in R, X is a superkey of R.
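BCNF drops the "A is prime" escape clause of 3NF, so the test only asks whether each non-trivial FD has a superkey on its left. A sketch with hypothetical names and FDs as (LHS set, RHS set) pairs; it checks the FDs given on the relation itself. The TEACH schema (student, course, instructor) is an added illustration, assumed here, of a relation that is in 3NF (course is prime) but not in BCNF.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose LHS is not a superkey of the relation."""
    attrs = set(attrs)
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FDs never violate BCNF
        if not closure(lhs, fds) >= attrs:
            violations.append((lhs, rhs))
    return violations


F = [({"eno"}, {"ename", "address", "dno"}), ({"dno"}, {"dname"})]
print(bcnf_violations({"eno", "ename", "address", "dno", "dname"}, F))  # dno -> dname

# Assumed illustration: 3NF but not BCNF, since instructor is not a superkey.
TEACH = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(bcnf_violations({"student", "course", "instructor"}, TEACH))  # instructor -> course
```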
Closure of a Set of Functional Dependencies
Closure of a Set of Attributes
Redundancy of FDs
Canonical Cover
Example of Computing a
Canonical Cover
Finding Keys
3. Multivalued Dependencies and Fourth Normal Form (1)
Figure: (a) The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME. (b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
Multivalued Dependencies and Fourth Normal Form (2)
Definition:
A multivalued dependency (MVD) X ->> Y specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties, where we use Z to denote R - (X ∪ Y):

t3[X] = t4[X] = t1[X] = t2[X].
t3[Y] = t1[Y] and t4[Y] = t2[Y].
t3[Z] = t2[Z] and t4[Z] = t1[Z].
An MVD X ->> Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R.
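The t3/t4 condition says that, for each X-value, every Y-value seen with it must appear together with every Z-value seen with it. The hypothetical helper below checks that on one relation state (rows as dicts, an assumption); like an FD, an MVD can only be refuted, not proven, by a single state. The EMP rows are hypothetical sample data in the style of the EMP figure.

```python
def mvd_holds(rows, X, Y):
    """Check X ->> Y on one relation state (a list of dicts with the same keys)."""
    if not rows:
        return True
    Z = [a for a in rows[0] if a not in X and a not in Y]  # Z = R - (X ∪ Y)
    groups = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        ys, zs, pairs = groups.setdefault(x, (set(), set(), set()))
        y = tuple(row[a] for a in Y)
        z = tuple(row[a] for a in Z)
        ys.add(y)
        zs.add(z)
        pairs.add((y, z))
    # Every Y-value must pair with every Z-value for the same X-value.
    return all(len(pairs) == len(ys) * len(zs) for ys, zs, pairs in groups.values())


emp = [  # hypothetical EMP state
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(mvd_holds(emp, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(emp[:3], ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) is missing
```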

Multivalued Dependencies and Fourth Normal Form (4)
Definition:
A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X ->> Y in F+, X is a superkey for R.

Multivalued Dependencies and Fourth Normal Form (5)
Figure: Decomposing a relation state of EMP that is not in 4NF. (a) EMP relation with additional tuples. (b) Two corresponding 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
4. Join Dependencies and Fifth Normal Form (1)

Definition:
A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a non-additive join decomposition into R1, R2, ..., Rn; that is, for every such r we have
*(π_R1(r), π_R2(r), ..., π_Rn(r)) = r
where * denotes the natural join of all the projections.
A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.
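The JD condition can be tested on a given state by projecting r onto each Ri, joining the projections back together and comparing the result with r. A sketch with hypothetical helper names; the SUPPLY rows are a hypothetical sample in the style of the SUPPLY figure, and the components R1 = (Sname, Part_name), R2 = (Sname, Proj_name), R3 = (Part_name, Proj_name) are assumed.

```python
def project(rows, attrs):
    """π_attrs(rows): distinct rows restricted to attrs."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in distinct]


def natural_join(r, s):
    """Nested-loop natural join on the attributes r and s share."""
    return [
        {**t, **u}
        for t in r
        for u in s
        if all(t[a] == u[a] for a in t.keys() & u.keys())
    ]


def jd_holds(rows, components):
    """True if *(π_R1(r), ..., π_Rn(r)) = r for this state."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))

    def as_set(rel):
        return {tuple(sorted(t.items())) for t in rel}

    return as_set(joined) == as_set(rows)


supply = [
    dict(Sname=s, Part_name=p, Proj_name=j)
    for s, p, j in [
        ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
        ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
        ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
        ("Smith", "Bolt", "ProjY"),
    ]
]
R1, R2, R3 = ["Sname", "Part_name"], ["Sname", "Proj_name"], ["Part_name", "Proj_name"]
print(jd_holds(supply, [R1, R2, R3]))  # True: the three-way join gives back r
print(jd_holds(supply, [R1, R2]))      # False: joining only R1 and R2 adds spurious tuples
```

The second call shows why no pair of projections suffices here: only the three-way join is non-additive.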
Join Dependencies and Fifth Normal Form (2)

Definition:
A relation schema R is in fifth normal form (5NF) (or
Project-Join Normal Form (PJNF)) with respect to a
set F of functional, multivalued, and join dependencies
if, for every nontrivial join dependency JD(R1, R2, ...,
Rn) in F+ (that is, implied by F), every Ri is a superkey
of R.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
Figure: (c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). (d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
Steps to find Minimal Cover

1) Singleton attributes in RHS
2) Identify extraneous attributes and remove them
3) Remove redundant dependencies
Step 1: Consider AB -> CD.
This functional dependency should be decomposed so that each RHS is a single attribute, as below:
AB -> C and
AB -> D

Step 2: If an attribute adds nothing to a functional dependency, we call it extraneous and remove it.
If the LHS has more than one attribute, check whether it contains an extraneous (extra/unwanted) attribute; if so, remove it.
Consider the functional dependencies
A -> B
AB -> C
D -> AC
D -> E
The only FD whose LHS has two attributes is AB -> C.
A+ = ABC
B+ = B [Reflexivity]
An attribute in the LHS is extraneous if the remaining LHS attributes already determine the RHS. Here C is in A+ (A -> B, then AB -> C), so A alone determines C. B+ = B does not contain C, so A is not extraneous.
B is extraneous in AB -> C, which therefore reduces to A -> C.
Finding Redundant Dependency

Consider the functional dependencies
A -> B
A -> C
D -> AC
D -> E
Step 1: Apply singleton to RHS
A -> B
A -> C
D -> A
D -> C
D -> E
Step 2: In the LHS there is no extraneous attribute.
Step 3: Remove redundant dependencies
1. Remove A -> B and find the attribute closure for A: A+ = AC. Without A -> B, B cannot be found in A+, so A -> B is not a redundant dependency.
2. Remove A -> C and find the attribute closure for A: A+ = AB. Without A -> C, C cannot be found in A+, so A -> C is not a redundant dependency.
3. Remove D -> A and find the attribute closure for D: D+ = DCE. Without D -> A, A cannot be found in D+, so D -> A is not a redundant dependency.
4. Remove D -> C and find the attribute closure for D: D+ = DAEBC (D -> A and D -> E, then A -> B and A -> C). Without D -> C, C can still be found in D+, so D -> C is a redundant dependency and should be removed. The FDs are now A -> B, A -> C, D -> A, D -> E.
5. Remove D -> E and find the attribute closure for D: D+ = DABC. Without D -> E, E cannot be found in D+, so D -> E is not a redundant dependency.
So, the minimal cover is obtained after removing:

a) Extraneous attributes
b) Redundant dependencies
Minimal Functional Dependencies are

A -> B
A -> C
D -> A
D -> E
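The three steps can be combined into one routine. A sketch, assuming single-letter attributes and FDs written as string pairs such as ("AB", "C"); `minimal_cover` is a hypothetical name. A minimal cover is not unique in general, and the result of this sketch depends on the order in which the FDs are examined.

```python
def closure(attrs, fds):
    """Compute attrs+ under fds; each character of a string is one attribute."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: singleton attributes in the RHS.
    single = [(lhs, a) for lhs, rhs in fds for a in rhs]

    # Step 2: drop LHS attribute b when the rest of the LHS already determines the RHS
    # (closures use the singleton set, which is equivalent to the reduced one).
    reduced = []
    for lhs, a in single:
        left = set(lhs)
        for b in sorted(left):
            if len(left) > 1 and a in closure(left - {b}, single):
                left.discard(b)
        reduced.append(("".join(sorted(left)), a))
    cover = list(dict.fromkeys(reduced))  # remove duplicate FDs, keep order

    # Step 3: drop an FD when the remaining FDs still derive it.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover


print(minimal_cover([("A", "B"), ("AB", "C"), ("D", "AC"), ("D", "E")]))
# [('A', 'B'), ('A', 'C'), ('D', 'A'), ('D', 'E')]
print(minimal_cover([("A", "D"), ("BC", "AD"), ("C", "B"), ("E", "A"), ("E", "D")]))
# [('A', 'D'), ('C', 'A'), ('C', 'B'), ('E', 'A')]
```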
Find a Minimal Cover

R(A, B, C, D, E)
F = {A -> D,
BC -> AD,
C -> B,
E -> A,
E -> D}
Steps:
Step 1: Singleton attributes in RHS
F = {A -> D,
BC -> A,
BC -> D,
C -> B,
E -> A,
E -> D}
Step 2: Identify extraneous attributes and remove them
B is extraneous in BC -> A and BC -> D: C+ = ABCD (C -> B, then BC -> AD), so C alone determines A and D.
F = {A -> D,
C -> A,
C -> D,
C -> B,
E -> A,
E -> D}
Step 3: Remove redundant FDs
C -> D is redundant (C -> A and A -> D), and E -> D is redundant (E -> A and A -> D).
F = {A -> D,
C -> A,
C -> B,
E -> A}
Equivalence of sets of FDs

Two sets of FDs E and F:
F is said to cover E if every FD in E is also in the closure of F (F+).
E and F are equivalent if E covers F and F covers E, i.e.,
E+ = F+
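Because X -> Y is in F+ exactly when Y is a subset of X+ under F, the "covers" test only needs one closure per FD. A sketch with hypothetical names, assuming single-letter attributes as string pairs; the sample sets are F from the "Find a Minimal Cover" exercise and the minimal cover derived there.

```python
def closure(attrs, fds):
    """Compute attrs+ under fds; each character of a string is one attribute."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, e):
    """True if F covers E: every FD in E is in F+."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent when each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)


F = [("A", "D"), ("BC", "AD"), ("C", "B"), ("E", "A"), ("E", "D")]
G = [("A", "D"), ("C", "A"), ("C", "B"), ("E", "A")]  # minimal cover of F
print(equivalent(F, G))      # True
print(equivalent(F, G[:3]))  # False: without E -> A, neither E -> A nor E -> D follows
```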
