
UNIT 4
Schema Refinement and Normal Forms

G. Tsiknis

Text:
- 3rd edition: Chapter 19, sections 19.1-19.6, or
- 2nd edition: Chapter 15, sections 15.1-15.7

Database Design - Where Are We?

- We've learned that databases are important
- We've learned how to create a conceptual design using ER diagrams
- We've learned how to create a logical design by turning the ER diagrams into a relational schema, including minimizing the data and relations created
- Now we need to refine that schema to reduce duplication of information

The Evils of Redundancy

- Despite our best efforts, tables obtained from the ER model by the methods discussed may have redundant information.
- This is because some attributes in a table depend on other attributes that are not the key. Such dependencies are
  - called functional dependencies (a kind of integrity constraint)
  - used to identify schemas with such problems and to suggest refinements.
- Redundancy is at the root of several problems:
  - redundant storage, insert/delete/update anomalies
- What should we do?
  - split a table into more tables with fewer attributes (called decomposition)
- Must be careful:
  - A wrong decomposition may lose information

Functional Dependencies (FDs)

- To say that the values of A, B, C together determine the value of D, we write: A, B, C --> D
  - A, B, C --> D is a functional dependency
  - A, B, C is a determinant
  - D is said to depend on A, B, C
  - Sometimes written as {A,B,C} --> D or A B C --> D
- FDs are a special kind of integrity constraint
- We're most interested in cases where there's a single attribute on the RHS
- The most uninteresting cases are the trivial cases:
  - e.g. A, B, C --> A

Functional Dependencies, cont.

- A functional dependency X --> Y holds over relation R if, for every allowable instance r of R and every two tuples t1, t2 in r:
  if t1.X = t2.X then t1.Y = t2.Y
  - i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.)
  - informally, precisely one Y-value is associated with each X-value
- An FD is a statement about all allowable instances.
  - Must be identified based on the semantics of the application
  - Given some instance r1 of a table R, we can check if r1 satisfies or violates some FD f, but we cannot tell from r1 if f must hold over R!
- K is a superkey for R means that K --> R-K (or K --> R)
  - However, K --> R-K does not require K to be minimal!

Example: FD Constraints on Entity Set

- Consider a Student schema for UofU:
  Student (sid, name, address, city, acode, phone, major)
- Some FDs on Student:
  - sid is the key: sid --> name, address, city, acode, phone, major
  - city determines area_code: city --> acode

sid       name      address            city      acode  phone     major
99111120  G. Jones  1234 W. 12th Ave.  Victoria  250    889-4444  CPSC
92001200  G. Smith  2020 E. 18th St.   Van       604    409-2222  MATH
94001020  A. Smith  2020 E. 18th St.   Van       604    222-2222  CPSC

Example (Contd.)

- Problems due to city --> acode:
  - Update anomaly: Can we change area_code in the 2nd tuple?
  - Insertion anomaly: What if we want to insert the information that Richmond has area code 604? Can we insert a student when we don't know the area code for the city?
  - Deletion anomaly: If we delete all students in Victoria, we lose the area code for that city!
- Solution: split Student into:

Student
sid       name      address            city  phone     major
99111120  G. Jones  1234 W. 12th Ave.  Vict  889-4444  CPSC
92001200  G. Smith  2020 E. 18th St.   Van   409-2222  MATH
94001020  A. Smith  2020 E. 18th St.   Van   222-2222  CPSC

ACode
city  acode
Vict  250
Van   604

Reasoning About FDs

- Given some FDs, we can usually infer additional FDs:
  sid --> city, city --> acode implies sid --> acode
- An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
  - closure of F: the set of all FDs that are implied by F.
- Armstrong's Axioms (X, Y, Z are sets of attributes):
  - Reflexivity: If Y ⊆ X, then X --> Y
  - Augmentation: If X --> Y, then X Z --> Y Z for any Z
  - Transitivity: If X --> Y and Y --> Z, then X --> Z
- These are sound and complete inference rules for FDs.
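The point that an instance can satisfy or violate an FD, without telling us whether the FD must hold over the schema, can be illustrated with a small check. This is a minimal sketch: the function name `satisfies_fd` and the dictionary layout are assumptions made for illustration, and the rows are the Student example with only some of its columns.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, else returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Student instance from the example (a subset of the columns)
student = [
    {"sid": 99111120, "name": "G. Jones", "city": "Victoria", "acode": 250, "major": "CPSC"},
    {"sid": 92001200, "name": "G. Smith", "city": "Van", "acode": 604, "major": "MATH"},
    {"sid": 94001020, "name": "A. Smith", "city": "Van", "acode": 604, "major": "CPSC"},
]

city_to_acode = satisfies_fd(student, ["city"], ["acode"])   # holds on this instance
major_to_city = satisfies_fd(student, ["major"], ["city"])   # violated: CPSC maps to two cities
acode_to_city = satisfies_fd(student, ["acode"], ["city"])   # holds here, but only by accident
```

Note that `acode_to_city` comes out true for these three rows even though the insertion-anomaly example (Richmond also has area code 604) shows it is not a constraint of the application. That is exactly why FDs must come from the semantics, not from one instance.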

Reasoning About FDs (Contd)

- Couple of additional rules (that follow from AA):
  - Union: If X --> Y and X --> Z, then X --> Y Z
  - Decomposition: If X --> Y Z, then X --> Y and X --> Z
- Example: Derive the union rule from the axioms
  1. X --> Y        given
  2. X --> Z        given
  3. X --> X Y      1, augmentation
  4. X Y --> Z Y    2, augmentation
  5. X --> Z Y      3, 4, transitivity

Example: Supplier-Part DB

- Suppliers supply parts to projects.
  - supplier attributes: s#, sname, city, status
  - part attributes: p#, pname
  - supplier-part attributes: qty
- Functional dependencies:
  fd1: s# --> sname, city
  fd2: city --> status
  fd3: p# --> pname
  fd4: s#, p# --> qty

Example: Supplier-Part DB (cont)

- Exercise: Show that (s#, p#) is the primary key of
  SupplierPart1(s#, p#, status, city, qty)
  Proof:
  a. Show: s#, p# is a superkey
     1. s#, p# --> s#, p#                      reflex
     2. s# --> city                            fd1, decomp
     3. s# --> status                          2, fd2, trans
     4. s#, p# --> city, p#                    2, aug
     5. s#, p# --> status, p#                  3, aug
     6. s#, p# --> s#, p#, status              1, 5, union
     7. s#, p# --> s#, p#, status, city        4, 6, union
     8. s#, p# --> s#, p#, status, city, qty   7, fd4, union

Example: Supplier-Part DB (cont)

  b. Show: (s#, p#) is a candidate key
     1. p# does not appear on the RHS of any FD
     2. therefore, except for p# itself, nothing else determines p#
     3. in particular, s# --> p# does not hold
     4. therefore, s# is not a superkey
     5. similarly, p# is not a superkey
  c. Show: (s#, p#) is the only candidate key
     1. from the proof in (b), any superkey must consist of at least s# and p#
     2. but any superkey which is a strict superset of {s#, p#} cannot be a candidate key, because by the proof in (a) (s#, p#) is already a superkey, so the superset is not minimal.
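The hand proof that (s#, p#) is the only candidate key of SupplierPart1 can be cross-checked by brute force. The sketch below is an illustration only: the helper names `closure` and `candidate_keys` are assumptions, and the search enumerates every attribute subset, so it is suitable for small classroom schemas rather than real ones.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure covers the relation."""
    relation = set(relation)
    keys = []
    # Smaller subsets are tried first, so any superset of a found key is skipped
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue
            if relation <= closure(x, fds):
                keys.append(x)
    return keys


FDS = [
    ({"s#"}, {"sname", "city"}),   # fd1
    ({"city"}, {"status"}),        # fd2
    ({"p#"}, {"pname"}),           # fd3
    ({"s#", "p#"}, {"qty"}),       # fd4
]
SUPPLIER_PART1 = {"s#", "p#", "status", "city", "qty"}

keys = candidate_keys(SUPPLIER_PART1, FDS)
```

For this schema the search returns a single key, {s#, p#}, which agrees with parts (a) to (c) of the proof.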

Let's Try It With Closure

- An easy way to show that X --> Y holds on a table is to use the closure.
- The closure of a set of attributes X is denoted X+
- X+ includes all attributes which can be shown to depend on X by the FDs of that table and the axioms.
  - So, if Y is in X+, then X --> Y holds on that table
- Algorithm for finding the closure of X:
  Let Closure = X
  Until Closure doesn't change do
    if a1, ..., an --> C is an FD and {a1, ..., an} ⊆ Closure
    then add C to Closure

Example Using Closure

SupplierPart(s#, sname, city, status, p#, pname, qty)
- fd1: s# --> sname, city
- fd2: city --> status
- fd3: p# --> pname
- fd4: s#, p# --> qty

{s#}+ =
{p#}+ =
{s#, p#}+ =
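The closure algorithm translates almost line for line into code. The following is a minimal sketch, assuming FDs are represented as pairs of Python sets (a representation chosen here for illustration); it can be used to check answers to the SupplierPart exercise.

```python
def closure(attrs, fds):
    """Compute X+ for the attribute set attrs under fds = [(lhs_set, rhs_set), ...]."""
    result = set(attrs)              # Let Closure = X
    changed = True
    while changed:                   # Until Closure doesn't change
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already in the closure, add the RHS
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


FDS = [
    ({"s#"}, {"sname", "city"}),   # fd1
    ({"city"}, {"status"}),        # fd2
    ({"p#"}, {"pname"}),           # fd3
    ({"s#", "p#"}, {"qty"}),       # fd4
]
ALL_ATTRS = {"s#", "sname", "city", "status", "p#", "pname", "qty"}

s_plus = closure({"s#"}, FDS)
p_plus = closure({"p#"}, FDS)
sp_plus = closure({"s#", "p#"}, FDS)
is_superkey = sp_plus == ALL_ATTRS
```

Comparing `sp_plus` with `ALL_ATTRS` is the closure-based version of the superkey test: X is a superkey exactly when X+ contains every attribute of the table.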

Normal Forms

- Provide guidance for table refinement.
- Four important normal forms:
  - first normal form (1NF)
  - second normal form (2NF)
  - third normal form (3NF)
  - Boyce-Codd Normal Form (BCNF)
- If a relation is in a certain normal form, it is known that certain kinds of problems are avoided/minimized.
  - This can be used to decide whether decomposition will help.
- Role of FDs in detecting redundancy:
  - Consider a relation R with 3 attributes, A B C.
    o No FDs hold: There is no redundancy here.
    o Given A --> B: Several tuples could have the same A value, and if so, they'll all have the same B value!
- Normalization: the process of removing redundancy from data

1NF & 2NF (First and Second Normal Form)

- A table is in 1NF if every field contains atomic values (no sets of values)
- A table is by definition in 1NF:
  - All fields have single values
  - Atomicity is subjective, within reason (e.g., yyyy-mm-dd, 123 Main Street)
  - But tables cannot have 2 entries for the same cell (e.g., for author, you can't enter Raghu Ramakrishnan & Johannes Gehrke in the same cell)
- 1NF is not good enough
- 2NF is more restrictive, but not very restrictive either
- Historical interest only
- We'll skip it

Boyce-Codd Normal Form (BCNF)

- A table R is in BCNF if the following is true:
  If a1, ..., an --> b is a non-trivial dependency in R, then {a1, ..., an} is a superkey for R
- In BCNF, all determinants must be superkeys
  - In English: There is no attribute of R that is determined by a set of attributes that is not a (super)key of R.
- Why is this a problem?
  - If R is not in BCNF, it has redundancies
- Example:
  SupplierPart1(s#, p#, status, city, qty)
  Is it in BCNF?
  fd1: s# --> sname, city
  fd2: city --> status
  fd3: p# --> pname
  fd4: s#, p# --> qty
- How can we solve this problem:
  - we need to decompose the table to break the dependencies that violate BCNF
  - what is the right way to do that?

Decomposition of a Relation Schema

- Suppose that relation R contains attributes A1 ... An. A decomposition of R consists of replacing R by two or more relations such that:
  - Each new relation schema contains a subset of the attributes of R (and no attributes that do not appear in R), and
  - Every attribute of R appears in at least one of the new relations.
- Intuitively, decomposing R means we will store instances of the relation schemas produced by the decomposition, instead of instances of R.
- e.g., Can decompose
  SupplierPart1(s#, p#, status, city, qty)
  into
  Supplier1(s#, status, city) and SP(s#, p#, qty)
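The BCNF condition "every determinant is a superkey" can be tested with attribute closures. The sketch below is an illustration under stated assumptions: `bcnf_violations` is a hypothetical helper, and it inspects only the FDs as listed (restricted to the attributes of the relation), not every FD implied by them, so an empty result on a decomposed sub-relation is suggestive rather than a complete proof.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Listed FDs that are non-trivial inside relation and whose LHS is not a superkey."""
    relation = set(relation)
    bad = []
    for lhs, rhs in fds:
        rhs_here = rhs & relation            # only attributes that exist in this relation
        if not lhs <= relation or rhs_here <= lhs:
            continue                          # FD does not apply here, or is trivial
        if not relation <= closure(lhs, fds):
            bad.append((lhs, rhs_here - lhs))
    return bad


FDS = [
    ({"s#"}, {"sname", "city"}),   # fd1
    ({"city"}, {"status"}),        # fd2
    ({"p#"}, {"pname"}),           # fd3
    ({"s#", "p#"}, {"qty"}),       # fd4
]

sp1_bad = bcnf_violations({"s#", "p#", "status", "city", "qty"}, FDS)
supplier1_bad = bcnf_violations({"s#", "status", "city"}, FDS)
sp_bad = bcnf_violations({"s#", "p#", "qty"}, FDS)
```

On SupplierPart1 the check reports s# --> city and city --> status as violations. SP(s#, p#, qty) comes out clean, while Supplier1(s#, status, city) still shows city --> status, so under this check that relation would need a further split to reach BCNF.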

Problems with Decompositions

- There are three potential problems to consider:
  1. Some queries become more expensive.
     o e.g., Which parts are supplied from Vancouver?
  2. Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation!
     o Fortunately, not in this example
  3. Checking some dependencies may require joining the instances of the decomposed relations.
     o Fortunately, not in this example
- Trade-off: Must consider these issues vs. redundancy.

Lossless-Join Decompositions

- Definitions: If r, r1, r2 are relations and X, Y are sets of attributes:
  - πX(r) is the part of r that contains the attributes X (vertical slice)
  - r1 ⋈ r2 is the join of the two relations; i.e. each tuple of r1 is joined with every tuple in r2 that has the same values on the common attributes.
- Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
  - if we JOIN the X-part of r with the Y-part of r, the result is exactly r
  - It is always true that r is a subset of the JOIN of its X-part and Y-part
  - In general, the other direction does not hold! If it does, the decomposition is lossless-join.
- The definition extends to decomposition into 3 or more relations in a straightforward way.
- All decompositions used to deal with redundancy must be lossless! (Avoids Problem (2).)

Example of Lossy-Join Decomposition

Original relation:
A  B  C
1  2  3
4  5  6
7  2  8

Decompose into (A, B) and (B, C):
A  B        B  C
1  2        2  3
4  5        5  6
7  2        2  8

Join the two parts again:
A  B  C
1  2  3
4  5  6
7  2  8
1  2  8
7  2  3

Lossless-Join Check

- The decomposition of R into X and Y is lossless-join wrt F if and only if F implies that:
  - X ∩ Y --> X, or
  - X ∩ Y --> Y
  I.e. the common attributes of X and Y contain a key for either X or Y
- Therefore, if U --> b holds over R and b is not in U (i.e. not a trivial FD), then the decomposition of R into R-{b} and U ∪ {b} is lossless-join.
- Pictorially: (figure not recovered; the only surviving label is "Others")
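Both halves of this material, the lossy A/B/C example and the FD-based check, can be reproduced in a few lines. This is a sketch under assumptions: relations are modelled as sets of tuples with an explicit schema list, and the helper names `project`, `natural_join`, and `is_lossless` are made up for illustration.

```python
def project(rows, schema, attrs):
    """Vertical slice: keep only the columns named in attrs (duplicates collapse)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, s1, r2, s2):
    """Join tuples of r1 and r2 that agree on every attribute common to s1 and s2."""
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[s1.index(a)] == t2[s2.index(a)] for a in common):
                out.add(tuple(t1) + tuple(t2[s2.index(a)] for a in extra))
    return out


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(x, y, fds):
    """Binary lossless-join test: (X ∩ Y)+ must contain all of X or all of Y."""
    common_closure = closure(set(x) & set(y), fds)
    return set(x) <= common_closure or set(y) <= common_closure


r = {(1, 2, 3), (4, 5, 6), (7, 2, 8)}
ab = project(r, ["A", "B", "C"], ["A", "B"])
bc = project(r, ["A", "B", "C"], ["B", "C"])
joined = natural_join(ab, ["A", "B"], bc, ["B", "C"])
spurious = joined - r            # tuples that were not in the original relation

no_fds = is_lossless({"A", "B"}, {"B", "C"}, [])
with_b_to_c = is_lossless({"A", "B"}, {"B", "C"}, [({"B"}, {"C"})])
```

The join contains the original three tuples plus two spurious ones, matching the five-row table in the example. The FD test explains why: with no FDs the common attribute B determines neither side, whereas if B --> C were a constraint (which this particular instance violates), the same split would be lossless.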

Decomposition into BCNF

- Let R be a relation with attributes A, and FD be a set of FDs on R s.t. all FDs determine a single attribute
- Pick any f ∈ FD that violates BCNF, of the form X --> b
- Decompose R into two relations: R1(A - b) & R2(X ∪ b)
- Recurse on R1 and R2 using FD

Remember: For all non-trivial functional dependencies X --> b, X must be a superkey for a relation to be in BCNF

Decomposition into BCNF (cont)

- Example:
  Relation: (C, S, J, D, P, Q, V)
  FDs: C --> S J D P Q V, J P --> C, S D --> P, J --> S
- Is this in BCNF?
  C+ =
  (S D)+ =
  (J P)+ =
  J+ =
- Decomposition:
  (C, S, J, D, P, Q, V)
    splits into (S, D, P) and (C, S, J, D, Q, V)
  (C, S, J, D, Q, V)
    splits into (J, S) and (C, J, D, Q, V)
- In general, several dependencies may cause violation of BCNF. The order in which we "deal with" them could lead to very different sets of relations!
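The recursive procedure (pick a violating X --> b, split into R - {b} and X ∪ {b}, recurse) can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the slides: violations are found by enumerating attribute subsets and using closures (which also catches implied FDs on sub-relations), subsets are tried in sorted order, and the names `find_violation` and `bcnf_decompose` are invented. The enumeration is exponential, so this is only meant for small examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, b) where X --> b is non-trivial in rel and X is not a superkey, else None."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            cx = closure(x, fds) & rel       # what X determines inside this relation
            if cx != x and cx != rel:        # determines something, but not everything
                return x, min(cx - x)
    return None


def bcnf_decompose(rel, fds):
    """Split rel until no violation remains; returns a list of frozensets."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, b = violation
    return bcnf_decompose(rel - {b}, fds) + bcnf_decompose(x | {b}, fds)


FDS = [
    (set("C"), set("SJDPQV")),
    (set("JP"), set("C")),
    (set("SD"), set("P")),
    (set("J"), set("S")),
]
result = bcnf_decompose(set("CSJDPQV"), FDS)
```

With this particular search order the first violation found is J --> S, and the result is (C, D, J, P, Q, V) and (J, S), which differs from the three-relation decomposition shown in the example. That is an instance of the closing remark: the order in which violations are handled changes the outcome.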

Another BCNF Example

Relation: R(J, K, L, M, N)
FDs: J K --> L, K L --> M, L M --> N, M N --> J, N J --> K

Is this in BCNF?

Properties of BCNF Decomposition

- BCNF decomposition is always possible
- Such a decomposition eliminates redundancies
BUT
- Such a decomposition may not preserve the dependencies
- A decomposition of relation R with FDs F into relations X and Y is dependency-preserving if, for any FD g, g is implied by F if and only if g is implied by the restrictions of F to X and Y.
- Example:
  Relation: (company, plant, product)
  FDs: plant --> company
       company, product --> plant
  Decomposition: (plant, company) (plant, product)

So What's the Problem?

company  plant
GM       Oshawa-GM
GM       St. Catharines-GM

plant              product
Oshawa-GM          engine
St. Catharines-GM  engine

No problem so far. All local FDs are satisfied.

Let's put all the data back into a single table again:

company  plant              product
GM       Oshawa-GM          engine
GM       St. Catharines-GM  engine

Violates the FD: company, product --> plant

3NF: A Decomposition that Preserves Dependencies

- A relation R is in 3NF if:
  If a1, ..., an --> b is a non-trivial dependency in R, then
  - either {a1, ..., an} is a superkey for R (this condition alone is BCNF)
  - or b is part of a key.
- Note: b has to be part of a key, not a superkey (in any table, all attributes are part of a superkey)
- Example: R(company, plant, product)
  FDs: plant --> company
       company, product --> plant
  Keys: {company, product}, {plant, product}
  Is it in BCNF? 3NF?
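The company/plant/product problem can be replayed in code: each decomposed table satisfies its local FD, yet the rejoined table violates company, product --> plant. This is a minimal sketch; `satisfies_fd` and the list-of-dicts layout are assumptions for illustration, and the rows are the ones from the example.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


plant_company = [
    {"plant": "Oshawa-GM", "company": "GM"},
    {"plant": "St. Catharines-GM", "company": "GM"},
]
plant_product = [
    {"plant": "Oshawa-GM", "product": "engine"},
    {"plant": "St. Catharines-GM", "product": "engine"},
]

# The only FD that can be checked locally is plant --> company
local_ok = satisfies_fd(plant_company, ["plant"], ["company"])

# Natural join on the shared attribute "plant"
joined = [
    {**pc, **pp}
    for pc in plant_company
    for pp in plant_product
    if pc["plant"] == pp["plant"]
]

# company, product --> plant spans both tables, so it needs the join to be checked
global_ok = satisfies_fd(joined, ["company", "product"], ["plant"])
```

Here `local_ok` is true and `global_ok` is false. Because no single table of the decomposition holds company, product, and plant together, the DBMS cannot enforce that FD without a join, which is what "not dependency-preserving" means in practice.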

3NF vs BCNF

- Any relation with 2 attributes is in 3NF and in BCNF
  - there are no other attributes to violate the rules.
- If a relation R is in BCNF it is also in 3NF.
- A 3NF relation R may not be in BCNF if all 3 of the following conditions are true:
  a. R has multiple candidate keys
  b. candidate keys are composite (i.e. not single-attribute)
  c. these keys overlap
- It's always possible to put a set of relations into BCNF, but sometimes it is desirable to leave them in 3NF:
  - BCNF decomposition may not preserve all dependencies
  - 3NF may retain some redundancies but preserves all dependencies
  - To decompose into 3NF we rely on the minimal cover

Minimal Cover for a Set of FDs

- Transforms FDs to be as small as possible.
- Minimal cover G for a set of FDs F:
  - Closure of F = closure of G
  - The right-hand side of each FD in G is a single attribute
  - If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes.
- Intuitively, every FD in G is needed, and is "as small as possible" in order to get the same closure as F.

Finding minimal covers of FDs

1. Put FDs in standard form (have only one attribute on the RHS)
2. Minimize the LHS of each FD
3. Delete redundant FDs
Example:
A --> B, A B C D --> E, E F --> G, E F --> H, A C D F --> E G

Decomposition into 3NF

- Can we use the BCNF algorithm and stop earlier?
  - Yes, but we have to consider minimal dependencies (see below)
  - Does not work with general dependencies
- Decomposition into 3NF:
  - Given the FDs F, compute F', the minimal cover for F
  - For each dependency D in F' that violates 3NF, split the table as we did in the BCNF algorithm
  - After each decomposition, identify the set of dependencies N in F' that are not preserved by the decomposition.
  - For each X --> b in N, create a relation R(X, b) and add it to the decomposition
- Synthesis into 3NF:
  - Get the minimal cover F' of the FDs F
  - For each X --> A in F', add the relation X A to the decomposition.
  - To make it lossless-join, if no relation so far contains a key of R, add a relation whose attributes form a key of R
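The three steps for finding a minimal cover can be sketched directly. This is an illustrative implementation under assumptions: attributes are single letters so that a string such as "ABCD" can stand for an attribute set, FDs in standard form are stored as (frozenset, attribute) pairs, and the function names are invented. A set of FDs can have more than one minimal cover; this sketch returns the one its processing order happens to produce.

```python
def closure(attrs, fds):
    """Attribute closure under standard-form FDs given as (lhs_frozenset, attribute) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: standard form, one attribute on each right-hand side
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: drop any LHS attribute that the remaining LHS attributes make unnecessary
    reduced = []
    for lhs, a in g:
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and a in closure(smaller, g):
                lhs = smaller
        reduced.append((lhs, a))
    g = list(dict.fromkeys(reduced))     # remove duplicates, keep order

    # Step 3: drop any FD that still follows from the others
    for fd in list(g):
        rest = [f for f in g if f != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


# The example: A --> B, ABCD --> E, EF --> G, EF --> H, ACDF --> EG
F = [("A", "B"), ("ABCD", "E"), ("EF", "G"), ("EF", "H"), ("ACDF", "EG")]
cover = minimal_cover(F)
```

For the example, step 2 shrinks A B C D --> E to A C D --> E (B follows from A) and A C D F --> E to the same FD, and step 3 removes A C D F --> G because it follows from A C D --> E and E F --> G. The resulting cover is A --> B, A C D --> E, E F --> G, E F --> H.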

Normalization and Design: Comments

- Most organizations go to 3NF or better
- If a relation has only 2 attributes, it is automatically in BCNF
- Our goal is to use lossless-join for all decompositions and preserve dependencies
- 3NF may not be satisfactory if a relation has multiple & overlapping keys
- BCNF decomposition is always lossless, but may not preserve dependencies
- Good heuristic:
  - Try to ensure that all relations are in at least 3NF
  - Check for dependency preservation

Denormalization

- Process of intentionally violating a normal form to gain performance improvements
- Performance improvements:
  - Fewer joins
  - Reduces the number of foreign keys
    o Since foreign keys are often indexed, the number of indexes may be reduced
- Useful if certain queries often require (joined) results, and the queries are frequent enough
