
Chapter 15

Functional Dependencies and Normalization for Relational Databases

Week #3

DAT604 Database Design and Impl


Limitations of E-R Designs
• The E-R approach is a good way to start dealing with the complexity of modeling a real-world enterprise
• It provides a set of guidelines but does not result in a unique database schema
• It does not provide a way of evaluating alternative schemas
• Relational normalization theory provides a mechanism for analyzing and refining the schema produced by an E-R design



Theory and Methodology
What is relational database design?
• The grouping of attributes to form "good" relations
• The collection of relations to form "good" schemas
Why is one schema better than another?

The goodness of relation schemas can be discussed at two levels:
• The logical "user view" level (logical or conceptual)
• The storage "base relation" level (implementation or storage)
Design theory is concerned mainly with base relations.

Informal Design Guidelines for Relation Schemas
Informal measures of quality for schema design:

Semantics of the attributes
• Impart clear semantics to the attributes in relations
• The attributes belonging to one relation have a certain real-world meaning
• The easier it is to explain the semantics of a relation, the better the relation schema design will be


Semantics of the Relation Attributes
[Figure: example relation schemas annotated "Simple" and "Complex"]


Semantics of the Relation Attributes (Continued . . .)
[Figure: example relations annotated with the labels "Simple Semantics", "Implicit Relationship", "Complex Semantics", "Multivalued Attribute", and "M:N Relationship between EMPLOYEE and PROJECT"]


GUIDELINE #1
• Design a relation schema that is easy to explain
• Attributes from different entity types and relationship types should not be mixed in the same relation
• Only foreign keys should be used to refer to other entities
• Entity and relationship attributes should be kept apart as much as possible
• Bottom line:
• Design a schema that can be explained easily, relation by relation
• The semantics of attributes should be easy to interpret


Informal Design Guidelines for Relation Schemas
Informal measures of quality for schema design:

• Reducing the redundant information in tuples
• One goal of schema design is to minimize the storage space used by the base relations
• The way attributes are grouped into relation schemas has a significant effect on storage space


Semantics of the Relation Attributes (Continued . . .)
[Figure: example relations annotated "Additional Information", "POOR DESIGN", and "CLEAR SEMANTICS"]


Redundant Information in Tuples
Minimize the storage space for the base relation


PROBLEMS
• Mixing attributes of multiple entities may cause problems
• Using a relation with redundant information leads to the serious problem of update anomalies

Types of update anomalies:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies


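Update Anomalies
INSERTION ANOMALIES
Example: inserting a new employee, or inserting a new department. In a combined relation such as EMP_DEPT, a new employee tuple must repeat the department's values consistently, and a department with no employees cannot be stored without NULLs in the employee attributes.


Update Anomalies
DELETION ANOMALIES
Example: deleting the last employee working in a department. The information about that department is lost as well.


Update Anomalies
MODIFICATION ANOMALIES
Example: changing the manager of department 5. Every tuple for an employee in department 5 must be updated, or the relation becomes inconsistent.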


GUIDELINE #2
• Design the base relation schemas so that no insertion, deletion, or modification anomalies are present
• If any anomalies are present, note them clearly and make sure that the updates are handled correctly
• A design may still violate some guidelines in order to improve the performance of certain queries


Decomposition
• Redundancy is at the root of several problems associated with relational schemas:
• Redundant storage and update anomalies
Solution:
• Use two relations to store the information instead of a single EMP_DEPT relation
• EMPLOYEE relation for employee information
• DEPARTMENT relation for department information
• No update anomalies:
• Each department's attributes are stored only once
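
The decomposition can be sketched on a tiny example. This is only an illustration: the rows are invented, and the attribute layout (ENAME, SSN, DNUMBER, DNAME, DMGRSSN) is an assumed, simplified version of EMP_DEPT.

```python
# Hypothetical EMP_DEPT rows: (ENAME, SSN, DNUMBER, DNAME, DMGRSSN).
# Department data is repeated once per employee in the department.
emp_dept = [
    ("Smith",  "111", 5, "Research",       "333"),
    ("Wong",   "333", 5, "Research",       "333"),
    ("Zelaya", "999", 4, "Administration", "987"),
]

# EMPLOYEE keeps the employee attributes plus DNUMBER as a foreign key.
employee = {(ename, ssn, dnum) for ename, ssn, dnum, _, _ in emp_dept}

# DEPARTMENT stores each department's name and manager exactly once.
department = {(dnum, dname, mgr) for _, _, dnum, dname, mgr in emp_dept}

# Joining on DNUMBER (foreign key = primary key) rebuilds EMP_DEPT.
rejoined = {
    (ename, ssn, dnum, dname, mgr)
    for ename, ssn, dnum in employee
    for d, dname, mgr in department
    if d == dnum
}

print(len(emp_dept), len(department))
print(rejoined == set(emp_dept))
```

With this layout, changing the manager of department 5 touches one DEPARTMENT tuple instead of one tuple per employee.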



GUIDELINE #3
• Avoid placing attributes in a base relation whose values may frequently be NULL
• Relations should be designed so that their tuples have as few NULL values as possible
• If NULLs are unavoidable, make sure that they do not apply to a majority of tuples in the relation
• Attributes that are frequently NULL could be placed in separate relations (with the primary key)




Spurious Tuples
• There are two important properties of decompositions:
a) The non-additive (lossless) property of the corresponding join
b) Preservation of the functional dependencies

• Note that:
• Property (a) is extremely important and cannot be sacrificed
• Property (b) is less stringent and may be sacrificed
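
A small sketch of what goes wrong when property (a) is violated. The relation and its rows are hypothetical: a WORKS_ON-style relation (ENAME, PNUMBER, PLOCATION) is split so that the two pieces share only PLOCATION, which is neither a primary key nor a foreign key.

```python
# Hypothetical relation: (ENAME, PNUMBER, PLOCATION).
works_on = {
    ("Smith",  1, "Houston"),
    ("Wong",   2, "Houston"),
    ("Zelaya", 3, "Stafford"),
}

# Decompose into two projections that share only PLOCATION.
emp_locs = {(ename, loc) for ename, _, loc in works_on}
proj_locs = {(pnum, loc) for _, pnum, loc in works_on}

# Natural join of the two projections on PLOCATION.
joined = {
    (ename, pnum, loc)
    for ename, loc in emp_locs
    for pnum, loc2 in proj_locs
    if loc == loc2
}

# Tuples that the join produces but that were never in the original relation.
spurious = joined - works_on
print(sorted(spurious))
```

The join returns every original tuple plus extra ones (here, Smith on project 2 and Wong on project 1), so the original information cannot be recovered from the two pieces.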



GUIDELINE #4
• Design relation schemas so that they can be JOINed with equality conditions on attributes that are either primary keys or foreign keys
• Avoid relations that contain matching attributes other than foreign key and primary key combinations
• If such relations are unavoidable, do not JOIN on these attributes
• The relations should be designed to satisfy the lossless join condition
• No spurious tuples should be generated by a natural join of any relations


NORMALIZATION THEORY
• The result of E-R analysis needs further refinement
• Appropriate decomposition can solve problems
• The underlying theory is referred to as normalization theory and is based on functional dependencies (and other kinds, such as multivalued dependencies)
• Normalization theory is also called the theory of decomposition
• The single most important concept in relational schema design theory is that of a functional dependency


FUNCTIONAL DEPENDENCIES
• Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs

• An FD is a constraint between two sets of attributes from the database

• FDs and keys are used to define normal forms for relations


FUNCTIONAL DEPENDENCIES
• An FD X → Y is a constraint between two sets of attributes X and Y that are subsets of the attributes of relation R

• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y

Constraint: for any two tuples t1 and t2 in r(R),
if t1[X] = t2[X], then t1[Y] = t2[Y] must also hold
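
The constraint can be checked mechanically against one relation state. The sketch below is illustrative only: the helper name `fd_holds` and the sample rows are assumptions, and each tuple is modeled as a Python dict.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by this relation state.

    rows: list of dicts (one dict per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # Two tuples agree on X but disagree on Y: the FD is violated.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical relation state.
rows = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 7.5,  "ENAME": "Smith"},
    {"SSN": "333", "PNUMBER": 2, "HOURS": 10.0, "ENAME": "Wong"},
]

print(fd_holds(rows, ["SSN"], ["ENAME"]))
print(fd_holds(rows, ["SSN"], ["HOURS"]))
print(fd_holds(rows, ["SSN", "PNUMBER"], ["HOURS"]))
```

A check like this can only refute an FD. A True result says that this particular state is consistent with the FD, not that the FD holds for the schema.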



FUNCTIONAL DEPENDENCIES
• Address → ZipCode
• Stony Brook’s ZIP code is 11733
• ArtistName → BirthYear
• Picasso was born in 1881
• Autobrand → Manufacturer, Engine type
• Pontiac is built by General Motors with a gasoline engine
• Author, Title → PublDate
• Shakespeare’s Hamlet was published around 1600


FUNCTIONAL DEPENDENCIES
• If a constraint on R states that there cannot be more than one tuple with a given X-value in any relation instance r(R):
• X is a candidate key of R
• This implies X → Y for any subset of attributes Y of R

• If X → Y holds in R, this does not say whether or not Y → X holds in R
• Relation instances r(R) that satisfy the FD constraints are called legal extensions (or legal relation states) of R


FUNCTIONAL DEPENDENCIES
(Continued …)

The following dependencies should hold:

• SSN → ENAME
• PNUMBER → {PNAME, PLOCATION}
• {SSN, PNUMBER} → HOURS


FUNCTIONAL DEPENDENCIES

• Consider a brokerage firm that allows multiple clients to share an account, but each account is managed from a single office and a client can have no more than one account in an office

• HasAccount (AcctNum, ClientId, OfficeId)
• Keys are (ClientId, OfficeId) and (AcctNum, ClientId)
• ClientId, OfficeId → AcctNum
• AcctNum → OfficeId
• Thus, attribute values need not depend only on key values
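
The brokerage example can be tried on a small, invented HasAccount state. The rows and the helper names `project`, `is_unique`, and `determines` are assumptions made for this sketch.

```python
COLS = ("AcctNum", "ClientId", "OfficeId")

# Hypothetical state: account A1 is shared by clients C1 and C2.
has_account = [
    ("A1", "C1", "O1"),
    ("A1", "C2", "O1"),
    ("A2", "C1", "O2"),
    ("A3", "C3", "O1"),
]


def project(rows, attrs):
    """Values of the given attributes, one tuple per row (duplicates kept)."""
    idx = [COLS.index(a) for a in attrs]
    return [tuple(r[i] for i in idx) for r in rows]


def is_unique(rows, attrs):
    """True if no two rows agree on all of attrs (key-like in this state)."""
    vals = project(rows, attrs)
    return len(vals) == len(set(vals))


def determines(rows, lhs, rhs):
    """True if lhs -> rhs is satisfied by this state."""
    mapping = {}
    for x, y in zip(project(rows, lhs), project(rows, rhs)):
        if mapping.setdefault(x, y) != y:
            return False
    return True


print(is_unique(has_account, ["ClientId", "OfficeId"]))
print(is_unique(has_account, ["AcctNum", "ClientId"]))
print(is_unique(has_account, ["AcctNum"]))
print(determines(has_account, ["AcctNum"], ["OfficeId"]))
```

In this state AcctNum is not unique on its own, yet it still determines OfficeId, which is the point of the example: a dependency can have a left-hand side that is not a key.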



FUNCTIONAL DEPENDENCIES
• An FD is a property of the relation schema R, not of a particular legal relation state
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• FDs are derived from the real-world constraints on the attributes
• They must be defined by someone who knows the semantics of the attributes


FUNCTIONAL DEPENDENCIES
TEACHER → COURSE: true or false?
TEXT → COURSE: true or false?




Normalization
• The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
• Provides a framework for analyzing relation schemas based on keys and FDs
Assumptions:
• A set of functional dependencies is given for each relation
• Each relation has a designated primary key
Desirable properties:
• Minimizing redundancy
• Minimizing update anomalies

Relation schemas that fail the normal form tests are decomposed into smaller relation schemas

Normalization
ADDITIONAL PROPERTIES
• The lossless join or non-additive join property
• No spurious tuples are generated after decomposition

• The dependency preservation property
• Each functional dependency is represented in some relation after decomposition




FIRST NORMAL FORM
• Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies)
First normal form states that:
• The domain of an attribute must include only atomic values
• The value of any attribute in a tuple must be a single value from the domain of that attribute
(relations = sets of tuples; each tuple = sequence of atomic values)




Different Views
• View 1: the domain of DLOCATION contains atomic values, but some tuples can have a set of these values. In this case DLOCATION is not functionally dependent on DNUMBER

• View 2: the domain of DLOCATION contains sets of values and hence is non-atomic. In this case DNUMBER → DLOCATION
• Each set is considered a single member of the attribute domain


FIRST NORMAL FORM (Continued …)
• Expand the key (drawback: redundancy)


FIRST NORMAL FORM (Continued …)
• Increase the number of attributes (drawbacks: NULL values, spurious tuples)


FIRST NORMAL FORM (Continued …)
• Separate relation



FIRST NORMAL FORM (Continued …)
Which solution?
• Separate relation
• Expand the key (drawback: redundancy)
• Increase the number of attributes (drawback: NULL values)
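
The "separate relation" option can be sketched as follows. The data is invented, and the names DNAME and DEPT_LOCATIONS are assumptions used only for illustration.

```python
# Hypothetical non-1NF DEPARTMENT: DLOCATION holds a set of values.
department = [
    {"DNUMBER": 5, "DNAME": "Research",
     "DLOCATION": {"Bellaire", "Sugarland", "Houston"}},
    {"DNUMBER": 4, "DNAME": "Administration",
     "DLOCATION": {"Stafford"}},
]

# DEPARTMENT without the multivalued attribute: every value is atomic.
department_1nf = [
    {"DNUMBER": d["DNUMBER"], "DNAME": d["DNAME"]} for d in department
]

# Separate relation DEPT_LOCATIONS(DNUMBER, DLOCATION): one tuple per
# (department, location) pair, keyed by the combination of both attributes.
dept_locations = sorted(
    (d["DNUMBER"], loc) for d in department for loc in d["DLOCATION"]
)

for row in dept_locations:
    print(row)
```

Unlike expanding the key of DEPARTMENT, this does not repeat DNAME for every location, and unlike adding one column per location, it does not need NULLs for departments with few locations.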



SECOND NORMAL FORM
X → Y is a full functional dependency if:

• Removal of any attribute A from X means that the dependency no longer holds

Example: {SSN, PNUMBER} → HOURS


SECOND NORMAL FORM (Continued…)
X → Y is a partial functional dependency if:

• Some attribute A ∈ X can be removed from X and the dependency still holds

Example: {SSN, PNUMBER} → {ENAME}, because SSN → ENAME also holds


SECOND NORMAL FORM (Continued…)

DEFINITION:

• A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R

• An attribute of a relation schema R is called a prime attribute of R if it is a member of some candidate key of R


SECOND NORMAL FORM (Continued…)
Test for 2NF
• Test the functional dependencies whose left-hand-side attributes are part of the primary key
• If the primary key is a single attribute, no test is needed
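
A minimal sketch of this test, assuming the primary key is the only candidate key (so every attribute outside it is nonprime) and that only the FDs as listed are examined, not ones implied by them. The function name `partial_dependencies` is hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """Return the listed FDs that violate 2NF.

    An FD violates 2NF here if its left-hand side is a proper subset of the
    primary key and it determines at least one attribute outside the key.
    """
    key = frozenset(primary_key)
    violations = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        dependents = rhs - key          # attributes outside the key
        if lhs < key and dependents:    # "<" means proper subset
            violations.append((set(lhs), set(dependents)))
    return violations


# FDs over SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION.
fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

violations = partial_dependencies({"SSN", "PNUMBER"}, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

With a single-attribute primary key the key has no non-empty proper subset, so an FD with a non-empty left-hand side cannot be reported, which matches the rule that such relations need no test.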



THIRD NORMAL FORM
Transitive Dependency

A functional dependency X → Y is a transitive dependency when:

• There is a set of attributes A that is neither a candidate key nor a subset of any key of R, and

• Both X → A and A → Y hold


THIRD NORMAL FORM (Continued …)

SSN → DMGRSSN is a transitive dependency through DNUMBER:

SSN → DNUMBER and DNUMBER → DMGRSSN



THIRD NORMAL FORM (Continued …)

DEFINITION:
A relation schema R is in 3NF if:

• It satisfies 2NF

• No nonprime attribute of R is transitively dependent on the primary key


THIRD NORMAL FORM (Continued …)

[Figure annotation: no partial functional dependency]


SUMMARY
1NF

TEST: The relation should have no non-atomic attributes or nested relations

REMEDY: Form new relations for each non-atomic attribute or nested relation


SUMMARY (Continued …)
2NF
TEST: For relations where the primary key contains multiple attributes, no non-key attribute should be functionally dependent on a part of the primary key
A relation schema R is in 2NF if every nonprime attribute A in R is not partially dependent on any key
REMEDY: Decompose and set up a new relation for each partial key with its dependent attributes. Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it

SUMMARY (Continued …)
3NF
TEST: A relation should not have a non-key attribute functionally determined by another non-key attribute

REMEDY: Decompose and set up a relation that includes the non-key attribute(s) that functionally determine other non-key attribute(s)
