
Normalization

Module 4

Normalization - Definition

Normalization is the process of removing repeating groups, minimizing redundancy, eliminating partial dependencies on composite keys, and separating out non-key attributes.
In simple terms: "Each attribute (column) must be a fact about the key, the whole key, and nothing but the key." Said another way, each table should describe only one type of entity (information).

Normalization - Overview

Normalization is the process of efficiently organizing data in a database.
Two objectives:
- Eliminate redundant data
- Ensure data dependencies make sense

Both objectives reduce the amount of space a database consumes and ensure that data is logically stored.
It is part of successful database design.
Without normalization, database systems can be inaccurate, slow, and inefficient, and they might not produce the data you expect.

What should we achieve when normalizing a database?
Arranging data into logical groups
Minimizing the amount of duplicated data
Access and manipulate the data quickly and efficiently
Make the changes in only one place
PS: Sometimes database designers refer to these goals in terms
such as data integrity, referential integrity, or keyed data access.

Normalization - Overview (Cont.)

Normalisation is a theory for designing relational schemas that make sense and work well.
Well-normalised tables avoid redundancy and thereby reduce
inconsistencies.
Redundancy is unnecessary duplication.
In well-normalised DBs semantic dependencies are maintained by
primary key uniqueness.

Goals of Normalisation

- Eliminate certain kinds of redundancy
- Avoid certain update anomalies
- Give a good representation of the real world
- Simplify enforcement of DB integrity

Update anomalies

Undesirable side-effects that occur when performing insertion, modification or deletion operations on badly designed relational DBs.

SSN   Name       Dept   DeptMgr   Dept Name
987   J Smith    1      321       ...
654   M Burke    2      467       ...
333   A Dolan    1      321       ...
321   K Doyle    1      321       ...
678   O ONeill   3      678       ...
467   R McKay    2      467       ...

Representing Department info in the Employee table causes problems.

Sample anomalies

Modification: when the manager of a dept changes, we have to change many values.
If we are not careful, the DB will contain inconsistencies.
There is no easy way to get the DB to ensure that a department has only one manager and only one name.

Anomalies continued

Deletion: if O ONeill leaves, we delete his tuple and lose
- the fact that there is a department 3
- the name of dept 3
- who is the manager of dept 3

Insertion: how would we create a new department before any employees are assigned to it?
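The modification and deletion anomalies can be reproduced on the sample table with a few lines of Python. This is only a sketch: the dictionary field names are assumptions made for illustration, and the Dept Name column is left out because the slide elides its values.

```python
# Single-table design from the slide: department facts repeat on every employee row.
# Field names are illustrative; Dept Name is omitted because the slide elides it.
employees = [
    {"ssn": 987, "name": "J Smith",  "dept": 1, "dept_mgr": 321},
    {"ssn": 654, "name": "M Burke",  "dept": 2, "dept_mgr": 467},
    {"ssn": 333, "name": "A Dolan",  "dept": 1, "dept_mgr": 321},
    {"ssn": 321, "name": "K Doyle",  "dept": 1, "dept_mgr": 321},
    {"ssn": 678, "name": "O ONeill", "dept": 3, "dept_mgr": 678},
    {"ssn": 467, "name": "R McKay",  "dept": 2, "dept_mgr": 467},
]

# Modification anomaly: changing the manager of dept 1 touches several rows.
rows_to_change = [e for e in employees if e["dept"] == 1]

# Deletion anomaly: removing O ONeill also removes every trace of dept 3.
after_delete = [e for e in employees if e["ssn"] != 678]
depts_left = {e["dept"] for e in after_delete}

print(len(rows_to_change))  # 3
print(depts_left)           # {1, 2}
```

The insertion anomaly shows up in the same structure: there is no row in which a department without employees could be recorded.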

Better design

Separate entities are represented in separate tables.

Employee
SSN   Name       Dept
987   J Smith    1
654   M Burke    2
333   A Dolan    1
321   K Doyle    1
678   O ONeill   3
467   R McKay    2

Department
Dept   DeptMgr   Dept Name
1      321       ...
2      467       ...
3      678       ...
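A possible way to express the two-table design is sketched below with Python's built-in sqlite3 module. The table and column names, the hypothetical department 4, and the new manager value are assumptions for illustration; the Dept Name column is omitted because its values are elided on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept     INTEGER PRIMARY KEY,
    dept_mgr INTEGER
);
CREATE TABLE employee (
    ssn  INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    dept INTEGER REFERENCES department(dept)
);
""")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, 321), (2, 467), (3, 678)])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (987, "J Smith", 1), (654, "M Burke", 2), (333, "A Dolan", 1),
    (321, "K Doyle", 1), (678, "O ONeill", 3), (467, "R McKay", 2),
])

# Insertion: a (hypothetical) department 4 can exist before it has employees.
conn.execute("INSERT INTO department VALUES (4, NULL)")
# Deletion: removing O ONeill no longer removes department 3 itself.
conn.execute("DELETE FROM employee WHERE ssn = 678")
# Modification: a change of manager is a change to exactly one row.
changed = conn.execute(
    "UPDATE department SET dept_mgr = 987 WHERE dept = 1").rowcount

depts = [r[0] for r in conn.execute("SELECT dept FROM department ORDER BY dept")]
print(changed, depts)  # 1 [1, 2, 3, 4]
```

Each of the three anomalies from the single-table design turns into an ordinary one-row operation here.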

Poor database design usually includes

Repetition of information

Inability to represent certain information

Loss of information

Difficulty maintaining information

Normalization stages

1NF - First Normal Form
2NF - Second Normal Form
3NF - Third Normal Form
BCNF - Boyce-Codd Normal Form
4NF - Fourth Normal Form
5NF - Fifth Normal Form

Unnormalized Form

Un-normalised data = repeating groups, inconsistent data, delete and insert anomalies

First Normal Form

By analyzing the above data, the observations are:
- PRO_NUM is intended to be the primary key
- Table entries invite data inconsistencies
- Table displays data anomalies
  - Update: modifying JOB_CLASS
  - Insertion: a new employee must be assigned to a project
  - Deletion: if an employee is deleted, other vital data is lost

First Normal Form (1NF)

First Normal Form (1NF) = ELIMINATE REPEATING GROUPS (make a separate table for each set of related attributes, and give each table a primary key).
Each table contains all atomic data items, no repeating groups, and a designated primary key (no duplicated rows).
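As a rough illustration of eliminating a repeating group, the sketch below flattens nested records into atomic rows. PRO_NUM and JOB_CLASS are the column names used in the slides; the employee number field and all sample values are hypothetical.

```python
# Hypothetical unnormalized records: each project carries a repeating group
# of (employee number, job class) pairs.
unnormalized = [
    {"pro_num": 15, "employees": [(103, "Elect. Engineer"),
                                  (101, "Database Designer")]},
    {"pro_num": 18, "employees": [(114, "Applications Designer")]},
]

def to_1nf(records):
    """Flatten repeating groups into atomic rows keyed by (pro_num, emp_num)."""
    rows = []
    for rec in records:
        for emp_num, job_class in rec["employees"]:
            rows.append((rec["pro_num"], emp_num, job_class))
    return rows

rows_1nf = to_1nf(unnormalized)

# Every row holds atomic values, and the composite key identifies one row.
keys = [(pro_num, emp_num) for pro_num, emp_num, _ in rows_1nf]
key_is_unique = len(keys) == len(set(keys))
print(len(rows_1nf), key_is_unique)  # 3 True
```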

First Normal Form (Cont.)

To convert the above data to 1NF, the following rules must be followed:
- Repeating groups must be eliminated
- A proper primary key must be developed that uniquely identifies attribute values (rows)
- Dependencies can be identified
  - Total dependency: desirable dependencies based on the primary key
  - Less desirable dependencies
    - Partial: based on part of a composite primary key
    - Transitive: one nonprime attribute depends on another nonprime attribute

After 1NF: Data

Functional Dependency Diagram

Partial - based on part of a composite primary key
Transitive - one nonprime attribute depends on another nonprime attribute

Second Normal Form

Second Normal Form (2NF) = ELIMINATE REDUNDANT DATA (if an attribute depends on only part of a multi-attribute key, move it to a separate table).
A table is in 2NF if it meets all requirements for 1NF and each non-key attribute is fully functionally dependent on the whole primary key.
Data that does not depend directly on the table's primary key must be moved into another table.
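A small sketch of removing partial dependencies, using the (SSN, Project#) composite key that appears later in these slides. The project names and hours are hypothetical sample values, and the dependencies SSN -> Name and Project# -> PName are assumed to hold.

```python
# 1NF rows keyed by (ssn, project): name depends only on ssn and
# pname only on project, so both are partial dependencies.
rows = [
    (987, "P1", "J Smith", "Payroll", 10),
    (987, "P2", "J Smith", "Audit", 5),
    (654, "P1", "M Burke", "Payroll", 8),
]

# One table per determinant; attributes follow the key they depend on.
employee = {ssn: name for ssn, _, name, _, _ in rows}        # SSN -> Name
project = {proj: pname for _, proj, _, pname, _ in rows}     # Project# -> PName
works_on = {(ssn, proj): hours                               # (SSN, Project#) -> hours
            for ssn, proj, _, _, hours in rows}

print(employee)  # {987: 'J Smith', 654: 'M Burke'}
print(project)   # {'P1': 'Payroll', 'P2': 'Audit'}
```

After the split, each name and each project name is stored once instead of once per assignment.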

Second Normal Form (Cont.)

2NF meets the following criteria:
- Each table contains all atomic data items, no repeating groups, and a designated primary key (no duplicated rows).
- Each table has all non-primary key attributes fully functionally dependent on the whole primary key.

Steps of Second Normal Form

Start with 1NF format:
- Write each key component on a separate line
- Write the original key on the last line
- Each component is a new table
- Write dependent attributes after each key


Second Normal Form 2NF

Data After Second Normal Form

Second Normal Form Summary

- Should be in 1NF (refer to First Normal Form)
- Must not have any partial dependencies

Third Normal Form

Create separate table(s) to eliminate transitive functional dependencies.
Third Normal Form (3NF) = ELIMINATE COLUMNS NOT DEPENDENT ON KEY (if attributes do not contribute to a description of the key, move them to a separate table).
A table is in 3NF if it meets all requirements for both 1NF and 2NF and all transitive dependencies are eliminated.
Each column must depend directly on the primary key.
All attributes that are not dependent upon the primary key must be removed from the table.
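The transitive dependency SSN -> Dept -> DeptMgr from the earlier employee table can be projected out as sketched below. The tuple layout is an assumption for illustration; the check at the end shows that joining the two projections recovers the original rows, so the decomposition loses nothing.

```python
# (ssn, dept, dept_mgr): ssn -> dept and dept -> dept_mgr, so dept_mgr
# depends on the key only transitively.
employees = [
    (987, 1, 321), (654, 2, 467), (333, 1, 321),
    (321, 1, 321), (678, 3, 678), (467, 2, 467),
]

# Project the transitive dependency out into its own table.
emp_dept = {(ssn, dept) for ssn, dept, _ in employees}
dept_mgr = {(dept, mgr) for _, dept, mgr in employees}

# Joining the projections on dept gives back exactly the original rows.
rejoined = {(ssn, d1, mgr)
            for ssn, d1 in emp_dept
            for d2, mgr in dept_mgr if d1 == d2}

print(rejoined == set(employees))  # True
print(sorted(dept_mgr))            # [(1, 321), (2, 467), (3, 678)]
```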

Third Normal Form (Cont.)

3NF meets the following criteria:
- Each table contains all atomic data items, no repeating groups, and a designated primary key
- Each table has all non-primary key attributes fully functionally dependent on the whole primary key
- All transitive dependencies are removed from each table

After 3NF: Data
In 2NF and contains no transitive dependencies

After Third Normal Form

Boyce-Codd Normal Form

After a lot of other approaches Boyce and Codd noticed a simple rule
for ensuring tables are well-normalised. Tables which obey the rule
are in BCNF (Boyce Codd Normal Form).

BCNF rule:
Every determinant in a table must be a candidate key for that
table.

Determinants

A is a determinant of B if each value of A has precisely one (possibly null) associated value of B.
Said another way, A is a determinant of B if and only if whenever two tuples agree on their A value they agree on their B value.

Determinants (Cont.)

Note that determinacy depends on the semantics of the data; it cannot be decided from individual table occurrences.

Alternative terminology:
if A (functionally) determines B then B is (functionally) dependent on A
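The "two tuples agree on A, so they agree on B" test can be written directly. The function name and row layout below are illustrative assumptions. As the note above says, a check like this can only refute a dependency for one table occurrence; passing it does not prove the dependency holds in general.

```python
def fd_holds_in(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        a = tuple(row[c] for c in lhs)
        b = tuple(row[c] for c in rhs)
        # Remember the first B value seen for each A value; any later
        # row with the same A value must carry the same B value.
        if seen.setdefault(a, b) != b:
            return False
    return True

rows = [
    {"ssn": 987, "name": "J Smith",  "dept": 1, "dept_mgr": 321},
    {"ssn": 654, "name": "M Burke",  "dept": 2, "dept_mgr": 467},
    {"ssn": 333, "name": "A Dolan",  "dept": 1, "dept_mgr": 321},
    {"ssn": 678, "name": "O ONeill", "dept": 3, "dept_mgr": 678},
]

print(fd_holds_in(rows, ["dept"], ["dept_mgr"]))  # True
print(fd_holds_in(rows, ["dept"], ["ssn"]))       # False
# Holds in this occurrence only because no two employees share a name;
# the semantics of the data say name does not determine dept.
print(fd_holds_in(rows, ["name"], ["dept"]))      # True
```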

Example determinants

- SSN determines employee name
- SSN determines employee department
- Dept. No. determines Dept. Name
- Dept. Name determines Dept. No. (assuming Dept. names are also unique)
- Emp. Name does not determine Emp. Dept (two John Smiths could be in different Depts.)
- Emp. Name does not determine SSN

Determinacy Diagram

[Diagram: SSN -> Name; SSN -> Department; Department -> Dept. Name; Department -> Dept. Mgr]

In general, key attributes of an entity determine all the single-valued attributes of the entity.

Composite Determinants

(SSN, Project#) together determine the hours that the employee works on the project.

[Diagram: SSN -> Name; Project# -> PName; (SSN, Project#) -> hours]

Transitive Dependencies

SSN actually determines DeptMgr, but only because SSN determines DeptNo and DeptNo determines DeptMgr.
Be careful to remove transitive dependencies. They mess up normalisation.

[Diagram: SSN -> DeptNo -> Dept. Mgr]

Candidate keys

Candidate key = any attribute or minimal set of attributes whose values are unique for every row of a table.
As well as the primary key there may be other candidate keys.
E.g. DNUMBER and DNAME are both candidate keys for the Department table.

Key = row identifier
Candidate key = candidate identifier

Finding candidate keys

Every key is by definition a determinant of all other attributes in a relation.
So in a diagram, any attribute (or composite) from which all other attributes are reachable is a candidate key.

[Diagram: SSN -> Name; Project# -> PName; (SSN, Project#) -> hours]

(SSN, Project#) is a (composite) candidate key for a table containing these five attributes.
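The "reachable from" idea corresponds to computing an attribute closure. The sketch below is a brute-force search, fine for slide-sized examples but exponential in the number of attributes. The function names are illustrative, and the arrows SSN -> Name and Project# -> PName are assumed from the diagram.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes reachable from `attrs` by following the arrows in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure covers every attribute."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys

attributes = {"SSN", "Project#", "Name", "PName", "hours"}
fds = [
    (("SSN",), ("Name",)),
    (("Project#",), ("PName",)),
    (("SSN", "Project#"), ("hours",)),
]
print(candidate_keys(attributes, fds))  # [('Project#', 'SSN')]
```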

What are the candidate keys?

[Exercise diagrams; arrows not recoverable. Attributes shown: E, P, M, N, S, W, T, V, K; teacher, subject, student; D, F, H, X, Y, A, B, C]

Problems occur when ...

Redundancy and anomalies occur when there are determinants which are not candidate keys.

[Diagram: SSN -> Name; SSN -> DeptNo; DeptNo -> Dept. Name; DeptNo -> Dept. Mgr; Dept. Name -> DeptNo]

SSN is the only key for a table containing these attributes: all attributes are reachable from SSN.
SSN, DeptNo and DeptName are determinants: they have arrows coming out of them.

BCNF rule

In well-normalised relations (Boyce-Codd normal form) every determinant is a candidate key.

[Diagram: table 1: SSN -> Name, SSN -> DeptNo; table 2: DeptNo -> Dept. Name, DeptNo -> Dept. Mgr]

The employee/dept table decomposed to BCNF.
Note that both DeptNo and DeptName are candidate keys of the second table.
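The rule can be checked mechanically for a given list of dependencies. The sketch below is one possible formulation, not a full normalisation tool: it only examines the dependencies listed explicitly, and it accepts any determinant whose closure covers the table. Function and attribute names are illustrative.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the listed dependencies."""
    result = set(attrs)
    while True:
        added = {a for lhs, rhs in fds if set(lhs) <= result for a in rhs} - result
        if not added:
            return result
        result |= added

def bcnf_violations(table, fds):
    """Determinants inside `table` whose closure does not cover the table."""
    table = set(table)
    bad = set()
    for lhs, rhs in fds:
        # The dependency matters here only if its left side lies in the table
        # and it determines some other attribute of the table.
        applies = set(lhs) <= table and (set(rhs) & (table - set(lhs)))
        if applies and not table <= closure(lhs, fds):
            bad.add(tuple(lhs))
    return sorted(bad)

fds = [
    (("SSN",), ("Name", "DeptNo")),
    (("DeptNo",), ("DeptName", "DeptMgr")),
    (("DeptName",), ("DeptNo",)),
]
combined = {"SSN", "Name", "DeptNo", "DeptName", "DeptMgr"}

print(bcnf_violations(combined, fds))                           # [('DeptName',), ('DeptNo',)]
print(bcnf_violations({"SSN", "Name", "DeptNo"}, fds))          # []
print(bcnf_violations({"DeptNo", "DeptName", "DeptMgr"}, fds))  # []
```

The combined table is flagged because DeptNo and DeptName are determinants without being keys; neither decomposed table is flagged.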
Transformation to BCNF

Create new tables such that each non-key determinant is a candidate key in a new table.
The new table contains the attributes which are directly determined by the new candidate key.

[Diagram: determinacy diagram over attributes V, W, X, Y, Z, A, B, C, decomposed into the tables below]

BCNF tables:
(V, X)
(A, B, C)
(V, W, Z, A)
(V, W, Y)
Summarizing Normal Forms

1NF - no multi-valued attributes
  (all relational DBs are 1NF)
2NF - every non-key attribute is fully dependent on the primary key
3NF - eliminate functional dependencies between non-key attributes
  (all dependencies can then be enforced by uniqueness of keys)

[Figure: a table that is in 2NF but not 3NF]
BCNF vs. 3NF

BCNF goes further than 3NF, some say too far.
A 3NF table that has no overlapping composite keys is in BCNF.

Example:
- A teacher teaches only one subject.
- For a given subject a given student has only one teacher.

Table (student, teacher, subject): 3NF, not BCNF
- keys: (student, subject), (student, teacher)
- teacher is a determinant

Tables (student, teacher) and (teacher, subject): BCNF, but tables are not independent
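What "not independent" means can be seen in a small sketch: each decomposed table satisfies its own key, yet together they can break the rule that a student has only one teacher per subject. The teacher, student and subject names are hypothetical.

```python
# Decomposed BCNF tables (sample values are made up for illustration).
teacher_subject = {"Ms A": "Math", "Mr B": "Math"}    # key: teacher
student_teacher = {("Sam", "Ms A"), ("Sam", "Mr B")}  # key: (student, teacher)

# Rejoin the tables on teacher to recover (student, subject, teacher) rows.
joined = [(s, teacher_subject[t], t) for s, t in student_teacher]

# Rule from the slide: for a given subject a given student has only one teacher.
teachers_per_pair = {}
for student, subject, teacher in joined:
    teachers_per_pair.setdefault((student, subject), set()).add(teacher)
conflicts = {k: v for k, v in teachers_per_pair.items() if len(v) > 1}

print(sorted(conflicts))  # [('Sam', 'Math')]
```

Neither table's key constraint catches the conflict; it only becomes visible after the join, so the rule has to be enforced across both tables.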
Further Normalization

4NF
- The table should be in BCNF
- The table should not contain any nontrivial multi-valued dependency

5NF
- The table should be in 4NF
- The table should not have any nontrivial join dependency that is not implied by its candidate keys


Normalization

- A series of steps followed to obtain a database design that allows for consistent storage and avoids duplication of data
- A process of decomposing relations with anomalies
- The normalization process passes through successive normal forms
- A table is said to be in a certain normal form if it satisfies certain constraints
- Originally Dr. Codd defined 3 normal forms; later on several more were added
- For most practical purposes databases are considered normalized if they adhere to 3rd Normal Form

[Diagram: Relational db model -> 1st Normal Form -> 2nd Normal Form -> 3rd Normal Form -> Boyce/Codd Normal Form -> 4th Normal Form -> 5th Normal Form -> Normalized relational db model]

Denormalization

Queries against a fully normalized database often perform poorly.
Explanation: Current RDBMSs implement the relational model poorly. A true relational DBMS would allow for a fully normalized database at the logical level, while providing physical storage of data that is tuned for high performance.

Two approaches are used:
- Approach 1: Keep the logical design normalized, but allow the DBMS to store additional redundant information on disk to optimize query response (indexes, materialized views, etc.). In this case it is the DBMS software's responsibility to ensure that any redundant copies are kept consistent.
- Approach 2: Use denormalization to improve performance, at the cost of reduced consistency.


Denormalization
Denormalization is the process of attempting to optimize the performance of a database by adding redundant data.
This may (or may not!) achieve an improvement in query response, but at a cost.
A new set of constraints should be added that specify how the redundant copies of information must be kept synchronized.
Denormalization can be hazardous:
- increase in logical complexity of the database design
- complexity of the additional constraints

It is the database designer's responsibility to ensure that the denormalized database does not become inconsistent.
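One way such a synchronization constraint might look is a trigger that propagates changes to the redundant copy. The sketch below uses SQLite through Python's sqlite3 module; the schema, the trigger name and the sample values are assumptions for illustration, and other DBMSs offer different mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept INTEGER PRIMARY KEY, dept_mgr INTEGER);

-- Denormalized: dept_mgr is copied into employee to avoid a join on reads.
CREATE TABLE employee (ssn INTEGER PRIMARY KEY, dept INTEGER, dept_mgr INTEGER);

-- The extra constraint: keep the redundant copy in step with its source.
CREATE TRIGGER sync_dept_mgr AFTER UPDATE OF dept_mgr ON department
BEGIN
    UPDATE employee SET dept_mgr = NEW.dept_mgr WHERE dept = NEW.dept;
END;

INSERT INTO department VALUES (1, 321);
INSERT INTO employee VALUES (987, 1, 321), (333, 1, 321);
""")

# Changing the source row fires the trigger, which updates both copies.
conn.execute("UPDATE department SET dept_mgr = 987 WHERE dept = 1")
copies = [r[0] for r in conn.execute(
    "SELECT dept_mgr FROM employee ORDER BY ssn")]
print(copies)  # [987, 987]
```

The trigger is exactly the kind of added complexity the slide warns about: it must be written, tested and maintained for every redundant column.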

On a lighter note


Questions and Answers


End;

Thank you for your attention!

Codd's Rules

Proposed by Edgar F. "Ted" Codd, a pioneer of the relational model for databases.

Rule 0: The system must qualify as relational, as a database, and as a management system.
For a system to qualify as an RDBMS, that system must use its relational facilities (exclusively) to manage the database.

Codd's Rules (Cont.)

Rule 1: The information rule:
All information in the database is to be represented in one and only one way, namely by values in column positions within rows of tables.

Rule 2: The guaranteed access rule:
All data must be accessible with no ambiguity.
This rule is essentially a restatement of the fundamental requirement for primary keys.
It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.

Codd's Rules (Cont.)

Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty).
It must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values and independent of data type.
It is also implied that such representations must be manipulated by the DBMS in a systematic way.

Rule 4: Active online catalog based on the relational model:
The system must support an online, relational catalog that is accessible to authorized users by means of their regular query language.
Users must be able to access the database's structure (catalog) using the same query language that they use to access the database's data.
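As one concrete instance of Rule 4, SQLite exposes its catalog as an ordinary table named sqlite_master that is read with a normal SELECT. The sketch below uses Python's sqlite3 module; the department table is a made-up example, and other systems name their catalog tables differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept INTEGER PRIMARY KEY, dept_name TEXT)")

# The catalog is itself a table, queried with the same SELECT used for data.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['department']
```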
Codd's Rules (Cont.)

Rule 5: The comprehensive data sublanguage rule:
The system must support at least one relational language that
- Has a linear syntax
- Can be used both interactively and within application programs
- Supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).

Rule 6: The view updating rule:
All views that are theoretically updatable must be updatable by the system.

Rule 7: High-level insert, update, and delete:
The system must support set-at-a-time insert, update, and delete operators.
- This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables.

This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.
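Set-at-a-time operation under Rule 7 can be seen in a short sqlite3 sketch: a single statement acts on every row matching its condition, with no row-by-row loop in the application. The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn INTEGER PRIMARY KEY, dept INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(987, 1), (654, 2), (333, 1), (321, 1)])

# One UPDATE moves the whole set of dept 1 employees to dept 2.
moved = conn.execute("UPDATE employee SET dept = 2 WHERE dept = 1").rowcount
# One DELETE then removes the whole set of dept 2 employees.
removed = conn.execute("DELETE FROM employee WHERE dept = 2").rowcount

print(moved, removed)  # 3 4
```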
Codd's Rules (Cont.)

Rule 8: Physical data independence:
Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure.

Rule 9: Logical data independence:
Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure.
Logical data independence is more difficult to achieve than physical data independence.

Rule 10: Integrity independence:
Integrity constraints must be specified separately from application programs and stored in the catalog.
It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.
Codd's Rules (Cont.)

Rule 11: Distribution independence:
The distribution of portions of the database to various locations should be invisible to users of the database.
Existing applications should continue to operate successfully:
- when a distributed version of the DBMS is first introduced; and
- when existing distributed data are redistributed around the system.

Rule 12: The nonsubversion rule:
If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example by bypassing a relational security or integrity constraint.
