7 Final Normalization

Normalization
Module 4
Normalization - Definition
Normalization is the process of discarding repeating groups,
minimizing redundancy, removing partial dependencies on composite
keys, and separating out attributes that do not depend on the key.
In simple terms: "Each attribute (column) must be a fact about
the key, the whole key, and nothing but the key." Said another
way, each table should describe only one type of entity (information).
Normalization - Overview
Normalization is the process of efficiently organizing data in a database.
Two objectives:
- Eliminate redundant data
- Ensure data dependencies make sense
Both reduce the amount of space a database consumes and ensure that data is
logically stored.
It is part of successful database design.
Without normalization, database systems can be inaccurate, slow, and
inefficient, and they might not produce the data you expect.
What should we achieve when normalizing a database?
Arranging data into logical groups
Minimizing the amount of duplicated data
Accessing and manipulating the data quickly and efficiently
Making changes in only one place
PS: Sometimes database designers refer to these goals in terms
such as data integrity, referential integrity, or keyed data access.
Normalization - Overview
Normalisation is a theory for designing relational schemas that make
sense and work well.
Well-normalised tables avoid redundancy and thereby reduce
inconsistencies.
Redundancy is unnecessary duplication.
In well-normalised DBs semantic dependencies are maintained by
primary key uniqueness.
Goals of Normalisation
- Eliminate certain kinds of redundancy
- Avoid certain update anomalies
- Give a good representation of the real world
- Simplify enforcement of DB integrity
Update anomalies
Undesirable side-effects that occur when performing insertion,
modification or deletion operations on badly designed relational
DBs.
SSN   Name       Dept   DeptMgr   Dept Name
987   J Smith    1      321       ...
654   M Burke    2      467       ...
333   A Dolan    1      321       ...
321   K Doyle    1      321       ...
678   O ONeill   3      678       ...
467   R McKay    2      467       ...
Representing Department info in the Employee table causes problems.
Sample anomalies
Modification: when the manager of a dept changes we have to change many values.
If we are not careful the DB will contain inconsistencies.
There is no easy way to get the DB to ensure that a department has only
one manager and only one name.
Anomalies continued
Deletion: if O ONeill leaves we delete his tuple and lose
- the fact that there is a department 3
- the name of dept 3
- who is the manager of dept. 3
Insertion: how would we create a new department before any employees are
assigned to it?
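The modification and deletion anomalies can be reproduced on a small in-memory copy of the single Employee table. This is only an illustrative sketch: the dictionary field names and the helper `managers_of` are hypothetical, and department names are left out because the slides elide them.

```python
# Hypothetical in-memory copy of the single Employee table from the slides.
employees = [
    {"ssn": 987, "name": "J Smith",  "dept": 1, "dept_mgr": 321},
    {"ssn": 654, "name": "M Burke",  "dept": 2, "dept_mgr": 467},
    {"ssn": 333, "name": "A Dolan",  "dept": 1, "dept_mgr": 321},
    {"ssn": 321, "name": "K Doyle",  "dept": 1, "dept_mgr": 321},
    {"ssn": 678, "name": "O ONeill", "dept": 3, "dept_mgr": 678},
    {"ssn": 467, "name": "R McKay",  "dept": 2, "dept_mgr": 467},
]


def managers_of(rows, dept):
    """Return the set of manager SSNs recorded for one department."""
    return {r["dept_mgr"] for r in rows if r["dept"] == dept}


# Modification anomaly: changing the manager of dept 1 in only one row
# leaves two different managers on record for the same department.
partially_updated = [dict(r) for r in employees]
partially_updated[0]["dept_mgr"] = 333
conflicting_managers = managers_of(partially_updated, 1)

# Deletion anomaly: removing O ONeill removes every trace of dept 3.
after_delete = [r for r in employees if r["name"] != "O ONeill"]
remaining_depts = {r["dept"] for r in after_delete}
```

Nothing in this structure stops the partial update, which is the point of the slide: the design itself permits the inconsistency.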
Better design
Separate entities are represented in separate tables.
SSN   Name       Dept
987   J Smith    1
654   M Burke    2
333   A Dolan    1
321   K Doyle    1
678   O ONeill   3
467   R McKay    2

Dept   DeptMgr   Dept Name
1      321       ...
2      467       ...
3      678       ...
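As a rough sketch of why the two-table design behaves better, the same data can be held in two Python dictionaries, one per entity. The names `employees`, `departments` and `manager_of` are hypothetical, and department 4 is an invented example of a department with no staff yet.

```python
# Hypothetical in-memory version of the two-table design.
employees = {
    987: {"name": "J Smith", "dept": 1},
    654: {"name": "M Burke", "dept": 2},
    333: {"name": "A Dolan", "dept": 1},
    321: {"name": "K Doyle", "dept": 1},
    678: {"name": "O ONeill", "dept": 3},
    467: {"name": "R McKay", "dept": 2},
}
# Each department's manager is stored exactly once, keyed by dept number.
departments = {1: {"mgr": 321}, 2: {"mgr": 467}, 3: {"mgr": 678}}


def manager_of(ssn):
    """Look up an employee's manager through the department table."""
    return departments[employees[ssn]["dept"]]["mgr"]


# Modification: one assignment changes the manager for every dept 1 employee.
departments[1]["mgr"] = 333
# Deletion: removing O ONeill no longer removes department 3.
del employees[678]
# Insertion: a department can exist before anyone is assigned to it.
departments[4] = {"mgr": None}
```

Each of the three anomalies from the previous slides now corresponds to a single, local change.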
Poor database design usually includes
Repetition of information
Inability to represent certain information
Loss of information
Difficulty maintaining information
Normalization stages
1NF - First normal form
2NF - Second normal form
3NF - Third normal form
BCNF - Boyce-Codd normal form
4NF - Fourth normal form
5NF - Fifth normal form
Un-Normalized Form
Un-normalised data = repeating groups, inconsistent data,
delete and insert anomalies
First Normal Form
Analyzing the above data, the observations are:
PRO_NUM is intended to be the primary key
Table entries invite data inconsistencies
Table displays data anomalies:
- Update: modifying JOB_CLASS
- Insertion: a new employee must be assigned to a project
- Deletion: if an employee is deleted, other vital data is lost
First Normal Form (1NF)
First Normal Form (1NF) = ELIMINATE REPEATING GROUPS
(make a separate table for each set of related attributes, and
give each table a primary key).
Each table contains all atomic data items, no repeating groups,
and a designated primary key (no duplicated rows).
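A minimal sketch of eliminating a repeating group, assuming an un-normalized project record that carries a list of staff. All values, and the field names `pro_name`, `staff`, `emp_num` and `job_class`, are invented for illustration; only PRO_NUM and JOB_CLASS are named in the slides.

```python
# Hypothetical un-normalized records: each project carries a repeating
# group of (employee number, job class) pairs.
unnormalized = [
    {"pro_num": 15, "pro_name": "Alpha",
     "staff": [(103, "Elect. Engineer"), (101, "Database Designer")]},
    {"pro_num": 18, "pro_name": "Beta",
     "staff": [(101, "Database Designer")]},
]

# 1NF: one row per (project, employee) pair, every value atomic.
rows_1nf = [
    {"pro_num": p["pro_num"], "pro_name": p["pro_name"],
     "emp_num": emp_num, "job_class": job_class}
    for p in unnormalized
    for emp_num, job_class in p["staff"]
]

# The composite (pro_num, emp_num) serves as the designated primary key.
primary_keys = [(r["pro_num"], r["emp_num"]) for r in rows_1nf]
```

The flattened rows still repeat `pro_name` and `job_class`, which is the redundancy that the later normal forms address.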
First Normal Form
To convert the above data to 1NF, the following rules are to be followed:
Repeating groups must be eliminated
A proper primary key must be developed
- It uniquely identifies attribute values (rows).
Dependencies can be identified
- Total dependency: desirable dependencies based on the primary key
- Less desirable dependencies:
  Partial - based on part of a composite primary key
  Transitive - one nonprime attribute depends on another nonprime attribute
After 1NF Data
Functional Dependency Diagram
Partial - based on part of composite primary key
Transitive - one nonprime attribute depends on another
nonprime attribute
Second Normal Form
Second Normal Form (2NF) = ELIMINATE REDUNDANT DATA (if
an attribute depends on only part of a multi-attribute key, remove it
to a separate table).
A table is in 2NF if it meets all the requirements for 1NF, and if
each non-key attribute is fully functionally dependent on the
whole primary key.
Data that does not depend directly on the table's primary key must
be moved into another table.
Second Normal Form
2NF meets the following criteria:
Each table contains all atomic data items, no repeating groups,
and a designated primary key (no duplicated rows).
Each table has all non-primary key attributes fully functionally
dependent on the whole primary key.
Steps of Second Normal Form
Start with 1NF format:
- Write each key component on a separate line
- Write the original key on the last line
- Each component becomes a new table
- Write the dependent attributes after each key
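The steps above can be sketched as a small function. This is an illustration under stated assumptions, not a general algorithm: `decompose_2nf` is a hypothetical helper, it assumes a two-attribute composite key, and the attribute names and dependencies passed in are invented.

```python
def decompose_2nf(key, depends_on):
    """Split a 1NF table with a composite key into 2NF tables.

    key        -- tuple of the key attributes
    depends_on -- maps each non-key attribute to the key attributes
                  that determine it
    """
    # One line per key component, then the original key on the last line.
    headers = [(part,) for part in key] + [tuple(key)]
    # Write the dependent attributes after each key.
    return {
        header: [attr for attr, det in depends_on.items()
                 if set(det) == set(header)]
        for header in headers
    }


tables = decompose_2nf(
    ("pro_num", "emp_num"),
    {
        "pro_name": ("pro_num",),
        "emp_name": ("emp_num",),
        "job_class": ("emp_num",),
        "hours": ("pro_num", "emp_num"),
    },
)
```

Each dictionary key is the primary key of a new table and each value lists the attributes that move with it.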
Second Normal Form (2NF)
Data After Second Normal Form
Second Normal Form Summary
Should be in 1NF (refer to First Normal Form)
Should not have any partial dependencies
Third Normal Form
Create separate table(s) to eliminate transitive functional
dependencies
Third Normal Form (3NF) = ELIMINATE COLUMNS NOT
DEPENDENT ON KEY (if attributes do not contribute to a
description of the key, remove them to a separate table).
A table is in 3NF if it meets all the requirements for both 1NF
and 2NF and all transitive dependencies are eliminated.
Each column must depend directly on the primary key.
All attributes that are not dependent upon the primary key must be
moved to another table.
Third Normal Form
3NF meets the following criteria:
Each table contains all atomic data items, no repeating groups, and a
designated primary key
Each table has all non-primary key attributes fully functionally
dependent on the whole primary key
All transitive dependencies are removed from each table
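A short sketch of removing a transitive dependency, assuming a 2NF employee table in which `emp_num` determines `job_class` and `job_class` determines `chg_hour`. The attribute names and all values are hypothetical.

```python
# Hypothetical 2NF employee table: emp_num -> job_class -> chg_hour
# is a transitive dependency.
employee_2nf = [
    {"emp_num": 101, "job_class": "Database Designer", "chg_hour": 105.0},
    {"emp_num": 103, "job_class": "Elect. Engineer", "chg_hour": 84.5},
    {"emp_num": 105, "job_class": "Database Designer", "chg_hour": 105.0},
]

# New table keyed by the non-key determinant.
job = {r["job_class"]: r["chg_hour"] for r in employee_2nf}

# The employee table keeps only attributes that depend directly on emp_num.
employee_3nf = [
    {"emp_num": r["emp_num"], "job_class": r["job_class"]}
    for r in employee_2nf
]

# Joining the two tables reproduces the original rows.
rejoined = [dict(r, chg_hour=job[r["job_class"]]) for r in employee_3nf]
```

The charge rate for a job class is now stored once, so changing it is a single update in `job`.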
After 3NF Data
Is in 2NF and contains no transitive dependencies
After Third Normal Form
Boyce-Codd Normal Form
After trying many other approaches, Boyce and Codd noticed a simple rule
for ensuring tables are well-normalised. Tables which obey the rule
are in BCNF (Boyce-Codd Normal Form).
BCNF rule:
Every determinant in a table must be a candidate key for that
table.
Determinants
A is a determinant of B if each value of A has precisely one
(possibly null) associated value of B.
Said another way, A is a determinant of B if and only if whenever two tuples agree on
their A value they agree on their B value.
Determinants
Note that determinacy depends on the semantics of the data; it cannot be
decided from individual table occurrences.
Alternative terminology:
if A (functionally) determines B then
B is (functionally) dependent on A
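The "two tuples agree on A, so they agree on B" test can be sketched as a check over sample rows. The helper `violates_dependency` and the sample data are hypothetical. A sample can only refute a dependency: finding no violation does not prove that A determines B, since that depends on the meaning of the data.

```python
def violates_dependency(rows, a, b):
    """Return True if two rows agree on attributes `a` but differ on `b`."""
    seen = {}
    for row in rows:
        a_val = tuple(row[x] for x in a)
        b_val = tuple(row[x] for x in b)
        # Remember the first B value seen for each A value; a later,
        # different B value for the same A value is a counterexample.
        if seen.setdefault(a_val, b_val) != b_val:
            return True
    return False


# Hypothetical sample: two employees who share a name.
sample = [
    {"ssn": 987, "name": "J Smith", "dept": 1},
    {"ssn": 333, "name": "J Smith", "dept": 2},
]
ssn_refuted = violates_dependency(sample, ["ssn"], ["dept"])
name_refuted = violates_dependency(sample, ["name"], ["dept"])
```

Here the sample refutes "name determines dept" but says nothing conclusive about "ssn determines dept".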
Example determinants
SSN determines employee name
SSN determines employee department
Dept. No. determines Dept. Name
Dept. Name determines Dept. No.
- assuming Dept. names are also unique
Emp. Name does not determine Emp. Dept
- two John Smiths could be in different depts.
Emp. Name does not determine SSN.
Determinacy Diagram
[Diagram: determinacy diagram over SSN, Name, Department, Dept. Name, Dept. Mgr]
In general, the key attributes of an entity determine all the
single-valued attributes of the entity.
Composite Determinants
(SSN, Project#) together determine
the hours that the employee works
on the project.
[Diagram: determinacy diagram over SSN, Name, Project#, PName, hours]
Transitive Dependencies
SSN actually determines DeptMgr,
but only because
SSN determines DeptNo and
DeptNo determines DeptMgr.
Be careful to remove transitive
dependencies.
They mess up normalisation.
[Diagram: SSN -> DeptNo -> Dept. Mgr]
Candidate keys
candidate key = any attribute or minimal set of attributes whose values
are unique for each row of a table.
As well as the primary key there may be other candidate keys.
E.g. DNUMBER and DNAME are both candidate keys for the
Department table.
Key = row identifier
Candidate key = candidate identifier
Finding candidate keys
Every key is by definition a determinant of all other attributes in a
relation.
So in a diagram, any attribute (or composite) from which all other
attributes are reachable is a candidate key.
[Diagram: determinacy diagram over SSN, Name, Project#, PName, hours]
(SSN, Project#) is a (composite) candidate key for a table
containing these five attributes.
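The "reachable from" idea corresponds to computing the closure of a set of attributes under the functional dependencies. The sketch below is a brute-force illustration suitable only for small examples; `closure` and `candidate_keys` are hypothetical helper names, and the dependency Project# -> PName is an assumption read off the diagram labels rather than stated in the text.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute reachable from `attrs` via the dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets from which all attributes are reachable."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys


attributes = {"SSN", "Name", "Project#", "PName", "hours"}
fds = [
    (("SSN",), ("Name",)),
    (("Project#",), ("PName",)),          # assumed from the diagram
    (("SSN", "Project#"), ("hours",)),
]
keys = candidate_keys(attributes, fds)
```

Under these assumed dependencies the only candidate key found is the composite of SSN and Project#, matching the slide.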
What are the candidate keys?
[Diagrams: attribute labels E, P, M, N, S, W, T, V, K, teacher, subject, D, F, student, H, X, Y, A, B, C; arrows not recoverable]
Problems occur when ...
Redundancy and anomalies occur when there are determinants
which are not candidate keys.
[Diagram: determinacy diagram over SSN, Name, DeptNo, Dept. Name, Dept. Mgr]
SSN is the only key for a table containing
these attributes:
all attributes are reachable from SSN.
SSN, DeptNo and DeptName are
determinants:
they have arrows coming out of them.
BCNF rule
In well-normalised relations (Boyce-Codd normal form)
every determinant is a candidate key.
[Diagram: two tables, (SSN, Name, DeptNo) and (DeptNo, Dept. Name, Dept. Mgr)]
The employee/dept table decomposed to BCNF.
Note that both DeptNo and DeptName are candidate keys of
the second table.
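The BCNF rule can be checked mechanically once the dependencies are written down. The following is a minimal sketch: `closure` and `bcnf_violations` are hypothetical helpers, and the dependency lists encode the determinants named on the slides (SSN, DeptNo, DeptName) as an assumption about the intended semantics.

```python
def closure(attrs, fds):
    """Return every attribute reachable from `attrs` via the dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the determinants that do not reach every attribute of the table."""
    return [lhs for lhs, _ in fds if closure(lhs, fds) != set(attributes)]


# Single employee/dept table: DeptNo and DeptName are determinants but not keys.
combined = bcnf_violations(
    {"SSN", "Name", "DeptNo", "DeptName", "DeptMgr"},
    [
        (("SSN",), ("Name", "DeptNo")),
        (("DeptNo",), ("DeptName", "DeptMgr")),
        (("DeptName",), ("DeptNo",)),
    ],
)

# After decomposition every determinant is a key of its own table.
employee = bcnf_violations(
    {"SSN", "Name", "DeptNo"},
    [(("SSN",), ("Name", "DeptNo"))],
)
department = bcnf_violations(
    {"DeptNo", "DeptName", "DeptMgr"},
    [(("DeptNo",), ("DeptName", "DeptMgr")), (("DeptName",), ("DeptNo",))],
)
```

The check treats a determinant as acceptable when its closure covers the whole table, that is, when it is a superkey.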
Transformation to BCNF
Create new tables such that each non-key
determinant is a candidate key in a new table.
The new table contains the attributes which are
directly determined by the new candidate key.
[Diagram: determinacy diagram over V, W, X, Y, Z, A, B, C; arrows not recoverable]
BCNF tables:
(V, X)
(A, B, C)
(V, W, Z, A)
(V, W, Y)
Summarizing Normal Forms
1NF - no multi-valued attributes
- all relational DBs are 1NF
2NF - every non-key attribute is fully dependent on the
primary key
3NF - eliminate functional dependencies between non-key
attributes
- all dependencies can then be enforced by uniqueness of
keys.
[Diagram label: Table is in 2NF but not 3NF]
BCNF vs. 3NF
BCNF goes further than 3NF, some say too far.
A 3NF table that has no overlapping composite keys is in BCNF.
Example:
A teacher teaches only one subject.
For a given subject a given student has only one teacher.
(student, subject, teacher): 3NF, not BCNF
- keys: (student, subject) and (student, teacher)
- teacher is a determinant
(student, teacher) and (teacher, subject): BCNF,
but tables are not independent
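What "not independent" means can be illustrated with a small sketch: each decomposed table can satisfy its own key constraint while their join breaks the rule that a student has only one teacher per subject. The names Ann, T1, T2 and Math are invented for the example.

```python
# teacher -> subject, so teacher is the key of this table.
teacher_subject = {"T1": "Math", "T2": "Math"}
# (student, teacher) is an all-key table; both rows are individually legal.
student_teacher = {("Ann", "T1"), ("Ann", "T2")}

# Join the two tables back into (student, subject, teacher) rows.
joined = {(s, teacher_subject[t], t) for s, t in student_teacher}

# Collect the teachers recorded for each (student, subject) pair.
teachers_per_pair = {}
for student, subject, teacher in joined:
    teachers_per_pair.setdefault((student, subject), set()).add(teacher)

# Pairs with more than one teacher break "(student, subject) determines teacher".
broken = {pair for pair, ts in teachers_per_pair.items() if len(ts) > 1}
```

Enforcing that rule therefore requires a check across both tables, which key uniqueness within each table cannot provide.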
Further Normalization
4NF
The table should be in BCNF
The table should not contain any nontrivial multi-valued dependency whose determinant is not a superkey
5NF
The table should be in 4NF
The table should not have any nontrivial join dependency that is not implied by its candidate keys
Normalization
A series of steps followed to obtain a database
design that allows for consistent storage and
avoids duplication of data
A process of decomposing relations with
anomalies
The normalization process passes through
successive Normal Forms
A table is said to be in a certain normal form if
it satisfies certain constraints
[Diagram: Relational db model -> 1st Normal Form -> 2nd Normal Form -> 3rd Normal Form -> Boyce/Codd Normal Form -> 4th Normal Form -> 5th Normal Form -> Normalized relational db model]
Originally Dr. Codd defined 3 Normal Forms;
later on several more were added
For most practical purposes databases are
considered normalized if they adhere to
3rd Normal Form
Denormalization
Queries against a fully normalized database often perform poorly
Explanation: Current RDBMSs implement the relational model poorly.
A true relational DBMS would allow for a fully normalized database at
the logical level, while providing physical storage of data that is tuned
for high performance.
Two approaches are used
Approach 1: Keep the logical design normalized, but allow the DBMS
to store additional redundant information on disk to optimize
query response (indexes, materialized views, etc.).
In this case it is the DBMS software's responsibility to ensure
that any redundant copies are kept consistent.
Approach 2: Use denormalization to improve performance,
at the cost of reduced consistency
Denormalization
Denormalization is the process of attempting to optimize the performance
of a database by adding redundant data
This may (or may not!) achieve an improvement in query response, but
at a cost
There should be a new set of constraints added that specify how the
redundant copies of information must be kept synchronized
Denormalization can be hazardous:
- increase in logical complexity of the database design
- complexity of the additional constraints
It is the database designer's responsibility to ensure that the denormalized
database does not become inconsistent
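One way to express such a synchronization constraint is a trigger. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table layout, the redundant `dept_mgr` column on `employee`, and the trigger name `sync_mgr` are assumptions made for illustration, not a design taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_mgr INTEGER);

-- Denormalized: dept_mgr is copied into employee to avoid a join.
CREATE TABLE employee (
    ssn      INTEGER PRIMARY KEY,
    name     TEXT,
    dept_no  INTEGER REFERENCES department(dept_no),
    dept_mgr INTEGER
);

-- Extra constraint logic that keeps the redundant copy synchronized.
CREATE TRIGGER sync_mgr AFTER UPDATE OF dept_mgr ON department
BEGIN
    UPDATE employee SET dept_mgr = NEW.dept_mgr WHERE dept_no = NEW.dept_no;
END;

INSERT INTO department VALUES (1, 321);
INSERT INTO employee VALUES (987, 'J Smith', 1, 321);
INSERT INTO employee VALUES (333, 'A Dolan', 1, 321);
""")

# Changing the manager once in department propagates to every copy.
conn.execute("UPDATE department SET dept_mgr = 333 WHERE dept_no = 1")
copies = [row[0] for row in
          conn.execute("SELECT dept_mgr FROM employee ORDER BY ssn")]
```

The trigger covers only updates made through `department`; a direct update of `employee.dept_mgr` would still create an inconsistency, which illustrates the added complexity the slide warns about.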
On a lighter note
Questions and Answers
End
Thank you for your attention!
Codd's Rules
Proposed by Edgar F. "Ted" Codd, a pioneer of the
relational model for databases.
Rule 0: The system must qualify as relational, as a
database, and as a management system.
For a system to qualify as an RDBMS, that system must use its
relational facilities (exclusively) to manage the database.
Codd's Rules (Cont.)
Rule 1: The information rule:

All information in the database must be represented in one and only one way,
namely by values in column positions within rows of tables.
Rule 2 : The guaranteed access rule:

All data must be accessible with no ambiguity.
This rule is essentially a restatement of the fundamental requirement for primary
keys.
It says that every individual scalar value in the database must be logically
addressable by specifying the name of the containing table, the name of the
containing column and the primary key value of the containing row.
Codd's Rules (Cont.)
Rule 3: Systematic treatment of null values:

The DBMS must allow each field to remain null (or empty).
It must support a representation of "missing information and inapplicable
information" that is systematic, distinct from all regular values and independent of
data type.
It is also implied that such representations must be manipulated by the DBMS in
a systematic way.
Rule 4: Active online catalog based on the relational model:

The system must support an online, relational catalog that is accessible to
authorized users by means of their regular query language.
Users must be able to access the database's structure (catalog) using the same
query language that they use to access the database's data.
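As a small illustration of a catalog that is queried with the ordinary query language, SQLite exposes its schema through the `sqlite_master` table. The sketch uses Python's built-in `sqlite3` module; the two tables created here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT)"
)
conn.execute("CREATE TABLE employee (ssn INTEGER PRIMARY KEY, name TEXT)")

# The catalog is itself a table, read with the same SELECT syntax as data.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Whether a given product fully satisfies Rule 4 is a separate question; the example only shows the idea of structure being visible as rows and columns.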
Codd's Rules (Cont.)
Rule 5: The comprehensive data sublanguage rule:

The system must support at least one relational language that
- Has a linear syntax
- Can be used both interactively and within application programs,
- Supports data definition operations (including view definitions), data manipulation
operations (update as well as retrieval), security and integrity constraints, and
transaction management operations (begin, commit, and rollback).
Rule 6: The view updating rule:

All views that are theoretically updatable must be updatable by the system.
Rule 7: High-level insert, update, and delete:

The system must support set-at-a-time insert, update, and delete operators.
- This means that data can be retrieved from a relational database in sets constructed of
data from multiple rows and/or multiple tables.
This rule states that insert, update, and delete operations should be supported for
any retrievable set rather than just for a single row in a single table.
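A set-at-a-time update can be sketched with Python's built-in `sqlite3` module: one statement changes every row that matches a condition, with no row-by-row loop in the application. The table contents and the target department number 4 are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (ssn INTEGER PRIMARY KEY, name TEXT, dept INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(987, "J Smith", 1), (654, "M Burke", 2),
     (333, "A Dolan", 1), (321, "K Doyle", 1)],
)

# One statement moves the whole set of dept 1 employees to dept 4.
cursor = conn.execute("UPDATE employee SET dept = 4 WHERE dept = 1")
rows_updated = cursor.rowcount

moved = [row[0] for row in
         conn.execute("SELECT ssn FROM employee WHERE dept = 4 ORDER BY ssn")]
```

The set being updated is described by the `WHERE` clause rather than enumerated by the caller.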
Codd's Rules (Cont.)
Rule 8: Physical data independence:

Changes to the physical level (how the data is stored, whether in arrays or linked
lists etc.) must not require a change to an application based on the structure.
Rule 9: Logical data independence:

Changes to the logical level (tables, columns, rows, and so on) must not require a
change to an application based on the structure.
Logical data independence is more difficult to achieve than physical data
independence.
Rule 10: Integrity independence:

Integrity constraints must be specified separately from application programs and
stored in the catalog.
It must be possible to change such constraints as and when appropriate without
unnecessarily affecting existing applications.
Codd's Rules (Cont.)
Rule 11: Distribution independence:

The distribution of portions of the database to various locations should be
invisible to users of the database.
Existing applications should continue to operate successfully:
- when a distributed version of the DBMS is first introduced; and
- when existing distributed data are redistributed around the system.
Rule 12: The nonsubversion rule:

If the system provides a low-level (record-at-a-time) interface, then that
interface cannot be used to subvert the system,
for example by bypassing a relational security or integrity constraint.