
normalization

database systems

one guideline for achieving good database design is the reduction of redundant information in relations; such redundancies give rise to waste of storage space as well as data anomalies during DML operations

normalization procedure
  minimizes redundancies that exist in relations
  does not totally eliminate redundancy; rather, it produces the controlled redundancy that allows relations to exhibit better characteristics
  is built around the concepts of normal forms and functional dependencies
  is the successive reduction of a given collection of relations (in some given normal form) to some more desirable form

Unnormalized Table
studid  studname  subject     grade
288945  Erap      English 3   75
                  Algebra     79
767650  Gloria    English 3   95
                  Filipino 2  85
                  Physics 1   78

first normal form (1NF)

a table is in 1NF if
  all the key attributes are defined
  there are no repeating groups; that is, each row/column intersection can contain one and only one value, rather than a set of values
  all attributes are dependent on the primary key
by definition, all valid relations are in 1NF

Normalized Table
studid  studname  subject     grade
288945  Erap      English 3   75
288945  Erap      Algebra     79
767650  Gloria    English 3   95
767650  Gloria    Filipino 2  85
767650  Gloria    Physics 1   78
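
The step from the unnormalized table to the normalized one can be sketched in Python. This is only an illustration: it assumes the repeating group is stored as a list of (subject, grade) pairs, and the field name `subjects` and the function name `to_1nf` are hypothetical names chosen for this sketch.

```python
# Unnormalized rows: each student carries a repeating group of (subject, grade).
unnormalized = [
    {"studid": 288945, "studname": "Erap",
     "subjects": [("English 3", 75), ("Algebra", 79)]},
    {"studid": 767650, "studname": "Gloria",
     "subjects": [("English 3", 95), ("Filipino 2", 85), ("Physics 1", 78)]},
]

def to_1nf(rows):
    """Emit one flat row per (student, subject) pair so every cell holds one value."""
    flat = []
    for row in rows:
        for subject, grade in row["subjects"]:
            flat.append((row["studid"], row["studname"], subject, grade))
    return flat

normalized = to_1nf(unnormalized)
for r in normalized:
    print(r)
```

Each output tuple corresponds to one row of the normalized table; the student data is repeated once per subject, which is the redundancy the later normal forms address.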

Example
A student can enroll in many subjects.
Several students may be enrolled in a subject.
A subject is taught by only one instructor.
An instructor can teach many subjects.

Initial Database Scheme:
STUDREC( studid, stname, address, course, subjid, subjdesc, instid, instname, grade )

STUDREC
studid  stname  address      course  subjid  subjdesc    instid  instname   grade
288945  Erap    Aurora Hill  BSCS    9600    English 3   Ins01   Domingo    75
288945  Erap    Aurora Hill  BSCS    9601    Algebra     Ins02   Ferrer     79
767650  Gloria  Bonifacio    BSIT    9600    English 3   Ins01   Domingo    95
767650  Gloria  Bonifacio    BSIT    9602    Filipino 2  Ins03   de Guzman  85
767650  Gloria  Bonifacio    BSIT    9603    Physics 1   Ins03   de Guzman  78

The relation, although normalized, contains redundancies and suffers from data anomalies.

functional dependencies (FDs)

Let:
  R be a relation variable
  X and Y be arbitrary subsets of the attribute set of R
X functionally determines Y (or Y is functionally dependent on X) if and only if, in every possible legal value of R, each X-value is associated with exactly one Y-value (i.e., whenever the value of X is known, there can be no doubt about the corresponding value of Y)
notation: X → Y, where X is the determinant and Y is the dependent
if X is a candidate key, then all attributes of R must necessarily be functionally dependent on X
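
The definition of a functional dependency can be tried out on the STUDREC sample rows. The sketch below is a minimal check, with `fd_holds` a hypothetical helper name. Note the limitation: an FD is a statement about every legal value of R, so a sample can only refute an FD, never prove it.

```python
COLUMNS = ("studid", "stname", "address", "course",
           "subjid", "subjdesc", "instid", "instname", "grade")

# Sample STUDREC rows from the example.
STUDREC = [dict(zip(COLUMNS, values)) for values in [
    (288945, "Erap", "Aurora Hill", "BSCS", 9600, "English 3", "Ins01", "Domingo", 75),
    (288945, "Erap", "Aurora Hill", "BSCS", 9601, "Algebra", "Ins02", "Ferrer", 79),
    (767650, "Gloria", "Bonifacio", "BSIT", 9600, "English 3", "Ins01", "Domingo", 95),
    (767650, "Gloria", "Bonifacio", "BSIT", 9602, "Filipino 2", "Ins03", "de Guzman", 85),
    (767650, "Gloria", "Bonifacio", "BSIT", 9603, "Physics 1", "Ins03", "de Guzman", 78),
]]

def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x, otherwise returns the y seen earlier.
        if seen.setdefault(x, y) != y:
            return False
    return True

print(fd_holds(STUDREC, ["studid"], ["stname", "address", "course"]))
print(fd_holds(STUDREC, ["studid"], ["grade"]))
print(fd_holds(STUDREC, ["studid", "subjid"], ["grade"]))
```

In this sample, studid alone does not determine grade (student 288945 has two different grades), while the pair (studid, subjid) does.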

partial dependencies
dependency diagram of STUDREC
studid, stname, address, course, subjid, subjdesc, instid, instname, grade

characteristics of STUDREC
  has a composite key (studid, subjid)
  contains partial dependencies (attributes are fully dependent on only a portion of the primary key)
  is in the first normal form, but not in the second normal form

second normal form (2NF)

a table is in 2NF if
  it is in 1NF
  it includes no partial dependencies; that is, no non-key attribute is dependent on only a portion of the primary key
a relation with a simple primary key is automatically in 2NF
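
A partial dependency can be spotted mechanically once the primary key and the FDs are written down. The sketch below is a simplified check, assuming STUDREC has the composite key (studid, subjid) and the FDs used later in the decomposition. The function name `partial_dependencies` is hypothetical, and only the FDs passed in are inspected, not the ones they imply.

```python
def partial_dependencies(primary_key, fds):
    """List the FDs whose determinant is a proper subset of the primary key
    and whose dependent side includes at least one non-key attribute."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        non_key = frozenset(rhs) - key
        if lhs < key and non_key:  # '<' is the proper-subset test
            found.append((sorted(lhs), sorted(non_key)))
    return found

KEY = ("studid", "subjid")
STUDREC_FDS = [
    (("studid",), ("stname", "address", "course")),
    (("subjid",), ("subjdesc", "instid", "instname")),
    (("studid", "subjid"), ("grade",)),
]

violations = partial_dependencies(KEY, STUDREC_FDS)
for lhs, rhs in violations:
    print(lhs, "->", rhs)
```

Two partial dependencies are reported (on studid alone and on subjid alone), so under these assumptions STUDREC is in 1NF but not in 2NF.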

normalization procedure
involves the decomposition of a relation into two or more relations, each of which is a projection of the original relation
the procedure must be reversible; that is, it should be possible to join the projections to obtain the original relation
in order to be valid, a decomposition must be lossless (or non-lossy); that is, its recomposition must be equivalent to the original relation (no information is lost or added during the decomposition)

to achieve a valid decomposition, use Heath's Theorem
Heath's Theorem: Let R(a, b, c) be a relation, where a, b, and c are sets of attributes. If the functional dependency a → b holds on R, then the decomposition of R into
  R1(a, b)
  R2(a, c)
is lossless (and therefore valid).
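
Heath's Theorem can be checked on the STUDREC sample rows by projecting and rejoining. The following is a sketch, not a proof: `project`, `natural_join`, and `as_set` are hypothetical helpers written for this illustration. The second half adds a counterexample, a split on subjid, which determines neither of the other two attributes.

```python
COLUMNS = ("studid", "stname", "address", "course",
           "subjid", "subjdesc", "instid", "instname", "grade")
STUDREC = [dict(zip(COLUMNS, values)) for values in [
    (288945, "Erap", "Aurora Hill", "BSCS", 9600, "English 3", "Ins01", "Domingo", 75),
    (288945, "Erap", "Aurora Hill", "BSCS", 9601, "Algebra", "Ins02", "Ferrer", 79),
    (767650, "Gloria", "Bonifacio", "BSIT", 9600, "English 3", "Ins01", "Domingo", 95),
    (767650, "Gloria", "Bonifacio", "BSIT", 9602, "Filipino 2", "Ins03", "de Guzman", 85),
    (767650, "Gloria", "Bonifacio", "BSIT", 9603, "Physics 1", "Ins03", "de Guzman", 78),
]]

def project(rows, attrs):
    """Projection onto attrs, with duplicate tuples removed."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(r1, r2):
    """Natural join on the attributes the two (non-empty) relations share."""
    common = [a for a in r1[0] if a in r2[0]]
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]

def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {frozenset(row.items()) for row in rows}

# Heath's Theorem with a = {studid}, b = {stname, address, course}, c = the rest.
student = project(STUDREC, ["studid", "stname", "address", "course"])
rest = project(STUDREC, ["studid", "subjid", "subjdesc", "instid", "instname", "grade"])
lossless = as_set(natural_join(student, rest)) == as_set(STUDREC)

# Counterexample: neither subjid -> studid nor subjid -> grade holds.
grades = project(STUDREC, ["studid", "subjid", "grade"])
p1 = project(grades, ["studid", "subjid"])
p2 = project(grades, ["subjid", "grade"])
rejoined = as_set(natural_join(p1, p2))
spurious = rejoined - as_set(grades)

print(lossless, len(grades), len(rejoined), len(spurious))
```

The first decomposition rejoins to exactly the original rows. The second produces extra (spurious) tuples, which is what "information is added" means for a lossy decomposition.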

Normalization of STUDREC using Heath's Theorem

STUDREC(studid, stname, address, course, subjid, subjdesc, instid, instname, grade)

Using the FD studid → stname, address, course:
  STUDENT(studid, stname, address, course)
  STUDREC(studid, subjid, subjdesc, instid, instname, grade)

Using the FD subjid → subjdesc, instid, instname:
  SUBJECT(subjid, subjdesc, instid, instname)
  STUDREC(studid, subjid, grade)

STUDENT
studid  stname  address      course
288945  Erap    Aurora Hill  BSCS
767650  Gloria  Bonifacio    BSIT

SUBJECT
subjid  subjdesc    instid  instname
9600    English 3   Ins01   Domingo
9601    Algebra     Ins02   Ferrer
9602    Filipino 2  Ins03   de Guzman
9603    Physics 1   Ins03   de Guzman

STUDREC
studid  subjid  grade
288945  9600    75
288945  9601    79
767650  9600    95
767650  9602    85
767650  9603    78

transitive dependencies
dependency diagram of SUBJECT
subjid, subjdesc, instid, instname

The database is in 2NF but it still contains redundancies and suffers from data anomalies

characteristics of SUBJECT
  contains transitive dependencies (non-key attributes are dependent on other non-key attributes)
  is in the second normal form, but not in the third normal form

third normal form (3NF)

a table is in 3NF if
  it is in 2NF
  it contains no transitive dependencies; that is, all non-key attributes are dependent only on the primary key, not on other non-key attributes

Normalization of SUBJECT using Heath's Theorem

SUBJECT(subjid, subjdesc, instid, instname)

Using the FD instid → instname:
  INSTRUCTOR(instid, instname)
  SUBJECT(subjid, subjdesc, instid)

INSTRUCTOR
instid  instname
Ins01   Domingo
Ins02   Ferrer
Ins03   de Guzman

SUBJECT
subjid  subjdesc    instid
9600    English 3   Ins01
9601    Algebra     Ins02
9602    Filipino 2  Ins03
9603    Physics 1   Ins03
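
The transitive dependency in SUBJECT can be found with a small check in the spirit of the slide definition (a non-key attribute depending on another non-key attribute). This is a simplified sketch: `transitive_dependencies` is a hypothetical name, the key of SUBJECT is assumed to be subjid, and only the FDs listed are examined.

```python
def transitive_dependencies(primary_key, fds):
    """List the FDs whose determinant contains no key attribute and whose
    dependent side includes a non-key attribute."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        dependents = frozenset(rhs) - key - lhs
        if lhs.isdisjoint(key) and dependents:
            found.append((sorted(lhs), sorted(dependents)))
    return found

# SUBJECT before the 3NF step.
before = transitive_dependencies(
    ("subjid",),
    [(("subjid",), ("subjdesc", "instid")),
     (("instid",), ("instname",))],
)

# SUBJECT(subjid, subjdesc, instid) after instname moves to INSTRUCTOR.
after = transitive_dependencies(
    ("subjid",),
    [(("subjid",), ("subjdesc", "instid"))],
)

print(before)
print(after)
```

Before the decomposition the check reports instid → instname; afterwards SUBJECT has no such dependency, and in INSTRUCTOR the same FD has the key instid as its determinant.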

3NF Database
Relational Database Scheme:
  STUDENT(studid, stname, address, course)
  INSTRUCTOR(instid, instname)
  SUBJECT(subjid, subjdesc, instid)
    FK: instid Ref INSTRUCTOR
  STUDREC(studid, subjid, grade)
    FK: studid Ref STUDENT
        subjid Ref SUBJECT

STUDENT
studid  stname  address      course
288945  Erap    Aurora Hill  BSCS
767650  Gloria  Bonifacio    BSIT

INSTRUCTOR
instid  instname
Ins01   Domingo
Ins02   Ferrer
Ins03   de Guzman

SUBJECT
subjid  subjdesc    instid
9600    English 3   Ins01
9601    Algebra     Ins02
9602    Filipino 2  Ins03
9603    Physics 1   Ins03

STUDREC
studid  subjid  grade
288945  9600    75
288945  9601    79
767650  9600    95
767650  9602    85
767650  9603    78
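
One way to try out the 3NF scheme is with Python's built-in `sqlite3` module and an in-memory database. The column types below are assumptions (the scheme does not give any), and the sketch is only meant to show the foreign keys at work and that joining the four relations recovers the original STUDREC rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE STUDENT (studid INTEGER PRIMARY KEY, stname TEXT, address TEXT, course TEXT);
CREATE TABLE INSTRUCTOR (instid TEXT PRIMARY KEY, instname TEXT);
CREATE TABLE SUBJECT (subjid INTEGER PRIMARY KEY, subjdesc TEXT,
                      instid TEXT REFERENCES INSTRUCTOR(instid));
CREATE TABLE STUDREC (studid INTEGER REFERENCES STUDENT(studid),
                      subjid INTEGER REFERENCES SUBJECT(subjid),
                      grade INTEGER,
                      PRIMARY KEY (studid, subjid));
""")

conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (288945, "Erap", "Aurora Hill", "BSCS"),
    (767650, "Gloria", "Bonifacio", "BSIT")])
conn.executemany("INSERT INTO INSTRUCTOR VALUES (?, ?)", [
    ("Ins01", "Domingo"), ("Ins02", "Ferrer"), ("Ins03", "de Guzman")])
conn.executemany("INSERT INTO SUBJECT VALUES (?, ?, ?)", [
    (9600, "English 3", "Ins01"), (9601, "Algebra", "Ins02"),
    (9602, "Filipino 2", "Ins03"), (9603, "Physics 1", "Ins03")])
conn.executemany("INSERT INTO STUDREC VALUES (?, ?, ?)", [
    (288945, 9600, 75), (288945, 9601, 79),
    (767650, 9600, 95), (767650, 9602, 85), (767650, 9603, 78)])

# Recompose the original nine-column STUDREC by joining along the foreign keys.
rows = conn.execute("""
    SELECT r.studid, s.stname, s.address, s.course,
           r.subjid, j.subjdesc, j.instid, i.instname, r.grade
    FROM STUDREC r
    JOIN STUDENT s ON s.studid = r.studid
    JOIN SUBJECT j ON j.subjid = r.subjid
    JOIN INSTRUCTOR i ON i.instid = j.instid
    ORDER BY r.studid, r.subjid
""").fetchall()

# A grade for a subject that does not exist violates referential integrity.
try:
    conn.execute("INSERT INTO STUDREC VALUES (288945, 9999, 80)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

for row in rows:
    print(row)
print("FK violation rejected:", fk_rejected)
```

Each fact (a student's address, an instructor's name) is now stored once, so an update touches a single row in a single relation.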

dependency preservation
dependency-preserving decomposition
  a valid decomposition in which all dependencies of the original relation are contained in, or implied by, the dependencies of the decomposition
  results in independent relations; such relations can be updated individually by observing only entity integrity and referential integrity constraints
decompositions that are non-dependency-preserving result in dependent relations
atomic relations
  relations that cannot be decomposed without losing some functional dependencies
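
Dependency preservation can be tested without materializing the projected FDs, using the standard attribute-closure procedure. The sketch below applies it to SUBJECT(subjid, subjdesc, instid, instname) with the FDs subjid → subjdesc, instid and instid → instname, comparing a split on subjid → instname with a split on instid → instname. The function names `closure`, `preserved`, and `lost_fds` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(fd, decomposition, fds):
    """True if fd can be derived using only FDs enforceable inside single relations."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            schema = set(schema)
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

def lost_fds(decomposition, fds):
    return [fd for fd in fds if not preserved(fd, decomposition, fds)]

SUBJECT_FDS = [
    (("subjid",), ("subjdesc", "instid")),
    (("instid",), ("instname",)),
]

# Split using subjid -> instname.
split_on_subjid = [("subjid", "instname"), ("subjid", "subjdesc", "instid")]
# Split using instid -> instname.
split_on_instid = [("instid", "instname"), ("subjid", "subjdesc", "instid")]

print(lost_fds(split_on_subjid, SUBJECT_FDS))
print(lost_fds(split_on_instid, SUBJECT_FDS))
```

The first split loses instid → instname, because instid and instname never appear together in one relation and nothing else implies the FD. The second split loses nothing.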

Example: Decomposition
SUBJECT(subjid, subjdesc, instid, instname)

Decomposition 1:
  INSTRUCTOR(instid, instname)
  SUBJECT(subjid, subjdesc, instname)
  The decomposition is lossy and therefore invalid.

Decomposition 2: Using the FD subjid → instname:
  INSTRUCTOR(subjid, instname)
  SUBJECT(subjid, subjdesc, instid)
  The decomposition is valid but non-dependency-preserving.

Decomposition 3: Using the FD instid → instname:
  INSTRUCTOR(instid, instname)
  SUBJECT(subjid, subjdesc, instid)
  The decomposition is valid and dependency-preserving.

Boyce-Codd normal form (BCNF)

a table is in BCNF if every determinant in the table is a candidate key
if a table has only one candidate key, 3NF and BCNF are equivalent
any relation can be nonloss-decomposed into a set of BCNF relations

Example: BCNF
A student can be a member of several clubs.
A club can have many students as members.
A club can have many projects.
Project names are unique.
In each club, a student can participate in only one project.

Table Structure: R(student, club, project)
FDs that hold in R:
  student, club → project
  student, project → club
  project → club

R
student  club     project
Cory     Econ     BantayBayan
Erap     Math     MathWiz
Erap     Science  Tagisan sa Agham
Gloria   Econ     BantayBayan
Gloria   Math     MathWiz
Gloria   Science  ScienceExpo

Database Scheme 1: R(student, club, project), with primary key (student, club)
The relation is in 3NF but not in BCNF because project is a determinant but is not a candidate key.

Database Scheme 2: R(student, club, project), with primary key (student, project)
The relation is only in 1NF due to the partial dependency project → club.

Example: BCNF
Using the FD project → club:
  R1( project, club )
  R2( student, project )
    FK: project Ref R1
The database is in 3NF and in BCNF.
The decomposition is valid but is not dependency-preserving.
The FD student, club → project is lost in the decomposition.
Information about a club member who is not part of a project cannot be stored.
The original relation R is atomic.

R1
project           club
BantayBayan       Econ
MathWiz           Math
ScienceExpo       Science
Tagisan sa Agham  Science

R2
student  project
Cory     BantayBayan
Erap     MathWiz
Erap     Tagisan sa Agham
Gloria   BantayBayan
Gloria   MathWiz
Gloria   ScienceExpo
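
The BCNF condition (every determinant is a candidate key) can be checked for R(student, club, project) with attribute closures. The sketch below is illustrative only: `closure`, `candidate_keys`, and `bcnf_violations` are hypothetical helper names, and the brute-force key search is practical only for small relations like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and closure(lhs, fds) != set(attributes)]

R = ("student", "club", "project")
R_FDS = [
    (("student", "club"), ("project",)),
    (("student", "project"), ("club",)),
    (("project",), ("club",)),
]

keys = candidate_keys(R, R_FDS)
violations = bcnf_violations(R, R_FDS)

# After decomposing on project -> club: R1(project, club), R2(student, project).
r1_violations = bcnf_violations(("project", "club"), [(("project",), ("club",))])
r2_violations = bcnf_violations(("student", "project"), [])

print(keys)
print(violations)
print(r1_violations, r2_violations)
```

R has two candidate keys, (student, club) and (student, project), and the only FD whose determinant is not a candidate key is project → club. Neither R1 nor R2 has a violating FD, which matches the statement that the decomposed database is in BCNF.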
