database systems
normalization
One guideline for achieving good database design is the reduction of redundant information in relations. Such redundancies waste storage space and give rise to data anomalies during DML operations.
The normalization procedure minimizes the redundancies that exist in relations.
It does not eliminate redundancy entirely; rather, it produces controlled redundancy that allows relations to exhibit better characteristics.
It is built around the concepts of normal forms and functional dependencies.
It is the successive reduction of a given collection of relations (in some given normal form) to some more desirable form.
Unnormalized Table
studid  studname  subject             grade
288945  Erap      English 3, Algebra  75, 79
Normalized Table
studid  studname  subject     grade
288945  Erap      English 3   75
288945  Erap      Algebra     79
767650  Gloria    English 3   95
767650  Gloria    Filipino 2  85
767650  Gloria    Physics 1   78
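The step from the unnormalized table to the normalized one can be sketched in Python. This is only an illustration: the nested `subjects` list is an assumed way of representing the repeating group, and the function name `to_first_normal_form` is hypothetical. The values are the ones shown in the normalized table.

```python
# Unnormalized records: each student carries a repeating group of
# (subject, grade) pairs. The nested layout is an assumption made for
# this sketch, not something the slides prescribe.
unnormalized = [
    {"studid": "288945", "studname": "Erap",
     "subjects": [("English 3", 75), ("Algebra", 79)]},
    {"studid": "767650", "studname": "Gloria",
     "subjects": [("English 3", 95), ("Filipino 2", 85), ("Physics 1", 78)]},
]


def to_first_normal_form(records):
    """Emit one flat row per (student, subject) pair so every value is atomic."""
    rows = []
    for rec in records:
        for subject, grade in rec["subjects"]:
            rows.append((rec["studid"], rec["studname"], subject, grade))
    return rows


normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

Each output row repeats the student's id and name, which is exactly the redundancy the later normal forms set out to control.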
Example
A student can enroll in many subjects.
Several students may be enrolled in a subject.
A subject is taught by only one instructor.
An instructor can teach many subjects.
Initial Database Scheme:
STUDREC(studid, stname, address, course, subjid, subjdesc, instid, instname, grade)
STUDREC
studid  stname  address      course  subjid  subjdesc    instid  instname   grade
288945  Erap    Aurora Hill  BSCS    9600    English 3   Ins01   Domingo    75
288945  Erap    Aurora Hill  BSCS    9601    Algebra     Ins02   Ferrer     79
767650  Gloria  Bonifacio    BSIT    9600    English 3   Ins01   Domingo    95
767650  Gloria  Bonifacio    BSIT    9602    Filipino 2  Ins03   de Guzman  85
767650  Gloria  Bonifacio    BSIT    9603    Physics 1   Ins03   de Guzman  78
The relation, although normalized, contains redundancies and suffers from data anomalies
partial dependencies
dependency diagram of STUDREC
characteristics of STUDREC
has a composite key (studid, subjid)
contains partial dependencies (non-key attributes that depend on only a portion of the primary key)
is in the first normal form, but not in the second normal form
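One way to see the partial dependencies is to test candidate functional dependencies against the STUDREC sample rows. The sketch below assumes a hypothetical helper `fd_holds`. A sample can only refute a dependency, never prove it, so a result of `True` means "not contradicted by these rows".

```python
COLUMNS = ("studid", "stname", "address", "course", "subjid",
           "subjdesc", "instid", "instname", "grade")

# Sample rows of STUDREC as given in the example.
STUDREC = [
    ("288945", "Erap", "Aurora Hill", "BSCS", "9600", "English 3", "Ins01", "Domingo", 75),
    ("288945", "Erap", "Aurora Hill", "BSCS", "9601", "Algebra", "Ins02", "Ferrer", 79),
    ("767650", "Gloria", "Bonifacio", "BSIT", "9600", "English 3", "Ins01", "Domingo", 95),
    ("767650", "Gloria", "Bonifacio", "BSIT", "9602", "Filipino 2", "Ins03", "de Guzman", 85),
    ("767650", "Gloria", "Bonifacio", "BSIT", "9603", "Physics 1", "Ins03", "de Guzman", 78),
]


def fd_holds(rows, columns, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    idx = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        key = tuple(row[idx[a]] for a in lhs)
        val = tuple(row[idx[a]] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# Attributes determined by only part of the key (studid, subjid):
print(fd_holds(STUDREC, COLUMNS, ["studid"], ["stname", "address", "course"]))
print(fd_holds(STUDREC, COLUMNS, ["subjid"], ["subjdesc", "instid", "instname"]))
# grade needs the whole key:
print(fd_holds(STUDREC, COLUMNS, ["studid", "subjid"], ["grade"]))
print(fd_holds(STUDREC, COLUMNS, ["studid"], ["grade"]))
```

The first two checks are not contradicted by the data, which matches the partial dependencies on studid and on subjid. The last check fails because one student has different grades in different subjects.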
normalization procedure
involves the decomposition of a relation into two or more relations, each of which is a projection of the original relation
the procedure must be reversible; that is, it should be possible to join the projections to obtain the original relation
in order to be valid, a decomposition must be lossless (or non-lossy); that is, its recomposition must be equivalent to the original relation (no information is lost or added during the decomposition)
normalization procedure
to achieve a valid decomposition, use Heath's Theorem
Heath's Theorem: Let R(a, b, c) be a relation. If the functional dependency a → b holds on R, then the decomposition of R into R1(a, b) and R2(a, c) is lossless (and therefore valid).
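Heath's Theorem can be checked on a small instance by projecting and re-joining. The sketch below is illustrative only: the tuples are hypothetical, and `relation`, `project`, and `natural_join` are helper names introduced here. Rows are stored as frozensets of (attribute, value) pairs so that column order does not matter.

```python
def relation(columns, tuples):
    """Build a relation as a set of rows; each row is a frozenset of (attr, value)."""
    return {frozenset(zip(columns, t)) for t in tuples}


def project(rel, attrs):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}


def natural_join(r1, r2):
    """Natural join: combine rows that agree on every shared attribute."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


# Hypothetical instance in which a -> b holds (a = 1 always pairs with b = "x").
R = relation(("a", "b", "c"), [(1, "x", "p"), (1, "x", "q"), (2, "y", "p")])
rejoined = natural_join(project(R, {"a", "b"}), project(R, {"a", "c"}))
print(rejoined == R)  # the decomposition is lossless on this instance

# Hypothetical instance in which neither a -> b nor a -> c holds.
S = relation(("a", "b", "c"), [(1, "x", "p"), (1, "y", "q")])
rejoined_s = natural_join(project(S, {"a", "b"}), project(S, {"a", "c"}))
print(len(S), len(rejoined_s))  # the join picks up spurious rows
```

In the second instance the join returns rows that were never in S, which is what "information is added" means for a lossy decomposition.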
transitive dependencies
dependency diagram of SUBJECT(subjid, subjdesc, instid, instname)
The database is in 2NF but it still contains redundancies and suffers from data anomalies
characteristics of SUBJECT
contains transitive dependencies (non-key attributes that depend on other non-key attributes)
is in the second normal form, but not in the third normal form
SUBJECT(subjid, subjdesc, instid, instname)
Using the FD instid → instname:
INSTRUCTOR(instid, instname)
SUBJECT(subjid, subjdesc, instid)
INSTRUCTOR
instid  instname
Ins01   Domingo
Ins02   Ferrer
Ins03   de Guzman
SUBJECT
subjid  subjdesc    instid
9600    English 3   Ins01
9601    Algebra     Ins02
9602    Filipino 2  Ins03
9603    Physics 1   Ins03
3NF Database
Relational Database Scheme:
STUDENT(studid, stname, address, course)
INSTRUCTOR(instid, instname)
SUBJECT(subjid, subjdesc, instid)
    FK: instid Ref INSTRUCTOR
STUDREC(studid, subjid, grade)
    FK: studid Ref STUDENT
        subjid Ref SUBJECT
STUDENT
studid  stname  address      course
288945  Erap    Aurora Hill  BSCS
767650  Gloria  Bonifacio    BSIT
INSTRUCTOR
instid  instname
Ins01   Domingo
Ins02   Ferrer
Ins03   de Guzman
SUBJECT
subjid  subjdesc    instid
9600    English 3   Ins01
9601    Algebra     Ins02
9602    Filipino 2  Ins03
9603    Physics 1   Ins03
STUDREC
studid  subjid  grade
288945  9600    75
288945  9601    79
767650  9600    95
767650  9602    85
767650  9603    78
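The 3NF scheme can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column types (TEXT, INTEGER) and the use of an in-memory database are choices made here, not part of the original scheme. It loads the sample data, joins the four relations back into the original STUDREC rows, and shows referential integrity rejecting a grade for an unknown student (the id 999999 is hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE STUDENT (studid TEXT PRIMARY KEY, stname TEXT, address TEXT, course TEXT);
CREATE TABLE INSTRUCTOR (instid TEXT PRIMARY KEY, instname TEXT);
CREATE TABLE SUBJECT (subjid TEXT PRIMARY KEY, subjdesc TEXT,
                      instid TEXT REFERENCES INSTRUCTOR(instid));
CREATE TABLE STUDREC (studid TEXT REFERENCES STUDENT(studid),
                      subjid TEXT REFERENCES SUBJECT(subjid),
                      grade INTEGER,
                      PRIMARY KEY (studid, subjid));
""")

conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    ("288945", "Erap", "Aurora Hill", "BSCS"),
    ("767650", "Gloria", "Bonifacio", "BSIT"),
])
conn.executemany("INSERT INTO INSTRUCTOR VALUES (?, ?)", [
    ("Ins01", "Domingo"), ("Ins02", "Ferrer"), ("Ins03", "de Guzman"),
])
conn.executemany("INSERT INTO SUBJECT VALUES (?, ?, ?)", [
    ("9600", "English 3", "Ins01"), ("9601", "Algebra", "Ins02"),
    ("9602", "Filipino 2", "Ins03"), ("9603", "Physics 1", "Ins03"),
])
conn.executemany("INSERT INTO STUDREC VALUES (?, ?, ?)", [
    ("288945", "9600", 75), ("288945", "9601", 79),
    ("767650", "9600", 95), ("767650", "9602", 85), ("767650", "9603", 78),
])

# Recompose the original nine-attribute relation by joining on the foreign keys.
rows = conn.execute("""
    SELECT s.studid, s.stname, s.address, s.course,
           j.subjid, j.subjdesc, i.instid, i.instname, r.grade
    FROM STUDREC r
    JOIN STUDENT s ON s.studid = r.studid
    JOIN SUBJECT j ON j.subjid = r.subjid
    JOIN INSTRUCTOR i ON i.instid = j.instid
    ORDER BY s.studid, j.subjid
""").fetchall()
for row in rows:
    print(row)

# A grade for a student that does not exist violates the FK on studid.
try:
    conn.execute("INSERT INTO STUDREC VALUES ('999999', '9600', 80)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```

Because each fact is now stored once, an instructor's name or a student's address is changed in a single row, and the join still reproduces the original five rows.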
dependency preservation
dependency-preserving decomposition
a valid decomposition in which every dependency in the original relation is contained in, or implied by, the dependencies of the decomposition
results in independent relations; such relations can be updated individually by observing only entity integrity and referential integrity constraints
decompositions that are not dependency-preserving result in dependent relations
atomic relations
relations that cannot be decomposed without losing some functional dependencies
Example: Decomposition
SUBJECT(subjid, subjdesc, instid, instname)
Decomposition 1:
INSTRUCTOR(instid, instname)
SUBJECT(subjid, subjdesc, instname)
The decomposition is lossy and therefore invalid.
Decomposition 2: Using the FD subjid → instname:
INSTRUCTOR(subjid, instname)
SUBJECT(subjid, subjdesc, instid)
The decomposition is valid but non-dependency-preserving (the FD instid → instname is lost).
Decomposition 3: Using the FD instid → instname:
INSTRUCTOR(instid, instname)
SUBJECT(subjid, subjdesc, instid)
The decomposition is valid and dependency-preserving.
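The difference between Decomposition 1 and Decomposition 3 shows up as soon as two instructors share a name. The sketch below adds one hypothetical row to the SUBJECT sample (subject 9604, "Chemistry 1", taught by a second instructor Ins04 who is also named Domingo); that row and the helper names `relation`, `project`, and `natural_join` are assumptions made for illustration.

```python
def relation(columns, tuples):
    """Build a relation as a set of rows; each row is a frozenset of (attr, value)."""
    return {frozenset(zip(columns, t)) for t in tuples}


def project(rel, attrs):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}


def natural_join(r1, r2):
    """Natural join: combine rows that agree on every shared attribute."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


SUBJECT = relation(("subjid", "subjdesc", "instid", "instname"), [
    ("9600", "English 3", "Ins01", "Domingo"),
    ("9601", "Algebra", "Ins02", "Ferrer"),
    ("9602", "Filipino 2", "Ins03", "de Guzman"),
    ("9603", "Physics 1", "Ins03", "de Guzman"),
    ("9604", "Chemistry 1", "Ins04", "Domingo"),  # hypothetical row, not in the source
])

# Decomposition 1: the projections share only instname, which is not a key.
join1 = natural_join(project(SUBJECT, {"instid", "instname"}),
                     project(SUBJECT, {"subjid", "subjdesc", "instname"}))
spurious = join1 - SUBJECT
print(len(SUBJECT), len(join1), len(spurious))

# Decomposition 3: the projections share instid, and instid -> instname holds.
join3 = natural_join(project(SUBJECT, {"instid", "instname"}),
                     project(SUBJECT, {"subjid", "subjdesc", "instid"}))
print(join3 == SUBJECT)
```

With Decomposition 1 the join can no longer tell which Domingo teaches which subject, so it returns both pairings. Decomposition 3 joins on instid and reproduces SUBJECT exactly.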
a table is in BCNF if every determinant in the table is a candidate key
if a table has only one candidate key, 3NF and BCNF are equivalent
any relation can be nonloss-decomposed into a set of BCNF relations
Example: BCNF
A student can be a member of several clubs.
A club can have many students as members.
A club can have many projects.
Project names are unique.
In each club, a student can participate in only one project.
Table Structure: R(student, club, project)
FDs that hold in R:
student, club → project
student, project → club
project → club
R
student  club     project
Cory     Econ     BantayBayan
Erap     Math     MathWiz
Erap     Science  Tagisan sa Agham
Gloria   Econ     BantayBayan
Gloria   Math     MathWiz
Gloria   Science  ScienceExpo
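The BCNF test "every determinant is a candidate key" can be run mechanically from the three FDs of R. The sketch below uses the standard attribute-closure computation; the function names `closure`, `candidate_keys`, and `bcnf_violations` are introduced here for illustration, and the brute-force key search is only meant for tiny schemas like this one.

```python
from itertools import combinations

ATTRS = frozenset({"student", "club", "project"})
FDS = [
    (frozenset({"student", "club"}), frozenset({"project"})),
    (frozenset({"student", "project"}), frozenset({"club"})),
    (frozenset({"project"}), frozenset({"club"})),
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = frozenset(combo)
            if closure(candidate, fds) == attrs and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys


def bcnf_violations(attrs, fds):
    """Non-trivial FDs whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attrs]


print([sorted(k) for k in candidate_keys(ATTRS, FDS)])
print([(sorted(l), sorted(r)) for l, r in bcnf_violations(ATTRS, FDS)])
```

The search finds two candidate keys, (student, club) and (student, project), and flags project → club as the only FD whose determinant is not a superkey.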
Example: BCNF
Using the FD project → club:
R1(project, club)
R2(student, project)
    FK: project Ref R1
The database is in 3NF and in BCNF.
The decomposition is valid but is not dependency-preserving.
The FD student, club → project is lost in the decomposition.
Information about a club member who is not part of a project cannot be stored.
The original relation R is atomic.
R1
project           club
BantayBayan       Econ
MathWiz           Math
ScienceExpo       Science
Tagisan sa Agham  Science
R2
student  project
Cory     BantayBayan
Erap     MathWiz
Erap     Tagisan sa Agham
Gloria   BantayBayan
Gloria   MathWiz
Gloria   ScienceExpo
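The loss of the FD student, club → project can be made concrete. In the sketch below, R1 and R2 hold the sample data, and one hypothetical row (Erap, ScienceExpo) is added to R2. That insert breaks no key or foreign-key constraint of R1 or R2, yet the joined data now shows one student with two projects in the same club. The helper names `join` and `lost_fd_violations` are assumptions made for illustration.

```python
# R1(project, club): project is the key, so a dict models it directly.
R1 = {
    "BantayBayan": "Econ",
    "MathWiz": "Math",
    "ScienceExpo": "Science",
    "Tagisan sa Agham": "Science",
}

# R2(student, project): the whole row is the key.
R2 = {
    ("Cory", "BantayBayan"),
    ("Erap", "MathWiz"),
    ("Erap", "Tagisan sa Agham"),
    ("Gloria", "BantayBayan"),
    ("Gloria", "MathWiz"),
    ("Gloria", "ScienceExpo"),
}


def join(r1, r2):
    """Recompose R(student, club, project) from R1 and R2."""
    return {(student, r1[project], project) for student, project in r2}


def lost_fd_violations(rows):
    """(student, club) pairs that map to more than one project."""
    projects = {}
    for student, club, project in rows:
        projects.setdefault((student, club), set()).add(project)
    return {key: vals for key, vals in projects.items() if len(vals) > 1}


before = lost_fd_violations(join(R1, R2))
print(before)

# Hypothetical insert: valid for R2 alone (new key, project exists in R1).
R2_after = R2 | {("Erap", "ScienceExpo")}
after = lost_fd_violations(join(R1, R2_after))
print(after)
```

Catching this violation requires looking at R1 and R2 together, which is why the two relations are dependent rather than independent.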
Database Scheme 1: R(student, club, project), with (student, club) as the primary key
The relation is in 3NF but not in BCNF because project is a determinant but is not a candidate key.
Database Scheme 2: R(student, club, project), with (student, project) as the primary key
The relation is only in 1NF because of the partial dependency project → club.