
Data WareHouse [463163]

Agus Hermanto, S.Kom., M.MT., ITIL



Every data-modeling technique has its own set of
terms, definitions, and techniques. This
vernacular permits us to understand complex and
difficult concepts and to use them to design
complex databases.
Why do we need a data model? A model is an
abstraction or representation of a subject that
looks or behaves like all or part of the original.
Examples include a concept car and a model of a
building.
All models have a common set of objectives.
They are designed to help people envision how
the parts fit together, help people understand
how to use or apply the final product, reduce the
development risk, and ensure that the people
building the product and those requesting it have
the same expectations.


The Benefits:
1. A model reduces overall risk by ensuring that the requirements of the final product will be satisfactorily met. By
examining a mock-up of the ultimate product, the intended users can make a reasonable determination of
whether the product will indeed fulfill their needs and objectives.
2. A model helps the developers envision how the final product will interface with other systems or functions. The
level of effort needed to create the interfaces and their feasibility can be reasonably estimated if a detailed model
is created. (In the case of a data warehouse, these interfaces include the data acquisition and the data delivery
programs, where and when to perform data cleansing, audits, data maintenance processes, and so on.)
3. A model helps all the people involved understand how to relate to the ultimate product and how it will pertain to
their work function. The model also helps the developers understand the skills needed by the ultimate audience
and what training needs to occur to ensure proper usage of the product.
4. Finally, a model ensures that the people building the product and those requesting it have the same expectations
about the ultimate outcome of the effort. By examining the model, the potential for a missed opportunity is
greatly reduced, and the belief and trust by all parties that the ultimate product will be satisfactory is greatly
enhanced.



Relational Model Concepts
The relational model of data is based on the concept of a relation.
The strength of the relational approach to data management
comes from the formal foundation provided by the theory of
relations.
We review the essentials of the formal relational model in this chapter.
In practice, there is a standard model based on SQL.
Note: There are several important differences between the formal
model and the practical model, as we shall see.
Relational Model Concepts
A Relation is a mathematical concept based on the ideas
of sets
The model was first proposed by Dr. E.F. Codd of IBM
Research in 1970 in the following paper:
"A Relational Model of Data for Large Shared Data Banks,"
Communications of the ACM, June 1970
The above paper caused a major revolution in the field of
database management and earned Dr. Codd the coveted
ACM Turing Award
Informal Definitions

Informally, a relation looks like a table of values.
A relation typically contains a set of rows.
The data elements in each row represent certain facts that
correspond to a real-world entity or relationship.
In the formal model, rows are called tuples.
Each column has a column header that gives an indication of the
meaning of the data items in that column.
In the formal model, the column header is called an attribute
name (or just attribute).
Example of a Relation

Informal Definitions
Key of a Relation:
Each row has a value of a data item (or set of items)
that uniquely identifies that row in the table
Called the key
In the STUDENT table, SSN is the key
Sometimes row-ids or sequential numbers are
assigned as keys to identify the rows in a table
Called artificial key or surrogate key
Formal Definitions - Schema
The Schema (or description) of a Relation:
Denoted by R(A1, A2, ..., An)
R is the name of the relation
The attributes of the relation are A1, A2, ..., An
Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
CUSTOMER is the relation name
Defined over the four attributes: Cust-id, Cust-name, Address, Phone#
Each attribute has a domain or a set of valid values.
For example, the domain of Cust-id is 6 digit numbers.

Formal Definitions - Tuple
A tuple is an ordered set of values (enclosed in angled brackets < >)
Each value is derived from an appropriate domain.
A row in the CUSTOMER relation is a 4-tuple and would consist of
four values, for example:
<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">
This is called a 4-tuple as it has 4 values
A tuple (row) in the CUSTOMER relation.
A relation is a set of such tuples (rows)
Formal Definitions - Domain
A domain has a logical definition:
Example: USA_phone_numbers are the set of 10 digit phone numbers valid in
the U.S.
A domain also has a data-type or a format defined for it.
The USA_phone_numbers may have a format: (ddd)ddd-dddd where each d is a
decimal digit.
Dates have various formats such as year, month, date formatted as yyyy-mm-dd,
or as dd mm,yyyy etc.

The attribute name designates the role played by a domain in a relation:
Used to interpret the meaning of the data elements corresponding to that attribute
Example: The domain Date may be used to define two attributes named Invoice-date
and Payment-date with different meanings
Formal Definitions - State
The relation state is a subset of the Cartesian
product of the domains of its attributes
each domain contains the set of all possible values
the attribute can take.
Example: attribute Cust-name is defined over the
domain of character strings of maximum length
25
dom(Cust-name) is varchar(25)
The role these strings play in the CUSTOMER
relation is that of the name of a customer.
Formal Definitions - Summary
Formally,
Given R(A1, A2, ..., An)
r(R) subset-of dom(A1) X dom(A2) X ... X dom(An)
R(A1, A2, ..., An) is the schema of the relation
R is the name of the relation
A1, A2, ..., An are the attributes of the relation
r(R): a specific state (or "value" or "population") of relation R; this is
a set of tuples (rows)
r(R) = {t1, t2, ..., tm} where each ti is an n-tuple
ti = <v1, v2, ..., vn> where each vj element-of dom(Aj)
Formal Definitions - Example
Let R(A1, A2) be a relation schema:
Let dom(A1) = {0,1}
Let dom(A2) = {a,b,c}
Then: dom(A1) X dom(A2) is all possible combinations:
{<0,a>, <0,b>, <0,c>, <1,a>, <1,b>, <1,c>}
The relation state r(R) subset-of dom(A1) X dom(A2)
For example: r(R) could be {<0,a>, <0,b>, <1,c>}
this is one possible state (or population or extension) r of the relation
R, defined over A1 and A2.
It has three 2-tuples: <0,a> , <0,b> , <1,c>
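The example above can be reproduced with a short Python sketch. It enumerates dom(A1) X dom(A2) with `itertools.product` and checks that a candidate state is a subset of it. The variable and function names are illustrative assumptions, not part of the slides.

```python
from itertools import product

# Domains taken from the example above.
dom_a1 = {0, 1}
dom_a2 = {"a", "b", "c"}

# dom(A1) X dom(A2): every possible 2-tuple over the two domains.
cartesian = set(product(dom_a1, dom_a2))

# One possible relation state r(R); any subset of the product is allowed.
state = {(0, "a"), (0, "b"), (1, "c")}


def is_valid_state(candidate, domains):
    """Return True if every tuple takes each value from the matching domain."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in candidate
    )


print(sorted(cartesian))
print(state <= cartesian)  # subset test: is r(R) inside the product?
print(is_valid_state(state, [dom_a1, dom_a2]))
```

Because a state is a set, listing the same tuples in a different order gives the same state.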
Definition Summary
Informal Terms                Formal Terms
Table                         Relation
Column Header                 Attribute
All possible Column Values    Domain
Row                           Tuple
Table Definition              Schema of a Relation
Populated Table               State of the Relation
Example: A relation STUDENT
Characteristics Of Relations
Ordering of tuples in a relation r(R):
The tuples are not considered to be ordered, even
though they appear to be in the tabular form.
Ordering of attributes in a relation schema R (and of
values within each tuple):
We will consider the attributes in R(A1, A2, ..., An) and
the values in t=<v1, v2, ..., vn> to be ordered.
(However, a more general alternative definition of relation
does not require this ordering).
Same state as previous figure (but
with different order of tuples)
Characteristics Of Relations
Values in a tuple:
All values are considered atomic (indivisible).
Each value in a tuple must be from the domain of the attribute for
that column
If tuple t = <v1, v2, ..., vn> is a tuple (row) in the relation state r of
R(A1, A2, ..., An)
Then each vi must be a value from dom(Ai)
A special null value is used to represent values that are unknown
or inapplicable to certain tuples.
Characteristics Of Relations
Notation:
We refer to component values of a tuple t by:
t[Ai] or t.Ai
This is the value vi of attribute Ai for tuple t
Similarly, t[Au, Av, ..., Aw] refers to the subtuple of
t containing the values of attributes Au, Av, ..., Aw,
respectively in t

Relational Integrity Constraints
Constraints are conditions that must hold on all valid
relation states.
There are three main types of constraints in the relational
model:
Key constraints
Entity integrity constraints
Referential integrity constraints
Another implicit constraint is the domain constraint
Every value in a tuple must be from the domain of its
attribute (or it could be null, if allowed for that attribute)
Key Constraints
Superkey of R:
Is a set of attributes SK of R with the following condition:
No two tuples in any valid relation state r(R) will have the same value for SK
That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK]
This condition must hold in any valid state r(R)
Key of R:
A "minimal" superkey
That is, a key is a superkey K such that removal of any attribute from K
results in a set of attributes that is not a superkey (does not possess the
superkey uniqueness property)

Key Constraints (continued)
Example: Consider the CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
CAR has two keys:
Key1 = {State, Reg#}
Key2 = {SerialNo}
Both are also superkeys of CAR
{SerialNo, Make} is a superkey but not a key.
In general:
Any key is a superkey (but not vice versa)
Any set of attributes that includes a key is a superkey
A minimal superkey is also a key
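The difference between a superkey and a key can be sketched in Python. The CAR rows below are invented for illustration. Note the limit of such a check: a key constraint must hold in every valid state, so a single state can refute a claimed key but cannot prove it.

```python
from itertools import combinations

ATTRS = ("State", "Reg#", "SerialNo", "Make", "Model", "Year")

# Hypothetical CAR rows (assumed data, not from the slides).
cars = [
    ("TX", "ABC-739", "A69352", "Ford", "Mustang", 2002),
    ("FL", "ABC-739", "B43696", "Oldsmobile", "Cutlass", 2005),
    ("NY", "MPO-22", "X83554", "Oldsmobile", "Delta", 2001),
    ("TX", "RSK-629", "Y82935", "Toyota", "Camry", 2004),
]


def unique_on(rows, subset):
    """True if no two rows agree on all attributes in `subset` (superkey test on this state)."""
    idx = [ATTRS.index(a) for a in subset]
    seen = set()
    for row in rows:
        value = tuple(row[i] for i in idx)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_minimal_unique(rows, subset):
    """True if `subset` is unique on this state and no proper, non-empty subset is."""
    if not unique_on(rows, subset):
        return False
    return not any(
        unique_on(rows, smaller)
        for size in range(1, len(subset))
        for smaller in combinations(subset, size)
    )


print(is_minimal_unique(cars, ("State", "Reg#")))     # behaves like a key here
print(unique_on(cars, ("SerialNo", "Make")))          # superkey here
print(is_minimal_unique(cars, ("SerialNo", "Make")))  # but not minimal
```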
Key Constraints (continued)
If a relation has several candidate keys, one is chosen arbitrarily to be the
primary key.
The primary key attributes are underlined.
Example: Consider the CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
We chose SerialNo as the primary key
The primary key value is used to uniquely identify each tuple in a relation
Provides the tuple identity
Also used to reference the tuple from another tuple
General rule: Choose as primary key the smallest of the candidate keys
(in terms of size)
Not always applicable; the choice is sometimes subjective
CAR table with two candidate keys
LicenseNumber chosen as Primary Key

Relational Database Schema
Relational Database Schema:
A set S of relation schemas that belong to the same
database.
S is the name of the whole database schema
S = {R1, R2, ..., Rn}
R1, R2, ..., Rn are the names of the individual relation
schemas within the database S
Following slide shows a COMPANY database schema
with 6 relation schemas
COMPANY Database Schema

Entity Integrity
Entity Integrity:
The primary key attributes PK of each relation schema R in S
cannot have null values in any tuple of r(R).
This is because primary key values are used to identify the individual
tuples.
t[PK] ≠ null for any tuple t in r(R)
If PK has several attributes, null is not allowed in any of these
attributes
Note: Other attributes of R may be constrained to disallow null
values, even though they are not members of the primary key.
Referential Integrity
A constraint involving two relations
The previous constraints involve a single relation.
Used to specify a relationship among tuples in
two relations:
The referencing relation and the referenced
relation.

Referential Integrity
Tuples in the referencing relation R1 have attributes FK
(called foreign key attributes) that reference the primary
key attributes PK of the referenced relation R2.
A tuple t1 in R1 is said to reference a tuple t2 in R2 if
t1[FK] = t2[PK].
A referential integrity constraint can be displayed in a
relational database schema as a directed arc from R1.FK
to R2.
Referential Integrity (or foreign key) Constraint
Statement of the constraint
The value in the foreign key column (or columns)
FK of the referencing relation R1 can be
either:
(1) a value of an existing primary key value of a
corresponding primary key PK in the referenced
relation R2, or
(2) a null.
In case (2), the FK in R1 should not be a part
of its own primary key.
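The two allowed cases can be checked mechanically. The following Python sketch is illustrative only: the rows, the attribute names, and the helper `fk_violations` are assumptions, loosely modelled on the COMPANY schema.

```python
# Hypothetical referenced relation R2 (primary key: Dnumber).
departments = [
    {"Dnumber": 1, "Dname": "Headquarters"},
    {"Dnumber": 5, "Dname": "Research"},
]

# Hypothetical referencing relation R1 (foreign key: Dno).
employees = [
    {"Ssn": "111", "Dno": 5},     # case (1): matches an existing primary key
    {"Ssn": "222", "Dno": None},  # case (2): null foreign key
    {"Ssn": "333", "Dno": 7},     # violation: no department 7 exists
]


def fk_violations(referencing, fk, referenced, pk):
    """Return referencing rows whose FK is neither null nor an existing PK value."""
    existing = {row[pk] for row in referenced}
    return [
        row for row in referencing
        if row[fk] is not None and row[fk] not in existing
    ]


bad = fk_violations(employees, "Dno", departments, "Dnumber")
print(bad)
```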
Displaying a relational database
schema and its constraints
Each relation schema can be displayed as a row of
attribute names
The name of the relation is written above the attribute
names
The primary key attribute (or attributes) will be underlined
A foreign key (referential integrity) constraint is displayed
as a directed arc (arrow) from the foreign key attributes to
the referenced table
The arrow can also point to the primary key of the referenced relation
for clarity
Next slide shows the COMPANY relational schema
Referential Integrity Constraints for COMPANY database

Other Types of Constraints
Semantic Integrity Constraints:
Based on application semantics and cannot be expressed by the
model per se
Example: the maximum number of hours per employee for all projects he
or she works on is 56 hours per week
A constraint specification language may have to be used to
express these
SQL-99 allows triggers and ASSERTIONS to express some of
these
Populated database state
Each relation will have many tuples in its current relation state
The relational database state is a union of all the individual relation
states
Whenever the database is changed, a new state arises
Basic operations for changing the database:
INSERT a new tuple in a relation
DELETE an existing tuple from a relation
MODIFY an attribute of an existing tuple
Next slide shows an example state for the COMPANY database
Populated database state for COMPANY

Update Operations on Relations
INSERT a tuple.
DELETE a tuple.
MODIFY a tuple.
Integrity constraints should not be violated by the update operations.
Several update operations may have to be grouped together.
Updates may propagate to cause other updates automatically. This
may be necessary to maintain integrity constraints.

Update Operations on Relations
In case of integrity violation, several actions can be taken:
Cancel the operation that causes the violation
(RESTRICT or REJECT option)
Perform the operation but inform the user of the
violation
Trigger additional updates so the violation is corrected
(CASCADE option, SET NULL option)
Execute a user-specified error-correction routine
Possible violations for each operation
INSERT may violate any of the constraints:
Domain constraint:
if one of the attribute values provided for the new tuple is not of the specified
attribute domain
Key constraint:
if the value of a key attribute in the new tuple already exists in another tuple in
the relation
Referential integrity:
if a foreign key value in the new tuple references a primary key value that
does not exist in the referenced relation
Entity integrity:
if the primary key value is null in the new tuple
Possible violations for each operation
DELETE may violate only referential integrity:
If the primary key value of the tuple being deleted is referenced
from other tuples in the database
Can be remedied by several actions: RESTRICT, CASCADE, SET
NULL (see Chapter 8 for more details)
RESTRICT option: reject the deletion
CASCADE option: propagate the deletion by also deleting the
referencing tuples
SET NULL option: set the foreign keys of the referencing tuples to NULL
One of the above options must be specified during database
design for each foreign key constraint
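The three options can be tried out with Python's standard `sqlite3` module. This is a minimal sketch: the two-table schema and its rows are assumptions made for the demonstration, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def make_db(on_delete):
    """Build a tiny in-memory database whose foreign key uses the given ON DELETE option."""
    if on_delete not in {"RESTRICT", "CASCADE", "SET NULL"}:
        raise ValueError(on_delete)
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute(
        "CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT)"
    )
    conn.execute(
        "CREATE TABLE employee (ssn TEXT PRIMARY KEY, dno INTEGER "
        f"REFERENCES department(dnumber) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO department VALUES (5, 'Research')")
    conn.execute("INSERT INTO employee VALUES ('123456789', 5)")
    return conn


def delete_department(on_delete):
    """Delete the referenced department and report what happens to the employee rows."""
    conn = make_db(on_delete)
    try:
        conn.execute("DELETE FROM department WHERE dnumber = 5")
        return conn.execute("SELECT ssn, dno FROM employee").fetchall()
    except sqlite3.IntegrityError:
        return "rejected"
    finally:
        conn.close()


for option in ("RESTRICT", "CASCADE", "SET NULL"):
    print(option, delete_department(option))
```

With RESTRICT the deletion is rejected, with CASCADE the referencing employee row disappears as well, and with SET NULL the employee row survives with a null `dno`.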
Possible violations for each operation
UPDATE may violate domain constraint and NOT NULL constraint on
an attribute being modified
Any of the other constraints may also be violated, depending on the
attribute being updated:
Updating the primary key (PK):
Similar to a DELETE followed by an INSERT
Need to specify similar options to DELETE
Updating a foreign key (FK):
May violate referential integrity
Updating an ordinary attribute (neither PK nor FK):
Can only violate domain constraints
In-Class Exercise
Consider the following relations for a database that keeps track of student
enrollment in courses and the books adopted for each course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for this
schema.
Summary
Presented Relational Model Concepts
Definitions
Characteristics of relations
Discussed Relational Model Constraints and Relational Database Schemas
Domain constraints
Key constraints
Entity integrity
Referential integrity
Described the Relational Update Operations and Dealing with Constraint
Violations

Informal Design Guidelines for
Relational Databases (1)
What is relational database design?
The grouping of attributes to form "good" relation
schemas
Two levels of relation schemas
The logical "user view" level
The storage "base relation" level
Design is concerned mainly with base relations
What are the criteria for "good" base relations?
Informal Design Guidelines for
Relational Databases (2)
We first discuss informal guidelines for good relational design
Then we discuss formal concepts of functional dependencies and normal
forms
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)
- BCNF (Boyce-Codd Normal Form)
Additional types of dependencies, further normal forms, relational design
algorithms by synthesis are discussed in Chapter 11 (Elmasri, 5th ed.)
Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one
entity or relationship instance. (Applies to individual relations and their
attributes).
Attributes of different entities (EMPLOYEEs, DEPARTMENTs,
PROJECTs) should not be mixed in the same relation
Only foreign keys should be used to refer to other entities
Entity and relationship attributes should be kept apart as much as
possible.
Bottom Line: Design a schema that can be explained easily relation by
relation. The semantics of attributes should be easy to interpret.
Example of good database design: a simplified
COMPANY relational database schema
Redundant Information in Tuples and Update
Anomalies
Information is stored redundantly
Wastes storage
Causes problems with update anomalies
Insertion anomalies
Deletion anomalies
Modification anomalies
Problems Caused by Redundancy
Redundant storage: some data is stored repeatedly
Update anomaly: if one copy of such repeated data is changed, the data
becomes inconsistent unless all copies are changed in the same way
Insertion anomaly: it may be difficult to insert certain data unless
some other, unrelated data is inserted as well
Deletion anomaly: it may be difficult to delete certain data without
also losing some other, unrelated data
Problems Caused by Redundancy: Example
SSN          Name       Lot  Rating  Wages  Hours
123-22-3666  Attishoo   48   8       10     40
231-31-5368  Smiley     22   8       10     30
131-24-3650  Smethurst  35   5       7      30
434-26-3751  Guldu      35   5       7      32
612-67-4134  Madayan    35   8       10     40

Assumption: the value of the wages attribute is determined by the value of rating (for a given rating value,
only one wages value is allowed)

Redundant storage: rating 8, which corresponds to wages 10, is repeated three times
Update anomaly: the wages value (tied to the rating value) in the first row can be changed without
making the same change in the second and fifth rows
Insertion anomaly: a new (rating, wages) pair is difficult to insert unless it is
tied to the insertion of a new employee
Deletion anomaly: if all rows with a given rating value are deleted (for example, the rows for
employees Smethurst and Guldu), we lose the information about the dependency
between that rating value and its associated wages value (namely rating = 5 and
wages = 7)
EXAMPLE OF AN UPDATE ANOMALY

Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Update Anomaly:
Changing the name of project number P1 from
Billing to Customer-Accounting may cause this
update to be made for all 100 employees working on
project P1.
EXAMPLE OF AN INSERT ANOMALY

Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Insert Anomaly:
Cannot insert a project unless an employee is
assigned to it.
Conversely:
Cannot insert an employee unless he/she is
assigned to a project.
EXAMPLE OF A DELETE ANOMALY
Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Delete Anomaly:
When a project is deleted, it will result in deleting all
the employees who work on that project.
Alternately, if an employee is the sole employee on a
project, deleting that employee would result in
deleting the corresponding project.
Two relation schemas suffering from
update anomalies
Example States for EMP_DEPT and
EMP_PROJ
Guideline to Redundant Information in Tuples
and Update Anomalies
GUIDELINE 2:
Design a schema that does not suffer from the
insertion, deletion and update anomalies.
If there are any anomalies present, then note them
so that applications can be made to take them into
account.
Null Values in Tuples

GUIDELINE 3:
Relations should be designed such that their tuples will
have as few NULL values as possible
Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
Reasons for nulls:
Attribute not applicable or invalid
Attribute value unknown (may exist)
Value known to exist, but unavailable


Problems Caused by Null Values
In some cases, an excessive number of null values in a
relation wastes storage space
This happens mainly in relations with many attributes and
many rows, where for a number of rows many column values are
not applicable and must be left null
For example, suppose a new column (OfficeLocCode) is added to the Hourly Employees
relation to record the office location code of the company's
managers. If there are thousands of employees and only about
5% of them are managers, then most (95%) of the values in that column will be
null (wasted storage).
Spurious Tuples
Bad designs for a relational database may result in erroneous results for certain
JOIN operations
The "lossless join" property is used to guarantee meaningful results for join
operations

GUIDELINE 4:
The relations should be designed to satisfy the lossless join condition.
No spurious tuples should be generated by doing a natural-join of any
relations.

There are two important properties of decompositions:
a) Non-additive or losslessness of the corresponding join
b) Preservation of the functional dependencies.
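Spurious tuples can be demonstrated on the employee rows shown earlier. The Python sketch below projects the relation two ways and rejoins each pair with a hand-written natural join. Both decompositions, the helper names, and the attribute names are assumptions chosen for illustration; the second split, which shares only `lot`, is deliberately bad.

```python
ATTRS = ("ssn", "name", "lot", "rating", "wages", "hours")
ROWS = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]
original = [dict(zip(ATTRS, row)) for row in ROWS]


def project(rows, attrs):
    """Projection onto `attrs`, with duplicate tuples removed."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in distinct]


def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attribute names."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in shared)
    ]


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(row[a] for a in ATTRS) for row in rows}


# Split on rating, which determines wages: the join restores the original.
lossless = natural_join(
    project(original, ("ssn", "name", "lot", "rating", "hours")),
    project(original, ("rating", "wages")),
)

# Split on lot, which determines nothing: the join invents rows.
lossy = natural_join(
    project(original, ("ssn", "name", "lot", "hours")),
    project(original, ("lot", "rating", "wages")),
)

spurious = as_set(lossy) - as_set(original)
print(as_set(lossless) == as_set(original))
print(len(spurious), "spurious tuples")
```

The extra rows arise because lot 35 is paired with two different (rating, wages) combinations, so every employee in lot 35 is matched with both.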
Introduction to Schema Refinement:
Decomposition of a Relation Schema
Decomposing a relation schema R means replacing it with
two (or more) new schemas, each containing
a subset of the attributes of R, that
together include all the attributes of R.
Decomposition is carried out using the concept of functional
dependencies
Example: the relation schema Hourly_Employees can be decomposed into:
Hourly_Emps2 (ssn, name, lot, rating, hours)
Wages (rating, wages)
Hourly_Emps2
S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40
Wages
R  W
8  10
5  7
Functional Dependencies (FDs)
A functional dependency X -> Y is said to hold on relation R if,
for every allowable instance r of R:
for all t1 element-of r and t2 element-of r, t1[X] = t2[X] implies t1[Y] = t2[Y]
that is, given two tuples in r, if their projections onto X are
equal, then their projections onto Y are also equal. (X and Y
are sets of attributes of the same relation.)
An FD is a statement about all allowable instances of the
relation.
It must be identified based on the semantics of the application
Given some allowable instance r1 of R, we can
check whether it violates some FD f, but we cannot
tell from r1 alone that f holds on R!
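The last point can be made concrete with a small Python sketch. The function `find_fd_violation` is a hypothetical helper: it searches one instance for two tuples that agree on X but differ on Y. Finding such a pair refutes X -> Y; finding none only means that this particular instance is consistent with it.

```python
ATTRS = ("ssn", "name", "lot", "rating", "wages", "hours")
ROWS = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]
instance = [dict(zip(ATTRS, row)) for row in ROWS]


def find_fd_violation(rows, lhs, rhs):
    """Return two rows with equal `lhs` values but different `rhs` values, or None."""
    first_seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in first_seen and first_seen[x][0] != y:
            return first_seen[x][1], row
        first_seen.setdefault(x, (y, row))
    return None


# rating -> wages: no counterexample in this instance.
print(find_fd_violation(instance, ["rating"], ["wages"]))

# lot -> rating: lot 35 appears with ratings 5 and 8, so the FD is refuted.
print(find_fd_violation(instance, ["lot"], ["rating"]))
```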
Example: Constraints on an Entity Set
Consider the following relation Hourly_Emps:
Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
Notation: for brevity, this relation schema is denoted
by concatenating the abbreviations of its attributes: SNLRWH
This notation stands for the set of attributes {S,N,L,R,W,H}.
In some cases, the name of a relation is used to
refer to all the attributes of that relation (for example,
Hourly_Emps for SNLRWH)
Some FDs that hold on Hourly_Emps:
ssn is a key: S -> SNLRWH
rating determines hrly_wages: R -> W
Example (Continued)
S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40

Some problems caused by R -> W:
Update anomaly: can W be changed in only the first tuple of
SNLRWH?
Insertion anomaly: what if we want to insert an employee
but the hourly wage for the corresponding rating
is not known?
Deletion anomaly: if all employees with rating 5 are deleted,
the information about the hourly wage for
rating 5 is deleted as well!
Hourly_Emps2
S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40
Wages
R  W
8  10
5  7
Inference Rules for FDs (1)
Given a set of FDs F, we can infer additional FDs that hold whenever
the FDs in F hold
Armstrong's inference rules:
IR1. (Reflexive) If Y subset-of X, then X -> Y
IR2. (Augmentation) If X -> Y, then XZ -> YZ
(Notation: XZ stands for X U Z)
IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
IR1, IR2, IR3 form a sound and complete set of inference rules
These rules hold, and all other rules that hold can be deduced
from them
Inference Rules for FDs (2)
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union: If X -> Y and X -> Z, then X -> YZ
Pseudotransitivity: If X -> Y and WY -> Z, then WX -> Z
The last three inference rules, as well as any other
inference rules, can be deduced from IR1, IR2, and
IR3 (completeness property)
Inference Rules for FDs (3)

Closure of a set F of FDs is the set F+ of all FDs that can be
inferred from F
Closure of a set of attributes X with respect to F is the set X+
of all attributes that are functionally determined by X
X+ can be calculated by repeatedly applying IR1, IR2, IR3
using the FDs in F
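In practice X+ is usually computed with a fixed-point loop rather than by applying the three rules one at a time: start from X and keep adding the right-hand side of every FD whose left-hand side is already covered. The Python sketch below assumes single-letter attribute names and FDs written as (left, right) string pairs, using the Hourly_Emps dependencies S -> SNLRWH and R -> W.

```python
def closure(attrs, fds):
    """Return X+ for the attribute set `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ already covers the left side, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of Hourly_Emps (SNLRWH): ssn is a key, rating determines wages.
FDS = [("S", "SNLRWH"), ("R", "W")]

print(sorted(closure("S", FDS)))   # all six attributes, so S is a key
print(sorted(closure("R", FDS)))   # only R and W
print(sorted(closure("NL", FDS)))  # nothing new is determined
```

An attribute set X is a superkey exactly when X+ contains every attribute of the relation.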
Normal Forms
Normal forms (NF) help us decide
whether a relation schema is already a good design
or still needs to be decomposed into smaller relations.
If a relation schema is in one of the normal forms,
certain kinds of redundancy problems are avoided or minimized.
Normal forms based on FDs: 1NF, 2NF, 3NF, and Boyce-Codd NF
(BCNF):
Every relation in BCNF is also in 3NF
Every relation in 3NF is also in 2NF, and
Every relation in 2NF is also in 1NF
Normal Forms
Every relation in 1NF satisfies the constraint that each field
contains only atomic values (no lists or sets; in other
words, no repeating groups).
In this course, this constraint is assumed to hold before normalization begins
Every relation in 2NF satisfies the constraint that every non-key
attribute is fully functionally dependent on the
key of the relation (no partial dependencies)
Because 2NF is rooted in the history of database development (from the
network model to the hierarchical model), the discussion concentrates
on forming 3NF and BCNF, which are the important steps
in database design.
Normal Forms (Continued)
The role of FDs in detecting redundancy: consider a relation R
with 3 attributes, ABC.
If no FD is required to hold on R, then there is
certainly no redundancy problem.
However, suppose A -> B holds: if several
tuples have the same A value, then those rows
must also have the same B value.
The potential for redundancy can therefore be predicted
from the FDs
Boyce-Codd Normal Form (BCNF)

A relation R with FDs F is in BCNF if, for
every FD X -> A in F, one of the following statements
holds:
A element-of X (called a trivial FD), or
X is a superkey of R (X contains a key of R).
In other words, R is in BCNF if the only non-trivial
FDs that hold on R are key constraints.
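The BCNF condition can be tested with attribute closures: an FD X -> A is harmless if it is trivial or if X+ covers the whole relation. The Python sketch below is an illustration under stated assumptions (single-letter attributes, FDs as string pairs, and the hypothetical helper `bcnf_violations`). It inspects only the FDs listed in F, which is enough for the relation that F was stated on; for a relation produced by decomposition, the FDs must first be projected onto its attributes.

```python
def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds` (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the FDs in `fds` that are non-trivial and whose left side is not a superkey."""
    all_attrs = set(relation)
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) >= all_attrs
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations


# Hourly_Emps (SNLRWH) with S -> SNLRWH and R -> W.
print(bcnf_violations("SNLRWH", [("S", "SNLRWH"), ("R", "W")]))

# After splitting off wages: SNLRH with S -> SNLRH, and RW with R -> W.
print(bcnf_violations("SNLRH", [("S", "SNLRH")]))
print(bcnf_violations("RW", [("R", "W")]))
```

Only R -> W is reported for the original relation; both relations of the decomposition come back clean.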
Third Normal Form (3NF)
A relation R with FDs F is in 3NF if, for every FD X -> A
in F, one of the following statements holds:
A element-of X (called a trivial FD), or
X is a superkey of R (X contains a key of R), or
A is part of some key of R (A is a prime attribute)
The minimality of the key in the third condition is crucial!
If R is in BCNF, then R is certainly also in 3NF
If R is in 3NF, some redundancy is still possible.
3NF can serve as a compromise, used
when BCNF cannot be achieved (for example, because there is no
good decomposition, or for database
performance reasons)
What Does 3NF Achieve?
If a dependency X -> A causes a violation of 3NF, then one of the following
cases occurs:
X is a subset of some key K (partial dependency)
The same (X, A) pair of values is stored redundantly
X is not a subset of any key K (transitive dependency)
There is a chain of FDs K -> X -> A, which means that we cannot associate an X value
with a K value unless we also associate an A value with
that X value
However, even if the relation is in 3NF, the following
problems can still occur:
Example: the relation Reserves SBDC (C = credit card ID) with S -> C and C -> S is in 3NF, but for
each reservation by sailor S, the same (S, C) pair is stored in the database.
Thus, 3NF is indeed a compromise
relative to BCNF.
Decomposition of a Relation Schema
Assume that relation R consists of attributes A1 ... An.
A decomposition of R replaces R with two or more relations
such that:
Each new relation schema contains a subset of the attributes of R (and no attributes
that do not appear in R), and
Every attribute of R appears as an attribute of at least one of the new
relations
Intuitively, decomposing R means that we store instances
of the relation schemas produced by the decomposition, instead of instances
of R
For example, the relation SNLRWH can be decomposed into SNLRH and RW (see the next
slide).
Decomposition Example 1

Consider the relation:
DeptProj (Ename, SSN, Bdate, Address, Dnumber, Dname, DMgrSSN), abbreviated ESBADNM

FDs: S → EBAD, D → NM
S → EBAD: satisfies 3NF and BCNF
D → NM: violates 3NF and BCNF, so decompose ESBADNM into:
ESBAD and DMN
Result of decomposing ESBADNM: ESBAD and DMN (both in 3NF and BCNF)
ESBAD: DeptProj1 (Ename, SSN, Bdate, Address, Dnumber)
DMN: Department (Dnumber, Dname, DMgrSSN)
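The splitting step used in this example can be written as a small recursive procedure. The sketch below is a naive BCNF decomposition in Python: it looks for an attribute set X whose closure (restricted to the current schema) is neither X itself nor the whole schema, then splits on it. It tries every subset, so it is only meant for slide-sized schemas, and the function names are assumptions made for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_decompose(schema, fds):
    """Split schema until no sub-schema contains a BCNF violation."""
    schema = frozenset(schema)
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            x_plus = closure(x, fds) & schema
            if x_plus != x and x_plus != schema:
                # X determines something but is not a superkey: split on X.
                left = x_plus                  # X together with what it determines
                right = x | (schema - x_plus)  # X together with the rest
                return bcnf_decompose(left, fds) + bcnf_decompose(right, fds)
    return [schema]

parts = bcnf_decompose("ESBADNM", [("S", "EBAD"), ("D", "NM")])
print(sorted("".join(sorted(p)) for p in parts))
```

For ESBADNM the procedure returns the attribute sets {D, M, N} and {A, B, D, E, S}, which correspond to DMN and ESBAD above.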
Decomposition Example 2
Consider the relation:
EmpProj (SSN, Pnumber, Hours, Ename, Pname, Plocation), abbreviated SPHEJL
FDs: SP → H, S → E, P → JL
SP → H: satisfies 3NF and BCNF
S → E: violates 3NF and BCNF, so decompose SPHEJL into:
SPHJL and SE
P → JL: violates 3NF and BCNF, so decompose SPHJL into:
SPH and PJL
Result of decomposing SPHEJL: SPH, SE, and PJL (all in 3NF and BCNF)
SPH: EmpProj1 (SSN, Pnumber, Hours)
SE: Employee (SSN, Ename)
PJL: Project (Pnumber, Pname, Plocation)
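One way to convince yourself that nothing is lost by this decomposition is to project a sample EmpProj instance onto SPH, SE, and PJL and join the pieces back together. The Python sketch below does that with plain dictionaries. The rows are hypothetical (invented for illustration), and a single instance can only illustrate the lossless-join property, not prove it.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicates."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]

def natural_join(left, right):
    """Join rows that agree on every attribute the two sides share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

def canon(rows):
    """Order-independent form of a list of rows, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

# Hypothetical EmpProj rows (values invented for illustration).
emp_proj = [
    {"SSN": "111", "Pnumber": 1, "Hours": 32, "Ename": "Smith",
     "Pname": "ProductX", "Plocation": "Bellaire"},
    {"SSN": "111", "Pnumber": 2, "Hours": 8, "Ename": "Smith",
     "Pname": "ProductY", "Plocation": "Sugarland"},
    {"SSN": "222", "Pnumber": 2, "Hours": 20, "Ename": "Wong",
     "Pname": "ProductY", "Plocation": "Sugarland"},
]

sph = project(emp_proj, ["SSN", "Pnumber", "Hours"])
se = project(emp_proj, ["SSN", "Ename"])
pjl = project(emp_proj, ["Pnumber", "Pname", "Plocation"])

rejoined = natural_join(natural_join(sph, se), pjl)
print(canon(rejoined) == canon(emp_proj))
print(len(se), len(pjl))
```

In this sample the employee name and the project details are each stored once per employee or project after the split, instead of once per assignment.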
Decomposition Example 3
Consider the relation:
LOTS (PropertyID#, CountyName, Lot#, Area, Price, TaxRate), abbreviated ICLAPT
FDs: I → CLAPT, CL → IAPT, C → T, A → P
I → CLAPT: satisfies 3NF and BCNF
CL → IAPT: satisfies 3NF and BCNF
C → T: violates 3NF and BCNF, so decompose ICLAPT into:
ICLAP and CT
A → P: violates 3NF and BCNF, so decompose ICLAP into:
ICLA and AP
Result of decomposing ICLAPT: ICLA, CT, and AP (all in 3NF and BCNF):
ICLA: LOTS1 (PropertyID#, CountyName, Lot#, Area)
CT: TaxRate (CountyName, TaxRate)
AP: Price (Area, Price)
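The two violating FDs in this example fall into the two cases of a 3NF violation: C → T is a partial dependency (C is part of the key CL) and A → P is a transitive dependency (A is not part of any key). The Python sketch below labels each FD accordingly. It assumes the candidate keys I and CL, read off from the FDs I → CLAPT and CL → IAPT; the function name and labels are illustrative.

```python
def classify(fds, keys):
    """Label each FD X -> A as 'ok', 'partial' or 'transitive' w.r.t. 3NF."""
    key_sets = [set(k) for k in keys]
    prime = set().union(*key_sets)  # attributes that belong to some key
    labels = {}
    for lhs, rhs in fds:
        x = set(lhs)
        for attr in sorted(set(rhs) - x):  # ignore trivial parts
            name = f"{lhs} -> {attr}"
            if any(k <= x for k in key_sets) or attr in prime:
                labels[name] = "ok"          # X is a superkey or A is prime
            elif any(x < k for k in key_sets):
                labels[name] = "partial"     # X is a proper subset of a key
            else:
                labels[name] = "transitive"  # X is not inside any key
    return labels

lots_fds = [("I", "CLAPT"), ("CL", "IAPT"), ("C", "T"), ("A", "P")]
labels = classify(lots_fds, keys=["I", "CL"])
print({fd: kind for fd, kind in labels.items() if kind != "ok"})
```

Only C → T and A → P are flagged, matching the two splits performed above.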
Decomposition Example 4

Consider the relation:
LOTS (PropertyID#, CountyName, Lot#, Area, Price, TaxRate), abbreviated ICLAPT
FDs: I → CLAPT, CL → IAPT, C → T, A → P, and A → C (added)
I → CLAPT: satisfies 3NF and BCNF
CL → IAPT: satisfies 3NF and BCNF
C → T: violates 3NF and BCNF, so decompose ICLAPT into:
ICLAP and CT
A → P: violates 3NF and BCNF, so decompose ICLAP into:
ICLA and AP
A → C: violates BCNF (but satisfies 3NF, because C is part of the key CL)
Under 3NF, AC does NOT need to be split off (the result is the same as in the example on the previous slide).
Under BCNF, AC must be split off, so the final result becomes the following (the FD CL → IAPT is lost, so the result does NOT preserve dependencies):
o ILA: LOTS1 (PropertyID#, Lot#, Area)
o CT: TaxRate (CountyName, TaxRate)
o AP: Price (Area, Price)
o AC: Area (Area, CountyName)
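Whether a decomposition preserves an FD can be tested without computing the projected FD sets explicitly: start from the left-hand side and repeatedly add whatever each sub-schema can derive locally. The Python sketch below applies this standard test to the 3NF and BCNF results of this example. It assumes that the BCNF version of LOTS1 keeps Area (schema ILA), which is what splitting ICLA on A → C yields; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(fd, parts, fds):
    """True if fd = (X, Y) follows from the FDs that hold inside single parts."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            part = set(part)
            # What this part alone lets us derive from what we know so far.
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

fds = [("I", "CLAPT"), ("CL", "IAPT"), ("C", "T"), ("A", "P"), ("A", "C")]
decompositions = {
    "3NF": ["ICLA", "CT", "AP"],
    "BCNF": ["ILA", "CT", "AP", "AC"],  # assumes LOTS1 keeps Area
}
for name, parts in decompositions.items():
    lost = [f"{l} -> {r}" for l, r in fds if not preserved((l, r), parts, fds)]
    print(name, "lost FDs:", lost)
```

The 3NF decomposition loses nothing, while the BCNF decomposition loses CL → IAPT: once C and L live in different relations, that FD can only be checked by joining them.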
Decomposition Example 5 (for discussion)
Given the sample flight data below, how can the data be normalized?

Flight-No  Aircraft-Type  Airport-Code  Airport-City  Departure-Time  Arrival-Time  Number-of-Seats
BA069      B747           LHR           Heathrow      -               13:00         402
                          ZRH           Zurich        14:30           15:30         402
                          BAH           Bahrain       23:00           0:15          402
                          SEZ           Seychelles    5:45            6:45          402
                          MRU           Mauritius     9:10            -             402
SK586      AB10           LIS           Lisbon        -               15:00         154
                          NCE           Nice          18:15           18:55         154
                          CPH           Copenhagen    21:10           21:45         154
                          ARN           Stockholm     2:55            -             154
SK783      AB10           CPH           Copenhagen    -               9:40          154
                          ATH           Athens        14:00           15:00         154
                          DAM           Damascus      17:00           -             154
SK961      MD11           CPH           Copenhagen    -               18:10         270
                          VIE           Vienna        19:50           20:40         270
                          NBO           Nairobi       5:50            6:40          270
                          JNB           Johannesburg  9:35            -             270
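As a possible starting point for the discussion, the Python sketch below fills the blank Flight-No and Aircraft-Type cells down from the row above, so that every row is complete, and then tests a few candidate FDs against the sample. It uses only some of the rows and omits the time columns for brevity. The column names, helper names, and the choice of candidate FDs are assumptions; a sample can refute an FD but can never prove that it holds in general.

```python
def fill_down(raw_rows):
    """Copy Flight-No and Aircraft-Type into rows where they were left blank,
    so that every row carries complete, atomic values."""
    rows, flight, aircraft = [], None, None
    for flight_no, aircraft_type, code, city, seats in raw_rows:
        flight = flight_no or flight
        aircraft = aircraft_type or aircraft
        rows.append({"flight_no": flight, "aircraft_type": aircraft,
                     "airport_code": code, "airport_city": city,
                     "seats": seats})
    return rows

def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# A subset of the sample rows; blank cells are written as empty strings.
raw = [
    ("BA069", "B747", "LHR", "Heathrow", 402),
    ("", "", "ZRH", "Zurich", 402),
    ("", "", "BAH", "Bahrain", 402),
    ("SK586", "AB10", "LIS", "Lisbon", 154),
    ("", "", "NCE", "Nice", 154),
    ("", "", "CPH", "Copenhagen", 154),
    ("SK783", "AB10", "CPH", "Copenhagen", 154),
    ("", "", "ATH", "Athens", 154),
]

flights = fill_down(raw)
candidates = [
    (["flight_no"], ["aircraft_type"]),
    (["aircraft_type"], ["seats"]),
    (["airport_code"], ["airport_city"]),
    (["flight_no"], ["airport_code"]),
]
for lhs, rhs in candidates:
    print(lhs, "->", rhs, holds(flights, lhs, rhs))
```

FDs that survive this check are candidates for guiding the decomposition; whether they really hold is a question about the airline's rules, not about this sample.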
Decomposition Example 6 (for discussion)

Given a relation R with four attributes ABCD.

Assume that the set of FDs below is the only set of FDs that holds for R. For this set of FDs: (a) find the candidate keys of R, (b) identify the best normal form (1NF, 2NF, 3NF, or BCNF) that R satisfies, and (c) if R is not in BCNF, decompose R into a set of BCNF relations that preserves the dependencies.
C → D, C → A, B → C
Summary
If a relation is in BCNF, it is free of redundancies that can be detected using FDs.
Trying to ensure that all relations are in BCNF is therefore a good heuristic.
If a relation is not in BCNF, try to decompose it into a collection of BCNF relations.
Consider whether all FDs are preserved. If a lossless-join, dependency-preserving decomposition into BCNF is not possible (or is unsuitable for some typical queries), consider a decomposition into 3NF.
Decompositions should be carried out and/or re-examined with the desired performance requirements in mind.
Individual Assignment (plagiarism strictly prohibited)
1. Consider the following set of requirements for a UNIVERSITY database that is used to keep track of
students' transcripts. This is similar but not identical to the database shown in Figure 1.2:
a. The university keeps track of each student's name, student number, Social Security number, current
address and phone number, permanent address and phone number, birth date, sex, class (freshman,
sophomore, ..., graduate), major department, minor department (if any), and degree program (B.A., B.S.,
..., Ph.D.). Some user applications need to refer to the city, state, and ZIP Code of the student's
permanent address and to the student's last name. Both Social Security number and student number
have unique values for each student.
b. Each department is described by a name, department code, office number, office phone number, and
college. Both name and code have unique values for each department.
c. Each course has a course name, description, course number, number of semester hours, level, and
offering department. The value of the course number is unique for each course.
d. Each section has an instructor, semester, year, course, and section number. The section number
distinguishes sections of the same course that are taught during the same semester/year; its values are
1, 2, 3, ..., up to the number of sections taught during each semester.
e. A grade report has a student, section, letter grade, and numeric grade (0, 1, 2, 3, or 4).
Individual Assignment (plagiarism strictly prohibited)
2. Design an ER schema for this application, and draw an ER diagram for the schema. Specify key attributes
of each entity type, and structural constraints on each relationship type. Note any unspecified requirements,
and make appropriate assumptions to make the specification complete.
Individual Assignment (plagiarism strictly prohibited)
3. Consider the ER diagram in Figure 7.20, which shows a simplified schema for an airline reservations system.
Extract from the ER diagram the requirements and constraints that produced this schema. Try to be as precise as
possible in your requirements and constraints specification.
Individual Assignment (plagiarism strictly prohibited)

4. Why would you choose a database system instead of simply storing data in operating system files? When would it make
sense not to use a database system?
5. What is logical data independence and why is it important?
6. Explain the difference between logical and physical data independence.
7. Explain the difference between external, internal, and conceptual schemas. How are these different schema layers related
to the concepts of logical and physical data independence?
8. Scrooge McNugget wants to store information (names, addresses, descriptions of embarrassing moments, etc.) about the
many ducks on his payroll. Not surprisingly, the volume of data compels him to buy a database system. To save money, he
wants to buy one with the fewest possible features, and he plans to run it as a stand-alone application on his PC clone. Of
course, Scrooge does not plan to share his list with anyone. Indicate which of the following DBMS features Scrooge should
pay for; in each case, also indicate why Scrooge should (or should not) pay for that feature in the system he buys: a
security facility, concurrency control, crash recovery, a view mechanism, a query language.

WRITE YOUR ANSWERS ON FOLIO PAPER AND PREPARE SLIDES FOR A RANDOMLY ASSIGNED PRESENTATION