Fall 2002
                CS 425                          CS 520
Prerequisite:   CS401                           CS402
Emphasis:       » Database Design & Use         » Database Design & Eng.
                » Application Development       » DBMS Development
Project:        » Application Development       » DBMS Development
Figure: Database System, with Concurrency Control, Security, and Recovery
(C) Frieder, Grossman, & Goharian 1996, 2002 6
Definitions
DBMS - a Database Management System is a set of
routines that is capable of providing the following basic
functions:
» Add
» Delete
» Update
» Retrieve
Add (X) =
Find (X);
If not found then insert (X)
else return (error_code)
Delete (X) =
Find (X);
If found then remove (X)
else return (error_code)
Primitive Functionality (cont)
Update (X, Y) =
Delete (X);
If not error_code then Add (Y)
Retrieve (X) =
Delete (X);
If not error_code then Add (X)
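The four primitives above can be sketched in Python over an in-memory set. This is a minimal illustration, not a DBMS interface: the `PrimitiveStore` class, its `find`/`insert`/`remove` layer, and the `OK`/`ERROR_CODE` values are hypothetical stand-ins for the operations the pseudocode assumes.

```python
# Minimal sketch: Add, Delete, Update, and Retrieve built on find/insert/remove.
# The set-backed store and the OK/ERROR_CODE values are hypothetical.
OK = 0
ERROR_CODE = -1

class PrimitiveStore:
    def __init__(self):
        self._items = set()

    # Low-level operations assumed by the pseudocode.
    def find(self, x):
        return x in self._items

    def insert(self, x):
        self._items.add(x)

    def remove(self, x):
        self._items.discard(x)

    # Add (X): insert only if X is not already present.
    def add(self, x):
        if not self.find(x):
            self.insert(x)
            return OK
        return ERROR_CODE

    # Delete (X): remove only if X is present.
    def delete(self, x):
        if self.find(x):
            self.remove(x)
            return OK
        return ERROR_CODE

    # Update (X, Y): delete X, then add Y if the delete succeeded.
    def update(self, x, y):
        if self.delete(x) != ERROR_CODE:
            return self.add(y)
        return ERROR_CODE

    # Retrieve (X): delete X and add it back; success means X exists.
    def retrieve(self, x):
        if self.delete(x) != ERROR_CODE:
            return self.add(x)
        return ERROR_CODE
```

Note that, exactly as in the pseudocode, Update (X, Y) returns an error when Y already exists, but only after X has already been deleted.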
Hierarchical
» Links but no cycles (hierarchy)
Relational
» Data Independence
Object-Oriented
» Entity Abstraction
Figure: INDEX pointing to a Posting List
Query:
find all occurrences of the name 'Hank' (a value of the name attribute) in the database:
Hank worked:
» IBM
» Intel
» Bellcore
» Harris
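A posting list maps each attribute value to the records that contain it, so the 'Hank' query becomes one lookup instead of a scan of every record. The following is a minimal Python sketch. The record ids, the `employer` column, and the extra record for Sue are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (record_id, name, employer)
records = [
    (1, "Hank", "IBM"),
    (2, "Hank", "Intel"),
    (3, "Sue", "IBM"),
    (4, "Hank", "Bellcore"),
    (5, "Hank", "Harris"),
]

def build_posting_lists(rows, column):
    """Map each value of `column` to the sorted list of record ids holding it."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row[0])
    for ids in index.values():
        ids.sort()
    return dict(index)

# Index on the name column (position 1), then follow Hank's posting list.
name_index = build_posting_lists(records, column=1)
by_id = {row[0]: row for row in records}
hank_employers = [by_id[rid][2] for rid in name_index.get("Hank", [])]
```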
Network Model
Figure: Hank linked by Graduated to MIT, IIT, and Michigan, and by Worked to IBM, Intel, Bellcore, and Harris.
Figure (DBMS architecture), components as drawn:
User Query
Query Optimizer
Operators
File Manager
Buffer Manager
Disk Space Manager
DB
Concurrency Control & Crash Recovery: Transaction Manager, Lock Manager, Recovery Manager
Table = Relation
Row = Tuple
Column = Attribute
Formal Definition
If R is the set of attributes (columns), commonly
referred to as the schema, then
r(R) is a set of tuples (rows) over R, each tuple mapping the
attributes of R to values; r(R) is commonly referred to as an instance.
Key: (Student#, Degree). This assumes a person holds at most one degree of each type.
SELECT
SELECT (σ) - extract tuples (rows) from a relation
PROJECT (π) - extract attributes (columns) from a relation, e.g., πstudent#, degree(SG):
Student # Degree
1 Bachelors
1 Masters
1 Doctorate
2 Bachelors
2 Masters
3 Bachelors
4 Bachelors
4 Masters
4 Doctorate
5 Bachelors
6 Associates
6 Bachelors
Student # GPA
1 4.0
1 4.0
1 4.0
2 3.8
2 3.0
3 2.1
4 3.5
4 3.4
4 3.9
5 3.6
6 4.0
6 3.1
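SELECT keeps whole tuples that satisfy a predicate. PROJECT keeps only the chosen attributes and, because a relation is a set, drops duplicate rows. As a result, the three "1 4.0" rows in the GPA listing above collapse to one under set semantics. The following is a minimal Python sketch over a few rows of SG. It assumes that each degree pairs with the GPA listed in the same position of the two tables above. The helper names `select` and `project` are hypothetical.

```python
# A few rows of SG, pairing each degree with the GPA listed in the same
# position of the two tables above (an assumption about how they line up).
SG = [
    {"student#": 1, "degree": "Bachelors", "gpa": 4.0},
    {"student#": 1, "degree": "Masters", "gpa": 4.0},
    {"student#": 1, "degree": "Doctorate", "gpa": 4.0},
    {"student#": 2, "degree": "Bachelors", "gpa": 3.8},
    {"student#": 2, "degree": "Masters", "gpa": 3.0},
    {"student#": 3, "degree": "Bachelors", "gpa": 2.1},
]

def select(relation, predicate):
    """SELECT (sigma): keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """PROJECT (pi): keep only the named attributes, dropping duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

bachelors = select(SG, lambda t: t["degree"] == "Bachelors")
gpas = project(SG, ["student#", "gpa"])
```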
Undergrad (USG)        Key: (Student#, Degree)
Student#  GPA  Degree
1         4.0  Bachelors
2         3.8  Bachelors
3         2.1  Bachelors
4         3.5  Bachelors
5         3.6  Bachelors
6         4.0  Associates
6         3.1  Bachelors
Set Theoretic Operations:
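USG UNION GSG: list all information on all students
USG INTERSECTION GSG: list all students who were both graduate and undergraduate students
USG DIFFERENCE GSG: list all students who were undergraduate but not graduate students
Only Undergrads (OU), by Student#: 3, 5, 6
Figure (Venn diagram): GU (graduate and undergraduate) = {1, 2, 4}; OU (only undergraduate) = {3, 5, 6}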
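The three set operations can be run in SQL with SQLite, which spells DIFFERENCE as EXCEPT (some systems use MINUS instead). The following is a minimal sketch. USG copies the table above. The GSG rows are an assumption: they are taken from the Masters and Doctorate entries of the degree table. Student numbers are compared without degrees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE USG ("student#" INTEGER, gpa REAL, degree TEXT)')
conn.execute('CREATE TABLE GSG ("student#" INTEGER, degree TEXT)')
conn.executemany("INSERT INTO USG VALUES (?, ?, ?)", [
    (1, 4.0, "Bachelors"), (2, 3.8, "Bachelors"), (3, 2.1, "Bachelors"),
    (4, 3.5, "Bachelors"), (5, 3.6, "Bachelors"), (6, 4.0, "Associates"),
    (6, 3.1, "Bachelors")])
# Assumed graduate rows: the Masters/Doctorate entries of the degree table.
conn.executemany("INSERT INTO GSG VALUES (?, ?)", [
    (1, "Masters"), (1, "Doctorate"), (2, "Masters"),
    (4, "Masters"), (4, "Doctorate")])

def student_numbers(sql):
    """Run a set query and return the sorted student numbers."""
    return sorted(row[0] for row in conn.execute(sql))

union = student_numbers('SELECT "student#" FROM USG UNION SELECT "student#" FROM GSG')
both = student_numbers('SELECT "student#" FROM USG INTERSECT SELECT "student#" FROM GSG')
only_ug = student_numbers('SELECT "student#" FROM USG EXCEPT SELECT "student#" FROM GSG')
```

The results agree with the figure: GU = {1, 2, 4} and OU = {3, 5, 6}.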
Ex: Find the min, max, and average quantity of all wheels.
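One possible answer uses the aggregate functions MIN, MAX, and AVG. The following is a minimal sketch in SQLite. It assumes a PARTS table with columns p#, name, and qty, assumes that wheels are the rows whose name is 'Wheel', and uses made-up rows. "p#" is double-quoted because # is not a portable identifier character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE PARTS ("p#" INTEGER PRIMARY KEY, name TEXT, qty INTEGER)')
conn.executemany(
    "INSERT INTO PARTS VALUES (?, ?, ?)",
    [(1, "Wheel", 10), (2, "Wheel", 30), (3, "Bolt", 500), (4, "Wheel", 20)],
)

# Aggregate functions over the rows that describe wheels.
low, high, mean = conn.execute(
    "SELECT MIN(qty), MAX(qty), AVG(qty) FROM PARTS WHERE name = 'Wheel'"
).fetchone()
print(low, high, mean)  # 10 30 20.0
```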
SELECT *
FROM PARTS
WHERE _____ LIKE _______ OR
_____ IN (_,_,_) OR
______ BETWEEN __ AND __
ORDER BY ____ DESC
SELECT *
FROM PARTS
WHERE name LIKE 'W%' OR
p# IN (2,4,8) OR
p# BETWEEN 11 AND 15
ORDER BY qty DESC
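The filled-in query can be run against a few hypothetical PARTS rows in SQLite. The sketch selects name and qty instead of * only so that the result is easy to read. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, while some other systems compare case-sensitively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE PARTS ("p#" INTEGER PRIMARY KEY, name TEXT, qty INTEGER)')
conn.executemany(
    "INSERT INTO PARTS VALUES (?, ?, ?)",
    [(1, "Wheel", 10), (2, "Bolt", 500), (3, "Nut", 40), (4, "Axle", 5),
     (8, "Widget", 25), (12, "Seat", 70), (20, "Gear", 15)],
)

query = """
SELECT name, qty
FROM PARTS
WHERE name LIKE 'W%' OR
      "p#" IN (2, 4, 8) OR
      "p#" BETWEEN 11 AND 15
ORDER BY qty DESC
"""
# Nut (p# 3) and Gear (p# 20) match none of the three conditions.
rows = conn.execute(query).fetchall()
```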
Ex: List all parts that would have a quantity greater than
50, if the quantity was increased by 25 percent.
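An arithmetic expression can appear directly in the WHERE clause, so one possible answer tests qty * 1.25 > 50 without changing the stored data. The following is a minimal SQLite sketch with hypothetical rows. A quantity of exactly 40 becomes 50.0, which is not greater than 50.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE PARTS ("p#" INTEGER PRIMARY KEY, name TEXT, qty INTEGER)')
conn.executemany(
    "INSERT INTO PARTS VALUES (?, ?, ?)",
    [(1, "Wheel", 10), (2, "Bolt", 45), (3, "Nut", 40), (4, "Axle", 41)],
)

# The 25 percent increase is computed on the fly; PARTS itself is unchanged.
rows = conn.execute(
    'SELECT name, qty FROM PARTS WHERE qty * 1.25 > 50 ORDER BY "p#"'
).fetchall()
```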
department   AVG(salary)
Sales        300
Marketing    425
Finance      800
Service      550
Group By (continued)
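If a WHERE clause exists, it is applied before grouping: rows that fail the WHERE condition are discarded before the groups (and their aggregates) are formed.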
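The ordering of WHERE and GROUP BY can be checked in SQLite. The following is a minimal sketch with a hypothetical EMPLOYEE(name, department, salary) table. The rows are invented so that the filtered averages match the table above. Excluding the 1000 salary changes the Sales average, which shows that WHERE runs before the groups are formed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (name TEXT, department TEXT, salary REAL)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("Ann", "Sales", 200), ("Bob", "Sales", 400), ("Flo", "Sales", 1000),
    ("Cy", "Finance", 800), ("Di", "Service", 550), ("Ed", "Marketing", 425)])

# WHERE removes Flo's row first; the groups are formed from what is left.
filtered = conn.execute("""
SELECT department, AVG(salary)
FROM EMPLOYEE
WHERE salary < 900
GROUP BY department
ORDER BY department
""").fetchall()

# Without the WHERE clause every row takes part in its group's average.
unfiltered = dict(conn.execute(
    "SELECT department, AVG(salary) FROM EMPLOYEE GROUP BY department"))
```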
One to many:
One parent may have many children
Many to one:
Many people may attend a single meeting
Many to Many:
Students graduate from multiple colleges
Each college graduates many students
Single Relation Design
It is tempting to try to stuff all multi-valued relationships
into a single relation:
EMPLOYEE                COLLEGE
emp#  name  salary      emp#  name
1     Fred  200         1     Harvard

EMPLOYEE
emp#  name   salary
1     Fred   200
2     Ethel  300
3     Mike   400

COLLEGE
emp#  name      location
1     Harvard   Boston
2     IIT       Chicago
2     Michigan  Ann Arbor
3     MIT       Boston
3     Stanford  Stanford
3     IIT       Chicago
SELECT b.name
FROM EMPLOYEE a, COLLEGE b, ATTENDS c
WHERE a.emp# = c.emp# AND
      b.col# = c.col# AND
      a.name = 'Mike'
Result:
name
Michigan
MIT
Stanford
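The three-way join can be reproduced in SQLite. ATTENDS links EMPLOYEE and COLLEGE instead of repeating employee data in COLLEGE. The following is a minimal sketch. The column types and all rows are hypothetical; the rows were chosen so that Mike's result matches the one above. The # columns are double-quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE ("emp#" INTEGER PRIMARY KEY, name TEXT, salary REAL);
CREATE TABLE COLLEGE  ("col#" INTEGER PRIMARY KEY, name TEXT, location TEXT);
CREATE TABLE ATTENDS  ("emp#" INTEGER REFERENCES EMPLOYEE,
                       "col#" INTEGER REFERENCES COLLEGE,
                       PRIMARY KEY ("emp#", "col#"));
INSERT INTO EMPLOYEE VALUES (1, 'Fred', 200), (2, 'Ethel', 300), (3, 'Mike', 400);
INSERT INTO COLLEGE VALUES (1, 'Harvard', 'Boston'), (2, 'IIT', 'Chicago'),
    (3, 'Michigan', 'Ann Arbor'), (4, 'MIT', 'Boston'), (5, 'Stanford', 'Stanford');
INSERT INTO ATTENDS VALUES (1, 1), (2, 2), (2, 3), (3, 3), (3, 4), (3, 5);
""")

# Join each employee to the colleges attended through the ATTENDS link table.
mike_colleges = [row[0] for row in conn.execute("""
SELECT b.name
FROM EMPLOYEE a, COLLEGE b, ATTENDS c
WHERE a."emp#" = c."emp#" AND
      b."col#" = c."col#" AND
      a.name = 'Mike'
""")]
```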
Ex: SELECT *
FROM EMPLOYEE
WHERE emp# IN (1,2,3,4,5,7)
SELECT *
FROM EMPLOYEE a
WHERE EXISTS (SELECT c.emp#
FROM ATTENDS c
WHERE c.emp# = a.emp#)
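The EXISTS subquery is correlated: it is evaluated once for each EMPLOYEE row a, and the row is kept when the subquery returns at least one row. The following is a minimal SQLite sketch with hypothetical rows. It also runs an equivalent IN formulation. Ethel appears only once even though she attends two colleges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE ("emp#" INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ATTENDS  ("emp#" INTEGER, "col#" INTEGER);
INSERT INTO EMPLOYEE VALUES (1, 'Fred'), (2, 'Ethel'), (3, 'Mike'), (4, 'Lucy');
INSERT INTO ATTENDS VALUES (1, 1), (2, 2), (2, 3), (3, 4);
""")

# Correlated subquery: re-evaluated for each EMPLOYEE row a.
with_exists = [r[0] for r in conn.execute("""
SELECT a.name
FROM EMPLOYEE a
WHERE EXISTS (SELECT c."emp#"
              FROM ATTENDS c
              WHERE c."emp#" = a."emp#")
ORDER BY a."emp#"
""")]

# Equivalent formulation with IN and an uncorrelated subquery.
with_in = [r[0] for r in conn.execute("""
SELECT name FROM EMPLOYEE
WHERE "emp#" IN (SELECT "emp#" FROM ATTENDS)
ORDER BY "emp#"
""")]
```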
Note: In an INSERT statement, if the optional column list is omitted, the values must be
listed in the order in which the columns were originally defined.
UPDATE EMPLOYEE
SET salary = salary * 1.10
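The INSERT note and the UPDATE example can be checked together in SQLite. The following is a minimal sketch with a hypothetical EMPLOYEE(emp#, name, salary) table. Salaries are stored as REAL, so the sketch compares them with a small tolerance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE EMPLOYEE ("emp#" INTEGER PRIMARY KEY, name TEXT, salary REAL)')

# No column list: values must follow the definition order (emp#, name, salary).
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Fred', 200)")
# With a column list, the values follow the listed order instead.
conn.execute("""INSERT INTO EMPLOYEE (salary, name, "emp#") VALUES (300, 'Ethel', 2)""")

# Give every employee a 10 percent raise (no WHERE clause, so all rows change).
conn.execute("UPDATE EMPLOYEE SET salary = salary * 1.10")
salaries = dict(conn.execute("SELECT name, salary FROM EMPLOYEE"))
```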
» Create Table
» Drop Table
» Create Index
» Drop Index
» GRANT
» REVOKE
» ALTER TABLE
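Fixed-length record (FIXED):      200 | Hank | FILL | 252.35
Variable-length record (VARYING): 200 | 4 | Hank | 252.35
(FILL pads the name to its fixed width; 4 is the length of "Hank".)
Figure (storage layout): the FIXED column holds tuple 1 through tuple 5, one per line; the VARCHAR column holds tuple 1 and tuple 2, then tuple 2 (cont) and tuple 3, then tuple 3 (cont), tuple 4, and tuple 5, so a variable-length tuple may continue onto the next line.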
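The two record layouts can be sketched with Python's struct module. The following is an illustration under assumptions, not an actual DBMS page format. The 10-byte name width (`NAME_WIDTH`), the 4-byte integer, the 1-byte length prefix, and the 8-byte float are all hypothetical choices. The fixed record pads the name with NUL bytes (the FILL), and the varying record stores the name's length before the name.

```python
import struct

NAME_WIDTH = 10  # hypothetical fixed width reserved for the name field

def pack_fixed(emp_no, name, amount):
    # int32 | name padded with NUL bytes to NAME_WIDTH (the FILL) | float64
    return struct.pack(f"<i{NAME_WIDTH}sd", emp_no, name.encode("ascii"), amount)

def unpack_fixed(buf):
    emp_no, raw, amount = struct.unpack(f"<i{NAME_WIDTH}sd", buf)
    return emp_no, raw.rstrip(b"\x00").decode("ascii"), amount

def pack_varying(emp_no, name, amount):
    # int32 | 1-byte length prefix | name bytes | float64
    data = name.encode("ascii")
    return struct.pack(f"<iB{len(data)}sd", emp_no, len(data), data, amount)

def unpack_varying(buf):
    emp_no, length = struct.unpack_from("<iB", buf, 0)
    name = buf[5:5 + length].decode("ascii")
    (amount,) = struct.unpack_from("<d", buf, 5 + length)
    return emp_no, name, amount

fixed = pack_fixed(200, "Hank", 252.35)      # always 4 + 10 + 8 = 22 bytes
varying = pack_varying(200, "Hank", 252.35)  # 4 + 1 + 4 + 8 = 17 bytes
```

The fixed layout wastes the padding bytes but lets every record start at a computable offset. The varying layout saves space, but records no longer line up in fixed slots.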
SELECT *
FROM EMPLOYEE
WHERE salary IS NULL
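IS NULL is required because any comparison with NULL, including salary = NULL, evaluates to unknown rather than true, so no row would qualify. The following is a minimal SQLite sketch with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE EMPLOYEE ("emp#" INTEGER, name TEXT, salary REAL)')
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [(1, "Fred", 200), (2, "Ethel", None), (3, "Mike", 400)])

is_null = [r[0] for r in conn.execute("SELECT name FROM EMPLOYEE WHERE salary IS NULL")]
# "= NULL" compares to an unknown value, so no row ever qualifies.
eq_null = [r[0] for r in conn.execute("SELECT name FROM EMPLOYEE WHERE salary = NULL")]
```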
Ex: Create a view on the EMPLOYEE relation such that the salary
attribute is omitted.
CREATE VIEW V1 AS
(SELECT num, name FROM EMPLOYEE)
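A view is a stored query, so users who read V1 never see the salary column. The following is a minimal SQLite sketch with hypothetical rows. The sketch writes the view body without the surrounding parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (num INTEGER PRIMARY KEY, name TEXT, salary REAL);
INSERT INTO EMPLOYEE VALUES (1, 'Fred', 200), (2, 'Ethel', 300);
CREATE VIEW V1 AS SELECT num, name FROM EMPLOYEE;
""")

cursor = conn.execute("SELECT * FROM V1 ORDER BY num")
columns = [d[0] for d in cursor.description]  # column names exposed by the view
rows = cursor.fetchall()
```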
Ex: Give John access to all EMPLOYEE data and ensure that Mary
and Sue may not look at employee salaries.
Normalization
» Refine the design
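Figure (ER diagram): entity Employee with attributes SSN, Name, and Salary.
Figure (relationship cardinalities):
Dept-Emp: DEPARTMENT (1) to EMPLOYEE (M)
Emp-Sp: EMPLOYEE (1) to SPOUSE (1)
Sup-part: SUPPLIER (M) to PARTS (M)
loan-payment: LOAN (1) to PAYMENT (M)
Figure (relationships with cardinalities and attributes):
Supl-part: Supplier (1) to Parts (M), with relationship attributes qty and date
Emp-Sp: Employee (1) to Spouse (1)
Emp-Dept: Employee (M) to Department (M)
loan-payment: Loan (1) to Payment (M)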
OR:
IS-A (figure: subclasses Full Time, with attribute Salary, and Part Time, with attribute Hourly-rate)
Full Time (ssn, salary, start-date, name, phone)
Part Time (ssn, hourly-rate, start-date, name, phone)
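Figure (ternary relationship): E-D-P links Employee (M), Department (M), and Project (M); Project has attribute pname.
Figure (aggregation): Emp-Dept links Employee (M) and Department (M); the relationship active, with attribute date, links Emp-Dept (M) to Project (1); Project has attributes pno and pname.
Employee (ssn, name, salary)
Department (dname, city)
Project (p-no, pname)
Emp-Dept (ssn, dname)
Active (ssn, dname, pno, date)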
• Academic Structure:
• Colleges – containing many departments, headed by a Dean, etc.
• Departments – containing many labs, faculty, courses,
students, headed by a Chair, etc.
• Classes:
• Locations, prerequisites, offerings, professors
• Personnel:
• Names, social security numbers, children, offices, phone
numbers, email addresses, salary, etc.
• Cafeteria Offerings:
• Prices, item selection based on meal (breakfast, lunch, and
dinner), purchases (date & cost), location, manager, etc.
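Figure (university ER diagram): entities College, Department, Lab, Faculty, Children, Student, and Course, connected by the relationships DC (Department-College), DL (Department-Lab), DS (Department-Student), DF (Department-Faculty), FC (Faculty-Children), CS (Course-Student, with attributes term and Sec#), CF (Course-Faculty, with attributes term, loc, and Sec#), CD (Course-Department), and CP (Course prerequisites).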
Design Relation Definition
Relations (Academic)
» College (college#, dean, college_nm)
» Dept (dept#, chair, dept_nm)
» Lab (lab_nm, bld#, room#, capacity)
» Student (ssn, first_nm, last_nm)
» Faculty (f_ssn, first_nm, last_nm, salary, phone, email)
» Course (c#, course_nm, credit_hrs)
Relationships (Academic)
» DC (dept#, college#)
» DL (lab_nm, dept#)
» DS (ssn, dept#) /* ssn of student */
» DF (dept#, f_ssn) /* ssn of faculty */
» CF (c#, f_ssn, term, sec#, loc)
» CD (c#, dept#)
» CS (c#, ssn, term, sec#)
» CP (course#, prereq#) /* Prereqs of Courses */
Relations (Personnel)
» Children (ssn, f_nm, l_nm, age)
Relationship (Personnel)
» FC (f_ssn, ssn) /* ssn of faculty and child */
Cafeteria
» Meal (meal#, time)
» Café (café#, address, phone#)
» … etc.
Sample Query 1
What are the common meal offerings that are available for
both lunch and dinner?
SELECT A.meal#
FROM Meal A, Meal B
WHERE (A.meal# = B.meal#
AND A.time = 'Lunch'
AND B.time = 'Dinner')
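The self-join pairs two copies of Meal so that one row can be checked for Lunch and the other for Dinner. The following is a minimal SQLite sketch with hypothetical rows. It assumes that (meal#, time) together form the key, because a meal offered at two times needs two rows. For comparison, the sketch also runs the same question as an INTERSECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Meal ("meal#" INTEGER, time TEXT, PRIMARY KEY ("meal#", time));
INSERT INTO Meal VALUES (1, 'Breakfast'), (2, 'Lunch'), (2, 'Dinner'),
                        (3, 'Lunch'), (4, 'Dinner'), (4, 'Lunch'), (5, 'Dinner');
""")

# Self-join: A supplies the Lunch row, B the Dinner row of the same meal.
common = [r[0] for r in conn.execute("""
SELECT A."meal#"
FROM Meal A, Meal B
WHERE A."meal#" = B."meal#"
  AND A.time = 'Lunch'
  AND B.time = 'Dinner'
ORDER BY A."meal#"
""")]

# Same question as a set intersection.
via_intersect = sorted(r[0] for r in conn.execute("""
SELECT "meal#" FROM Meal WHERE time = 'Lunch'
INTERSECT
SELECT "meal#" FROM Meal WHERE time = 'Dinner'
"""))
```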
Figure: nested normal forms, 1NF containing 2NF containing 3NF (every 3NF relation is also in 2NF and 1NF)
Functional Dependency:
X => Y holds if, whenever two tuples have the same value
of X, they also have the same value of Y.
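The definition translates directly into a check over one relation instance. Note that an instance can only refute an FD; the dependency itself is a statement about every legal instance. The following is a minimal sketch. The helper `fd_holds` and the supplier rows are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return False as soon as two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical supplier rows: S#, SNAME, STATUS, CITY.
suppliers = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
]
key_fd = fd_holds(suppliers, ["S#"], ["SNAME", "STATUS", "CITY"])  # True
city_fd = fd_holds(suppliers, ["CITY"], ["STATUS"])                # False: Paris has 10 and 30
```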
Figure (functional dependencies): arrows from the key S# to each of SNAME, STATUS, and CITY.
An arrow always exists from the primary key to each non-key
attribute. Problems usually arise when other arrows are
present. Normalization may be informally defined as the
process of removing these extra arrows.
1. X => YZ Given
2. YZ => Y Reflexivity
3. X => Y Transitivity 1, 2
4. YZ => Z Reflexivity
5. X => Z Transitivity 1, 4
1. X => Y Given
2. X => Z Given
3. XY => YZ Augment Y on 2
4. X => XY Augment X on 1
5. X => YZ Transitivity 4, 3
1. X => Y Given
2. WY => Z Given
3. XW => YW Augment W on 1
4. XW => Z Transitivity 3, 2
1. AB => E Given
2. BE => I Given
3. E => G Given
4. GI => H Given
5. AB => G Transitivity 1, 3
6. AB => BE Augment B to 1
7. AB => I Transitivity 6, 2
8. AB => GI Additivity 5, 7
9. AB => H Transitivity 8, 4
10. AB => GH Additivity 5, 9
R = (A, B, C, D)
FD {A=>B, C=>A, C=>D}
A+ = A, B
B+ = B
C+ = C, A, D, B
D+ = D
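Attribute closure repeatedly applies every FD whose left-hand side is already covered until nothing new is added. The following is a minimal Python sketch. It assumes single-letter attribute names, so that a string such as "AB" can stand for the set {A, B}. It reproduces the closures listed above, and it confirms the earlier derivation, since AB+ contains G and H.

```python
def closure(attributes, fds):
    """Attribute closure X+: repeatedly apply FDs whose left side is covered."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Single-letter attribute names, so a string such as "AB" stands for {A, B}.
FDS_R = [("A", "B"), ("C", "A"), ("C", "D")]
FDS_EXAMPLE = [("AB", "E"), ("BE", "I"), ("E", "G"), ("GI", "H")]
```

Because C+ contains every attribute of R, C is a key of R.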
Deletion Anomaly
» Deletion of a tuple describing a particular student (S5) eliminates
additional valid information (P5 exists)
Update Anomaly
» The Goal value appears many times for the same student and is
redundant.
STUDENTS
» Facts about the individual students
PROFESSOR-STUDENTS
» Facts about where a professor and a student first met
Deletion Anomaly
» Some deletes will not only eliminate the fact that an employee
exists in a given location (employee number 6), but will also
remove the information about the expertise in a city (London and
accounting).
Update Anomaly
» Experience is redundant
OFFICE (Key: City)
City        Experience
Chicago     Accounting
Washington  Computer Science
New York    Physics
London      Accounting

EMPLOYEE
Emp#  Age  City
1     30   Chicago
2     45   Washington
3     52   New York
4     44   Washington
5     50   Chicago
6     52   London
7     49   Washington
» Deletion Anomaly:
– If S5 leaves the university, P5 is dismissed!!!
– (You may like it, but as a professor, I do not!!!)
F+ ≠ (Fx ∪ Fy)+, so this decomposition does not preserve dependencies.
Have to join to verify FD: sid, program => degree
Minimal Cover
Create the smallest set of functional dependencies by:
• Removing redundant dependencies that follow transitively through a non-key attribute:
{ssn => program, degree; program => degree} becomes {ssn => program; program => degree}
• Taking the union of all functional dependencies that have the same left-hand side:
{ssn => DOB; ssn => name} becomes {ssn => DOB, name}
• Removing an extra attribute from the left-hand side of an FD if the FD is still valid
after that attribute is removed:
{ssn => address; ssn, name => address} becomes {ssn => address}
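The three steps can be sketched in Python. The sketch uses the conventional textbook order, which differs from the order of the bullets above: split right-hand sides, drop extraneous left-hand-side attributes, drop redundant FDs, and then union FDs that share a left-hand side. On the three examples above it produces the same results. The function names are hypothetical, and FDs are given as (set, set) pairs.

```python
def closure(attrs, fds):
    """Attribute closure under FDs given as (frozenset lhs, frozenset rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: one attribute on each right-hand side.
    cover = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: drop left-hand-side attributes the FD does not need.
    reduced = []
    for lhs, rhs in cover:
        lhs = set(lhs)
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, cover):
                lhs.discard(attr)
        reduced.append((frozenset(lhs), rhs))
    cover = list(dict.fromkeys(reduced))  # remove duplicates, keep order
    # Step 3: drop FDs implied by the remaining ones (e.g., transitive ones).
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if rhs <= closure(lhs, rest):
            cover = rest
        else:
            i += 1
    # Step 4: union FDs that share a left-hand side.
    merged = {}
    for lhs, rhs in cover:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return merged

transitive = minimal_cover([({"ssn"}, {"program", "degree"}), ({"program"}, {"degree"})])
unioned = minimal_cover([({"ssn"}, {"DOB"}), ({"ssn"}, {"name"})])
extraneous = minimal_cover([({"ssn"}, {"address"}), ({"ssn", "name"}, {"address"})])
```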
Decomposition to 3NF using
Minimal Cover
If relation R is not in 3NF then:
Create the minimal cover of the FDs on R.
Create a relation Ri for each FD in the minimal cover.
If the key of relation R is not entirely included in
any of the relations Ri, then create one more relation
with that key.
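The synthesis steps can be sketched in Python. The sketch assumes that the input is already a minimal cover with FDs merged by left-hand side, given as a dict from left-hand side to right-hand side. It finds one candidate key greedily, which is enough for the final check. Attribute names are single letters, and the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under FDs given as (frozenset lhs, frozenset rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_key(attributes, fds):
    """Greedily drop attributes while the remainder still determines R."""
    key = set(attributes)
    for attr in sorted(attributes):
        if closure(key - {attr}, fds) >= set(attributes):
            key.discard(attr)
    return frozenset(key)

def synthesize_3nf(attributes, cover):
    # One relation Ri per FD in the (merged) minimal cover.
    relations = [frozenset(lhs | rhs) for lhs, rhs in cover.items()]
    # Add a relation holding the key if no Ri contains it entirely.
    key = candidate_key(attributes, list(cover.items()))
    if not any(key <= rel for rel in relations):
        relations.append(key)
    return relations

# R = (A, B, C, D) with {A=>B, C=>A, C=>D}: the key C already sits in (A, C, D).
r1 = synthesize_3nf(frozenset("ABCD"), {frozenset("A"): frozenset("B"),
                                         frozenset("C"): frozenset("AD")})
# R = (A, B, C) with {A=>B}: key AC is not covered, so (A, C) is added.
r2 = synthesize_3nf(frozenset("ABC"), {frozenset("A"): frozenset("B")})
```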
Query Tree
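» Structure that corresponds to a relational algebra
expression, representing the input relations as leaf nodes and the
relational algebra operations as internal nodes.
Initial query tree will JOIN the three relations first and then perform
the selections and projections. O(e · w · p) tuples will be accessed, where e, w, and p are the numbers of tuples in EMPLOYEE, WORKS-ON, and PROJECTS.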
Figure (initial query tree):
Join (p# = pno)
    PROJECTS
    Join (essn = ssn)
        WORKS-ON
        EMPLOYEE
Migrate SELECT
Step 1: Reduce the size of each join by computing SELECT early in the process.
Figure (query tree): PROJECT (lname) at the root, above Join (p# = pno) and Join (essn = ssn).
Both pno and ssn are key attributes. Thus, they both will
yield only one tuple to be retrieved.
Step 2: Order joins according to lower input and temporary result relation sizes.
Figure (reordered query tree): Join (essn = ssn), Join (p# = pno), PROJECT (essn), PROJECT (ssn, lname), and PROJECTS.
Overview of Key Rules
Partition each SELECT on a conjunction (A AND B AND C) into a cascade of
SELECT (A), SELECT (B), and SELECT (C)
Move each SELECT as far down the tree as possible
Rearrange leaf nodes so that the relations yielding the smallest answer sets are
processed first. The selectivity estimates used to judge this are typically
found in the system catalog.
Combine a Cartesian product followed by a SELECT on the joining
condition into a single JOIN
Move each PROJECT as far down the tree as possible
Identify sub-trees that represent groups of operations that
may be executed by a single access routine.
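The payoff of pushing SELECT down can be illustrated by counting tuple visits on toy data. The following is a rough Python sketch, not a cost model. The relation sizes, the pname 'P2' condition, and the use of Python sets as hash-join lookups are all hypothetical. The naive plan touches e · w · p combinations, and the pushed-down plan reads each relation once.

```python
# Hypothetical toy relations (sizes chosen only for illustration).
EMPLOYEE = [{"ssn": s, "lname": f"L{s}"} for s in range(1, 21)]   # e = 20
WORKS_ON = [{"essn": s, "pno": s % 5 + 1} for s in range(1, 21)]  # w = 20
PROJECTS = [{"p#": p, "pname": f"P{p}"} for p in range(1, 11)]    # p = 10

def naive_plan(pname):
    """Cartesian product of all three relations, then every selection."""
    examined, result = 0, set()
    for e in EMPLOYEE:
        for w in WORKS_ON:
            for p in PROJECTS:
                examined += 1
                if (w["essn"] == e["ssn"] and w["pno"] == p["p#"]
                        and p["pname"] == pname):
                    result.add(e["lname"])
    return result, examined

def pushed_down_plan(pname):
    """SELECT on PROJECTS first, then join the small intermediate results."""
    examined = len(PROJECTS)
    pnos = {p["p#"] for p in PROJECTS if p["pname"] == pname}
    examined += len(WORKS_ON)
    essns = {w["essn"] for w in WORKS_ON if w["pno"] in pnos}     # PROJECT (essn)
    examined += len(EMPLOYEE)
    result = {e["lname"] for e in EMPLOYEE if e["ssn"] in essns}  # PROJECT (lname)
    return result, examined

naive = naive_plan("P2")        # 20 * 20 * 10 = 4000 combinations examined
fast = pushed_down_plan("P2")   # 10 + 20 + 20 = 50 tuples examined
```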
Explain
Many commercial systems provide a utility to identify the
path the optimizer will choose for a given query.
Typically the utility is referred to as EXPLAIN; its exact
syntax varies from system to system.
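SQLite, for example, provides this as EXPLAIN QUERY PLAN. The following is a minimal sketch through Python's sqlite3 module. The table and the index name emp_name are hypothetical. The exact wording of the plan text differs between SQLite versions, so the sketch checks only for keywords: a full SCAN before the index exists, and a search through emp_name afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE EMPLOYEE ("emp#" INTEGER, name TEXT, salary REAL)')

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM EMPLOYEE WHERE name = 'Hank'"
before = plan(query)   # no index yet: the optimizer must scan the table
conn.execute("CREATE INDEX emp_name ON EMPLOYEE (name)")
after = plan(query)    # the optimizer can now search through emp_name
```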
Figure (timeline): Checkpoint, Failure
Resilience to Failure
Power Failure (only)
» Undo and Redo processing
Data Disk Failure
» The last good data archive is restored, and all logs written since that
archive was taken are re-processed to REDO all transactions lost on the
bad disk.
Log Disk Failure
» This is rare, but the only good means of avoiding this problem is
to use dual logs in which all log writes are duplicated.
» Without dual logging it is necessary to restore back to the last
good data archive and all transactions since then are lost.
If s1, s3, s4, s5, s6 execute, Y now has a value that will be incorrect if T1 does
not execute s2, but instead rolls back.
<DOC>
<DOCNO> AP881214-0028 </DOCNO>
<FILEID>AP-NR-12-14-88 0117EST</FILEID>
<FIRST>u i BC-Japan-Stocks 12-14 0027</FIRST>
<SECOND>BC-Japan-Stocks,0026</SECOND>
<HEAD>Stocks Up In Tokyo</HEAD>
<DATELINE>TOKYO (AP) </DATELINE>
<TEXT>
The Nikkei Stock Average closed at 29,754.73 points
up 156.92 points on the Tokyo Stock Exchange Wednesday.
</TEXT>
</DOC>
INDEX
DocID  Termcnt  Term
28     1        nikkei
28     2        stock
28     1        average
28     1        closed
28     2        points
28     1        up
28     1        tokyo
28     1        exchange
28     1        wednesday

TERM
Term       df     idf
average    2265   1.08
closed     2208   1.08
exchange   2790   1.00
nikkei     234    2.07
points     1627   1.23
stock      2674   1.00
tokyo      725    1.58
up         12746  0.30
wednesday  6417   0.60
QUERY
TERM      TERMCNT
nikkei    1
stock     2
exchange  2
american  1
ORIGINAL QUERY: "nikkei stock exchange american stock exchange"
Inner Product: $\sum_{i=1}^{t} x_i \cdot y_i$
Cosine Coefficient: $\dfrac{\sum_{i=1}^{t} x_i y_i}{\sqrt{\sum_{i=1}^{t} x_i^2 \cdot \sum_{i=1}^{t} y_i^2}}$
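Both measures can be computed for the sample query and document 28 using the term counts from the INDEX and QUERY tables above. The following is a minimal Python sketch. It assumes raw term counts as the weights x_i and y_i, without idf weighting. Only nikkei, stock, and exchange occur in both vectors, so the inner product is 1·1 + 2·2 + 2·1 = 7. The squared lengths are 10 for the query and 15 for the document.

```python
import math

# Raw term counts from the QUERY and INDEX tables (no idf weighting assumed).
query = {"nikkei": 1, "stock": 2, "exchange": 2, "american": 1}
doc28 = {"nikkei": 1, "stock": 2, "average": 1, "closed": 1, "points": 2,
         "up": 1, "tokyo": 1, "exchange": 1, "wednesday": 1}

def inner_product(x, y):
    """Sum of x_i * y_i over terms; absent terms count as zero."""
    return sum(w * y.get(term, 0) for term, w in x.items())

def cosine(x, y):
    """Inner product divided by the product of the two vector lengths."""
    norm = math.sqrt(sum(w * w for w in x.values()) * sum(w * w for w in y.values()))
    return inner_product(x, y) / norm if norm else 0.0

score_ip = inner_product(query, doc28)   # 7
score_cos = cosine(query, doc28)         # 7 / sqrt(10 * 15), about 0.57
```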
Phase 2
» After receiving all acknowledgments, send COMMIT to all sites.
» If not all acknowledgments are received within a
predefined time period, a ROLLBACK is sent to all sites.
» Once the COMMIT is sent, all sites commit the data. If a site
fails before it receives the COMMIT, it will receive the
COMMIT upon restart.
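The coordinator's decision rule can be simulated in-process. The following is a minimal Python sketch with no networking. A down site stands in for a missing acknowledgment (a timeout), and a decision sent to a down site is queued until it restarts. The `Site` class, its states, and the `between_phases` hook that lets a site crash after Phase 1 are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    up: bool = True       # False simulates a crashed site
    state: str = "idle"   # idle -> prepared -> committed / rolled back
    pending: list = field(default_factory=list)  # decisions awaiting restart

    def prepare(self):
        # Phase 1: accept the update tentatively and acknowledge, if alive.
        if not self.up:
            return False
        self.state = "prepared"
        return True

    def deliver(self, decision):
        # Phase 2: a down site receives the decision when it restarts.
        if self.up:
            self.state = decision
        else:
            self.pending.append(decision)

    def restart(self):
        self.up = True
        for decision in self.pending:
            self.state = decision
        self.pending.clear()

def two_phase_commit(sites, between_phases=None):
    # Phase 1: collect acknowledgments; a missing one models a timeout.
    acks = [site.prepare() for site in sites]
    decision = "committed" if all(acks) else "rolled back"
    if between_phases:
        between_phases()  # hook for simulating a crash after Phase 1
    # Phase 2: send the same decision (COMMIT or ROLLBACK) to every site.
    for site in sites:
        site.deliver(decision)
    return decision
```

In the Phase 2 failure case, the crashed site stays "prepared" until it restarts and receives the queued COMMIT, which matches the description above.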
Figure: site A (EMP) and site B (DEPT) each return an Ack to site C (start).
Two-Phased Commit (example)
After Phase 1, site C has received all acknowledgments
and is now ready to send the final COMMIT.
Figure (Phase 2): site C (start) sends COMMIT to site A (EMP) and to site B (DEPT); A and B each return an Ack to C.
Two-Phased Commit
Failure During Phase 1
Consider a case where site B fails after receiving the request
for update but site A succeeds:
Figure (Phase 1): site C (start) sends Update EMP to site A (EMP) and Update DEPT to site B (DEPT); only site A returns an Ack.
Site C receives only one acknowledgment but was
waiting for two, so a rollback is sent to all sites.
Two-Phased Commit
Failure During Phase 2
Consider a case where site B fails after sending the
acknowledgment in Phase 1.
Figure (Phase 2): site C (start) sends COMMIT to site A (EMP) and to site B (DEPT), after having received an Ack from each in Phase 1.
Site B will eventually restart,
receive the COMMIT, and Phase 2 will complete.
Replication
Since 2PC processing is expensive, a cheaper alternative is
to replicate the data so that a copy is available at each site.
Many replication algorithms exist; their common goal is to
propagate each update to all replicas.
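One of the simplest schemes is primary-copy replication. All updates go to the source, which later pushes them to read-only replicas. The following is a minimal Python sketch of that one scheme, not of replication algorithms in general. The class names and the in-memory update log are hypothetical. Until the source propagates its log, a replica serves stale data, which is the price paid for avoiding 2PC.

```python
class ReadOnlyReplica:
    def __init__(self, name):
        self.name = name
        self._rows = {}

    def apply(self, key, value):
        # Only the propagation path changes a replica; clients just read.
        self._rows[key] = value

    def read(self, key):
        return self._rows.get(key)

class Source:
    def __init__(self, replicas):
        self._rows = {}
        self.replicas = replicas
        self.log = []  # updates not yet propagated

    def update(self, key, value):
        self._rows[key] = value
        self.log.append((key, value))

    def read(self, key):
        return self._rows.get(key)

    def propagate(self):
        # Push every logged update to every replica, then clear the log.
        for key, value in self.log:
            for replica in self.replicas:
                replica.apply(key, value)
        self.log.clear()
```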
Figure: the source EMP relation on the source DBMS, with read-only replicas of EMP on replica DBMSs at Site B and Site C.