
Database Fundamentals

Basic & Intermediate

14/01/2009© Lord's Kitchen 1


Icons used

Questions References

Key Concepts Demo

Brain Teasers A Welcome Break



Module Information
Module Description: This module provides students with the basic knowledge and skills needed to understand the need for databases and how to design them.

Level: Basic & Intermediate

Prerequisites: Basic knowledge of data and files


Module Objective & Outline
 Module Objective:
After completing this Module, you should:
• Understand what a Database System is
• Be able to explain briefly the different types of Database Systems
• Be able to create a Database environment with E-R Modeling
• Have a broad overview of Relational Database Management Systems
• Have an introduction to Structured Query Language
• Understand how the DBMS and its host computer system intercommunicate
• Be aware of the new trends in Databases


Module Outline

 Module Flow:


1. What is a Database System
2. Types of Database Systems
3. Creating a Database Environment
4. Structured Query Language
5. Internal Management
6. Database Trends
1.0 Database System
Learning Objective:


At the end of this Topic you will be able to:
• Understand what a Database System is
• Know how files are organized
• Appreciate the advantages of using a DBMS over a traditional file system
• Be aware of the Database Architecture


What is a Database System
A Database System is essentially a computerized
record-keeping system.
A database-management system (DBMS) consists of
a collection of interrelated data and a set of
programs to access those data.
Database systems are designed to manage large volumes of information.


File Organization : Terms and Concepts

Data Hierarchy in a Computer System
• Database: Group of related files
• File: Group of records of the same type
• Record: Group of related fields
• Field: Group of words or a complete number
• Byte: Group of bits that represents a single character
• Bit: Smallest unit of data; binary digit (0, 1)
File Organization : Terms and Concepts

• Entity: Person, place, thing, or event about which information is maintained
• Attribute: Description of a particular entity
• Key Field: Identifier field used to retrieve, update, or sort a record
File Organization : Terms and Concepts

Problems with the Traditional File Environment
• Data redundancy
• Program-data dependence
• Lack of flexibility
• Poor security
• Lack of data sharing and availability
• No concurrency control

Figure: Traditional File Processing
DBMS and its Advantages
• A Database Management System is a collection of programs that enables users to create and maintain a database. It is a general-purpose software system that facilitates the processes of defining, constructing and manipulating databases for various applications.

• Advantages of the Database approach:
  - Controlling redundancy
  - Restricting unauthorized access
  - Providing persistent storage for program objects and data structures
  - Permitting inference and actions using deduction rules
  - Providing multiple user interfaces
  - Representing complex relationships among data
  - Enforcing integrity constraints
  - Providing backup and recovery
Database Management System (DBMS)

DBMS Architecture

DBMS Architecture
Data Independence
• Logical data independence: the capacity to change the conceptual schema without having to change the external schemas.
• Physical data independence: the capacity to change the internal schema without having to change the conceptual schema.
Functions of DBMS
• Data definition:
  - Specifies the content and structure of the database and defines each data element
• Data manipulation:
  - Manipulates data in a database
• Data security and integrity:
  - Monitors user requests and rejects any unauthorized attempts
• Data recovery and concurrency:
  - Enforces certain controls for recovery and concurrency
• Data dictionary:
  - Stores definitions of data elements and data characteristics
• Performance:
  - Functions should be performed efficiently
Requirements of a DBMS



Database System : Recap
• Why do businesses have trouble finding the information
they need in their information systems?
• How does a database management system help businesses
improve the organization of their information?
• What are the advantages of using a DBMS over a traditional file system?
• State the major functions and requirements of a DBMS.


Quiz

If a Customer Database has the following fields: EmpId, EmpName, Salary and DeptName, what would be the ideal Key field, and why?
• EmpId
• EmpName
• DeptName
• EmpId+DeptName


2.0 Types of Databases
Learning Objective:

At the end of this Topic you will be able to:
• Explain briefly the various types of Database Systems
  - Relational DBMS
  - Hierarchical DBMS
  - Network DBMS
  - Object-Oriented Databases


Relational Database Model
• Represents data as two-dimensional tables called relations
• Relates data across tables based on a common data element

Examples: DB2, Oracle, MS SQL Server


Three Basic Operations in a Relational Database

• SELECT
• JOIN
• PROJECT


Hierarchical Database Model
• It is a pointer-based model
• Organizes data in a tree-like structure
• Stores data in tables and views relationships as links
• Supports one-to-many parent-child relationships
• Prevalent in large legacy systems


Network DBMS
• Depicts data logically as many-to-many relationships
• Organizes data in tables and views relationships as links
• It is also a pointer-based model
• Organizes data in arbitrary graphs


Hierarchical and Network DBMS
Some of the Disadvantages
• Outdated
• Complex pointer-based organization
• Less flexible compared to RDBMS
• Lack support for ad hoc and English-like queries


Object-Oriented Databases

• Object-oriented DBMS: Stores data and procedures as objects that can be retrieved and shared automatically
• Object-relational DBMS: Provides the capabilities of both object-oriented and relational DBMS


Types of Databases : Summary
• In a relational database the data is perceived
as tables (and nothing but tables) by the
user
• The relational operators available are used to
manipulate the data in the tables



3.0 Creating a DB environment
Learning Objective:
At the end of this Topic you will:
• Have the ability to model an application system based on the E-R Modeling approach
• Understand Relational Database concepts such as Normalization, Data Integrity, and Relational Operations such as Union and Intersection
• Be able to design Relational Databases based on E-R Models or System Requirements for an application


Introduction to Data Modeling

What is Data Modeling?

A technique for analyzing requirements and for
identifying the information needs of an organization

Why is Data Modeling important?

A good system cannot be built without knowing what data needs to be captured and how it needs to be organized.


Introduction to Data Modeling
An Overview:
• Conceptual representation of the data structures required by a database
• Data structures include the data objects, the associations between data objects, and the rules which govern operations on the objects
• Focuses on what data is required and how it should be organized
• Independent of hardware or software constraints
Data Model and Database Design:
• A Data Model is to a Database what a building plan or blueprint is to a Building
• A Database Design translates a data model into a database
• A Data Model is the conceptual design of a database


E-R Modeling
• Originally proposed by Peter Chen (1976)
• Views the real world as entities and relationships
• Key component is the E-R Diagram
• Most common model used for designing relational databases

• Entity - An identifiable object or concept of significance
• Attribute - Property of an entity or relationship
• Relationship - An association between entities
• Identifier - One or more attributes identifying an instance (occurrence) of an entity


Entity relationship diagram



E-R Modeling
Figure: DEPARTMENT (attributes Dept No., Name) has EMPLOYEE; EMPLOYEE (attributes Name, Emp Id.) works for DEPARTMENT. Callouts mark the Entity, Relationship, Attributes and Identifier.


E-R Modeling
Entity
• Any object or thing of significance about which data needs to be collected and maintained
• Could be
  - Concrete or tangible, like a person or a building
  - Abstract, like a concept or activity
• Analogous to a table in a relational database

Examples: EMPLOYEES, PROJECTS, INVOICES


E-R Modeling

Entity Rules
• Any thing or object may be represented by only one entity. Entities are mutually exclusive in all cases.
• Each entity must be uniquely identifiable. Each instance (occurrence) of an entity must be separate and distinctly identifiable from all other instances of that type of entity.
Entity Classification and Types
• Classified as dependent and independent
• An independent entity is one that does not rely on another for identification
• A dependent entity is one that relies on another for identification
• In some methodologies, an independent entity is called strong and a dependent entity is called weak


E-R Modeling
Entity Classification and Types
• Fundamental entity - An entity that exists and is of interest in its own right. Generally, most entities in the data model are fundamental entities.

Example: Department and Employee are both fundamental entities

Special Entity Types
• Associative entity - Used to associate two entities in order to reconcile a many-to-many relationship
• Sub-type/super-type - Used in generalization hierarchies to represent a subset of instances of their parent entity

E-R Modeling
Example of Associative entity :

Figure: ORDER has ORDER LINE; ORDER LINE belongs to ORDER. ORDER LINE is for an ITEM; ITEM appears on ORDER LINE.


E-R Modeling
• Generalization Hierarchies
Generalization occurs when two or more entities represent categories of the same real-world object.

Example: CAR and TRUCK represent categories of the same entity, VEHICLE. VEHICLE is the super-type; CAR and TRUCK would be the subtypes.


E-R Modeling
• Generalization Hierarchies
A form of abstraction which specifies that two or more entities that share common attributes can be generalized into a higher-level entity type called a super-type or generic entity.

The lower-level entities become the sub-types, or categories, of the super-type. Sub-types are dependent entities.


E-R Modeling
• Generalization Hierarchies
Sub-types can be either mutually exclusive (disjoint) or overlapping (inclusive).
In an overlapping hierarchy, an entity instance can be part of multiple subtypes.
Example: Entity PERSON represents people at a university. It has three subtypes, FACULTY, STAFF, and STUDENT. A STAFF member could also be registered as a STUDENT.

Figure: PERSON (super-type) with subtypes STUDENT, STAFF and FACULTY
E-R Modeling
• Generalization Hierarchies
In a disjoint hierarchy, an entity instance can be in only one subtype.

Example: Entity EMPLOYEE may have two subtypes, CLASSIFIED and WAGES. An employee may be one type or the other but not both.


E-R Modeling
• Generalization Hierarchies - Nested

Figure: PERSON has subtypes STUDENT and FACULTY; STUDENT in turn has subtypes UNDERGRAD and GRADUATE


E-R Modeling
Attribute
• Attributes describe a property or a characteristic of an entity
• A particular instance of an attribute is a value.
  For example, “John Doe” is one value of the attribute Name.
• Simple attribute
  Contains only atomic values
• Composite attribute
  Has component attributes

Figure: Entity Student with simple attribute DOB and composite attribute Name (components FName, MI, LName)
E-R Modeling
• Attribute Classification
• Single-valued attribute
  Has exactly one value per instance of an entity
• Multi-valued attribute
  Contains repeating values per instance of an entity

Figure: Entity Student with single-valued attribute Id and multi-valued attribute Module (values Math, Physics)


E-R Modeling
• Identifiers and Descriptors
• Attributes can be classified as identifiers or descriptors
• Identifiers, more commonly called keys, uniquely identify an instance of an entity.
• A descriptor describes a non-unique characteristic of an entity instance.
An Example:
Entity: Employee
Unique Identifier: Employee No.
Descriptor: Name, DOJ, DOB


E-R Modeling
• Relationship
• Represents an association between two or more entities
Examples
- Employees work for Departments
- Departments manage one or more projects
- Employees are assigned to projects
- Projects have sub-tasks
- Orders have line items
• Defined in terms of:
- Degree
- Connectivity
- Cardinality
- Direction
- Type
- Existence
E-R Modeling
• Degree
Number of entities associated with the relationship.
Binary relationships, associations between two entities, are the most common type in the real world. N-ary is the general form for degree n.
• Connectivity
Mapping of associated entity instances in the relationship.
The values of connectivity are "one" or "many".
• Cardinality
Actual number of related occurrences for each of the two entities.
The basic types of connectivity for relationships are: one-to-one, one-to-many, and many-to-many.
E-R Modeling
• Connectivity and Cardinality
• A one-to-one (1:1) relationship is when at most one instance of entity A is associated with one instance of entity B.
For example:
Employees in the company are each assigned their own office. For each employee there exists a unique office, and for each office there exists a unique employee.
• A one-to-many (1:N) relationship is when for one instance of entity A there are zero, one, or many instances of entity B, but for one instance of entity B there is only one instance of entity A.
An example:
A department has many employees; each employee is assigned to one department.


E-R Modeling
• Connectivity and Cardinality
• A many-to-many (M:N) relationship is when for one instance of entity A there are zero, one, or many instances of entity B, and for one instance of entity B there are zero, one, or many instances of entity A.
An example:
Employees can be assigned to no more than two projects at the same time; a project must have at least three employees assigned to it.


E-R Modeling
• Direction
Indicates the originating entity of a binary relationship. The entity from which a relationship originates is the parent entity; the entity where the relationship terminates is the child entity.
The direction of a relationship is determined by its connectivity.
• Type
Identifying and Non-identifying
• An identifying relationship is one in which the child entity is also a dependent entity.
• A non-identifying relationship is one in which both entities are independent.


E-R Modeling
• Existence
Denotes whether the existence of an entity instance is dependent upon the existence of another, related, entity instance. Defined as either mandatory or optional.
• Mandatory and optional relationship
If an instance of an entity must always occur for an entity to be included in a relationship, then it is mandatory. If the instance of the entity is not required, it is optional.
Example:
Mandatory: Every project must be managed by a single department
Optional: Employees may be assigned to work on projects
E-R Modeling
• E-R Notation
• No standard notation
• Original notation by Chen
• Common notations are: Bachman, crow's foot, and IDEF1X
• All styles represent entities as rectangular boxes and relationships as lines connecting boxes
• Each style uses a special set of symbols to represent the cardinality of a connection


E-R Modeling
• Entities
  - Represented by labeled rectangles
  - The label is the name of the entity
  - Entity names should be singular nouns
• Relationships
  - Represented by a solid line connecting two entities
  - Name written above the line
  - Relationship names should be verbs

Figure: Employee "Works for" Department


E-R Modeling
• Attributes
  - Listed inside the entity rectangle
  - Identifiers are underlined
  - Names should be singular nouns
  Figure: Entity Employee with attributes EmpID and EmpName
• Cardinality
  - Many is represented by a line ending in a crow's foot. If omitted, cardinality is one
• Existence
  - Represented by placing a circle or a perpendicular bar on the line
  - Mandatory existence is shown by the bar next to the entity for an instance that is required
  - Optional existence is shown by placing a circle next to the entity that is optional


E-R Modeling - Assignment
How to create an E-R Model from Requirements?

Step 1: Identify Entities
• Entities are, by definition, things people talk about, record information about and do work on
• Any keyword (noun) is a candidate
• Identify the generic object from references to instances or occurrences
• Combine synonyms to represent a single entity

An Example: Purchase Order - System Requirements

A buyer creates a purchase order (PO) as and when the need arises. A PO is for a specific vendor. A PO has one or more line items. A buyer cannot create a PO with a total value greater than his approval limit. A PO can be sent to the vendor by mail, fax or EDI. A PO can be canceled before it is submitted. A PO can be linked to a sales order…
E-R Modeling
Step 1: Identify Entities
Entities
• Purchase Order (PO)
• Buyer?
• Vendor
• Line Items
• Sales Order
• Approval Limit?

Buyer characterizes a PO
Approval Limit characterizes a Buyer

What does it tell us?
• Approval Limit is not an entity
• Buyer is an entity
• Approval Limit is an attribute of the entity Buyer
E-R Modeling
Step 2: Identify Relationships
• Look for phrases describing a link between two things or objects
• Verbs relating two nouns often suggest relationships, e.g. "A buyer creates a purchase order", "A purchase order has one or more lines"
• Requirements may or may not contain information regarding the degree, existence and cardinality of a relationship up front
• Further questioning may be needed to determine the above


E-R Modeling
Step 2: Identify Relationships
Grid Technique

            PO                   Buyer            Vendor                     Line item
PO          replaced by
Buyer       creates a            is approver of
Vendor      supplies against a   -                -
Line item   belongs to a         -                created for, supplied by   -


E-R Modeling
Step 2: Identify Relationships
• Analyzing Existing Systems (Files, Databases)
• Look for:
  - Pointers
  - Foreign Keys
  - Repeating Groups
  - Structured Codes
• All of the above imply relationships


E-R Modeling
• Step 3: Identify Attributes
• An attribute is any detail that serves to identify, classify, quantify or express the state of an entity
• Ask the following question for each entity: “What information do you need to know or hold about …?”
• Potential attributes are easily found by examining paper forms


E-R Modeling
• Step 3: Identify Attributes
Example Purchase Order Form

Purchase Order No __________
Buyer _________    Vendor ___________
Date Created ______

No    Item           Quantity   Value
___   ___________    ______     __________
___   ___________    ______     __________
___   ___________    ______     __________

Shipping Address
Street _________
City __________
Zip _______        Total Value ______

Candidate attributes:
• Purchase Order No
• Vendor
• Buyer
• Date Created
• Item?
• Address
• City
• State
• Zip
• Total Value?


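E-R Modeling
E-R Model of the Purchase Order Example

Figure: E-R diagram with entities BUYER, PURCHASE ORDER, VENDOR, LINE and ITEM
• BUYER creates PURCHASE ORDER; PURCHASE ORDER created by BUYER
• PURCHASE ORDER created for a VENDOR; VENDOR supplies against PURCHASE ORDER
• PURCHASE ORDER has LINE; LINE belongs to PURCHASE ORDER
• LINE created for ITEM; ITEM exists on LINE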


E-R Modeling
Major Modeling Techniques
• Peter Chen’s original entity/relationship diagrams
• Information Engineering
• Richard Barker’s notation, used by Oracle Corporation
• IDEF1X
• Object Role Modeling
• Unified Modeling Language (UML)
• Extensible Markup Language (XML)


E-R Modeling
Major Modeling Techniques
• Data models have two audiences:
  - User community - Uses the models to verify that the analysts understand their environment and their requirements.
  - Systems designers - Use the business rules implied by the models as the basis for their design of computer systems.
• Different techniques are better for one audience or the other.
• All techniques are fundamentally the same
• Differences are mainly syntactic or notational


Relational Model
Objective:
• To give an informal introduction to relational concepts, especially as they relate to relational database design issues.
What it is not:
• This does not give a complete description of relational theory.


Relational Model
• Formally introduced by Dr. E. F. Codd in 1970
• Represents data in the form of two-dimensional tables
• A relational database is a collection of two-dimensional tables
• A basic understanding of the model is needed to design and use relational databases


Relational Model
• Tables, Columns and Rows
• Relationships and Keys
• Data Integrity
• Normalization
What is a table?
• Represents some real-world person, place, thing, or event
• Two-dimensional
  - Columns
  - Rows

Course No.   Course_Title      C_Hrs.   Dept. C
CIS 120      Intro to CIS      4        CIS
MKT 333      Intro to Mkting   3        MKT
ECO 473      Labor Econ.       3        ECO
BA201        Intro to Stat.    5        ECO
CIS 345      Intro to Dbase    4        CIS


Relational Model
Table
• Columns represent a property of the person, place, thing or event that the table represents
• Rows represent an occurrence or instance of what the table represents
• A data value is stored in the intersection of a row and column
• Each named column has a domain, which is the set of values that may appear in that column

Employee
Empid    Name    Level   DOJ       Manager
101412   John    M3      4/10/98   101667
102235   Nancy   M4      1/23/01   101412
101398   Mike    S1      8/15/95   101667
101667   Jeff    M2      6/2/96    100351
103893   Cindy   M3      7/17/95   101284
101116   Rahul   S2      2/20/00   101412
102739   Scott   C1      4/13/01   101667


Relational Model
Table - Terminology

In this document Formal Terms Many Database Manuals

Table Relation Table

Column Attribute Field

Row Tuple Record



Relational Model
• Salient features of a relational table
  - Values are atomic (1NF)
  - Column values are of the same kind (Domain)
  - Each row is unique (Primary Key)
  - Sequence of columns is insignificant
  - Sequence of rows is insignificant
  - Each column must have a unique name
• Relationships and Keys
  - Keys - Fundamental to the concept of relational databases
  - Relationship - An association between two or more tables, defined by means of keys


Relational Model
• Primary Key
  - Column or set of columns that uniquely identifies a row in a table
  - Must be unique and must have a value
• Foreign Key
  - Column or set of columns which references the primary key or a unique key of another table
  - Rows in two tables are linked by matching the values of the foreign key in one table with the values of the primary key in another
Examples
• EMP_ID in table EMPLOYEE is the primary key
• DEPT_NO in table DEPARTMENT is the primary key
• DEPT_NO in table EMPLOYEE is a foreign key


Relational Model
• Data Integrity
• Ensures correct and consistent navigation and manipulation of relational tables
• Two types of integrity rules
  - Entity integrity
  - Referential integrity
• The entity integrity rule states that the value of the primary key can never be a null value
• The referential integrity rule states that if a relational table has a foreign key, then every value of the foreign key must either be null or match a value in the relational table in which that foreign key is a primary key
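The two integrity rules can be sketched as simple checks over rows held in memory. This is only an illustration: the sample rows and the helper names `entity_integrity_violations` and `referential_integrity_violations` are assumptions, and a real DBMS enforces these rules itself through key constraints.

```python
# Hypothetical sample data; table and column names follow the
# EMPLOYEE / DEPARTMENT example, the row values are invented.
department = [
    {"DEPT_NO": 10, "NAME": "Sales"},
    {"DEPT_NO": 20, "NAME": "Research"},
]
employee = [
    {"EMP_ID": 1, "NAME": "John", "DEPT_NO": 10},
    {"EMP_ID": 2, "NAME": "Nancy", "DEPT_NO": None},  # null foreign key is allowed
    {"EMP_ID": 3, "NAME": "Mike", "DEPT_NO": 30},     # no department 30 exists
]


def entity_integrity_violations(rows, pk):
    """Return rows whose primary key is null or repeats an earlier row's key."""
    seen, bad = set(), []
    for row in rows:
        key = row[pk]
        if key is None or key in seen:
            bad.append(row)
        seen.add(key)
    return bad


def referential_integrity_violations(child, fk, parent, pk):
    """Return child rows whose non-null foreign key matches no parent key."""
    parent_keys = {row[pk] for row in parent}
    return [r for r in child if r[fk] is not None and r[fk] not in parent_keys]


broken = referential_integrity_violations(employee, "DEPT_NO", department, "DEPT_NO")
```

Here only the employee with EMP_ID 3 breaks referential integrity; the null DEPT_NO of employee 2 is permitted by the rule.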


Relational Model
• Data Manipulation
• Relational tables are equivalent to sets
• Operations that can be performed on sets can be performed on relational tables
• Relational operations such as:
  - Selection
  - Projection
  - Join
  - Union
  - Intersection
  - Difference
  - Product
  - Division

Figure: set diagrams illustrating UNION, INTERSECTION and DIFFERENCE


Relational Model
• Selection
• The select operator, sometimes called restrict to prevent confusion with the SQL SELECT command, retrieves subsets of rows from a relational table based on the value(s) in a column or columns

A   B   C      D   E
1   A   212    Y   2
2   C   45     N   84
3   B   8656   N   4
4   D   324    N   56
5   C   5656   Y   34
6   A   445    N   4
7   B   546    Y   55
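A minimal sketch of selection over the table on this slide, assuming rows are held as Python dictionaries. The function name `select` and the choice of predicate (D equal to "Y") are illustrative assumptions, since the slide does not say which rows were highlighted.

```python
columns = ["A", "B", "C", "D", "E"]
data = [
    (1, "A", 212, "Y", 2),
    (2, "C", 45, "N", 84),
    (3, "B", 8656, "N", 4),
    (4, "D", 324, "N", 56),
    (5, "C", 5656, "Y", 34),
    (6, "A", 445, "N", 4),
    (7, "B", 546, "Y", 55),
]
table = [dict(zip(columns, row)) for row in data]


def select(rows, predicate):
    """Selection (restrict): keep whole rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


# Keep only the rows whose D column holds "Y".
yes_rows = select(table, lambda row: row["D"] == "Y")
```

Selection never changes the set of columns; it only filters rows.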


Relational Model
• Projection
• The project operator retrieves subsets of columns from a relational table, removing duplicate rows from the result

A   B   C      D   E
1   A   212    Y   2
2   C   45     N   84
3   B   8656   N   4
4   D   324    N   56
5   C   5656   Y   34
6   A   445    N   4
7   B   546    Y   55
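Projection can be sketched in the same style. The helper name `project` and the projected column are assumptions chosen so that duplicate removal is visible: column B holds seven values but only four distinct ones.

```python
columns = ["A", "B", "C", "D", "E"]
data = [
    (1, "A", 212, "Y", 2),
    (2, "C", 45, "N", 84),
    (3, "B", 8656, "N", 4),
    (4, "D", 324, "N", 56),
    (5, "C", 5656, "Y", 34),
    (6, "A", 445, "N", 4),
    (7, "B", 546, "Y", 55),
]
table = [dict(zip(columns, row)) for row in data]


def project(rows, wanted):
    """Projection: keep only the wanted columns and drop duplicate rows."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[name] for name in wanted)
        if key not in seen:  # first occurrence wins, order is preserved
            seen.add(key)
            result.append(dict(zip(wanted, key)))
    return result


b_values = project(table, ["B"])
```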


Relational Model
• Product
• The product of two relational tables, also called the Cartesian Product, is the concatenation of every row in one table with every row in the second.
• The product of table A (having m rows) and table B (having n rows) is the table C (having m x n rows). The product is denoted as A X B or A TIMES B

Table A        Table B
k   x   y      k   x   y
1   A   2      1   A   2
2   B   4      4   D   8
3   C   6      5   E   10

A TIMES B
ak   ax   ay   bk   bx   by
1    A    2    1    A    2
1    A    2    4    D    8
1    A    2    5    E    10
2    B    4    1    A    2
2    B    4    4    D    8
2    B    4    5    E    10
3    C    6    1    A    2
3    C    6    4    D    8
3    C    6    5    E    10
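As a rough sketch, the product of Table A and Table B from this slide can be computed with the standard library's `itertools.product`. Representing rows as tuples and naming the helper `times` are assumptions made for illustration.

```python
from itertools import product

table_a = [(1, "A", 2), (2, "B", 4), (3, "C", 6)]
table_b = [(1, "A", 2), (4, "D", 8), (5, "E", 10)]


def times(a, b):
    """Cartesian product: concatenate every row of a with every row of b."""
    return [row_a + row_b for row_a, row_b in product(a, b)]


a_times_b = times(table_a, table_b)  # 3 rows x 3 rows gives 9 rows
```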
Relational Model
• Join
• Combines the product, selection and projection operations
• Combines (concatenates) data from one row of a table with rows from another or the same table
• Criteria involve a relationship among the columns in the joined relational tables

If the join criterion is based on equality of column values, the result is called an equi-join.
A natural join is an equi-join with redundant columns removed.
Joins can also be done on criteria other than equality. Such joins are called non-equi joins.

Table A        Table B
k   a   b      k   c
1   A   2      1   aa
2   B   4      3   bb
3   C   6      5   cc

Equi-Join
k   a   b   k   c
1   A   2   1   aa
3   C   6   3   bb

Natural Join
k   a   b   c
1   A   2   aa
3   C   6   bb
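The difference between an equi-join and a natural join can be sketched with Table A and Table B from this slide. The helper names and the `left.` / `right.` prefixes used to keep both key columns apart are assumptions, not notation from the slide.

```python
table_a = [
    {"k": 1, "a": "A", "b": 2},
    {"k": 2, "a": "B", "b": 4},
    {"k": 3, "a": "C", "b": 6},
]
table_b = [{"k": 1, "c": "aa"}, {"k": 3, "c": "bb"}, {"k": 5, "c": "cc"}]


def equi_join(left, right, column):
    """Pair rows with equal values in `column`; both key columns are kept."""
    result = []
    for lrow in left:
        for rrow in right:
            if lrow[column] == rrow[column]:
                joined = {f"left.{name}": value for name, value in lrow.items()}
                joined.update({f"right.{name}": value for name, value in rrow.items()})
                result.append(joined)
    return result


def natural_join(left, right, column):
    """Equi-join on `column` with the redundant second key column removed."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if lrow[column] == rrow[column]
    ]


equi = equi_join(table_a, table_b, "k")
natural = natural_join(table_a, table_b, "k")
```

Both results contain two rows (k = 1 and k = 3); the equi-join rows carry five columns, the natural-join rows four.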
Relational Model
• Union
• The UNION operation of two tables is formed by appending rows from one table to those of a second to produce a third. Duplicate rows are eliminated
• Tables in a UNION operation must have the same number of columns, and corresponding columns must come from the same domain

Table A        Table B
k   x   y      k   x   y
1   A   2      1   A   2
2   B   4      4   D   8
3   C   6      5   E   10

A Union B
k   x   y
1   A   2
2   B   4
3   C   6
4   D   8
5   E   10
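A small sketch of union over Table A and Table B, assuming rows are tuples. The helper name `union` is illustrative, and the column-count test is only a partial stand-in for union compatibility, because it does not compare column domains.

```python
table_a = [(1, "A", 2), (2, "B", 4), (3, "C", 6)]
table_b = [(1, "A", 2), (4, "D", 8), (5, "E", 10)]


def union(a, b):
    """Append the rows of b to those of a, eliminating duplicate rows."""
    if a and b and len(a[0]) != len(b[0]):
        raise ValueError("tables must have the same number of columns")
    result = []
    for row in a + b:
        if row not in result:  # row (1, "A", 2) occurs in both tables
            result.append(row)
    return result


a_union_b = union(table_a, table_b)
```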




Relational Model
• Intersection
• The intersection of two relational tables is a third table that contains the rows common to both. Both tables must be union compatible. The notation for the intersection of A and B is A [intersection] B = C or A INTERSECT B

Table A        Table B
k   x   y      k   x   y
1   A   2      1   A   2
2   B   4      4   D   8
3   C   6      5   E   10

A Intersect B
k   x   y
1   A   2
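Intersection can be sketched in one line over the same two tables. The helper name `intersect` is an assumption, and the sketch takes union compatibility for granted rather than checking it.

```python
table_a = [(1, "A", 2), (2, "B", 4), (3, "C", 6)]
table_b = [(1, "A", 2), (4, "D", 8), (5, "E", 10)]


def intersect(a, b):
    """Keep the rows of a that also occur in b."""
    return [row for row in a if row in b]


a_intersect_b = intersect(table_a, table_b)
```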


Relational Model
• Difference
• The difference of two relational tables is a third that contains those rows that occur in the first table but not in the second. The Difference operation requires that the tables be union compatible.
The notation for difference is A MINUS B or A-B. As with arithmetic, the order of subtraction matters. That is, A - B is not the same as B - A.

Table A        Table B
k   x   y      k   x   y
1   A   2      1   A   2
2   B   4      4   D   8
3   C   6      5   E   10

A MINUS B      B MINUS A
k   x   y      k   x   y
2   B   4      4   D   8
3   C   6      5   E   10
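The order dependence of difference shows up directly in a short sketch. The helper name `minus` is an assumption that mirrors the A MINUS B notation.

```python
table_a = [(1, "A", 2), (2, "B", 4), (3, "C", 6)]
table_b = [(1, "A", 2), (4, "D", 8), (5, "E", 10)]


def minus(a, b):
    """Keep the rows of a that do not occur in b."""
    return [row for row in a if row not in b]


a_minus_b = minus(table_a, table_b)
b_minus_a = minus(table_b, table_a)
```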
Relational Model
• Division
• The division operator returns the column values in one table for which there are matching column values corresponding to every row in another table.

Table A        Table B
k   x   y      x   y
1   A   2      A   2
1   B   4      B   4
2   A   2
3   B   4
4   B   4
3   A   2

A DIV B
k
1
3
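Division is the least intuitive of these operators, so a sketch over the slide's tables may help. The helper name `divide` and the fixed column layout (k, x, y) for Table A are assumptions made to keep the example short.

```python
table_a = [
    (1, "A", 2), (1, "B", 4), (2, "A", 2),
    (3, "B", 4), (4, "B", 4), (3, "A", 2),
]
table_b = [("A", 2), ("B", 4)]


def divide(a, b):
    """Return the k values of a that are paired with every (x, y) row of b."""
    required = set(b)
    pairs_by_k = {}
    for k, x, y in a:
        pairs_by_k.setdefault(k, set()).add((x, y))
    # k qualifies only if its pairs cover all rows of the divisor table.
    return [k for k, pairs in pairs_by_k.items() if required <= pairs]


a_div_b = divide(table_a, table_b)
```

Only k = 1 and k = 3 appear with both (A, 2) and (B, 4), which matches the A DIV B result on the slide.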


Normalization
Normalization theory is based on the concept of normal forms. A relational table is said to be in a particular normal form if it satisfies a certain set of constraints.

We shall discuss four normal forms in this Module.

What is Functional Dependency?

The concept of functional dependency is the basis for the first three normal forms. A column Y of a relational table is said to be functionally dependent upon column X when values of column Y are uniquely identified by values of column X.

Full functional dependence applies to tables with composite keys. Column Y in relational table R is fully functionally dependent on X of R, where X is a composite key, if it is functionally dependent on X and not functionally dependent upon any subset of X.
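A functional dependency can be tested against sample rows with a short sketch. The rows below are borrowed from the supplier example used later in this module; the helper name `holds` is an assumption. Note that sample data can only refute a dependency, since a dependency is a rule about all possible rows and not just the ones currently stored.

```python
COLUMNS = ("s#", "city", "status", "p#", "qty")
first = [
    ("s1", "London", 20, "p1", 300),
    ("s1", "London", 20, "p2", 100),
    ("s1", "London", 20, "p3", 200),
    ("s1", "London", 20, "p4", 100),
    ("s2", "Paris", 10, "p1", 250),
    ("s2", "Paris", 10, "p3", 100),
    ("s3", "Tokyo", 30, "p2", 300),
    ("s3", "Tokyo", 30, "p4", 200),
]


def holds(rows, columns, lhs, rhs):
    """True if each combination of lhs values maps to a single rhs combination."""
    index = {name: i for i, name in enumerate(columns)}
    mapping = {}
    for row in rows:
        x = tuple(row[index[name]] for name in lhs)
        y = tuple(row[index[name]] for name in rhs)
        if mapping.setdefault(x, y) != y:  # same X seen earlier with another Y
            return False
    return True


city_depends_on_supplier = holds(first, COLUMNS, ["s#"], ["city"])
qty_depends_on_supplier = holds(first, COLUMNS, ["s#"], ["qty"])
qty_depends_on_full_key = holds(first, COLUMNS, ["s#", "p#"], ["qty"])
```

In this data, qty depends on (s#, p#) but not on s# alone, so qty is fully functionally dependent on the composite key, while city depends on only part of it.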


Normalization
Unnormalized Relation
  ↓ Remove repeating groups
Normalized Relation (1NF)
  ↓ Remove partial dependencies
2NF
  ↓ Remove transitive dependencies
3NF
  ↓ Remove remaining anomalies resulting from FDs
Boyce/Codd NF
  ↓ Remove multivalued dependencies
Normalization
An Example : A company obtains parts from a number of suppliers. Each
supplier is located in one city. A city can have more than one supplier located
there and each city has a status code associated with it. Each supplier may
provide many parts.

The company creates a simple relational table to store this information:

FIRST (s#, status, city, p#, qty)


s#       Supplier identification number
status   Status code assigned to city
city     City where supplier is located
p#       Part number of part supplied
qty      Quantity of parts supplied to date

The composite primary key is (s#, p#).


Normalization
• FIRST NORMAL FORM - 1NF

A relational table is said to be in the first normal form if all values of the columns are atomic. That is, they contain no repeating values.

s#   city     status   p#   qty
s1   London   20       p1   300
s1   London   20       p2   100
s1   London   20       p3   200
s1   London   20       p4   100
s2   Paris    10       p1   250
s2   Paris    10       p3   100
s3   Tokyo    30       p2   300
s3   Tokyo    30       p4   200
Normalization
• SECOND NORMAL FORM - 2NF
• Table FIRST contains redundant data. Redundancy causes update anomalies.
• Update anomalies - problems that arise when information is inserted, deleted, or updated.
  - INSERT. The fact that a certain supplier (s5) is located in a particular city (Athens) cannot be added until that supplier has supplied a part.
  - DELETE. If a row is deleted, then not only is the information about quantity and part lost but also information about the supplier.
  - UPDATE. If supplier s1 moved from London to New York, then four rows would have to be updated with this new information.


Normalization
A relational table is in second normal form (2NF) if it is in 1NF and every non-key
column is fully dependent upon the primary key. That is, every non-key column
must be dependent upon the entire primary key.

FIRST is in 1NF but not in 2NF because status and city are functionally
dependent only on the column s# of the composite key (s#, p#).

The steps for transforming a 1NF table to 2NF are:

1. Identify any determinants other than the composite key, and the columns they
determine.
2. Create and name a new table for each determinant and the unique columns it
determines.
3. Move the determined columns from the original table to the new table. The
determinant becomes the primary key of the new table.
4. Delete the columns you just moved from the original table, except for the
determinant, which will serve as a foreign key.
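
The steps above can be sketched in SQL, run here through Python's sqlite3 module. This is an illustrative sketch only: the table names first_table, second_table and parts, and the column names s_no and p_no (for s# and p#), are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table (
        s_no TEXT, status INTEGER, city TEXT, p_no TEXT, qty INTEGER,
        PRIMARY KEY (s_no, p_no)
    );
    INSERT INTO first_table VALUES
        ('s1', 20, 'London', 'p1', 300), ('s1', 20, 'London', 'p2', 100),
        ('s1', 20, 'London', 'p3', 200), ('s1', 20, 'London', 'p4', 100),
        ('s2', 10, 'Paris',  'p1', 250), ('s2', 10, 'Paris',  'p3', 100),
        ('s3', 30, 'Tokyo',  'p2', 300), ('s3', 30, 'Tokyo',  'p4', 200);

    -- Steps 1-3: s_no determines city and status, so they move to a new
    -- table whose primary key is the determinant.
    CREATE TABLE second_table (s_no TEXT PRIMARY KEY, city TEXT, status INTEGER);
    INSERT INTO second_table SELECT DISTINCT s_no, city, status FROM first_table;

    -- Step 4: the remaining table keeps s_no as a foreign key.
    CREATE TABLE parts (
        s_no TEXT REFERENCES second_table (s_no),
        p_no TEXT,
        qty  INTEGER,
        PRIMARY KEY (s_no, p_no)
    );
    INSERT INTO parts SELECT s_no, p_no, qty FROM first_table;
""")

second_rows = conn.execute("SELECT * FROM second_table ORDER BY s_no").fetchall()
parts_count = conn.execute("SELECT COUNT(*) FROM parts").fetchone()[0]
print(second_rows, parts_count)
```

Each supplier's city and status are now stored once, while all eight (supplier, part, quantity) rows survive in the second table.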



Normalization
SECOND NORMAL FORM – 2NF

PARTS
s#   p#   qty
s1   p1   300
s1   p2   100
s1   p3   200
s1   p4   100
s2   p1   250
s2   p3   100
s3   p2   300
s3   p4   200

SECOND
s#   city     status
s1   London   20
s2   Paris    10
s3   Tokyo    30



Normalization
• SECOND NORMAL FORM – 2NF
• Modification Anomalies
• Tables in 2NF but not in 3NF still contain modification
anomalies:
• INSERT. The fact that a particular city has a certain
status (Rome has a status of 50) cannot be inserted until
there is a supplier in the city.
• DELETE. Deleting the row of the only supplier in a city from SUPPLIER
(the table labelled SECOND above) destroys the status information about
that city as well as the association between supplier and city.



Normalization
• THIRD NORMAL FORM – 3NF

A relational table is in third normal form (3NF) if it is already in 2NF and every
non-key column is non-transitively dependent upon its primary key.

In other words, all non-key attributes are functionally dependent only
upon the primary key, and not on another non-key attribute.

SUPPLIER
s#   city     status
s1   London   20
s2   Paris    10
s3   Tokyo    30
s4   Paris    10

The table SUPPLIER is in 2NF but not in 3NF because it contains a transitive
dependency:
SUPPLIER.s# -> SUPPLIER.city
SUPPLIER.city -> SUPPLIER.status
SUPPLIER.s# -> SUPPLIER.status
Normalization
•The steps for transforming a table into 3NF are:
1. Identify any determinants, other than the primary key, and the columns they
determine.
2. Create and name a new table for each determinant and the unique
columns it determines.
3. Move the determined columns from the original table to the new table. The
determinant becomes the primary key of the new table.

The transformation of SUPPLIER into 3NF:

SUPPLIER
s#   city
s1   London
s2   Paris
s3   Tokyo
s4   Paris
s5   London

CITY_STATUS
city     status
London   20
Paris    10
Tokyo    30
Rome     50
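
The 3NF transformation can be sketched with Python's sqlite3 module. The sketch starts from the four-row 2NF SUPPLIER table shown earlier; the names supplier_2nf, supplier, city_status and s_no (for s#) are assumptions for illustration. The final join checks that the two new tables together still reproduce the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier_2nf (s_no TEXT PRIMARY KEY, city TEXT, status INTEGER);
    INSERT INTO supplier_2nf VALUES
        ('s1', 'London', 20), ('s2', 'Paris', 10),
        ('s3', 'Tokyo', 30),  ('s4', 'Paris', 10);

    -- city is a determinant (city -> status), so it gets its own table.
    CREATE TABLE city_status (city TEXT PRIMARY KEY, status INTEGER);
    INSERT INTO city_status SELECT DISTINCT city, status FROM supplier_2nf;

    -- The supplier table keeps city only as a foreign key.
    CREATE TABLE supplier (
        s_no TEXT PRIMARY KEY,
        city TEXT REFERENCES city_status (city)
    );
    INSERT INTO supplier SELECT s_no, city FROM supplier_2nf;
""")

original = conn.execute("SELECT * FROM supplier_2nf ORDER BY s_no").fetchall()
rejoined = conn.execute("""
    SELECT s.s_no, s.city, c.status
    FROM supplier s JOIN city_status c ON c.city = s.city
    ORDER BY s.s_no
""").fetchall()
city_count = conn.execute("SELECT COUNT(*) FROM city_status").fetchone()[0]
print(rejoined == original, city_count)
```

Paris now carries its status in a single row even though two suppliers are located there.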



Normalization
•Advantages of third normal form:
•Eliminates redundant data, which in turn saves space and reduces
manipulation anomalies.
Example:
INSERT: Facts about the status of a city (for example, Rome has a status of 50)
can be added even though there is no supplier in that city.
DELETE: Information about a supplier can be deleted without destroying
information about a city.
UPDATE: Changing the location of a supplier or the status of a city requires
modifying only one row.
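
These three advantages can be checked against the 3NF tables. The following is a hedged sketch using Python's sqlite3 module; the names supplier, city_status and s_no (for s#) and the new Paris status of 15 are assumptions chosen for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE city_status (city TEXT PRIMARY KEY, status INTEGER NOT NULL);
    CREATE TABLE supplier (
        s_no TEXT PRIMARY KEY,
        city TEXT NOT NULL REFERENCES city_status (city)
    );
    INSERT INTO city_status VALUES ('London', 20), ('Paris', 10), ('Tokyo', 30);
    INSERT INTO supplier VALUES
        ('s1', 'London'), ('s2', 'Paris'), ('s3', 'Tokyo'),
        ('s4', 'Paris'),  ('s5', 'London');
""")

# INSERT: a city's status can be recorded before any supplier exists there.
conn.execute("INSERT INTO city_status VALUES ('Rome', 50)")
rome_status = conn.execute(
    "SELECT status FROM city_status WHERE city = 'Rome'").fetchone()[0]

# DELETE: removing the only Tokyo supplier keeps Tokyo's status.
conn.execute("DELETE FROM supplier WHERE s_no = 's3'")
tokyo_status = conn.execute(
    "SELECT status FROM city_status WHERE city = 'Tokyo'").fetchone()[0]

# UPDATE: changing Paris's status (15 is a made-up value) touches one row,
# although two suppliers are located in Paris.
rows_changed = conn.execute(
    "UPDATE city_status SET status = 15 WHERE city = 'Paris'").rowcount

print(rome_status, tokyo_status, rows_changed)
```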

Normalization
• Advanced Forms :: BOYCE CODD NORMAL FORM
Many practitioners argue that placing entities in 3NF is generally
sufficient because it is rare for entities that are in 3NF not to be in
4NF and 5NF as well. The advanced forms of normalization are:

•Boyce-Codd Normal Form
•Fourth Normal Form
•Fifth Normal Form

Boyce-Codd normal form (BCNF) is a more rigorous version of 3NF.
BCNF is based on the concept of determinants. A determinant is a column
(or set of columns) on which some other column is fully functionally dependent.
A relational table is in BCNF if and only if every determinant is a
candidate key.
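
The BCNF rule can be tested mechanically once the functional dependencies are written down. The sketch below is plain Python and is only an illustration: the function names closure and bcnf_violations are hypothetical, and the check tests whether each determinant is a superkey (it determines every column), which is the usual way of checking the candidate-key condition.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """List the determinants that do not determine the whole table."""
    return [
        lhs for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                # skip trivial dependencies
        and closure(lhs, fds) != set(attributes)   # determinant is not a key
    ]


# Table FIRST from the normalization example and its dependencies.
first_attrs = ("s#", "status", "city", "p#", "qty")
first_fds = [
    (("s#", "p#"), ("qty",)),
    (("s#",), ("city",)),
    (("city",), ("status",)),
]
first_violations = bcnf_violations(first_attrs, first_fds)

# CITY_STATUS after the 3NF step: its only determinant is its key.
city_violations = bcnf_violations(("city", "status"), [(("city",), ("status",))])

print(first_violations, city_violations)
```

For FIRST the determinants s# and city are reported, because neither determines all five columns; CITY_STATUS has no violations.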



Database Design
 This section presents and discusses –
• How to translate the E-R (conceptual) model (diagram) to
an RDBMS (logical) schema.
• Exercise on E-R Modeling and Database Design

 Some Guidelines -
• Entities: Create one table for each simple (not a sub-type or super-type)
entity.
• Attributes: Map each attribute to a candidate column with a more
precise format.
• Optional attributes become nullable columns
• Mandatory attributes become NOT NULL columns
• Unique Identifier: Convert the components of the unique identifier to
the primary key of the table.



Database Design
• Sub-types: A sub-type entity is simply an entity with its own attributes or
relationships, but it also inherits any attributes and/or relationships from its
parent entity (super-type)
• 1:1 relationships: Merge the two entities into a single table, keeping all
attributes. Identify (add if needed) the primary key.
• 1:Many relationships: Create two tables, one for each entity. Post the
primary key from the 1 side to the Many side (add it to that table's
attributes) and identify it as a foreign key.
• M:N (Many:Many) relationships: Create a new (bridge) table and post the
primary keys from both entities as attributes in the new table. The posted
attributes are foreign keys.
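
The 1:Many and M:N guidelines can be sketched as table definitions, run here through Python's sqlite3 module. This is an assumed illustration, not a schema from the course: it borrows the department/module and supplier/part entities used elsewhere in the slides, and the names supplier_part, s_no, p_no, office_no and the 'XYZ 101' test row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- 1:Many: one department offers many modules, so the department's
    -- primary key is posted to the module table as a foreign key.
    CREATE TABLE department (
        dept_code TEXT PRIMARY KEY,
        dept_name TEXT NOT NULL,       -- mandatory attribute
        office_no INTEGER              -- optional attribute (nullable)
    );
    CREATE TABLE module (
        course_no    TEXT PRIMARY KEY,
        course_title TEXT NOT NULL,
        c_hrs        INTEGER NOT NULL,
        dept_code    TEXT NOT NULL REFERENCES department (dept_code)
    );

    -- M:N: a supplier provides many parts and a part has many suppliers,
    -- so a bridge table holds both primary keys as foreign keys.
    CREATE TABLE supplier (s_no TEXT PRIMARY KEY);
    CREATE TABLE part (p_no TEXT PRIMARY KEY);
    CREATE TABLE supplier_part (
        s_no TEXT NOT NULL REFERENCES supplier (s_no),
        p_no TEXT NOT NULL REFERENCES part (p_no),
        qty  INTEGER,
        PRIMARY KEY (s_no, p_no)
    );
""")

conn.execute("INSERT INTO department VALUES ('CIS', 'Comp. Info. Sys.', 302)")
conn.execute("INSERT INTO module VALUES ('CIS 120', 'Intro to CIS', 4, 'CIS')")

# A module that points at a non-existent department is rejected.
try:
    conn.execute("INSERT INTO module VALUES ('XYZ 101', 'Orphan module', 3, 'XYZ')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

module_count = conn.execute("SELECT COUNT(*) FROM module").fetchone()[0]
print(orphan_rejected, module_count)
```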



Database Design
A few comments…

§ There are more rules, treating exceptions, but these are good enough in
most cases.
§ There may be reasons to violate the rules.
§ Always use common sense and expect iterative development.
§ Use CASE tools like ERWin wherever possible. Tools can
automatically generate SQL table definitions from
drawn E-R diagrams.



Database Design:: Assignment

Develop an E-R model and database schema for a system to handle purchase
orders.



Creating a DB environment : Summary
 The first step in designing a database application is to understand
what information the database needs to store and what integrity
constraints or business rules apply to the data.
 A data model is to a database what a building plan or blueprint is to
a building. It is the conceptual model of the database.
 Given a relational schema, we need to decide whether it is a good
design or whether we need to decompose it into smaller relations.
Normalization provides guidance for such decomposition.




4.0 Structured Query Language
 Learning Objectives:
At the end of this Topic you will be able to
• Write simple SQL queries
• Get familiar with the various relational operations such as SELECT,
PROJECT and JOIN



An Introduction
• Structured Query Language (SQL) is the most widely
used commercial relational database language. SQL
has several parts:

•DML – the Data Manipulation Language
•DDL – the Data Definition Language
•Embedded and dynamic SQL
•Security
•Transaction management
•Client-server execution and remote database access

SELECT column-list FROM table-names WHERE condition(s)



Query Processing
•Query in a high-level language (typically a 4GL)
•Parsing: The parser converts a query, submitted by
a database user and written in a high-level
language, into an expression of algebraic operators.
•Optimization: This is the key step in query
processing design. The optimizer receives the expression and
builds a good execution plan. The plan determines
the order of execution of the operators and selects
suitable algorithms for implementing the
operators.
•Code generation for the query: The planned code
is built with the aim of retrieving the result of the
query with high performance.
•Code execution by the database processor
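
Many DBMSs let you inspect the plan the optimizer has chosen. As one hedged illustration, SQLite exposes it through EXPLAIN QUERY PLAN, used here from Python's sqlite3 module. The module table layout is assumed from the later slides, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE module (
        course_no    TEXT PRIMARY KEY,
        course_title TEXT,
        c_hrs        INTEGER,
        dept_code    TEXT
    )
""")

# Ask the optimizer how it would execute the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT course_title FROM module WHERE c_hrs = 4"
).fetchall()

# The last column of each plan row is a human-readable description.
for row in plan:
    print(row[-1])
```

Since no index exists on c_hrs, the plan describes a scan of the whole module table.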
Query Processing

SELECT column-list
FROM table-names
WHERE condition(s)    <- conditional selection



Query Processing
• The SQL SELECT statement performs three types of operations:

1. Projection: SELECT column-list
2. Join: FROM table-names
3. Selection: WHERE condition(s)



Performing Projection
SELECT Course_Title, C_Hrs FROM Module

Module
Course_No   Course_Title      C_Hrs   Dept_C
CIS 120     Intro to CIS      4       CIS
MKT 333     Intro to Mkting   3       MKT
ECO 473     Labor Econ.       3       ECO
BA201       Intro to Stat.    5       ECO
CIS 345     Intro to Dbase    4       CIS

Result Table
Course_Title      C_Hrs
Intro to CIS      4
Intro to Mkting   3
Labor Econ.       3
Intro to Stat.    5
Intro to Dbase    4
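
The projection can be tried out with Python's sqlite3 module. This is a small sketch under assumptions: the table is loaded with the five rows shown on the slide, and lower-case column names (course_no, course_title, c_hrs, dept_c) are chosen to match the table headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE module (course_no TEXT, course_title TEXT, c_hrs INTEGER, dept_c TEXT)")
conn.executemany("INSERT INTO module VALUES (?, ?, ?, ?)", [
    ("CIS 120", "Intro to CIS", 4, "CIS"),
    ("MKT 333", "Intro to Mkting", 3, "MKT"),
    ("ECO 473", "Labor Econ.", 3, "ECO"),
    ("BA201", "Intro to Stat.", 5, "ECO"),
    ("CIS 345", "Intro to Dbase", 4, "CIS"),
])

# Projection keeps every row but only the listed columns.
projected = conn.execute("SELECT course_title, c_hrs FROM module").fetchall()
for title, hours in projected:
    print(title, hours)
```

All five rows come back, each reduced to two columns.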



Performing a Selection Operation
SELECT * FROM Module WHERE C_Hrs = 4

Module
Course_No   Course_Title      C_Hrs   Dept_C
CIS 120     Intro to CIS      4       CIS
MKT 333     Intro to Mkting   3       MKT
ECO 473     Labor Econ.       3       ECO
BA201       Intro to Stat.    5       ECO
CIS 345     Intro to Dbase    4       CIS

Result Table
Course_No   Course_Title      C_Hrs   Dept_C
CIS 120     Intro to CIS      4       CIS
CIS 345     Intro to Dbase    4       CIS
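
The selection can be reproduced with Python's sqlite3 module. As before this is only a sketch: the lower-case column names are assumptions, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE module (course_no TEXT, course_title TEXT, c_hrs INTEGER, dept_c TEXT)")
conn.executemany("INSERT INTO module VALUES (?, ?, ?, ?)", [
    ("CIS 120", "Intro to CIS", 4, "CIS"),
    ("MKT 333", "Intro to Mkting", 3, "MKT"),
    ("ECO 473", "Labor Econ.", 3, "ECO"),
    ("BA201", "Intro to Stat.", 5, "ECO"),
    ("CIS 345", "Intro to Dbase", 4, "CIS"),
])

# Selection keeps every column but only the rows that satisfy the condition.
selected = conn.execute(
    "SELECT * FROM module WHERE c_hrs = 4 ORDER BY course_no").fetchall()
for row in selected:
    print(row)
```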



Performing both Projection and Selection
SELECT Course_Title, C_Hrs FROM Module WHERE Dept_C = 'CIS'

Module
Course_No   Course_Title      C_Hrs   Dept_C
CIS 120     Intro to CIS      4       CIS
MKT 333     Intro to Mkting   3       MKT
ECO 473     Labor Econ.       3       ECO
BA201       Intro to Stat.    5       ECO
CIS 345     Intro to Dbase    4       CIS

Result Table
Course_Title     C_Hrs
Intro to CIS     4
Intro to Dbase   4
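
Projection and selection combine in a single statement. A short sketch with Python's sqlite3 module follows; the lower-case column names and the added ORDER BY are assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE module (course_no TEXT, course_title TEXT, c_hrs INTEGER, dept_c TEXT)")
conn.executemany("INSERT INTO module VALUES (?, ?, ?, ?)", [
    ("CIS 120", "Intro to CIS", 4, "CIS"),
    ("MKT 333", "Intro to Mkting", 3, "MKT"),
    ("ECO 473", "Labor Econ.", 3, "ECO"),
    ("BA201", "Intro to Stat.", 5, "ECO"),
    ("CIS 345", "Intro to Dbase", 4, "CIS"),
])

# WHERE picks the rows (selection); the column list picks the columns (projection).
result = conn.execute("""
    SELECT course_title, c_hrs
    FROM module
    WHERE dept_c = 'CIS'
    ORDER BY course_no
""").fetchall()
print(result)
```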



Performing both Projection and Selection
Basic SELECT Statement WHERE Clause Operators
• =, <, >, <=, >=
• IN (list)
WHERE Code IN ('ABC', 'DEF', 'HIJ') returns only rows with one of
those 3 literal values for the Code attribute.
• BETWEEN min_val AND max_val
WHERE Qty_Ord BETWEEN 5 AND 15 returns rows where Qty_Ord is >= 5
and <= 15. It also works on character data, using ascending alphabetical
order.
• LIKE 'literal with wildcards': % matches multiple characters, _ matches a
single character.
WHERE Name LIKE '_o%son' returns rows where Name has o as the 2nd
character and ends with son, such as Torgeson or Johnson.
• NOT
WHERE NOT Name = 'Johnson' returns all rows where Name <> 'Johnson'.
NOT is evaluated after the comparison operators but before AND and OR.
• AND and OR: use parentheses to control the order of evaluation.
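
The operators above can be exercised on a small table. The sketch below uses Python's sqlite3 module; the orders table, its four rows and the helper function names are all hypothetical and exist only to show each operator at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (code TEXT, name TEXT, qty_ord INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("ABC", "Johnson", 5),
    ("DEF", "Torgeson", 15),
    ("XYZ", "Smith", 10),
    ("HIJ", "Jackson", 20),
])


def names(where_clause):
    """Return the names matched by a WHERE clause (demo helper, fixed literals only)."""
    sql = "SELECT name FROM orders WHERE " + where_clause + " ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


in_list = names("code IN ('ABC', 'DEF', 'HIJ')")
between = names("qty_ord BETWEEN 5 AND 15")        # both bounds are included
like = names("name LIKE '_o%son'")                 # o as 2nd character, ends in son
negated = names("NOT name = 'Johnson'")

# AND binds more tightly than OR, so parentheses change the result.
without_parens = names("code = 'XYZ' OR code = 'ABC' AND qty_ord > 10")
with_parens = names("(code = 'XYZ' OR code = 'ABC') AND qty_ord > 10")

print(in_list, between, like, negated, without_parens, with_parens)
```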



Joining Tables

•To join tables appropriately, the tables must be
related, and we apply a WHERE clause which equates
the primary key column of the table on the one side
of the relationship with the corresponding foreign key
column of the table on the many side.

This type of join is called an equi-join.
 Our example will join modules and departments, where the
department code is the linking "key" column.
•The next series of slides takes you through a step-by-step
process of combining data rows from one table
with data rows in another table.
•The first slide introduces the SQL SELECT statement
that shows the join operation and a picture of the …
Joining Tables
Joining Two Tables - Select and Tables

SELECT * FROM Module C, Department D
WHERE D.Dept_Code = C.Dept_Code

Module
Course_No   Course_Title      C_Hrs   Dept_Code
CIS 120     Intro to CIS      4       CIS
MKT 333     Intro to Mkting   3       MKT
ECO 473     Labor Econ.       3       ECO
BA201       Intro to Stat.    5       ECO
CIS 345     Intro to Dbase    4       CIS

Department
Dept_Code   Dept_Name          Office#
MKT         Marketing          244
CIS         Comp. Info. Sys.   302
ECO         Economics          244

SQL will compare every row of the 1st table with the first row of the 2nd
table. Then it will compare all rows of the 1st table with the second row of
the 2nd table, and so on. Only rows where the condition is met are placed in
the result table.
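
The equi-join can be run as written. The sketch below uses Python's sqlite3 module with the Module and Department rows from this example; the lower-case column names, office_no (for Office#) and the explicit column list in place of SELECT * are assumptions made to keep the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_code TEXT PRIMARY KEY, dept_name TEXT, office_no INTEGER);
    CREATE TABLE module (
        course_no TEXT PRIMARY KEY, course_title TEXT, c_hrs INTEGER,
        dept_code TEXT REFERENCES department (dept_code)
    );
    INSERT INTO department VALUES
        ('MKT', 'Marketing', 244), ('CIS', 'Comp. Info. Sys.', 302), ('ECO', 'Economics', 244);
    INSERT INTO module VALUES
        ('CIS 120', 'Intro to CIS', 4, 'CIS'), ('MKT 333', 'Intro to Mkting', 3, 'MKT'),
        ('ECO 473', 'Labor Econ.', 3, 'ECO'),  ('BA201', 'Intro to Stat.', 5, 'ECO'),
        ('CIS 345', 'Intro to Dbase', 4, 'CIS');
""")

# Equi-join: the WHERE clause equates the foreign key with the primary key.
joined = conn.execute("""
    SELECT c.course_no, c.course_title, c.c_hrs, c.dept_code, d.dept_name, d.office_no
    FROM module c, department d
    WHERE d.dept_code = c.dept_code
    ORDER BY c.course_no
""").fetchall()
for row in joined:
    print(row)
```

Every module finds exactly one matching department, so the result has five rows.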



Joining Tables
Joining Two Tables - Row 1 Module to Row 1 Dept
SELECT * FROM Module C, Department D WHERE D.Dept_Code = C.Dept_Code

Module row 1 (CIS 120, Dept_Code CIS) is compared with Department row 1 (MKT).
No match, so the row is not placed in the results.

RESULT TABLE
Course_No   Course_Title   C_Hrs   Dept_Code   Dept_Name   Office#
(no rows yet)



Joining Tables
Joining Two Tables - Row 1 Module to Row 2 Dept
SELECT * FROM Module C, Department D WHERE D.Dept_Code = C.Dept_Code

Module row 1 (CIS 120, Dept_Code CIS) is compared with Department row 2 (CIS).
The match on the condition causes a result row to be produced.

RESULT TABLE
Course_No   Course_Title   C_Hrs   Dept_Code   Dept_Name          Office#
CIS 120     Intro to CIS   4       CIS         Comp. Info. Sys.   302



Joining Tables
Joining Two Tables - Row 1 Module to Row 3 Dept
SELECT * FROM Module C, Department D WHERE D.Dept_Code = C.Dept_Code

Module row 1 (CIS 120, Dept_Code CIS) is compared with Department row 3 (ECO).
No match, so the result table is unchanged.

RESULT TABLE
Course_No   Course_Title   C_Hrs   Dept_Code   Dept_Name          Office#
CIS 120     Intro to CIS   4       CIS         Comp. Info. Sys.   302
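
The row-by-row comparison shown on these slides can be written as two nested loops. This plain-Python sketch only mirrors the conceptual picture; it is not a claim about how any particular DBMS executes a join, since an optimizer is free to choose other algorithms.

```python
modules = [
    ("CIS 120", "Intro to CIS", 4, "CIS"),
    ("MKT 333", "Intro to Mkting", 3, "MKT"),
    ("ECO 473", "Labor Econ.", 3, "ECO"),
    ("BA201", "Intro to Stat.", 5, "ECO"),
    ("CIS 345", "Intro to Dbase", 4, "CIS"),
]
departments = [
    ("MKT", "Marketing", 244),
    ("CIS", "Comp. Info. Sys.", 302),
    ("ECO", "Economics", 244),
]

result = []
comparisons = 0
for module in modules:              # Row 1 Module, Row 2 Module, ...
    for dept in departments:        # ... against Row 1 Dept, Row 2 Dept, Row 3 Dept
        comparisons += 1
        if dept[0] == module[3]:    # D.Dept_Code = C.Dept_Code
            # Keep the module columns and add department name and office.
            result.append(module + dept[1:])

print(comparisons)
for row in result:
    print(row)
```

Five module rows against three department rows give 15 comparisons, of which five satisfy the condition.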



5.0 Internal Management
Learning Objective
◦ After completing this topic you will be able to:
 Describe the various components of the computer system
that provide data storage facilities to a DBMS
 Understand how the DBMS communicates with the host system
 Outline some of the database tuning factors



Computer file management and DBMS
 Computer files are stored on external media such as disks and tapes.
• Direct access
• Sequential access
 Input/output of data and memory management are handled by the
operating system
• File manager
• Disk manager

DBMS/Host intercommunication:
DBMS -> (file request) -> File Manager -> (logical page request) -> Disk Manager -> (physical page access)



Intercommunication
 DBMS/Host communication:
• A file is a collection of pages. A page is the unit of input/output
(I/O).
• The DBMS sends a file request to the file manager.
• The file manager has no idea where the requested page is
physically stored.
• The file manager in turn communicates with the disk
manager, which locates the page on disk and returns it.
• The file manager provides the database system with the
given page.
• The database system converts the page into a logical form
understandable by the user.
Tuning at the internal level
 Indexes
• Database indexes are an important means of speeding up access
to sets of records, especially in a relational database.
• An index is very useful in existence tests.
• Once an index is created, it is transparent to the user.
 Hashing
• Hashing determines the page address for a given record
directly, without the overhead of creating indexes.
• The main problems associated with hashing are overflow and
underflow.
 Clusters
• Intra-file clustering: physically storing related records of one
file close together.
• Inter-file clustering: storing records from different files
in the same physical page.
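
The point that an index is transparent to the user can be illustrated with SQLite through Python's sqlite3 module. This is a sketch under assumptions: the module table and the index name idx_module_dept are made up for the demonstration, and the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE module (course_no TEXT, course_title TEXT, c_hrs INTEGER, dept_code TEXT)")
conn.executemany("INSERT INTO module VALUES (?, ?, ?, ?)", [
    ("CIS 120", "Intro to CIS", 4, "CIS"),
    ("MKT 333", "Intro to Mkting", 3, "MKT"),
    ("ECO 473", "Labor Econ.", 3, "ECO"),
    ("BA201", "Intro to Stat.", 5, "ECO"),
    ("CIS 345", "Intro to Dbase", 4, "CIS"),
])

query = "SELECT course_title FROM module WHERE dept_code = 'CIS'"


def plan(sql):
    """Return the optimizer's description of how it would run sql."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)
rows_before = sorted(conn.execute(query).fetchall())

# The query text stays the same; only the access path changes.
conn.execute("CREATE INDEX idx_module_dept ON module (dept_code)")
plan_after = plan(query)
rows_after = sorted(conn.execute(query).fetchall())

print(plan_before)
print(plan_after)
print(rows_before == rows_after)
```

The same query returns the same rows before and after; only the plan changes from a full scan to a search that uses the index.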

Internal Management : Summary
 Database files are stored in logical page sets.
 The underlying physical files that store a database need not
map to the logical representation of the DBMS.
 Indexes are a useful means of speeding up data access in large
databases, but they incur overheads.
 Hash functions speed up individual record access; however, they
suffer from overflow and underflow problems.
 Intra-file and inter-file clustering of the physical records speeds
up certain operations at the cost of other types of data
manipulation.


6.0 Database Trends
Learning Objective
– At the end of this Topic you will be :
•Familiar with various terms like
• OLAP
• Data warehousing
• Data mining
•Aware of the business needs that require data to
be analyzed in multiple dimensions



Multidimensional Data Analysis
On-line analytical processing (OLAP)
Multidimensional data analysis
Supports manipulation and analysis of large
volumes of data from multiple
dimensions/perspectives



Types of databases
• Major Types of Databases

[Diagram: Databases, branching into centralized and distributed databases]

Centralized database
 Used by a single central processor or by multiple processors in a
client/server network

[Diagram: CPU, disk, printer and tape drive, each attached through its own controller to the system bus; memory attached through a memory controller]



Distributed database
 Stored in more than one physical location
•Partitioned database
•Duplicated database



Multidimensional data model

On-line analytical processing (OLAP)


•Multidimensional data analysis
•Supports manipulation and analysis of large volumes of
data from multiple dimensions/perspectives
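
Looking at the same facts from several dimensions can be sketched with ordinary GROUP BY queries, run here through Python's sqlite3 module. Dedicated OLAP tools go well beyond this; the sales table, its rows and the helper totals_by are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("North", "Widget", "Q1", 100),
    ("North", "Gadget", "Q1", 150),
    ("South", "Widget", "Q1", 200),
    ("South", "Widget", "Q2", 50),
])

DIMENSIONS = ("region", "product", "quarter")


def totals_by(dimension):
    """Total sales amount along one dimension of the hypothetical sales data."""
    if dimension not in DIMENSIONS:
        raise ValueError("unknown dimension: " + dimension)
    sql = ("SELECT " + dimension + ", SUM(amount) FROM sales "
           "GROUP BY " + dimension + " ORDER BY " + dimension)
    return conn.execute(sql).fetchall()


by_region = totals_by("region")
by_product = totals_by("product")
by_quarter = totals_by("quarter")
print(by_region, by_product, by_quarter)
```

The same four facts yield three different summaries, one per perspective.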



Data warehouse
 Supports reporting and query tools
 Stores current and historical data
 Consolidates data for management analysis and decision making



Data warehouse

Data mart
•Subset of data warehouse
•Contains summarized or highly focused portion of data
for a specified function or group of users
Data mining
•Tools for analyzing large pools of data
•Find hidden patterns and infer rules to predict trends



Databases and the web

Hypermedia database
• Organizes data as network of nodes
• Links nodes in pattern specified by user
• Supports text, graphic, sound, video and executable
programs



Databases and the web
 Database server
• A computer in a client/server environment that runs a DBMS to process
SQL statements and perform database management tasks
 Application server
• Software handling all application operations



Database Trends : Summary
The database forms the back end for any kind of
application architecture, be it client/server or a
distributed system such as the web.
Users want to see data in as many dimensions as
possible, so it is important to be aware
of the concepts of data warehousing, data
mining and on-line analytical processing (OLAP).



Database Fundamentals: Next Step

Resource Type   Description
Book            Case*Method: Entity Relationship Modeling - Richard Barker
Book            Data & Databases – Joe Celko
Book            An Introduction to Database Systems – C. J. Date
Book            The Data Modeling Handbook - Reingruber and Gregory
Book            Data Modeling for Information Professionals – Bob Schmidt
Book            Data Model Patterns – David C. Hay, Richard Barker
Congratulations!
You have successfully completed Database Fundamentals
