
Compiled by P. Chamanga

DATABASE SYSTEMS CONCEPTS & DESIGN

1. DATABASE ENVIRONMENT

Definition of a Database:
It is a shared collection of interrelated data designed to meet the varied information needs of an organisation. It is integrated and it is shared:

Integrated - previously distinct data files have been logically organised to eliminate (or reduce) redundancy and to facilitate data access.
Shared - all qualified users in the organisation have access to the same data, for use in a variety of activities.

It is a structured collection of stored operational data used by all the application systems of an organisation. It is independent of any individual application.

It is a central source of data to be shared by many users for a variety of related applications.

Data as a Resource:
Information, the analysis and synthesis of data, has become one of the most vital corporate resources. It is:

Structured into models for planning and decision making
Incorporated into measurements of performance and profitability
Integrated into product design and marketing methods

Information is recognised and treated as an asset.

Acceptance of data resource management is demonstrated by:

Firm commitment to the database approach
Successful establishment of the data administration function

Database Concepts:
The two essential concepts are:

Data Models
A data model is the logical structure of the data as it appears at a particular level of the database system. Each application that uses a database has its own data model - how the data appears as viewed by that application. E.g. a customer accounts file holds details about customers while a stock file holds details about goods, yet both draw on the same database.

Data Independence
Data models are not affected by any changes in storage techniques. The central data model and the associated data models are distinct from the arrangement of data on any particular storage medium.

Reality, Data and Metadata


The real world itself will be referred to as reality. Data collected about people, places, or events
in reality will eventually be stored in the file or database. In order to understand the form and
structure of the data, information about the data itself is required. The information that describes
the data is known/referred to as metadata. The relation between data, reality and metadata is
pictured as follows: -

REALITY                 METADATA                   DATA
(real world)            (data definition)          (data occurrences)

OBJECTS, EVENTS         DATA DICTIONARY/           DATABASE
                        DIRECTORY
Entity Class            Record Definitions         Record Occurrences
Attributes              Data item definitions      Data item occurrences

Entity
An object or event about which someone chooses to collect data is an entity. An Entity may be a
person, or a place for example, a sales person, a city or a product. An entity can be also an event
or unit of time, such as a machine breakdown, a sale, or a month or a year.

Entity Class
It is a collection of entities with similar characteristics, also known as an Entity Set or Entity Type. Entities are grouped into classes for convenience.


Attribute
It is a property of a real-world entity rather than a data-oriented term.
It is a property of an entity, e.g. Customer:
Customer Number
Customer Name
Address
Telephone
Credit Limit
Balance

An attribute is some characteristic of an entity. There can be many attributes for each entity; for example, a patient can have many attributes, such as last name, first name, address, city and so on.

The word data item is also used in conjunction with an attribute. Data element is simply a
synonym for data item.

Data items can have values. These values can be of fixed or variable length. They can be
alphabetic, numeric or alphanumeric. Sometimes a data item can be referred to as a field.

A field represents something physical not logical, therefore many data items can be packed into a
field. A field can be read and can be converted to a number of data items. A common example of
this is to store the date in a single field as mm/dd/yyyy. In order to sort the file, in the order of
date, three separate data items are extracted from the field and sorted first by year, then by
month, and finally by day.
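The date-field manipulation described above can be sketched in Python. This is only an illustration: the mm/dd/yyyy layout comes from the text, while the record values are invented.

```python
# One physical field ("date") packs three logical data items:
# month, day and year, stored as mm/dd/yyyy.
records = [
    {"order": "A003", "date": "11/02/2019"},
    {"order": "A004", "date": "03/15/2019"},
    {"order": "A005", "date": "07/30/2018"},
]

def date_key(record):
    # Extract the three data items from the single field...
    month, day, year = record["date"].split("/")
    # ...and sort first by year, then by month, and finally by day.
    return (int(year), int(month), int(day))

ordered = sorted(records, key=date_key)
print([r["order"] for r in ordered])   # earliest date first
```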

Typical values assigned to data items may be numbers, alphabetic characters, special characters,
and a combination of all three. These can be illustrated as follows: -


ENTITY          DATA ITEM            VALUE

Salesperson     Salesperson number   77865
                Salesperson name     Thompson
                Company name         Ceata Enterprises
                Age                  40
                Address              42 Musasa Close
                Sales                $15,800.00
Package         Code                 A209
                Width                16
                Weight               32
                Mailing address      Box 1294, Harare
                Return address       Box A2098, Mutare
Order           Order number         53541 H
                Description          Shirts
                Quantity ordered     120
                Amount               $1,500.00
                Order placed by      Takura

Identifier
This is an attribute that uniquely distinguishes an entity from the rest, e.g. an EC Number identifies an employee.

Association
An association forms a relationship between two or more entities.

Direct representation of associations between entities distinguishes the database approach from conventional file applications.

Relationships
These are associations between entities (sometimes they are referred to as data associations).
They imply that values for the associated data items are in some way dependent on each other.

A rectangle represents an entity
A diamond represents a relationship (standard E-R notation)
Records
A record is a collection of data items that have something in common with the entity described.
Below is a diagram to illustrate the structure of a record


Order File (each row is a record):

Order#   Description   Quantity   Amount
         Shirt         1200       35000
         Short         1000       16000
A003     Dress         2000       99000
A004     Trousers      1300       75000
A005     Vests         1100       12000

Keys
A key is one of the data items in a record. When a key uniquely identifies a record, it is called a primary key; for example, order# can be a primary key because only one number is assigned to each customer order. In this way a primary key identifies something in the real world, in this case a customer order.

A key is called a secondary key if it cannot uniquely identify a record. Secondary keys can be used to select a group of records that belong to a set, for example orders that come from the city of Mutare. When it is not possible to identify a record uniquely using one data item found in a record, a key can be constructed by choosing two or more data items and combining them.

When a data item is used as a key in a record, the description is underlined therefore in the order
record: -

(order#, description, quantity, amount) the key is order#.

If an attribute is a key in another file it should be underlined with a dashed line (_ _ _ _ _) and it
is a foreign key in this file.
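The key notation above can be tried out in SQLite through Python's built-in sqlite3 module; the table and column names below are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute("PRAGMA foreign_keys = ON")

# orderno plays the role of the underlined primary key order#;
# custno plays the role of a dash-underlined foreign key.
con.execute("CREATE TABLE customer (custno INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE orders (
                   orderno TEXT PRIMARY KEY,
                   custno  INTEGER REFERENCES customer(custno),
                   descr   TEXT,
                   qty     INTEGER,
                   amount  REAL)""")

con.execute("INSERT INTO customer VALUES (1, 'Takura')")
con.execute("INSERT INTO orders VALUES ('A003', 1, 'Dress', 2000, 99000)")

# A primary key uniquely identifies a record, so a second 'A003'
# is rejected by the DBMS.
try:
    con.execute("INSERT INTO orders VALUES ('A003', 1, 'Shirt', 120, 1500)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```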
Metadata
Metadata is data about the data in the file/database.
It describes the name given, type and the length assigned to each data item
It describes the length and composition of each of the records
It is kept in a Data Dictionary

Example
Data Item Data Type Length
Name Character 10
Surname Character 15
Date of Birth Date 10
Weight Numeric 2
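A minimal sketch, in Python, of how a data dictionary's metadata (the names, types and lengths above) might be used to check values; the validation rule itself is an assumption for illustration.

```python
# Data-dictionary entries from the example: name, type, length.
data_dictionary = {
    "Name":          {"type": str, "length": 10},
    "Surname":       {"type": str, "length": 15},
    "Date of Birth": {"type": str, "length": 10},
    "Weight":        {"type": int, "length": 2},
}

def valid(item, value):
    """Check a value against the metadata kept for its data item."""
    meta = data_dictionary[item]
    return isinstance(value, meta["type"]) and len(str(value)) <= meta["length"]

print(valid("Surname", "Chamanga"))  # within the 15-character limit
print(valid("Weight", 120))          # three digits exceed length 2
```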

Data item
This is a unit fact, the smallest named unit of data in a database that has meaning to a user.
It is also known as data element, field, or attribute.
Preferred usage:
Data item - a unit of data
Field - a physical rather than logical term that refers to the column position within a record where a data item is located.
Examples:
Employee-Name, Student#

Data Aggregate
It is a collection of data items that is named and referenced as a whole
Example:
NAME = Last-Name, First-Name, Initials
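In Python, such an aggregate can be sketched as a named tuple that is referenced as a whole or by individual data item (the sample values are invented):

```python
from typing import NamedTuple

# NAME as a data aggregate of three data items.
class Name(NamedTuple):
    last_name: str
    first_name: str
    initials: str

n = Name(last_name="Chamanga", first_name="Peter", initials="P.C.")
print(n)            # referenced as a whole
print(n.last_name)  # referenced by an included data item
```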


In COBOL, data aggregates are referred to as group items. In the data dictionary they should include: the data aggregate name, a description, and the names of the included data items.

Logical & Physical Redundancy

Data Modelling Notation

Conceptual & Logical Design Phases

Static & Dynamic Properties of Data Models


TRADITIONAL FILE PROCESSING SYSTEMS


It is programming with files.

Each user defines and implements the files needed for a specific application.

Data records are physically organised on storage devices using either sequential or random file organisation, so that each application has its own separate data file or files and software programs.

Example:

User1: Grade reporting Officer


Keeps a file on students and their grades
Implements program to print students transcript and enter new grades into the file

User2: Accounting Officer


Keeps track of students' fees and payments

Although both users are interested in data about students, each maintains separate files and programs to manipulate these files, and each requires data not available from the other's files.

This results in redundancy in defining and storing data, wasting storage space and duplicating the effort needed to keep common data up to date.

In the Database Approach, a single repository of data is maintained, defined once and accessed by
various users.

Advantages & Disadvantages


In a Traditional File Environment all the methods of file organisation are associated with
individual files and individual software programs.

What if the information required to solve a particular problem is located in more than one file?

Often extra programming and data manipulation will be required to obtain that information, for
example:-
Suppose you want to know all of the orders outstanding for a particular
customer. Some of the information is maintained in the order file, for an order
entry application. The rest of the information is maintained in a customer
master file. Thus the required information is stored in several files, each of
which is organised in a different way. To extract the required information,
both files need to be sorted so that the records are arranged in the same order.
Records from these files then have to be matched, and the data items from the
merging of both files extracted and output.

- Obtaining this information requires additional programming and creation of more files
- Most organisations have developed information systems one at a time, as the need arises,
  each with its own set of programs, files and users. After some time, these applications
  and files may reach a point where the organisation's information resources are out
  of control.
- Some symptoms of this crisis are:
  Data redundancy (similar data in different files)
  Program or Data Dependency
  Data Confusion (inconsistency among copies of the same data)
  Excessive costs

1. Data Redundancy


Refers to the presence of duplicate data in multiple data files. The same piece of data,
such as an employee name and address, will be maintained and stored in several different
files by several systems. Separate software programs must be developed to update this
information and keep it current in each file in which it appears.

2. Program/Data Dependency
Refers to the close relationships between data stored in files and specific software
programs required to update and maintain these files. Every computer program or
application must describe the location of the data it uses. In a traditional file
environment, any change to the format or structure of the data in the file necessitates a
change in all of the software programs that use the data.

3. Data Confusion (Inconsistency of data)


Refers to inconsistency among various representations of the same piece of data in
different information systems and files. Over time, as different groups in a firm update
their applications according to their own business rules, data in one system becomes
inconsistent with the same data in another system. For example, the student names and
addresses maintained in a college student enrolment system and in a separate system that
generates mailing labels may not correspond exactly if each system is updated with
different software programs, procedures and time frames.

4. Excessive Software Costs


These normally result from creating, documenting and keeping track of so many files and
different applications, many of which contain redundant data.

These problems can be easily viewed or pictured or visualised through the following illustrations:

Illustration Of Traditional File System

DATA FILES                 APPLICATIONS              USERS

Cust Name                  Savings Accounting        Savings
Social Security#           System                    Account
Savings A/C ID
A/C Balance

Cust Name
Social Security#           Loan Accounting           Loan
Address                    System                    Account
Loan A/C ID
Interest Rate
Loan Period
Loan Balance

Cust Name
Social Security#           Checking Accounting       Checking
Address                    System                    Account
Checking A/C ID
Account Balance

Class discussion: advantages of the Traditional File Environment.

What is the justification for a database in an organisation?

Advantages of the traditional file environment:
1. Easy to create and simple to use
2. Requires minimal overheads to access and use

Data Base Approach


Characteristics Of Database Approach Versus Traditional File Processing Approach:

In the conventional file processing, the user defines and implements files for specific
applications. In the database approach, a single repository of data is maintained and defined
once and accessed by various users.

Four characteristics most important in distinguishing a database system from a traditional file
processing system are:

i) Self-contained nature of database system:


The database system contains not only the database itself but also the complete
definition or description of the database. The definition (or metadata) is stored in a
system catalog.

In traditional file processing, data definition is typically part of application programs

ii) Insulation between programs and data:


In traditional file processing systems, the structure of data files is embedded in access
programs. Hence, any change to the structure of a file may require changing all
programs that access the file.

In a database system, the DBMS access programs are written independently of any
specific files. The structure of data files is stored in the DBMS catalog separately from
the access programs. This is called program-data independence.

iii) Data abstraction:


A DBMS should provide users with a conceptual representation of data that does not
include many of the details of how it is stored. A data model is a type of data abstraction
that provides this conceptual representation. The data model uses logical concepts such
as objects, their properties, and their interrelationships, which may be easier for users to
understand than storage concepts.

iv) Support of multiple views of data:


A database typically has many users each with a different perspective or view of the
database. A view may be a subset of the database or it may contain virtual data that is
derived from the database but not explicitly stored.
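Multiple views can be sketched with SQLite views; the student table and the two user views below (echoing the grade-reporting and accounting officers from the earlier example) are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Conceptual schema: the full student record.
con.execute("CREATE TABLE student (sno INTEGER, name TEXT, fees_owed REAL, grade TEXT)")
con.execute("INSERT INTO student VALUES (1, 'Moyo', 250.0, 'B')")

# External schemas: each user group sees only its own subset.
con.execute("CREATE VIEW grades_view   AS SELECT sno, name, grade     FROM student")
con.execute("CREATE VIEW accounts_view AS SELECT sno, name, fees_owed FROM student")

print(con.execute("SELECT * FROM grades_view").fetchall())
print(con.execute("SELECT * FROM accounts_view").fetchall())
```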

DBMS CONCEPTS

Data Model, Schemas and Instances

A data model is the main tool for providing abstraction. It is a set of concepts used to describe
the structure of a database. It includes a set of operations for specifying retrievals and updates.

It is important to distinguish between the description of a database and the database itself. The description of a database is called a database schema (the description of an individual entity type is likewise called a schema). The data in the database at a particular moment is called a database instance.

DBMS ARCHITECTURE

Here we are looking at an architecture for database systems, called the three-level-schema
architecture.

Three-Level schema Architecture

The goal of the three-level schema architecture is to separate the user applications from the
physical database. In this architecture, schemas can be defined at the following three levels:


The internal level has an internal schema, which describes the physical storage structure of the
database. It is the level closest to physical storage, that is, the one concerned with the way
the data is physically stored, and it is the view usually taken by systems programmers. The
systems programmer is concerned with the actual physical organisation and placement of the data
elements in the database. The internal view is the internal or hardware view of the database. The
internal schema uses a physical data model and describes the complete details of data storage and
access paths for the database. The systems programmer designs and implements this view by
allocating cylinders, tracks and sectors for the various segments of the database, so that the
various programs can run as smoothly and efficiently as possible.

The conceptual level has a conceptual schema, which describes the structure of the whole
database for a community of users. It is a logical view. It is how the Database appears to be
organised to the people who designed it. The conceptual schema is a global description of the
database that hides the details of physical storage structures and concentrates on describing
entities, data types, relationships and constraints. It is the view usually used by the Database
Administrator. It includes all the data elements in the Database and how these data elements
logically relate to each other.

The external or view level includes a number of external schemas or user views. It is the one
concerned with the way the data is viewed by individual users, and is usually used by an
application programmer. Each external schema describes the database view of one group of
database users. Each view typically describes the part of the database that a particular user group
is interested in and hides the rest of the database from that user group.


The three-schema architecture is illustrated below

                           END USERS
                               |
EXTERNAL LEVEL      EXTERNAL VIEW 1 ... EXTERNAL VIEW n
                               |
                  (external/conceptual mapping)
                               |
CONCEPTUAL LEVEL       CONCEPTUAL SCHEMA
                               |
                  (conceptual/internal mapping)
                               |
INTERNAL LEVEL          INTERNAL SCHEMA
                               |
                        STORED DATABASE


User A2        User B1        User B2   ...
    |              |              |
External View A         External View B
    |                        |
External/Conceptual     External/Conceptual
Mapping A               Mapping B
         \             /
        Conceptual View   (DBMS)
               |
    Conceptual/Internal Mapping
               |
         Internal View
               |
        Stored Database


Data Independence

The three-schema architecture can be used to explain the concept of data independence, which
can be defined as the capacity to change the schema at one level of a database system without
having to change the schema at the next higher level. There are two types of data independence:

Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database by adding a new record type or data item, or to
reduce the database by removing a record type or data item, without affecting external
schemas that do not refer to the changed items.

Physical data independence is the capacity to change the internal schema without
having to change the conceptual (or external) schemas. Changes to the internal schema
may be needed because some physical files are reorganised for example, by creating
additional access structures to improve the performance of retrieval or update. If the
same data as before remains in the database, we should not have to change the
conceptual schema.
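Logical data independence can be sketched in SQLite: an external view keeps working unchanged while the conceptual schema is expanded with a new data item (the names are invented for illustration).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (sno INTEGER, name TEXT)")  # conceptual schema
con.execute("INSERT INTO student VALUES (1, 'Moyo')")
con.execute("CREATE VIEW roster AS SELECT sno, name FROM student")  # external schema

before = con.execute("SELECT * FROM roster").fetchall()

# Expand the conceptual schema: add a new data item. The external
# view, and any program written against it, needs no change.
con.execute("ALTER TABLE student ADD COLUMN email TEXT")

after = con.execute("SELECT * FROM roster").fetchall()
print(before == after)
```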

Benefits & Risks

Benefits/Advantages Of Database Approach

1. Reduced Data Redundancy


A database minimises duplication of data from file to file; thus a student's name and
address might appear in only one record in a university database rather than in the files
of many departments. Only one copy of each data item is kept, so duplication of data is
eliminated.

Consistency of data is improved, while the reduced redundancy cuts wasted storage space.

2. Data Independence
A database system keeps descriptions of data separate from the applications that use the
data, so that changes to the data structures can occur without necessarily requiring
changes in every application program that uses the data.

This promotes data independence, which insulates application programs from modifications
of the database.
3. Data Sharing
Data sharing permits new data processing applications to be developed without having to
create new data files. Data can be accessed from a single central source for use by many
users for different applications.
4. Permits centralised control over data standards, security restrictions, and integrity
controls. A uniform system of security monitoring is possible via the centralised system.

5. Encourages use of powerful query languages by users without previous programming
experience. The result can be a reduction in program maintenance cost, that is, the cost
of upgrading application programs in response to changes in the file structure. Some
databases also store data in ways that do not depend on the storage media used; thus if
new disk drives are purchased, the data may not need to be reorganised to remain
accessible to the application programs using them.

6. Increased application programmer and user productivity

Most DBMSs offer application program development tools that help application
programmers write program code. These tools can be very powerful, and they
usually improve an application programmer's productivity substantially. Object-oriented
databases provide developers with libraries of reusable code to speed up development of
applications. Users also increase their productivity when query languages and report
generators allow them to produce reports from the database with little technical
knowledge and without any help from programmers, thus avoiding the long time
periods that MIS departments typically take to develop new applications. The result is
greater use of the corporate database for ad-hoc queries. Users also increase their
productivity when they use microcomputer software designed to work with mainframe
databases; this allows them to acquire and manipulate data with ease, without requiring
the assistance of programmers.
7. Improved Data Integrity: Because data redundancy is minimised, the threat to data
integrity is reduced. Data integrity ensures that the data in the database is accurate.
Updated values are available to all applications, and this ensures data consistency
across applications.

8. Reduced data complexity


Complexity is reduced by consolidated management of data, access and utilisation via
the DBMS.
9. Eliminates data confusion
Data confusion can be eliminated because there is one and only one source and
definition for the data.
10. Setting up new applications is made easy
It is a matter of just extending the database and providing new interfaces, because most
of the data is already available. This saves time, since there is no need to start from
scratch.


Problems/Disadvantages Of Databases

DBMSs provide many opportunities and advantages, but these advantages may come at a price.
A DBMS also poses problems such as:

1. Resource Problems
Characterised by high initial investment and a possible need for additional hardware. A
DBMS requires a large software system for creation and maintenance, and a fairly large
computer to support it. A database system usually requires extra computing resources:
the new database system's programs must run, and much more data must be stored on-line
to answer queries, whose number we hope will increase. As a result, many more terminals
may be needed to put managers and other users on-line, and additional hard disk capacity
may be needed to put more data on-line and make it available to managers. Communications
devices may be needed to connect the extra terminals to the database. It may even be
necessary to increase the size or number of CPUs to run the extra software required by
the database system.

Currently PCs are becoming more powerful and DBMS becoming more compact
therefore the problem is becoming less serious. It is also being overcome by availability
of distributed relational databases

2. Security Problems
A database must have sufficient controls to ensure that data is made available to
authorised personnel only and that adding, deleting and updating of data in the database
is accomplished by authorised personnel only. Access security means much more than
merely providing log in codes, account codes and passwords. Security considerations
should include some means of controlling physical access to terminals, tapes, and other
devices. Security considerations should also include the non-computerised procedures
associated with the database such as forms to control the updating or deletion of records
or files and procedures for storing source documents. In addition, access to employee,
vendor, and customer data should conform to various state regulations, such as the 1974
Privacy Act and the 1978 Right to Financial Privacy Act. Certainly the database should
contain an archiving feature to copy all important files and programs, and there should
be procedures for regular update and storage of these archival copies.

Failure of the database system, through hardware or software problems, malicious damage
or industrial action, can adversely affect the organisation, since all of its data
processing depends on the database.

3. Ownership Problem
In file-based systems, employees who run application programs on application-specific
files frequently feel that the data in these files is theirs and theirs alone. Users,
such as the payroll or personnel department, develop ownership of the files in the
system. When a database of such files is created, the data is owned by the entire
company. Any user with a need should be able to obtain the authority to read or otherwise
access the data. However, for a database to be successful the data must be viewed and
treated as a corporate resource, not as an individual's property.

Security and integrity may be compromised if DBA does not administer the database
properly.

The organisation experiences an overhead cost for providing security, concurrency
control, recovery and integrity functions.

The generality with which the DBMS provides for defining and processing data can also
be problematic.


Justification of Data Base in an Organisation

COMPONENTS OF THE DATA BASE


User Group
DBMS
Database
Data Dictionary
User/System Interface
Data Base Administration & Hardware

Database System Elements:


Stored Data
Various Data Models
Software to maintain data (DBMS)
Person working with the database

A Database Management System is a layer of software which maintains the database and
provides an interface to the data for the application programs that use it.

The DBMS (Database Management System) is the heart of the database.

It allows creation, accessing, modification and updating of the database and the retrieval of data
and the generation of the reports.

All transactions between users and database are through DBMS.

The DBA (Database Administrator) ensures that the database meets its objectives and is in
charge of the overall running of the database system. The role requires both software and
managerial skills.

Technical Responsibilities of a DBA:

To set up the database

To control and manage the Database

To identify the needs of an organisation and of the users

To define, implement and control the database storage including the structure of the
database.

To define & control access to the database.

To coordinate the data resources of the whole enterprise using user and management
cooperation.

To ensure that policies and procedures are established to guarantee effective production,
control and use of data.

To define a strategy for backup storage and recovery from breakdown

To decide how data is to be stored

To decide on the information content of the database & structure of different data models.


ACCESSING THE DATABASE THROUGH THE DBMS

Application
Programs  ----+
              +----- DBMS ------- DATABASE
Users     ----+


THE DATABASE SYSTEM

It is a computerised record-keeping system

BUILDING BLOCKS OF A COMPUTER BASED ELECTRONIC DATABASE SYSTEM

BIT

BYTE/CHARACTER

DATA ELEMENT/FIELD

RECORD

FILE

DATABASE

A BIT is a binary digit, which is either a 0 or a 1

A BYTE is a collection of bits representing a character

DATA ELEMENT/FIELD is a collection of characters describing one attribute of an entity

An ENTITY is anything we can collect/store information on

An ATTRIBUTE is a property that describes an entity. In a database, an attribute is known as a field

A RECORD is a collection of related data elements describing an entity

A FILE is a collection of records describing an entity

A DATABASE is a collection of related files


THE DATABASE MANAGEMENT SYSTEMS

A DBMS is a collection of software programs that :-

1. Store data in a uniform way


2. Organise the data into records in a uniform way
3. Allow access to the data in a uniform way

- In a DBMS, applications do not obtain the data they need directly from the storage media
(database)
- They request the data from the DBMS
- The DBMS then retrieves the data from the storage media and provides them to the application
programs
- A DBMS operates between application programs and the data

The illustration below shows the relationship of Application Programs, the DBMS and the Database.

+----------------+
| APPLICATION    |-----+
| PROGRAM        |     |
+----------------+     |      +--------+     +----------+
+----------------+     +----->|        |     |          |
| APPLICATION    |----------->|  DBMS  |<--->| DATABASE |
| PROGRAM        |     +----->|        |     |          |
+----------------+     |      +--------+     +----------+
+----------------+     |
| APPLICATION    |-----+
| PROGRAM        |
+----------------+


COMPONENTS OF A DBMS

DBMS system software is usually developed by commercial vendors and purchased by organisations.

The components of a particular DBMS vary from one vendor to another.

Some of these components are typically used by information specialists in the system, for example,
information systems specialists typically use the Data Dictionary, Data Languages, Teleprocessing
Monitor, Applications Development Systems, Security Software and archiving and recovery system
components of DBMS.

Other components such as Report Writers and Query Languages may be used by both programmers and
other non-specialists.

DATA DICTIONARY/DIRECTORY/DATABASE SCHEMA

Contains the names and description of every data element in the Database

It also has a description of how data elements relate to one another

Through the use of its data dictionary, a DBMS stores data in a consistent manner thus reducing
redundancy. For example, the data dictionary ensures that the data element representing the number of an
inventory item named (stocknum) will be of uniform length and have other uniform characteristics
regardless of the application program that uses it.

Application developers use the data dictionary to create the records they need for the programs they are
developing

A Data Dictionary checks records that are being developed against the records that already exist in the
database and prevents redundancy in data element names

Because of the data dictionary an application program does not have to specify the characteristics of the
data it wants from the database. It merely requests the data from the DBMS

This may permit changing the characteristics of a data element in the data dictionary without changing it
in all the application programs that use the data element

Defines Metadata


DATA LANGUAGES

To place a data element into the Data Dictionary, a special language is used to describe the characteristics
of the data element.

This language is called a Data Description Language or DDL.

To ensure uniformity in accessing data from the database, a DBMS will require that standardised
commands be used in application programs.

These commands are part of a specialised language used by programmers to retrieve and process data
from the Database.

This language is called the Data Manipulation Language or DML

A DML usually consists of a series of commands such as FIND, GET, APPEND etc.

These commands are placed in an application program to instruct the DBMS to get the data the
application needs at the right time
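The split between a DDL (describing data to the DBMS) and a DML (storing and retrieving data) can be sketched with standard SQL statements, here run through Python's built-in sqlite3 module. The stock table and its stocknum column are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describe a data element (name, type, length) to the DBMS
conn.execute("CREATE TABLE stock (stocknum CHAR(8) PRIMARY KEY, qty INTEGER)")

# DML: standardised commands placed in application programs to store and fetch data
conn.execute("INSERT INTO stock VALUES ('AB-10001', 25)")
qty = conn.execute("SELECT qty FROM stock WHERE stocknum = 'AB-10001'").fetchone()[0]
print(qty)  # 25
```

Note that the application program never states how stocknum is stored; it simply requests the data from the DBMS, which looks the characteristics up in its dictionary.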

SECURITY SOFTWARE

A security software package provides a variety of tools to shield the Database from unauthorised access.

ARCHIVING AND RECOVERY SYSTEM

Archiving programs provide the Database Manager with the tools to make copies of the database, which
can be used in case the original database records are damaged.

Restart or recovery systems are tools used to restart the database and to recover lost data in the event of a
failure

REPORT WRITERS

A Report Writer allows programmers, managers and other users to design output reports without
writing an application program in a programming language such as COBOL; requests can instead be
expressed in a query language such as SQL.

A Query Language is a set of commands for creating, updating and accessing data from a Database.

Query Languages allow users to ask ad-hoc questions of the database interactively, without the aid of
programmers

A form of a Query Language is SQL (Structured Query Language)

SQL is a set of English-like commands that has become a standard in the database industry and in
database development

Because SQL is used in many DBMSs, managers who understand SQL syntax are able to use the same set of
commands regardless of the DBMS.

This software must provide the manager with access to data in many Database Management
Environments. The basic form of an SQL command is:-

SELECT ........ FROM ....... WHERE ..........

After SELECT you list the fields you want to display


After FROM you list the name of the file or group of records that contain those fields
After WHERE you list any condition for the search of the records

Example:

Select all customer names from the customer database where the city in which the customer
lives is Harare.

Solution:

SELECT *                                  (all fields)
FROM customer
WHERE city = 'Harare'

OR

SELECT name, DOB, Credit_Limit, City      (specified fields only)
FROM customer
WHERE city = 'Harare'

The result is a list of all fields (or only the specified fields) of the customers located in Harare.
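A minimal sketch of this query running against a real (if tiny) relational database, using Python's built-in sqlite3 module; the sample customer rows are invented for illustration:

```python
import sqlite3

# In-memory database; the customer table and its rows are invented for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Simba", "Harare"), ("Tino", "Bulawayo"), ("Rue", "Harare")],
)

# SELECT ... FROM ... WHERE: all fields of customers located in Harare
rows = conn.execute("SELECT * FROM customer WHERE city = 'Harare'").fetchall()
print(rows)  # [('Simba', 'Harare'), ('Rue', 'Harare')]
```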

Some of the Query Languages use a natural language set of commands

These Query Languages are structured so that the commands used are as close to standard English as
possible. For example the following statement might be used:

PRINT names and address of all customers who live in Harare

Query Languages allow users to retrieve data from a database without having detailed information about the
structure of the records and without being concerned about the processes the DBMS uses to retrieve the
data. Furthermore, managers do not have to learn COBOL, BASIC etc

TELEPROCESSING MONITOR

It is a communications software package that manages communication between the database and remote
terminals

Teleprocessing monitors often handle order entry systems that have terminals located at remote sales
locations.

These may be developed by DBMS software firms and offered as companion packages to their database
products


EVOLUTION OF DATA MANAGEMENT SOFTWARE

Rudimentary I/O software  ->  File storage access method  ->  Structured DBMS  ->  Relational DBMS

1. Input/Output software:  program with the physical data description coded inside it.
   No independence.
2. Access method:  program -> access method -> stored data.
   Storage independence: storage units can be changed.
3. Structured DBMS (two-schema):  program -> logical schema -> physical schema.
   Physical data independence: physical and logical structures separated.
4. Relational DBMS (three-schema):  program -> external schema -> conceptual schema -> internal schema.
   Logical data independence: external and conceptual structures separated.

1. In the earliest data processing applications there was no formal data management software; all data
   descriptions and input/output instructions were coded in each application program, resulting in no
   data independence: every change to a data file required modification or rewriting of the application
   program.
2. The access method was the first formal data management software. It is a software routine that manages
   the details of accessing and retrieving records in a file, providing storage independence: storage units
   can be changed (newer units replacing older units) without altering or modifying application
   programs.
3. The two-level schema (two-schema architecture) is what most early database management systems
   employed. The logical schema corresponds to an external or user view that describes the data as seen by
   each application program. The physical schema corresponds to the internal schema that describes the
   representation of data in computer facilities. This resulted in physical data independence, that is, the
   data structures or methods of representing data in secondary storage could be altered without
   modifying application programs; e.g. to achieve efficiency, linked lists could be used instead of
   indexes without changing application programs. The two-level schema was characteristic of
   structured database management systems, such as those that use the hierarchical and network data
   models. It did not provide logical data independence.
4. The three-level schema is provided by contemporary relational DBMSs. The conceptual schema provides an
   integrated view of the data resource for the entire organisation. The conceptual schema evolves
   over time: new data definitions are added to it as the database grows and matures. It provides both
   logical and physical data independence. Because of logical data independence, the conceptual schema can
   grow and evolve over time without affecting the external schemas, so existing application programs
   need not be modified as the database evolves.


A database management system that provides these three levels of data is said to follow a three-schema
architecture.

A schema is a logical model of a database. It captures the metadata that describe an organisation's data in
a language that can be understood by the computer.

Level of data independence        Examples of changes

Logical:
  Data item format                Data item type, length, representation, or unit of measure
  Data item usage                 How a data item is derived, used, edited, or protected
  Logical record structure        How data items are grouped into logical records
  Logical data structure          Overall logical structure or conceptual model

Physical:
  Physical data organisation      How the data are organised into stored records
  Access method                   What search techniques and access strategies are used
  Physical data location          Where data are located on the storage devices
  Storage devices                 Characteristics of the physical storage devices used

Physical data independence insulates a user from changes to the internal model
Logical data independence insulates a user from changes to the conceptual model.


FILE ORGANISATION

A file contains groups of records used to provide information for operations, planning, decision
making etc.
File organisation is the technique for physically arranging the records of a file on a secondary storage
device.

Overview of basic file organisations:

File Organisation
  Sequential
  Indexed
    Nonsequential (full index)
    Sequential (block index)
      Hardware independent (VSAM)
      Hardware dependent (ISAM)
  Direct
    Relative-addressed
    Hash-addressed


Comparison of Basic File Organisations:

a) Sequential: records are stored one after another in the order written (Asteroids, Breakout,
   Combat, ..., Zaxxon).

b) Indexed sequential: a separate index (highest keys H, P, Z) points to blocks of records stored
   in key sequence (A, D, H; K, M, P; Q ... Z), e.g. Asteroids, Defender, Megamania, ..., Zaxxon.

c) Relative: each record occupies a relative record number 1, 2, 3, 4, ..., n (Chess, Combat,
   Defender, Faceoff, ..., Zaxxon).

d) Hashed: a hashing routine converts each record key to a relative record number 1, 2, 3, 4, ..., n,
   so the records (Pitfall, Berserk, Odyssey, Donkey Kong, Space Invaders) are scattered, not in
   key sequence.


a) Sequential organisation: the physical order of records in the file is the same as the order in
   which records were written to the file, normally ascending order of primary key.
   Sequential access: a record can be accessed only by first accessing all records that physically
   precede it.

b) Indexed sequential organisation: records are stored in physical sequence according to the
   primary key. The file management system, or access method, builds an index, separate from the
   data records, that contains key values together with pointers to the data records themselves.
   Random/sequential access: random access of individual records is possible without accessing
   other records; the entire file can also be accessed sequentially.

c) Relative organisation: also known as direct file organisation. Records are often loaded in
   primary key sequence so that the file can be processed sequentially, but records can also be in
   random sequence.
   Relative access: each record can be retrieved by specifying its relative record number, which
   gives the position of the record relative to the beginning of the file. The user or application
   program has to specify the relative location of the desired record.

d) Hashed organisation: also known as direct file organisation in which hash addressing is used.
   The primary key value for a record is converted by an algorithm (called a hashing routine) into
   a relative record number. Records are not in logical order: the hashing algorithm scatters
   records throughout the file, normally not in primary key sequence.
   Relative access: a record is located by its relative record number, as for a relative
   organisation.


Basic Access Modes:


1. Sequential:
A record can be retrieved only by retrieving all the records that physically precede it.
Generally used for copying files and for sequential batch processing of records.
2. Random:
A record is accessed directly, without referencing other records in the file.
It follows no predefined pattern
Typically used for on-line updating and/or retrieval of records.

File organisation is rarely changed but record access mode can change each time the file is used.

Permissible File Organisation & Record Access Modes:

File Organisation       Sequential Access       Random Access
Sequential              Yes                     No (impractical)
Indexed Sequential      Yes                     Yes
Direct-Relative         Yes                     Yes
Direct-Hashed           No (impractical)        Yes

There are several file organisation methods, namely:

1. Hashed file organisation
2. Clustering file organisation
3. Indexed file organisation
4. Compression file organisation

1. The Hashed File Organisation


Direct access devices permit access to a given record by going directly to its address.

Since it is not feasible to reserve a physical address for each possible record, a method
called hashing is used. Hashing is the process of calculating an address from the record
key.

Suppose that there were 500 employees in an organisation and we wanted to use the
Social Security Number as a key. It would be inefficient to reserve 999 999 999
addresses, one for each possible social security number.

Therefore, we take the social security number and use it to derive the address of
the record. There are many hashing techniques; a common one is to divide the original
number by a prime number and use the remainder as the address. This is known as the
Division Method. It works as follows:

Begin with the Social Security Number 053-4689-42, i.e. 53468942. Dividing by 509
yields a quotient of 105047. Note that 105047 multiplied by 509 equals 53468923,
not 53468942; the difference between the original number 53468942 and 53468923
is 19, so the remainder 19 becomes the record's storage address.

The record for an employee whose Social Security Number is 472-3840-86
yields the same remainder.

When this occurs, the second person's record should be placed in a special
overflow area.
Example:
Qn. Divide the number 472-3840-86 by 509 (a prime number) and
find the physical location.
Solution:


The physical location is given by 472384086 ÷ 509, which yields a quotient of 928063,
and 928063 × 509 = 472384067. The original number 472384086 differs from this by 19,
so 19 is the remainder.

Therefore the physical address is 19, the same as for the first employee: a collision.
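The division method above can be sketched in a few lines of Python, using the divisor 509 from the example:

```python
def division_hash(key: int, divisor: int = 509) -> int:
    """Division method: the remainder of key / divisor becomes the address."""
    return key % divisor

# The two social security numbers from the example hash to the same address,
# so the second record must be placed in the overflow area.
print(division_hash(53468942))   # 19
print(division_hash(472384086))  # 19 (a collision)
```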


A hashing algorithm converts the record key into a relative record number.

Modular Arithmetic
Divide the key by the number of locations available for storage and take the remainder.
For example, with 100 locations and the 4-digit key 1537:
1537 ÷ 100 = 15 remainder 37
Therefore the storage location is 37
Alphanumeric keys need first to be converted to numbers, e.g. to base 36 or to the
ASCII code for each character or digit

Folding
Divide the key into two or more parts and add them together. For example
872377 = 872 + 377 = 1249
Then apply modular arithmetic to the result

Divide-and-remainder is a common hashing algorithm

Divide the record key by a prime number and take the remainder
The prime number must be greater than the number of actual records
The prime number must contain an allotment for future file expansion.
Example: for a 10,000-product inventory system the divisor 11,001
is used, which allows 1,001 additional expansion positions (10%).
(Strictly, 11,001 is not prime, since 11,001 = 3 × 3,667; a true prime
such as 11,003 would serve better in practice, but 11,001 is kept here
to match the examples below.)
Nonnumeric record keys:
Either strip the record key of its nonnumeric characters,
or convert them to numbers
A hashing algorithm for alphanumeric conversion is the Soundex
system:
B, F, P, V are assigned 1
C, G, J, K, Q, S, X, Z are assigned 2
D, T are assigned 3
L is assigned 4
M, N are assigned 5
R is assigned 6
A, E, H, I, O, U, W, Y are assigned 0
Example:

a) Product # C-64744: strip off the nonnumeric characters, leaving 64744

   Location = remainder of (record key ÷ prime number)
            = remainder of 64744 ÷ 11001
            = 9739 (quotient 5)

b) BURNS converts to 10652

Hashing works like a one-way street: it cannot be worked backwards.
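Both options for nonnumeric keys can be sketched as follows; the digit table is the Soundex-style assignment given above:

```python
# Digit assignments from the Soundex-style table above
SOUNDEX = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
           **dict.fromkeys("DT", "3"), "L": "4", **dict.fromkeys("MN", "5"),
           "R": "6", **dict.fromkeys("AEHIOUWY", "0")}

def strip_nonnumeric(key: str) -> int:
    """Option 1: keep only the digits of the record key."""
    return int("".join(c for c in key if c.isdigit()))

def soundex_number(key: str) -> int:
    """Option 2: convert the letters of the key to digits with the Soundex table."""
    return int("".join(SOUNDEX[c] for c in key.upper() if c.isalpha()))

print(strip_nonnumeric("C-64744"))           # 64744
print(strip_nonnumeric("C-64744") % 11001)   # 9739, the relative location
print(soundex_number("BURNS"))               # 10652
```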

Advantages:
Supports applications demanding quick record retrieval, because locating and
reading the desired record into memory usually requires a single access to the disk.
Involves a single calculation to find the record number
Permits both numeric and alphanumeric keys
Easily implemented with COBOL, C or PASCAL instructions


Disadvantages:
Hashing algorithms can produce collisions, that is, identical remainders, also called
crashes or synonyms. For example, product numbers C-64744 and F-42742 both
yield remainder 9739 when divided by 11001.
When a collision occurs, an indicator is stored in the first record to warn a user of the
crash. The indicator reveals where the other record really resides.
Due to collisions, extra disk space is allotted for a record that would otherwise
collide with another
Due to the random order of the file, a sorting step must occur before listing or otherwise
processing the file in sequence
When the file becomes full, a programmer writes a one-time program to rebuild it with
expansion space.

NOTE:
Whether sequential, direct, or indexed files are used depends on the user's needs:
For instant access to data, direct and indexed techniques apply
For batch environments, sequential techniques apply
HASHING

It is a process of designing an algorithm to calculate an address

HASH TABLE METHODS:

- This is a table scheme in which updates, searches and deletions can ideally be done in constant
  time
- We seek a mathematical function which produces table addresses when supplied with the key
- Since there are many more possible key values than addresses, this becomes a many-to-one function
  in which many different key values can lead to the same address
- Since we do not know in advance which keys will arise, it is possible that two keys with the same
  address will arrive and a hash collision will occur
- Therefore to design a good hash table we must find a solution to the following two problems:-

a) Find a hash function that minimises the number of collisions by spreading arriving records
   around the table as evenly as possible

b) Since any hash function is many-to-one, collisions are inevitable, and therefore a good way
   of resolving them is necessary

- There are basically four methods which are used to produce hash tables which are:- (mainly for
system software programming)

1) Truncation
2) Division
3) Midsquare
4) Partition/Folding

TRUNCATION

- This is a method where you take the last few digits of the key as the address
e.g h(2467) = 467
h(12601) = 601
h(12467) = 467

- The advantage of truncation is that it is a fast method

- The disadvantage is that you must study the keys thoroughly to minimise collisions

DIVISION


- You take the key and MOD it by MAXSIZE, that is, you use the function:-
key MOD MAXSIZE

e.g 21 MOD 8 will give the address 5 (the remainder)

- This method is popular because it has got a wide range of addresses
- Its main disadvantage is that the computer takes more time in dividing
- It is an advantage to use a MAXSIZE which is a prime number, so that the remainders are spread
  more evenly and we do not end up with clusters that degenerate into a sequential search

MIDSQUARE

- It converts the key into its decimal equivalent, takes the middle digit and squares it to give the
address
e.g. h(49294): middle digit 2, address 2² = 4
h(24683): middle digit 6, address 6² = 36

PARTITION/FOLDING

- This method divides the number into groups of digits and adds the groups together to give the address
eg 510324 = 51 + 03 + 24 = 78
Therefore h(510324) = 78
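The four methods can be sketched as follows; note that the midsquare variant here squares the middle digit, as in the text's examples (the classic midsquare method instead squares the whole key and takes its middle digits):

```python
def truncation(key: int, digits: int = 3) -> int:
    """Take the last few digits of the key as the address."""
    return key % 10 ** digits

def division(key: int, maxsize: int) -> int:
    """key MOD MAXSIZE."""
    return key % maxsize

def midsquare(key: int) -> int:
    """Variant used in the text: square the middle digit of the key."""
    digits = str(key)
    return int(digits[len(digits) // 2]) ** 2

def folding(key: int, group: int = 2) -> int:
    """Split the key into groups of digits and add the groups together."""
    digits = str(key)
    return sum(int(digits[i:i + group]) for i in range(0, len(digits), group))

print(truncation(2467))   # 467
print(division(21, 8))    # 5
print(midsquare(49294))   # 4
print(folding(510324))    # 78
```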

RESOLVING HASH COLLISIONS

- The technique of searching in a systematic and repetitive fashion for an alternative location is called
PROBING

(1) Linear Probing

Uses the following incrementing function:-

inc(i) = (i + 1) MOD MAXITEMS

- The incrementing function takes an address (i) not a key and produces another hash address
- If the new location is occupied, we take that hash address and pass it again through the incrementing
function etc until we find an open location and with luck we may be able to place it in a few probes
- Therefore we should have an indicator to tell whether the position is occupied or unoccupied and as
such we say that using linear probing we first of all, apply h(k) and then as many increments (i) as
we need

Disadvantages:
- Linear probing results in clustering, where a number of synonyms will be adjacent to each
other and mixed with others; as the table fills, these clusters inevitably grow larger and larger,
making update, search and delete operations run more slowly

Advantages:
- It is suitable for small lists
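A minimal sketch of linear probing with the incrementing function above (MAXITEMS = 11 is an arbitrary table size chosen for illustration):

```python
MAXITEMS = 11              # arbitrary table size for illustration
table = [None] * MAXITEMS  # None marks an unoccupied position

def insert(key: int) -> int:
    """Apply h(k) = key MOD MAXITEMS, then linear increments until a free slot."""
    i = key % MAXITEMS
    while table[i] is not None:    # position occupied: probe the next one
        i = (i + 1) % MAXITEMS     # inc(i) = (i + 1) MOD MAXITEMS
    table[i] = key
    return i

print(insert(22))  # 0
print(insert(33))  # 33 MOD 11 is also 0, so probing places it at 1
```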


(2) Non-Linear Probing

- Uses the following equation:

inc(i, p) = (i + ap + bp²) MOD MAXITEMS

where p = number of probes and
a, b = ±1

(3) Bucket Hashing

- It establishes a bucket or a separate storage for all members of a given synonym


- The hashing function is used to determine which bucket the new arrival belongs to
- In most cases linear lists are used as structures for the buckets
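A minimal sketch of bucket hashing using one linear list per bucket (the table size is arbitrary):

```python
MAXITEMS = 11
buckets = [[] for _ in range(MAXITEMS)]  # one linear list per bucket

def insert(key: int) -> None:
    """The hash function decides which bucket the new arrival belongs to."""
    buckets[key % MAXITEMS].append(key)

for k in (22, 33, 44, 5):
    insert(k)

print(buckets[0])  # [22, 33, 44]: all synonyms share one bucket
print(buckets[5])  # [5]
```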

2. Clustering File Organisation Technique


The basic idea behind clustering is to try and store records that are logically related and
physically close together on disk.

Physical data clustering is an extremely important factor in performance as can easily be


seen from the following example:

Suppose the stored record most recently accessed is record R1, and suppose the next
stored record required is record R2. Suppose also that R1 is stored on page P1
and R2 is stored on page P2. Then:-
1. If P1 and P2 are one and the same, then the access to R2 will not require
   any physical input or output at all, because the desired page P2 will
   already be in a buffer in main memory
2. If P1 and P2 are distinct but physically close together in particular if they
are physically adjacent then the access to record R2 will require a physical
input/output (unless of course Page P2 also happens to be in a main
memory buffer), but the seek time involved in that input/output will be
small, because the read/write heads will already be close to the desired
position. In particular, the seek time will be 0 if P1 and P2 are in the same
cylinder.


3. Indexing
This is another file organisation method, which is divided into two areas namely:-
1. The Data Area
Contains all the records with all values or entries organised
sequentially which can be in ascending order

2. Index Area
Contains the record key per given track number. This record
key must be the highest in that track number. The 2 areas are
linked or joined by pointers

The general structure of an indexed file is as follows:

CITY (INDEX) FILE        SUPPLIER FILE

CITY                     S#    NAME     CITY
Harare                   101   Simba    Harare
Bulawayo                 102   Tino     Bulawayo
Mutare                   103   Rue      Harare
                         104   Rudo     Mutare
                         105   Rufaro   Harare
                         106   Takura   Bulawayo
                         107   Rachel   Mutare

The above supplier file is said to be indexed by the city file.

The fundamental advantages of an indexed file are that:

It speeds up retrieval or accessing.
It offers great flexibility, allowing both random and sequential access to data

But there are disadvantages too, because:

It slows down updates. For instance, every time a new stored record is added to the indexed
file, a new entry also has to be added to the index.
It means extra work maintaining the different tables
More memory is needed to store the tables
Extra disk space is needed for the index and overflow areas

Indexes can be used in essentially two different ways:


1. They can be used for sequential access to the indexed files where sequential means in the
sequence defined by values of the indexed fields. The city index will allow records in the
supplier file to be accessed in city sequence.
2. Indexes can also be used for direct access to individual records in the indexed file on the
basis of a given value for the indexed field.
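Both uses can be sketched with an in-memory index over the supplier file above; the dictionary-based index here is an illustration, not how a DBMS stores its indexes on disk:

```python
# The supplier file from the example above, held as a list of records
suppliers = [(101, "Simba", "Harare"), (102, "Tino", "Bulawayo"),
             (103, "Rue", "Harare"), (104, "Rudo", "Mutare"),
             (105, "Rufaro", "Harare"), (106, "Takura", "Bulawayo"),
             (107, "Rachel", "Mutare")]

# Build a dense index on city: one entry per record position
city_index = {}
for pos, (snum, name, city) in enumerate(suppliers):
    city_index.setdefault(city, []).append(pos)

# Direct access by indexed field value, without scanning the whole file
print([suppliers[p][1] for p in city_index["Harare"]])  # ['Simba', 'Rue', 'Rufaro']
```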

Dense and Non-dense Indexing


A dense index contains one index record for each data record in the indexed file (1:1). A fully
inverted file has an index on every field.

A non-dense index, sometimes called a sparse index, does not contain an entry for every stored
record in the indexed file (1:m). It uses less storage space: one index entry covers a number of records.

For example, a sparse index on S# might hold one entry per two-record block of the supplier
file (the highest key in each block):

S# Index     Supplier file
S2           S1   Smith    London
             S2   Jones    Paris
S4           S3   Blake    Paris
             S4   Clarke   London
S6           S5   Adams    Athens
             S6   Brown    Paris

4. Compression Techniques

This is a way of minimising amount of storage for stored data by replacing the data with some
representation.

There are three types of compression:

Front Compression
Rear Compression
Hierarchical Compression

Front Compression:

It replaces the front characters identical to the previous entry by a corresponding count.

Trailing blanks are shown padded with b

Example:

The following 4 names appear in a stored table. The field length is 10 characters. Apply front
differential compression:
Farai
Farasiya
Farisai
Farikayi

Solution:

Farai 0 - Faraibbbbb
Farasiya 4 - siyabb
Farisai 3 - isaibbb
Farikayi 4 - kayibb
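The front-compression rule can be sketched as follows; the trailing-blank padding is omitted here, so only the count and the recorded characters are produced:

```python
def front_compress(names):
    """Replace front characters shared with the previous entry by a count."""
    out, prev = [], ""
    for name in names:
        n = 0
        while n < min(len(name), len(prev)) and name[n] == prev[n]:
            n += 1
        out.append((n, name[n:]))  # (shared-prefix count, remaining characters)
        prev = name
    return out

print(front_compress(["Farai", "Farasiya", "Farisai", "Farikayi"]))
# [(0, 'Farai'), (4, 'siya'), (3, 'isai'), (4, 'kayi')]
```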

Rear Compression:

It eliminates all trailing blanks, replacing them with an appropriate count
It drops all characters to the right of those needed to differentiate the entry in question from its
immediate neighbours
The first number is as in front differential compression and the second is a count of the number of
characters recorded
This results in the loss of some information when the data is decompressed, but the full value is
available somewhere in the data file


Example:

The following names appear in a stored table. The field length is 15 characters. Apply rear
compression.
Abrahams,GK
Ackermann,LZ
Ackroyd,S
Adams,T
Adams,TR
Adamson,CR
Allen,S
Ayres,ST
Bailey,TE
Baileyman,D

Solution:
                 Compressed        Expanded form
Abrahams,GK      0-2  Ab           Ab
Ackermann,LZ     1-3  cke          Acke
Ackroyd,S        3-1  r            Ackr
Adams,T          1-7  dams,T       Adams,T
Adams,TR         7-1  R            Adams,TR
Adamson,CR       5-1  o            Adamso
Allen,S          1-1  l            Al
Ayres,ST         1-1  y            Ay
Bailey,TE        0-7  Bailey       Bailey
Baileyman,D      6-1  m            Baileym
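A sketch of the stated rule: keep just enough leading characters to differentiate each key from both neighbours, then front-compress what is kept. For one or two boundary entries (e.g. Adams,T) the counts this produces differ by one from the table above, whose exact convention is not fully clear, so only unambiguous rows are shown:

```python
def rear_compress(keys):
    """Keep just enough leading characters to tell each key from its
    neighbours, then front-compress: (shared with previous, count, chars)."""
    def lcp(a, b):  # length of the longest common prefix
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        return n

    out = []
    for i, key in enumerate(keys):
        p = lcp(key, keys[i - 1]) if i > 0 else 0
        q = lcp(key, keys[i + 1]) if i + 1 < len(keys) else 0
        keep = min(len(key), max(p, q) + 1)  # enough to differentiate
        out.append((p, keep - p, key[p:keep]))
    return out

result = rear_compress(["Abrahams,GK", "Ackermann,LZ", "Ackroyd,S", "Adams,T",
                        "Adams,TR", "Adamson,CR", "Allen,S", "Ayres,ST",
                        "Bailey,TE", "Baileyman,D"])
print(result[1])  # (1, 3, 'cke')
print(result[2])  # (3, 1, 'r')
print(result[5])  # (5, 1, 'o')
```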


Hierarchical Compression:

A supplier stored file might be clustered by values of the city field; for example, all London
suppliers would be stored together. The set of all supplier records for a given city might then be
compressed into a single hierarchic stored record, in which the city value in question appears
only once, followed by all the other details for each supplier who happens to be in that city.

It consists of two parts:


Fixed part (city field)
Varying part (set of supplier entries). Varying in the sense that the number of
entries it contains (i.e. the number of suppliers in the city in question) varies from
one occurrence of the record to another, that is, a repeating group.
This is only possible if there is intra-file clustering.

Intra-file clustering (one hierarchic record per city):

Athens:  S5 Adams 30
London:  S1 Smith 20   S4 Clark 20
Paris:   S2 Jones 10   S3 Blake 30

Inter-file clustering:

Page p1:  S1 Smith 20 London   P1 300   P2 200   P3 400   P5 100
Page p2:  S2 Jones 10 Paris    P1 300   P2 400

Inter-file compression combines the supplier and shipment files into a single file and then applies
intra-file compression to that single file.

DATA MODELS TYPES

There are four types of database models that is:-


1. Hierarchical
2. Network
3. Relational
4. Object-Oriented

The hierarchical and network models use standard files and provide structures that allow them to be cross-
referenced and integrated. They have been available since the early 1970s. The relational model uses tables
to store data. It provides the ability to cross-reference and manipulate the data, and it provides for data
integrity. The object-oriented model uses objects.

1. The Hierarchical Model


In the hierarchical database, data relationships follow hierarchies, or trees, which reflect either a one-to-
one relationship or a one-to-many relationship among record types.

The uppermost record in the tree structure is called the root record. From there, data are organised into
groups of parent and child records. A parent record can have many child records (children of the same
parent are called siblings), but each child record can have only one parent record. Parent records are
higher in the data structure than child records; however, each child can itself become a parent and have
its own child records. Because relationships between data items follow defined paths, access to the data
is fast. However, any relationship between data items must be defined when the database is being created.

Motor Car Products (root)
    Fiesta:  1100cc, 1300cc
    Escort:  1100cc, 1300cc, 1600cc
    Sierra:  1600cc, 2000cc, 2300cc, 2800cc

Parent-child relationships; the depth of the hierarchy can be greater.


Properties of a Hierarchical Schema

1. One record type, called the root of the hierarchical schema does not participate as a child record
type in any Parent-Child relationship (PCR)
2. Every record type except the root participates as a child record type in exactly one PCR type
3. A record type can participate as parent record type in any number (zero/more) of PCR type
4. A record type that does not participate as parent record type in any PCR type is called a LEAF of
the hierarchical schema
5. If a record type participates as parent in more than one PCR type then its child record types are
ordered. The order is displayed, by convention, from left to right in a hierarchical diagram.


2. The Network Model

A network database is similar to a hierarchical database except that each record can have more than one
parent, thus creating a many-to-many relationship among the records. For example, a customer may be
called on by more than one salesperson in the same company, and a single salesperson may call on more
than one customer. Within this structure, any record can be related to any other data element.

The main advantage of a network database is its ability to handle sophisticated relationships among
various records. Therefore more than one path can lead to a desired data level.

The network database structure is more versatile and flexible than is the hierarchical structure because the
route to data is not necessarily downwards, it can be in any direction.

In the network structure, as in the hierarchical structure, data access is fast because relationships are
defined during database design. However, network complexity limits users in their ability to access the
database without the help of programming staff.

Motor Car Products
    Fiesta, Escort and Sierra share the engine records 1100cc, 1300cc, 1600cc and 2000cc.

The network model permits a record to belong to a number of parents.


3. The Relational Model

A relational database is composed of many tables in which data are stored, but a relational database
involves more than just the use of tables. Tables in a relational database must have unique rows, and the
cells (the intersections of a row and a column, equivalent to fields) must be single-valued (that is, each
cell must contain only one item of information, such as a name, address, or identification number).

It is built from tables of data elements known as relations.

A row is called a tuple and a column is called an attribute. The data type describing the types of values
that can appear in each column is called a domain.

Domain

Is a set of atomic values. An atomic value means that each value in the domain is indivisible as far as the
relational model is concerned.

Logical definition of domains

Hre_phone_number    - the set of valid 6-digit numbers in Harare
Cell_phone_number   - the set of valid 9-digit numbers within a particular supporting network
Employee_age        - possible ages of employees of a company: a value between 18 and 65 years old

A domain has a name, data type, and format.

Relation

A relation schema is a set of attributes; it is used to describe a relation. The degree of a relation is the
number of attributes n in its relation schema.
A relation is defined as a set of tuples, and the tuples in a relation do not have any particular order.
Values within a tuple are ordered. Values in a tuple are atomic, so composite and multivalued attributes
are not allowed; this is the First Normal Form assumption.

Relations may represent facts about entities or about relationships


Example

MAJOR(StudentID, DeptCode) asserts that students major in academic departments.


Tuple
All tuples in a relation must be distinct: no two tuples can have the same combination of values for all
their attributes. A superkey is a set of attributes whose combined values cannot be duplicated within a
relation, e.g. {StudentID, Name, Age}. A key is a minimal superkey: no attribute can be removed without
losing uniqueness. A relation may have more than one key; each of the keys is called a candidate key.

Example:

studentid and candidate_number

Example:

Relation of degree 7

STUDENT(name, ID_number, home_phone, address, cell_phone, age, GPA)

where GPA is Grade_Point_Average

This is illustrated below:


STUDENT
Name        ID#            Home_phone   Address        Cell_phone   Age   GPA
Ben Evans   15-041681C32   212279       2 Stoney Rd    011209876    30    3.9
                                        11 Park St     091239448    18    3.2
                                        2 Dunmow Rd    091332144    19    3.5
                                        10-88th Ave    023312541    25    2.9
                                        12 Wyatt Rd    023353659    28    3.9

A database management system that allows data to be readily created, maintained, manipulated, and
retrieved from a relational database is called Relational Database Management System (RDBMS). The
RDBMS, not the user, must ensure that all tables conform to the requirements. The RDBMS also must
contain features that address the structure, integrity and manipulation of the database.

In a relational database, data relationships do not have to be predefined. Hence users can query a
relational database and establish data relationships spontaneously by joining common fields. A database
query language is a helpful tool that acts as an interface between users and a relational DBMS. The
language helps the users of a relational database to easily manipulate, analyse and create reports from the
data contained in the database. It is composed of easy-to-use statements that allow people other than
programmers to use the database.


Relation CAR                    Relation ENGINE

Model Number   Name             Engine   Model Number
1              Fiesta           950      1
2              Escort           1100     1
3              Sierra           1100     2
                                1300     1
                                1300     2
                                1600     2
                                1600     3
                                2000     3
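The relationship between the CAR and ENGINE relations can be established by joining on their common Model Number field; a minimal sketch with Python's built-in sqlite3 module (table and column names are chosen for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (model INTEGER, name TEXT)")
conn.execute("CREATE TABLE engine (engine INTEGER, model INTEGER)")
conn.executemany("INSERT INTO car VALUES (?, ?)",
                 [(1, "Fiesta"), (2, "Escort"), (3, "Sierra")])
conn.executemany("INSERT INTO engine VALUES (?, ?)",
                 [(950, 1), (1100, 1), (1100, 2), (1300, 1),
                  (1300, 2), (1600, 2), (1600, 3), (2000, 3)])

# Join the two relations on their common Model Number field
rows = conn.execute("""SELECT e.engine FROM car c
                       JOIN engine e ON e.model = c.model
                       WHERE c.name = 'Sierra'
                       ORDER BY e.engine""").fetchall()
print([r[0] for r in rows])  # [1600, 2000]
```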

4. The Object-Oriented Model

While the relational model is well suited to the needs of storing and manipulating business data, it is not
well suited to handling the data needs of certain complex applications, such as computer-aided design
(CAD) and computer-aided software engineering (CASE).

Business data follow a defined data structure that the relational model handles well. However,
applications such as CAD and CASE deal with a variety of complex data types that cannot be easily
expressed by relational models. Such programs also require massive amounts of persistent data (data that
outlive the execution of the program that created them), and a database for them must be able to evolve
without affecting the data in memory that the application uses to operate.

An object-oriented database uses objects and messages to accommodate new types of data and provide for
advanced data handling. A database management system that allows objects to be readily created,
maintained, manipulated and retrieved from an object-oriented database is called an Object-Oriented
Database Management System (OODBMS)

An object-oriented database management system must still provide features that you would expect in any
other database management system, but there is still no clear standard for the object-oriented model.

Logical Database Design

A logical database design is a detailed description of a database in terms of the ways in which the users
will use the data.

During this phase an analyst performs a detailed study of the data identifying how the data is grouped
together and how they relate to each other. An analyst must also determine which fields have multiple
occurrences of data, which fields will be keys or indexes and the size and type of each field.

A Schema is a complete description of the contents and structure of a database. It defines the database to
the system, including the record layout, the names, length and size of all fields and the data relationships.

A Subschema defines each user's view, or specific parts of the database that a user can access. A
subschema restricts each user to certain records and fields within the database. Every database has one
and only one schema, but it may have many subschemas, one for each user view.


STRUCTURED QUERY LANGUAGE (SQL)

In SQL, commands are given to define the structure of the database. Each database is identified by a
name, which is given in a CREATE DATABASE command.

The entities are defined as tables, with each attribute defined as a column in the table. A table then is
given a name, and each attribute declared by giving it a column name and stating its type. Supported data
types include:-

CHARACTER - Character values
SMALLINT - A restricted range of integers
DECIMAL - Which allows a fixed number of decimal places
FLOAT - For floating point values
MONEY - Currency values
DATE - For dates

Each data type allows a certain set of possible values. There is also a possibility of a column having an
unknown value called NULL. When a column is specified, it is assumed to allow NULL values unless the
phrase NOT NULL is specified.

NULL values should not be allowed in any column, which forms part of the primary key of the table.

BELOW IS AN SQL COMMAND USED TO DEFINE A DATABASE

The name Art.db is chosen for the database, while the tables are called painting, artist and gallery. The
data type MONEY has been used and so is assumed to be supported by the implementation. The only column
which allows a NULL value is Nationality in the artist table. A NULL value in this column of a particular
row would mean that the actual value is unknown.

CREATE DATABASE Art.db

CREATE TABLE Painting


(Title char(20) NOT NULL,
Artist-name char(20) NOT NULL,
Cost money NOT NULL,
Gallery-name char(15) NOT NULL)

CREATE TABLE Artist


(Artist-name char(20) NOT NULL,
Initial char(5) NOT NULL,
Nationality char(15) )


CREATE TABLE Gallery


(Gallery-name char(15) NOT NULL,
Gallery-Add char(20) NOT NULL)

CREATE UNIQUE INDEX painting.idx on painting


(Title, artist-name)

CREATE UNIQUE INDEX artist.idx on artist


(artist-name)

CREATE UNIQUE INDEX gallery.idx on gallery


(gallery-name)

UNIQUE INDEXES are defined on the tables for the primary keys to prevent the system allowing rows in
the tables with duplicate values in the key.

Instead, an INDEX is created for the key and is specified as unique, so that any attempt to add rows with
same key will be trapped as an error. For the gallery and artist tables, the key has just one component
attribute, but the key for the painting table has two attributes and the index is created for the pair (title,
artist-name).

Indexes may be created for any number of columns in the table. Usually their purpose is to speed up
access to the data using the column value. Each index must be given a name, although the name is not
used again unless the index is to be deleted. The names used for the indexes in the illustration above are
painting.idx, artist.idx and gallery.idx.
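The duplicate-trapping behaviour described above can be demonstrated with SQLite through Python's built-in sqlite3 module. This is a sketch, not the original Art.db: SQLite has no MONEY type (REAL is used instead), dots are not legal in index names (an underscore is used), and hyphens in column names are replaced by underscores, since a bare hyphen reads as a minus sign in most SQL dialects.

```python
import sqlite3

# In-memory stand-in for the Art.db painting table
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE painting (
    title        CHAR(20) NOT NULL,
    artist_name  CHAR(20) NOT NULL,
    cost         REAL     NOT NULL,
    gallery_name CHAR(15) NOT NULL)""")

# Unique index on the two-attribute primary key (title, artist_name)
cur.execute("CREATE UNIQUE INDEX painting_idx ON painting (title, artist_name)")

cur.execute("INSERT INTO painting VALUES ('Pool', 'Victor', 300, 'Chitambo')")

# A second row with the same key pair is trapped as an error,
# even though its cost and gallery differ
try:
    cur.execute("INSERT INTO painting VALUES ('Pool', 'Victor', 999, 'Harare')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("duplicate rejected:", duplicate_rejected)
```

Note that only the key pair (title, artist_name) must be unique; the other column values play no part in the test.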


RETRIEVING DATA FROM ONE TABLE

The SQL SELECT statement is used to retrieve data from a table. It combines elements of the relational
algebra operations via its various options.

SELECTION

In its simplest form, a SELECT command will select all data from the table, as in the example:-

SELECT *
FROM Art

The asterisk (*) indicates that all the columns (fields) of the table Art are to be selected.

Using the WHERE clause will restrict the rows (records) which are selected to those satisfying the
condition for example:-

SELECT *
FROM Art
WHERE cost > 5000

In this form the SQL SELECT provides the functions of the SELECT statement of the relational algebra

A practical example for the two SELECT statements, used to view the contents of the Art table, is pictured
as follows:-

TABLE: Art

Title  Artist_Name  Cost  Gallery_Name
Pool   Victor       300   Chitambo
Peel   John         1000  Nyasha
Sony   Arthur       1500  Harare
Reelm  Tecla        800   Nyasha
Tito   Amon         4500  Mutare
Questions:

1. Write an SQL code to view all records from the Art table

2. Write an SQL code to view all records from the Art table where cost is less than $1500 and
Gallery_Name is equal to Nyasha

3. Write an SQL statement to list only the columns Title and Cost in the table Art

NB: Your statements should be supported by resulting tables.


Solution 1:

SELECT *
FROM Art

Resulting Table

Title  Artist_Name  Cost  Gallery_Name
Pool   Victor       300   Chitambo
Peel   John         1000  Nyasha
Sony   Arthur       1500  Harare
Reelm  Tecla        800   Nyasha
Tito   Amon         4500  Mutare
Solution 2:

SELECT *
FROM Art
WHERE cost < 1500 AND Gallery_Name = 'Nyasha'

Resulting Table

Title  Artist_Name  Cost  Gallery_Name
Peel   John         1000  Nyasha
Reelm  Tecla        800   Nyasha
Solution 3:

SELECT Title, Cost
FROM Art

Resulting Table

Title  Cost
Pool   300
Peel   1000
Sony   1500
Reelm  800
Tito   4500


PROJECTIONS

There is a provision in the SQL SELECT to cover the PROJECT operation of relational algebra.

The rows selected from a table can be projected onto a list of their columns by including the column list
instead of the asterisk. The command:-

SELECT Title, Artist_Name, Gallery_Name
FROM Art
WHERE Cost > 1000

A table with 3 columns will be produced

This is obtained from the Art table by first retrieving the rows, which satisfy the condition (Cost > 1000),
then projecting them into the 3 columns and the cost values are omitted from the result.

The result of SELECT including a projection is structured as follows:-

Table Art

Title  Artist_Name  Gallery_Name
Sony   Arthur       Harare
Tito   Amon         Mutare
If the SELECT command specifies all the components of the primary key of the table as part of the
column list the resulting rows will also be identified by the key value

In particular there will be no duplicate rows in the table. However, if the list of columns does not contain
the key or primary key, there may be duplicate rows in the resulting table. An example is shown below,
which is the result of applying the command:

SELECT Gallery_Name
FROM Art
WHERE Cost > 700


Resulting Table

Gallery_Name
Nyasha
Harare
Nyasha
Mutare
A variation of the SELECT command can be used to ensure that duplicate rows are removed from the
result. It uses the DISTINCT Key word within the SELECT

SELECT DISTINCT Gallery_Name
FROM Art
WHERE Cost > 700

The above code will remove all duplicate rows producing the following table:-

Gallery_Name
Nyasha
Harare
Mutare
* This is so, because we are projecting on Gallery_Name only but using a DISTINCT command
where we have to satisfy a condition given.
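The effect of DISTINCT can be checked with SQLite via Python's sqlite3 module, loading the sample Art data used above. This is a sketch: the table and column names use underscores in place of hyphens, and SQLite types stand in for the notes' data types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE art (title TEXT, artist_name TEXT, cost REAL, gallery_name TEXT)")
cur.executemany("INSERT INTO art VALUES (?, ?, ?, ?)", [
    ("Pool",  "Victor", 300,  "Chitambo"),
    ("Peel",  "John",   1000, "Nyasha"),
    ("Sony",  "Arthur", 1500, "Harare"),
    ("Reelm", "Tecla",  800,  "Nyasha"),
    ("Tito",  "Amon",   4500, "Mutare"),
])

# Without DISTINCT, Nyasha appears twice in the projection
cur.execute("SELECT gallery_name FROM art WHERE cost > 700")
with_dupes = [row[0] for row in cur.fetchall()]
print(with_dupes)

# DISTINCT removes the duplicate row
cur.execute("SELECT DISTINCT gallery_name FROM art WHERE cost > 700")
no_dupes = [row[0] for row in cur.fetchall()]
print(no_dupes)
```

The first list holds four names including Nyasha twice; the second holds only the three distinct gallery names.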


ORDERING THE ROWS

All of the SELECT commands mentioned previously produce tables as their results with the rows
appearing in the order in which they are found

It is possible to specify a particular order for the rows based on the selected column values by including an
'ORDER BY' clause

For example:

SELECT DISTINCT Gallery_Name
FROM Art
WHERE Cost > 700
ORDER BY Gallery_Name

This will produce the rows in ascending order of gallery name as shown on the table below

Gallery_Name
Harare
Mutare
Nyasha


GROUPED DATA

There are additional clauses in the SELECT command which allow it to deal with groups of data rather
than individual rows. The GROUP BY clause combines records with identical values in the specified field
list into a single record.

The final result of the SELECT is formed by projecting values into the selected columns. For example,
consider the command:-

SELECT Gallery_Name
FROM Art
WHERE cost < 1000
GROUP BY Gallery_Name

It will produce a list of Gallery_Names, which hold Art whose cost is < 1000. The GROUP BY clause
causes all the selected rows with the same Gallery_Name to be grouped into a single row.

The projection onto the Gallery_Name is then performed and resulting table has got no duplicate names.

In fact it is equivalent to SELECT DISTINCT command. An added advantage of grouping data is that
there are standard functions, which can be applied to groups and producing one value for the whole group.
They include:-

1. SUM FUNCTION - To sum values in one column

2. AVG FUNCTION - To calculate the average value in a column

3. MIN FUNCTION - To find the minimum value in a column

4. MAX FUNCTION - To find the maximum value in a column

5. COUNT FUNCTION - To count the number of values in a column

Example 1
Write an SQL command to calculate the SUM of ALL COST in table painting.
Solution:
SELECT SUM(cost)
FROM painting

The computer will then sum up all the cost figures in the table painting and display the total only.

Example 2
Write an SQL statement using table painting to display the following output:

Gallery-name Cost
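No solution is given for Example 2 in the notes. One reading of the expected output is one row per gallery with the summed cost of its paintings, which a GROUP BY with SUM produces; that interpretation is an assumption. A sketch on SQLite via Python's sqlite3 module, with underscored names and invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE painting (title TEXT, artist_name TEXT, cost REAL, gallery_name TEXT)")
cur.executemany("INSERT INTO painting VALUES (?, ?, ?, ?)", [
    ("Pool",  "Victor", 300,  "Chitambo"),
    ("Peel",  "John",   1000, "Nyasha"),
    ("Sony",  "Arthur", 1500, "Harare"),
    ("Reelm", "Tecla",  800,  "Nyasha"),
    ("Tito",  "Amon",   4500, "Mutare"),
])

# One output row per gallery, pairing the gallery name with its summed cost
cur.execute("""SELECT gallery_name, SUM(cost)
               FROM painting
               GROUP BY gallery_name""")
for gallery, total in cur.fetchall():
    print(gallery, total)
```

With these rows, Nyasha's two paintings (1000 and 800) collapse into a single row with total 1800.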

Example 3

Write an SQL statement to find the total galleries in the table painting.

Solution:
SELECT DISTINCT(gallery-name)
FROM painting
ORDER BY gallery-name

OR


SELECT COUNT(DISTINCT gallery-name)
FROM painting

Example 4

Write an SQL command to find or to list the maximum cost value in the table painting.

Solution:
SELECT MAX(cost)
FROM painting

SUB QUERIES

The WHERE clause can express a complex condition. It can be used in what is called a SUBQUERY,
which makes use of another SELECT statement as part of the condition (a nested SELECT statement).
Suppose we want to find all paintings by a particular artist; the following statement is issued:

SELECT artist-name
FROM artist
WHERE artist-name = 'John'

This produces a table of artist names equal to John. It can be used as part of the WHERE condition in the
SELECT statement which retrieves rows from the painting table.

The SQL statement is structured as follows:

SELECT *
FROM painting
WHERE artist-name IN (SELECT artist-name
                      FROM artist
                      WHERE artist-name = 'John')

It extracts rows from the table painting where the artist name appears in the sub query. The IN operator is
used to perform this test on the result of the sub query. The IN operator and its negation NOT IN are not
the only operators for use in sub queries.

SELECT *
FROM painting
WHERE artist-name NOT IN (SELECT artist-name
                          FROM artist
                          WHERE artist-name = 'John')

This will extract all records except John's.

ALL and ANY operators can be used with a relational operator such as >= to test the column value against
the result of the sub query. To select the titles of the most costly paintings we could use the following
command:

SELECT title
FROM painting
WHERE cost >= ALL(SELECT cost
FROM painting)
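The IN sub query runs as-is on SQLite (used here through Python's sqlite3 module), but SQLite does not support the ALL operator, so the most-costly-painting query is sketched below with an equivalent MAX sub query. The table contents are a small invented sample, and names use underscores in place of hyphens:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE painting (title TEXT, artist_name TEXT, cost REAL, gallery_name TEXT)")
cur.execute("CREATE TABLE artist (artist_name TEXT, initial TEXT, nationality TEXT)")
cur.executemany("INSERT INTO painting VALUES (?, ?, ?, ?)", [
    ("Peel", "John",   1000, "Nyasha"),
    ("Sony", "Arthur", 1500, "Harare"),
    ("Tito", "Amon",   4500, "Mutare"),
])
cur.executemany("INSERT INTO artist VALUES (?, ?, ?)", [
    ("John",   "J", "Zimbabwean"),
    ("Arthur", "A", None),
])

# Rows whose artist name appears in the sub query result
cur.execute("""SELECT title FROM painting
               WHERE artist_name IN (SELECT artist_name
                                     FROM artist
                                     WHERE artist_name = 'John')""")
by_john = [r[0] for r in cur.fetchall()]
print(by_john)

# cost >= ALL(...) rewritten with MAX, which SQLite supports
cur.execute("""SELECT title FROM painting
               WHERE cost >= (SELECT MAX(cost) FROM painting)""")
most_costly = [r[0] for r in cur.fetchall()]
print(most_costly)
```

The MAX rewrite is equivalent here because comparing against the maximum is the same as comparing against every value in the column.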
CONSTRUCTING USER ACCESS

When a central database is used for a number of different users who have different requirements, it is
essential to be able to tailor the data to the different needs. In this case, there are two SQL features which
provide these facilities:

Defining views to limit what is seen


Granting access privileges to particular users

VIEWS

A view is a virtual table (it does not physically exist) obtained from the real tables by a SELECT statement.
Its main use is to tailor the data of a table to the needs of particular users, so that it omits details which are
of no interest or which the user should not see.
In the example of the table painting, it may be desired to let most users see all the data except for the cost.


A view can be created which omits the cost column as follows:

CREATE VIEW details AS
SELECT title, artist-name, gallery-name
FROM painting

This view is given the name details.


To the users it looks just like a table and can be treated as a table in most SQL commands. However, it is
not a real table. Its data is obtained from the painting table by performing the SELECT statement each
time it is accessed.

The statement:
SELECT *
FROM details
WHERE gallery-name = 'Chipangali'

Uses the view as a table. It retrieves the data relating to the paintings in the Chipangali gallery, but does
not include the cost, since the virtual table is formed by ignoring the cost column, which is not part of the
view. Views can be created from any SELECT statement, not just those which limit the columns of a
table.
A virtual table of all paintings held at the gallery Chipangali would be created by the command:

CREATE VIEW Chipangali AS
SELECT *
FROM painting
WHERE gallery-name = 'Chipangali'

This would contain all of the 4 columns of the table painting, but only those rows relating to the gallery
Chipangali.
Once a view has been created its definition as a SELECT statement will exist until a DROP VIEW
command is performed.
While it exists, it can be treated as a table although it is only a virtual table.
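A sketch of the details view on SQLite through Python's sqlite3 module. The data is a small invented sample, and the filter uses the gallery Nyasha from the earlier Art table rather than Chipangali, since the sample rows contain no Chipangali entries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE painting (title TEXT, artist_name TEXT, cost REAL, gallery_name TEXT)")
cur.executemany("INSERT INTO painting VALUES (?, ?, ?, ?)", [
    ("Peel",  "John",  1000, "Nyasha"),
    ("Reelm", "Tecla", 800,  "Nyasha"),
    ("Tito",  "Amon",  4500, "Mutare"),
])

# Virtual table omitting the cost column
cur.execute("""CREATE VIEW details AS
               SELECT title, artist_name, gallery_name
               FROM painting""")

# The view is queried exactly like a table; the SELECT defining it
# is re-evaluated each time the view is accessed
cur.execute("SELECT * FROM details WHERE gallery_name = 'Nyasha'")
rows = cur.fetchall()
for row in rows:
    print(row)  # three-column rows, no cost

# DROP VIEW removes the stored definition
cur.execute("DROP VIEW details")
```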

GRANTING PRIVILEGES
Users of a database are identified by a user name. Individual users can be granted privileges which give
them permission to use certain SQL commands on the database.
Permissions may also be granted to all users by using the key word PUBLIC instead of the user name.
The GRANT CONNECT command is available to define passwords for a list of users. It has the form:
GRANT CONNECT TO <user list>
IDENTIFIED BY <passwords>

It can be used to set up the password(s) for the new users or to alter the existing user passwords. Some
implementations do not use this facility, but rely on the operating system to deal with passwords for users.
Specific privileges to permit the use of SQL statements on a table or view are allocated by further GRANT
command. They have the following form:

GRANT <privilege list>
ON <table or view>
TO <user list>

Where table is the name of the table or view, user list is either a list of names or the key word PUBLIC,
and privilege list is a list of key words for the privileges.

The privileges are any of the following:

SELECT
INSERT
DELETE


UPDATE <column list>


ALTER
ALL

And permit use of the corresponding SQL commands or statements

UPDATE may have a list of columns, stating those which are allowed to be updated. The default is to
allow all columns to be updated.

The ALL privilege permits all commands or privileges to be used or selected.

GRANT SELECT, UPDATE(cost, gallery-name)
ON painting
TO John, Nancy

Would let the two named users use the SELECT command on the table painting and UPDATE only the
columns cost and gallery-name.

Since the privileges can be granted selectively a considerable degree of control of user access to data is
available.

Class exercise:

Given the following table:

Student
Stud-ID Student-Name Town Course-Level Fee
HND1002 Chipo Harare HND 7500
ND2001 Edmore Mutare ND1 6500
ND200100 Takura Harare ND2 3000
ND2003 Simba Kwekwe ND1 6500
ND2008 Esther Bulawayo ND1 6500
HND1004 Rachel Mutare HND 7500
NC3001 James Gweru NC 3500
NC3007 Oscar Kwekwe NC 3500
ND2009 Linda Bulawayo ND1 6500

1. Create 3 views to the database Student so that


a) The Principal can only see the course-level and fee fields.
b) The Accountant can have access to all columns.
c) The Head of Department can only access student names, Stud-Ids and towns.
2. Write SQL statements which can only permit the Accountant to SELECT, INSERT, DELETE and
UPDATE all columns of the student table.
3. Write SQL statements which list only towns in their unique order where course fee is above $3500.
4. Write an SQL statement which list all students in both HND and NC course-level.


Question:
Given the following ERD, design a detailed database using the SQL necessary for the illustration.

The ERD contains three entities and two relationships:

STUDENT (stud-#, stud-name, dob, town, nationality)
COURSE (crs-id, crs-title, #-of-stud)
TEACHER (name, qualification, #-of-crs, #-of-stud)

STUDENT  --ATTENDS--  COURSE
TEACHER  --TEACHES--  COURSE

CREATE DATABASE college.db

CREATE TABLE student


(stud-name char(20) NOT NULL,
stud-# smallint(5) NOT NULL,
dob date(8) NOT NULL,
town char(20) NOT NULL,
nationality char(15))

CREATE TABLE teacher


(name char(20) NOT NULL,
#-of-crs smallint(8) NOT NULL,
qualification char (30) NOT NULL,
#-of-stud smallint(2))

CREATE TABLE course


(crs-id smallint(6) NOT NULL,
crs-title char(20) NOT NULL,
#-of-stud smallint(2))

ALTERING THE DATABASE STRUCTURE


A database structure can be modified in a number of ways. Extra tables can be added using the create
table command, and extra indexes can be set up.

DROP TABLE <table-name> and
DROP INDEX <index-name>
can be used to remove tables and indexes, while
DROP DATABASE <database-name> can be used to remove the whole database.

ALTER TABLE <table-name>
ADD col-name1 char(20)

can be used to alter tables, for example by adding a new column.

ADDING, DELETING and UPDATING DATA

1. Adding Data: SQL provides an INSERT command to add a single record to a table, for example:
INSERT INTO student VALUES
('HND1006', 'James Made', 'Mutare', 'HND', 7500)

This will add a row to the student table with all column values defined. The indexes associated
with the table are updated automatically, so that re-entering the same record will be rejected.

2. Deleting Data: The DELETE command is used to remove rows or records from a table. In its
simplest form it will remove all rows, as in the commands:

DELETE FROM <table-name>        removes all rows

DELETE FROM <table-name>
WHERE <condition>               removes only rows meeting the set condition

The WHERE clause is used in the DELETE command and in other commands. The conditions
can be quite complex, enabling the commands to be very selectively applied.

They allow:
(a) AND, OR and NOT to be used as logical connections
(b) Numerical and character data to be compared for either equality or inequality such as:
>, <, =, >=, <=
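A sketch of INSERT and of DELETE with a compound WHERE condition, run on SQLite through Python's sqlite3 module against the sample Art data (column names use underscores; the rows are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE art (title TEXT, artist_name TEXT, cost REAL, gallery_name TEXT)")
cur.executemany("INSERT INTO art VALUES (?, ?, ?, ?)", [
    ("Pool",  "Victor", 300,  "Chitambo"),
    ("Peel",  "John",   1000, "Nyasha"),
    ("Reelm", "Tecla",  800,  "Nyasha"),
    ("Tito",  "Amon",   4500, "Mutare"),
])

# Adding a single record with INSERT
cur.execute("INSERT INTO art VALUES ('Sony', 'Arthur', 1500, 'Harare')")

# DELETE with AND as the logical connective: only rows meeting
# both parts of the condition are removed
cur.execute("DELETE FROM art WHERE cost < 1500 AND gallery_name = 'Nyasha'")
removed = cur.rowcount
print("rows removed:", removed)

cur.execute("SELECT COUNT(*) FROM art")
remaining = cur.fetchone()[0]
print("rows left:", remaining)
```

Only the two Nyasha rows costing under 1500 are removed; the compound condition leaves everything else untouched.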


SYNONYM USAGE

SELECT DISTINCT p#
FROM sp spx
WHERE p# IN
    (SELECT p#
     FROM sp
     WHERE s# <> spx.s#)

OR

SELECT p#
FROM sp
GROUP BY p#
HAVING COUNT (s#) > 1

Part #s for all parts supplied by more than one supplier
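Both forms of the query can be checked on SQLite via Python's sqlite3 module. The # character is not legal in SQLite identifiers, so s# and p# are renamed s_num and p_num in this sketch, and the supplier-part rows are invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Supplier-part relation sp: which supplier supplies which part
cur.execute("CREATE TABLE sp (s_num TEXT, p_num TEXT)")
cur.executemany("INSERT INTO sp VALUES (?, ?)", [
    ("S1", "P1"), ("S1", "P2"),
    ("S2", "P1"), ("S2", "P3"),
    ("S3", "P3"),
])

# Synonym (alias) form: spx names a second scan of sp, so the
# sub query can look for the same part from a different supplier
cur.execute("""SELECT DISTINCT p_num
               FROM sp spx
               WHERE p_num IN (SELECT p_num
                               FROM sp
                               WHERE s_num <> spx.s_num)""")
alias_form = sorted(r[0] for r in cur.fetchall())
print(alias_form)

# GROUP BY / HAVING form of the same query
cur.execute("""SELECT p_num
               FROM sp
               GROUP BY p_num
               HAVING COUNT(s_num) > 1""")
having_form = sorted(r[0] for r in cur.fetchall())
print(having_form)
```

With this sample, both queries return P1 and P3, the parts supplied by more than one supplier.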

DICTIONARY
A collection of relations, i.e. Catalog and Columns.

Catalog - contains a row for each relation defined to the system:
table-name, (key), creator, # of columns etc.

- Group of named schemas (consists of tables/views and definitions and user defined
  specifications affecting physical placement of data on disk)
- Library that contains ready-to-use functions
- RDBMS data dictionary
- Systems database that contains information concerning various objects that are of interest
  to the system itself eg base tables, views, indexes, users, access privileges
- Table in which the DBMS maintains data about the database
- Contains administrative information eg access permissions

Columns - contains a row for every column of every relation defined to the system:
table-name, column-name (composite key), data type (char, numeric etc), length etc.

Users can query this system, e.g.:

SELECT tname
FROM columns
WHERE cname = 's#'

lists the table names for tables with a column s#. This is useful to a user who does not know all the fields
of some tables but only an attribute.

CREATE SYNONYMS
Specifies an alternative name for a table/view; often used to define an abbreviation or to avoid
prefacing with the owner name of the table.

DROP SYNONYMS
Destroys a synonym declaration

COMMENTS STATEMENT

Provides an explanatory remark for table columns (stored as part of the internal definition tables).
Used in updating a catalog together with DELETE, CREATE TABLE, ALTER, INSERT.


COMMENT ON TABLE s IS 'Each row represents one supplier'

No comment is allowed on an index.

COMMENT ON COLUMN p.city IS 'Location of unique warehouse storing this part';


DATABASE DEVELOPMENT LIFE CYCLE

Different authors give different names to the stages for example:

First version

Database development

Is a top-down systematic approach


Transforms business information requirements into an operational database
Consists of the following five stages

Strategy and Analysis


Design
Build and Document
Transition
Production

Once the design is in place, one can build the database by executing SQL commands.

Strategy and Analysis

Study and analyse the business requirements. Interview users and managers to identify the
information requirements. Incorporate the enterprise and application mission statements as well
as any future system specifications
Build models of the system. Transfer the business narrative developed in the strategy and
analysis phase into a graphical representation of business information needs and rules. Confirm
and define the model with the analysts and experts.

Design

Design the database. The entity relationship model maps entities to tables, attributes to columns,
relationships to foreign keys, and business rules to constraints.

Build and Document

Build the prototype system. Write and execute the commands to create the tables and supporting
objects for the database.
Develop user documentation, help-screen text, and operations manuals to support the use and
operation of the system

Transition

Refine the prototype. Move an application into production with user acceptance testing,
conversion of existing data, and parallel operations. Make any modifications required.

Production

Roll out the system to the users. Operate the production system. Monitor its performance, and
enhance and refine the system.

Second version

The second version also has five stages as follows


Data Planning
Requirements Specifications
Conceptual Design
Logical Design
Physical Design

Data Planning:

It states all the long term strategic procedures required to develop a proper database system
Analyst develops a model of business processes and documenting all processes involved which
will be used as input to the second stage

Requirements Specification

Defines and represents the user's requirements of a business process using everyday language or
other methodologies such as DFDs

Conceptual Design

It involves translation of the user's requirements of a business process by using ERDs

Logical Design

Translates the conceptual design of a business process by representing data using database
models that is Network database model or Hierarchical database model or Relational database
model.

Physical Design

Defines data storage techniques and access methods of a business process


Create master records which need to be permanently stored for updating and generating
information


DISCUSSION

At what stage would one deal with the following things and why?

ERDs
Normalisation

Illustrate with some practical examples


DATABASE SYSTEM LIFE CYCLE

Planning

Greatest interaction between DBA and user group
Results in a complete set of data definitions recorded in the DD

Conceptual - modelling the organisation's data
External - interaction with users and other system specialists
Internal
Integrity control

Initial load/creation - populating the database

Rights and duties of users that satisfy their responsibilities, since it is a corporate database, e.g.
updating rights, accessing rights


STAGES AND MAJOR FUNCTIONS

Planning
1. Develop entity charts
2. Analyse costs and benefits
3. Develop implementation plan
4. Evaluate and select software and hardware
5. Establish application priorities
6. Develop data standards (naming conventions and definitions eg Customer: Prospective,
   Prior, No Longer)

Requirements Formulation & Analysis
1. Define user requirements
2. Develop data definitions
3. Develop data dictionary

Design
1. Design conceptual model
2. Design external models (modelling the organisation's data; DBA interacts with users and
   other system specialists in data processing)
3. Design internal models (schemas)
4. Design integrity controls

Implementation
1. Specify database access policies (rights)
2. Develop standards for application programming (for consistency and correctness, increase
   programmers' productivity)
3. Establish security techniques (passwords, access tables, encryption)
4. Load databases (special programs to load from different files)
5. Specify test procedures
6. Establish procedures for backup and recovery
7. Conduct user training

Operation & Maintenance
1. Monitor database performance
2. Tune and reorganise databases
3. Enforce standards
4. Support users

Growth & Change
1. Implement change control procedures
2. Plan growth and change
   Change in size: storage space utilisation; DBA allocates additional space, reallocates
   existing space
   Change in content/structure: new application requests; alter logical and physical database
   structure
   Change in usage pattern: performance monitoring; assigning frequently accessed records
   to faster devices; additional higher performance hardware devices


DATABASE LIFE CYCLE


Managed by the Database Administrator. There are 6 stages:
1. Planning
2. Requirements Formulation & Analysis
3. Design
4. Implementation
5. Operation & Maintenance
6. Growth & Change

The stages form a cycle:

Planning
   -> Requirements Formulation & Analysis
   -> Design
   -> Implementation
   -> Operation & Maintenance
   -> Growth & Change
   -> (back to Planning)

Planning:
Its purpose is to develop a strategic plan for database development that supports the overall
organisation business plan

Requirements Formulation & Analysis


Is concerned with identifying data elements currently used by the organisation, precisely defining
these elements & their relationship, and documenting the results in a form that is convenient to
the design that is to follow. In addition to identifying current data, requirements Formulation &
Analysis attempts to identify new data elements or changes in existing data elements that will be
required in the near future.

Design Stage
Its purpose is to develop a database architecture that will meet the information needs of the
organisation now and in the future. There are 3 stages in database design, that is, Conceptual,
Implementation & Physical design.

a) Conceptual Design: Its purpose is to synthesise the various user views and information
requirements into a global database design. The design is called the Conceptual Schema/Data
Model and may be expressed in one of several forms, that is, an entity relationship diagram,
a semantic data model, or normalised relations. The Conceptual Data Model describes entities,
attributes and relationships.
b) Implementation Design: Its purpose is to map the Conceptual Data Model into a logical
schema that can be processed by a particular DBMS. The conceptual data model is mapped
into hierarchical, network or relational data model.
c) Physical Design: Last stage of Database design concerned with designing stored record
formats, selecting access methods and deciding on physical factors such as record blocking.
Also concerned with database security, integrity and backup and recovery.


Implementation Stage:
Once the database design is completed, the implementation process begins. The first step is the creation
or initial load of the database. Database administration manages the loading process and resolves
any inconsistencies that arise during this process.

Operation & Maintenance Stage


This is the ongoing process of updating the database to keep it current. Examples of updating
include adding a new employee record, changing a student address, deleting an invoice.
Database Administrator is responsible for developing procedures that ensure that the database is
kept current and that is protected during update operations. A Database Administrator must
perform the following functions:
a) Assigning responsibility for data collection, editing and verification
b) Establish appropriate update schedules
c) Establish an active quality assurance program, including procedures for protecting,
restoring and auditing the database.

Growth and Change Stage


The database is a model of the organisation itself. As a result it is not static but reflects dynamic
changes in the organisation and its environment. The Database Administrator must plan for
change, monitor the performance of the database in terms of both efficiency and user satisfaction,
and take whatever actions are required to maintain a high level of system performance and success.


Functions of Database Administration


Summarised according to the stages of the database life cycle:

1. Planning:
Develop entity charts
Analyse costs and benefits
Develop implementation plan
Evaluate and select software or hardware
Establish application priorities
Develop data standards
2. Requirements Formulation & Analysis:
Define user requirements
Develop data definitions
Develop data dictionary
3. Database Design:
Design conceptual model
Design external models
Design internal models
Design integrity controls
4. Database Implementation:
Specify database access policies
Develop standards for application programming
Establish security techniques
Load database
Specify test procedures
Establish procedures for backup & recovery
Conduct user training
5. Operations & Maintenance:
Monitor database performance
Tune and reorganise database
Enforce standards
Support users
6. Growth & Change
Implement change control procedures
Plan growth & change

DATABASE IMPLEMENTATION
DBMS Functions:
Data storage, retrieval & update: since a database may be shared by many users, the DBMS must provide
multiple user views and allow users to store, retrieve and update their data easily and efficiently.

Data Dictionary/Directory
The DBMS must maintain a user accessible data dictionary

Recovery Services:
The DBMS must be able to restore the database, returning it to a consistent condition, in the event of some
system failure. Sources of system failure include:
Operator error
Disk head crashes
Program error

Security mechanisms:
Data must be protected against accidental or intentional misuse or destruction. The DBMS must provide
mechanisms for controlling access to data and defining which actions (read only, update) may be taken by
each user.


NORMALISATION
Normalisation is the analysis of functional dependencies between attributes (data items). The purpose of
normalisation is to reduce complex user views to a set of small, stable data structures. Normalised data
structures are more flexible, stable and easier to maintain than unnormalised structures.

Steps in Normalization:

USER VIEWS
    |
UNNORMALISED RELATIONS
    |  Remove repeating groups
1NF RELATIONS
    |  Remove partial dependencies
2NF RELATIONS
    |  Remove transitive dependencies
3NF RELATIONS
    |  Remove overlapping candidate keys
BCNF RELATIONS
    |  Remove multivalued dependencies
4NF RELATIONS
    |  Remove join dependencies
5NF RELATIONS


1. User views are identified
2. Each user view is converted to the form of an unnormalised relation
3. Any repeating groups are then removed from the unnormalised relations to produce a set of
   relations in 1NF
4. Any partial dependencies are removed from these relations; the result is a set of relations in 2NF
5. Any transitive dependencies are removed, creating a set of relations in 3NF

Unnormalised Relation:
It is a relation that contains one or more repeating groups for example GRADE-REPORT:

GRADE-REPORT
Stud# Studname Major Course# Crs-title Lec-name L-office Grade
38214 Takura IS IS350 Dbase Chamanga 6 A
IS465 SAD Makura 10 C
69173 Esther PM IS465 SAD Makura 10 A
PM300 Proj-Mgt Makura 10 B
QM400 OR Kachepa 11 C

Stud# → Studname   1:1

Stud# → Major      1:1

Stud# → Course#    1:M

Stud# → Crs-title  1:M

Stud# → Lec-name   1:M

Stud# → L-office   1:M

There are multiple values at the intersection of certain rows and columns. Since each student takes more
than one course, the course data in the above relation constitutes a repeating group within the student data.
In an unnormalised relation, a single attribute cannot serve as a candidate or primary key. Suppose we
take student number as the primary key: there is a one-to-one relationship from student number to student
name and major. However, the relationship is one-to-many from student number to course number and the
remaining attributes. The student number is therefore not a primary key, since it does not uniquely identify
all the attributes in this relation.

Disadvantages of Unnormalised Relations:


They contain redundant data, which may result in inconsistent data. For example, information
pertaining to course number IS465 is contained in 2 locations (2 tuples in the sample). Suppose that
we want to change the course title from SAD to ASAD; to make this change we would have to search
the entire grade-report relation to locate all occurrences of course number IS465. If we fail to update
all occurrences, the data would be inconsistent.

Normalised Relations:
A normalised relation is one that contains only single values at the intersection of each row and
column. A normalised relation contains no repeating groups. To normalise a relation that contains a
single repeating group we remove the repeating group and form 2 relations. The 2 new relations
formed from the above example are as in Student(S) and Student-Course(SC). Student relation is
already in 3rd NF whereas the Student-Course relation is in 1st NF.

Therefore stud# is not a candidate key because it does not uniquely identify all attributes in this relation.

Redundancy exist, for example, course# IS465 is contained in multiple rows.


Update anomaly: when one wants to change SAD to ASAD in crs-title there is a need to search the entire
relation, failure of which results in inconsistent data.

Notation for Unnormalised data:

Grade-Report(stud#, studname, major, {course#, crs-title, Lec-name, L-office, Grade})

where { } encloses a repeating group.

1NF

A relation with a single repeating group will form 2 relations by removing the repeating group.

S(student)
Stud# Studname Major
38214 Takura IS
69173 Esther PM

SC(student-course)
Stud# Course# Crs-title Lec-name L-office Grade
38214 IS350 Dbase Chamanga 6 A
38214 IS465 SAD Makura 10 C
69173 IS465 SAD Makura 10 A
69173 PM300 Proj-Mgt Makura 10 B
69173 QM400 OR Kachepa 11 C

1NF with primary key (stud#, course#), course# being taken from the repeating group.
The primary key uniquely identifies a student's grade.

Student-Course still has data redundancy, which results in update anomalies when INSERTING, DELETING
and UPDATING data.

INSERT:
It is impossible to insert a new course if no student is taking that course, because that would result in a
null value for stud#, which is not allowed.
DELETE:
Deleting a student record for a particular tuple results in losing the course title and lecturer details.
Leaving the course details results in a NULL value for stud#, which is part of the key and is not allowed.

UPDATE:
To update a course title that appears a number of times (for example, SAD), there is a need to search
through every tuple. This is inefficient and might result in data inconsistencies if some occurrences are
not updated.

The above problems are a result of nonkey attributes which are dependent on only part of the key, that
is, on course# alone, for example:

(stud#, course#) → grade

course# → crs-title
course# → Lec-name
course# → L-office

Grade is fully dependent on (stud#, course#) whereas Crs-title, Lec-name and L-office partially depend on
the primary key (stud#, course#), as shown below.


Course# → Crs-title, Lec-name, L-office
    (partially dependent on the primary key (stud#, course#))

(Stud#, Course#) → Grade
    (fully functionally dependent on the primary key (stud#, course#))

2NF
By removing attributes which are partially dependent on the primary key, we create 2 relations:
1. One with attributes fully dependent on the primary key
2. One with attributes dependent on only part of the primary key

R(Registration)
Stud# Course# Grade
38214 IS350 A
38214 IS465 C
69173 IS465 A
69173 PM300 B
69173 QM400 C
(Registration is in 3NF)

CL(Course-Lecturer)
Course# Crs-Title Lec-Name L-Office
IS350 DBase Chamanga 6
IS465 SAD Makura 10
PM300 Project Mgt Makura 10
QM400 OR Kachepa 11
(Course-Lecturer is in 2NF)

The course title appears once in the Course-Lecturer relation, which solves the update anomaly. Course
data can be inserted and deleted without reference to student data.
Course# → Crs-Title
Course# → Lec-Name
Course# → L-Office
Lec-Name → L-Office

Lec-Name → L-Office illustrates that there is a unique office for a lecturer. This is a transitive
dependency: one nonkey attribute is dependent on one or more other nonkey attributes.

Course# → Lec-Name → L-Office

Transitive Dependency

Problems with 2NF

INSERT:


It is impossible to insert a new lecturer, since lecturer data is dependent on course# and the new lecturer
is not yet assigned to teach at least one course. It is not possible, for example, to insert Ms Mvududu
until one or more courses have been assigned to her.

DELETE:
Deleting course data results in lecturer data being lost; for example, deleting course# IS350 results in
the loss of Chamanga's data.

UPDATE:
Lecturer data occurs many times, therefore changing the lecturer office for Makura requires searching
every tuple; failure to do so will result in data inconsistency, for example one tuple reading Rm 10 and
another reading Rm 12.

3NF
Removing the attributes that participate in the transitive dependency, for example Lec-Name and
L-Office, results in the following relations:
C(Course)
Course# Crs-Title Lec-Name
IS350 DBase Chamanga
IS465 SAD Makura
PM300 Project-Mgt Makura
QM400 OR Kachepa
Primary Key (Course#) and Foreign Key (Lec-Name)

L(Lecturer)
Lec-Name L-Office
Chamanga 6
Kachepa 11
Makura 10
Primary Key (Lec-Name)

The assumption is that an L-Office can have more than one occupant; therefore Lec-Name becomes the
primary key and associates the 2 relations Course and Lecturer.

In this 3NF insertion and deletion can be done without referencing other entities. Updates are also
possible because they are confined to a single tuple within a relation

The whole Grade-Report View will be represented by the following relations:

C(Course)
Course# Crs-Title Lec-Name
IS350 DBase Chamanga
IS465 SAD Makura
PM300 Project-Mgt Makura
QM400 OR Kachepa

L(Lecturer)
Lec-Name L-Office
Chamanga 6
Kachepa 11
Makura 10

R(Registration)
Stud# Course# Grade
38214 IS350 A
38214 IS465 C
69173 IS465 A

69173 PM300 B
69173 QM400 C

S(student)
Stud# Studname Major
38214 Takura IS
69173 Esther PM
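The full 3NF decomposition above can be sketched with Python's built-in sqlite3 module. This is a sketch only: table and column names are adapted from the text (identifiers such as stud# contain characters SQL does not allow), and the join at the end shows that the original Grade-Report view is recoverable from the four relations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# One table per 3NF relation; primary and foreign keys as in the text.
conn.executescript("""
CREATE TABLE lecturer (lec_name TEXT PRIMARY KEY, l_office INTEGER);
CREATE TABLE course   (course_no TEXT PRIMARY KEY, crs_title TEXT,
                       lec_name TEXT REFERENCES lecturer(lec_name));
CREATE TABLE student  (stud_no INTEGER PRIMARY KEY, studname TEXT, major TEXT);
CREATE TABLE registration (stud_no INTEGER REFERENCES student(stud_no),
                           course_no TEXT REFERENCES course(course_no),
                           grade TEXT,
                           PRIMARY KEY (stud_no, course_no));
""")

conn.executemany("INSERT INTO lecturer VALUES (?, ?)",
                 [("Chamanga", 6), ("Makura", 10), ("Kachepa", 11)])
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 [("IS350", "Dbase", "Chamanga"), ("IS465", "SAD", "Makura"),
                  ("PM300", "Proj-Mgt", "Makura"), ("QM400", "OR", "Kachepa")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(38214, "Takura", "IS"), (69173, "Esther", "PM")])
conn.executemany("INSERT INTO registration VALUES (?, ?, ?)",
                 [(38214, "IS350", "A"), (38214, "IS465", "C"),
                  (69173, "IS465", "A"), (69173, "PM300", "B"),
                  (69173, "QM400", "C")])

# The original Grade-Report view is recovered by joining the four relations.
rows = conn.execute("""
    SELECT s.stud_no, s.studname, c.crs_title, l.l_office, r.grade
    FROM registration r
    JOIN student  s ON s.stud_no   = r.stud_no
    JOIN course   c ON c.course_no = r.course_no
    JOIN lecturer l ON l.lec_name  = c.lec_name
    ORDER BY s.stud_no, c.course_no
""").fetchall()
print(rows[0])   # (38214, 'Takura', 'Dbase', 6, 'A')
```

Note how each fact (a lecturer's office, a course title) now lives in exactly one row, so updates touch a single tuple.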

Relations in 3NF are sufficient for most practical database design problems. However, when a relation has
more than one candidate key, problems may arise even if it is in 3NF; hence the further normal forms,
for example BCNF, 4NF, 5NF and DKNF.


BCNF (Boyce Codd Normal Form)


When a relation has more than one candidate key, anomalies may result even though the relation is in 3rd
NF.

SMA(student-Major-Advisor)
Stud# Major Advisor
123 Physics Edwin
123 Music Chioniso
456 Biology Machuma
789 Physics Tawanda
999 Physics Edwin

The semantic rules of the above relation are as follows:


1. Each student may major in several subjects
2. For each major, a given student has only one advisor
3. Each major has several advisors
4. Each advisor advises only one major

A dependency diagram summarising the above rules:

(Student#, Major) → Advisor

Advisor → Major

The relation is in 3rd NF since


1. There are no repeating groups
2. No partial dependencies
3. No transitive dependencies

There are still anomalies in the relation above. Suppose that student# 456 changes her major from
Biology to Maths; when the tuple of that student is updated, we lose the fact that Machuma advises
Biology (update anomaly).

Suppose we want to insert a tuple with the information that Gamu advises in Computers. This cannot be
done until at least one student majoring in Computers is assigned Gamu as an advisor (insertion anomaly).

In the above relation there are 2 candidate keys: (student#, major) and (student#, advisor). The type of
anomalies that exist in this relation can occur when there are 2 or more overlapping candidate keys.

BCNF definition
A relation is in BCNF if and only if every determinant is a candidate key.

A determinant is any attribute, simple or composite, on which some other attribute is fully functionally
dependent. For example, in the above relation the attribute advisor is a determinant, since major is fully
functionally dependent on advisor.

To put the above relation into BCNF we make Advisor a candidate key and project the original 3rd NF
relation into 2 relations that are in BCNF.

SA(Student-Advisor)
Student# Advisor
123 Edwin
123 Chioniso
456 Machuma
789 Tawanda
999 Edwin

AM(Advisor-Major)
Advisor Major
Edwin Physics
Chioniso Music
Machuma Biology
Tawanda Physics
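The projection into SA and AM, and the fact that joining them back loses nothing, can be sketched in plain Python. This is a sketch only; the relation contents follow the SMA example, and set comprehensions stand in for relational projection and join.

```python
# Original 3NF relation with overlapping candidate keys: (Stud#, Major, Advisor).
sma = [(123, "Physics", "Edwin"), (123, "Music", "Chioniso"),
       (456, "Biology", "Machuma"), (789, "Physics", "Tawanda"),
       (999, "Physics", "Edwin")]

# Project into the two BCNF relations.
sa = {(stud, advisor) for stud, _major, advisor in sma}   # SA(Student#, Advisor)
am = {(advisor, major) for _stud, major, advisor in sma}  # AM(Advisor, Major)

# Because the determinant Advisor is now a key of AM, joining SA and AM
# on Advisor reproduces the original relation exactly - no spurious tuples.
joined = {(stud, major, advisor)
          for stud, advisor in sa
          for adv2, major in am if adv2 == advisor}
print(joined == set(sma))   # True
```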

Fourth Normal Form (4NF)

Even when a relation is in BCNF it may still contain unwanted redundancy that may result in update
anomalies, for example, consider the following unnormalised relation

O(Offering)
Course Instructor Textbook
Mgt White Drucker
Black Peters
Green
Finance Gray Weston
Gilford

Assumptions:
1. Each course has one or more instructors
2. For each course, all of the textbooks indicated are used.

O(Offering)
Course Instructor Textbook
Management White Drucker
Management Green Drucker
Management Black Drucker
Management White Peters
Management Green Peters
Management Black Peters
Finance Gray Weston
Finance Gray Gilford
Normalised Relation
From the normalised relation Offering, for each course, all possible combinations of instructor and
textbook appear in the resulting relation. The primary key of this relation consists of all 3 attributes, so
the relation is in BCNF. The above relation contains redundant data, and this can lead to update anomalies:
suppose you want to add a third textbook to the management course. This would require the addition of 3
new rows to the relation, one for each instructor. From the above relation you can see that for each course
there is a well-defined set of instructors (one-to-many relationship) and a well-defined set of textbooks
(one-to-many relationship). However, the instructors and textbooks are independent of each other. The
relationship can be summarised as follows:

Multivalued dependencies:

course →→ instructor
course →→ textbook

Multivalued Dependency
Exists when there are 3 attributes, for example a, b and c, and for each value of a there is a well-defined
set of values of b and a well-defined set of values of c. However, the set of values of b is independent of
the set of values of c, and vice versa.


To remove the multivalued dependency from a relation, we project the relation into 2 relations each of
which contains one of the 2 independent attributes.

4NF
A relation is in 4NF if it is in BCNF and contains no multivalued dependencies.

The 2 new relations formed are as follows:

L(Lecturer) T(Text)
Course Instructor Course Textbook
Mgt White Mgt Drucker
Mgt Black Mgt Peters
Mgt Green Finance Weston
Finance Gray Finance Gilford
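That the two projections regenerate the normalised Offering relation exactly (a lossless decomposition) can be checked in plain Python. A sketch only; the relation contents follow the tables above.

```python
# Normalised Offering relation: every instructor x textbook combination per course.
offering = {("Mgt", i, t) for i in ("White", "Black", "Green")
                          for t in ("Drucker", "Peters")}
offering |= {("Finance", "Gray", t) for t in ("Weston", "Gilford")}

# Project out the two independent multivalued facts.
lecturer = {(c, i) for c, i, _t in offering}   # Course ->> Instructor
text     = {(c, t) for c, _i, t in offering}   # Course ->> Textbook

# Joining the projections on Course regenerates every combination,
# so nothing is lost - and a new textbook now needs only one new row in text.
rejoined = {(c, i, t) for c, i in lecturer for c2, t in text if c2 == c}
print(rejoined == offering)   # True
```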

5NF
The fifth normal form is designed to cope with join dependency. A relation that has a join dependency
cannot be decomposed by projection into other relations.

5th NF: a relation is said to be in 5NF if it is in 4NF and all join dependencies are removed.


Limitations of Normalisation

Users may have to join several tables for retrieval, which requires additional computer time

Referential integrity is more difficult to enforce when a table is decomposed via normalisation

It ignores operational considerations

Objectives of Normalisation:
Reduce redundancy
Produce a stable data structure.


SECURITY AND INTEGRITY

SECURITY
Security refers to the protection of data against unauthorised access, alterations or destruction

INTEGRITY
Refers to the accuracy or validity of data

In other words, security involves ensuring that users are allowed to do the things they are trying to do;
integrity involves ensuring that the things they are trying to do are correct.

In both cases the system needs to be aware of certain rules that users must not violate. These rules must
be specified (typically by the DBA) using a suitable language, and must be maintained in the system
catalog or dictionary; in both cases the DBA or DBMS must monitor user operations to ensure that the
rules are enforced.

GENERAL SECURITY CONSIDERATIONS

There are numerous aspects to the security problem, among them are the following:
1. The legal, social and ethical aspects: for example, does the person making a request, say for
   customer credit information, have a legal right to the requested information?
2. Physical controls: is the computer or terminal room locked or otherwise guarded?
3. Policy questions: how does the enterprise owning the system decide who should be allowed access
   to what?
4. Operational problems: if a password scheme is used, how are the passwords kept secret and how
   are they changed?
5. Hardware controls: does the processing unit provide any security features, such as storage protection
   keys or a privileged operation mode?
6. Operating system security: does the operating system erase the contents of storage and data files
   when they are finished with?

Modern DBMS typically support one or both of two approaches to data security: discretionary and
mandatory.

Discretionary Control (user profile):

A given user will have different access rights (also known as privileges or authorities) on different objects;
further, different users will typically have different rights on the same object.
Discretionary schemes are thus very flexible, because the rights granted can be tailored to individual
users and individual objects.


Mandatory Control:
Each data object is tagged or labelled with a certain classification level, and each user is given a certain
clearance level.
A given data object can be accessed only by users with the appropriate clearance level. This is enforced by
the DBA.
Regardless of whether we are dealing with a discretionary or mandatory scheme, all decisions as to
which users may perform which operations on which objects are policy decisions, not technical ones.
All the DBMS can do is enforce those decisions once they are made.
It follows that the results of those policy decisions:
must be made known to the system (by means of statements in some appropriate definition language),
and
must be remembered by the system (by saving them in the catalog, in the form of security rules, also
known as authorisation rules)

There must be a means of checking a given access request against the applicable security rules (by access
requests here we mean the combination of requested operation plus requested object plus requested user, in
general).

This checking is done by the DBMS security subsystem, also known as the authorisation subsystem.

In order to decide which security rules are applicable to a given access request, the subsystem must be
able to recognise the source of that request, that is, it must be able to recognise the requesting user. For
that reason, when users sign on to the system they are typically required to supply not only their user ID
(to say who they are), but also a password (to prove they are who they say they are). The password is
supposed to be known only to the system and to the legitimate users of the user ID concerned.
Regarding this last point, incidentally, note that any number of distinct users might be able to share the
same group user ID. In this way the system can support user groups, and can thus provide a way of
allowing everyone in, for instance, the accounting department to share the same privileges.

The operations of adding individual users to or removing individual users from a given group can then be
performed independent of the operation of specifying the privileges that apply to that group.

Note, however, that the obvious place to keep a record of which users belong to which groups is, again, the catalog.
To repeat from the previous section, most DBMS support either discretionary control or mandatory control or both.
In fact, it would be more accurate to say that most systems support discretionary control and some systems
support mandatory control as well. Discretionary control is thus more likely to be encountered in practice.
As already noted, there needs to be a language that supports the definition of security rules. We therefore
begin by describing a hypothetical example of such a language, shown as follows:

CREATE SECURITY RULE pr3
GRANT SELECT, UPDATE(cost)
ON painting
WHERE gallery-name = 'Chitombo'
TO John, Peter, Anna
ON ATTEMPT violation REJECT

The above example is meant to illustrate the point that security rules have 5 components as follows:

1. A name (pr3, painting rule 3). In the example the rule will be registered in the system catalog under
   the name pr3. The name will probably also appear in any message or diagnostics produced by the
   system in response to an attempted violation of the rule.
2. One or more privileges (SELECT and UPDATE in the example), specified by means of the GRANT
   clause.
3. The scope to which the rule applies, specified by means of the ON clause. In the example the scope is
   painting tuples or records where the gallery-name is Chitombo.
4. One or more users (more accurately, user IDs) who are to be granted the specified privileges over the
   specified scope, specified by means of the TO clause.


5. A violation response, specified by the ON ATTEMPT violation clause, telling the system what to do if
   a user attempts to violate the rule. In the example, the violation response is simply to REJECT the
   attempt and provide suitable diagnostics. Such a response will surely be the one most often required in
   practice, so it is taken to be the default response.
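A minimal sketch of how an authorisation subsystem might check an access request (requested operation plus object plus user) against rules with the 5 components above. The rule structure and function names here are illustrative, not a real DBMS API.

```python
# Hypothetical in-memory catalog of security rules, mirroring pr3 above.
RULES = {
    "pr3": {
        "privileges": {"SELECT", "UPDATE"},
        "object": "painting",
        "scope": lambda row: row.get("gallery-name") == "Chitombo",
        "users": {"John", "Peter", "Anna"},
    },
}

def check_access(user, operation, obj, row):
    """GRANT if some rule covers this user, operation, object and tuple;
    otherwise REJECT - the default violation response."""
    for rule in RULES.values():
        if (user in rule["users"] and operation in rule["privileges"]
                and obj == rule["object"] and rule["scope"](row)):
            return "GRANT"
    return "REJECT"

print(check_access("John", "SELECT", "painting", {"gallery-name": "Chitombo"}))  # GRANT
print(check_access("Eve",  "SELECT", "painting", {"gallery-name": "Chitombo"}))  # REJECT
```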

DESTROYING EXISTING RULES


General syntax:

DESTROY SECURITY RULE <security rule-name>

For example:
DESTROY SECURITY RULE pr3

For simplicity we assume that destroying a given named relation will automatically destroy any security
rules that apply to that relation.

AUDIT TRAILS
It is a special file or database in which the system automatically keeps track of all operations performed by
users on the regular database. A typical entry in the audit trail might contain the following information:

Request (source text)


Terminal from which the operation was invoked
User who invoked the operation
Date and time of the operation
Basic relation(s), tuples and attributes affected
Old values
New values

RECOVERY
Recovery is the process of rebuilding the database back to its original state after a system, media or
transaction failure.

SYSTEM FAILURE
Shutdowns caused by hardware faults or bugs in the O/S, hardware system or other system software will be
referred to as a system crash. When the system crashes, all transactions currently executing terminate.

The contents of internal memory (which include I/O buffers) are assumed lost. However, we assume that
external memory including disks on which the database resides are not affected by the system failure.

CONCURRENCY

Data Sharing
There are several problems which can result from shared access to the database, for example the lost
update: if 2 users are allowed to hold the same tuple concurrently, the first of the 2 subsequent update
operations will be nullified by the second, since the effect of the second is to overwrite the result of
the first.

Solution
1. Grant the user issuing the first hold an exclusive lock on the data held
2. No other user will be allowed to access the data while it is locked to the first user
3. The user issuing the second hold will have to wait until the first user releases the lock
4. The second user will in turn be granted an exclusive lock on the data
5. The effect of the second hold will be to retrieve the data as updated by the first user.
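The exclusive-locking discipline above can be sketched with a mutex standing in for the DBMS lock manager. This is a minimal sketch: a real lock manager locks individual tuples and must also handle deadlock; variable names are illustrative.

```python
import threading

balance = 100                  # the shared tuple's value
lock = threading.Lock()        # exclusive lock on that tuple

def update(amount):
    global balance
    with lock:                 # "hold": acquire the exclusive lock, or wait
        current = balance      # read while locked
        balance = current + amount   # update; lock released on exit

# Two users updating the same record concurrently.
t1 = threading.Thread(target=update, args=(-150,))   # withdrawal
t2 = threading.Thread(target=update, args=(25,))     # deposit
t1.start(); t2.start()
t1.join(); t2.join()
print(balance)   # -25: both updates applied, neither one lost
```

Whichever thread acquires the lock second sees the first thread's update, so the lost-update interleaving cannot occur.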

However, the exclusive locking technique leads in turn to other problems that is deadlock and starvation
(discussed previously)


DATA SECURITY
The protection of data in the database against unauthorised disclosure, alteration or destruction.

Authorisation Mechanisms
a) Identification
b) Authentication

Identification - users have to identify themselves to the system before accessing the database, by
supplying an operator/username or using machine-readable cards.
Authentication - the process of proving their identity, by providing passwords or PIN numbers, or by
answering some questions from the system.

Access Control
For each user the system will maintain a user profile, generated from the user definition supplied by the
DBA.

The details of the appropriate identification and authentication procedures would have been given in the
access controls, together with the operations a particular user is allowed to perform. The DBMS will go
through a series of tests to determine whether to grant or deny access to the user. The tests may be
arranged in a sequence of increasing complexity, so that the program may reach its final decision as quickly
as possible.

DATABASE INTEGRITY
Ensuring that the data is accurate at all times.

Database access control and lock procedures are:

Used to ensure that a given operation is authorised
Used to ensure that integrity constraints are not violated.

Constraints
Each relation in the database will have a set of integrity constraints associated with it.
These constraints will be held in the data dictionary as part of the conceptual schema
They specify for example, that values of a particular attribute in some relation are to be within certain
boundary, or that within each tuple of some relation the values of one attribute may not exceed that of
another.

Integrity Constraints and Enforcements

1. The primary key possesses the property of uniqueness: no 2 tuples in the relation may have the same
   value for this attribute or attribute combination
2. No component of a primary key value may be null

Enforcement
The DBMS must reject any attempt to create a tuple whose key value is null or is a duplicate of one
that already exists.
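This enforcement can be observed in any SQL DBMS; here is a sketch using Python's built-in sqlite3 (the key column is declared TEXT because SQLite treats a bare INTEGER PRIMARY KEY specially, auto-filling NULLs).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stud_no TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES ('38214', 'Takura')")

rejected = 0
for row in [("38214", "Duplicate"),   # duplicate key value
            (None, "NoKey")]:         # null key value
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", row)
    except sqlite3.IntegrityError:    # the DBMS rejects the tuple
        rejected += 1

print(rejected)   # 2 - both offending tuples were rejected
```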
Bounds Entry
Values occurring in a particular attribute may be required to lie within certain bounds (eg values of
employee age: 15<age<60)
The constraints are specified by the Bounds Entry. The lower and upper limit have to be defined.

Values Entry
There may be a very small set of permitted values of some particular attribute combination eg
permitted values for primary colour are red, blue, green etc. In this case the permitted values could
simply be listed in a values entry for the relevant attribute or attribute combination

NB It might be desirable to list values or ranges of values that are not permissible for the attributes
concerned.


Format Entry
Values of a particular attribute may have to conform to a particular format. Eg the first character of a
supplier number must be the letter S.
The constraint is specified in a format entry for the relevant attribute.
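The bounds, values and format entries above map naturally onto SQL CHECK constraints. A sketch using Python's built-in sqlite3; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Bounds, values and format entries expressed as CHECK constraints.
conn.execute("""
    CREATE TABLE employee (
        emp_no TEXT PRIMARY KEY CHECK (emp_no LIKE 'S%'),        -- format entry
        age    INTEGER CHECK (age > 15 AND age < 60),            -- bounds entry
        colour TEXT CHECK (colour IN ('red', 'blue', 'green'))   -- values entry
    )
""")

conn.execute("INSERT INTO employee VALUES ('S001', 30, 'red')")  # accepted

rejected = 0
for row in [("X002", 30, "red"),      # violates the format entry
            ("S003", 70, "blue"),     # violates the bounds entry
            ("S004", 40, "purple")]:  # violates the values entry
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected += 1

print(rejected)   # 3 - each violating tuple was rejected
```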

Average Function
The set of values of a particular attribute in a relation may have to satisfy some statistical constraint, eg
no employee may earn a salary that is more than twice the average salary for the department.
The predicate defining this constraint will invoke the library function AVERAGE.
To enforce it the DBMS will have to monitor all storage operations against the employee relation.

NB
All examples given above are of static constraints; that is, they specify conditions that must hold for
every given state of the database.
Another important type of constraint involves the transition from one state to another, eg when an
employee's salary is updated, the new value must be greater than the old value.
To specify such constraints there is a need to refer to the old and new values.
The keywords OLD and NEW are reserved for this purpose.

A special case of transition is that from non-existence to existence (ie addition of a new tuple) or from
existence to non-existence (ie deletion of an existing tuple).

RECOVERY ROUTINES
Recovery routines are used to restore the database, or some portion of the database, to an earlier state after
a system failure (hardware or software) has caused the contents of the database in main storage to be lost.
They take as input a backup copy of the database (produced by the dump routines) together with the
system journal (which contains details of operations that have occurred since the dump was taken) and
produce as output a new copy of the data as it was before the failure occurred.

NB Any transactions that were in progress at the time of the failure will probably have to be restarted.

BACKUP ROUTINE
Dump routines
These are used to take backup copies of selected portions of the database, also usually on tape.
It is normal practice to dump the database regularly say once a week
If the database is very large it may be more practical to dump one seventh of it every day
Each time a dump is taken, a new system journal may be started and the previous one erased or
archived
Backup is normally initiated automatically by the DBMS before the database has committed its
change.

Checkpoint/Restart Routines
Backing up and rerunning a long transaction in its entirety can be a time consuming process
Some systems permit transactions to take checkpoint at suitable points in their executions
The checkpoint routines will cause all changes made since the last checkpoint to be committed.
The checkpoint facility allows a long transaction to be divided up into a sequence of short ones
The checkpoint routine may also record values of specified program variables in a checkpoint entry in
the system journal

Audit Trail/System Journal/System Log


Used to record every operation on the database
For each operation the journal will typically include the following information:
(a) An identification of the transaction concerned
(b) A time stamp
(c) An identification of the terminal and user concerned
(d) The full text of the input change


And in the case of an operation involving change to the database, the type of change and address of the
data changed, together with its before and after values

Encryption/scrambling
Used for the protection of the database against an infiltrator who attempts to bypass the security
system
Example of by passing the system involves a user who physically removes part of the database for
example by stealing a disk pack
Apart from normal security measures to prevent unauthorised personnel from entering the computer
centre, the most important safeguard against physical removal of part of the database is the use of
scrambling techniques
Scrambling/encryption and privacy transformation techniques involve the following:
(a) Shuffling the characters of each tuple (or record or message) into different order
(b) Replacement of each character (or group of characters) by a different character (or group
of characters), from the same alphabet or different one
(c) Groups of characters are algebraically combined in some way with a special group of
characters (privacy key) supplied by the owner of the data.
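Technique (c) can be illustrated with a toy XOR combination of the data with a privacy key. This is an illustration only; a real system would use a vetted cipher, not this.

```python
# Combine each byte of the data with the repeating privacy key (XOR).
def scramble(text, key):
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(text.encode()))

# XOR with the same key again recovers the original characters.
def unscramble(data, key):
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data)).decode()

key = b"privacy-key"   # supplied by the owner of the data
ciphertext = scramble("account balance: 100", key)
print(unscramble(ciphertext, key))   # account balance: 100
```

Without the privacy key, a stolen disk pack yields only the scrambled bytes.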

TRANSACTIONS
A transaction is a unit of work with the property that the database is:
a) in a consistent state (a state of integrity) both before it and after it, but
b) possibly not in such a state between these 2 times

In general, any changes made to the database during a transaction should not be visible to concurrent
transactions until such changes have been committed, in order to prevent those concurrent transactions
from seeing the database in an inconsistent state.
Any data changed by a given transaction including data created or destroyed by that transaction
should remain locked until that transaction terminates
The above discipline must be enforced by the DBMS
A transaction will be backed out if on completion it is found that the database is not in a state of
integrity
A transaction may also be backed out if the system detects a deadlock. A general strategy for such a
situation is to choose one of the deadlocked transactions, say the one most recently started or the one
that has made the fewest changes, and remove it from the system, thus freeing its locked resources for use
by other transactions.
The process of back out involves undoing all the changes that the transaction has made, releasing all
resources locked by the transaction and scheduling the transaction for re-execution.

Example of Transaction
In a banking system a typical transaction might be: transfer amount X from account A to account B.
This would be viewed as a single operation, and a user would have to enter a command such as

Transfer X = 100, A = 462351, B = 90554

at a terminal.

The above transaction requires several changes to be made to the underlying database.
Specifically it involves updating the balance value in 2 distinct account tuples
Although the database is in a state of integrity before and after the sequence of changes, it may not be
throughout the entire transaction, ie some of the intermediate states (or transitions) may violate one or
more integrity constraints.
It follows that there is a need to be able to specify that certain constraints should not be checked until
the end of the transaction. These are called deferred constraints.
By contrast, constraints that are enforced continuously during the intermediate steps of a
transaction are called immediate constraints.

NB: The data sublanguages must include some means of signaling the end of the transaction, in order to
cause the DBMS to apply deferred checks
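The transfer can be sketched as a single transaction using Python's sqlite3, whose connection context manager commits on success and rolls back on error. The account numbers come from the example; the starting balances are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [(462351, 500), (90554, 200)])   # illustrative balances

def transfer(x, a, b):
    """Transfer amount x from account a to account b as one unit of work:
    both balance updates succeed, or neither does."""
    try:
        with conn:   # begins a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                         (x, a))
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                         (x, b))
            # Any deferred constraint would be checked here, at end of transaction.
    except sqlite3.Error:
        pass   # the partial changes have already been rolled back

transfer(100, 462351, 90554)
balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
print(balances)   # {462351: 400, 90554: 300}
```

Between the two UPDATE statements the database briefly holds an inconsistent total, which is why no other transaction should see it until the commit.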


CONCURRENCY

In most systems, several users can access a database concurrently. The operating system switches
execution from one user program to another to minimise waiting for input or output operations
Within this approach transactions are often interleaved, that is, several steps are performed on transaction
A, then several steps on transaction B, followed by more steps on transaction A and so on.

Effects of concurrent updates:


The effects of concurrent update without concurrency control are illustrated below.

1. 2 users are in the process of updating the same record which represents a savings account record for
customer A
2. At present time customer A has a balance of $100 in her account
3. User 1 reads her record into the user work area, intending to post a customer withdrawal of $150
4. Next user 2 reads the same record into that user area, intending to post a customer deposit of $25
5. User 1 posts the withdrawal and stores the record, which now indicates a balance of $50
6. User 2 then posts the deposit (increasing the balance to $125) and stores this record on top of the one
stored by user 1
7. The record now indicates a balance of $125
8. In this case the transaction for user 1 has been lost because of interference between transaction
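Replaying the schedule above in plain Python makes the lost update visible. This is a simulation of the interleaving, not real concurrency.

```python
# The shared savings account record, without any concurrency control.
db = {"balance": 100}

user1_copy = db["balance"]          # step 3: user 1 reads 100 into a work area
user2_copy = db["balance"]          # step 4: user 2 reads the same 100
db["balance"] = user1_copy - 150    # step 5: user 1 posts the withdrawal (-50)
db["balance"] = user2_copy + 25     # step 6: user 2 stores on top of it (125)

print(db["balance"])   # 125 - the $150 withdrawal has been lost
```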


INCONSISTENT ANALYSIS

Inconsistent data usually occurs in the traditional file approach: when the same data are stored in multiple
locations, inconsistencies in the data are inevitable. For example, several of the files below contain customer data:

Billing Program                    Sales Order Processing Program
      |                                        |
Customer File                          Customer File
Accounts Receivable File               Inventory File

Suppose there is a change of address for one of the customers.

If the files are to be consistent, this change of address must be made simultaneously and correctly in
each of the files containing the customer address data item.
Since the files are controlled by different users, it is very likely that some files will reflect the old
address while others reflect the new one. Inconsistencies in stored data are one of the most common
sources of errors in computer applications; for example, the outdated customer address may cause a
customer invoice to be mailed to the wrong location. As a result, the invoice may be returned and the
customer payment delayed or lost.
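The scenario above can be sketched in a few lines. The file names, customer number and addresses below are purely illustrative; each dictionary stands in for one application's private file:

```python
# Each application keeps its own copy of the customer's address.
billing_file  = {"C001": {"name": "Customer A", "address": "12 Old Road"}}
sales_file    = {"C001": {"name": "Customer A", "address": "12 Old Road"}}
accounts_file = {"C001": {"name": "Customer A", "address": "12 Old Road"}}

# The change of address reaches only the billing application.
billing_file["C001"]["address"] = "99 New Street"

# The files now disagree: an invoice driven by sales_file would be
# mailed to the outdated address.
addresses = {billing_file["C001"]["address"],
             sales_file["C001"]["address"],
             accounts_file["C001"]["address"]}
print(len(addresses))   # 2 distinct addresses -> inconsistent
```

In a database approach the address would be stored once and shared, so a single update keeps every application consistent.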

A transaction is a logical unit of work.

TRANSACTION RECOVERY

SYSTEM RECOVERY

DATABASE INTEGRITY

CONCURRENCY

THE UNCOMMITTED DEPENDENCY PROBLEM

INCONSISTENT ANALYSIS PROBLEM


Integrity Constraints and Enforcement

These constraints are kept in the DD as part of the Conceptual Schema

The definition of a Primary Key carries a uniqueness property: no duplicates and no NULL values. To
enforce it, the DBMS rejects any attempt to insert records with NULL or duplicate primary key
values
Functional dependencies represent another form of integrity constraint
Comparison expressions, e.g. the qtyout value must not exceed the qtyord value
Lower and upper limit values specified for an attribute
Valid/permitted values for a certain attribute
Attribute values conforming to a particular format
Statistical constraints, e.g. no employee may earn more than twice the average salary for the department
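Several of these constraint types can be declared and enforced by any relational DBMS. The sketch below uses Python's built-in sqlite3 module as a stand-in DBMS; the table and column names (orderline, qtyord, qtyout) are illustrative, borrowed from the comparison-expression example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orderline (
        orderno  INTEGER NOT NULL,
        partno   INTEGER NOT NULL,
        qtyord   INTEGER NOT NULL,
        qtyout   INTEGER NOT NULL,
        PRIMARY KEY (orderno, partno),     -- uniqueness: no duplicates, no NULLs
        CHECK (qtyout <= qtyord),          -- comparison expression
        CHECK (qtyord BETWEEN 1 AND 1000)  -- lower and upper limits
    )
""")
conn.execute("INSERT INTO orderline VALUES (1, 10, 100, 40)")      # accepted

try:
    conn.execute("INSERT INTO orderline VALUES (1, 10, 50, 10)")   # duplicate key
except sqlite3.IntegrityError as e:
    print("rejected:", e)

try:
    conn.execute("INSERT INTO orderline VALUES (2, 10, 50, 60)")   # qtyout > qtyord
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Both offending inserts are rejected by the DBMS itself, with no application-level checking, which is exactly the enforcement role described above.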

DEADLOCK

Occurs when each of two transactions is waiting for the other to release an item it holds.

Solution

Deadlock prevention protocol

Every transaction locks all the items it needs in advance

Deadlock detection
No advance locking protocol; the system periodically checks whether it is in a state of deadlock

Wait-for graph
Abort one or more of the transactions involved if there is a deadlock
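The wait-for graph approach can be sketched briefly. An edge T1 -> T2 records that transaction T1 is waiting for an item locked by T2; a cycle in the graph means deadlock, and some transaction on the cycle is chosen as the victim and aborted. The function below is a generic depth-first cycle check, not any particular DBMS's algorithm:

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each transaction to the list of transactions
    it is waiting for, e.g. {"T1": ["T2"], "T2": ["T1"]}.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on stack / done
    colour = {t: WHITE for t in wait_for}

    def visit(t):
        colour[t] = GREY
        for u in wait_for.get(t, []):
            if colour.get(u, WHITE) == GREY:                # back edge: cycle
                return True
            if colour.get(u, WHITE) == WHITE and visit(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and visit(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1: deadlock.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True
# T1 waits for T2, but T2 can proceed: no deadlock.
print(has_deadlock({"T1": ["T2"], "T2": []}))       # False
```

Because building and checking the graph has a cost, real systems run the check periodically rather than on every lock request.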

Database Security and Protection

Techniques for protecting the database, such as authorisation of database access

Database Semantic Integrity

Techniques to keep database in a consistent state with respect to specified constraints on the
database

Both the database security and protection rules and the database semantic integrity constraints are
stored in the DBMS catalog.


SUPPORT ROUTINES

Journaling Routines:
Record every operation in a system log (also called an audit trail or system journal)

Dump Routines:
Take back-up copies of the database; a new system log is started after every dump

Recovery Routines:
Used to restore the database or some portion of the database after a system failure (hardware or
software) has caused contents of the database buffers in main storage to be lost.

Backout Routines:
Initiated automatically by the DBMS before transaction changes are committed.

Checkpoint/Restart Routines:
Cause all changes made since the last checkpoint to be committed. Instead of restarting a long
transaction from the beginning, recovery restarts only from the last checkpoint.

Detection Routines:
Detect any constraint violations and back the offending transaction out of the system, reporting the
list of constraints violated and the offending tuples.
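The interplay of the journaling and backout routines above can be sketched in miniature. Here a list of (transaction, key, old value, new value) entries stands in for the system journal; the function names and the toy accounts A and B are illustrative only:

```python
# A toy system journal: every change is recorded (with its before-image)
# as it is applied, so a backout routine can undo an uncommitted
# transaction by restoring the old values, newest change first.

database = {"A": 100, "B": 200}
journal = []   # entries: (txn, key, old_value, new_value)

def write(txn, key, value):
    """Apply a change and journal it with its before-image."""
    journal.append((txn, key, database[key], value))
    database[key] = value

def backout(txn):
    """Backout routine: undo one transaction's changes in reverse order."""
    for t, key, old, new in reversed(journal):
        if t == txn:
            database[key] = old

write("T1", "A", 50)       # T1 debits account A (committed)
write("T2", "B", 500)      # T2 credits account B, then fails before commit
backout("T2")              # DBMS backs T2 out using the before-images

print(database)            # {'A': 50, 'B': 200}
```

A recovery routine works the same way in the forward direction: after a failure it re-applies the new values of committed transactions from the journal (redo) and restores the old values of uncommitted ones (undo).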


Database Concepts and Design
Courage Makota
