1. DATABASE ENVIRONMENT
Definition of a Database:
It is a shared collection of interrelated data designed to meet the varied information needs of an
organisation.
It is a structured collection of stored operational data used by all the application systems of an
organisation. It is independent of any individual application
It is a central source of data to be shared by many users for a variety of related applications.
Data as a Resource:
Information, which is the analysis and synthesis of data, has become one of the most vital corporate resources.
Database Concepts:
The two essential concepts are:

Data Models
A data model is the logical structure of the data as it appears at a particular level of the database system. Each application that uses a database has its own data model, i.e. how the data appears as viewed by the different applications using the same database system. E.g. a customer accounts file contains details about customers, while a stock file contains details about goods.

Data Independence
Data models are not affected by any changes in storage techniques: the central data model and the associated data models are distinct from the arrangement of data on any particular storage medium.
Compiled by P. Chamanga
Entity
An object or event about which someone chooses to collect data is an entity. An entity may be a person or a place, for example a salesperson, a city or a product. An entity can also be an event or unit of time, such as a machine breakdown, a sale, a month or a year.
Entity Class
It is a collection of entities with similar characteristics, also known as an entity set or entity type. Entities are grouped into classes for convenience.
Attribute
An attribute is a property of a real-world entity rather than a data-oriented term. For example, the entity Customer may have the attributes:
Customer Number
Customer Name
Address
Telephone
Credit Limit
Balance
An attribute is a characteristic of an entity. There can be many attributes for each entity; for example, a patient can have many attributes, such as last name, first name, address, city and so on.
The word data item is also used in conjunction with an attribute. Data element is simply a
synonym for data item.
Data items can have values. These values can be of fixed or variable length. They can be
alphabetic, numeric or alphanumeric. Sometimes a data item can be referred to as a field.
A field represents something physical, not logical; therefore many data items can be packed into a field. A field can be read and converted into a number of data items. A common example of this is to store the date in a single field as mm/dd/yyyy. In order to sort the file in date order, three separate data items are extracted from the field and sorted first by year, then by month, and finally by day.
Typical values assigned to data items may be numbers, alphabetic characters, special characters, or a combination of all three.
Identifier
This is an attribute that uniquely distinguishes an entity from the rest, e.g. an EC Number identifies an employee.
Association
An association forms a relationship between two or more entities. Direct representation of associations between entities distinguishes the database approach from conventional file applications.
Relationships
These are associations between entities (sometimes they are referred to as data associations).
They imply that values for the associated data items are in some way dependent on each other.
(In an entity-relationship diagram, a rectangle represents an entity and a diamond represents a relationship.)
Records
A record is a collection of data items that have something in common with the entity described.
Below is a diagram to illustrate the structure of a record
Order File
Order# | Description | Quantity | Amount
Keys
A key is one of the data items in a record. When a key uniquely identifies a record, it is called a primary key; for example, Order# can be a primary key because there is only one number assigned to each customer order. In this way a primary key identifies the real-world entity, that is, the customer order.
A key is called a secondary key if it cannot uniquely identify a record. Secondary keys can be used to select a group of records that belong to a set, for example all orders that come from the city of Mutare. When it is not possible to identify a record uniquely by using one data item found in a record, a key can be constructed by choosing two or more data items and combining them.
When a data item is used as a key in a record, its description is underlined; therefore in the order record above, Order# is underlined. If an attribute is a key in another file it is underlined with a dashed line (_ _ _ _ _) and it is a foreign key in this file.
Metadata
Metadata is data about the data in the file/database.
It describes the name given, type and the length assigned to each data item
It describes the length and composition of each of the records
It is kept in a Data Dictionary
Example
Data Item Data Type Length
Name Character 10
Surname Character 15
Date of Birth Date 10
Weight Numeric 2
Data item
This is a unit fact, the smallest named unit of data in a database that has meaning to a user.
It is also known as data element, field, or attribute.
Terminology preferences:
Data item - a unit of data.
Field - a physical rather than logical term that refers to the column position within a record where a data item is located.
Examples:
Employee-Name, Student#
Data Aggregate
It is a collection of data items that is named and referenced as a whole
Example:
NAME = Last-Name, First-Name, Initials
In COBOL, data aggregates are referred to as group items. In the data dictionary an entry for a data aggregate should include: the data aggregate name, a description, and the names of the included data items.
THE CONVENTIONAL FILE PROCESSING APPROACH
In this approach, each user defines and implements the files needed for a specific application. Data records are physically organised on storage devices using either sequential or random file organisation, so that each application has its own separate data file or files and software programs.
Example:
Two users may both be interested in data about students, yet each maintains separate files and programs to manipulate those files, and each requires data not available from the other's files.
This results in redundancy in defining and storing data, wasted storage space, and redundant effort to keep common data up to date.
In the Database Approach, a single repository of data is maintained, defined once and accessed by
various users.
What if the information required to solve a particular problem is located in more than one file?
Often extra programming and data manipulation will be required to obtain that information, for
example:-
Suppose you want to know all of the orders outstanding for a particular
customer. Some of the information is maintained in the order file, for an order
entry application. The rest of the information is maintained in a customer
master file. Thus the required information is stored in several files, each of
which is organised in a different way. To extract the required information,
there is need to sort both files until the records are arranged in the same order.
Records from these files will have to be matched, and the data items from the
merging of both files will have to be extracted and output.
- Obtaining this information requires additional programming and creation of more files
- Most organisations have developed information systems one at a time, as the need arises, each with its own set of programs, files and users. After some time, these applications and files may reach a point where the organisation's information resources may be out of control.
- Some symptoms of this crisis are:
Data redundancy (similar data in different files)
Program or Data Dependency
Data Confusion (caused by continuously opening and closing different
files)
Excessive costs
1. Data Redundancy
Refers to the presence of duplicate data in multiple data files. The same piece of data, such as an employee's name and address, will be maintained and stored in several different files by several systems. Separate software programs must be developed to update this information and keep it current in each file in which it appears.
2. Program/Data Dependency
Refers to the close relationships between data stored in files and specific software
programs required to update and maintain these files. Every computer program or
application must describe the location of the data it uses. In a traditional file
environment, any change to the format or structure of the data in the file necessitates a
change in all of the software programs that use the data.
These problems can be visualised through the following illustration of two separate applications maintaining their own files:

Loan Accounting System (loan account file):
  Cust Name, Social Security#, Address, Loan A/C ID, Interest Rate, Loan Period, Loan Balance

Checking Account Accounting System (checking account file):
  Cust Name, Social Security#, Address, Checking A/C ID, Account Balance

(Cust Name, Social Security# and Address are duplicated in both files.)
In the conventional file processing, the user defines and implements files for specific
applications. In the database approach, a single repository of data is maintained and defined
once and accessed by various users.
The four characteristics most important in distinguishing a database system from a traditional file processing system include the following:
In a database system, the DBMS access programs are written independently of any
specific files. The structure of data files is stored in the DBMS catalog separately from
the access programs. This is called program-data independence.
DBMS CONCEPTS
A data model is the main tool for providing abstraction. It is a set of concepts used to describe
the structure of a database. It includes a set of operations for specifying retrievals and updates.
It is important to distinguish between the description of a database and the database itself. The description of a database is called a database schema. The data in the database at a particular moment is called a database instance.
DBMS ARCHITECTURE
Here we are looking at an architecture for database systems, called the three-level-schema
architecture.
The goal of the three-level schema architecture is to separate the user applications from the
physical database. In this architecture, schemas can be defined at the following three levels:
The internal level has an internal schema, which describes the physical storage structure of the database. It is the level closest to physical storage, that is, the one concerned with the way the data is physically stored. This view is usually taken by systems programmers. The systems programmer is concerned with the actual physical organisation and placement of the data elements in the database; the internal view is the internal or hardware view of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database. The systems programmer designs and implements this view by allocating cylinders, tracks and sectors for the various segments of the database, so that the various programs can run as smoothly and efficiently as possible.
The conceptual level has a conceptual schema, which describes the structure of the whole
database for a community of users. It is a logical view. It is how the Database appears to be
organised to the people who designed it. The conceptual schema is a global description of the
database that hides the details of physical storage structures and concentrates on describing
entities, data types, relationships and constraints. It is the view usually used by the Database
Administrator. It includes all the data elements in the Database and how these data elements
logically relate to each other.
The external or view level includes a number of external schemas or user views. It is the one
concerned with the way the data is viewed by individual users, and is usually used by an
application programmer. Each external schema describes the database view of one group of
database users. Each view typically describes the part of the database that a particular user group
is interested in and hides the rest of the database from that user group.
END USERS
    |
EXTERNAL LEVEL      EXTERNAL SCHEMAS (user views)
    |  external/conceptual mapping
CONCEPTUAL LEVEL    CONCEPTUAL SCHEMA
    |  conceptual/internal mapping
INTERNAL LEVEL      INTERNAL SCHEMA
    |
STORED DATABASE
External View A               External View B
    |                             |
External/Conceptual           External/Conceptual
Mapping A                     Mapping B
          \                  /
           Conceptual View          (handled by the DBMS)
                 |
        Conceptual/Internal Mapping
                 |
            Internal View
                 |
           Stored Database
Data Independence
The three-schema architecture can be used to explain the concept of data independence, which
can be defined as the capacity to change the schema at one level of a database system without
having to change the schema at the next higher level. There are two types of data independence:
Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database by adding a new record type or data item, or to
reduce the database by removing a record type or data item, without having to change the external schemas.
Physical data independence is the capacity to change the internal schema without
having to change the conceptual (or external) schemas. Changes to the internal schema
may be needed because some physical files are reorganised for example, by creating
additional access structures to improve the performance of retrieval or update. If the
same data as before remains in the database, we should not have to change the
conceptual schema.
1. Reduced Data Redundancy
Improved consistency of data, while reducing wasted storage space due to reduced redundancy.
2. Data Independence
A database system keeps descriptions of data separate from the applications, so that changes to the data can occur without necessarily requiring changes in every application program that uses the data.
Most DBMSs offer application program development tools that help application
programmers in writing program code. These tools can be very powerful, and they
usually improve an application programmer's productivity substantially. Object-oriented
databases provide developers with libraries of reusable code to speed up development of
applications. Users also increase their productivity when query languages and report
generators allow them to produce reports from the database with little technical
knowledge and without any help from programmers, thus avoiding the long time
periods that MIS departments typically take to develop new applications. The result is
greater use of the corporate database for ad-hoc queries. Users also increase their
productivity when they use microcomputer software designed to work with mainframe
databases. This allows them to acquire and manipulate data with ease, without requiring
the assistance of programmers.
7. Improved Data Integrity: Because data redundancy is minimised, the threat to data
integrity is reduced. Data integrity ensures that the data in the database is accurate:
updated values are available to all applications, ensuring data consistency across
applications.
Problems/Disadvantages Of Databases
DBMSs provide many opportunities and advantages, but these advantages may come at a price.
DBMSs also pose problems such as:
1. Resource Problems
Characterised by a high initial investment and the possible need for additional hardware. A
DBMS requires a large software system for its creation and maintenance, and a fairly
large computer to support it. A database system usually requires extra computing
resources: after all, the new database system programs must run, and much more data must
be stored on-line to answer queries, which we hope will increase. As a result, more
terminals may be needed to put managers and other users on-line, and additional hard
disk systems may be needed to put more data on-line and make it available to
managers. Communications devices may be needed to connect the extra terminals to the
database. It may even be necessary to increase the size or number of CPUs to run the
extra software required by the database system.
Currently PCs are becoming more powerful and DBMSs more compact, so this
problem is becoming less serious. It is also being overcome by the availability
of distributed relational databases.
2. Security Problems
A database must have sufficient controls to ensure that data is made available to
authorised personnel only and that adding, deleting and updating of data in the database
is accomplished by authorised personnel only. Access security means much more than
merely providing log in codes, account codes and passwords. Security considerations
should include some means of controlling physical access to terminals, tapes, and other
devices. Security considerations should also include the non-computerised procedures
associated with the database such as forms to control the updating or deletion of records
or files and procedures for storing source documents. In addition, access to employee,
vendor, and customer data should conform to various state regulations, such as the 1974
Privacy Act, and the 1978 Right to Financial Privacy Act. Certainly the database should
contain an archiving feature to copy all important files and programs and these should
be procedures for regular update and storage of these archival copies.
3. Ownership Problem
In file-based systems, employees who run application programs on application-specific
files frequently feel that the data in these files is theirs and theirs alone. Users, such as
the payroll and personnel departments, develop ownership of the files in the system. When a
database of such files is created, the data is owned by the entire company. Any user with
a need should be able to obtain the authority to read or otherwise access the data.
However, for a database to be successful the data must be viewed and treated as a
corporate resource, not as an individual's property.
Security and integrity may be compromised if the DBA does not administer the database
properly.
The organisation experiences an overhead cost for providing security, concurrency
control, recovery and integrity functions.
The generality with which the DBMS provides for defining and processing data can also
be problematic.
A Database Management System (DBMS) is a layer of software which maintains the database and provides an
interface to the data for the application programs that use it.
It allows creation, accessing, modification and updating of the database, the retrieval of data
and the generation of reports.
The DBA (Database Administrator) ensures that the database meets its objectives. The DBA's responsibilities are:
To define, implement and control the database storage, including the structure of the
database.
To coordinate the data resources of the whole enterprise with user and management
cooperation.
To ensure that policies and procedures are established to guarantee effective production,
control and use of data.
To decide on the information content of the database & structure of different data models.
Application Programs --\
                        +----- DBMS ------- DATABASE
Users -----------------/
BIT
BYTE/CHARACTER
DATA ELEMENT/FIELD
RECORD
FILE
DATABASE
- In a DBMS, applications do not obtain the data they need directly from the storage media
(database)
- They request the data from the DBMS
- The DBMS then retrieves the data from the storage media and provides them to the application
programs
- A DBMS operates between application programs and the data
The illustration below shows the relationship of Application Programs, the DBMS and the Database.
+----------------------+
|APPLICATION |
|PROGRAM +------+
| | |
+----------------------+ | +------------+ D
| | | A
+-----------------------+ | | | T
| | +-----------| | A
|APPLICATION | | | B
|PROGRAM +-----------------| DBMS | A
| | | | S
+-----------------------+ +--------| | E
+-----------------------+ | | |
| | | +------------+
|APPLICATION +--------+
|PROGRAM |
| |
+-----------------------+
COMPONENTS OF A DBMS
DBMS system software is usually developed by commercial vendors and purchased by organisations.
Some of these components are typically used by information specialists; for example,
information systems specialists typically use the Data Dictionary, Data Languages, Teleprocessing
Monitor, Application Development Systems, Security Software, and the archiving and recovery system
components of a DBMS.
Other components such as Report Writers and Query Languages may be used by both programmers and
other non-specialists.
DATA DICTIONARY
The data dictionary contains the names and descriptions of every data element in the database.
Through the use of its data dictionary, a DBMS stores data in a consistent manner thus reducing
redundancy. For example, the data dictionary ensures that the data element representing the number of an
inventory item named (stocknum) will be of uniform length and have other uniform characteristics
regardless of the application program that uses it.
Application developers use the data dictionary to create the records they need for the programs they are
developing.
A data dictionary checks records that are being developed against the records that already exist in the
database and prevents redundancy in data element names.
Because of the data dictionary, an application program does not have to specify the characteristics of the
data it wants from the database; it merely requests the data from the DBMS.
This may permit changing the characteristics of a data element in the data dictionary without changing it
in all the application programs that use the data element.
Defines Metadata
DATA LANGUAGES
To place a data element into the Data Dictionary, a special language is used to describe the characteristics
of the data element.
To ensure uniformity in accessing data from the database, a DBMS will require that standardised
commands be used in application programs.
These commands are part of a specialised language used by programmers to retrieve and process data
from the Database.
A Data Manipulation Language (DML) usually consists of a series of commands such as FIND, GET, APPEND, etc.
These commands are placed in an application program to instruct the DBMS to get the data the
application needs at the right time.
SECURITY SOFTWARE
A security software package provides a variety of tools to shield the Database from unauthorised access.
ARCHIVING AND RECOVERY SYSTEMS
Archiving programs provide the database manager with the tools to make copies of the database, which
can be used in case the original database records are damaged.
Restart or recovery systems are tools used to restart the database and to recover lost data in the event of a
failure.
REPORT WRITERS
A report writer allows programmers, managers and other users to design output reports without
writing an application program in a programming language such as COBOL.

QUERY LANGUAGES
A query language is a set of commands for creating, updating and accessing data from a database.
Query languages allow users to ask ad-hoc questions of the database interactively, without the aid of
programmers.
SQL is a set of English-like commands that has become the standard in the database industry.
Because SQL is used in many DBMSs, managers who understand SQL syntax are able to use the same set of
commands regardless of the DBMS.
This software must provide the manager with access to data in many database management
environments. The basic form of an SQL command is:
SELECT [fields] FROM [file] WHERE [condition]
After SELECT you list the fields to be retrieved; after FROM you list the name of the file or group of records that contains those fields; after WHERE you list any condition for the search of the records.
Example:
Suppose you wish to select all customer names from the customer file where the city in which the customer
lives is Harare.
Solution:
SELECT Customer-Name FROM Customer WHERE City = 'Harare'
OR
SELECT * FROM Customer WHERE City = 'Harare'
The result would be a list of the specified fields (or all fields, with *) of customers located in Harare only.
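As a runnable sketch of such a query using SQLite from Python (the table and column names, and the sample rows, are assumptions for illustration):

```python
import sqlite3

# Build a small customer table in memory (illustrative data only).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (name TEXT, city TEXT)")
con.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Moyo", "Harare"), ("Dube", "Mutare"), ("Ncube", "Harare")],
)

# SELECT lists the fields, FROM names the file/table,
# WHERE gives the search condition.
rows = con.execute(
    "SELECT name FROM customer WHERE city = 'Harare' ORDER BY name"
).fetchall()
print(rows)  # customers located in Harare only
```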
These query languages are structured so that the commands used are as close to standard English as possible.
Query languages allow users to retrieve data from the database without having detailed information about the
structure of the records and without being concerned about the processes the DBMS uses to retrieve the
data. Furthermore, managers do not have to learn COBOL, BASIC, etc.
TELEPROCESSING MONITOR
It is a communications software package that manages communication between the database and remote
terminals
Teleprocessing monitors often handle order entry systems that have terminals located at remote sales
locations.
These may be developed by DBMS software firms and offered as companion packages to their database
products.
Evolution of Schema Architectures
1. In the earliest data processing applications there was no formal data management software; all data
descriptions and input/output instructions were coded in each application program. This resulted in no
data independence: every change to a data file required modification or rewriting of the application
programs.
2. Access methods were the first formal data management software. An access method is a software routine that manages
the details of accessing and retrieving records in a file, providing storage independence: storage units
can be changed (newer units replacing older units) without altering or modifying application
programs.
3. The two-level schema (two-schema architecture) was employed by most early database management systems.
A logical schema corresponds to an external or user view that describes the data as seen by
each application program. A physical schema corresponds to the internal schema that describes the
representation of data in computer facilities. This provided physical data independence, that is, the
data structures or methods of representing data in secondary storage could be altered without
modifying application programs, e.g. to achieve efficiency, linked lists could be used instead of
indexes without changing application programs. The two-level schema was characteristic of
structured database management systems, such as those that use the hierarchical and network data
models. It did not provide logical data independence.
4. The three-level schema is provided by contemporary relational DBMSs. The conceptual schema provides an
integrated view of the data resource for the entire organisation. The conceptual schema evolves
over time: new data definitions are added to it as the database grows and matures. This architecture provides both
logical and physical data independence. Because of logical data independence, the conceptual schema can grow
and evolve over time without affecting the external schemas, so existing application programs
need not be modified as the database evolves.
A database management system that provides these three levels of data is said to follow a three-schema
architecture.
A schema is a logical model of a database. It captures the metadata that describe an organisation's data in
a language that can be understood by the computer.
Physical data independence insulates a user from changes to the internal model
Logical data independence insulates a user from changes to the conceptual model.
FILE ORGANISATION
A file contains groups of records used to provide information for operations, planning, decision
making etc.
It is a technique for physically arranging the records of a file on a secondary storage device.
File Organisation
|-- Sequential
|-- Indexed
|     |-- Hardware independent (VSAM)
|     |-- Hardware dependent (ISAM)
|-- Direct
(b) Indexed: an index of key values (H, P, Z) points to the blocks of data records (A D H | K M P | Q Z) that contain them.
(d) Hashed: Key -> Hashing routine -> Relative record #
Organisation and Access:

a) Sequential organisation: the physical order of records in the file is the same as the order in which
records were written to the file, normally in ascending order of primary key.
   Sequential access: a record can be accessed only by first accessing all records that physically
precede it.

b) Indexed sequential organisation: records are stored in physical sequence according to the primary key.
The file management system, or access method, builds an index, separate from the data records, that
contains key values and pointers to the data records themselves.
   Random/sequential access: random access of individual records is possible without accessing other
records; the entire file can also be accessed sequentially.

c) Relative organisation (also known as direct file organisation): records are often loaded in primary key
sequence so that the file can be processed sequentially, but records can also be in random sequence.
   Relative access: each record can be retrieved by specifying its relative record number, which gives the
position of the record relative to the beginning of the file. The user or application program has to
specify the relative location of the desired record.

d) Hashed organisation (direct file organisation in which hash addressing is used): the primary key value
for a record is converted by an algorithm (called a hashing routine) into a relative record number.
Records are not in logical order; the hashing algorithm scatters records throughout the file, normally
not in primary key sequence.
   Relative access: a record is located by its relative record number, as for a relative organisation.
File organisation is rarely changed but record access mode can change each time the file is used.
Since it is not feasible to reserve a physical address for each possible record, a method
called hashing is used. Hashing is the process of calculating an address from the record
key.
Suppose that there were 500 employees in an organisation and we wanted to use the
Social Security Number as a key: it would be inefficient to reserve 999 999 999
addresses, one for each possible social security number.
Therefore, we take the social security number and use it to derive the address of
the record. There are many hashing techniques; a common one is to divide the key
by a prime number and use the remainder as the address. This is known as the
Division Method, as follows:
Begin with the Social Security Number 053-4689-42, i.e. 53 468 942. Dividing by 509
yields 105 047. Note that 105 047 multiplied by 509 equals 53 468 923, not
53 468 942. The difference between the original number 53 468 942 and 53 468 923
is 19, so the record is stored at address 19.
The record for an employee whose Social Security Number is 472-3840-86 yields
the same remainder, 19.
When this occurs, the second person's record is placed in a special
overflow area.
Example:
Qn. Given the number 472-3840-86, divided by 509 (a prime number), find the physical location.
Solution: 472 384 086 / 509 = 928 063 remainder 19, so the physical location is 19.
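The division-method calculation above can be sketched in Python:

```python
# Division method: divide the key by a prime number and use the
# remainder as the storage address.
def hash_division(key, prime=509):
    return key % prime

# Social Security Number 053-4689-42:
print(53468942 // 509)           # quotient 105047
print(hash_division(53468942))   # remainder 19 -> storage location 19

# A different key can yield the same remainder (a collision):
print(hash_division(472384086))  # also 19 -> placed in the overflow area
```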
Modular Arithmetic
Divide the key by the number of locations available for storage and take the remainder.
For example, with 100 locations and the 4-digit key 1537:
1537 / 100 = 15 remainder 37
Therefore the storage location is 37.
Alphanumeric keys need to be converted to numbers first, e.g. using base 36 or the ASCII code for each
character or digit.
Folding
Divide the key into two or more parts and add them together. For example, 872377 = 872 + 377 = 1249.
Then apply modular arithmetic to the sum.
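Folding followed by modular arithmetic can be sketched in Python (the split into 3-digit parts and the 100 storage locations are taken from the examples above):

```python
# Folding: split the key into fixed-size parts, add them together,
# then apply modular arithmetic to the sum.
def fold(key, part_len=3, locations=100):
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % locations

# 872377 -> 872 + 377 = 1249 -> 1249 mod 100 = 49
print(fold(872377))
```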
b) BURNS 10652
Hashing works like a one-way street: it cannot be worked backwards.
Advantages:
Supports applications demanding quick record retrieval because locating and
reading desired record into memory usually requires a single access to the disk.
Involves a single calculation for finding the record number
Permits both numeric and alphanumeric keys
Easily implemented with COBOL, C or Pascal instructions
Disadvantages:
Hashing algorithms might result in collisions, that is, identical remainders (called
crashes or synonyms). For example, Product Numbers C-64744 and F-42742 both
yield the remainder 9739 when divided by 11001.
When collisions occur an indicator is stored in the first record to warn a user of the
crash. The indicator reveals where the other record really resides.
Due to collisions, extra disk space is allotted for a record that would otherwise
collide with another
Due to random order of the file, a sorting step must occur before listing or otherwise
processing the file in sequence
When file becomes full, a programmer writes a one-time program to rebuild it with
expansion space.
NOTE:
The choice of sequential, direct, or indexed files depends on the user's needs
For instant access to data, direct and indexed techniques apply
For batch environments, sequential techniques apply
HASHING
- This is a table scheme in which updates, searches and deletions can ideally be done in constant
time
- We seek a mathematical function which produces table addresses when supplied with the key
- Since there are many more possible key values than addresses, this becomes a many-to-one function
in which many different key values can lead to the same address.
- Since we do not know in advance which keys will arise, it is possible that 2 keys with the same address
will arrive and a hash collision will occur
- Therefore to design a good hash table we must find a solution to the following two problems:-
a) Find a hash function that minimises the number of collisions by spreading arriving records
around the table as evenly as possible
b) Since any hash function is many-to-one, collisions are inevitable and therefore a good way
of resolving them is necessary
- There are basically four methods which are used to produce hash tables which are:- (mainly for
system software programming)
1) Truncation
2) Division
3) Midsquare
4) Partition/Folding
TRUNCATION
- This is a method where you normally take the last few characters of the key as the address
e.g h(2467) = 467
h(12601) = 601
h(12467) = 467
DIVISION
- You take the key and MOD it by the MAXSIZE, that is, you will use the function:-
key MOD MAXSIZE
MIDSQUARE
- It converts the key (e.g. a filename) into its decimal equivalent, finds the middle digit and
squares it to give the address
e.g. h(49294) = 4
h(24683) = 36
PARTITION/FOLDING
- This method divides the number into groups and adds the individual groups to give the address
eg 510324 = 51 + 03 + 24 = 78
Therefore h(510324) = 78
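The four methods can be sketched in Python (illustrative; the midsquare variant squares the middle digit, as in the examples above):

```python
def truncation(key, digits=3):
    """Keep the last `digits` digits of the key as the address."""
    return key % 10 ** digits

def division(key, maxsize):
    """key MOD MAXSIZE."""
    return key % maxsize

def midsquare(key):
    """Square the middle digit of the key (the variant used in the examples)."""
    s = str(key)
    return int(s[len(s) // 2]) ** 2

def folding(key, group=2):
    """Split the digits into groups and add the groups together."""
    s = str(key)
    return sum(int(s[i:i + group]) for i in range(0, len(s), group))
```

For example, truncation(2467) and truncation(12467) both give 467, illustrating a collision.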
- The technique of searching in a systematic and repetitive fashion for an alternative location is called
PROBING
Incrementing function
- The incrementing function takes an address (i), not a key, and produces another hash address
- If the new location is occupied, we take that hash address and pass it again through the incrementing
function, and so on, until we find an open location; with luck we may be able to place the record in a
few probes
- Therefore we should have an indicator to tell whether a position is occupied or unoccupied, and as
such we say that using linear probing we first apply h(k) and then as many increments (i) as
we need
Disadvantages:
- Linear probing results in clustering, where a number of synonyms will be adjacent to each
other and mixed with others; as the table fills, these clusters will inevitably grow larger and larger,
making update, search and delete operations run more slowly
Advantages:
- It is suitable for small lists
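A minimal open-addressing table using linear probing (an illustrative sketch, with the division method as h(k) and an increment of 1):

```python
class LinearProbeTable:
    """Open addressing with linear probing; None marks an open position."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size

    def _hash(self, key):
        return key % self.size            # division method: key MOD MAXSIZE

    def insert(self, key):
        i = self._hash(key)
        for _ in range(self.size):
            if self.slots[i] is None:     # open location found
                self.slots[i] = key
                return i
            i = (i + 1) % self.size       # incrementing function: next address
        raise RuntimeError("table full")

    def search(self, key):
        i = self._hash(key)
        for _ in range(self.size):
            if self.slots[i] == key:
                return i
            if self.slots[i] is None:     # hit an open slot: key is absent
                return None
            i = (i + 1) % self.size
        return None
```

For a table of 10 slots, inserting 1537 places it at location 7; inserting 27 (which also hashes to 7) probes on to location 8, illustrating how synonyms cluster next to each other.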
Suppose the stored record most recently accessed is record R1, and suppose the next
stored record required is record R2. Suppose also that R1 is stored on page P1
and R2 is stored on page P2. Then:-
1. If P1 and P2 are one and the same, then the access to R2 will not require
any physical input or output at all, because the desired page, P2, will
already be in a buffer in main memory
2. If P1 and P2 are distinct but physically close together, in particular if they
are physically adjacent, then the access to record R2 will require a physical
input/output (unless of course page P2 also happens to be in a main
memory buffer), but the seek time involved in that input/output will be
small, because the read/write heads will already be close to the desired
position. In particular, the seek time will be 0 if P1 and P2 are in the same
cylinder.
3. Indexing
This is another file organisation method. The file is divided into two areas, namely:-
1. The Data Area
Contains all the records, with all values or entries organised
sequentially, which can be in ascending order
2. Index Area
Contains one record key per given track number. This record
key must be the highest in that track. The 2 areas are
linked or joined by pointers
A non-dense index, sometimes called a sparse index, does not contain an entry for every stored
record in the indexed file (1:m). Less storage space is used, since there is one index entry for a
number of records.
Index        Data Area (supplier file: S#, Name, City)
S2  ->  Track 1:  S1 Smith London    S2 Jones Paris
S4  ->  Track 2:  S3 Blake Paris     S4 Clarke London
S6  ->  Track 3:  S5 Adams Athens    S6 Brown Paris
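A sparse-index lookup can be sketched as follows (illustrative: two records per track, one index entry per track holding the highest key on it):

```python
# Data area: records stored sequentially, two per track.
tracks = [
    [("S1", "Smith", "London"), ("S2", "Jones", "Paris")],
    [("S3", "Blake", "Paris"), ("S4", "Clarke", "London")],
    [("S5", "Adams", "Athens"), ("S6", "Brown", "Paris")],
]

# Index area: one entry per track, holding the highest key on that track.
index = [(track[-1][0], t) for t, track in enumerate(tracks)]

def lookup(key):
    """Follow the sparse index to a track, then scan that track sequentially."""
    for highest, t in index:
        if key <= highest:          # first track whose highest key covers `key`
            for record in tracks[t]:
                if record[0] == key:
                    return record
            return None             # key would live on this track but is absent
    return None
```

The index is consulted first to pick the track, so only one track (not the whole file) is scanned per lookup.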
3. Compression Techniques
This is a way of minimising the amount of storage needed for stored data by replacing the data with
some more compact representation. Techniques include:
Front Compression
Rear Compression
Hierarchical Compression
Front Compression:
Example:
The following 4 names appear in a stored table. The field length is 10 characters. Apply front
differential compression:
Farai
Farasiya
Farisai
Farikayi
Solution (b denotes a blank/padding character; the number is the count of leading
characters shared with the previous entry):
Farai    0 - Faraibbbbb
Farasiya 4 - siyabb
Farisai  3 - isaibbb
Farikayi 4 - kayibb
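The front-compression scheme above can be sketched as (an illustrative implementation; width is the 10-character field length):

```python
def front_compress(names, width=10):
    """Front compression: record the count of leading characters shared with
    the previous entry, then the remaining characters, blank-padded to width."""
    out, prev = [], ""
    for name in names:
        n = 0
        while n < min(len(name), len(prev)) and name[n] == prev[n]:
            n += 1                                 # shared-prefix length
        out.append((n, name[n:].ljust(width - n))) # count + padded remainder
        prev = name
    return out

result = front_compress(["Farai", "Farasiya", "Farisai", "Farikayi"])
```

Each stored entry is the pair (shared count, remaining characters), matching the worked solution.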
Rear Compression:
Example:
The following names appear in a stored table. The field length is 15 characters. Apply rear
compression.
Abrahams,GK
Ackermann,LZ
Ackroyd,S
Adams,T
Adams,TR
Adamson,CR
Allen,S
Ayres,ST
Bailey,TE
Baileyman,D
Solution (each entry is recorded as a pair of counts - characters in common with the
preceding entry, then characters recorded - followed by the recorded characters; the
expanded form shows the entry reconstructed from those counts):
                           Expanded form
Abrahams,GK   0-2  Ab      Ab
Ackermann,LZ  1-3  cke     Acke
Ackroyd,S     3-1  r       Ackr
Adams,T       1-7  dams,T  Adams,T
Adams,TR      7-1  R       Adams,TR
Adamson,CR    5-1  o       Adamso
Allen,S       1-1  l       Al
Ayres,ST      1-1  y       Ay
Bailey,TE     0-7  Bailey  Bailey
Baileyman,D   6-1  m       Baileym
Hierarchical Compression:
A supplier stored file might be clustered by values of the city field; for example, all London
suppliers would be stored together, etc. The set of all supplier records for a given city might be
compressed into a single hierarchic stored record, in which the city value in question appears
only once, followed by all the other details for each supplier who happens to be in that city.
Athens:  S5 Adams 30
London:  S1 Smith 20   S4 Clark 20
Paris:   S2 Jones 10   S3 Blake 30
Intra-file
[Figure: intra-file compression - page p1 holds S1 Smith 20 London and page p2 holds
S2 Jones 10 Paris, each supplier record followed by its shipment entries (P1 300)]
Inter-file
Combines the supplier and shipment files into a single file, and then applies intra-file compression to
that single file.
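The clustering-by-city idea can be sketched as follows (illustrative data taken from the example above):

```python
# Hierarchical (intra-file) compression sketch: store each city value once,
# followed by the remaining details of every supplier located in that city.
suppliers = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]

compressed = {}
for s_no, name, status, city in suppliers:
    # The city key appears once; each supplier contributes only its other fields.
    compressed.setdefault(city, []).append((s_no, name, status))
```

Each repeated city value is thus stored a single time, with all of that city's supplier details grouped under it.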
The hierarchical and network models use standard files and provide structures that allow them to be cross-
referenced and integrated. They have been available since early 1970s. The relational model uses tables
to store data. It provides the ability to cross-reference and manipulate the data and it provides for data
integrity. The object-oriented model uses objects.
The uppermost record in the tree structure is called the root record. From there, data are organised into
groups of parent and child records. A parent record can have many child records (children of the same
parent are called siblings), but each child record can have only one parent record. Parent records are
higher in the data structure than are child records; however, each child can become a parent and have its
own child records. Because relationships between data items follow defined paths, access to the data is
fast. However, any relationship between data items must be defined when the database is being created.
[Figure: Parent-Child relationship - e.g. the record Products as parent of Motor Car]
1. One record type, called the root of the hierarchical schema, does not participate as a child record
type in any Parent-Child Relationship (PCR) type
2. Every record type except the root participates as a child record type in exactly one PCR type
3. A record type can participate as parent record type in any number (zero or more) of PCR types
4. A record type that does not participate as parent record type in any PCR type is called a LEAF of
the hierarchical schema
5. If a record type participates as parent in more than one PCR type, then its child record types are
ordered. The order is displayed, by convention, from left to right in a hierarchical diagram.
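The root and leaf rules above can be sketched as a small check ("Products" and "Motor Car" come from the figure; "Spares" is a hypothetical extra leaf added for illustration):

```python
# A hierarchical schema as parent record type -> ordered list of child record
# types (rule 5: child order matters, so lists are used).
schema = {
    "Products":  ["Motor Car", "Spares"],   # "Spares" is hypothetical
    "Motor Car": [],
    "Spares":    [],
}

children = {c for kids in schema.values() for c in kids}

# Rule 1: exactly one record type never appears as a child - the root.
roots = [r for r in schema if r not in children]

# Rule 4: a record type with no children is a LEAF.
leaves = [r for r in schema if not schema[r]]
```

Here roots is ["Products"] and both remaining record types are leaves, as the rules require.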
A network database is similar to a hierarchical database except that each record can have more than one
parent, thus creating a many-to-many relationship among the records. For example, a customer may be
called on by more than one salesperson in the same company, and a single salesperson may call on more
than one customer. Within this structure, any record can be related to any other data element.
The main advantage of a network database is its ability to handle sophisticated relationships among
various records. Therefore more than one path can lead to a desired data level.
The network database structure is more versatile and flexible than the hierarchical structure, because the
route to data is not necessarily downwards; it can be in any direction.
In the network structure, again similar to the hierarchical structure, data access is fast, because
relationships must be defined during the database design. However, network complexity limits users in
their ability to access the database without the help of programming staff.
[Figure: network structure - e.g. Motor Car related to more than one parent record under Products]
A relational database is composed of many tables in which data are stored, but a relational database
involves more than just the use of tables. Tables in a relational database must have unique rows, and the
cells (the intersections of a row and a column - equivalent to fields) must be single-valued (that is, each
cell must contain only one item of information, such as a name, address, or identification number).
A row is called a tuple and a column is called an attribute. The data type describing the kinds of values
that can appear in each column is called a domain.
Domain
Is a set of atomic values. An atomic value means that each value in the domain is indivisible as far as the
relational model is concerned.
Relation
A relation schema is a set of attributes; it is used to describe a relation. The degree of a relation is the
number of attributes n in its relation schema.
A relation is defined as a set of tuples, and the tuples in a relation do not have any particular order.
Values within a tuple are ordered. Values in a tuple are atomic; therefore composite and multivalued
attributes are not allowed - that is the First Normal Form assumption.
Tuple
All tuples in a relation must be distinct - no two tuples can have the same combination of values for their
attributes. A superkey is a set of attributes whose values uniquely identify each tuple in the relation. A
key is a minimal superkey: e.g. if {StudentID, Name, Age} uniquely identifies tuples and no attribute can
be removed without losing that property, it is a key. A relation may have more than one key - each of the
keys is called a candidate key.
Example: a relation with 7 attributes is a relation of degree 7.
A database management system that allows data to be readily created, maintained, manipulated, and
retrieved from a relational database is called Relational Database Management System (RDBMS). The
RDBMS, not the user, must ensure that all tables conform to the requirements. The RDBMS also must
contain features that address the structure, integrity and manipulation of the database.
In a relational database, data relationships do not have to be predefined. Hence users can query a
relational database and establish data relationships spontaneously by joining common fields. A database
query language is a helpful tool that acts as an interface between users and a relational DBMS. The
language helps the users of a relational database to easily manipulate, analyse and create reports from the
data contained in the database. It is composed of easy-to-use statements that allow people other than
programmers to use the database.
While the relational model is well suited to the needs of storing and manipulating business data, it is not
well suited for handling the data needs of certain complex applications, such as computer-aided design
(CAD) and computer-aided software engineering (CASE).
Business data follow a defined data structure that the relational model handles well. However,
applications such as CAD and CASE deal with a variety of complex data types that cannot be easily
expressed by relational models. Such programs also require massive amounts of persistent data (data that
cannot be altered and that are stored in their own private memory space), and a database for them must
be able to evolve without affecting the data in memory that the application uses to operate.
An object-oriented database uses objects and messages to accommodate new types of data and provide for
advanced data handling. A database management system that allows objects to be readily created,
maintained, manipulated and retrieved from an object-oriented database is called an Object-Oriented
Database Management System (OODBMS)
An object-oriented database management system must still provide features that you would expect in any
other database management system, but there is still no clear standard for the object-oriented model.
A logical database design is a detailed description of a database in terms of the ways in which the users
will use the data.
During this phase an analyst performs a detailed study of the data, identifying how the data are grouped
together and how they relate to each other. An analyst must also determine which fields have multiple
occurrences of data, which fields will be keys or indexes, and the size and type of each field.
A Schema is a complete description of the contents and structure of a database. It defines the database to
the system, including the record layout, the names, length and size of all fields, and the data relationships.
A Subschema defines each user's view, or the specific parts of the database that a user can access. A
subschema restricts each user to certain records and fields within the database. Every database has one
and only one schema, but each user must have a subschema.
In SQL, commands are given to define the structure of the database. Each database is identified by a
name, which is given in a CREATE DATABASE command.
The entities are defined as tables, with each attribute defined as a column in the table. A table then is
given a name, and each attribute declared by giving it a column name and stating its type. Supported data
types include:-
CHARACTER - Character string values
SMALLINT - A restricted range of integers
DECIMAL - Which allows a fixed number of decimal places
FLOAT - For floating point values
MONEY - Currency values
DATE - For dates
Each data type allows a certain set of possible values. There is also the possibility of a column having an
unknown value, called NULL. When a column is specified, it is assumed to allow NULL values unless
the phrase NOT NULL is specified.
NULL values should not be allowed in any column which forms part of the primary key of the table.
The name Art.db is chosen for the database, while the tables are called painting, artist and gallery. The
data type MONEY has been used, so it is assumed to be supported by the implementation. The only
column which allows a NULL value is Nationality in the artist table. A NULL value in this column of a
particular row would mean that the actual value is unknown.
UNIQUE INDEXES are defined on the tables for the primary keys, to prevent the system allowing rows
in the tables with duplicate values in the key.
Instead, an INDEX is created for the key and is specified as unique, so that any attempt to add rows with
the same key will be trapped as an error. For the gallery and artist tables, the key has just one component
attribute, but the key for the painting table has two attributes and the index is created for the pair (title,
artist-name).
Indexes may be created for any number of columns in a table. Usually their purpose is to speed up
access to the data using the column value. Each index must be given a name, although the name is not
used again unless the index is to be deleted. The names used for the indexes in the illustration above are
painting.idx, artist.idx and gallery.idx.
The SQL SELECT statement is used to retrieve data from a table. It combines elements of the relational
algebra operations via its various options.
SELECTION
In its simplest form, a SELECT command will select all data from the table, as in the example:-
SELECT *
FROM Art
The asterisk (*) indicates that all the columns (fields) of the table Art are to be selected.
Using the WHERE clause will restrict the rows (records) which are selected to those satisfying the
condition for example:-
SELECT *
FROM Art
WHERE cost > 5000
In this form the SQL SELECT provides the functions of the SELECT statement of the relational algebra.
A practical example for the two SELECT statements, using the contents of the Art table pictured
below:-
TABLE: Art
Title   Artist_Name   Cost   Gallery_Name
Pool    Victor         300   Chitambo
Peel    John          1000   Nyasha
Sony    Arthur        1500   Harare
Reelm   Tecla          800   Nyasha
Tito    Amon          4500   Mutare
Questions:
1. Write an SQL code to view all records from the Art table
2. Write an SQL code to view all records from the Art table where cost is less than $1500 and
Gallery_Name is equal to Nyasha
3. Write an SQL statement to list only the columns Title and Cost in the table Art
Solution 1:
SELECT *
FROM Art
Resulting Table
Title   Artist_Name   Cost   Gallery_Name
Pool    Victor         300   Chitambo
Peel    John          1000   Nyasha
Sony    Arthur        1500   Harare
Reelm   Tecla          800   Nyasha
Tito    Amon          4500   Mutare
Solution 2:
SELECT *
FROM Art
WHERE cost < 1500 AND Gallery_Name = 'Nyasha'
Resulting Table
Title   Artist_Name   Cost   Gallery_Name
Peel    John          1000   Nyasha
Reelm   Tecla          800   Nyasha
Solution 3:
SELECT Title, Cost
FROM Art
Resulting Table
Title   Cost
Pool     300
Peel    1000
Sony    1500
Reelm    800
Tito    4500
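The three solutions can be tried directly; a quick sketch using Python's built-in SQLite driver:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Art (Title TEXT, Artist_Name TEXT, "
           "Cost INT, Gallery_Name TEXT)")
db.executemany("INSERT INTO Art VALUES (?, ?, ?, ?)", [
    ("Pool", "Victor", 300, "Chitambo"),
    ("Peel", "John", 1000, "Nyasha"),
    ("Sony", "Arthur", 1500, "Harare"),
    ("Reelm", "Tecla", 800, "Nyasha"),
    ("Tito", "Amon", 4500, "Mutare"),
])

# Solution 2: rows where cost < 1500 and the gallery is Nyasha.
rows = db.execute("SELECT * FROM Art "
                  "WHERE Cost < 1500 AND Gallery_Name = 'Nyasha'").fetchall()

# Solution 3: project onto Title and Cost only.
titles = db.execute("SELECT Title, Cost FROM Art").fetchall()
```

Note the string literal 'Nyasha' is quoted; unquoted string comparisons are an error in standard SQL.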
PROJECTIONS
There is a provision in the SQL SELECT to cover the PROJECT operation of relational algebra.
The rows selected from a table can be projected onto a list of their columns by including the column list
instead of the asterisk. The command:-
SELECT Title, Artist_Name, Gallery_Name
FROM Art
WHERE Cost > 1000
is evaluated against the Art table by first retrieving the rows which satisfy the condition (Cost > 1000),
then projecting them onto the 3 columns; the cost values are omitted from the result.
Resulting Table
Title   Artist_Name   Gallery_Name
Sony    Arthur        Harare
Tito    Amon          Mutare
If the SELECT command specifies all the components of the primary key of the table as part of the
column list the resulting rows will also be identified by the key value
In particular, there will be no duplicate rows in the result; however, if the list of columns does not
contain the key or primary key, there may be duplicate rows in the resulting table. An example is shown
below, which is the result of applying the command:
SELECT Gallery_Name
FROM Art
WHERE Cost > 700
Resulting Table
Gallery_Name
Nyasha
Harare
Nyasha
Mutare
A variation of the SELECT command can be used to ensure that duplicate rows are removed from the
result. It uses the DISTINCT key word within the SELECT:
SELECT DISTINCT Gallery_Name
FROM Art
WHERE Cost > 700
The above code will remove all duplicate rows, producing the following table:-
Gallery_Name
Nyasha
Harare
Mutare
* This is so because we are projecting on Gallery_Name only, but using the DISTINCT key word
while still satisfying the condition given.
All of the SELECT commands mentioned previously produce tables as their results, with the rows
appearing in the order in which they are found.
It is possible to specify a particular order for the rows, based on the selected column values, by
including an 'ORDER BY' clause.
For example:
SELECT DISTINCT Gallery_Name
FROM Art
WHERE Cost > 700
ORDER BY Gallery_Name
This will produce the rows in ascending order of gallery name, as shown in the table below:
Gallery_Name
Harare
Mutare
Nyasha
GROUPED DATA
There are additional clauses in the SELECT command which allow it to deal with groups of data rather
than individual rows. The GROUP BY clause combines records with identical values in the specified
field list into a single record.
The final result of the SELECT is formed by projecting values onto the selected columns. For example,
consider the command:-
SELECT Gallery_Name
FROM Art
WHERE cost < 1000
GROUP BY Gallery_Name
It will produce a list of Gallery_Names which hold Art whose cost is < 1000. The GROUP BY clause
causes all the selected rows with the same Gallery_Name to be grouped into a single row.
The projection onto Gallery_Name is then performed, and the resulting table has no duplicate names.
In effect it is equivalent to a SELECT DISTINCT command. An added advantage of grouping data is
that there are standard functions which can be applied to groups, producing one value for the whole
group. They include SUM, AVG, COUNT, MAX and MIN.
Example 1
Write an SQL command to calculate the SUM of ALL COST in table painting.
Solution:
SELECT SUM(cost)
FROM painting
The computer will then sum up all the cost figures in the table painting and display the total only.
Example 2
Write an SQL statement using table painting to display the following output:
Gallery-name Cost
Solution (assuming a total cost per gallery is wanted):
SELECT gallery-name, SUM(cost)
FROM painting
GROUP BY gallery-name
Example 3
Write an SQL statement to find the total galleries in the table painting.
Solution:
SELECT DISTINCT gallery-name
FROM painting
ORDER BY gallery-name
OR
SELECT COUNT(gallery-name)
FROM painting
GROUP BY gallery-name
Example 4
Write an SQL command to find or to list the maximum cost value in the table painting.
Solution:
SELECT MAX(cost)
FROM painting
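Examples 1, 3 and 4 can be checked against a small painting table (a sketch using SQLite; the data are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE painting (title TEXT, artist_name TEXT, "
           "cost INT, gallery_name TEXT)")
db.executemany("INSERT INTO painting VALUES (?, ?, ?, ?)", [
    ("Pool", "Victor", 300, "Chitambo"),
    ("Peel", "John", 1000, "Nyasha"),
    ("Reelm", "Tecla", 800, "Nyasha"),
])

# Example 1: SUM over the whole table yields a single total.
total = db.execute("SELECT SUM(cost) FROM painting").fetchone()[0]

# Example 4: MAX yields the single largest cost value.
highest = db.execute("SELECT MAX(cost) FROM painting").fetchone()[0]

# Example 3: counting the distinct gallery names gives the total galleries.
galleries = db.execute("SELECT COUNT(DISTINCT gallery_name) "
                       "FROM painting").fetchone()[0]
```

Each aggregate function collapses a group of rows (here, the whole table) into one value.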
SUB QUERIES
The WHERE clause can express a complex condition. It can be used in what is called a SUBQUERY,
which makes use of another SELECT statement as part of the condition - a nested SELECT statement.
Suppose we want to find all paintings by a particular artist; the following statement is issued:
SELECT artist-name
FROM artist
WHERE artist-name = 'John'
This produces a table of artist names equal to 'John'. It can be used as part of the WHERE condition in
the SELECT statement which retrieves rows from the table painting:
SELECT *
FROM painting
WHERE artist-name IN (SELECT artist-name
                      FROM artist
                      WHERE artist-name = 'John')
It extracts rows from the table painting where the artist name appears in the sub query. The IN operator
is used to perform this test on the result of the sub query. The IN operator and its
negation/complement NOT IN are not the only operators for use in sub queries:
SELECT *
FROM painting
WHERE artist-name NOT IN (SELECT artist-name
                          FROM artist
                          WHERE artist-name = 'John')
ALL and ANY operators can be used with a relational operator such as >= to test the column value against
the result of the sub query. To select the titles of the most costly paintings we could use the following
command:
SELECT title
FROM painting
WHERE cost >= ALL(SELECT cost
FROM painting)
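These subqueries can be sketched as follows (SQLite; note SQLite has no ALL operator, so the last query uses MAX to the same effect, and the hyphenated column names are written with underscores):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE artist   (artist_name TEXT);
CREATE TABLE painting (title TEXT, artist_name TEXT, cost INT);
INSERT INTO artist   VALUES ('John'), ('Tecla');
INSERT INTO painting VALUES ('Peel', 'John', 1000), ('Reelm', 'Tecla', 800);
""")

# IN subquery: paintings whose artist appears in the subquery's result.
by_john = db.execute(
    "SELECT title FROM painting WHERE artist_name IN "
    "(SELECT artist_name FROM artist WHERE artist_name = 'John')").fetchall()

# Most costly painting: cost >= ALL(SELECT cost ...) rewritten using MAX.
dearest = db.execute(
    "SELECT title FROM painting "
    "WHERE cost >= (SELECT MAX(cost) FROM painting)").fetchall()
```

Both queries nest one SELECT inside the WHERE condition of another, which is exactly the subquery mechanism described above.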
CONSTRUCTING USER ACCESS
When a central database is used for a number of different users who have different requirements, it is
essential to be able to tailor the data to the different needs. In this case, there are two SQL features which
provide these facilities:
VIEWS
A view is a virtual table (it does not physically exist) obtained from the real tables by a SELECT
statement. Its main use is to tailor the data of a table to the needs of particular users, so that it omits
details that are of no interest to them, or that they should not see.
In the example of the table painting, it may be desired to let most users see all the data except for the
cost.
Such a view (called details here) could be created with:
CREATE VIEW details AS
SELECT title, artist-name, gallery-name
FROM painting
The statement:
SELECT *
FROM details
WHERE gallery-name = 'Chipangali'
uses the view as a table. It retrieves the data relating to the paintings in the Chipangali gallery, but does
not include the cost, since the virtual table is formed by ignoring the cost column, which is not part of
the view. Views can be created for any SELECT statement, not just those which limit the columns of a
table.
A virtual table of all paintings held at the gallery Chipangali would be created by the command:
This would contain all of the 4 columns of the table painting, but only those rows relating to the gallery
Chipangali.
Once a view has been created its definition as a SELECT statement will exist until a DROP VIEW
command is performed.
While it exists, it can be treated as a table although it is only a virtual table.
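A quick sketch of the view mechanism (SQLite; the view name details follows the example above, and the data are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE painting (title TEXT, artist_name TEXT, cost INT,
                       gallery_name TEXT);
INSERT INTO painting VALUES ('Pool', 'Victor', 300, 'Chipangali'),
                            ('Peel', 'John', 1000, 'Nyasha');

-- Virtual table tailored to omit the cost column.
CREATE VIEW details AS
SELECT title, artist_name, gallery_name FROM painting;
""")

rows = db.execute(
    "SELECT * FROM details WHERE gallery_name = 'Chipangali'").fetchall()
# The rows returned through the view contain no cost column at all.

db.execute("DROP VIEW details")   # the definition exists until it is dropped
```

Queries against the view behave exactly like queries against a table, but only the columns named in its defining SELECT are visible.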
GRANTING PRIVILEGES
Users of a database are identified by a user name. Individual users can be granted privileges which give
them certain permissions to use the SQL commands on the database.
Permissions may also be granted to all users by using the key word PUBLIC instead of a user name.
The GRANT CONNECT command is available to define passwords for a list of users. It has the form:
GRANT CONNECT TO <user list>
IDENTIFIED BY <passwords>
It can be used to set up the password(s) for the new users or to alter the existing user passwords. Some
implementations do not use this facility, but rely on the operating system to deal with passwords for users.
Specific privileges to permit the use of SQL statements on a table or view are allocated by further
GRANT commands. They have the following form:
GRANT <privilege list> ON <table> TO <user list>
where table is the name of the table or view, user list is either a list of names or the key word PUBLIC,
and privilege list is a list of key words for the privileges, such as:
SELECT
INSERT
DELETE
UPDATE may have a list of columns, stating those which are allowed to be updated. The default is to
allow all columns to be updated.
For example, a GRANT command naming two users with the SELECT privilege and the UPDATE
privilege on the columns cost and gallery-name would let those two users use the SELECT command on
the table painting and UPDATE the columns cost and gallery-name only.
Since privileges can be granted selectively, a considerable degree of control of user access to data is
available.
Class exercise:
Student
Stud-ID Student-Name Town Course-Level Fee
HND1002 Chipo Harare HND 7500
ND2001 Edmore Mutare ND1 6500
ND200100 Takura Harare ND2 3000
ND2003 Simba Kwekwe ND1 6500
ND2008 Esther Bulawayo ND1 6500
HND1004 Rachel Mutare HND 7500
NC3001 James Gweru NC 3500
NC3007 Oscar Kwekwe NC 3500
ND2009 Linda Bulawayo ND1 6500
Question:
Given the following ERD design a detailed database using SQL necessary for the illustration.
[ERD: STUDENT -ATTENDS- COURSE; TEACHER -TEACHES- COURSE. Attributes shown
include dob and name (STUDENT), crs-id and #-of-stud (COURSE), and qualification
(TEACHER)]
1. Adding Data: SQL provides an INSERT command to add a single record to a table, for example:
INSERT INTO student VALUES
('HND1006', 'James Made', 'Mutare', 'HND', 7500)
This will add a row to the student table with all column values defined. The indexes associated
with the table are updated automatically, such that re-entering the same record will be rejected.
2. Deleting Data: The DELETE command is used to remove rows or records from a table. In its
simplest form it will remove all rows, as in the command:
DELETE FROM <table-name>
which removes all rows, while
DELETE FROM <table-name>
WHERE <condition>
removes only the rows meeting the set condition.
The WHERE clause is used in the DELETE command and in other commands. The conditions
can be quite complex, enabling the commands to be very selectively applied.
They allow:
(a) AND, OR and NOT to be used as logical connections
(b) Numerical and character data to be compared for either equality or inequality such as:
>, <, =, >=, <=
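The INSERT and DELETE commands with a WHERE condition can be sketched as follows (SQLite; the rows are drawn from the Student table above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE student (stud_id TEXT, student_name TEXT, "
           "town TEXT, course_level TEXT, fee INT)")

# Adding data: each INSERT adds a single fully-specified record.
db.execute("INSERT INTO student VALUES "
           "('HND1006', 'James Made', 'Mutare', 'HND', 7500)")
db.execute("INSERT INTO student VALUES "
           "('NC3001', 'James', 'Gweru', 'NC', 3500)")

# Selective DELETE: NOT and a comparison combine into a WHERE condition,
# so only the rows meeting the condition are removed.
db.execute("DELETE FROM student WHERE NOT (fee >= 7500)")

remaining = db.execute("SELECT stud_id FROM student").fetchall()
```

Only the HND record survives the DELETE, since the condition removed every row whose fee was below 7500.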
SYNONYM USAGE
(List the part numbers supplied by more than one supplier.)
SELECT UNIQUE p#
FROM sp spx
WHERE p# IN
     (SELECT p#
      FROM sp
      WHERE s# <> spx.s#)
OR
SELECT p#
FROM sp
GROUP BY p#
HAVING COUNT (s#) > 1
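Both forms return the same parts; a sketch with SQLite (s# and p# renamed s_no and p_no, since # is not legal in SQLite identifiers, and UNIQUE written as DISTINCT):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sp (s_no TEXT, p_no TEXT)")
db.executemany("INSERT INTO sp VALUES (?, ?)",
               [("S1", "P1"), ("S1", "P2"), ("S2", "P1"), ("S3", "P3")])

# Correlated-subquery form: parts also supplied by some *other* supplier.
q1 = db.execute(
    "SELECT DISTINCT p_no FROM sp AS spx WHERE p_no IN "
    "(SELECT p_no FROM sp WHERE s_no <> spx.s_no)").fetchall()

# GROUP BY / HAVING form: parts with more than one supplier row.
q2 = db.execute(
    "SELECT p_no FROM sp GROUP BY p_no HAVING COUNT(s_no) > 1").fetchall()
```

The alias spx acts as the synonym: it lets the inner SELECT compare each row of sp against a second, independently-scanned copy of the same table.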
DICTIONARY
A collection of relations (the catalog) describing the tables and columns of the database.
It is useful to a user who does not know all the fields of some tables, but only an attribute.
CREATE SYNONYM
Specifies an alternative name for a table/view; often used to define an abbreviation or to avoid
prefacing the table name with the name of its owner.
DROP SYNONYM
Destroys a synonym declaration.
COMMENT STATEMENT
Provides an explanatory remark for table columns (stored as part of the internal definition tables).
The catalog is kept up to date through commands such as DELETE, CREATE TABLE, ALTER and
INSERT.
First version
Database development
Once the design is in place, one can build the database by executing SQL commands. The phases are:
Strategy and Analysis
Study and analyse the business requirements. Interview users and managers to identify the
information requirements. Incorporate the enterprise and application mission statements as well
as any future system specifications.
Build models of the system. Transfer the business narrative developed in the strategy and
analysis phase into a graphical representation of business information needs and rules. Confirm
and define the model with the analysts and experts.
Design
Design the database. The entity relationship model maps entities to tables, attributes to columns,
relationships to foreign keys, and business rules to constraints.
Build and Documentation
Build the prototype system. Write and execute the commands to create the tables and supporting
objects for the database.
Develop user documentation, help-screen text, and operations manuals to support the use and
operation of the system.
Transition
Refine the prototype. Move an application into production with user acceptance testing,
conversion of existing data, and parallel operations. Make any modifications required.
Production
Roll out the system to the users. Operate the production system. Monitor its performance, and
enhance and refine the system.
Second version
Data Planning
Requirements Specifications
Conceptual Design
Logical Design
Physical Design
Data Planning:
It states all the long-term strategic procedures required to develop a proper database system.
The analyst develops a model of the business processes, documenting all the processes involved,
which will be used as input to the second stage
Requirements Specification
Defines and represents the users' requirements of a business process, using everyday language or
other methodologies such as DFDs
Conceptual Design
Models the information requirements of the business independently of any particular DBMS,
typically as an entity relationship diagram
Logical Design
Translates the conceptual design of a business process by representing the data using a database
model, that is, the Network database model, Hierarchical database model or Relational database
model
Physical Design
Maps the logical design onto physical storage, deciding stored record formats, access methods
and factors such as record blocking
DISCUSSION
At what stage would one deal with the following things and why?
ERDs
Normalisation
STAGE / MAJOR FUNCTIONS
Planning
1. Develop entity charts
2. Analyse costs and benefits
3. Develop implementation plan
4. Evaluate and select software and hardware
5. Establish application priorities
6. Develop data standards (naming conventions and definitions, eg Customer:
   Prospective, Prior, No Longer)
Requirements Formulation & Analysis
1. Define user requirements
2. Develop data definitions
3. Develop data dictionary
Design
1. Design conceptual model
2. Design external models (modelling the organisation's data; the DBA interacts with
   users and other system specialists in data processing)
3. Design internal models (schemas)
4. Design integrity controls
Implementation
1. Specify database access policies (rights)
2. Develop standards for application programming (for consistency and correctness,
   and to increase programmers' productivity)
3. Establish security techniques (passwords, access tables, encryption)
4. Load databases (special programs to load from different files)
5. Specify test procedures
6. Establish procedures for backup and recovery
7. Conduct user training
Operation & Maintenance
1. Monitor database performance
2. Tune and reorganise databases
3. Enforce standards
4. Support users
Growth & Change
1. Implement change control procedures
2. Plan growth and change
   Change in size: storage space utilisation; the DBA allocates additional space or
   reallocates existing space
   Change in content/structure: new application requests; alter the logical and
   physical database structure
   Change in usage pattern: performance monitoring; assigning frequently accessed
   records to faster devices; additional higher performance hardware devices
[Figure: database development stages - Planning, Design, Implementation]
Planning:
Its purpose is to develop a strategic plan for database development that supports the overall
organisation's business plan
Design Stage
Its purpose is to develop a database architecture that will meet the information needs of the
organisation now and in the future. There are 3 stages in database design, that is, Conceptual,
Implementation & Physical design.
a) Conceptual Design: Its purpose is to synthesise the various user views and information
requirements into a global database design. The design is called the Conceptual Schema/Data
Model and may be expressed in one of several forms, that is, an entity relationship diagram,
a semantic data model, or normalised relations. The Conceptual Data Model describes entities,
attributes and relationships.
b) Implementation Design: Its purpose is to map the Conceptual Data Model into a logical
schema that can be processed by a particular DBMS. The conceptual data model is mapped
into hierarchical, network or relational data model.
c) Physical Design: Last stage of Database design concerned with designing stored record
formats, selecting access methods and deciding on physical factors such as record blocking.
Also concerned with database security, integrity and backup and recovery.
Implementation Stage:
Once database design is completed, the implementation process begins. The first step is the creation
or initial load of the database. Database administration manages the loading process and resolves
any inconsistencies that arise during it.
1. Planning:
Develop entity charts
Analyse costs and benefits
Develop implementation plan
Evaluate and select software or hardware
Establish application priorities
Develop data standards
2. Requirements Formulation & Analysis:
Define user requirements
Develop data definitions
Develop data dictionary
3. Database Design:
Design conceptual model
Design external models
Design internal models
Design integrity controls
4. Database Implementation:
Specify database access policies
Develop standards for application programming
Establish security techniques
Load database
Specify test procedures
Establish procedures for backup & recovery
Conduct user training
5. Operations & Maintenance:
Monitor database performance
Tune and reorganise database
Enforce standards
Support users
6. Growth & Change
Implement change control procedures
Plan growth & change
DATABASE IMPLEMENTATION
DBMS Functions:
Data storage, retrieval & update: since a database may be shared by many users, the DBMS must provide
multiple user views and allow users to store, retrieve and update their data easily and efficiently.
Data Dictionary/Directory
The DBMS must maintain a user accessible data dictionary
Recovery Services:
The DBMS must be able to restore the database, or return it to a known consistent condition, in the event of some system
failure. Sources of system failure include:
Operator error
Disk head crashes
Program error
Security mechanisms:
Data must be protected against accidental or intentional misuse or destruction. The DBMS must provide
mechanisms for controlling access to data and defining what actions (read only, update) may be taken by
each user.
NORMALISATION
Normalisation is the analysis of functional dependencies between attributes (data items). The purpose of
normalisation is to reduce complex user views to a set of small, stable data structures. Normalised data
structures are more flexible, stable and easier to maintain than unnormalised structures.
Steps in Normalization:
USER VIEWS
    |
    v
UNNORMALISED RELATIONS
    |  remove repeating groups
    v
1NF RELATIONS
    |  remove partial dependencies
    v
2NF RELATIONS
    |  remove transitive dependencies
    v
3NF RELATIONS
    |  remove remaining anomalies caused by overlapping candidate keys
    v
BCNF RELATIONS
    |  remove multivalued dependencies
    v
4NF RELATIONS
    |  remove join dependencies
    v
5NF RELATIONS
Unnormalised Relation:
It is a relation that contains one or more repeating groups for example GRADE-REPORT:
GRADE-REPORT
Stud# Studname Major Course# Crs-title Lec-name L-office Grade
38214 Takura IS IS350 Dbase Chamanga 6 A
IS465 SAD Makura 10 C
69173 Esther PM IS465 SAD Makura 10 A
PM300 Proj-Mgt Makura 10 B
QM400 OR Kachepa 11 C
Stud# -> Studname, Major (1:1)
Stud# -> Course#, Crs-title, Lec-name, L-office, Grade (1:M)
There are multiple values at the intersection of certain rows and columns. Since each student takes more
than one course, the course data in the above relation constitute a repeating group within the student data.
In an unnormalised relation, a single attribute cannot serve as a candidate or primary key. Suppose we
take student number as the primary key: there is a one-to-one relationship from student number to student
name and major. However, the relationship is one-to-many from student number to course and the remaining
attributes. Student number is therefore not a primary key, since it does not uniquely identify all the
attributes in this relation.
Normalised Relations:
A normalised relation is one that contains only single values at the intersection of each row and
column; it contains no repeating groups. To normalise a relation that contains a single repeating
group, we remove the repeating group and form 2 relations. The 2 new relations formed from the
above example are Student (S) and Student-Course (SC). The Student relation is already in 3NF,
whereas the Student-Course relation is in 1NF.
Update anomaly: when one wants to change SAD to ASAD in Crs-title, there is a need to search the entire
relation; failure to update every occurrence results in inconsistent data.
1NF
A relation with a single repeating group is split into 2 relations by removing the repeating group.
S(student)
Stud# Studname Major
38214 Takura IS
69173 Esther PM
SC(student-course)
Stud# Course# Crs-title Lec-name L-office Grade
38214 IS350 Dbase Chamanga 6 A
38214 IS465 SAD Makura 10 C
69173 IS465 SAD Makura 10 A
69173 PM300 Proj-Mgt Makura 10 B
69173 QM400 OR Kachepa 11 C
1NF with primary key (Stud#, Course#); the remaining attributes come from the repeating group.
The primary key uniquely identifies a student's grade.
Student-Course still has data redundancy, which results in update anomalies when INSERTING, DELETING
and UPDATING data.
INSERT:
It is impossible to insert a new course if no student is taking it, because that would result in a null
value for Stud#, which is not allowed.
DELETE:
Deleting a student's tuple results in losing the course title and lecturer details.
Leaving the course details would result in a NULL value for Stud#, which is part of the key and is not allowed.
UPDATE:
To update course title since it appears a number of times for example, SAD there is need to search through
every tuple. There is inefficiency and might result in data inconsistencies in the case of failure to update
all occurrences.
The above problems are a result of nonkey attributes which depend on only part of the key, that is,
on Course#:
Course# -> Crs-title, Lec-name, L-office
Grade is fully dependent on (Stud#, Course#), whereas Crs-title, Lec-name and L-office depend only
partially on the primary key (Stud#, Course#), as shown below.
(Stud#, Course#) -> Grade (full dependency)
Course# -> Crs-title, Lec-name, L-office (partial dependency)
2NF
Removing the attributes which are only partially dependent on the primary key creates 2 relations:
1. One with attributes fully dependent on the primary key
2. One with attributes dependent on only part of the primary key
R(Registration)
Stud# Course# Grade
38214 IS350 A
38214 IS465 C
69173 IS465 A
69173 PM300 B
69173 QM400 C
CL(Course-Lecturer)
Course# Crs-Title Lec-Name L-Office
IS350 DBase Chamanga 6
IS465 SAD Makunga 8
PM300 Project Mgt Makunga 8
QM400 OR Kachepa 11
In 2NF the course title appears only once, in the Course-Lecturer relation, which solves the update
anomaly. Course data can be inserted and deleted without reference to student data.
Course# -> Crs-Title, Lec-Name, L-Office
Lec-Name -> L-Office
This illustrates that there is a unique office for each lecturer. It is a transitive dependency:
one nonkey attribute is dependent on one or more other nonkey attributes.
INSERT:
It is impossible to insert a new lecturer, since lecturer data depend on Course# and the new lecturer
has not yet been assigned to teach at least one course. It is not possible, for example, to insert
Ms Mvududu until one or more courses have been assigned to her.
DELETE:
Deleting course data results in lecturer data being lost; for example, deleting course# IS350 results
in the loss of Chamanga's data.
UPDATE:
Lecturer data occur many times, therefore changing the lecturer office for Makunga requires searching
every tuple; failure to do so will result in data inconsistency, for example one tuple reading Rm 8
and another Rm 12.
3NF
Removing the attributes that participate in the transitive dependency, that is, Lec-Name and L-Office,
results in the following relations:
C(Course)
Course# Crs-Title Lec-Name
IS350 DBase Chamanga
IS465 SAD Makunga
PM300 Project-Mgt Makunga
QM400 OR Kachepa
Primary Key (Course#) and Foreign Key (Lec-Name)
L(Lecturer)
Lec-Name L-Office
Chamanga 6
Kachepa 11
Makunga 8
Primary Key (Lec-Name)
The assumption is that an office can have more than one occupant; therefore Lec-Name becomes the primary
key and associates the 2 relations, Course and Lecturer.
In 3NF, insertion and deletion can be done without referencing other entities. Updates are also
possible because they are confined to a single tuple within a relation.
C(Course)
Course# Crs-Title Lec-Name
IS350 DBase Chamanga
IS465 SAD Makunga
PM300 Project-Mgt Makunga
QM400 OR Kachepa
L(Lecturer)
Lec-Name L-Office
Chamanga 6
Kachepa 11
Makunga 8
R(Registration)
Stud# Course# Grade
38214 IS350 A
38214 IS465 C
69173 IS465 A
69173 PM300 B
69173 QM400 C
S(student)
Stud# Studname Major
38214 Takura IS
69173 Esther PM
Relations in 3NF are sufficient for most practical database design problems. However, when a relation
has more than one candidate key, problems may arise even in 3NF; hence the further normal forms,
for example BCNF, 4NF, 5NF and DKNF.
SMA(student-Major-Advisor)
Stud# Major Advisor
123 Physics Edwin
123 Music Chioniso
456 Biology Machuma
789 Physics Tawanda
999 Physics Edwin
(Student#, Major) -> Advisor
Advisor -> Major
There are still anomalies in the relation above. Suppose that student# 456 changes her major from
Biology to Maths: when that student's tuple is updated, we lose the fact that Machuma advises Biology
(update anomaly).
Suppose we want to insert a tuple with the information that Gamu advises in Computers. This cannot be
done until at least one student majoring in Computers is assigned Gamu as an advisor (insertion anomaly).
In the above relation there are 2 candidate keys: (Student#, Major) and (Student#, Advisor). The type of
anomalies that exist in this relation can occur when there are 2 or more overlapping candidate keys.
BCNF definition
A relation is in BCNF if and only if every determinant is a candidate key.
A determinant is any attribute, simple or composite, on which some other attribute is fully functionally
dependent. For example, in the above relation the attribute Advisor is a determinant, since Major is fully
functionally dependent on Advisor.
To put the above relation into BCNF, we make Advisor a candidate key and project the original 3NF
relation into 2 relations that are in BCNF:
SA(Student-Advisor): Student#, Advisor
AM(Advisor-Major): Advisor, Major
Even when a relation is in BCNF it may still contain unwanted redundancy that may result in update
anomalies, for example, consider the following unnormalised relation
O(Offering)
Course    Instructor           Textbook
Mgt       White, Black, Green  Drucker, Peters
Finance   Gray                 Weston, Gilford
Assumptions:
1. Each course has one or more instructors
2. For each course, all of the textbooks indicated are used.
O(Offering)
Course Instructor Textbook
Management White Drucker
Management Green Drucker
Management Black Drucker
Management White Peters
Management Green Peters
Management Black Peters
Finance Gray Weston
Finance Gray Gilford
Normalised Relation
In the normalised relation Offering, for each course all possible combinations of instructor and
textbook appear. The primary key of this relation consists of all 3 attributes (BCNF). The relation
still contains redundant data, and this can lead to update anomalies: suppose you want to add a third
textbook to the Management course; this would require the addition of 3 new rows, one for each
instructor. From the relation you can see that for each course there is a well defined set of
instructors (a one-to-many relationship) and a well defined set of textbooks (a one-to-many
relationship). However, the instructors and textbooks are independent of each other. This relationship
is called a multivalued dependency.
Multivalued Dependency
A multivalued dependency exists when there are 3 attributes, for example A, B and C, such that for each
value of A there is a well defined set of values of B and a well defined set of values of C; however,
the set of values of B is independent of the set of values of C, and vice versa.
To remove the multivalued dependency from a relation, we project the relation into 2 relations each of
which contains one of the 2 independent attributes.
4NF
A relation is in 4NF if it is in BCNF and contains no multivalued dependencies.
L(Lecturer)
Course    Instructor
Mgt       White
Mgt       Black
Mgt       Green
Finance   Gray

T(Text)
Course    Textbook
Mgt       Drucker
Mgt       Peters
Finance   Weston
Finance   Gilford
5NF
This normal form is designed to cope with join dependency. A relation that has a join dependency
cannot be decomposed by projection into other relations.
5NF: a relation is said to be in 5NF if it is in 4NF and all join dependencies are removed.
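The projection that removes the multivalued dependency, and the join that shows no information is lost, can be sketched as follows (a Python illustration using the Offering data above; not part of any DBMS):

```python
# Sketch: removing the multivalued dependency in O(Offering) by projecting
# onto two relations, then natural-joining them back to show the
# decomposition loses no information.

offering = {
    ("Mgt", "White", "Drucker"), ("Mgt", "White", "Peters"),
    ("Mgt", "Black", "Drucker"), ("Mgt", "Black", "Peters"),
    ("Mgt", "Green", "Drucker"), ("Mgt", "Green", "Peters"),
    ("Finance", "Gray", "Weston"), ("Finance", "Gray", "Gilford"),
}

course_instructor = {(c, i) for c, i, _ in offering}  # first 4NF projection
course_textbook = {(c, t) for c, _, t in offering}    # second 4NF projection

# Natural join on Course rebuilds the original relation exactly.
rejoined = {(c, i, t)
            for c, i in course_instructor
            for c2, t in course_textbook if c2 == c}

print(rejoined == offering)
```

Adding a third Management textbook is now a single insertion into course_textbook, instead of one new row per instructor.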
Limitations of Normalisation
Users may have to join several tables for retrieval, which requires additional computer time
Referential integrity is more difficult to enforce when a table is decomposed via normalisation
Objectives of Normalisation:
Reduce redundancy
Produce a stable data structure.
SECURITY
Security refers to the protection of data against unauthorised access, alterations or destruction
INTEGRITY
Refers to the accuracy or validity of data
In other words, security involves ensuring that users are allowed to do the things they are trying to do,
while integrity involves ensuring that the things they are trying to do are correct.
In both cases the system needs to be aware of certain rules that users must not violate. These rules must
be specified (typically by the DBA) using a suitable language, and must be maintained in the system
catalog or dictionary; in both cases the DBA or DBMS must monitor user operations to ensure that the
rules are enforced.
There are numerous aspects to the security problem, among them are the following:
1. Legal, social and ethical aspects: for example, does the person making a request, say for customer
   credit information, have a legal right to the requested information?
2. Physical controls: is the computer or terminal room locked or otherwise guarded?
3. Policy questions: how does the enterprise owning the system decide who should be allowed access
   to what?
4. Operational problems: if a password scheme is used, how are the passwords kept secret and how
   often are they changed?
5. Hardware controls: does the processing unit provide any security features, such as storage protection
   keys or a privileged operation mode?
6. Operating system security: does the operating system erase the contents of storage and data files
   when they are finished with?
Modern DBMSs typically support either or both of two approaches to data security: discretionary
control and mandatory control.
Mandatory Control:
Each data element is tagged or labelled with a certain classification level, and each user is given a
certain clearance level.
A given data object can be accessed only by users with the appropriate clearance level. This is enforced by
the DBA
Regardless of whether we are dealing with a discretionary or a mandatory scheme, all decisions as to
which users may perform which operations on which objects are policy decisions, not technical ones.
All the DBMS can do is enforce those decisions once they are made.
It follows that the results of those policy decisions:
must be made known to the system (by means of statements in some appropriate definition language),
and
must be remembered by the system (by saving them in the catalog, in the form of security rules,
also known as authorisation rules)
There must be a means of checking a given access request against the applicable security rules (by an
access request here we mean, in general, the combination of requested operation plus requested object
plus requesting user).
This checking is done by the DBMS security subsystem, also known as the authorisation subsystem.
In order to be able to decide which security rules are applicable to a given access request, the
subsystem must be able to recognise the source of that request, that is, it must be able to recognise
the requesting user. For that reason, when users sign on to the system they are typically required to
supply not only their user ID (to say who they are) but also a password (to prove they are who they
say they are). The password is supposed to be known only to the system and to the legitimate users of
the user ID concerned.
Regarding this last point, incidentally, note that any number of distinct users might share the same
group user ID. In this way the system can support user groups, and can thus provide a way of allowing
everyone in, for instance, the accounting department to share the same privileges.
The operations of adding individual users to, or removing individual users from, a given group can then
be performed independently of the operation of specifying the privileges that apply to that group.
Note, however, that the obvious place to keep a record of which users belong to which groups is, again, the catalog.
To repeat from the previous section, most DBMSs support either discretionary control or mandatory
control, or both. In fact, it would be more accurate to say that most systems support discretionary
control, and some systems support mandatory control as well. Discretionary control is thus more likely
to be encountered in practice.
As already noted, there needs to be a language that supports the definition of security rules. We
therefore begin by describing a hypothetical example of such a language, as follows:
The above example is meant to illustrate the point that security rules have 5 components, as follows:
1. A name (pr3, for "painting rule 3"): in the example the rule will be registered in the system catalog
   under the name pr3. The name will probably also appear in messages or diagnostics produced by the
   system in response to an attempted violation of the rule.
2. One or more privileges (SELECT and UPDATE in the example), specified by means of the GRANT clause.
3. The scope to which the rule applies, specified by means of the ON clause. In the example the scope
   is painting tuples or records where the gallery name is not Chitombo.
4. One or more users (more accurately, user IDs) who are to be granted the specified privileges over
   the specified scope, specified by means of the TO clause.
5. A violation response, specified by the ON ATTEMPT VIOLATION clause, telling the system what to do
   if a user attempts to violate the rule. In the example the violation response is simply to REJECT
   the attempt and provide suitable diagnostics. Such a response will surely be the one most often
   required in practice, so it is taken to be the default response.
For example:
DESTROY SECURITY pr3
For simplicity we assume that destroying a given named relation will automatically destroy any security
rules that apply to that relation.
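One way the catalog might represent such a rule, and how the security subsystem could check an access request against it, can be sketched as follows (a Python illustration; the user IDs and tuple values are hypothetical, modelled on the pr3 rule described above):

```python
# Sketch: a catalog entry holding the five components of the hypothetical
# rule pr3, plus the security subsystem's check of an access request
# (requested operation + requested object + requesting user).
# The user IDs "Jack"/"Jill" and the tuple values are hypothetical.

painting_rule = {
    "name": "pr3",                                   # 1. name (catalog key)
    "privileges": {"SELECT", "UPDATE"},              # 2. GRANT clause
    "scope": lambda t: t["gallery"] != "Chitombo",   # 3. ON clause predicate
    "users": {"Jack", "Jill"},                       # 4. TO clause
    "on_violation": "REJECT",                        # 5. default response
}

def check_access(rule, user, operation, record):
    """True if the request passes the rule; otherwise the violation response applies."""
    return (user in rule["users"]
            and operation in rule["privileges"]
            and rule["scope"](record))

t1 = {"gallery": "National", "title": "Sunrise"}
t2 = {"gallery": "Chitombo", "title": "Harvest"}
print(check_access(painting_rule, "Jack", "SELECT", t1))  # within scope
print(check_access(painting_rule, "Jack", "SELECT", t2))  # outside scope
```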
AUDIT TRAILS
An audit trail is a special file or database in which the system automatically keeps track of all
operations performed by users on the regular database. A typical entry in the audit trail might contain
the following information:
RECOVERY
Recovery is the process of rebuilding a system back to its original status after a system, media or
transaction failure.
SYSTEM FAILURE
Shutdowns caused by hardware faults or by bugs in the operating system or other system software will be
referred to as a system crash. When the system crashes, all transactions currently executing terminate.
The contents of internal memory (which include the I/O buffers) are assumed lost. However, we assume
that external memory, including the disks on which the database resides, is not affected by the failure.
CONCURRENCY
Data Sharing
There are several problems which can result from shared access to the database, one of which is the
lost update. If 2 users are allowed to hold the same tuple concurrently, the first of the 2 subsequent
update operations will be nullified by the second, since the effect of the second will be to overwrite
the result of the first.
Solution
1. Grant the user issuing the first hold an exclusive lock on the data held
2. No other user will be allowed to access the data while it is locked to the first user
3. The user issuing the second hold will have to wait until the first user releases the lock
4. The second user will in turn be granted an exclusive lock on the data
5. The effect of the second hold will be to retrieve the data as updated by the first user.
However, the exclusive locking technique leads in turn to other problems, that is, deadlock and
starvation (discussed previously).
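The five steps above can be sketched with an ordinary mutex standing in for the DBMS's exclusive lock (a Python illustration; the amounts are arbitrary):

```python
import threading

# Sketch: the five steps above, with a mutex standing in for the DBMS's
# exclusive lock on the shared tuple. Whichever user acquires the lock
# second waits, then sees the data as updated by the first.

balance = 100
tuple_lock = threading.Lock()  # exclusive lock on the account tuple

def update(delta):
    global balance
    with tuple_lock:               # steps 1-4: hold the lock; the other user waits
        current = balance          # read while holding the lock
        balance = current + delta  # step 5: this write sees the prior update

user1 = threading.Thread(target=update, args=(-50,))  # a withdrawal
user2 = threading.Thread(target=update, args=(25,))   # a deposit
user1.start(); user2.start()
user1.join(); user2.join()
print(balance)  # 75 in either execution order: neither update is lost
```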
DATA SECURITY
The protection of data in the database against unauthorised disclosure, alteration or destruction.
Authorisation Mechanisms
a) Identification
b) Authentication
Identification: users have to identify themselves to the system before accessing the database, by
supplying an operator number/username or by using machine-readable cards.
Authentication: the process of proving their identity, by providing passwords or PIN numbers, or by
answering questions from the system.
Access Control
For each user the system will maintain a user profile, generated from the user definition supplied by the
DBA.
The details of the appropriate identification and authentication procedures, and the operations a
particular user is allowed to perform, are given in the access controls. The DBMS will go through a
series of tests to determine whether to grant or deny access to the user. The tests may be arranged
in a sequence of increasing complexity, so that the program may reach its final decision as quickly
as possible.
DATABASE INTEGRITY
Ensuring that the data is accurate at all times.
Constraints
Each relation in the database will have a set of integrity constraints associated with it.
These constraints will be held in the data dictionary as part of the conceptual schema
They specify, for example, that values of a particular attribute in some relation are to be within
certain bounds, or that within each tuple of some relation the value of one attribute may not exceed
that of another.
1. Primary Key: possesses the property of uniqueness. No 2 tuples in the relation may have the same
   value for this attribute or attribute combination.
2. No component of a primary key value may be null.
Enforcement
The DBMS must reject any attempt to generate a tuple whose key value is null or is a duplicate of one
that already exists.
Bounds Entry
Values occurring in a particular attribute may be required to lie within certain bounds (eg values of
employee age: 15<age<60)
The constraints are specified in a bounds entry. The lower and upper limits have to be defined.
Values Entry
There may be a very small set of permitted values for some particular attribute or attribute
combination, eg the permitted values for primary colour are red, blue and green. In this case the
permitted values can simply be listed in a values entry for the relevant attribute or attribute combination.
NB It might be desirable to list values or ranges of values that are not permissible for the attributes
concerned.
Format Entry
Values of a particular attribute may have to conform to a particular format. Eg the first character of a
supplier number must be the letter S.
The constraint is specified in a format entry for the relevant attribute.
Average Function
The set of values of a particular attribute in a relation may have to satisfy some statistical
constraint, eg no employee may earn a salary that is more than twice the average salary for the department.
The predicate defining this constraint will invoke the library function AVERAGE.
To enforce it the DBMS will have to monitor all storage operations against the employee relation.
NB
All the examples given above are of static constraints, that is, they specify conditions that must
hold for every given state of the database.
Another important type of constraint involves the transition from one state to another, eg when an
employee's salary is updated, the new value must be greater than the old value.
To specify such constraints it is necessary to refer to both the old and new values; the keywords OLD
and NEW are reserved for this purpose.
A special case of a transition is that from non-existence to existence (ie the addition of a new tuple)
or from existence to non-existence (ie the deletion of an existing tuple).
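The constraint types above can be sketched as simple predicates (a Python illustration; the limits and attribute names follow the examples in the text, and the function names are illustrative):

```python
# Sketch: the constraint types above expressed as simple predicates.

def bounds_ok(age):
    return 15 < age < 60                       # bounds entry: 15 < age < 60

def value_ok(colour):
    return colour in {"red", "blue", "green"}  # values entry (permitted list)

def format_ok(supplier_no):
    return supplier_no.startswith("S")         # format entry: first char is S

def transition_ok(old_salary, new_salary):
    return new_salary > old_salary             # transition constraint: NEW > OLD

print(bounds_ok(30), value_ok("red"), format_ok("S123"), transition_ok(500, 650))
```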
RECOVERY ROUTINES
Recovery routines are used to restore the database, or some portion of the database, to an earlier
state after a system failure (hardware or software) has caused the contents of the database buffers
in main storage to be lost.
They take as input a backup copy of the database (produced by the dump routines) together with the
system journal (which contains details of operations that have occurred since the dump was taken),
and produce as output a new copy of the database as it was before the failure occurred.
NB: any transactions that were in progress at the time of the failure will probably have to be restarted.
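The recovery routine described above (last dump plus journal replay) can be sketched as follows (a Python illustration; the journal format, recording only after-values, is a simplifying assumption):

```python
# Sketch: recovery starting from the last dump (backup copy) and
# re-applying the system journal of changes made since it was taken.
# A journal entry here records an item and its after-value.

def recover(backup, journal):
    """Rebuild the database from a dump plus the journal, in order."""
    db = dict(backup)                 # start from the backup copy
    for item, after_value in journal:
        db[item] = after_value        # re-apply each logged change
    return db

backup = {"acct_A": 100, "acct_B": 40}       # state at the last dump
journal = [("acct_A", 50), ("acct_B", 90)]   # changes made since the dump
print(recover(backup, journal))
```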
BACKUP ROUTINE
Dump routines
These are used to take backup copies of selected portions of the database, usually on tape.
It is normal practice to dump the database regularly, say once a week.
If the database is very large it may be more practical to dump one seventh of it every day.
Each time a dump is taken, a new system journal may be started and the previous one erased or
archived.
Backup is normally initiated automatically by the DBMS before the database changes are committed.
Checkpoint/Restart Routines
Backing up and rerunning a long transaction in its entirety can be a time-consuming process.
Some systems permit transactions to take checkpoints at suitable points in their execution.
The checkpoint routine will cause all changes made since the last checkpoint to be committed.
The checkpoint facility allows a long transaction to be divided up into a sequence of short ones
The checkpoint routine may also record values of specified program variables in a checkpoint entry in
the system journal
In the case of an operation involving a change to the database, the journal entry records the type of
change and the address of the data changed, together with its before and after values.
Encryption/scrambling
Used to protect the database against an infiltrator who attempts to bypass the security system.
An example of bypassing the system is a user who physically removes part of the database, for
example by stealing a disk pack.
Apart from normal security measures to prevent unauthorised personnel from entering the computer
centre, the most important safeguard against physical removal of part of the database is the use of
scrambling techniques.
Scrambling/encryption and privacy transformation techniques involve the following:
(a) Shuffling the characters of each tuple (or record or message) into a different order
(b) Replacement of each character (or group of characters) by a different character (or group
of characters) from the same alphabet or a different one
(c) Algebraically combining groups of characters in some way with a special group of characters
(the privacy key) supplied by the owner of the data
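Technique (c) can be sketched with XOR as the algebraic combination (a Python illustration; XOR with a repeating key is one simple reversible choice, not the only one, and the key shown is purely illustrative):

```python
# Sketch: technique (c), algebraically combining the data with a privacy
# key supplied by the owner. Applying the same XOR twice recovers the data.

def scramble(data: bytes, key: bytes) -> bytes:
    """XOR each byte of the data with the repeating privacy key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plain = b"customer balance 100"
key = b"privacy-key"            # illustrative privacy key
cipher = scramble(plain, key)

print(cipher != plain)                  # the stored form is unreadable
print(scramble(cipher, key) == plain)   # the key holder recovers the data
```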
TRANSACTIONS
A transaction is a unit of work with the property that the database is:
a) in a consistent state (a state of integrity) both before it and after it, but
b) possibly not in such a state between these 2 times.
In general, any changes made to the database during a transaction should not be visible to concurrent
transactions until those changes have been committed, in order to prevent the concurrent transactions
from seeing the database in an inconsistent state.
Any data changed by a given transaction including data created or destroyed by that transaction
should remain locked until that transaction terminates
The above discipline must be enforced by the DBMS
A transaction will be backed out if on completion it is found that the database is not in a state of
integrity.
A transaction may also be backed out if the system detects a deadlock. A general strategy for such a
situation is to choose one of the deadlocked transactions, say the one most recently started or the
one that has made the fewest changes, and remove it from the system, thus freeing its locked resources
for use by other transactions.
The process of back-out involves undoing all the changes that the transaction has made, releasing all
resources locked by the transaction and scheduling the transaction for re-execution.
Example of Transaction
In a banking system a typical transaction might be: transfer amount X from account A to account B.
This would be viewed as a single operation, and the user would enter a single command to perform it.
The transaction requires several changes to be made to the underlying database; specifically, it
involves updating the balance value in 2 distinct account tuples.
Although the database is in a state of integrity before and after the sequence of changes, it may not
be throughout the entire transaction, ie some of the intermediate states (or transitions) may violate
one or more integrity constraints.
It follows that there is a need to be able to specify that certain constraints should not be checked
until the end of the transaction. These are called deferred constraints.
By contrast, constraints that are enforced continuously during the intermediate steps of the
transaction are called immediate.
NB: the data sublanguages must include some means of signalling the end of a transaction, in order to
cause the DBMS to apply the deferred checks.
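The transfer transaction, with a deferred check applied only at end-of-transaction and a back-out on failure, can be sketched as follows (a Python illustration; the particular constraint chosen, that no account goes negative and total funds remain unchanged, is an assumed example):

```python
# Sketch: a transfer transaction with a deferred constraint, checked only
# at end-of-transaction, and a back-out if the check fails.

accounts = {"A": 200, "B": 50}

def transfer(db, src, dst, amount):
    before = dict(db)                    # remembered for back-out
    total_before = sum(db.values())
    db[src] -= amount                    # intermediate state may be invalid:
    db[dst] += amount                    # the deferred check is not yet applied
    # end of transaction: apply the deferred checks now
    if db[src] < 0 or sum(db.values()) != total_before:
        db.clear()
        db.update(before)                # back out: undo all the changes
        return False
    return True                          # commit

print(transfer(accounts, "A", "B", 75), accounts)   # succeeds
print(transfer(accounts, "A", "B", 500), accounts)  # violates the check: backed out
```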
CONCURRENCY
In most systems, several users can access a database concurrently. The operating system switches
execution from one user program to another to minimise waiting for input or output operations
Within this approach transactions are often interleaved, that is, several steps are performed on transaction
A, then several steps on transaction B, followed by more steps on transaction A and so on.
1. 2 users are in the process of updating the same record, which represents a savings account record
   for customer A
2. At the present time customer A has a balance of $100 in her account
3. User 1 reads her record into the user work area, intending to post a customer withdrawal of $50
4. Next, user 2 reads the same record into that user's work area, intending to post a customer deposit
   of $25
5. User 1 posts the withdrawal and stores the record, which now indicates a balance of $50
6. User 2 then posts the deposit (increasing the balance to $125) and stores this record on top of the
   one stored by user 1
7. The record now indicates a balance of $125
8. The transaction for user 1 has been lost because of interference between transactions
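The steps above can be replayed directly (a Python illustration of the interference, with each user's work area modelled as a private copy of the record):

```python
# Sketch: the lost-update steps replayed without locking. Each user's work
# area is a private copy of the record, so user 2's write silently
# overwrites user 1's, and the withdrawal is lost.

record = {"balance": 100}            # customer A's balance

work_area_1 = dict(record)           # user 1 reads the record
work_area_2 = dict(record)           # user 2 reads the same record

work_area_1["balance"] -= 50         # user 1 posts the withdrawal...
record.update(work_area_1)           # ...and stores it (balance now 50)

work_area_2["balance"] += 25         # user 2 posts the deposit...
record.update(work_area_2)           # ...on top of user 1's stored record

print(record["balance"])             # 125, not the correct 75
```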
INCONSISTENT ANALYSIS
This usually occurs in the traditional file approach: when the same data are stored in multiple locations, inconsistencies in the data are inevitable. Suppose, for example, that several files contain customer data.
If the files are to remain consistent, a change of address must be made simultaneously and correctly to each of the files containing the customer address data item.
Since the files are controlled by different users, it is very likely that some files will reflect the old address while others reflect the new one. Inconsistencies in stored data are one of the most common sources of errors in computer applications; for instance, an outdated customer address may lead to a customer invoice being mailed to the wrong location. As a result, the invoice may be returned and the customer payment delayed or lost.
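The update anomaly described above can be shown with two in-memory stand-ins for application files; the customer number, name, and addresses here are hypothetical.

```python
# Hypothetical contents of two application files that both store the
# customer's address (the redundancy described above).
invoicing_file = {"C001": {"name": "J. Smith", "address": "12 Old Road"}}
shipping_file  = {"C001": {"name": "J. Smith", "address": "12 Old Road"}}

# The customer moves, but only the invoicing file is updated.
invoicing_file["C001"]["address"] = "99 New Street"

# The two files now disagree: an invoice and a shipment would be sent
# to different addresses.
inconsistent = (invoicing_file["C001"]["address"]
                != shipping_file["C001"]["address"])
print(inconsistent)  # True
```

A central database removes the anomaly by storing the address once, so every application reads the same value.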
TRANSACTION RECOVERY
SYSTEM RECOVERY
DATABASE INTEGRITY
CONCURRENCY
The definition of a primary key includes its uniqueness property: no duplicates and no NULL values. To enforce it, the DBMS rejects any attempt to insert records with NULL or duplicate primary key values.
Functional dependencies represent another form of integrity constraint. Other examples include:
Comparison expressions, e.g. the qtyout value must not exceed the qtyord value
Lower and upper limit values specified for an attribute
Valid/permitted values for a certain attribute
Attribute values conforming to a particular format
Statistical constraints, e.g. no employee may earn more than twice the average salary for the department
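The first three kinds of constraint can be declared and enforced as sketched below, assuming SQLite as the DBMS; the table and column names are illustrative, with qtyord/qtyout taken from the comparison example above.

```python
import sqlite3

# Assumption: SQLite; orderline is an invented table name.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orderline (
    orderno INTEGER NOT NULL PRIMARY KEY,               -- no duplicates, no NULLs
    qtyord  INTEGER CHECK (qtyord BETWEEN 1 AND 1000),  -- lower/upper limits
    qtyout  INTEGER,
    CHECK (qtyout <= qtyord)                            -- comparison expression
)""")

conn.execute("INSERT INTO orderline VALUES (1, 100, 40)")  # valid row

rejected = []
for row in [(1, 100, 10),    # duplicate primary key
            (2, 5000, 10),   # qtyord above the upper limit
            (3, 100, 200)]:  # qtyout exceeds qtyord
    try:
        conn.execute("INSERT INTO orderline VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

print(rejected)  # [1, 2, 3] -- the DBMS rejects all three violations
```

Format and statistical constraints generally need triggers or application-level checks rather than simple column declarations.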
DEADLOCK
Occurs when each of two transactions is waiting for the other to release an item that it holds
Solution
Deadlock detection
The system does not prevent deadlock but periodically checks whether it is in a state of deadlock
Wait-for graph
Abort some of the transactions if there is a deadlock
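Deadlock detection over a wait-for graph reduces to cycle detection, which can be sketched as follows; the transaction names are illustrative.

```python
# Nodes are transactions; an edge T1 -> T2 means T1 is waiting for a
# lock that T2 holds.  A deadlock exists exactly when there is a cycle.
def has_deadlock(wait_for):
    """wait_for maps each transaction to the set of transactions it waits for."""
    visiting, finished = set(), set()

    def dfs(txn):
        if txn in visiting:          # revisited on the current path: a cycle
            return True
        if txn in finished:
            return False
        visiting.add(txn)
        for other in wait_for.get(txn, ()):
            if dfs(other):
                return True
        visiting.discard(txn)
        finished.add(txn)
        return False

    return any(dfs(txn) for txn in wait_for)

# T1 waits for T2 and T2 waits for T1: the classic two-transaction deadlock.
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
print(has_deadlock({"T1": {"T2"}, "T2": set()}))   # False
```

When a cycle is found, the DBMS chooses a victim transaction on the cycle and aborts it, releasing its locks so the others can proceed.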
These are techniques for keeping the database in a consistent state with respect to the constraints specified on the database.
Both the database security and protection rules and the database semantic integrity constraints are stored in the DBMS catalog.
SUPPORT ROUTINES
Journaling Routines:
Record every operation in the system log/audit trail/system journal.
Dump Routines:
Take back-up copies of the database; a new system log is started after every dump.
Recovery Routines:
Used to restore the database, or some portion of it, after a system failure (hardware or software) has caused the contents of the database buffers in main storage to be lost.
Backout Routines:
Undo the changes made by a transaction; initiated automatically by the DBMS before the transaction's changes are committed.
Checkpoint/Restart Routines:
Cause all changes made since the last checkpoint to be committed. Instead of restarting a long transaction from the beginning, processing restarts only from the last checkpoint.
Detection Routines:
Detect any constraint violations and back the offending transaction out of the system, reporting the list of constraints violated and the offending tuples.
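How journaling, checkpointing, and recovery fit together can be sketched with a toy single-user store; the class and method names are invented for illustration and do not correspond to any real DBMS.

```python
import copy

# Toy sketch: updates are journaled, a checkpoint snapshots the database,
# and recovery restores the last checkpoint and redoes the journal
# entries written after it.
class ToyDB:
    def __init__(self):
        self.data = {}
        self.journal = []             # journaling routine: the system log
        self._snapshot = {}
        self._snapshot_pos = 0

    def update(self, key, value):
        self.journal.append((key, value))   # log the operation
        self.data[key] = value

    def checkpoint(self):                   # checkpoint routine
        self._snapshot = copy.deepcopy(self.data)
        self._snapshot_pos = len(self.journal)

    def recover(self):                      # recovery routine
        self.data = copy.deepcopy(self._snapshot)
        for key, value in self.journal[self._snapshot_pos:]:
            self.data[key] = value          # redo post-checkpoint updates

db = ToyDB()
db.update("A", 100)
db.checkpoint()
db.update("B", 25)
db.data = {}        # simulate losing the buffers in main storage
db.recover()
print(db.data)  # {'A': 100, 'B': 25}
```

Recovery only has to replay the journal from the last checkpoint onwards, which is why checkpoints keep restart times short for long-running workloads.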
Database concepts and design
Courage Makota