INSTITUTE OF INNOVATION IN TECHNOLOGY AND MANAGEMENT

Handouts for Subject: DATABASE MANAGEMENT SYSTEMS (DBMS)


Prepared for: BCA

Paper Code: BCA-110

UNIT-1:
DATA:
Data can be defined as a collection of raw facts. Raw
facts refer to a collection of numbers, characters,
images or other outputs.
In other words, data are facts and figures that are not
currently being used in a decision process and take
the form of historical records that are recorded and
filed.
Data is often viewed as the lowest level of abstraction from which information and
knowledge are derived. Data is limitless and present everywhere in the universe. We can
consider data as groups of information that represent the attributes of a variable or set of
variables. For example: names, telephone numbers and addresses of your friends.
INFORMATION:
The word information is derived from the Latin word informare, which means "to give form to".
Hence, here we are giving some meaningful form to otherwise meaningless data.
It is the collection of processed data gathered through various means of communication. In
other words, information is the processed data on which decisions are taken and actions are
performed thereafter. Information is organized so that it can be meaningful and has some
value to the recipient. The characteristics of information are as follows:
1) Accurate: To be useful, information must be accurate at all levels. Accurate
information provides a reliable and valid representation of raw facts. The cost of
inaccurate or distorted information can be extremely high.
2) Timely: Information is appreciated only if it is available on time. If information
arrives late, its value may be diminished.
3) Complete: Without complete information, a decision maker may get a distorted view
of reality which may lead to huge losses.
4) Precise: Information should be precise, containing all the essential elements of
relevant subject areas. We should not bury important information in the stacks of large
data.
Thus, information gives the power to find and evaluate problems and to make decisions
effectively and efficiently.

DIFFERENCE BETWEEN DATA & INFORMATION:


DATA | INFORMATION
i. It is raw in nature. | It is processed data.
ii. It is not used in decision making. | Decisions are made on the basis of information.
iii. It gives birth to information. | When absorbed, it gives birth to knowledge.
iv. They are recorded and filed. | They are retrieved and processed.
v. They are not organized and are of no significance to business. | They are organized and are of large significance to business.

DATABASE:
A database is an organized collection of logically related data so that it can be easily managed
and updated. Database has some source from which data is derived, some degree of
interaction with events in the real world and an audience that is actively interested in the
contents of the database. A database is a structured collection of data.
E.g.: Dictionary, Student record registers.
Features of database:
1. Shared:
Data in database can be shared among different users.
2. Persistence:
Data exists permanently.
3. Security:
Data is protected from unauthorized access.
4. Non redundancy:
No duplicity of data.
5. Independent:
Data is independent at each level, so that changes made at one level do not affect the
others.

DATABASE MANAGEMENT SYSTEM:


Database Management System is general purpose software
with a collection of programs that enables users to define,
create and maintain a database. It facilitates the processes of
defining, constructing, manipulating databases for various
applications. It also helps in sharing, protection and
maintenance of databases. A database stores information, whereas a
DBMS is the system that manages the database.
A database system is essentially a computerized record-keeping
system. The database is regarded as a kind of repository or container for a collection of
computerized data files on which users can perform a variety of operations, for example:

Defining a database:
Specifying the data types, structure and constraints for data to be stored in a database.

Constructing:
Storing data on some storage device that is controlled by the DBMS.

Manipulating:
Functions like querying, updating and generating reports from database.

Sharing:
Multiple users and programs access database concurrently.

Protection:
System protection from hardware failures and security protection.

Maintenance:
Allowing system to grow with changing requirements.
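
The first three operations above (defining, constructing and manipulating) can be sketched in a few lines of Python. This is only an illustration: it assumes SQLite, through Python's standard sqlite3 module, as the DBMS, and the student table, its columns and the sample rows are hypothetical.

```python
import sqlite3


def run_dbms_operations():
    """Define, construct and manipulate a tiny database, then return a query result."""
    # An in-memory SQLite database stands in for the DBMS (assumption of this sketch).
    conn = sqlite3.connect(":memory:")

    # Defining: specify the data types, structure and constraints.
    conn.execute(
        "CREATE TABLE student ("
        " roll_no INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " course TEXT NOT NULL)"
    )

    # Constructing: store data on storage controlled by the DBMS.
    conn.executemany(
        "INSERT INTO student (roll_no, name, course) VALUES (?, ?, ?)",
        [(1, "Rohit", "BCA"), (2, "Anita", "BCA"), (3, "Vikram", "MCA")],
    )

    # Manipulating: update the data, then query it.
    conn.execute("UPDATE student SET course = 'MCA' WHERE roll_no = 2")
    rows = conn.execute(
        "SELECT name FROM student WHERE course = 'MCA' ORDER BY roll_no"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


mca_students = run_dbms_operations()
print(mca_students)  # ['Anita', 'Vikram']
```

Sharing, protection and maintenance are left out here because they depend on a multi-user server setup rather than on a single script.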

CHARACTERISTICS OF DATABASES:
1. Self describing nature of the database system:
The database system contains not only the database itself but also a complete description of
the database structure and constraints. This definition is stored in the DBMS catalog and is
called metadata, as it describes the actual structure of the database.
2. Data isolation:
The structure of the DBMS files is stored in the DBMS catalog separately from the access
programs. This property is also known as program-data independence.
3. Support multiple views of data:
A database generally has many users, each of whom may require a different view of the
database. A view may be defined as a subset of the database derived from the database files
but not stored separately. A multiuser DBMS generally provides this facility of multiple
views.
4. Data sharing and multi user transaction processing:
Since a DBMS serves multiple users at the same time, it must include concurrency
control software to ensure that several users trying to update the same data do so in a
controlled manner.
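
As a rough illustration of the self-describing nature and of views, the sketch below again assumes SQLite via Python's sqlite3 module. SQLite keeps its catalog in a table called sqlite_master; the student table and the student_courses view are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT, address TEXT)"
)
conn.execute("INSERT INTO student VALUES (1, 'Rohit', 'BCA', 'New Delhi')")

# A view is a derived subset of the database; its rows are not stored separately.
conn.execute("CREATE VIEW student_courses AS SELECT roll_no, course FROM student")

# Self-describing nature: the catalog (metadata) can be queried like ordinary data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
view_rows = conn.execute("SELECT * FROM student_courses").fetchall()

print(catalog)    # [('table', 'student'), ('view', 'student_courses')]
print(view_rows)  # [(1, 'BCA')]
```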

COMPONENTS OF DBMS:

Hardware: The hardware is the actual computer system used for keeping and accessing
the database. It can range from a single PC to a network of computers.

Software: This includes DBMS, operating system, network software (if necessary) and
also the application programs.

Data: It is used by the organization and a description of this data is called the schema.

Procedures: These are the instructions and rules that should be applied to the design and
use of the database and DBMS.

People: Includes database designers, DBAs, application programmers, and end-users.

NEED OF DATABASES:

Databases and Database Management Systems (DBMS) have become essential for
managing our businesses, governments, banks, universities and every other kind of
human endeavor.
They are a critical element of today's software industry, solving the problem of
managing the huge amounts of data that are increasingly being stored.
A Database System is a central repository in an organization's information system and
is essential for supporting the organization's functions, maintaining the data for these
functions and helping users interpret the data in decision making.

APPLICATION AREAS OF DBMS:


Databases are widely used all around the world in different sectors:
1. Banking:
For customer information, accounts, loans and banking transactions.
2. Airlines:
For reservations and schedule information.
3. Universities:
For student information, course registrations and grades.
4. Credit card transactions:
For purchases on credit cards and generation of monthly statements.
5. Telecommunications:
For keeping records of calls made, generating monthly bills, maintaining balances on prepaid
calling cards and storing information about the communication networks.
6. Finance:
For storing information about holdings, sales and purchase of financial instruments such as
stocks and bonds.
7. Sales:
For customer, product and purchase information.
8. Manufacturing:
For management of supply chain and for tracking production of items in factories,
inventories of items in warehouses/stores and orders for items.
9. Human Resources:
For information about employees, salaries, payroll taxes and benefits and for generation of
paychecks.
10. Web based services:
For collecting web users' feedback and responses, resource sharing etc.

ADVANTAGES OF DBMS:

Data independence:
It provides an abstract view of the data that hides the details of data representation and
storage. The data should be such that the changes made to it at one level should not
affect other levels.

Data Access:
It provides fast and efficient data access. For example, if a bank officer wants to
know the number of customers whose account balance is Rs 1000 or more, a simple
query returns the answer.

Data integrity:
The data values stored in the database must satisfy certain types of consistency
constraints. For example, the balance of a bank account may never fall below Rs 500.

Data security:
Not every type of database user should be allowed to access all the data. For example, in a
college management system, students should not be allowed to access faculty details.

Concurrent Access and crash recovery:
For overall performance of the system and faster response, many systems allow
multiple users to update the data simultaneously. In such an environment, interaction
of concurrent updates may result in inconsistent data. So, the DBMS schedules concurrent
accesses in such a way that each user can think of the data as being accessed by only
one user at a time. It also protects users from the effects of system crashes.
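
The data access and data integrity examples above can be sketched as follows. This assumes SQLite via Python's sqlite3 module; the account table, the sample balances and the two helper functions are hypothetical. The Rs 500 minimum balance is expressed as a CHECK constraint, so the DBMS itself rejects an update that would violate it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " holder TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 500))"  # integrity constraint
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(101, "Asha", 2500), (102, "Ravi", 800), (103, "Meena", 1000)],
)


def count_customers_with_min_balance(minimum):
    """Data access: one query answers the bank officer's question."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM account WHERE balance >= ?", (minimum,)
    ).fetchone()
    return count


def try_withdraw(acc_no, amount):
    """Data integrity: the DBMS refuses a withdrawal that breaks the constraint."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
            (amount, acc_no),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def balance_of(acc_no):
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
    ).fetchone()
    return balance


print(count_customers_with_min_balance(1000))  # 2
print(try_withdraw(102, 400))                  # False: 800 - 400 would fall below 500
print(balance_of(102))                         # 800, the rejected update changed nothing
```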

LIMITATIONS OF DBMS:
Although there are many advantages of DBMS, the DBMS may also have some minor
disadvantages. These are:
1. Cost of Hardware & Software:
A high-speed processor and a large memory are required to run
the DBMS software. This means that the hardware used for the file-based
system has to be upgraded. Similarly, DBMS software is also very costly.
2. Cost of Data Conversion:
When a computer file-based system is replaced with a database system, the data stored in
data files must be converted to database files. Converting the data of data files into a
database is a difficult and costly process. You have to hire database and system designers
along with application programmers. Alternatively, you have to take the services of a software
house. So a lot of money has to be paid for developing software.
3. Cost of Staff Training:
Most DBMSs are complex systems, so users need training to use them.
Training is required at all levels, including programming, application development,
and database administration. The organization has to pay a large amount for the training
of staff to run the DBMS.
4. Appointing Technical Staff:
Trained technical staff such as database administrators, application programmers and data
entry operators are required to handle the DBMS. These people have to be paid good
salaries. Therefore, the system cost increases.

5. Database Damage:
In most of the organizations, all data is integrated into a single database. If database is
damaged due to electric failure or database is corrupted on the storage media, then your
valuable data may be lost forever.

TRADITIONAL FILE ORIENTED APPROACH


In computing, a file system is a method of storing
and organizing computer files and the data they
contain to make it easy to find and access them. A
file system is a special-purpose database for the
storage, organization, manipulation, and retrieval of
data. Each file in file system defines and manages its
own data.
The traditional file-oriented approach to information
processing has for each application a separate master
file and its own set of personal files.
It is the methodology which is applied to structured computer files. Files contain computer
records which can be documents or information which is stored in a certain way for later
retrieval. File organization refers primarily to the logical arrangement of data.
Computer based data processing initially involved holding all information in an organization
in different files and folders, known as file-oriented system. In typical file processing system
for an organization, system programmers may need to write several programs to meet the
needs of the organization as the need arises.
Let us consider the following example: part of a savings-bank enterprise that keeps
information about customers and savings accounts. One way to store this information on a
computer is to store it in operating system files. To allow users to manipulate the information,
the system shall have a number of application programs that manipulate the files, including:
A program to debit or credit an account
A program to add a new account
A program to find the balance of an account
A program to generate monthly statements
System programmers write these application programs to meet the needs of the bank. New
application programs are added to the system as the need arises. This typical File-processing
system is supported by a conventional operating system. Before database management
systems (DBMSs) came along, organizations usually stored information in these systems. In
simple terms, a File Management System (FMS) is a Database Management System that
allows access to single files or tables at a time. FMSs accommodate flat files that have no
relation to other files. The FMS was the predecessor for the Database Management System
(DBMS).
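
A minimal sketch of such file-processing programs is given below, assuming the account data is kept in a comma-separated operating system file. The file layout, the column positions and the function names are hypothetical. Note that each program has to know the layout of the file, which is the root of several drawbacks of this approach.

```python
import csv
import os
import tempfile

# Hypothetical file layout: account number, holder name, balance.
ACC_NO_COL = 0
BALANCE_COL = 2


def create_accounts_file(path):
    """Write a small sample master file."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["101", "Asha", "2500"], ["102", "Ravi", "800"]])


def find_balance(path, acc_no):
    """A program to find the balance of an account."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if row[ACC_NO_COL] == acc_no:
                return int(row[BALANCE_COL])
    return None


def credit_account(path, acc_no, amount):
    """A program to credit an account: read the whole file, change one row, rewrite it."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    for row in rows:
        if row[ACC_NO_COL] == acc_no:
            row[BALANCE_COL] = str(int(row[BALANCE_COL]) + amount)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


accounts_path = os.path.join(tempfile.mkdtemp(), "accounts.csv")
create_accounts_file(accounts_path)
credit_account(accounts_path, "102", 200)
print(find_balance(accounts_path, "102"))  # 1000
```

If a new field were inserted before the balance column, both programs would have to be changed, because the column positions are hard-coded in the programs rather than described in a catalog.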
Advantages of file system:
1. Simpler to use and access.
2. Less expensive.
3. Fits the needs of many small businesses and home users.
4. Good for database solutions for hand held devices.

Disadvantages/Drawbacks of File Processing Systems include:


1. Data Redundancy:
In a file system if information is needed by two distinct applications, then it may be
stored in two or more files. For example, the particulars of an employee may be stored
in payroll and leave record applications separately. Some of this information may be
changing, such as the address, the pay drawn, etc. It is therefore quite possible that
while the address in the master file for one application has been updated the address
in the master file for the second application may not have been. Sometimes, it may not be
easy to find out in how many files a repeated item such as the address
occurs. The solution, therefore, is to avoid this data redundancy by storing the
address at just one place physically, and making it accessible to all applications.
2. Program/Data Dependency:
In the traditional file oriented approach if a data field (attribute) is to be added to a
master file, all such programs that access the master file would have to be changed to
allow for this new field that would have been added to the master record. This is
referred to as data dependence.
3. Lengthy Development Times:
There is little opportunity to leverage previous development efforts. Each new
application requires the developer to start from scratch by designing new file formats
and descriptions.
4. Lack of data sharing and availability:
Information cannot flow freely across different functional areas or different parts of
the organization. Users find different values of the same piece of information in two
different systems, and hence they may not use these systems because they cannot trust
the accuracy of the data.

5. Lack of Security:
Anyone can easily access confidential or important data related to some
other department.

FILE SYSTEM V/S DBMS:


FILE SYSTEM | DBMS
1. They are smaller systems. | 1. They are larger systems.
2. They are cheaper. | 2. They are expensive.
3. They are less secure and have simple structure. | 3. They are very secure and have complex structure.
4. They have simple backup recovery system. | 4. They have complex backup recovery system.
5. They are single user based. | 5. They are multiuser based.
6. Structure of data files is embedded in application programs, so any changes to the structure of a file may require changing all programs that access this file. | 6. DBMS access programs do not require such changes in most cases, as the structure of data files is stored in the DBMS catalog separately from the access programs.
7. Conventional file processing environments do not allow data to be retrieved in a convenient and efficient manner. | 7. Database systems allow data retrieval in a convenient and efficient manner according to user preference.
8. In conventional file processing, data is scattered in various files and files may be in different formats, so writing new application programs to retrieve new data is difficult. | 8. This is not the case in database systems.

PEOPLE DEALING WITH DATABASES:


Users may be divided into those who actually use and control the content (called actors on
the scene) and those who enable the database to be developed and the DBMS software to be
designed and implemented (called workers behind the scene).

1) Database administrators (DBAs):
Centralized control of the database is exerted by a person or group of persons under the
supervision of a high-level administrator. This person or group is referred to as the
DBA. DBAs are the users who are most familiar with the database and are responsible
for managing the database system, authorizing access, coordinating and monitoring
its use, and acquiring resources. The DBA is responsible for many critical tasks:
1. Security and Authorization: The DBA is responsible for ensuring that unauthorized
data access is not permitted. In general, not everyone should be able to access all the
data. Users can be granted permission to access only certain views and relations. For
example, the DBA can enforce this policy by giving students permission to read only
the course information and not the faculty salary information.

2. Data availability and recovery from failure: The DBA must take steps to ensure that if
the system fails, users can continue to access as much of the uncorrupted data as
possible. The DBA must also work to restore the data to a consistent state. The DBA
is also responsible for implementing procedures to back up the data periodically.
3. Database tuning: Database tuning describes a group of activities used to optimize and
homogenize the performance of a database. The goal is to maximize use of system
resources to perform work as efficiently and rapidly as possible. The needs of the
users are likely to evolve with time, and the DBA is responsible for modifying the
database, in particular the conceptual and physical schemas, to ensure adequate
performance as user requirements change.
2) Database designers: Responsible for designing the database, identifying the data
to be stored, choosing the structures to represent and store this data.
3) End Users: The persons that use the database for querying, updating, generating reports,
etc. The various types of end users are:

Casual end users: These users occasionally access the database and need different
information each time. They learn only a few facilities that they may use
repeatedly. They use a sophisticated database query language to specify their requests
and are typically middle- or high-level managers or other occasional browsers.

Parametric (or naive) end users: Users who constantly query and update the database,
using standard types of queries and updates called canned transactions that have been
carefully programmed and tested. They need to learn very little about the facilities
provided by the DBMS. For example, a user of an ATM falls in this category. The user is
instructed through each step of a transaction. The operations performed by this class
of user are very limited. Other such naive users are end users of the database who
work through a menu-oriented application program.

Sophisticated end users: They include engineers, scientists, business analysts, and
others who thoroughly familiarize themselves with the facilities of the DBMS so as to
implement applications that meet their complex requirements, using the full
capabilities of the DBMS.

Stand-alone users (personal databases): They maintain personal databases by using
ready-made program packages that provide easy-to-use menu- or graphics-based
interfaces. An example is the user of a tax package that stores a variety of personal
financial data for tax purposes. Such users typically become very proficient in using a
specific software package.

5) System Analysts/Application programmers: Professional programmers who are
responsible for developing application programs or user interfaces utilized by
naive and online users fall under this category. They develop packages that facilitate
data access for end users, who are usually not computer professionals, using the
host and/or data languages and software tools that DBMS vendors provide.

6) DBMS system designers and implementers are persons who design and implement the
DBMS modules and interfaces as a software package. A DBMS is a complex software
system that consists of many components or modules, including modules for
implementing the catalog, query language, interface processors, data access,
concurrency control, recovery, and security. The DBMS must interface with other
system software, such as the operating system and compilers for various programming
languages.
7) Tool developers include persons who design and implement tools, the software
packages that facilitate database system design and use, and help improve
performance. Tools are optional packages that are often purchased separately. They
include packages for database design, performance monitoring, natural language or
graphical interfaces, prototyping, simulation, and test data generation. In many cases,
independent software vendors develop and market these tools.
8) Operators and maintenance personnel are the system administration personnel who
are responsible for the actual running and maintenance of the hardware and software
environment for the database system.
DBMS ARCHITECTURE:
A commonly used approach to viewing data is the three-level architecture suggested by
ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements
Committee).

Objectives:

Insulation of application programs and data.
Support of multiple user views.
Use of schema to store the DB description (meta-data).
Under this approach, a database is considered as containing data about an enterprise. The
three levels of the architecture are three different views of the data:
1. External: individual user view.
2. Conceptual: logical user view.
3. Internal: physical or storage view.
The three level database architecture allows a clear separation of the information meaning
(conceptual view) from the external data representation and from the physical data structure
layout. A database system that is able to separate the three different views of data is likely to
be flexible and adaptable. This flexibility and adaptability is data independence that we will
discuss further.
ADVANTAGES OF THREE-TIER SCHEME:
Each user is able to access the same data with a different view of the data as per their
requirements.
User is not concerned about the physical storage details.
Internal structure of the database is unaffected by changes to the physical storage
organization, such as changeover to a new storage device.
DBA is able to change the database storage structure without affecting the user's view.
We now briefly discuss the three different views.
1. External Level or View Level or user view:
The external level is the view that the individual user of the database has. This view is
often a restricted view of the database and the same database may provide a number
of different views for different classes of users. In general, the end users and even the
applications programmers are only interested in a subset of the database. For example,
a department head may only be interested in the departmental finances and student
enrolments but not the library information. The librarian would not be expected to
have any interest in the information about academic staff. The payroll office will have
no interest in student enrolments.
2. Conceptual Level or Logical View: It describes the structure of the whole database for
a community of users. This schema hides the details of the physical storage structure and
concentrates on describing entities, data types, what data is stored in the database and the
relationships among the data. This level contains the logical structure of the entire database
as seen by the DBA. The conceptual view represents:
All entities, their attributes, and their relationships.
The constraints on the data
Semantic information about the data
Security & integrity information
E.g. in a student database the entity is Student and the attributes for this entity are
Roll No, Name, Course, Address etc.
Data Field | Data Type | Size | Constraint
Roll No | Number | 10 | unique
Name | Text | 15 | Not null
Course | Text | 10 | Not null

3. Internal Level or Storage Level or physical view: It is the physical representation of
the database on the computer. This level describes how the data is stored in the
database. It is concerned with how the data are physically stored on the hardware. The
internal view represents:
Storage space allocation for data
Record descriptions for storage
Storage path of the database.
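
The three levels can be loosely illustrated with SQLite (via Python's sqlite3 module), which is an assumption of this sketch rather than part of the ANSI/SPARC proposal. The table follows the Student example above (Roll No unique, Name and Course not null), while the view and index names are hypothetical. SQLite accepts the declared sizes but does not enforce them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: entities, attributes, data types and constraints.
conn.execute(
    "CREATE TABLE student ("
    " roll_no NUMERIC(10) UNIQUE,"
    " name VARCHAR(15) NOT NULL,"
    " course VARCHAR(10) NOT NULL)"
)

# External level: a restricted view for one class of users.
conn.execute("CREATE VIEW course_list AS SELECT roll_no, course FROM student")

# Internal level: a storage and access-path decision, invisible to the view's users.
conn.execute("CREATE INDEX idx_student_course ON student (course)")


def insert_student(roll_no, name, course):
    """Return 'ok' or 'rejected' depending on the conceptual-level constraints."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", (roll_no, name, course))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    insert_student(1, "Rohit", "BCA"),  # satisfies all constraints
    insert_student(1, "Anita", "BCA"),  # duplicate Roll No violates UNIQUE
    insert_student(2, None, "BCA"),     # missing Name violates NOT NULL
]
external_rows = conn.execute("SELECT * FROM course_list").fetchall()

print(results)        # ['ok', 'rejected', 'rejected']
print(external_rows)  # [(1, 'BCA')]
```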
The user requests specified at the external schema level must be transformed into requests at
the conceptual schema level. This transformed request at the conceptual level should be
further transformed at the internal schema level for final processing of data in the stored
database as per the user's request. The final result from the processed data
must then be reformatted to satisfy the user's external view. This process of transforming
requests and results between the levels is called mapping. The three-level ANSI/SPARC
architecture thus provides the following two-stage mappings:
(i) Conceptual/Internal Mapping:
The conceptual schema is related to the internal schema through
conceptual/internal mapping. It defines the correspondence between the
conceptual view and the stored database. It also enables the DBMS to find the actual
records or combination of records in physical storage that constitute a logical
record in the conceptual schema, together with any constraints to be enforced on
the operations for that logical record. In case of any change in the structure of the
stored database, the conceptual/internal mapping is also changed accordingly by
the DBA, so that the conceptual schema can remain invariant and the effects of
changes to the internal schema are isolated below the conceptual level.

(ii) External/Conceptual Mapping:
Each external schema is related to the conceptual schema by external/conceptual
mapping. It defines the correspondence between a particular external view and the
conceptual view. A number of external views can exist at the same time, any
number of users can share a given external view and different external views can
overlap. There is one mapping between the conceptual and internal levels and there
can be several mappings between the external and conceptual levels.

Thus the Conceptual/Internal Mapping is the key to the physical data independence while the
External/Conceptual Mapping is the key to logical data independence.
DATA INDEPENDENCE:
Data independence can be defined as the capacity to change the schema at one level of a
database system without having to change the schema at the next higher level, thus insulating
application programs from changes in the way data is structured and stored. It is
accomplished by changing the mapping between the two levels. Data Independence is a
major objective of implementing DBMS in an organization. It is the type of data transparency
that matters for a centralized DBMS. The two types of data independence are:
(i) Logical data independence:
Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database (by adding a record type or data item),
or to reduce the database (by removing a record type or data item). Application
programs that reference the external schema constructs must work as before, after
the conceptual schema undergoes a logical reorganization. In other words, it is the
ability to modify the conceptual schema without causing application programs to be
rewritten when the logical structure of the database is altered. Logical data independence is
harder to achieve because application programs are heavily dependent on the logical
structure of the data.

(ii) Physical data independence:
Physical data independence is the capacity to change the internal schema without
having to change the conceptual (or external) schemas. Changes to the internal schema
may be needed because some physical files had to be reorganized, for example by creating
additional access structures, to improve the performance of retrieval or update.

Data independence is accomplished because, when the schema is changed at one level, the
schema at the next higher level remains unchanged; only the mapping between the two levels
changes.
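
Both kinds of data independence can be demonstrated on a small scale. The sketch below assumes SQLite via Python's sqlite3 module; the table, view, index and function names are hypothetical. An "application program" that reads only through an external view keeps returning the same result after the conceptual schema is expanded (logical change) and after an access structure is added (physical change).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Rohit", "BCA"), (2, "Anita", "MCA")],
)
conn.execute("CREATE VIEW student_names AS SELECT roll_no, name FROM student")


def application_query():
    """An application program written against the external view only."""
    return conn.execute(
        "SELECT roll_no, name FROM student_names ORDER BY roll_no"
    ).fetchall()


before = application_query()

# Logical change: expand the conceptual schema by adding a data item.
conn.execute("ALTER TABLE student ADD COLUMN phone TEXT")
after_logical_change = application_query()

# Physical change: create an additional access structure.
conn.execute("CREATE INDEX idx_student_name ON student (name)")
after_physical_change = application_query()

print(before == after_logical_change == after_physical_change)  # True
```

Removing a data item that the view refers to would break the view, which is one reason logical data independence is the harder of the two to achieve.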

SCHEMA AND INSTANCES:


SCHEMA: The overall design of the database is called the database schema. A schema
displays only the names of entities and their attributes, and the data types and sizes of the
attributes. The schema remains the same while the values filled into it change from instance
to instance. For example:
Student Schema:
Name (text)(20) | Roll No (number)(4) | Phone no (number)(10) | Address (text)(10)
Course Schema:
Course | Course Id | Department

INSTANCE: The database changes over time as information is inserted or deleted. The
collection of information stored in the database at a particular moment is called an instance of
the database. For example:
Instance of Student Database:
Rohit | 6789123 | 9897651230 | D-29, janak puri, New Delhi
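
The difference between schema and instance can be checked with a short sketch, assuming SQLite via Python's sqlite3 module. The column names are adapted from the Student example, and the helper functions are hypothetical. Inserting a row changes the instance but leaves the schema untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (name TEXT, roll_no INTEGER, phone_no INTEGER, address TEXT)"
)


def schema():
    """Column names and declared types, read from the catalog."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance():
    """The data stored in the table at this moment."""
    return conn.execute("SELECT * FROM student").fetchall()


schema_before, instance_before = schema(), instance()
conn.execute(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    ("Rohit", 6789123, 9897651230, "D-29, janak puri, New Delhi"),
)
schema_after, instance_after = schema(), instance()

print(schema_before == schema_after)              # True: the schema is unchanged
print(len(instance_before), len(instance_after))  # 0 1: the instance has changed
```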

STRUCTURE OF DBMS:
DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete,
update and retrieval) on the database. The components of the DBMS perform these
requested operations on the database and provide the necessary data to the users. The
various components of the DBMS are described below:
1. DDL Compiler - The Data Description Language compiler processes schema definitions
specified in the DDL. It converts the data definition statements into a set of tables.
2. DML Compiler and Query optimiser - The DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into
object code for database access. The query optimiser then optimises the object code to find
the best way to execute the query and sends it to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also
known as the Database Control System.
The main functions of the Data Manager are:
It converts operations in user queries, coming directly from application programs or
through the Query Processor (the combination of the DML compiler and the query
optimiser), from the user's logical view to the physical file system.

It controls access to the DBMS information that is stored on disk.

It also enforces constraints to maintain consistency and integrity of the data.

It also synchronizes the simultaneous operations performed by the concurrent users.

It also controls the backup and recovery operations.


4. Data Dictionary - Data Dictionary is a repository of description of data in the database. It
contains information about:

Data - names of the tables, names of attributes of each table, length of attributes, and
number of rows in each table.

Relationships between database transactions and data items referenced by them which
is useful in determining which transactions are affected when certain data definitions
are changed.

Constraints on data i.e. range of values permitted.

Detailed information on physical database design such as storage structure, access
paths, files and record sizes.

Access Authorization - the description of database users, their responsibilities and
their access rights.

The data dictionary is used to control data integrity, database operation and accuracy.
It is an important part of the DBMS.
Importance of Data Dictionary
A Data Dictionary is necessary in databases for the following reasons:
It improves the DBA's control over the information system and the users' understanding
of how to use the system.
It helps in documenting the database design process by storing documentation of the
result of every design phase and design decisions.

It helps in finding the views defined on the database and the definitions of those views.

It provides great assistance in producing a report of which data elements (i.e. data
values) are used in all the programs.

It promotes data independence, i.e. application programs are not affected by the addition
or modification of structures in the database.

5. Data Files - It contains the data portion of the database.


6. Compiled DML - The DML compiler converts the high-level queries into low-level file
access commands known as compiled DML.
7. End Users - Discussed in people dealing with databases.


DATABASE LANGUAGES:
To support a variety of users, a DBMS must provide appropriate languages and interfaces for
each category of users to express database queries and updates.
The following languages are used to specify database schemas and to manipulate data:
i) Data definition language (DDL):
A data definition language (DDL) is the tool needed for describing data and data
structures. With its help a data schema can be defined and also changed later.
Typical DDL operations (with their respective keywords in the structured query
language SQL):
CREATE TABLE: Creation of tables and definition of attributes
ALTER TABLE: Change of tables by adding or deleting attributes
DROP TABLE: Deletion of a whole table including its content
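The three DDL operations can be tried out with a short script. This is a minimal sketch that assumes SQLite (via Python's sqlite3 module) as the DBMS; the student table and its columns are illustrative names, not part of the handout.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# CREATE TABLE: create a table and define its attributes.
conn.execute("CREATE TABLE student (sid TEXT PRIMARY KEY, name TEXT)")

# ALTER TABLE: change the table by adding an attribute.
conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")

# Column names after the ALTER, read from SQLite's table metadata.
columns_after_alter = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

# DROP TABLE: delete the whole table including its content.
conn.execute("DROP TABLE student")

# List the remaining user tables to confirm that the table is gone.
tables_after_drop = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

print(columns_after_alter, tables_after_drop)
```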

ii) Storage definition language (SDL):
It is used to specify the internal schema, the storage method and the access method used by
the database.

iii) View definition language (VDL):
It is the language used to specify user views.

iv) Data Manipulation language (DML):
Additionally, a language is needed for describing operations on data such as store,
search, read and change, the so-called data manipulation. Such operations
can be done with a data manipulation language (DML). Within such languages
keywords like insert, modify, update, delete, select, etc. are common.
Typical DML operations (with their respective keywords in the structured query
language SQL):
INSERT: Add data

UPDATE: Change data

DELETE: Delete data

SELECT: Query data
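The four DML operations listed above can be sketched in the same way. The example again assumes SQLite through Python's sqlite3 module, and the student table and its sample rows are invented for illustration.

```python
import sqlite3

# In-memory database with a hypothetical student table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid TEXT PRIMARY KEY, name TEXT, age INTEGER)")

# INSERT: add data.
conn.execute("INSERT INTO student VALUES ('s1', 'Asha', 19)")
conn.execute("INSERT INTO student VALUES ('s2', 'Ravi', 20)")

# UPDATE: change data.
conn.execute("UPDATE student SET age = 21 WHERE sid = 's2'")

# DELETE: delete data.
conn.execute("DELETE FROM student WHERE sid = 's1'")

# SELECT: query data.
remaining_rows = conn.execute("SELECT sid, name, age FROM student").fetchall()

print(remaining_rows)
```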


DATA MODELS:

A data model is a method of abstraction or hiding superfluous (extra) details while
highlighting those required by database applications.

Data modeling is used to represent entities and their relationships in a database. A data
model is a conceptual model for structuring data.

A number of models for data representation have been developed. The models differ
in their method of representing the association amongst entities and attributes.

It is a set of concepts that can be used to describe the structure of a database. Structure
of database includes:
data types

relationships

constraints

basic operations for specifying retrievals and updates


TYPES OF DATA MODELS:


(I) Hierarchical Model:
The hierarchical model describes record structures in which the various physical
records that make up a logical record are tied together in a sequence that looks
like an inverted tree. At the top of the structure is a single record. Beneath it
are one or more records, each of which can occur one or more times, and each of
these can in turn have multiple records beneath it. In diagrammatic form, the
top-to-bottom set of records looks like an inverted tree or a pyramid of records.
The records in the lower part of the structure are accessed by first accessing the
records above them and then following the chain of pointers to the records at the
next lower level. The records at any given level are referred to as parent records,
and the records at the next lower level that are connected to them, or dependent on
them, are referred to as their children or child records. There can be any number
of records at any level, and each record can have any number of children. Each
occurrence of the structure normally represents the collection of data about a
single subject. This parent-child structure can be repeated through several levels.
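The top-down access path described above can be sketched as a small tree. This is only an illustration in Python: the Record class, the path_to helper and the record names (DEPARTMENT, EMPLOYEE, DEPENDENT, PROJECT) are assumptions, and Python object references stand in for the pointers of a real hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """One record in the hierarchy; children hang beneath their single parent."""
    name: str
    children: List["Record"] = field(default_factory=list)

    def add_child(self, child: "Record") -> "Record":
        self.children.append(child)
        return child


def path_to(root: Record, target: str) -> List[str]:
    """Return the chain of records followed from the root down to the target."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub_path = path_to(child, target)
        if sub_path:
            return [root.name] + sub_path
    return []  # target not found beneath this record


# Hypothetical hierarchy: one record at the top, children beneath it.
department = Record("DEPARTMENT")
employee = department.add_child(Record("EMPLOYEE"))
employee.add_child(Record("DEPENDENT"))
department.add_child(Record("PROJECT"))

# A lower-level record is reached only by passing through the records above it.
access_path = path_to(department, "DEPENDENT")
print(access_path)
```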



Advantages:
1. Simplicity
2. Data Sharing
3. Data Security
4. Data Independence
5. Data Integrity
6. Efficiency

Disadvantages:
1. Data Relationships are Difficult to Modify
2. Queries Restricted to Traversing the Hierarchy
3. Multiple Parents not Allowed
4. Implementation Complexity
5. Inflexibility
6. Database Management Problems
7. Lack of Structural Independence
8. Implementation Limitation
9. No Standards

(II) Network Model:
The popularity of the network data model coincided with the popularity of the
hierarchical data model. Some data were more naturally modeled with more than
one parent per child, so the network model permitted the modeling of many-to-many
relationships in data. In 1971, the Conference on Data Systems Languages
(CODASYL) formally defined the network model. The basic data modeling
construct in the network model is the set construct. A set consists of an owner
record type, a set name, and a member record type. A member record type can
have that role in more than one set; hence the multi-parent concept is supported.
An owner record type can also be a member or owner in another set. The data
model is a simple network, and link and intersection record types (called junction
records by IDMS) may exist, as well as sets between them. Thus, the complete
network of relationships is represented by several pairwise sets; in each set
one record type is the owner (at the tail of the network arrow) and one or more
record types are members (at the head of the relationship arrow). Usually, a set
defines a 1:M relationship, although 1:1 is permitted.
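The set construct can be illustrated with a few lines of Python. This is a sketch only: the SetType class, the set names and the record type names are hypothetical, chosen to show one member record type taking part in two sets and therefore having two parents.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SetType:
    """A set: a set name, an owner record type and a member record type."""
    name: str
    owner: str
    member: str


# Hypothetical sets; WORKS_ON plays the member role in two of them.
sets = [
    SetType("E_WORKSON", owner="EMPLOYEE", member="WORKS_ON"),
    SetType("P_WORKSON", owner="PROJECT", member="WORKS_ON"),
    SetType("D_EMPLOYS", owner="DEPARTMENT", member="EMPLOYEE"),
]


def owners_of(member: str) -> List[str]:
    """Owner record types of every set in which the given type is the member."""
    return [s.owner for s in sets if s.member == member]


# WORKS_ON has two owners, which a strict hierarchy would not allow.
works_on_owners = owners_of("WORKS_ON")
print(works_on_owners)
```

Here EMPLOYEE is an owner in one set and a member in another, matching the statement that an owner record type can also be a member in another set.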
Advantages
i) Simplicity
ii) Facilitating more relationship types
iii) Superior data access
iv) Database Integrity
v) Data Independence
vi) Database Standards

Disadvantages
i) System Complexity
ii) Absence of structural independence
iii) Less User-Friendly

Figure showing Network Model:
[Figure: network model linking Employee 1, Employee 2, Project 1 and Project 2]

Consider these tables:
EMPLOYEE
FNAME | LNAME | SSN | EDATE | ADD | SEX | SALARY | DNO
DEPARTMENT
DNO | DNAME | MGRSSN
Now the network model is as follows:
SUPERVISOR
SUP_SSN
WORKS_ON
SSN | PNO | HOURS

(III) Relational Model:
The relational model was introduced by Ted Codd of IBM in 1970. Its central
concept is a relation, which is a set of records in a table of values.
Examples: Oracle DBMS, DB2, SQL Server and MS Access.
The model represents the database as a collection of relations. Each relation
resembles a table of values, with each row representing a collection of related
values.
Formally, the table is a relation in which each row is called a tuple and each
column an attribute.
A schema is a description of data in terms of a data model. In this
model, the schema for a relation specifies its name, the name of each field
and the type of each field.
Ex: Student (sid: string, name: string, login: string, age: integer, gpa: real).

ADVANTAGES:

Simplicity
Structural Independence
Ease of design, implementation, maintenance and uses
Flexible and powerful query capability
No anomalies

DISADVANTAGES:
Hardware Overheads
Easy-to-design capability leading to bad design
Properties of Relational Tables:

Values Are Atomic.


Each Row is Unique
Column Values Are of the Same Kind
The Sequence of Columns is Insignificant
The Sequence of Rows is Insignificant
Each Column Has a Unique Name.

The relational database model is based on relational algebra. For example, an
"orders" table might contain (customer-ID, product-code) pairs and a "products" table might
contain (product-code, price) pairs, so to calculate a given customer's bill you would sum the
prices of all products ordered by that customer by joining the two tables on their
product-code fields.
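The bill calculation described above can be written as a join followed by a sum. The sketch below assumes SQLite through Python's sqlite3 module; the sample customers, product codes and prices are invented, and customer_bill is a hypothetical helper name.

```python
import sqlite3

# In-memory database holding the two tables from the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, product_code TEXT)")
conn.execute("CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL)")

# Invented sample data.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("P1", 10.0), ("P2", 25.5), ("P3", 4.0)],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("C1", "P1"), ("C1", "P2"), ("C2", "P3")],
)


def customer_bill(customer_id):
    """Sum the prices of all products ordered by one customer."""
    row = conn.execute(
        "SELECT COALESCE(SUM(p.price), 0) "
        "FROM orders o JOIN products p ON o.product_code = p.product_code "
        "WHERE o.customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


bill_c1 = customer_bill("C1")
print(bill_c1)
```

COALESCE is used so that a customer with no orders gets a bill of 0 rather than NULL.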

E.g. of Relational Database:
Consider the following tables or relations:
EMPLOYEE
FNAME | LNAME | SSN | EDATE | ADD | SEX | SALARY | DNO
DEPARTMENT
DNO | DNAME | MGRSSN
DEPT_LOCATION
DNO | DLOCATION
PROJECT
PNAME | PNO | PLOCATION | DNO
WORKS_ON
SSN | PNO | HOURS

COMPARISON BETWEEN MODELS:

Data Models | Data Element Organization | Relationship Organization | Identity | Access Language | Data Independence | Structural Independence
Hierarchical | Files, Records | Logical proximity in a linear tree | Record-based | Procedural | Yes | No
Network | Files, Records | Intersecting networks | Record-based | Procedural | Yes | No
Relational | Tables | Identifiers of rows in one table are embedded as attribute values in another table | Value-based | Non-procedural | Yes | Yes

ENTITY-RELATIONSHIP MODEL:
The ER model is a popular high-level conceptual data model. It describes data as
entities, relationships and attributes. The basic objects that the ER model represents are
entities and their attributes. The entity-relationship (ER) data model allows us to describe
the data involved in a real-world enterprise in terms of objects and their relationships and
is widely used to develop an initial database design.

ENTITY RELATIONSHIP DIAGRAM


Entity Relationship Diagrams (ERDs) illustrate the logical structure of databases. Peter Chen
developed ERDs in 1976.

An entity-relationship (ER) diagram is a graphical representation of the ER model that
illustrates the interrelationships between entities in a database. ER diagrams often use
symbols to represent three different types of information. Boxes are commonly used to
represent entities, diamonds are normally used to represent relationships, and ovals are used
to represent the attributes of an entity.


BASIC TERMINOLOGIES RELATED TO ER-MODEL:


1. Entity:
An entity is an object that exists and which is distinguishable from other objects. An
entity may be an object with a physical existence (for example a particular person,
car, house, or employee) or it may be an object with a conceptual existence (for
example a company, job, or university course). The graphical representation of an entity
is a rectangle; for example, person and city are both entities in the ER diagram above.

Example:
Person: STUDENT, EMPLOYEE, CLIENT
Object: COUCH, AIRPLANE, MACHINE
Place: CITY, NATIONAL PARK, ROOM, WAREHOUSE
2. Entity Type and Entity Set:
An entity type defines a collection of entities that have same attributes. An entity
instance is a single item in this collection. An entity set is a set of entity instances i.e.
a collection of similar entities.
Example: STUDENT is an entity type; a student with ID number 555-55-5555 is an entity
instance; and a collection of all students is an entity set.


In the above figure there are two entity types, named employee and company, each with its
list of attributes.
Entity set:
The collection of all entities of a particular entity type in the database at any point of time is
called an entity set or extension of the entity type.
3. Attributes:
Attribute names (or simply attributes) are properties of entity types. An attribute is a
property or characteristic of an entity type that is of interest to an organization. Some
attributes of common entity types include the following:
Example:
STUDENT = {Student ID, Name, Address, Phone, Email, DOB}
ORDER = {Order ID, Date of Order, Amount of Order}
ACCOUNT = {Account Number, Account Type, Date Opened, Balance}
CITY = {City Name, State, Population}
Types of Attributes:
(i) Simple and Composite Attributes:
Simple attributes:
A simple (or atomic) attribute is an attribute that cannot be
further divided into smaller components. E.g. City, State,
Employee Id etc.
Composite Attribute:
A composite attribute, however, can be divided into smaller subparts in which
each subpart represents an independent attribute. E.g. Name and Address.

(ii) Single-Valued and Multi-Valued Attributes:
Single-Valued Attribute:
Most attributes have a single value for an entity instance; such attributes are
called single-valued attributes. E.g. Roll No, DOB, Age etc.
Multi-Valued Attribute:
A multi-valued attribute, on the other hand, may have more than one value
for an entity instance. E.g. Languages, which stores the names of the
languages that a student speaks. Since a student may speak several
languages, it is a multi-valued attribute. Such attributes are represented by
double ovals in an ER diagram.


(iii) Stored and Derived Attributes:
Stored attributes:
A stored attribute is an attribute whose value is stored in the database and from
which the value of another attribute may be derived. For example, the age of a
person can be calculated from the person's date of birth and the present date. The
difference between these two dates gives the value of age. In this case, date of
birth is a stored attribute and age of the person is the derived attribute.
Derived attributes:
A derived attribute is an attribute whose value is derived or calculated from
stored attributes. Derived attributes are usually created by a formula or by
a summary operation on other attributes. The value of a derived
attribute can be determined by analyzing other attributes. For
example, Age is a derived attribute because its value can be derived from the
current date and the attribute Date of Birth. An attribute whose value cannot be
derived from the values of other attributes is called a stored attribute.
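The Age example can be made concrete with a short function. This is a minimal sketch in Python; the function name age_on and the sample dates are assumptions, and age is counted in completed years.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in completed years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not yet been reached.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Date of birth is the stored attribute; age is computed whenever it is needed.
example_age = age_on(date(2000, 6, 15), date(2024, 6, 14))
print(example_age)
```

Because age changes with the current date, computing it on demand avoids storing a value that would go out of date.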

(iv) Primary Key Attribute:
A key attribute (or identifier) is an attribute which uniquely
identifies an individual instance of an entity type. No two instances
within an entity set can have the same key attribute value. For the
STUDENT entity Student ID is the key attribute since each student identification
number is unique. Name, by contrast, cannot be an identifier because two students
can have the same name. Sometimes no single attribute can uniquely identify an
instance of an entity type. However, in these circumstances, we identify a set of
attributes that, when combined, is unique for each entity instance. In this case the
key attribute, also known as composite key, is not a simple attribute, but a
composite attribute that uniquely identifies each entity instance.

4. Relationships:
A relationship represents an association among two or more entities. E.g. the association
between teacher and student.
Relationship set and Relationship Type:
A relationship set is a grouping of all matching relationship instances, and the term
relationship type refers to the relationship between entity types.

5. Degree of a Relationship:

The number of entity sets that participate in a relationship is called the degree of
relationship. The three most common degrees of a relationship in a database are unary
(degree 1), binary (degree 2), and ternary (degree 3).

a) Unary Relationship:
A unary relationship R is an
association between two instances of
the same entity type. This type of
relationship is called a recursive
relationship. For example, Employee
reports to Employee.

b) Binary Relationship:
A binary relationship R is an
association between two instances of
two different entity types. For example,
in college, a binary relationship exists
between a student (STUDENT entity)
and an instructor (FACULTY entity) of a single class; an instructor teaches a student.

c) Ternary Relationship:
A ternary relationship R is an association between three instances of three different
entity types. For example, consider a student using certain equipment for a project. In
this case, the STUDENT, PROJECT, and EQUIPMENT entity types relate to each
other with ternary relationships: a student checks out equipment for a project.

6. Role Name:

The role name signifies the role that a participating entity plays in each relationship
instance and helps to explain what the relationship means. For example, in the
WORKS_FOR relationship type, EMPLOYEE plays the role of employee or worker
and DEPARTMENT plays the role of department or employer.
Constraints on Relationship Types:
Relationship types usually have certain constraints that limit the possible combinations of
entities that may participate in the corresponding relationship set. For example, If the
company has a rule that each employee must work for exactly one department then we would
like to describe the constraints in the schema. Two types of relationship constraints:
1) Cardinality ratio
2) Participation
7. Cardinality of a Relationship:
The term cardinal number refers to the number used in counting. The cardinality of
relationship represents the minimum/maximum number of instances of entity A that
must/can be associated with any instance of entity B.
Types of Cardinality:

a. One-to-One Relationship:
In a one-to-one relationship, at most one instance of entity A can be associated with a
given instance of another entity B and vice versa. Ex:
1. One person is married to one person only.
2. One manager manages one department.
3. One teacher teaches one student.
b. One-to-Many Relationship:
In a one-to-many relationship, many instances of entity B can be associated with a given
instance of entity A. However, only one instance of entity A can be associated with a given
instance of entity B. Ex:
1. One Employee works on many projects.

2. One Faculty teaches many students.


c. Many-to-Many Relationship:
In a many-to-many relationship, many instances of entity A can be associated with a given
instance of entity B, and, likewise, many instances of entity B can be associated with a
given instance of entity A. Ex:
1. Many employees work on various projects.
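In a relational database a many-to-many relationship such as this one is commonly stored in a separate table with one row per related pair. The sketch below assumes SQLite through Python's sqlite3 module; the table layout, the sample rows and the helper names projects_of and employees_on are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, fname TEXT)")
conn.execute("CREATE TABLE project (pno INTEGER PRIMARY KEY, pname TEXT)")
# Linking table: one row per (employee, project) pair.
conn.execute(
    "CREATE TABLE works_on ("
    " ssn TEXT REFERENCES employee(ssn),"
    " pno INTEGER REFERENCES project(pno),"
    " hours REAL,"
    " PRIMARY KEY (ssn, pno))"
)

# Invented sample data.
conn.executemany("INSERT INTO employee VALUES (?, ?)", [("E1", "Asha"), ("E2", "Ravi")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(1, "Payroll"), (2, "Inventory")])
conn.executemany(
    "INSERT INTO works_on VALUES (?, ?, ?)",
    [("E1", 1, 10.0), ("E1", 2, 5.0), ("E2", 1, 8.0)],
)


def projects_of(ssn):
    """Project numbers that one employee works on."""
    rows = conn.execute("SELECT pno FROM works_on WHERE ssn = ? ORDER BY pno", (ssn,))
    return [row[0] for row in rows]


def employees_on(pno):
    """Employees who work on one project."""
    rows = conn.execute("SELECT ssn FROM works_on WHERE pno = ? ORDER BY ssn", (pno,))
    return [row[0] for row in rows]


print(projects_of("E1"), employees_on(1))
```

One employee appears with several projects and one project with several employees, which is exactly the many-to-many case.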

Notations/Representations for cardinalities

8. Participation Constraints:
The participation constraint specifies whether the existence of an entity depends on it
being related to another entity via the relationship type.

There are two types of participation constraints:


1) Total Participation:
Example: If a company policy states that every employee must work for a department,
then an employee entity can exist only if it participates in a WORKS_FOR
relationship instance.
Thus the participation of EMPLOYEE in WORKS_FOR is called total participation
meaning that every entity in the total set of employee entities must be related to a
department entity via WORKS_FOR.
2) Partial Participation:
On the other hand, we do not expect every employee to manage a department, so the
participation of EMPLOYEE in the MANAGES relationship type is partial, meaning that
some (part of the set of) employee entities are related to a department entity via
MANAGES, but not necessarily all.
SYMBOLS USED IN E-R DIAGRAM:

STRONG & WEAK ENTITY SET:


1) Weak entity set:
An entity set that does not possess sufficient attributes to form a primary key is
called a weak entity set.
2) Strong entity set:
An entity set that has a primary key is called a strong entity set.
For Ex: The entity set TRANSACTION has 3 attributes:
I. Transaction number.
II. Transaction date.
III. Transaction amount.

Although each transaction is distinct, different transactions on different accounts could
share the same number. Thus, this entity set does not have a primary key, and TRANSACTION
is a weak entity set.
A member of a strong entity set is a dominant entity. A member of a weak entity set is a
subordinate entity. A weak entity set does not have a primary key, but we need a means of
distinguishing among the entities. The discriminator of a weak entity set is a set of
attributes that allows this distinction to be made.
So the primary key of a weak entity set is formed by taking the primary key of
the strong entity set on which the existence of the weak entity set depends, plus the weak
entity set's discriminator.
For Ex:
Consider the following:

In this example, (Loan-no, Payment-no) acts as the primary key for the payment entity set.
The relationship between a weak entity set and its strong entity set is called an
Identifying Relationship.
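The way a weak entity set borrows the key of its strong entity set can be sketched as a composite primary key. The example assumes SQLite through Python's sqlite3 module; the column names amount and payment_amount and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Strong entity set: loan has its own primary key.
conn.execute("CREATE TABLE loan (loan_no TEXT PRIMARY KEY, amount REAL)")

# Weak entity set: payment_no is only the discriminator, so the primary key
# combines it with the key of the owning loan.
conn.execute(
    "CREATE TABLE payment ("
    " loan_no TEXT NOT NULL REFERENCES loan(loan_no),"
    " payment_no INTEGER NOT NULL,"
    " payment_amount REAL,"
    " PRIMARY KEY (loan_no, payment_no))"
)

conn.execute("INSERT INTO loan VALUES ('L1', 1000.0)")
conn.execute("INSERT INTO loan VALUES ('L2', 2000.0)")
conn.execute("INSERT INTO payment VALUES ('L1', 1, 100.0)")
# Same payment number on a different loan is allowed.
conn.execute("INSERT INTO payment VALUES ('L2', 1, 150.0)")

# Repeating a payment number within the same loan violates the composite key.
try:
    conn.execute("INSERT INTO payment VALUES ('L1', 1, 999.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

payment_count = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
print(duplicate_rejected, payment_count)
```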

CONCEPT OF SUBCLASS AND SUPERCLASS IN ER MODEL:


An entity type is used to represent both a type of entity and the entity set or collection of
entities of that type that exist in the database.
For example, the entity type EMPLOYEE describes the type (i.e. attributes and relationships)
of each employee entity and also refers to the current set of employee entities in the
COMPANY database.
In many cases, an entity type has numerous sub groupings of its entities that are meaningful
and need to be represented explicitly because of their significance to the database application.
For example, the entities that are members of the EMPLOYEE entity type may be grouped
further into SECRETARY, ENGINEER, MANAGER, TECHNICIAN, SALARIED_EMPLOYEE and so on.

The set of entities in each of the latter groupings is a subset of the entities that belong
to the EMPLOYEE entity set, meaning that every entity that is a member of one of these
sub groupings is also an employee. We call each of these sub groupings a subclass of the
EMPLOYEE entity type, and the EMPLOYEE entity type is called the super class for each of
these subclasses.
We call the relationship between a super class and a subclass a super class/subclass
relationship.
GENERALIZATION, SPECIALIZATION AND AGGREGATION:
1. Specialization:
It is the process of defining a set of subclasses or subsets of a super-class.
The set of subclasses is based upon some distinguishing characteristics of the entity in
the super class.
It is a top down process of defining super classes and their related subclasses.
Example: {SECRETARY, ENGINEER, TECHNICIAN} is a specialization of
EMPLOYEE based upon job type.

Another specialization of EMPLOYEE, based on method of pay, is
{SALARIED_EMPLOYEE, HOURLY_EMPLOYEE, PART-TIME_EMPLOYEE}.

2. Generalization:
The reverse of the specialization process is generalization.
Generalization refers to the process of identifying some common characteristics of a
collection of entity sets and creating a new entity set that contains entities possessing
these common features.
Several classes with common features are generalized into a super class; the original
classes become its subclasses.
Example: CAR and TRUCK are generalized into VEHICLE; both CAR and TRUCK become
subclasses of the super class VEHICLE.
We can view {CAR, TRUCK} as a specialization of VEHICLE.
Alternatively, we can view VEHICLE as a generalization of CAR and TRUCK.
It is a bottom-up design process which combines a number of entity sets that share the
same features into a higher-level entity set.
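The CAR/TRUCK/VEHICLE example maps naturally onto class inheritance. The sketch below uses Python classes as an analogy only; the attributes vehicle_id, price, passengers and tonnage are assumptions and do not come from the handout.

```python
class Vehicle:
    """Super class: holds the features common to every vehicle."""

    def __init__(self, vehicle_id: str, price: float) -> None:
        self.vehicle_id = vehicle_id
        self.price = price


class Car(Vehicle):
    """Subclass: inherits the common features and adds its own."""

    def __init__(self, vehicle_id: str, price: float, passengers: int) -> None:
        super().__init__(vehicle_id, price)
        self.passengers = passengers


class Truck(Vehicle):
    """Subclass: inherits the common features and adds its own."""

    def __init__(self, vehicle_id: str, price: float, tonnage: float) -> None:
        super().__init__(vehicle_id, price)
        self.tonnage = tonnage


fleet = [Car("V1", 9000.0, 5), Truck("V2", 30000.0, 12.5)]

# Every member of a subclass is also a member of the super class.
all_are_vehicles = all(isinstance(v, Vehicle) for v in fleet)

# The common attribute is available on every subclass instance.
shared_ids = [v.vehicle_id for v in fleet]
print(all_are_vehicles, shared_ids)
```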


3. Aggregation:
Aggregation is the abstraction concept for building a higher-level object by compiling
information on an object.
One disadvantage of the ER model is that it cannot express relationships among
relationships. Aggregation overcomes this problem.
Aggregation shows a has-a or is-part-of relationship between entity types, where one
represents the whole and the other represents the part.
There are cases where this concept can be related:
1) The situations in which we aggregate attribute values of an object to form the whole
object.
2) When we represent an aggregation relationship as an ordinary relationship.
We call the relationship between a primitive object and its aggregate object
IS_A_PART_OF; the inverse is called IS_A_COMPONENT_OF.

(i) E-R Diagram without Aggregation

(ii) E-R Diagram with Aggregation

SIGNIFICANCE OF ER MODEL:
1. An ER model maps well to the relational model i.e. it can be easily transformed into
tables.
2. An ER model can be easily used by the database designer to communicate the design
to the end user.
3. An ER model can be used as a design plan to implement a data model in DBMS
software.
ENTITY v/s ATTRIBUTE:
Sometimes it may not be clear whether a property should be modeled as an attribute or as an
entity. For Ex: consider adding address to the employee entity. One option is to use an
attribute address, but this is appropriate only if we need to record just one address per
employee. The alternative is to create an entity address and to record associations
between employees and addresses using a relationship. This more complex alternative is
necessary in two situations:
1. We have to record more than one address for an employee.


2. We want to capture the structure of an address in our ER diagram. For Ex: we might
break down an address into city, state, country and zip code. By representing an
address as an entity with these attributes, we can support queries such as "Find all
employees with an address in New Delhi."
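Both situations can be sketched with address held in its own table. The example assumes SQLite through Python's sqlite3 module; the table layout, the address_id column and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT)")
# Address as an entity of its own: several rows may belong to one employee.
conn.execute(
    "CREATE TABLE address ("
    " address_id INTEGER PRIMARY KEY,"
    " ssn TEXT REFERENCES employee(ssn),"
    " city TEXT, state TEXT, country TEXT, zip_code TEXT)"
)

# Invented sample data.
conn.executemany("INSERT INTO employee VALUES (?, ?)", [("E1", "Asha"), ("E2", "Ravi")])
conn.executemany(
    "INSERT INTO address (ssn, city, state, country, zip_code) VALUES (?, ?, ?, ?, ?)",
    [
        ("E1", "New Delhi", "Delhi", "India", "110001"),
        ("E1", "Mumbai", "Maharashtra", "India", "400001"),
        ("E2", "Pune", "Maharashtra", "India", "411001"),
    ],
)

# Find all employees with an address in New Delhi.
employees_in_new_delhi = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT e.name FROM employee e "
        "JOIN address a ON a.ssn = e.ssn "
        "WHERE a.city = ? ORDER BY e.name",
        ("New Delhi",),
    )
]
print(employees_in_new_delhi)
```

Employee E1 has two addresses, and the query filters on the city part alone, neither of which a single address attribute would support directly.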
EXTRA PART:
Q: WHEN NOT TO USE A DBMS:


Q: DATA ABSTRACTION:
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many
database-system users are not computer trained, developers hide the complexity from users
through several levels of abstraction, to simplify users' interactions with the system:

Physical Level: The lowest level of abstraction describes how the data are actually
stored. The physical level describes complex low-level data structures in detail.

Logical Level: The next-higher level of abstraction describes what data are stored in
the database, and what relationships exist among those data. The logical level thus
describes the entire database in terms of a small number of relatively simple
structures. Although implementation of the simple structures at the logical level may
involve complex physical-level structures, the user of the logical level does not need
to be aware of this complexity. Database administrators, who must decide what
information to keep in the database, use the logical level of abstraction.

The three levels of data abstraction

View Level: The highest level of abstraction describes only part of the entire
database. Even though the logical level uses simpler structures, complexity remains
because of the variety of information stored in a large database. Many users of the
database system do not need all this information; instead, they need to access only a
part of the database. The view level of abstraction exists to simplify their interaction
with the system. The system may provide many views for the same database.
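The view level can be illustrated with a SQL view that exposes only part of a table. The sketch below assumes SQLite through Python's sqlite3 module; the employee table, the view name employee_directory and the sample row are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full employee relation, including salary.
conn.execute(
    "CREATE TABLE employee (ssn TEXT PRIMARY KEY, fname TEXT, salary REAL, dno INTEGER)"
)
conn.execute("INSERT INTO employee VALUES ('E1', 'Asha', 50000.0, 5)")

# View level: a view that shows only the part of the data some users need.
conn.execute("CREATE VIEW employee_directory AS SELECT fname, dno FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Users of this view never see the ssn or salary columns, and several different views can be defined over the same table.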
