
DBMS Lecture Note

Prepared By:
Milan Vachhani
Assistant Professor, MCA Department,
B. H. Gardi College of Engineering and Technology, Rajkot
M 9898626213
milan.vachhani@gmail.com
http://milanvachhani.blogspot.com
&
Research Scholar, Department of Computer Science,
Saurashtra University, Rajkot

Unit -1
What is Data?
Data is a collection of facts from which conclusions may be drawn.
In computer science, data is anything in a form suitable for use with a computer. Data is
often distinguished from programs. A program is a set of instructions that detail a task for the
computer to perform. In this sense, data is thus everything that is not program code.
What is a Database?
A database is a collection of data that is organized so that its contents can easily be accessed,
managed, and updated.
A database is a collection of data, typically describing the activities of one or more related
organizations.
A database is a structured collection of records or data that is stored in a computer system.

Entity / Table (each column is an attribute / field; each row is a tuple / record)

Name     Address         City     PIN      Mobile
Raj      Lakhsminagar    Rajkot   360001   9898756214
Deepak   Rudapark        Baroda   524413   9847562399
Vijay    Kalawad Road    Ah.bad   985542   9984325678
Dhaval   Punit society   Surat    254412   9945678835
...

Table: A table is a collection of data arranged in row and column format. A database may
contain one or more tables.
Entity: An entity is a distinguishable object of the real world.
E.g. Student, Customer, Employee, etc.
Attributes: Attributes are the set of properties possessed by an entity.
E.g. Name, Address, City, Mobile, etc.
Fields: The title of a column that holds a specific type of data is known as a field. The
maximum number of fields per table depends on the DBMS (for example, 255 in Microsoft Access).
Tuples: Each row in a table is a tuple.
Records: The collection of data stored horizontally, one value for each field, is known as a
record. A record is the complete information about one entity.
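
The terms above can be tried out directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical table name Person (the notes do not name the table); it loads the sample data from the table above and reads back the fields and records.

```python
import sqlite3

# In-memory database; the table name "Person" is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (Name TEXT, Address TEXT, City TEXT, PIN TEXT, Mobile TEXT)"
)

# Each tuple below is one record (row) of the table.
rows = [
    ("Raj", "Lakhsminagar", "Rajkot", "360001", "9898756214"),
    ("Deepak", "Rudapark", "Baroda", "524413", "9847562399"),
    ("Vijay", "Kalawad Road", "Ah.bad", "985542", "9984325678"),
    ("Dhaval", "Punit society", "Surat", "254412", "9945678835"),
]
conn.executemany("INSERT INTO Person VALUES (?, ?, ?, ?, ?)", rows)

# The attributes (fields) are the column names of the table.
cursor = conn.execute("SELECT * FROM Person")
fields = [column[0] for column in cursor.description]
records = cursor.fetchall()

print(fields)
print(len(records), "records")
```

Here the table corresponds to the entity, each column name to an attribute (field), and each row returned by the query to a tuple (record).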

Database, Database Management System (DBMS), Database Systems

A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database,
contains information relevant to an enterprise.
The primary goal of a DBMS is to provide a way to store and retrieve database information
that is both convenient and efficient.
Database systems are designed to manage large bodies of information. Management of
data involves both defining structures for storage of information and providing mechanisms
for the manipulation of information. In addition, the database system must ensure the
safety of the information stored, despite system crashes or attempts at unauthorized
access. If data are to be shared among several users, the system must avoid possible
abnormal results.
Databases are usually designed to manage large bodies of information. This involves:
- Definition of structures for information storage (data modeling).
- Provision of mechanisms for the manipulation of information (file and systems
  structure, query processing).
- Providing for the safety of information in the database (crash recovery and security).
- Concurrency control if the system is shared by users.
Because information is so important in most organizations, computer scientists have
developed a large body of concepts and techniques for managing data.


Database-System Applications
Some representative applications are:
Enterprise Information
Sales: For customer, product, and purchase information.
Accounting: For payments, receipts, account balances, assets and other accounting
information.
Human resources: For information about employees, salaries, payroll taxes, and
benefits, and for generation of paychecks.
Manufacturing: For management of the supply chain and for tracking production of
items in factories, inventories of items in warehouses and stores, and orders for items.
Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
Banking and Finance
Banking: For customer information, accounts, loans, and banking transactions.
Credit card transactions: For purchases on credit cards and generation of monthly
statements.
Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds; also for storing real-time market data to enable
online trading by customers and automated trading by the firm.
Universities: For student information, course registrations, and grades (in addition to standard
enterprise information such as human resources and accounting).
Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.
Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.


Purpose of DBMS
In the early days, database applications were built directly on top of file systems.
A typical file-processing system is supported by a conventional operating system.
The system stores permanent records in various files, and it needs different application
programs to extract records from, and add records to, the appropriate files. Before database
management systems (DBMSs) were introduced, organizations usually stored information in
such systems.
Keeping organizational information in a file-processing system has a number of major
disadvantages:
1) Data redundancy and inconsistency
The same information is stored in multiple files, so duplication of information occurs.
A change must be made in every copy; otherwise the data become inconsistent.
2) Difficulty in accessing data
A new program has to be written to carry out each new task.
Conventional file-processing environments do not allow needed data to be retrieved in a
convenient and efficient manner. More responsive data-retrieval systems are required for
general use.
3) Data isolation (multiple files and formats)
Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
4) Integrity problems
The data values stored must satisfy certain consistency constraints (e.g. account balance > 0).
Developers enforce these constraints by adding appropriate code in the various application
programs, so the constraints become buried in program code rather than being stated explicitly.
When new constraints are added, it is difficult to change the programs to enforce them. The
problem is compounded when constraints involve several data items from different files.
5) Atomicity of updates
Failures may leave the database in an inconsistent state with partial updates carried out.
Example: a transfer of funds from one account to another must happen in its entirety or
not at all. It is difficult to ensure atomicity in a conventional file-processing system.
6) Concurrent access by multiple users
Concurrent access is needed for performance.
Uncontrolled concurrent accesses can lead to inconsistencies.
Example: two people reading a balance and updating it at the same time.


7) Security problems
Not every user of the database system should be able to access all the data.
For example, in a university, payroll personnel need to see only that part of the
database that has financial information. They do not need access to information about
academic records.
In a file-processing system, enforcing such security constraints is difficult.
These difficulties, among others, prompted the development of database systems.
In what follows, we shall see the concepts and algorithms that enable database systems to
solve the problems with file-processing systems.
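
The funds-transfer example under "Atomicity of updates" can be sketched with a transaction. This is a minimal illustration, assuming Python's built-in sqlite3 module; the Account table, the account numbers and the transfer function are hypothetical. The constraint "account balance > 0" is stated once in the schema, and a transfer that would violate it is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity constraint is stated explicitly in the schema, not in program code.
conn.execute(
    "CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, "
    "Balance INTEGER CHECK (Balance > 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount between accounts; both updates succeed or neither does."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Acc_No = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Acc_No = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A", "B", 30)       # both updates are applied
failed = transfer(conn, "A", "B", 500)  # would overdraw A, so the credit to B is undone
balances = dict(conn.execute("SELECT Acc_No, Balance FROM Account"))
print(ok, failed, balances)
```

In the second call the credit to B is executed first and then undone when the debit fails, which is exactly the partial update that a plain file-processing program would have left behind.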

Advantages of Database Management System


A Database Management System (DBMS) aids in the storage, control, manipulation and retrieval of
data. A DBMS is a software program used to store, delete, update and retrieve data.
A database can be limited to a single desktop computer or can be stored on large server
machines, such as an IBM mainframe. There are various database management systems
available in the market. Some of them are Sybase, Microsoft SQL Server, Oracle RDBMS,
PostgreSQL, MySQL, etc.
Following are the advantages of database management systems.
1) Warehouse of Information:
Database management systems are warehouses of information, where large amounts of
data can be stored. Common examples in commercial applications are inventory data,
personnel data, etc.
People often use a database management system without even realizing it. Good examples are
the address book of a cell phone, digital diaries, etc. Both of these devices store data in
an internal database.


2) Defining Attributes:
The unique data field in a table is assigned as the primary key. The primary key helps in the
identification of data. It also prevents duplicates within the same table, thereby reducing
data redundancy.
Some tables have a foreign key (sometimes loosely called a secondary key) in addition to the
primary key. The foreign key refers to the primary key of another table, thus establishing a
relationship between the two tables.
3) Systematic Storage:
The data is stored in the form of tables. A table consists of rows and columns. The primary
and foreign keys help to eliminate data redundancy, enabling systematic storage of data.
4) Changes to Schema:
The table schema can be changed and it is not platform dependent. Therefore, the tables in
the system can be edited to add new columns and rows without hampering the applications
that depend on that particular database.
5) No Language Dependence:
Database management systems are not language dependent. Therefore, they can be
used with various languages and on various platforms.
6) Table Joins:
The data in two or more tables can be combined into a single result table. This helps to reduce
the size of the database and also helps in easy retrieval of data.
7) Multiple Simultaneous Usage:
The database can be used simultaneously by a number of users. Various users can retrieve
the same data simultaneously. The data in the database can also be modified, based on the
privileges assigned to users.
8) Data Security:
Data is the most important asset. Therefore, there is a need for data security. Database
management systems help to keep the data secured.
9) Privileges:
Different privileges can be given to different users. For example, some users can edit the
database, but are not allowed to delete the contents of the database.
10) Abstract View of Data and Easy Retrieval:
A DBMS enables easy and convenient retrieval of data. A database user sees only an
abstract form of the data; the complexities of the internal structure of the database are hidden
from the user. The data fetched is in a user-friendly format.


11) Data Consistency:
Data consistency ensures a consistent view of data to every user. It includes the accuracy,
validity and integrity of related data. The data in the database must satisfy certain
consistency constraints; for example, the age of a candidate appearing for an exam should
be of a numeric data type and in the range 20-25. When the database is updated, these
constraints are checked by the database system.
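
The age constraint mentioned under "Data Consistency" can be written as a CHECK clause. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical Candidate table; it shows the DBMS itself rejecting an update that breaks the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause states the consistency constraint explicitly in the schema.
conn.execute(
    "CREATE TABLE Candidate ("
    "Name TEXT NOT NULL, "
    "Age INTEGER NOT NULL CHECK (Age BETWEEN 20 AND 25))"
)

conn.execute("INSERT INTO Candidate VALUES (?, ?)", ("Raj", 22))  # accepted

try:
    conn.execute("INSERT INTO Candidate VALUES (?, ?)", ("Vijay", 30))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refused the out-of-range age

count = conn.execute("SELECT COUNT(*) FROM Candidate").fetchone()[0]
print(rejected, count)
```

Because the rule lives in the schema, every application that updates the table is subject to it without having to repeat the check in its own code.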

View of Data
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data.
A major purpose of a database system is to provide users with an abstract view of the data. That is,
the system hides certain details of how the data are stored and maintained.

Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many
database-system users are not computer trained, developers hide the complexity from users
through several levels of abstraction, to simplify users' interactions with the system:
Physical level
The lowest level of abstraction describes how the data are actually stored. The physical
level describes complex low-level data structures in detail.
Logical level
The next-higher level of abstraction describes what data are stored in the database, and
what relationships exist among those data. The logical level thus describes the entire
database in terms of a small number of relatively simple structures.
Although implementation of the simple structures at the logical level may involve
complex physical-level structures, the user of the logical level does not need to be aware
of this complexity. This is referred to as physical data independence.
Database administrators, who must decide what information to keep in the database, use
the logical level of abstraction.
View level
The highest level of abstraction describes only part of the entire database. Even though
the logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database.

Many users of the database system do not need all this information; instead, they need
to access only a part of the database. The view level of abstraction exists to simplify their
interaction with the system. The system may provide many views for the same database.
Figure: The relationship among the three levels of data abstraction (physical, logical and view).
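
The view level can be illustrated with an SQL view. This is a minimal sketch, assuming Python's built-in sqlite3 module; the Employee table and the EmployeeDirectory view are hypothetical names. The view exposes only part of the logical schema, so a user of the view never sees the salary column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Department TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("Raj", "MCA", 50000), ("Deepak", "Accounts", 42000)],
)

# A view exposes only part of the logical schema: here the salary is hidden.
conn.execute("CREATE VIEW EmployeeDirectory AS SELECT Name, Department FROM Employee")

cursor = conn.execute("SELECT * FROM EmployeeDirectory ORDER BY Name")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Several such views can be defined over the same table, one for each group of users.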

Instances and Schemas


Instance
Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the
database.
Schema
The overall design of the database is called the database schema.
Schemas are changed infrequently, if at all.
Example:
The concept of database schemas and instances can be understood by correspondence
to a program written in a programming language. A database schema corresponds to the
variable declarations (along with associated type definitions) in a program. Each variable
has a particular value at a given instant. The values of the variables in a program at a
point in time correspond to an instance of a database schema.
Database systems have several schemas, partitioned according to the levels of abstraction.
The physical schema describes the database design at the physical level, while the logical
schema describes the database design at the logical level.
A database may also have several schemas at the view level, sometimes called sub schemas,
that describe different views of the database.
Of these, the logical schema is by far the most important, in terms of its effect on application
programs, since programmers construct applications by using the logical schema. The
physical schema is hidden beneath the logical schema, and can usually be changed easily
without affecting application programs. Application programs are said to exhibit physical
data independence if they do not depend on the physical schema, and thus need not be
rewritten if the physical schema changes.
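
Physical data independence can be observed in a small experiment. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the Instructor table and the index name are hypothetical. Adding an index changes only the physical schema, and the application query runs unchanged with the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Instructor (ID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO Instructor VALUES (?, ?)",
    [(1, "Raj"), (2, "Deepak"), (3, "Vijay")],
)

query = "SELECT Name FROM Instructor WHERE ID = ?"
before = conn.execute(query, (2,)).fetchall()

# Change the physical schema only: add an index on ID.
conn.execute("CREATE INDEX idx_instructor_id ON Instructor (ID)")

# The application query is unchanged and returns the same answer.
after = conn.execute(query, (2,)).fetchall()
print(before == after, after)
```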

Data Models
A data model is a collection of tools for describing:
- Data
- Data relationships
- Data semantics
- Data constraints
The main data models are:
- Relational Model
- Entity Relationship Model (for database design)
- Object-based Data Model (object-oriented)
- Semistructured Data Model (XML)
Other, older models:
- Network Model
- Hierarchical Model
1) Relational Data Model
This model uses a collection of tables to represent both data and the relationships
among those data.
Each table has multiple columns and each column has a unique name.
It is an example of a record-based model.
The database is structured in fixed-format records of several types.
This is the most widely used data model.

Entity / Table (each column is an attribute / field; each row is a tuple / record)

Name     Address         City     PIN      Mobile
Raj      Lakhsminagar    Rajkot   360001   9898756214
Deepak   Rudapark        Baroda   524413   9847562399
Vijay    Kalawad Road    Ah.bad   985542   9984325678
Dhaval   Punit society   Surat    254412   9945678835
...

Example of tabular data in the relational model


2) Entity Relationship Model


It is based on a perception of the real world that consists of a collection of basic objects
called entities, and of relationships among these objects.
An entity is a thing or object in the real world that is distinguishable from other objects.
For example, a person is an entity, and a bank account can be considered an entity.
Entities are described in a database by a set of attributes.
A relationship is an association among several entities.

Figure: A simple E-R diagram for the Customer and Account relationship.
Legend: Entity, Attribute, Relation, Flow of Relationship.
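
One common way to carry the Customer and Account relationship into tables is sketched below. It assumes Python's built-in sqlite3 module; the column names, the sample rows and the relationship table name Depositor are assumptions for illustration, not taken from the diagram. Each entity becomes a table, and the relationship becomes a table of foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Customer (Cust_ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Account  (Acc_No TEXT PRIMARY KEY, Balance INTEGER);
    -- The relationship between the two entities becomes its own table.
    CREATE TABLE Depositor (
        Cust_ID INTEGER REFERENCES Customer (Cust_ID),
        Acc_No  TEXT    REFERENCES Account (Acc_No),
        PRIMARY KEY (Cust_ID, Acc_No)
    );
    INSERT INTO Customer VALUES (1, 'Raj'), (2, 'Deepak');
    INSERT INTO Account VALUES ('A-101', 500), ('A-102', 900);
    INSERT INTO Depositor VALUES (1, 'A-101'), (2, 'A-102');
    """
)

# Follow the relationship from customers to their accounts with a join.
holdings = conn.execute(
    "SELECT c.Name, a.Acc_No, a.Balance "
    "FROM Customer c "
    "JOIN Depositor d ON d.Cust_ID = c.Cust_ID "
    "JOIN Account a ON a.Acc_No = d.Acc_No "
    "ORDER BY c.Name"
).fetchall()

try:
    conn.execute("INSERT INTO Depositor VALUES (3, 'A-101')")  # there is no customer 3
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(holdings, orphan_rejected)
```

The rejected insert shows the foreign key doing its job: a relationship row cannot refer to an entity that does not exist.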

3) Object Data Model
It can be seen as extending the E-R model with notions of:
- Encapsulation
- Methods (functions)
- Object identity
The object-relational data model combines the features of the object-oriented data model and
the relational data model.


4) Network Model
This model organizes data using two fundamental constructs, called records and sets.
Records contain fields, and sets define one-to-many relationships between records: one
owner, many members.

Access to the database was not via SQL query strings, but by a specific set of APIs, typically
for FIND, CREATE, READ, UPDATE and DELETE.
Each API would only access a single table (dataset), so it was not possible to implement a
JOIN which would return data from several tables.
It was not possible to provide a variable WHERE clause. The only selection mechanisms
available were:
- Read all entries (a full table scan).
- Read a single entry using a specific primary key.
- Read all entries on a child table which were associated with a selected entry on a
  parent table.
Any further filtering had to be done within the application code.
It was not possible to provide an ORDER BY clause. Data was presented in the order in
which it existed in the database. This mechanism could be tuned by specifying sort criteria
to be used when each record was inserted, but this had several disadvantages:
- Only a single sort sequence could be defined for each path (link to a parent), so all
  records retrieved on that path would be provided in that sequence.
- It could make inserts rather slow when attempting to insert into the middle of a large
  collection, or where a table had multiple paths each with its own set of sort criteria.
5) Hierarchical Model
In this model data is organized into a tree-like structure, implying a single upward link in
each record to describe the nesting, and a sort field to keep the records in a particular
order in each same-level list.

A hierarchical database consists of the following:


1. It contains nodes connected by branches.
2. The top node is called the root.
3. If multiple nodes appear at the top level, the nodes are called root segments.
4. The parent of node nx is a node directly above nx and connected to nx by a branch.
5. Each node (with the exception of the root) has exactly one parent.
6. The child of node nx is the node directly below nx and connected to nx by a branch.
7. One parent may have many children.
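
The properties listed above can be sketched as a small tree structure. This is only an in-memory illustration in Python, not how a hierarchical DBMS stores its records; the Node class and the sample data are hypothetical. Each node keeps a single upward link to its parent and a list of its children.

```python
class Node:
    """A record in a hierarchy: one parent (or none for the root), many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # the single upward link
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Names from the root down to this node, found by following parent links."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Hypothetical data: a college, its departments, and a course.
root = Node("College")
mca = Node("MCA", parent=root)
mba = Node("MBA", parent=root)
dbms = Node("DBMS", parent=mca)

print(dbms.path())
print([child.name for child in root.children])
```

Because every node except the root has exactly one parent, there is exactly one path from the root to any record.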

By introducing data redundancy, complex network structures can also be represented as
hierarchical databases. This redundancy is eliminated in physical implementation by
including a 'logical child'. The logical child contains no data but uses a set of pointers to
direct the database management system to the physical child in which the data is actually
stored. Associated with a logical child are a physical parent and a logical parent. The logical
parent provides an alternative (and possibly more efficient) path to retrieve logical child
information.

DBMS Architecture

1) Three Levels of Architecture

(A) External Level
Users' view of the database.
Consists of a number of different external views of the database.
Describes part of the DB for a particular group of users.
Provides a powerful and flexible security mechanism by hiding parts of the DB from
certain users. The user is not aware of the existence of any attributes that are
missing from the view.
It permits users to access data in a way that is customized to their needs, so that the
same data can be seen by different users in different ways, at the same time.

(B) Conceptual Level
The logical structure of the entire database as seen by the DBA.
What data is stored in the database.
The relationships among the data.
Complete view of the data requirements of the organization, independent of any
storage consideration.
Represents:
entities, attributes, relations
constraints on data
semantic information on data
security, integrity information
Supports each external view: any data available to a user must be contained in, or
derivable from the conceptual level.

(C) Internal Level
Physical representation of the DB on the computer.
How the data is stored in the database.
Physical implementation of the DB to achieve optimal runtime performance and
storage space utilization.
Storage space allocation for data and indexes
Record description for storage
Record placement
Data Compression, encryption

We are now in a position to provide a single picture (below Figure) of the various components
of a database system and the connections among them.
The architecture of a database system is greatly influenced by the underlying computer system
on which the database system runs. Database systems can be centralized, or client-server,
where one server machine executes work on behalf of multiple client machines. Database
systems can also be designed to exploit parallel computer architectures. Distributed databases
span multiple geographically separated machines.


2) Two-Tier Architecture
Client manages main business and data processing logic and user interface.
Server manages and controls access to database.


3) Three-Tier Architecture
The client side of the two-tier architecture presented two problems preventing true scalability:
- Fat client, requiring considerable resources on the client's computer to run effectively.
- Significant client-side administration overhead.
By 1995, three layers had been proposed, each potentially running on a different platform:
- The user interface layer runs on the client.
- The business logic and data processing layer (the middle tier) runs on a server (application server).
- The DBMS stores the data required by the middle tier. This tier may be on a separate server
  (database server).
Advantages:
Thin client, requiring less expensive hardware.
Application maintenance centralized.
Easier to modify or replace one tier without affecting others.
Separating business logic from database functions makes it easier to implement load
balancing.
Maps quite naturally to Web environment.


Data Storage and Querying


A database system is partitioned into modules that deal with each of the responsibilities of
the overall system. The functional components of a database system can be broadly divided
into the storage manager and the query processor components.
Storage Manager
The storage manager is important because databases typically require a large amount of
storage space. Since the movement of data to and from disk is slow relative to the speed of
the central processing unit, it is imperative that the database system structure the data so as
to minimize the need to move data between disk and main memory.
The storage manager is the component of a database system that provides the interface
between the low-level data stored in the database and the application programs and queries
submitted to the system. The storage manager is responsible for the interaction with the file
manager.
The raw data are stored on the disk using the file system provided by the operating system.
The storage manager translates the various DML statements into low-level file-system
commands.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the
database.
The storage manager components include:
Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to access data.
Transaction manager, which ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed
without conflicting.
File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
Buffer manager, which is responsible for fetching data from disk storage into main
memory, and deciding what data to cache in main memory. The buffer manager is a
critical part of the database system, since it enables the database to handle data sizes
that are much larger than the size of main memory.

The storage manager implements several data structures as part of the physical system
implementation:
- Data files, which store the database itself.
- Data dictionary, which stores metadata about the structure of the database, in
  particular the schema of the database.
- Indices, which can provide fast access to data items. Like the index in a textbook, a
  database index provides pointers to those data items that hold a particular value. For
  example, we could use an index to find the instructor record with a particular ID, or all
  instructor records with a particular name. Hashing is an alternative to indexing that is
  faster in some but not all cases.
The Query Processor
The query processor is important because it helps the database system to simplify and
facilitate access to data. The query processor allows database users to obtain good
performance while being able to work at the view level and not be burdened with
understanding the physical-level details of the implementation of the system.
The query processor components include:
DDL interpreter, which interprets DDL statements and records the definitions in the
data dictionary.
DML compiler, which translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans
that all give the same result. The DML compiler also performs query optimization; that is,
it picks the lowest cost evaluation plan from among the alternatives.
Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
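
The choice between alternative evaluation plans can be observed in practice. The following is a minimal sketch, assuming Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the Instructor table, its generated rows and the plan helper are hypothetical. The same query is planned once without an index and once with one. The exact wording of the plan text varies between SQLite versions, so no output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Instructor (ID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO Instructor VALUES (?, ?)",
    [(i, f"name{i}") for i in range(1000)],
)


def plan(sql):
    """Return the description of each step in SQLite's chosen evaluation plan."""
    # The last column of every EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT Name FROM Instructor WHERE ID = 42"
plan_without_index = plan(query)  # expected to be a scan of the whole table

conn.execute("CREATE INDEX idx_instructor_id ON Instructor (ID)")
plan_with_index = plan(query)     # expected to use the new index instead

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both cases; only the plan chosen for it changes, and both plans give the same result.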

Components of DBMS
The previous section described the DBMS architecture. The components of that architecture are
the components of a DBMS.
Following are some of the components of a DBMS:
DBMS external interfaces
Database language engines (or processors)
Query optimizer
Database engine
Storage engine
Transaction engine
DBMS management and operation component


Data Languages
A database system provides a data-definition language to specify the database schema and
a data-manipulation language to express database queries and updates.
In practice, the data-definition and data-manipulation languages are not two separate
languages; instead they simply form parts of a single database language, such as the widely
used SQL language.
(A) DDL
The Data Definition Language (DDL) is used to create and destroy databases and
database objects. These commands will primarily be used by database administrators
during the setup and removal phases of a database project.
The DDL provides a specific notation for defining the database schema.
Example: CREATE TABLE Account (Acc_No CHAR(10), Balance INTEGER)
The DDL compiler generates a set of tables stored in a data dictionary.
The data dictionary contains metadata (data about data):
- Database schema
- Data storage and definition language: specifies the storage structure and access methods used
- Integrity constraints
  - Domain constraints
  - Referential integrity
  - Assertions
- Authorization
(B) DML
Language for accessing and manipulating the data organized by the appropriate data
model
DML is also known as a query language.
Data manipulation is:
- retrieval of information from the database
- insertion of new information into the database
- deletion of information in the database
- modification of information in the database
There are basically two types:
- Procedural DML: the user specifies what data is required and how to get those data.
- Declarative (nonprocedural) DML: the user specifies what data is required, without
  specifying how to get those data.
SQL is the most widely used query language.
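
The difference between DDL and DML, and between the declarative and procedural styles, can be sketched as follows. The example assumes Python's built-in sqlite3 module; it reuses the Account table from the DDL example, and the sample rows are hypothetical. The Python loop only imitates the procedural style for contrast; it is not a procedural DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: define the schema.
conn.execute("CREATE TABLE Account (Acc_No CHAR(10), Balance INTEGER)")
# DML: insert, modify, and delete data.
conn.executemany(
    "INSERT INTO Account VALUES (?, ?)",
    [("A-101", 500), ("A-102", 900), ("A-103", 50)],
)
conn.execute("UPDATE Account SET Balance = Balance + 100 WHERE Acc_No = 'A-101'")
conn.execute("DELETE FROM Account WHERE Acc_No = 'A-103'")

# Declarative retrieval: say what is wanted; the DBMS decides how to find it.
declarative = conn.execute(
    "SELECT Acc_No FROM Account WHERE Balance >= 600 ORDER BY Acc_No"
).fetchall()

# Procedural style, for contrast: the program spells out how to get the data
# by visiting every record and testing it.
procedural = []
for acc_no, balance in conn.execute("SELECT Acc_No, Balance FROM Account"):
    if balance >= 600:
        procedural.append((acc_no,))
procedural.sort()

print(declarative, procedural)
```

Both approaches return the same accounts, but only the declarative query leaves the choice of access method to the DBMS.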

Dr. E. F. Codd's Rules:
(1) The Information rule: All data should be presented in table form.
(2) The Guaranteed Access rule: All data should be accessible without ambiguity.
(3) The Systematic Treatment of Null Values rule: A field should be allowed to remain empty. This
involves the support of null values, which are distinct from an empty string or a number with a
value of zero.
(4) The Dynamic Online Catalog Based on the Relational Model rule: A relational database must
provide access to its structure through the same tools that are used to access the data.
(5) The Comprehensive Data Sublanguage rule: The database must support one clearly defined
language that includes data definition, data manipulation, data integrity and database
transaction control.
(6) The View Updating rule: All views of the data which are theoretically updatable must be
updatable in practice by the DBMS.
(7) The High-level Insert, Update, and Delete rule: The capability of handling a base relation or a
derived relation as a single operand applies not only to the retrieval of data but also to the
insertion, update, and deletion of data.
(8) The Physical Data Independence rule: Application programs and terminal activities remain
logically unimpaired whenever any changes are made in either storage representations or
access methods.
(9) The Logical Data Independence rule: How data is viewed should not change when the
logical structure of the database changes. This rule is particularly difficult to satisfy.
(10) The Integrity Independence rule: Integrity constraints must be definable in the RDBMS.
(11) The Distribution Independence rule: An RDBMS has distribution independence. Distribution
independence implies that users should not have to be aware of whether a database is
distributed.
(12) The Nonsubversion rule: If the database has any means of handling a single record at a time,
that low-level language must not be able to subvert or avoid the integrity rules which are
expressed in a higher-level language that handles multiple records at a time.
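
Rules (3) and (4) can be checked against a real system. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the Person table and its rows are hypothetical, and sqlite_master is SQLite's own catalog table. It shows that a null value is distinct from an empty string, and that the catalog is queried with the same SELECT statement used for ordinary data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Name TEXT, Mobile TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [("Raj", "9898756214"), ("Deepak", ""), ("Vijay", None)],
)

# Rule 3: NULL (unknown or missing) is distinct from an empty string.
missing = conn.execute("SELECT Name FROM Person WHERE Mobile IS NULL").fetchall()
empty = conn.execute("SELECT Name FROM Person WHERE Mobile = ''").fetchall()

# Rule 4: the catalog is itself queried with ordinary SELECT statements.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(missing, empty, tables)
```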


DBMS v/s RDBMS

DBMS | RDBMS
Data is handled as a file-oriented system. | Data is handled in the form of tables. An RDBMS is based on the relational model, in which data is represented in the form of relations, with enforced relationships between the tables.
A DBMS does not support the client-server architecture. | Most RDBMSs support the client-server architecture.
There is no security of data. | There are multiple levels of security.
A DBMS does not impose any constraints or security with regard to data manipulation. It is the user's or the programmer's responsibility to ensure the ACID properties of the database. | An RDBMS defines integrity constraints for the purpose of upholding the ACID properties.
In a DBMS, the normalization process is not present. | In an RDBMS, the normalization process is present to check the consistency of database tables.
In a DBMS, there is no relationship concept. | An RDBMS is used to establish relationships between two database objects, i.e. tables.
A DBMS does not provide any recovery of data. | An RDBMS helps in recovery of the database in case of loss of data.
A DBMS may satisfy fewer than 7 of Dr. E. F. Codd's rules. | An RDBMS satisfies more than 7 of Dr. E. F. Codd's rules.
E.g. FoxPro | E.g. Oracle, SQL, etc.
