
MCA 327, Distributed Data Base Management System

Distributed Data Base Management System
UNIT-1

© Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Divya Goel, Asst. Professor U1.

Learning Objectives
• Distributed DBMS features and needs
• Reference architecture, levels of distribution
• Transparency, replication, distributed database design: fragmentation, allocation criteria
• Storage mechanisms, translation of global queries / global query optimization, query execution and access plan


A Centralized Database


A Centralized DBMS on a Network

In a centralized DBMS, all of the data is maintained at a single site, as shown in the figure.


Disadvantages of a Centralized Database

• Single point of failure
• Performance bottleneck
• Contention: competition for resources. The term is used especially in networks, where it describes two or more nodes attempting to transmit a message across the same wire at the same time.


Parallel Database Architecture


• Three main architectures have been
proposed for building
parallel DBMSs.
• In a shared-memory system, multiple CPUs
are attached to an interconnection network
and can access a common region of main
memory.
• In a shared-disk system, each CPU has a
private memory and direct access to all
disks through an interconnection network.


Parallel Database Architecture


• In a shared-nothing system, each CPU has
local main memory and disk space, but no
two CPUs can access the same storage
area; all communication between CPUs is
through a network connection.


Parallel Database Architecture

[Figure: (a) Shared memory, (b) Shared disk, (c) Shared nothing. Processors (P), memories and disks are connected through an interconnection network. Shared nothing is marked as the best architecture for parallel DBMSs.]


Parallel Database Architecture


1. The basic problem with the shared-memory and shared-disk architectures is interference.
2. As more CPUs are added, existing CPUs are slowed down because of the increased contention for memory accesses and network bandwidth.


• It has been shown that:
  - An average slowdown of 1% per additional CPU limits the maximum speed-up to a factor of 37.
  - Beyond that point, adding more CPUs actually slows the system down.
  - A system with 1000 CPUs is only 4% as effective as a single CPU.
• These observations motivated the development of the shared-nothing architecture for large parallel database systems.
• Advantages of shared nothing: linear speed-up and linear scale-up.
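The slide does not state its model, but the figures above fit a simple interference model, which is assumed here. If each additional CPU slows every CPU by 1%, then n CPUs deliver a speed-up of n·0.99^(n−1). Here is a minimal Python sketch of that assumed model:

```python
def speedup(n: int, slowdown: float = 0.01) -> float:
    """Effective speed-up of n CPUs when each extra CPU slows every CPU by `slowdown`.

    Assumed interference model: speedup(n) = n * (1 - slowdown) ** (n - 1).
    """
    return n * (1 - slowdown) ** (n - 1)


# Search for the CPU count that maximises the speed-up under a 1% slowdown per CPU.
best_n = max(range(1, 2001), key=speedup)
print(best_n, round(speedup(best_n), 1))  # peak speed-up of about 37
print(round(speedup(1000), 3))            # about 0.044, i.e. roughly 4% of one CPU
```

Under this model the peak falls at roughly 100 CPUs. Past that point each added CPU costs more through interference than it contributes.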

Distributed Database System

• In a distributed database system, data is physically stored across several sites, and each site is typically managed by a DBMS capable of running independently of the other sites.
• The location of the data items and the degree of autonomy of the individual sites have a significant impact on all aspects of the system, including query processing and optimization, concurrency control, and recovery.


Distributed Database System


• In contrast to parallel database systems, the distribution of data is governed by factors such as local ownership and increased availability, in addition to performance-related issues.


Distributed DBMS
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.


Distributed Database System

A distributed database system is the ‘union’ of two opposed technologies:
• Database technology, which centralizes
• Network technology, which distributes


Distributed DBMS
Distributed database: a logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

DDBMS: the software system that permits the management of the distributed database and makes the distribution transparent to users.


What is distributed …
• Processing logic: processing logic or processing elements are distributed, e.g.
  - Inventory
  - Personnel
  - Sales
• Functions: the functions of a system can be delegated to various pieces of hardware or software, e.g.
  - Printing
  - Email


What is distributed …
• Data: data used by a number of applications may be distributed to a number of processing sites.
• Control


Components of DDBMS
The components of a DDBMS are:
• DB: database management component
• DC: data communication component
• DD: data dictionary, which describes how the data is distributed across the network
• DDB: distributed database component, which manages all of the above components


Components of DDBMS

[Figure: Site 1 and Site 2, each with terminals (T), a local database, and the DB, DC, DD and DDB components]


Distributed DBMS Application

[Figure: A Distributed Application. Sites at Sydney, Perth, Darwin and Brisbane, each storing data, are linked by a communications network.]


A Distributed Multi-Branch Banking System: Example

[Figure: branch minicomputers, each with a local database and teller terminals, are linked to a central database; automatic teller terminals and accounting terminals are also shown]

Features of Centralized vs Distributed Databases

• Centralized control vs hierarchical control and site autonomy
• Data independence vs distribution transparency: the actual organization of the data is transparent to the application programmer.
• Reduced redundancy vs redundancy:
  - Increased locality of applications
  - Site failure: if data is replicated, a site failure does not stop applications that can use the copies held at other sites.


Features of Centralized vs Distributed Databases
• Indexing and chaining vs distributed access plans: a distributed access plan consists of the execution of programs that are local to a single site and the transmission of files between sites.


Advantages of DDBMS
1) Improved performance: data is located near the site where it is used.
2) Improved availability: the failure of a node does not make the whole system inoperable.
3) Improved reliability: replicated data remains accessible when a site fails.
4) Organisational structure: many organizations cover several sites.
5) Shareability and local autonomy: users at different sites can share data.

Business Advantages
1) Economics: several smaller computers may be cheaper than a mainframe system.
2) Modular growth: easier expansion.
3) Integration: allows several legacy databases to be combined into one DBMS.


Disadvantages of DDBMS
• Complexity: more complex than a centralized DBMS
• Cost: added network and maintenance costs
• Security: the network must be made secure
• Integrity control is more difficult
• Lack of standards
• Lack of experience: no established tools or methodologies
• Database design is more complex

Reference Architecture
A distributed database facilitates the distribution of data across a wide geographical area. It is a collection of database sites that are mapped onto a single global database.
• Some levels may be missing, depending on the levels of transparency supported.
• It can be homogeneous or heterogeneous.


Reference Architecture
Global schema
   ↓
Fragmentation schema
   ↓
Allocation schema
   ↓
Local mapping schema (one per site)
   ↓
Local schema (one per site)
   ↓
DB site 1, DB site 2, ..., DB site N


Reference Architecture
1. The global schema defines all the data contained in the distributed database as if the database were not distributed at all. In short, the global schema defines the data as a whole.

Global schema: Employee(EmpNo, Ename, Dept)

2. The next layer is the fragmentation schema, which specifies how the global relations are fragmented to serve the purpose of distribution.

Fragmentation schema:
$\text{Employee1} = \sigma_{\text{Dept} = \text{'Mgr'}}(\text{Employee})$
$\text{Employee2} = \sigma_{\text{Dept} = \text{'Sales'}}(\text{Employee})$


Reference Architecture
3. Below the fragmentation schema is the allocation schema, which determines the sites on which each fragment is deployed.

Allocation schema: Employee1 at sites 1 and 2; Employee2 at sites 3 and 4.

4. The subsequent layers exist at the local database sites.


Reference Architecture
(a) The first layer at each local database site is the local mapping schema, which identifies the global relation schema for any local database relation schema. The local mapping schema is what allows the local database sites to be integrated into one single global database.
(b) Below this layer is the local schema of the local DBMS, which is very similar to the three-schema architecture of centralized databases.
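The layers above can be traced for a single tuple. The following is a minimal Python sketch under stated assumptions. It uses the Employee fragmentation and allocation schemas from the example. The local mapping reduces to a lookup from (fragment, site) to a local relation name, and those local names (EMP_MGR, MGR_STAFF, EMP_SALES, SALES_STAFF) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # one tuple of the global relation Employee(EmpNo, Ename, Dept)


@dataclass
class Fragment:
    name: str                         # fragmentation schema: fragment name
    predicate: Callable[[Row], bool]  # fragmentation schema: selection condition
    sites: list[int]                  # allocation schema: sites holding a copy


# Fragmentation and allocation schemas from the Employee example above.
FRAGMENTS = [
    Fragment("Employee1", lambda r: r["Dept"] == "Mgr", [1, 2]),
    Fragment("Employee2", lambda r: r["Dept"] == "Sales", [3, 4]),
]

# Local mapping schema: (fragment, site) -> local relation name (hypothetical names).
LOCAL_MAPPING = {
    ("Employee1", 1): "EMP_MGR",
    ("Employee1", 2): "MGR_STAFF",
    ("Employee2", 3): "EMP_SALES",
    ("Employee2", 4): "SALES_STAFF",
}


def place(row: Row) -> list[tuple[int, str]]:
    """Map a global tuple down the layers: fragment -> allocated sites -> local relations."""
    for frag in FRAGMENTS:
        if frag.predicate(row):
            return [(site, LOCAL_MAPPING[(frag.name, site)]) for site in frag.sites]
    raise ValueError(f"no fragment accepts {row}: the fragmentation is incomplete")


print(place({"EmpNo": 7, "Ename": "Amit", "Dept": "Sales"}))
# [(3, 'EMP_SALES'), (4, 'SALES_STAFF')]
```

Different local names for the same fragment at different sites are what the local mapping layer hides in a heterogeneous system. A tuple whose Dept matches no fragment is rejected, which exposes an incomplete fragmentation schema.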


Classification of DDBMS

• Homogeneous: all servers use the same DBMS.
• Heterogeneous: servers may use different DBMSs.

Examples of typical applications:

| Type of DBMS | LAN network | WAN network |
|---|---|---|
| Homogeneous | Data management and financial applications | Travel management and financial applications |
| Heterogeneous | Inter-divisional information systems | Integrated banking and inter-banking systems |


DDBMS
• Distributed database systems have been around since the mid-1980s. As you might expect, a variety of distributed database options exist. The diagram below shows the basic distributed database environments.

Distributed database environments
  Homogeneous
    Autonomous
    Non-autonomous
  Heterogeneous
    Systems
      Full DBMS functionality
      Partial-Multidatabase
        Federated
          Loose integration
          Tight integration
        Unfederated
    Gateways


Types of DDBMS
Homogeneous – Same DBMS is used at
each site.
Autonomous – Each DBMS works
independently, passing messages back
and forth to share data updates.
Non-Autonomous – A central, or master,
DBMS coordinates database access and
updates across the sites.


Types of DDBMS

Heterogeneous – Potentially different DBMSs are used at each site.
Systems – support some or all of the
functionality of one logical database.
Full DBMS functionality – supports all of
the functionality of a distributed
database.


Types of DDBMS
Partial-Multidatabase – supports some of the features of a
distributed database.
Federated – supports local databases for unique data
requests.
Loose integration – many schemas exist, one for each local database, and each local DBMS must communicate with all local schemas.
Tight integration – one global schema exists that defines
all the data across all local databases.
Unfederated – requires all access to go through a central
coordinating module.
Gateways – simple paths are created to other databases, without the
benefits of one logical database.


A Homogeneous Distributed Database

• A typical homogeneous distributed database environment is illustrated in the figure that follows.
• This environment is typically defined by the following characteristics:
  - Data are distributed across all the nodes.
  - The same DBMS is used at each location.


A Homogeneous Distributed Database

  - All data are managed by the distributed DBMS. There is no exclusively local data.
  - All users access the database through one global schema or database definition.
  - The global schema is simply the union of all the local database schemas.


Homogeneous Distributed Database System

[Figure: global users access the data through a global schema and a distributed DBMS spanning nodes 1, 2, 3, ..., n, each running the same DBMS software]


A Heterogeneous Distributed Database

• It is difficult in most organizations to force a homogeneous environment, yet heterogeneous environments are much more difficult to manage.
• As the diagram illustrates, there are many variations of heterogeneous distributed database environments. However, a typical heterogeneous distributed database environment is defined by the following characteristics:

A Heterogeneous Distributed Database

  - Data are distributed across all the nodes.
  - Different DBMSs may be used at each location.
  - Some users require only local access to databases, which can be accomplished using only the local DBMS and schema.
  - A global schema exists, which allows local users to access “remote data”.


Heterogeneous Distributed Database System

[Figure: local and global users. Global users access the data through a global schema and a distributed DBMS spanning DBMS-1, DBMS-2, DBMS-3, ..., DBMS-n; local users work with their own local DBMS.]


A Heterogeneous Distributed Database Scenario

[Figure 10.8]


Distribution Transparency
• In any distributed system, transparency is the most central issue.
• The basic principle of a distributed database management system (DDBMS) is that a DDBMS should work like a non-distributed DBMS.
• The rule thus insists that the user should not be aware of the distribution of data.


Levels of Distribution Transparency

1. Fragmentation transparency: the user is not aware of the existence of fragments and works on global relations.

UPDATE emp SET deptno = 15 WHERE empno = 10; (level 1)

2. Location transparency: the user is aware of the fragments but not of the sites on which they have been deployed.


Levels of Distribution Transparency

INSERT INTO emp1 VALUES (1, 'Amit', 20);
INSERT INTO emp2 VALUES (2, 'Ajeet', 30);
DELETE FROM emp1 WHERE empno = 10;
DELETE FROM emp2 WHERE empno = 10; (level 2)


Local Mapping Transparency


3. Local mapping transparency: the user is aware of the fragments and of the sites on which they have been deployed, but is insulated from the heterogeneity aspects.

emp1: site 1 and site 5 (replicated; an update must be applied at both sites)
emp2: site 2 and site 6
emp3: site 3 and site 7
emp4: site 4 and site 8


Local Mapping Transparency

SELECT ename, esal, etax INTO $ename, $esal, $etax FROM emp1 AT site1 WHERE empno = 10;
INSERT INTO emp3 (empno, ename, deptno) AT site3: (10, $ename, 15);
INSERT INTO emp3 (empno, ename, deptno) AT site7: (10, $ename, 15);
INSERT INTO emp4 (empno, esal, etax) AT site4: (10, $esal, $etax);
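The sketch below shows what a DDBMS does behind a fragmentation-transparent request: it turns the single level-1 UPDATE into the site-level operations that level 3 makes the user write. It is a minimal Python sketch under stated assumptions. It assumes a hypothetical horizontal fragmentation in which emp1 holds deptno <= 10 and emp3 holds deptno > 10. It reuses the replicated allocation listed above (emp1 at sites 1 and 5, emp3 at sites 3 and 7) and leaves out the vertical fragment emp4 for brevity.

```python
from typing import Callable

# Hypothetical fragmentation schema: selection predicate on deptno per horizontal fragment.
FRAGMENTATION: dict[str, Callable[[int], bool]] = {
    "emp1": lambda deptno: deptno <= 10,
    "emp3": lambda deptno: deptno > 10,
}
# Allocation with replication, as in the mapping above.
ALLOCATION: dict[str, list[int]] = {"emp1": [1, 5], "emp3": [3, 7]}


def fragment_for(deptno: int) -> str:
    """Return the fragment whose predicate accepts this deptno."""
    for name, predicate in FRAGMENTATION.items():
        if predicate(deptno):
            return name
    raise ValueError(f"no fragment for deptno={deptno}")


def update_dept(db: dict, empno: int, new_dept: int) -> list[str]:
    """Run UPDATE emp SET deptno = new_dept WHERE empno = empno against fragments.

    db maps site -> fragment name -> {empno: row}. Returns the site-level
    operations that the request was translated into.
    """
    # Find the fragment that currently holds the tuple by reading its first copy.
    for old, sites in ALLOCATION.items():
        if empno in db[sites[0]][old]:
            row = db[sites[0]][old][empno]
            break
    else:
        raise KeyError(empno)

    new_row = {**row, "deptno": new_dept}
    new = fragment_for(new_dept)
    ops = []
    if new == old:  # tuple stays in its fragment: update every copy in place
        for site in ALLOCATION[old]:
            db[site][old][empno] = dict(new_row)
            ops.append(f"UPDATE {old} AT site{site}")
        return ops
    for site in ALLOCATION[new]:  # tuple migrates: insert into every copy of the new fragment
        db[site][new][empno] = dict(new_row)
        ops.append(f"INSERT INTO {new} AT site{site}")
    for site in ALLOCATION[old]:  # then delete it from every copy of the old fragment
        del db[site][old][empno]
        ops.append(f"DELETE FROM {old} AT site{site}")
    return ops


# Employee 10 starts in department 3, so it is stored in emp1 at sites 1 and 5.
db = {site: {"emp1": {}, "emp3": {}} for site in (1, 3, 5, 7)}
for site in ALLOCATION["emp1"]:
    db[site]["emp1"][10] = {"empno": 10, "ename": "E10", "deptno": 3}

print(update_dept(db, 10, 15))
# ['INSERT INTO emp3 AT site3', 'INSERT INTO emp3 AT site7', 'DELETE FROM emp1 AT site1', 'DELETE FROM emp1 AT site5']
```

Moving an employee to another department can move the tuple into a different fragment, so one logical update becomes inserts and deletes on every replica. This is the work the levels of transparency progressively hide from the user.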


Data Replication

A relation or fragment of a relation is replicated if it is stored redundantly at two or more sites. Two cases are distinguished:
• Full replication of a relation: the relation is stored at all sites.
• Fully redundant database: every site contains a copy of the entire database.


Framework for Distributed Database Design

• Designing the conceptual schema, which describes the integrated database (i.e., all the data used by the database applications)
• Designing the physical database (i.e., mapping the conceptual schema to storage areas and determining appropriate access methods)
• Designing the fragmentation
• Designing the allocation of fragments


Objectives of data distribution design

• Processing locality: the units of data most frequently accessed by a site are kept local to that site as far as possible.
• Availability: a high degree of availability for read-only applications is achieved by storing multiple copies of the same information. The system must be able to switch to an alternative copy when the copy that would be accessed under normal conditions is not available.

Objectives of data distribution design

• Reliability: reliability is also achieved by storing multiple copies of the same information, since it is then possible to recover from crashes or from the physical destruction of one of the copies.
• Distribution of workload: workload is distributed in order to take advantage of the different powers or utilizations of the computers at each site.
• Storage costs: the storage cost depends directly on how much information is required locally.
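The availability objective above ("switch to an alternative copy") can be sketched for reads. This is a minimal Python sketch under stated assumptions. The Site class and the SiteDown exception are hypothetical in-memory stand-ins for remote sites, and the copies are tried in a fixed order of preference. It covers reads only and does not show how updates keep the copies consistent.

```python
class SiteDown(Exception):
    """Raised when a site holding a copy cannot be reached (simulated)."""


class Site:
    """In-memory stand-in for one site that stores a copy of a fragment."""

    def __init__(self, name: str, data: dict, up: bool = True):
        self.name = name
        self.data = dict(data)
        self.up = up

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]


def read_with_failover(copies: list, key):
    """Read `key` from the preferred copy, switching to the next copy if a site is down."""
    for site in copies:
        try:
            return site.name, site.read(key)
        except SiteDown:
            continue  # the copy normally used is unavailable: try an alternative copy
    raise SiteDown("no copy of the data is available")


emp1_copies = [Site("site1", {10: "E10"}, up=False), Site("site5", {10: "E10"})]
print(read_with_failover(emp1_copies, 10))  # ('site5', 'E10')
```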

Fragmentation

• Fragmentation is the process of decomposing global relations into fragments.
Types of fragmentation:
• Horizontal fragmentation
• Derived horizontal fragmentation
• Vertical fragmentation
• Hybrid/mixed fragmentation


Fragmentation
• Horizontal – subset of rows
• Vertical – subset of columns
  - Each fragment must contain the primary key.
  - Other columns can be replicated.
• Mixed (hybrid) – both horizontal and vertical
• Derived – derived from the horizontal fragmentation of another relation


Fragmentation
• Derived fragmentation, e.g.: first perform a natural join to get the additional information required, then fragment.
• It must be possible to reconstruct the original table.
• It must be possible to query and update through a fragment.


Correctness of Fragmentation

Three correctness rules: completeness, reconstruction, and disjointness.


Rules/Correctness of Fragmentation

Completeness: each data item found in a relation R must be found in one or more of R’s fragments R1, R2, ..., Rn.
Reconstruction: it must be possible to define a relational operation that reconstructs R from the fragments.
Note: reconstruction uses the union operation for horizontal fragmentation and the join operation for vertical fragmentation.


Rules/Correctness of Fragmentation

Disjointness
• If data item di appears in fragment Ri, then
it should not appear in any other fragment.
• Exception: vertical fragmentation, where
primary key attributes must be repeated to
allow reconstruction.
• For horizontal fragmentation, data item is
a tuple.
• For vertical fragmentation, data item is an
attribute.


Horizontal Fragmentation
Horizontal fragmentation is based on the selection operation. A condition is chosen and every tuple is evaluated against it; only the tuples that satisfy the condition become part of the corresponding fragment.
Example: an organization may fragment its global employee relation horizontally, keeping the records of the employees belonging to each country in a separate horizontal fragment.


Horizontal Fragmentation
The conditions can be:
C1: country_name = “INDIA”
C2: country_name = “United States”
...
CN: country_name = “Srilanka”


Horizontal Fragmentation
For example, consider a global relation (table) Supplier:
Supplier(SNum, Name, City)
The horizontal fragmentation can then be defined as follows:
$\text{Supplier1} = \sigma_{\text{City} = \text{'sf'}}(\text{Supplier})$
$\text{Supplier2} = \sigma_{\text{City} = \text{'la'}}(\text{Supplier})$


Horizontal Fragmentation
Completeness: the above fragmentation satisfies the completeness condition only if “sf” and “la” are the only possible values of the City attribute. Otherwise we would not know to which fragment the tuples with other City values belong.


Horizontal Fragmentation
Reconstruction: it is always possible to reconstruct the Supplier global relation by using the union operation:
$\text{Supplier} = \text{Supplier1} \cup \text{Supplier2}$

Disjointness: call the predicate used in the selection operation that defines a fragment the fragment’s qualification. The fragmentation is disjoint if the qualifications are mutually exclusive:
$Q_1: \text{City} = \text{'sf'}$
$Q_2: \text{City} = \text{'la'}$
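The three correctness rules can be checked mechanically for the Supplier example. This is a minimal Python sketch that treats relations as sets of tuples. The sample supplier rows are hypothetical, and `select` and `check_horizontal` are illustrative helpers, not part of any DBMS.

```python
from collections import Counter

# Supplier(SNum, Name, City) as a set of tuples; the sample rows are hypothetical.
supplier = {(1, "Acme", "sf"), (2, "Bolt", "la"), (3, "Cogs", "sf")}


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def check_horizontal(relation, fragments):
    """Check the three correctness rules for a horizontal fragmentation."""
    union = set().union(*fragments)
    counts = Counter(t for frag in fragments for t in frag)
    return {
        "complete": relation <= union,                     # every tuple lands in some fragment
        "reconstructs": union == relation,                 # R = R1 ∪ R2 ∪ ... (and nothing extra)
        "disjoint": all(c == 1 for c in counts.values()),  # no tuple in two fragments
    }


supplier1 = select(supplier, lambda t: t[2] == "sf")
supplier2 = select(supplier, lambda t: t[2] == "la")
print(check_horizontal(supplier, [supplier1, supplier2]))
# {'complete': True, 'reconstructs': True, 'disjoint': True}

# A supplier in a third city breaks completeness: neither predicate accepts it.
extended = supplier | {(4, "Dyna", "ny")}
frags = [select(extended, lambda t: t[2] == "sf"), select(extended, lambda t: t[2] == "la")]
print(check_horizontal(extended, frags)["complete"])  # False
```

The second check illustrates the completeness caveat above: once City can take a value other than “sf” or “la”, some tuples belong to no fragment.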


Horizontal Fragmentation-example

Another Example

PERSON
| NAME | ADDRESS | PHONE | SAL |
|---|---|---|---|
| John Smith | 44, Here St | 3456 7890 | 34000 |
| Alex Brown | 1, High St | 3678 1234 | 48000 |
| Harry Potter | 99, Magic St | 9976 4321 | 98000 |
| Jane Morgan | 87, Riverview | 8765 1237 | 65800 |
| Peter Jennings | 65, Flag Rd | 9851 1238 | 23980 |

PERSON-FRAGMENT1
| NAME | ADDRESS | PHONE | SAL |
|---|---|---|---|
| John Smith | 44, Here St | 3456 7890 | 34000 |
| Alex Brown | 1, High St | 3678 1234 | 48000 |

PERSON-FRAGMENT2
| NAME | ADDRESS | PHONE | SAL |
|---|---|---|---|
| Harry Potter | 99, Magic St | 9976 4321 | 98000 |
| Jane Morgan | 87, Riverview | 8765 1237 | 65800 |
| Peter Jennings | 65, Flag Rd | 9851 1238 | 23980 |


Derived Horizontal Fragmentation

• A derived horizontal fragmentation is based on conditions built on the output of some other query. The horizontal fragment thus defined is a derived horizontal fragment.
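The slide does not spell out the operation, but derived horizontal fragmentation is commonly realised as a semijoin, following the usual textbook formulation. A member relation is split according to the fragments of an owner relation it references. The sketch below assumes a hypothetical relation Supply(SNum, PNum, Qty) that references Supplier(SNum, Name, City). It derives Supply1 and Supply2 from the Supplier fragments for “sf” and “la”.

```python
# Owner relation Supplier(SNum, Name, City) and member relation Supply(SNum, PNum, Qty);
# both the Supply relation and all sample rows are hypothetical.
supplier = [(1, "Acme", "sf"), (2, "Bolt", "la"), (3, "Cogs", "sf")]
supply = [(1, "P1", 100), (2, "P1", 50), (3, "P2", 70), (2, "P3", 20)]

# Primary horizontal fragments of the owner, defined by selection on City.
supplier1 = [s for s in supplier if s[2] == "sf"]
supplier2 = [s for s in supplier if s[2] == "la"]


def semijoin(member, owner_fragment):
    """member ⋉ owner_fragment on SNum: member tuples whose supplier is in the fragment."""
    snums = {s[0] for s in owner_fragment}
    return [t for t in member if t[0] in snums]


# Derived fragments: the condition comes from the output of another query (the owner fragment).
supply1 = semijoin(supply, supplier1)
supply2 = semijoin(supply, supplier2)
print(supply1)  # [(1, 'P1', 100), (3, 'P2', 70)]
print(supply2)  # [(2, 'P1', 50), (2, 'P3', 20)]
```

Each Supply tuple follows its supplier, so a join between Supply1 and Supplier1 can be evaluated at a single site.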


Vertical Fragmentation
• It is based on the projection operation. The projection is defined by a list of attributes that are intended to constitute the corresponding vertical fragment.
• The attribute lists used to carry out a vertical fragmentation are selected so as to meet the objectives of disjointness, completeness and reconstruction.


Vertical Fragmentation
• Vertical fragmentation can never be absolutely disjoint: at least one column (the primary key) must be common to the fragments so that the original relation can be reconstructed.


Vertical Fragmentation-example

PERSON
NAME            ADDRESS         PHONE       SAL
John Smith      44, Here St     3456 7890   34000
Alex Brown      1, High St      3678 1234   48000
Harry Potter    99, Magic St    9976 4321   98000
Jane Morgan     87, Riverview   8765 1237   65800
Peter Jennings  65, Flag Rd     9851 1238   23980

Fragment 1: PERSON (NAME, ADDRESS, PHONE)
NAME            ADDRESS         PHONE
John Smith      44, Here St     3456 7890
Alex Brown      1, High St      3678 1234
Harry Potter    99, Magic St    9976 4321
Jane Morgan     87, Riverview   8765 1237
Peter Jennings  65, Flag Rd     9851 1238

Fragment 2: PERSON (NAME, SAL)
NAME            SAL
John Smith      34000
Alex Brown      48000
Harry Potter    98000
Jane Morgan     65800
Peter Jennings  23980

The primary key [NAME] is included in all fragments.
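The PERSON example can be checked with a short Python sketch: project the relation into the two vertical fragments, then rebuild it by joining on the shared key NAME. The rows are the ones in the table; the helper functions `project` and `join` are illustrative, not part of any DBMS API.

```python
# PERSON rows taken from the example table.
person = [
    {"NAME": "John Smith", "ADDRESS": "44, Here St", "PHONE": "3456 7890", "SAL": 34000},
    {"NAME": "Alex Brown", "ADDRESS": "1, High St", "PHONE": "3678 1234", "SAL": 48000},
    {"NAME": "Harry Potter", "ADDRESS": "99, Magic St", "PHONE": "9976 4321", "SAL": 98000},
    {"NAME": "Jane Morgan", "ADDRESS": "87, Riverview", "PHONE": "8765 1237", "SAL": 65800},
    {"NAME": "Peter Jennings", "ADDRESS": "65, Flag Rd", "PHONE": "9851 1238", "SAL": 23980},
]

def project(rows, attrs):
    """Vertical fragment: keep only the listed attributes.
    No duplicates can arise because the key NAME is always kept."""
    return [{a: r[a] for a in attrs} for r in rows]

def join(left, right, key):
    """Reconstruction: join two fragments on the shared key attribute."""
    index = {r[key]: r for r in right}  # key is unique, so one row per value
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

frag1 = project(person, ["NAME", "ADDRESS", "PHONE"])
frag2 = project(person, ["NAME", "SAL"])

rebuilt = join(frag1, frag2, "NAME")
print(rebuilt == person)  # True: the fragmentation is lossless
```

Dropping NAME from either fragment would leave no attribute to join on, which is why the key is repeated.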



Vertical Fragmentation

• Consists of a subset of the attributes (columns) of a relation.
• Defined using the projection operation of relational algebra:
$\Pi_{a_1, \ldots, a_n}(R)$
• For example:
$S_1 = \Pi_{\text{staffNo, position, sex, DOB, salary}}(\text{Staff})$
$S_2 = \Pi_{\text{staffNo, fName, lName, branchNo}}(\text{Staff})$

• Determined by establishing the affinity of one attribute to another, i.e., how often the attributes are accessed together by the same queries.
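Attribute affinity can be sketched in Python as a matrix in which aff(a, b) is the summed access frequency of the queries that use both a and b. The Staff attributes come from the example above; the query workload and its frequencies are hypothetical. Attributes with high mutual affinity are candidates for the same vertical fragment. Full design methods cluster this matrix, for example with the bond energy algorithm, which is not shown here.

```python
ATTRIBUTES = ["staffNo", "position", "sex", "DOB", "salary", "fName", "lName", "branchNo"]

# Hypothetical workload: (attributes accessed by a query, access frequency).
WORKLOAD = [
    ({"staffNo", "position", "salary"}, 25),
    ({"staffNo", "fName", "lName", "branchNo"}, 40),
    ({"position", "salary"}, 10),
    ({"fName", "lName"}, 5),
]

def affinity_matrix(attributes, workload):
    """aff[a][b] = summed frequency of the queries that access both a and b."""
    aff = {a: {b: 0 for b in attributes} for a in attributes}
    for used, freq in workload:
        for a in used:
            for b in used:
                aff[a][b] += freq
    return aff

aff = affinity_matrix(ATTRIBUTES, WORKLOAD)
print(aff["position"]["salary"])  # 35: high affinity, same fragment
print(aff["salary"]["fName"])     # 0: never accessed together
```

In this made-up workload the key staffNo has affinity with both groups, matching its presence in both S1 and S2.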


Degree of Fragmentation


Horizontal and Vertical Fragmentation


Hybrid (Mixed) Fragmentation -example

PERSON
NAME            ADDRESS         PHONE       SAL
John Smith      44, Here St     3456 7890   34000
Alex Brown      1, High St      3678 1234   48000
Harry Potter    99, Magic St    9976 4321   98000
Jane Morgan     87, Riverview   8765 1237   65800
Peter Jennings  65, Flag Rd     9851 1238   23980

Mixed fragment 1: PERSON (NAME, ADDRESS, PHONE)
NAME            ADDRESS         PHONE
John Smith      44, Here St     3456 7890
Alex Brown      1, High St      3678 1234

Mixed fragment 2: PERSON (NAME, SAL)
NAME            SAL
Harry Potter    98000
Jane Morgan     65800
Peter Jennings  23980


Hybrid Fragmentation -example

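R is first fragmented horizontally into R1 and R2; each horizontal fragment is then fragmented vertically:

R1 → R11, R12   (vertical)
R2 → R21, R22   (vertical)

The same can be done in the reverse order: first vertical, then horizontal.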



Hybrid Fragmentation -example

Applying vertical fragmentation to horizontal fragmentation

Applying horizontal fragmentation to vertical fragmentation
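A minimal Python sketch of hybrid fragmentation, horizontal first and then vertical, using the PERSON rows from the example. The horizontal predicate on SAL is a hypothetical choice made only for illustration. Reconstruction reverses the steps: join the vertical pieces of each horizontal fragment, then take the union of the horizontal fragments.

```python
# PERSON rows taken from the example table.
person = [
    {"NAME": "John Smith", "ADDRESS": "44, Here St", "PHONE": "3456 7890", "SAL": 34000},
    {"NAME": "Alex Brown", "ADDRESS": "1, High St", "PHONE": "3678 1234", "SAL": 48000},
    {"NAME": "Harry Potter", "ADDRESS": "99, Magic St", "PHONE": "9976 4321", "SAL": 98000},
    {"NAME": "Jane Morgan", "ADDRESS": "87, Riverview", "PHONE": "8765 1237", "SAL": 65800},
    {"NAME": "Peter Jennings", "ADDRESS": "65, Flag Rd", "PHONE": "9851 1238", "SAL": 23980},
]

def select(rows, pred):
    return [r for r in rows if pred(r)]

def project(rows, attrs):
    return [{a: r[a] for a in attrs} for r in rows]

def join(left, right, key):
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

# Step 1: horizontal fragmentation (hypothetical predicate on SAL).
r1 = select(person, lambda r: r["SAL"] < 50000)
r2 = select(person, lambda r: r["SAL"] >= 50000)

# Step 2: vertical fragmentation of each horizontal fragment; NAME stays in every piece.
r11, r12 = project(r1, ["NAME", "ADDRESS", "PHONE"]), project(r1, ["NAME", "SAL"])
r21, r22 = project(r2, ["NAME", "ADDRESS", "PHONE"]), project(r2, ["NAME", "SAL"])

# Reconstruction: join the vertical pieces, then union the (disjoint) horizontal fragments.
rebuilt = join(r11, r12, "NAME") + join(r21, r22, "NAME")

by_name = lambda r: r["NAME"]
print(sorted(rebuilt, key=by_name) == sorted(person, key=by_name))  # True
```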


Fragment Allocation
In determining the allocation of fragments, it is important to distinguish whether we are designing a final non-redundant or a redundant allocation.


Fragment Allocation
A non-redundant final allocation is the easier case. The simplest method is a “best-fit” approach: a measure is associated with each possible allocation, and the site with the best measure is selected.
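The best-fit approach can be sketched in Python under one assumed measure: the total communication cost of the remote accesses that would result from placing the single copy at a given site. The site names, access frequencies and unit costs below are hypothetical. The site with the lowest cost wins.

```python
# Hypothetical access frequencies of fragment F from each site.
freq = {"S1": 30, "S2": 50, "S3": 20}

# Hypothetical unit communication cost between sites (0 for local access).
cost = {
    "S1": {"S1": 0, "S2": 2, "S3": 5},
    "S2": {"S1": 2, "S2": 0, "S3": 3},
    "S3": {"S1": 5, "S2": 3, "S3": 0},
}

def remote_access_cost(site):
    """Measure for placing the single copy at `site`: cost of all accesses to it."""
    return sum(f * cost[origin][site] for origin, f in freq.items())

# Best fit: evaluate the measure for every candidate site and keep the best one.
best = min(freq, key=remote_access_cost)
print({s: remote_access_cost(s) for s in freq})  # {'S1': 200, 'S2': 120, 'S3': 300}
print(best)                                      # S2
```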


Fragment Allocation
Replication introduces further complexity into the design, because:

1. the degree of replication is one problem, and
2. maintaining consistency among the copies is another.

To determine a redundant allocation of fragments, either of the following two methods can be used:


Fragment Allocation
1. Determine the set of all sites where the “benefit of allocating one copy of the fragment is higher than the cost”, and allocate a copy of the fragment to each element of this set; this method selects “all beneficial sites”.

2. First determine the solution of the non-replicated problem, then progressively introduce replicated copies, starting from the most beneficial; the process terminates when no “additional replication” is beneficial.
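Both methods can be sketched in Python under a simple, assumed cost model. Each read goes to the nearest copy and costs the distance to it, and every copy pays a fixed update cost to stay consistent. For method 1, the benefit of a copy at a site is taken to be the reads it turns from remote into local, assuming those reads would otherwise go to the nearest other site. The sites, distances, read counts and update cost are all hypothetical.

```python
SITES = ["A", "B", "C", "D"]
DIST = {  # hypothetical unit read cost between sites
    "A": {"A": 0, "B": 2, "C": 4, "D": 6},
    "B": {"A": 2, "B": 0, "C": 3, "D": 5},
    "C": {"A": 4, "B": 3, "C": 0, "D": 2},
    "D": {"A": 6, "B": 5, "C": 2, "D": 0},
}
READS = {"A": 40, "B": 10, "C": 30, "D": 5}  # hypothetical reads of the fragment per site
COPY_UPDATE_COST = 20                         # hypothetical cost of keeping one copy consistent

def all_beneficial_sites():
    """Method 1: allocate a copy at every site whose benefit exceeds its cost."""
    chosen = []
    for s in SITES:
        nearest_other = min(DIST[s][t] for t in SITES if t != s)
        benefit = READS[s] * nearest_other  # remote reads avoided by a local copy
        if benefit > COPY_UPDATE_COST:
            chosen.append(s)
    return chosen

def total_cost(copies):
    """Reads go to the nearest copy; each copy adds the update cost."""
    read = sum(READS[s] * min(DIST[s][c] for c in copies) for s in SITES)
    return read + COPY_UPDATE_COST * len(copies)

def additional_replication():
    """Method 2: start from the best single copy, then add the most beneficial
    copy while doing so still lowers the total cost."""
    copies = [min(SITES, key=lambda s: total_cost([s]))]  # non-replicated solution
    while True:
        candidates = [s for s in SITES if s not in copies]
        if not candidates:
            return copies
        best = min(candidates, key=lambda s: total_cost(copies + [s]))
        if total_cost(copies + [best]) >= total_cost(copies):
            return copies  # no additional replication is beneficial
        copies.append(best)

print(all_beneficial_sites())    # ['A', 'C']
print(additional_replication())  # ['A', 'C']
```

With these figures both methods agree. Method 2, however, always keeps at least one copy, and its stopping test accounts for copies already placed, which method 1 ignores.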

Distributed Data Storage Mechanisms

Assume the relational data model.

Replication
The system maintains multiple copies of data, stored at different sites, for faster retrieval and fault tolerance.
Fragmentation
A relation is partitioned into several fragments stored at distinct sites.


Fragment Allocation
Replication and fragmentation can be combined: a relation is partitioned into several fragments, and the system maintains several identical replicas of each fragment.


Translation of Global Queries

• In a distributed database management system, a single global relation is sometimes fragmented and the fragments are deployed at distinct sites; moreover, to ensure processing locality, a relation or a fragment is sometimes replicated.
• A query, on the other hand, is issued by a user or an application that is not aware of the existence of fragments, replicas and their respective allocations. For its successful execution, such a global query must be decomposed into fragment queries.

Translation of Global Queries

A global query can be:

Select * from student;

This query must be decomposed into queries that take as operands the fragments into which the student relation has been divided. These queries are termed fragment queries.


Translation of Global Queries

Select * from student1;
Select * from student2;
Select * from student3;
...
Select * from studentN;


Translation of Global Queries

The above fragment queries are then executed at their respective sites; their results are combined using a union operation (as is appropriate for horizontal fragments) and returned to the initiator site.

To execute these fragment queries, information about fragmentation, replication and allocation must be obtained. This information is derived from the fragmentation and allocation schemas and is kept in the system catalog.
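The translation steps can be simulated in Python with the standard `sqlite3` module. Each in-memory database stands in for one site. A small dictionary plays the role of the system catalog, mapping the global relation to its fragments and their sites. The fragment schema, the rows and the catalog layout are hypothetical; a real distributed DBMS stores this information in its own catalog tables.

```python
import sqlite3

# Each in-memory SQLite database stands in for one site (a simulation only).
sites = {name: sqlite3.connect(":memory:") for name in ("site1", "site2", "site3")}

# Hypothetical catalog: global relation -> list of (fragment, site) pairs,
# as would be derived from the fragmentation and allocation schemas.
catalog = {"student": [("student1", "site1"), ("student2", "site2"), ("student3", "site3")]}

# Hypothetical, disjoint horizontal fragments of `student`.
fragment_rows = {
    "student1": [(1, "Asha")],
    "student2": [(2, "Ravi"), (3, "Meena")],
    "student3": [(4, "Karan")],
}
for fragment, site in catalog["student"]:
    conn = sites[site]
    conn.execute(f"CREATE TABLE {fragment} (roll_no INTEGER, name TEXT)")
    conn.executemany(f"INSERT INTO {fragment} VALUES (?, ?)", fragment_rows[fragment])

def run_global_select_all(relation):
    """Translate `Select * from <relation>` into one fragment query per fragment,
    run each at the site named by the catalog, and union the partial results."""
    result = set()  # relational union (set semantics)
    for fragment, site in catalog[relation]:
        fragment_query = f"SELECT * FROM {fragment}"
        result |= set(sites[site].execute(fragment_query).fetchall())
    return sorted(result)  # result delivered to the initiator

print(run_global_select_all("student"))
# [(1, 'Asha'), (2, 'Ravi'), (3, 'Meena'), (4, 'Karan')]
```

The user only names `student`; the fragment names and their sites come from the catalog. This is the transparency that translation provides.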

Translation of Global Queries

To decompose a global query into fragment queries, this information is obtained from the catalog.


Catalog Management

Catalog Management or Distributed Data Dictionary Management

The system catalog constitutes the data dictionary. It holds metadata, i.e., data about data.

In any database, the system catalog is a small database in itself: it holds the descriptions of the data objects needed for query execution and optimization, and it must be referenced by the query processor. In distributed databases, catalog management becomes more demanding: beyond managing the information found in any catalog, it must also help implement key objectives of a distributed database, such as transparency, global and unique naming, and processing locality.


Approaches for Catalog Management

• Centralized Approach
• Distributed Approach.


Centralized Approach for catalog Management.

• Under this approach, the system catalog is maintained at one of the participating sites in the distributed database.

• This site then acts as the central coordinator of the distributed database management system.

• The basic advantage of this approach is that it is simple and consistency is not a concern. However, the approach suffers from two major drawbacks:
1) a single point of failure;
2) a performance bottleneck.


Centralized Approach for catalog Management.

If the coordinator site fails, no site can make progress, since the catalog is maintained in only one place. In addition, query processing for all the sites can only be as efficient as the coordinator site.


Distributed Approach for catalog Management.

• Full replication Approach


• Partial replication Approach


Full Replication Approach

Under this approach, the complete catalog is maintained at every site. Because the system catalog is available locally, every site enjoys processing locality and a greater degree of autonomy.


Drawbacks
However, this approach has its own set of drawbacks. One is the storage overhead caused by greater redundancy; the other is the consistency problem, that is, how to keep the replicated copies of the system catalog at the various sites synchronized.


Partial Replication Approach


• Under this approach, each site maintains a local catalog that stores information about the database objects for which that site is the birth site.

• Additionally, it holds information about the replicas, and each site maintains a set of links to database objects at other sites.


Partial Replication Approach


Whenever a site submits a query, it is processed directly if it can be handled using the local catalog; otherwise the links are consulted. If the information is not available in the set of links, a search for the database object is made and the set of links is updated accordingly.
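The lookup order can be sketched in Python: first the local catalog, then the stored links, and finally a search of the other sites that records a new link. The class `SiteCatalog`, the object names and their descriptions are hypothetical. For brevity the sketch omits replica information and stale-link handling.

```python
class SiteCatalog:
    """Partial-replication catalog kept at one site (illustrative sketch)."""

    def __init__(self, site, born_here):
        self.site = site
        self.local = dict(born_here)  # objects whose birth site is this site
        self.links = {}               # object name -> site holding its description

    def lookup(self, name, network):
        # 1. Local catalog: objects born at this site.
        if name in self.local:
            return self.site, self.local[name]
        # 2. Known links to objects at other sites.
        if name in self.links:
            owner = self.links[name]
            return owner, network[owner].local[name]
        # 3. Search the other sites, then record a link for next time.
        for other, catalog in network.items():
            if other != self.site and name in catalog.local:
                self.links[name] = other
                return other, catalog.local[name]
        raise KeyError(name)

# Hypothetical two-site network.
network = {
    "S1": SiteCatalog("S1", {"emp": "EMP(empno, name) born at S1"}),
    "S2": SiteCatalog("S2", {"dept": "DEPT(deptnum, area) born at S2"}),
}
print(network["S1"].lookup("emp", network))   # found in the local catalog
print(network["S1"].lookup("dept", network))  # found by search; a link is added
print(network["S1"].links)                    # {'dept': 'S2'}
```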


Global Query Optimization, Query Execution and Access Plan


Functionality of a DBMS

The programmer sees SQL, which has two components:
• Data Definition Language (DDL)
• Data Manipulation Language (DML), i.e., the query language
Behind the scenes, the DBMS has:
• Query engine
• Query optimizer
• Storage management
• Transaction Management (concurrency, recovery)

Query Processing


Query Processing Component


Query Optimization

Query Optimization: the activity of choosing an efficient execution strategy for processing a query.


Queries

Find all courses that “Mary” takes:

SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name = 'Mary' AND S.ssn = T.ssn AND T.cid = C.cid;

What happens behind the scenes?

The query processor figures out how to answer the query efficiently.


Queries, behind the scenes

Declarative SQL query:

SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name = 'Mary' AND S.ssn = T.ssn AND T.cid = C.cid;

Imperative query execution plan (operator tree, read bottom-up):

PJ C.name
  JN T.cid = C.cid
    JN S.ssn = T.ssn
      SL S.name = 'Mary'
        Students
      Takes
    Courses

The optimizer chooses the best execution plan for the query.



Points to remember during optimization:

• Preprocess the relations (tables) where useful, for example by sorting them or building indexes.
• Perform selections as early as possible.
• Compute common subexpressions only once.
• Translate an expression involving a Cartesian product followed by a selection into a join.
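The last rule can be illustrated in Python by comparing a Cartesian product followed by a selection with an equivalent hash join. Both produce the same result, but the first materializes every pair. The relations and rows are hypothetical; they reuse the Supply/Dept attribute names from this unit.

```python
from itertools import product

# Hypothetical relations: Supply(snum, deptnum) and Dept(deptnum, area).
supply = [("S1", 1), ("S2", 2), ("S3", 1)]
dept = [(1, "North"), (2, "South"), (3, "East")]

def product_then_select(r, s):
    """Cartesian product followed by a selection: builds |r| * |s| pairs first."""
    pairs = list(product(r, s))
    return [(a, b) for a, b in pairs if a[1] == b[0]], len(pairs)

def hash_join(r, s):
    """Equivalent join: index s on deptnum, then probe once per row of r."""
    index = {}
    for b in s:
        index.setdefault(b[0], []).append(b)
    return [(a, b) for a in r for b in index.get(a[1], [])]

joined, pairs_built = product_then_select(supply, dept)
print(joined == hash_join(supply, dept))  # True: same result
print(pairs_built, len(joined))           # 9 3: nine pairs built for three matches
```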


Operator/Query Tree Representation of a Query

An operator (query) tree provides a more practical representation of queries, in which expression manipulation is easier. The leaves of the tree represent relations, and each internal node represents an operation.
Example: select snum from supply, dept where supply.deptnum = dept.deptnum and area = 'North';

Operator Tree Representation
Case 1: Global relations

PJ snum
  SL area = 'North'
    JN deptnum = deptnum
      Supply
      Dept

Operator Tree Representation
Case 2: Fragments

PJ snum
  SL area = 'North'
    JN deptnum = deptnum
      UN
        Supply1, Supply2, ..., SupplyN
      UN
        Dept1, Dept2, ..., DeptN


Operator Tree Representation
Case 3: Fragments, with the optimized result (the selection on area is pushed down to the Dept fragments)

PJ snum
  JN deptnum = deptnum
    UN
      Supply1, Supply2, ..., SupplyN
    UN
      SL area = 'North' (Dept1), SL area = 'North' (Dept2), ..., SL area = 'North' (DeptN)
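A small Python check that Case 2 and Case 3 return the same answer. Case 3 simply applies the selection on each Dept fragment before the union and the join, so fewer Dept tuples reach the join. The fragments and rows are hypothetical, and area is assumed to be an attribute of Dept.

```python
# Hypothetical fragments: Supply(snum, deptnum) and Dept(deptnum, area).
supply_frags = [[("S1", 1), ("S2", 2)], [("S3", 3), ("S4", 4)]]
dept_frags = [[(1, "North"), (2, "South")], [(3, "North"), (4, "East")]]

def union(frags):
    """UN: concatenate disjoint horizontal fragments."""
    return [row for frag in frags for row in frag]

def join(supply, dept):
    """JN deptnum = deptnum, returning (supply row, area) pairs."""
    area_of = {d[0]: d[1] for d in dept}
    return [(s, area_of[s[1]]) for s in supply if s[1] in area_of]

# Case 2: union the fragments, join, then select area = 'North' and project snum.
case2 = sorted(s[0] for s, area in join(union(supply_frags), union(dept_frags)) if area == "North")

# Case 3: select area = 'North' on each Dept fragment first, then union and join.
north_frags = [[d for d in frag if d[1] == "North"] for frag in dept_frags]
case3 = sorted(s[0] for s, _ in join(union(supply_frags), union(north_frags)))

print(case2, case3)  # ['S1', 'S3'] ['S1', 'S3']
print(len(union(dept_frags)), len(union(north_frags)))  # 4 2: Dept tuples entering the join
```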


Execution and Access Plan


To execute a query, an access plan is prepared by the programmer. This plan determines how to navigate the complete database and how each database must be accessed. Implementing such plans requires optimization both globally and locally.


Global Optimization
Global optimization consists of determining which data must be accessed at which sites and, consequently, which data files must be transmitted between sites. The main optimization parameter for global optimization is communication cost.
Local optimization, in contrast, consists of deciding how to perform the local database accesses at each site.
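A minimal Python sketch of global optimization with communication cost as the only parameter, as described above. Each strategy is costed by the bytes it ships between sites, and the cheapest strategy is chosen. The query, the placement (Supply at site 1, Dept at site 2, result wanted at site 2), the candidate strategies and every size are hypothetical figures chosen for illustration.

```python
# Hypothetical statistics for "snum of supplies in North departments".
SUPPLY_TUPLES, SUPPLY_TUPLE_BYTES = 10_000, 20
NORTH_DEPTS, DEPT_TUPLE_BYTES, DEPTNUM_BYTES = 10, 40, 4
MATCHING_SUPPLIES, SNUM_BYTES = 1_000, 8

# Bytes transmitted between sites by each candidate strategy.
strategies = {
    # Ship all of Supply to site 2 and join there.
    "ship Supply to site 2": SUPPLY_TUPLES * SUPPLY_TUPLE_BYTES,
    # Select North depts at site 2, ship them to site 1, join, ship snums back.
    "ship selected Dept to site 1": NORTH_DEPTS * DEPT_TUPLE_BYTES + MATCHING_SUPPLIES * SNUM_BYTES,
    # Ship only the North deptnums to site 1 (semijoin), ship matching snums back.
    "semijoin on deptnum": NORTH_DEPTS * DEPTNUM_BYTES + MATCHING_SUPPLIES * SNUM_BYTES,
}

best = min(strategies, key=strategies.get)
print(strategies)  # 200000, 8400 and 8040 bytes respectively
print(best)        # semijoin on deptnum
```

Local optimization (how each site scans or indexes its own tables) is deliberately left out of this cost model.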


Example of Access Plan

1) At site 1
Send the supplier number SN to sites 2 and 3.

2) At sites 2 and 3
Upon receipt of the supplier number, execute the following program in parallel:

Select part_no where supp_no=SN;

Send the result to site 1.

3) At site 1
Merge the results from sites 2 and 3;
output the result.
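The three steps of this access plan can be simulated in Python. Threads from `concurrent.futures` stand in for sites 2 and 3 running in parallel. The local data at each site and the function names `run_at_site` and `access_plan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local data at sites 2 and 3: (supp_no, part_no) rows.
SITE_DATA = {
    2: [("SN1", "P1"), ("SN2", "P2"), ("SN1", "P3")],
    3: [("SN1", "P4"), ("SN3", "P5")],
}

def run_at_site(site, sn):
    """Step 2 at sites 2 and 3: Select part_no where supp_no = SN; return the result."""
    return [part for supp, part in SITE_DATA[site] if supp == sn]

def access_plan(sn):
    # Step 1 (site 1): send SN to sites 2 and 3, which execute in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = pool.map(run_at_site, [2, 3], [sn, sn])
        # Step 3 (site 1): merge the results from sites 2 and 3.
        merged = [part for result in results for part in result]
    return merged

print(access_plan("SN1"))  # ['P1', 'P3', 'P4']
```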


Short Questions
• Explain the advantages of a distributed DBMS over a centralized DBMS.
• Discuss transparency in terms of transactions.
• Describe the various fragmentation techniques, with examples.
• Explain the distribution of a database across various sites.
• What is a distributed DBMS? Describe its features.

Long Questions
• What are global optimization, execution and access plans? Give an example of an access plan.
• Differentiate between homogeneous and heterogeneous DDBMSs.
• Explain the advantages and disadvantages of a DDBMS.
• Describe the distributed approach to catalog management.
• What is fragmentation? Explain the different types of fragmentation.

