Introduction
Distributed database system consists
Scope
Distributed databases are widely used in large-scale data processing; today's
Research
A lot of research is going on in
Centralized Systems
Run on a single computer system and do not interact with other
computer systems.
General-purpose computer system: one to a few CPUs and a number
unit, single user, usually has only one CPU and one or two hard
disks; the OS may support only one user.
Multi-user system: more disks, more memory, multiple CPUs, and a
Client-Server Systems
Server systems satisfy requests generated at m client systems, whose general
easier maintenance
Transaction Servers
Also called query server systems or SQL server systems
transaction.
Open Database Connectivity (ODBC) is a C language
Log writer process outputs log records to stable storage.
Checkpoint process
Process monitor process
- Monitors other processes, and takes recovery actions if any of the other processes fail
- Buffer pool
- Lock table
- Log buffer
- Cached query plans (reused if same query submitted again)
All database processes can access shared memory
To ensure that no two processes are accessing the same data structure
at the same time, database systems implement mutual exclusion
using either
- Operating system semaphores
- Atomic instructions such as test-and-set
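The semaphore option above can be illustrated with a small sketch. It uses threads in place of database processes and a plain dictionary in place of a shared-memory structure; the names `grant_counts`, `record_grant`, and `run_workers` are hypothetical and chosen only for this example.

```python
import threading

# Hypothetical stand-in for a shared-memory structure such as the lock table.
grant_counts = {}
# Binary semaphore guarding grant_counts (plays the role of an OS semaphore).
mutex = threading.Semaphore(1)

def record_grant(item):
    """Increment the grant count for item; only one thread may do so at a time."""
    with mutex:  # acquired on entry, released on exit
        current = grant_counts.get(item, 0)
        grant_counts[item] = current + 1

def run_workers(n_workers, n_requests, item="page_7"):
    """Start n_workers threads that each record n_requests grants on item."""
    grant_counts.pop(item, None)  # start from a clean count

    def worker():
        for _ in range(n_requests):
            record_grant(item)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return grant_counts[item]
```

Because the read and the write of the count happen inside the guarded region, no update is lost, whatever the interleaving of the threads.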
To avoid overhead of interprocess communication for lock
Data Servers
Used in high-speed LANs, in cases where
Locking
Data Caching
Lock Caching
Lock Caching
Server calls back locks from clients when it receives conflicting lock
request. Client returns lock once no local transaction is using it.
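A minimal sketch of the call-back rule just described, assuming exclusive locks only and a single in-memory server. The `Client` and `Server` classes and their method names are hypothetical, not part of any real system.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.cached_locks = set()  # locks kept across transactions
        self.in_use = set()        # locks a local transaction is using now

    def call_back(self, item):
        """Server asks for the lock back; return True if it was released."""
        if item in self.in_use:
            return False           # still in use; it is returned later
        self.cached_locks.discard(item)
        return True


class Server:
    def __init__(self):
        self.holder = {}           # item -> Client currently holding the lock

    def request_lock(self, client, item):
        """Grant item to client, calling back a conflicting holder first."""
        current = self.holder.get(item)
        if current is not None and current is not client:
            if not current.call_back(item):
                return False       # conflict persists; requester must wait
        self.holder[item] = client
        client.cached_locks.add(item)
        return True
```

A request from a second client fails while the first client's transaction is still using the item, and succeeds once that transaction is done and the cached lock can be given up.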
Parallel Systems
Parallel database systems consist of multiple processors and multiple
powerful processors
A massively parallel or fine grain parallel machine utilizes
Measured by:
Speedup is linear if equation equals N.
Scaleup: increase the size of both the problem and the system
Measured by:
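The ratios that follow "Measured by:" do not survive in the text above. The sketch below assumes the usual textbook definitions: speedup is the elapsed time on the small system divided by the elapsed time on the large system for the same problem, and scaleup is the elapsed time of the small system on the small problem divided by the elapsed time of the large system on the large problem. The function names are hypothetical.

```python
def speedup(small_system_time, large_system_time):
    """Same problem on both systems: small-system time / large-system time."""
    return small_system_time / large_system_time

def scaleup(small_system_small_problem_time, big_system_big_problem_time):
    """Problem and system both grown N times: ratio of the two elapsed times."""
    return small_system_small_problem_time / big_system_big_problem_time

def is_linear_speedup(small_system_time, large_system_time, n, tol=1e-9):
    """Speedup is linear if the ratio equals N, the factor by which the system grew."""
    return abs(speedup(small_system_time, large_system_time) - n) <= tol

def is_linear_scaleup(small_time, big_time, tol=1e-9):
    """Scaleup is linear if the ratio equals 1."""
    return abs(scaleup(small_time, big_time) - 1.0) <= tol
```

For example, a job that takes 100 seconds on one processor and 25 seconds on four shows linear speedup; if it takes 40 seconds on four, the speedup is 2.5 and is sublinear.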
Speedup
Database System Concepts - 5th Edition
Scaleup
Transaction scaleup:
N-times as many users submitting requests (hence, N-times as many requests) to an N-times larger database, on an N-times larger computer.
Well-suited to parallel execution.
bus, disks, or locks) compete with each other, thus spending time
waiting on other processes, rather than performing useful work.
Skew: Increasing the degree of parallelism increases the variance in
n components are connected to log(n) other components and can
reach each other via at most log(n) links; reduces communication
delays.
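The log(n) property above is that of a hypercube interconnect; the name does not survive in the text, so treat it as an assumption here. In the sketch below components are numbered in binary, two components are linked when their numbers differ in exactly one bit, and n is assumed to be a power of two. The function names are hypothetical.

```python
def neighbors(node, n):
    """Components directly linked to node in an n-component hypercube."""
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two")
    dims = n.bit_length() - 1  # log2(n) for a power of two
    return [node ^ (1 << d) for d in range(dims)]

def hops(a, b):
    """Minimum number of links between a and b: the number of differing bits."""
    return bin(a ^ b).count("1")
```

With n = 8, every component has log2(8) = 3 neighbors, and the farthest pair (for instance 0 and 7) is 3 links apart.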
Interconnection Architectures
common disk
Hierarchical -- hybrid of the above architectures
Shared Memory
Processors and disks have access to a common memory, typically via
Shared Disk
All processors can directly access all disks via an interconnection
subsystem.
Shared-disk systems can scale to a somewhat larger number of
Shared Nothing
Node consists of a processor, memory, and one or more disks.
Hierarchical
Combines characteristics of shared-memory, shared-disk, and shared-
nothing architectures.
Top level is a shared-nothing architecture: nodes connected by an
few processors.
Alternatively, each node could be a shared-disk system, and each of
virtual-memory architectures
Distributed Systems
Data spread over multiple machines (also referred to as sites or
nodes).
Network interconnects the machines
Data shared by users on multiple machines
Distributed Databases
Homogeneous distributed databases
stored locally.
Higher system availability through redundancy: data can be
replicated at remote sites, and the system can function even if a site fails.
Disadvantage: added complexity required to ensure proper
Software development cost.
Basic idea: each site executes the transaction until just before commit,
and leaves the final decision to a coordinator.
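The basic idea above is the one behind two-phase commit; the protocol name does not appear in the surviving text, so that reading is an assumption. The sketch below leaves out logging, timeouts, and failure handling, and the `Site` class and `coordinate` function are hypothetical names.

```python
class Site:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether local execution succeeded
        self.state = "active"

    def prepare(self):
        """Phase 1: run up to just before commit, then vote."""
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        """Phase 2: apply the coordinator's decision."""
        self.state = decision


def coordinate(sites):
    """Collect every site's vote, then make and broadcast the final decision."""
    votes = [site.prepare() for site in sites]
    decision = "committed" if all(votes) else "aborted"
    for site in sites:
        site.finish(decision)
    return decision
```

A single "no" vote is enough for the coordinator to abort the transaction at every site, including the sites that voted to commit.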
Network Types
Local-area networks (LANs) composed of processors that are
discontinuous connection:
Data is replicated.
Parallel Databases
Introduction
I/O Parallelism
Interquery Parallelism
Intraquery Parallelism
Intraoperation Parallelism
Interoperation Parallelism
Design of Parallel Systems
Introduction
Parallel machines are becoming quite common and affordable
Parallelism in Databases
Data can be partitioned across multiple disks for parallel I/O.
Individual relational operations (e.g., sort, join, aggregation) can be
executed in parallel
relational algebra)
I/O Parallelism
Reduce the time required to retrieve relations from disk by partitioning
the relations on multiple disks.
Horizontal partitioning: tuples of a relation are divided among many disks.
Round-robin:
Send the ith tuple inserted in the relation to disk i mod n.
Hash partitioning:
Choose an attribute as the partitioning attribute.
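The round-robin and hash rules above can be sketched as follows, with disks numbered 0 to n - 1 and tuples represented as dictionaries. The function names are hypothetical, and the CRC-32 hash is an arbitrary choice for illustration; the text does not prescribe a hash function.

```python
import zlib

def round_robin_disk(i, n):
    """Disk for the i-th inserted tuple: i mod n."""
    return i % n

def hash_disk(value, n):
    """Disk for a partitioning-attribute value, using a hash with range 0..n-1."""
    # crc32 is stable across runs, unlike the built-in hash() for strings.
    return zlib.crc32(str(value).encode("utf-8")) % n

def partition(tuples, n, attribute=None):
    """Spread tuples over n disks: round-robin by default, hashed on
    attribute when a partitioning attribute is given."""
    disks = [[] for _ in range(n)]
    for i, row in enumerate(tuples):
        if attribute is None:
            d = round_robin_disk(i, n)
        else:
            d = hash_disk(row[attribute], n)
        disks[d].append(row)
    return disks
```

Round-robin keeps the disks within one tuple of each other in size. Hashing sends all tuples with the same attribute value to the same disk, which is what lets a lookup on that attribute go to a single disk.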
of data access:
1. Scanning the entire relation.
2. Locating a tuple associatively (point queries).
3. Locating all tuples such that the value of a given attribute lies within a
specified range (range queries).
Best suited for sequential scan of entire relation on each query.
Hash partitioning:
be accessed.
For range queries on the partitioning attribute, one to a few disks may need
to be accessed.
If many blocks are to be fetched, they are still fetched from one to a
few disks, and potential parallelism in disk access is wasted
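The heading for this passage is lost, but the behavior it describes matches range partitioning. Under that assumption, and with a hypothetical sorted partitioning vector of n - 1 boundary values for n disks, the disks a query must touch can be found with a binary search. The function names are illustrative only.

```python
from bisect import bisect_right

def disk_for_value(vector, v):
    """Disk holding value v: disk 0 below the first boundary, then one disk
    per interval, and the last disk at or above the final boundary."""
    return bisect_right(vector, v)

def disks_for_range(vector, low, high):
    """All disks that can hold a value in the closed range [low, high]."""
    first = disk_for_value(vector, low)
    last = disk_for_value(vector, high)
    return list(range(first, last + 1))
```

With the vector [5, 11] over three disks, a query for values 6 to 9 touches only disk 1, so however many blocks it fetches, they all come from that one disk.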
disks.
If a relation consists of m disk blocks and there are n disks available in
the system, then the relation should be allocated min(m,n) disks.
Handling of Skew
The distribution of tuples to disks may be skewed; that is, some
disks have many tuples, while others may have fewer tuples.
Types of skew:
Attribute-value skew.
Partition skew.
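A small sketch of how an uneven distribution of attribute values turns into uneven disk loads. The toy partitioning function (value mod n) and the `skew_ratio` metric are assumptions made for this example, not definitions taken from the text.

```python
def partition_sizes(values, n):
    """Count tuples per disk when disk = value mod n (a toy partitioning function)."""
    sizes = [0] * n
    for v in values:
        sizes[v % n] += 1
    return sizes

def skew_ratio(sizes):
    """Largest partition divided by the average; 1.0 means perfectly balanced."""
    return max(sizes) / (sum(sizes) / len(sizes))
```

Twelve distinct values 0 to 11 over four disks give three tuples per disk and a ratio of 1.0. If nine of the twelve tuples share the value 7, one disk holds nine tuples and the ratio rises to 3.0.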
processor partitioning:
Basic idea:
Interquery Parallelism
Queries/transactions execute in parallel with one another.
Increases transaction throughput; used primarily to scale up a
parallel database, because even sequential database systems support
concurrent processing.
More complicated to implement on shared-disk or shared-nothing
architectures
Intraquery Parallelism
Execution of a single query in parallel on multiple processors/disks;