Introduction
Background
Distributed Database Design
Database Integration
Semantic Data Control
Distributed Query Processing
Multidatabase Query Processing
Distributed Transaction Management
Data Replication
Parallel Database Systems
Ch.14/1
Predictions
Moore's law: processor speed growth (with multicore): 50% per year
DRAM capacity growth : 4
Distributed DBMS
The Solution
Increase the I/O bandwidth
Data partitioning
Parallel data access
Multiprocessor Objectives
High-performance with better cost-performance than mainframe
or vector supercomputer
Trends
[Figure: an application server (query parsing, data server interface) linked by a communication channel to a data server (application server interface, database functions) that manages the database]
Potential problems
High-level interface
Fortran, OPS5)
Augment an existing language with parallel constructs (e.g., C*,
Fortran90)
Offer a new language in which parallelism can be expressed or
automatically inferred
Critique
low-level
Can combine the advantages of both (1) and (2)
Data-based Parallelism
Inter-operation
[Figure: operations op.1, op.2 and op.3 of one query]
Intra-operation
The same op in parallel
[Figure: the same op. applied to partitions R1, R2, R3 and R4]
Parallel DBMS
Loose definition: a DBMS implemented on a tightly coupled multiprocessor
Alternative extremes
Straightforward porting of relational DBMS (the software vendor edge)
New hardware/software combination (the computer manufacturer edge)
Linear Speed-up
Linear increase in performance for a constant DB size and
proportional increase of the system components (processor,
memory, disk)
[Figure: ratio new perf. / old perf. plotted against components, with the ideal curve]
Linear Scale-up
Sustained performance for a linear increase of database size and
proportional increase of the system components.
[Figure: ratio new perf. / old perf. with the ideal curve]
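The speed-up and scale-up definitions can be made concrete with a small calculation. This is an illustrative sketch only: the function names, the 5% tolerance and the sample figures are assumptions, not values from the slides, and performance is taken to be throughput (higher is better).

```python
def perf_ratio(old_perf: float, new_perf: float) -> float:
    """Ratio new perf. / old perf., the quantity used in both definitions."""
    if old_perf <= 0:
        raise ValueError("old_perf must be positive")
    return new_perf / old_perf


def is_linear_speed_up(old_perf, new_perf, component_factor, tolerance=0.05):
    """Constant DB size, components multiplied by component_factor.

    Speed-up is linear when performance grows by the same factor.
    """
    ratio = perf_ratio(old_perf, new_perf)
    return abs(ratio - component_factor) <= tolerance * component_factor


def is_linear_scale_up(old_perf, new_perf, tolerance=0.05):
    """DB size and components multiplied by the same factor.

    Scale-up is linear when performance is sustained (ratio stays near 1).
    """
    return abs(perf_ratio(old_perf, new_perf) - 1.0) <= tolerance


# Hypothetical measurements, in queries per second.
print(perf_ratio(100.0, 390.0))             # 3.9
print(is_linear_speed_up(100.0, 390.0, 4))  # True: 3.9 is within 5% of 4
print(is_linear_scale_up(100.0, 70.0))      # False: performance dropped by 30%
```

The tolerance is there because measured systems rarely sit exactly on the ideal line; any threshold would be a judgment call.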
Barriers to Parallelism
Startup
The time needed to start a parallel operation may dominate the
Interference
Skew
slowest one
[Figure: user tasks 1 to n go through the Session Mgr to Request Mgr (RM) tasks 1 to n; each RM task drives Data Mgr (DM) tasks (11, 12, ..., n1, n2)]
Host interface
Request manager
Execution control
Data manager
Execution of DB operations
Data management
Hybrid architectures
Non-Uniform Memory Architecture (NUMA)
Cluster
Shared-Memory
Shared-Disk
Shared-Nothing
Hybrid Architectures
Various combinations of the three basic architectures are
possible, giving different trade-offs between cost, performance,
extensibility, availability, etc.
NUMA
Shared-Memory vs. Distributed Memory
Mixes two different aspects: addressing and memory
CC-NUMA
Principle
Main memory distributed as with shared-nothing
However, any processor has access to all other processors' memories
cache interconnect
Cluster
SN cluster vs SD cluster
SN cluster can yield
distributed file system protocol such as NFS, relatively slow and not
appropriate for database management
Discussion
For a small configuration (e.g., 8 processors), SM can provide the
highest performance because of better load balancing
Some years ago, SN was the only choice for high-end systems.
read-only, SN is used
Transaction management
Data Partitioning
Each relation is divided into n partitions (subrelations), where n is a
function of relation size and access frequency
Implementation
Round-robin
B-tree index
Hash function
Partitioning Schemes
Round-Robin
Hashing
Interval
[Figure: interval partitioning with ranges such as a-g, h-m and u-z]
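The three partitioning schemes can be sketched as functions that map a tuple to a partition number. This is a minimal illustration, not the slides' implementation: the function names, the use of crc32 as the hash, and the "t" interval boundary are assumptions made for the example.

```python
from zlib import crc32


def round_robin_partition(tuple_index: int, n: int) -> int:
    """The i-th tuple in insertion order goes to partition i mod n."""
    return tuple_index % n


def hash_partition(key: str, n: int) -> int:
    """A hash of the partitioning attribute picks the partition.

    crc32 is used only because it is deterministic across runs.
    """
    return crc32(key.encode("utf-8")) % n


# Upper bound of each interval on the first letter of the key.
# "g", "m" and "z" echo the a-g, h-m and u-z ranges; "t" is assumed.
INTERVAL_UPPER_BOUNDS = ["g", "m", "t", "z"]


def interval_partition(key: str) -> int:
    """Interval (range) partitioning on the first letter of the key."""
    first = key[0].lower()
    for partition, upper in enumerate(INTERVAL_UPPER_BOUNDS):
        if first <= upper:
            return partition
    raise ValueError(f"key {key!r} falls outside the defined intervals")


names = ["alice", "henry", "nora", "ursula", "bob"]
print([round_robin_partition(i, 4) for i in range(len(names))])  # [0, 1, 2, 3, 0]
print([interval_partition(name) for name in names])              # [0, 1, 2, 3, 0]
```

Round-robin ignores the key, so only a full scan can find a given tuple; hashing supports exact-match lookups on the key, and interval partitioning also supports range lookups.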
Interleaved Partitioning
[Figure: each node holds one primary copy (R1 to R4); backup copies are split into fragments such as r1.1, r1.2 and r1.3 and spread over the other nodes]
Chained Partitioning
Node          1    2    3    4
Primary copy  R1   R2   R3   R4
Backup copy   r4   r1   r2   r3
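In chained partitioning the backup of each primary partition sits on the next node in the chain (r1 on node 2, r2 on node 3, r3 on node 4, r4 on node 1). A minimal sketch of that placement rule and of the resulting failover follows; the function names and the failure-handling policy are assumptions made for illustration.

```python
def chained_backup_node(primary_node: int, n_nodes: int) -> int:
    """Node holding the backup of the partition whose primary is on
    primary_node (nodes are numbered 1..n_nodes): the next node in the chain."""
    return primary_node % n_nodes + 1


def node_serving(partition: int, n_nodes: int, failed: set) -> int:
    """Node that serves partition R<partition>, given a set of failed nodes."""
    if partition not in failed:
        return partition  # the primary copy is available
    backup = chained_backup_node(partition, n_nodes)
    if backup in failed:
        raise RuntimeError(f"both copies of R{partition} are unavailable")
    return backup


layout = {f"r{p}": chained_backup_node(p, 4) for p in range(1, 5)}
print(layout)                          # {'r1': 2, 'r2': 3, 'r3': 4, 'r4': 1}
print(node_serving(2, 4, failed={2}))  # 3
```

With this layout a single node failure never loses data, but the failed node's whole load lands on one neighbour.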
Placement Directory
Performs two functions
f1 (relname, placement attval) = lognode-id
f2 (lognode-id) = phynode-id
and f2 should be
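The two directory functions compose into a single lookup. The sketch below is one possible realization, assuming hashing for the first function and a plain table for the second; the node names, the table contents and the `locate` helper are hypothetical.

```python
from zlib import crc32

# f2: logical node -> physical node. Hypothetical table; remapping an entry
# moves a logical node to other hardware without touching f1.
LOGICAL_TO_PHYSICAL = {0: "phys-a", 1: "phys-b", 2: "phys-c", 3: "phys-a"}


def f1(relname: str, placement_attval: str) -> int:
    """(relname, placement attval) -> lognode-id, here by hashing."""
    key = f"{relname}:{placement_attval}".encode("utf-8")
    return crc32(key) % len(LOGICAL_TO_PHYSICAL)


def f2(lognode_id: int) -> str:
    """lognode-id -> phynode-id, a plain table lookup."""
    return LOGICAL_TO_PHYSICAL[lognode_id]


def locate(relname: str, placement_attval: str) -> str:
    """Compose the two functions to find where a tuple lives."""
    return f2(f1(relname, placement_attval))


print(locate("EMP", "E42"))
```

Separating logical from physical nodes means data can be moved between machines by editing only the second mapping.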
Join Processing
Three basic algorithms for intra-operator parallelism
Parallel nested loop join: no special assumption
Parallel associative join: one relation is declustered on join attribute
and equi-join
Parallel hash join: equi-join
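As an illustration of the third algorithm, here is a minimal single-process sketch of a parallel hash join: both relations are partitioned on the join attribute with the same hash function, and each node then joins its own pair of fragments. The function names are assumptions, and a loop stands in for the nodes; a real system would ship the fragments over the network and run the local joins concurrently.

```python
from collections import defaultdict


def partition_by_hash(tuples, key_index, n_nodes):
    """Send each tuple to node hash(join attribute) mod n_nodes."""
    buckets = [[] for _ in range(n_nodes)]
    for t in tuples:
        buckets[hash(t[key_index]) % n_nodes].append(t)
    return buckets


def local_hash_join(r_part, s_part, r_key, s_key):
    """Build a hash table on the R fragment, then probe it with the S fragment."""
    table = defaultdict(list)
    for r in r_part:
        table[r[r_key]].append(r)
    return [r + s for s in s_part for r in table.get(s[s_key], [])]


def parallel_hash_join(r, s, r_key, s_key, n_nodes):
    """Equi-join R and S; each loop iteration stands in for one node."""
    r_parts = partition_by_hash(r, r_key, n_nodes)
    s_parts = partition_by_hash(s, s_key, n_nodes)
    result = []
    for node in range(n_nodes):  # independent, so could run in parallel
        result.extend(local_hash_join(r_parts[node], s_parts[node], r_key, s_key))
    return result


R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(parallel_hash_join(R, S, 0, 0, n_nodes=3)))
# [(2, 'b', 2, 'x'), (3, 'c', 3, 'y'), (3, 'c', 3, 'z')]
```

Because matching tuples always hash to the same node, no cross-node comparison is needed, which is why the method applies to equi-joins only.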
[Figures: join execution across nodes 1 to 4 over fragments R1, R2 of R and S1, S2 of S, with send and partition steps]
Search space
Search strategy
Cost
[Figure: four operator trees joining R1, R2, R3 and R4 (joins j1 to j12), including right-deep, zig-zag and bushy shapes, each producing Result]
[Figure: hash-join pipelines over R1 to R4 with operators Build1 to Build3 and Probe1 to Probe3 and intermediate results Temp1 and Temp2]
M. T. Özsu & P. Valduriez
Load Balancing
Problems arise for intra-operator parallelism with skewed data
distributions
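The effect of skew can be illustrated with a toy cost model in which the elapsed time of a parallel step is set by its slowest partition. The function names, the uniform per-tuple cost and the partition sizes below are assumptions chosen for illustration.

```python
def parallel_response_time(partition_sizes, cost_per_tuple=1.0):
    """Elapsed time of an intra-operator parallel step: the slowest partition."""
    return max(size * cost_per_tuple for size in partition_sizes)


def skew_factor(partition_sizes):
    """Largest partition relative to the mean (1.0 means perfectly balanced)."""
    mean = sum(partition_sizes) / len(partition_sizes)
    return max(partition_sizes) / mean


balanced = [250, 250, 250, 250]
skewed = [700, 100, 100, 100]  # same 1000 tuples, uneven distribution

print(parallel_response_time(balanced))  # 250.0
print(parallel_response_time(skewed))    # 700.0
print(skew_factor(skewed))               # 2.8
```

Under this model the skewed run takes 2.8 times longer than the balanced one, although both process the same number of tuples on the same number of nodes.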
Solutions
[Figure: Scan1 and Scan2 read R1 and R2 (AVS/TPS) and feed Join1 and Join2 with S1 and S2 (RS/SS, AVS/TPS), producing Res1 and Res2 (JPS)]
Load Balancing in a DB Cluster
Choose the node to execute
round robin
Fail over
Load balancing
In case of interference
[Figure: queries Q1, Q2, Q3 and Q4 dispatched to the nodes of the cluster]
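Round-robin node selection combined with fail over can be sketched as a small dispatcher. This is an assumption-laden illustration, not a description of any product: the class name, its methods and the node names are hypothetical, and failure detection is reduced to an explicit `mark_failed` call.

```python
from itertools import cycle


class RoundRobinDispatcher:
    """Assigns each incoming query to the next live node in turn."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.failed = set()
        self._ring = cycle(self.nodes)

    def mark_failed(self, node):
        self.failed.add(node)

    def choose_node(self):
        # Try each node at most once per call, skipping failed ones.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node not in self.failed:
                return node
        raise RuntimeError("no live node available")


dispatcher = RoundRobinDispatcher(["node1", "node2", "node3"])
print([dispatcher.choose_node() for _ in range(4)])
# ['node1', 'node2', 'node3', 'node1']
dispatcher.mark_failed("node2")
print([dispatcher.choose_node() for _ in range(4)])
# ['node3', 'node1', 'node3', 'node1']
```

Round robin ignores how busy each node is; a load-aware policy would replace `choose_node` with a choice based on current node load.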
Oracle Transparent Application Failover
[Figure: Client, Node 1 and Node 2, with labels Ping and connect1]
Fibre Channel
Main Products
Vendor     Product                                                 Architecture  Platforms
IBM        -                                                       SD, SN        AIX on SP; Linux on cluster
Microsoft  SQL Server; SQL Server 2008 R2 Parallel Data Warehouse  SD, SN        Windows on SMP and cluster
Oracle     -                                                       -             Windows, Unix, Linux on SMP and cluster
NCR        Teradata                                                SN            Bynet network
Oracle     MySQL                                                   SN            Linux Cluster
2 processors, 24 GB RAM
Exadata Architecture
Real Application Cluster
Infiniband Switches
Storage cells