Académique Documents
Professionnel Documents
Culture Documents
Carnegie Mellon
Problem definition
centralized DB:
CHICAGO
LA
NY
Carnegie Mellon
15-415 - C. Faloutsos
Problem definition
Distr. DB: DB stored in many places ... connected
LA
NY
Carnegie Mellon
15-415 - C. Faloutsos
Problem definition
now: connect to LA; exec sql select * from EMP; ... connect to NY; exec sql select * from EMPLOYEE; ...
LA
EMP
DBMS1 DBMS2
NY EMPLOYEE
Carnegie Mellon
15-415 - C. Faloutsos
Problem definition
ideally: connect to distr-LA; exec sql select * from EMPL; LA
D-DBMS D-DBMS
NY
EMPLOYEE
EMP
DBMS1 DBMS2
Carnegie Mellon
15-415 - C. Faloutsos
Pros + Cons
Pros
. . .
Cons
. . .
Carnegie Mellon 15-415 - C. Faloutsos 7
Pros + Cons
Pros
Data sharing reliability & availability speed up of query processing
Cons
software development cost more bugs may increase processing overhead (msg)
Carnegie Mellon 15-415 - C. Faloutsos 8
Overview
Problem motivation Design issues Query optimization semijoins transactions (recovery, conc. control)
Carnegie Mellon
15-415 - C. Faloutsos
Carnegie Mellon
15-415 - C. Faloutsos
10
Carnegie Mellon
15-415 - C. Faloutsos
11
horiz. fragm.
Carnegie Mellon
15-415 - C. Faloutsos
12
Carnegie Mellon
15-415 - C. Faloutsos
13
Problem definition
ideally: connect to distr-LA; exec sql select * from EMPL; LA
D-DBMS D-DBMS
NY
EMPLOYEE
EMP
DBMS1 DBMS2
Carnegie Mellon
15-415 - C. Faloutsos
14
Overview
Problem motivation Design issues Query optimization semijoins transactions (recovery, conc. control)
Carnegie Mellon
15-415 - C. Faloutsos
15
Carnegie Mellon
15-415 - C. Faloutsos
16
S1
s# s1 s2 s5 s11
...
S3
SUPPLIER Join SHIPMENT = ?
Carnegie Mellon 15-415 - C. Faloutsos 17
semijoins
choice of plans?
Carnegie Mellon
15-415 - C. Faloutsos
18
semijoins
choice of plans? plan #1: ship SHIP -> S2; join; ship -> S3 plan #2: ship SHIP->S3; ship SUP->S3; join ... others?
Carnegie Mellon 15-415 - C. Faloutsos 19
S1
s# s1 s2 s5 s11
...
S3
SUPPLIER Join SHIPMENT = ?
Carnegie Mellon 15-415 - C. Faloutsos 20
Semijoins
Idea: reduce the tables before shipping
SUPPLIER SHIPMENT
s# s1 s2 s3 s2 p# p1 p1 p5 p9
S1
s# s1 s2 s5 s11
...
S3
SUPPLIER Join SHIPMENT = ?
Carnegie Mellon 15-415 - C. Faloutsos 21
Semijoins
How to do the reduction, cheaply? Eg., reduce SHIPMENT:
Carnegie Mellon
15-415 - C. Faloutsos
22
Semijoins
Idea: reduce the tables before shipping
SUPPLIER
(s1,s2,s5,s11)
SHIPMENT
s# s1 s2 s3 s2 p# p1 p1 p5 p9
S1
s# s1 s2 s5 s11
...
S3
SUPPLIER Join SHIPMENT = ?
Carnegie Mellon 15-415 - C. Faloutsos 23
Semijoins
Formally: SHIPMENT = SHIPMENT SUPPLIER express semijoin w/ rel. algebra
Carnegie Mellon
15-415 - C. Faloutsos
24
Semijoins
Formally: SHIPMENT = SHIPMENT SUPPLIER express semijoin w/ rel. algebra
R ' R S R ( R S )
Carnegie Mellon 15-415 - C. Faloutsos 25
Semijoins eg:
suppose each attr. is 4 bytes Q: transmission cost (#bytes) for semijoin
SHIPMENT = SHIPMENT semijoin SUPPLIER
Carnegie Mellon
15-415 - C. Faloutsos
26
Semijoins
Idea: reduce the tables before shipping
SUPPLIER
(s1,s2,s5,s11)
SHIPMENT
s# s1 s2 s3 s2 p# p1 p1 p5 p9
S1
s# s1 s2 s5 s11
...
4 bytes
Carnegie Mellon
S3
15-415 - C. Faloutsos 27
Semijoins eg:
suppose each attr. is 4 bytes Q: transmission cost (#bytes) for semijoin
SHIPMENT = SHIPMENT semijoin SUPPLIER
A: 4*4 bytes
Carnegie Mellon
15-415 - C. Faloutsos
28
Semijoins eg:
suppose each attr. is 4 bytes Q1: give a plan, with semijoin(s) Q2: estimate its cost (#bytes shipped)
Carnegie Mellon
15-415 - C. Faloutsos
29
Semijoins eg:
A1:
reduce SHIPMENT to SHIPMENT SHIPMENT -> S3 SUPPLIER -> S3 do join @ S3
Q2: cost?
Carnegie Mellon
15-415 - C. Faloutsos
30
Semijoins
SUPPLIER
(s1,s2,s5,s11)
SHIPMENT
s# s1 s2 s3 s2 p# p1 p1 p5 p9
S1
s# s1 s2 s5 s11
...
4 bytes 4 bytes
Carnegie Mellon
S3
15-415 - C. Faloutsos
4 bytes 4 bytes
31
Semijoins eg:
A2:
4*4 bytes - reduce SHIPMENT to SHIPMENT 3*8 bytes - SHIPMENT -> S3 4*8 bytes - SUPPLIER -> S3 0 bytes - do join @ S3
72
Carnegie Mellon
bytes TOTAL
15-415 - C. Faloutsos 32
Other plans?
Carnegie Mellon
15-415 - C. Faloutsos
33
Other plans?
P2: reduce SHIPMENT to SHIPMENT reduce SUPPLIER to SUPPLIER SHIPMENT -> S3 SUPPLIER -> S3
Carnegie Mellon
15-415 - C. Faloutsos
34
Other plans?
P3: reduce SUPPLIER to SUPPLIER SUPPLIER -> S2 do join @ S2 ship results -> S3
Carnegie Mellon
15-415 - C. Faloutsos
35
Carnegie Mellon
15-415 - C. Faloutsos
36
Semijoins
SUPPLIER
(s1,s2,s5,s11)
SHIPMENT
s# s1 s2 s3 s2 p# p1 p1 p5 p9
S1
s# s1 s2 s5 s11
...
4 bytes 4 bytes
Carnegie Mellon
S3
15-415 - C. Faloutsos
4 bytes 4 bytes
37
Two-way Semijoins
S2
SUPPLIER
(s1,s2,s5,s11)
SHIPMENT
s# s1 s2 (s5,s11) s3 s2 p# p1 p1 p5 p9
S1
s# s1 s2 s5 s11
...
S3
Carnegie Mellon 15-415 - C. Faloutsos 39
Overview
Problem motivation Design issues Query optimization semijoins transactions (recovery, conc. control)
Carnegie Mellon
15-415 - C. Faloutsos
40
Transactions recovery
Problem: eg., a transaction moves
$100 from NY -> $50 to LA, $50 to Chicago
3 sub-transactions, on 3 systems, with 3 W.A.L.s how to guarantee atomicity (all-or-none)? Observation: additional types of failures (links, servers, delays, time-outs ....)
Carnegie Mellon 15-415 - C. Faloutsos 41
Transactions recovery
Problem: eg., a transaction moves
$100 from NY -> $50 to LA, $50 to Chicago
Carnegie Mellon
15-415 - C. Faloutsos
42
Distributed recovery
How? CHICAGO T1,2: +$50
LA
NY
Carnegie Mellon
15-415 - C. Faloutsos
43
Distributed recovery
Step1: choose coordinator CHICAGO T1,2: +$50 NY
LA
Carnegie Mellon
15-415 - C. Faloutsos
44
Distributed recovery
Step 2: execute a protocol, eg., 2 phase commit
Carnegie Mellon
15-415 - C. Faloutsos
45
2 phase commit
T1,1 (coord.) T1,2
T1,3 prepare to commit
time
Carnegie Mellon 15-415 - C. Faloutsos 46
2 phase commit
T1,1 (coord.) T1,2
T1,3 prepare to commit
Y
Y
time
Carnegie Mellon 15-415 - C. Faloutsos 47
2 phase commit
T1,1 (coord.) T1,2
T1,3 prepare to commit
Y
Y commit
time
Carnegie Mellon 15-415 - C. Faloutsos 48
time
Carnegie Mellon 15-415 - C. Faloutsos 49
2 phase commit
T1,1 (coord.) T1,2
T1,3 prepare to commit
Y
N
time
Carnegie Mellon 15-415 - C. Faloutsos 50
2 phase commit
T1,1 (coord.) T1,2
T1,3 prepare to commit
Y
N abort
time
Carnegie Mellon 15-415 - C. Faloutsos 51
Distributed recovery
Many, many additional details (what if the coordinator fails? what if a link fails? etc) and many other solutions (eg., 3-phase commit)
Carnegie Mellon
15-415 - C. Faloutsos
52
Overview
Problem motivation Design issues Query optimization semijoins transactions (recovery, conc. control)
Carnegie Mellon
15-415 - C. Faloutsos
53
Carnegie Mellon
15-415 - C. Faloutsos
54
Distributed deadlocks
T2,la T1,la CHICAGO NY T2,ny
LA
T1,ny
NY
Carnegie Mellon
15-415 - C. Faloutsos
55
Distributed deadlocks
T2,la T1,la T2,ny T1,ny
LA
NY
Carnegie Mellon
15-415 - C. Faloutsos
56
Distributed deadlocks
T2,la T1,la T2,ny T1,ny
LA
NY
Carnegie Mellon
15-415 - C. Faloutsos
57
LA
Distributed deadlocks
NY
T2,la T1,la
T2,ny T1,ny
Conclusions
Distr. DBMSs: not deployed BUT: produced clever ideas:
semijoins distributed recovery / conc. control