X Introduction
– Alternative design strategies
X Distribution design issues
X Data fragmentation
X Data allocation
Introduction: Design Strategies (cont’d)
[Figure: Top-down design process. Requirements Analysis → Objectives → Conceptual Design and View Design (with user input) → View Integration → Distribution Design (with user input) → LCS → Physical Design → LIS]
M.H. Kim, KAIST
Distribution design
» design the local conceptual schemas by distributing entities over the sites of the DCS
– fragmentation
– allocation
Distribution Design Issues (cont’d)
Disadvantage of fragmentation
– may require extra processing, e.g., join
» for views that cannot be defined on a single fragment
– semantic data control is more difficult
» especially, integrity enforcement
Distribution Design Issues (cont’d)
X Fragmentation alternatives
– horizontal fragmentation
– vertical fragmentation
X Degree of fragmentation
– ranges from individual tuples or attributes (finest) up to the whole relation (no fragmentation)
Distribution Design Issues (cont’d)
X Correctness of fragmentation
– Completeness
» decomposition of relation R into fragments R1, R2, …, Rn is
complete iff each data item in R can also be found in some Ri
– Disjointness
» if relation R is decomposed into fragments R1, R2, …, Rn, and data
item di is in Rj, then di should not be in any other fragment Rk (k≠j)
– Reconstruction
» if relation R is decomposed into fragments R1, R2, …, Rn, then
there should exist some relational operator ∇ such that
$R = \nabla_{1 \le i \le n} R_i$
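The three rules can be checked mechanically for a small horizontal fragmentation. The sketch below is only an illustration: the helper name `check_fragmentation` is hypothetical, relations are modelled as lists of dictionaries, and it assumes horizontal fragments, for which the reconstruction operator ∇ is union. The PAY rows are the ones used in the PHF example later in these notes.

```python
from typing import Dict, List

Row = Dict[str, object]

def check_fragmentation(relation: List[Row], fragments: List[List[Row]], key: str) -> Dict[str, bool]:
    """Check completeness, disjointness, and reconstruction (by union)."""
    relation_keys = [row[key] for row in relation]
    fragment_keys = [row[key] for frag in fragments for row in frag]
    # Completeness: every data item of R appears in some fragment.
    complete = set(relation_keys) <= set(fragment_keys)
    # Disjointness: no data item appears in more than one fragment.
    disjoint = len(fragment_keys) == len(set(fragment_keys))
    # Reconstruction: the union of the fragments gives back exactly R.
    union = {row[key]: row for frag in fragments for row in frag}
    reconstructs = union == {row[key]: row for row in relation}
    return {"complete": complete, "disjoint": disjoint, "reconstructs": reconstructs}

PAY = [
    {"TITLE": "Elect. Eng.", "SAL": 40000},
    {"TITLE": "Syst. Anal.", "SAL": 34000},
    {"TITLE": "Mech. Eng.", "SAL": 27000},
    {"TITLE": "Programmer", "SAL": 24000},
]
PAY1 = [row for row in PAY if row["SAL"] <= 30000]
PAY2 = [row for row in PAY if row["SAL"] > 30000]
result = check_fragmentation(PAY, [PAY1, PAY2], key="TITLE")
```

Dropping PAY2 from the list of fragments makes the completeness check fail, and listing a tuple in two fragments makes the disjointness check fail.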
X Allocation alternatives
– non-replicated
» partitioned: each fragment resides at only one site
– replicated
y good for reliability and efficiency of read-only-queries
y may cause trouble in update
» fully replicated: each fragment at all sites
» partially replicated : each fragment at some of the sites
Distribution Design Issues (cont’d)
Comparison of allocation alternatives
                        full replication       partial replication   partitioning
directory management    easy or non-existent   same difficulty       same difficulty
concurrency control     moderate               difficult             easy
reality                 possible applications  realistic             possible applications
X Information requirements (for fragmentation and for allocation)
– Database information
– Application information
– Site information
» i.e., computer system information
Fragmentation
Fragmentation (cont’d)
Database Information
– Join graph
» equi-join relationships among relations
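[Figure: join graph. Link L1: owner PAY(TITLE, SAL) → member EMP(ENO, ENAME, TITLE); link L2: owner EMP → member ASG(ENO, PNO, RESP, DUR); link L3: owner PROJ(PNO, PNAME, BUDGET, LOC) → member ASG. Each link is 1:n from owner to member.]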
Fragmentation (cont’d)
Application Information
– qualitative information
» minterm predicate
y denotes access patterns of user applications
– quantitative information
» minterm selectivity
y given a minterm predicate, how many tuples are accessed
» access frequency of the query
y how frequently the query is issued
Fragmentation (cont’d)
80/20 rule
» analyzing all the user applications may not be possible
– the most active 20% of user queries account for 80% of the total data accesses
– may be used as a guideline for choosing which applications to analyze
Fragmentation (cont’d)
z Simple predicate
– given relation R(A1, A2, …, An),
» a simple predicate pj has the form
Ai θ Value
» where θ ∈ {=,<,≤,>,≥,≠} and Value is a value from the domain of attribute Ai.
(e.g., a single condition in an SQL WHERE clause)
Fragmentation (cont’d)
z Minterm predicate
– given relation R and Pr = {p1, p2, …, pm},
» define M = {m1, m2, …, mz} as
$M = \{\, m_i \mid m_i = \bigwedge_{p_j \in Pr} p_j^{*} \,\}, \quad 1 \le j \le m,\ 1 \le i \le z$
» where pj* = pj or pj* = ¬(pj).
y i.e., each simple predicate occurs in a minterm predicate
either in its natural form or its negated form
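The definition can be made concrete by enumerating every combination of natural and negated forms, which gives at most 2^m minterm predicates for m simple predicates. The sketch below is illustrative only: the function name `minterms` and the representation of a simple predicate as a (label, test function) pair are assumptions, not part of the notes.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
SimplePredicate = Tuple[str, Callable[[Row], bool]]

def minterms(simple_predicates: List[SimplePredicate]) -> List[SimplePredicate]:
    """Build all 2^m conjunctions of the predicates in natural or negated form."""
    result = []
    for signs in product([True, False], repeat=len(simple_predicates)):
        parts = [
            label if keep else f"NOT({label})"
            for (label, _), keep in zip(simple_predicates, signs)
        ]

        def test(row: Row, signs=signs) -> bool:
            # A row satisfies the minterm if each predicate has the required truth value.
            return all(pred(row) == keep
                       for (_, pred), keep in zip(simple_predicates, signs))

        result.append((" AND ".join(parts), test))
    return result

Pr = [
    ("SAL <= 30000", lambda row: row["SAL"] <= 30000),
    ("SAL > 30000", lambda row: row["SAL"] > 30000),
]
M = minterms(Pr)
row = {"TITLE": "Mech. Eng.", "SAL": 27000}
matching = [label for label, test in M if test(row)]
```

Every tuple satisfies exactly one minterm predicate, which is why minterm predicates yield disjoint horizontal fragments.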
Fragmentation: PHF
Minterm fragment
» horizontal fragment defined by a minterm predicate
Fragmentation: PHF (cont’d)
z Outline of PHF
– given
» a relation R, and the set of simple predicates Pr
– output
» the set of fragments of R = {R1, R2, . . . , Rw}
y which obey the fragmentation correctness rules (completeness, disjointness, reconstruction)
Fragmentation: PHF (cont’d)
– application 1
» Find the budgets of projects at “Montreal”.
» Find the budgets of projects at “New York”.
» Find the budgets of projects at “Paris”.
– application 2
» Find projects with budgets less than or equal to $200000.
» Find projects with budgets greater than $200000.
(Example cont’d)
– according to application 1,
» Pr = {LOC = “Montreal”, LOC = “New York”, LOC = “Paris”}
y but this is not complete with respect to application 2
– thus, modify
» Pr = {LOC = “Montreal”, LOC = “New York”, LOC = “Paris”, BUDGET ≤ 200000, BUDGET > 200000}
– then, Pr is complete.
Fragmentation: PHF (cont’d)
However, if we add
PNAME = “Instrumentation” to Pr,
» then Pr is not minimal.
y because there is no application that would access the
resulting fragments any differently
Fragmentation: PHF (cont’d)
Algorithm COM_MIN
– input
» a relation R and a set of simple predicates Pr
– output
» a complete and minimal set of simple predicates Pr´ for Pr
1. Initialization
– find a pi ∈ Pr such that pi partitions R
» according to Rule 1
– set Pr´ ← pi; Pr ← Pr - pi; F ← fi
/* fi : the fragment defined by a minterm predicate over the predicates of Pr´ */
2. Iteratively add predicates to Pr´ until it is complete
– find a pj ∈ Pr such that pj partitions some fk
» according to Rule 1
– set Pr´ ← Pr´ ∪ pj; Pr ← Pr - pj; F ← F ∪ fj
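The loop structure of COM_MIN can be sketched as a greedy procedure over sample data. This is a simplified, data-driven approximation and not the algorithm as specified: Rule 1 is not spelled out in these notes, so the sketch assumes that a predicate "partitions" a fragment when it splits the fragment into two non-empty parts and at least one application uses the predicate. The function names, the `used_by` counts, and the rows for P3 and P4 are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# (label, test function, number of applications that use the predicate)
Predicate = Tuple[str, Callable[[Row], bool], int]

def splits(pred: Predicate, fragment: List[Row]) -> bool:
    """Assumed stand-in for Rule 1: a non-trivial split by a predicate that is in use."""
    _, test, used_by = pred
    inside = [row for row in fragment if test(row)]
    return used_by > 0 and 0 < len(inside) < len(fragment)

def com_min_sketch(relation: List[Row], predicates: List[Predicate]):
    remaining = list(predicates)          # Pr
    chosen: List[Predicate] = []          # Pr'
    fragments = [list(relation)]          # F
    progress = True
    while progress:
        progress = False
        for pred in remaining:
            if any(splits(pred, frag) for frag in fragments):
                chosen.append(pred)       # Pr' <- Pr' U pj
                remaining.remove(pred)    # Pr  <- Pr - pj
                refined = []
                for frag in fragments:
                    if splits(pred, frag):
                        refined.append([r for r in frag if pred[1](r)])
                        refined.append([r for r in frag if not pred[1](r)])
                    else:
                        refined.append(frag)
                fragments = refined
                progress = True
                break                     # restart the scan over the reduced Pr
    return [label for label, _, _ in chosen], fragments

PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},   # illustrative row
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},  # illustrative row
]
Pr = [
    ("PNAME = Instrumentation", lambda r: r["PNAME"] == "Instrumentation", 0),  # no application uses it
    ("LOC = Montreal", lambda r: r["LOC"] == "Montreal", 1),
    ("LOC = New York", lambda r: r["LOC"] == "New York", 1),
    ("LOC = Paris", lambda r: r["LOC"] == "Paris", 1),
    ("BUDGET <= 200000", lambda r: r["BUDGET"] <= 200000, 1),
    ("BUDGET > 200000", lambda r: r["BUDGET"] > 200000, 1),
]
selected, F = com_min_sketch(PROJ, Pr)
```

On this sample the predicate on PNAME is never selected because no application distinguishes the fragments it would create, which mirrors the minimality argument above. Because the sketch works on tuples rather than on predicate semantics, the set it selects depends on the sample data.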
Fragmentation: PHF (cont’d)
(Example cont’d)
– Minterm predicates
m1 : (SAL ≤ 30000)
m2 : (SAL > 30000) , i.e., NOT(SAL ≤ 30000)
PAY1                      PAY2
TITLE        SAL          TITLE        SAL
Mech. Eng.   27000        Elect. Eng.  40000
Programmer   24000        Syst. Anal.  34000
Fragmentation: PHF (cont’d)
(Example cont’d)
– simple predicates
» for application 1,
p1 : LOC = “Montreal”
p2 : LOC = “New York”
p3 : LOC = “Paris”
» for application 2,
p4 : BUDGET ≤ 200000
p5 : BUDGET > 200000
Fragmentation: PHF (cont’d)
(Example cont’d)
– Minterm predicates (those left after eliminating meaningless ones)
» m1 : (LOC = “Montreal”) ∧ (BUDGET ≤ 200000)
» m2 : (LOC = “Montreal”) ∧ (BUDGET > 200000)
» m3 : (LOC = “New York”) ∧ (BUDGET ≤ 200000)
» m4 : (LOC = “New York”) ∧ (BUDGET > 200000)
» m5 : (LOC = “Paris”) ∧ (BUDGET ≤ 200000)
» m6 : (LOC = “Paris”) ∧ (BUDGET > 200000)
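Five simple predicates give 2^5 = 32 candidate minterm predicates, and the six above are the ones that some tuple can actually satisfy. The sketch below reproduces that count under two stated assumptions: the domain of LOC is exactly {Montreal, New York, Paris}, and one sample budget on each side of the threshold is enough to represent BUDGET. All names in the code are illustrative.

```python
from itertools import product

LOCATIONS = ["Montreal", "New York", "Paris"]   # assumed domain of LOC
BUDGET_SAMPLES = [200000, 200001]               # one value on each side of the threshold

SIMPLE = [
    ("LOC = Montreal", lambda loc, budget: loc == "Montreal"),
    ("LOC = New York", lambda loc, budget: loc == "New York"),
    ("LOC = Paris", lambda loc, budget: loc == "Paris"),
    ("BUDGET <= 200000", lambda loc, budget: budget <= 200000),
    ("BUDGET > 200000", lambda loc, budget: budget > 200000),
]

def satisfiable_minterms():
    """Keep only the sign combinations that at least one (LOC, BUDGET) pair satisfies."""
    kept = []
    for signs in product([True, False], repeat=len(SIMPLE)):
        has_witness = any(
            all(test(loc, budget) == sign for (_, test), sign in zip(SIMPLE, signs))
            for loc in LOCATIONS
            for budget in BUDGET_SAMPLES
        )
        if has_witness:
            # Show only the predicates in natural form; the negated ones are implied.
            kept.append(" AND ".join(label for (label, _), sign in zip(SIMPLE, signs) if sign))
    return kept

meaningful = satisfiable_minterms()
```

The remaining 26 combinations are contradictory, for example a project located in two cities at once or a budget that is neither at most nor above 200000.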
(Example cont’d)
PROJ1 (defined by m1)
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal

PROJ3 (defined by m3)
PNO  PNAME              BUDGET  LOC
P2   Database Develop.  135000  New York

PROJ4 (defined by m4)
PNO  PNAME              BUDGET  LOC
…

PROJ6 (defined by m6)
PNO  PNAME              BUDGET  LOC
…
Fragmentation: DHF
[Figure: join graph with owner → member links. L1: PAY(TITLE, SAL) → EMP(ENO, ENAME, TITLE); L2: EMP → ASG(ENO, PNO, RESP, DUR); L3: PROJ(PNO, PNAME, BUDGET, LOC) → ASG. Each link is 1:n.]
Fragmentation: DHF (cont’d)
EMP1                             EMP2
ENO  ENAME      TITLE            ENO  ENAME     TITLE
E3   A. Lee     Mech. Eng.       E1   J. Doe    Elect. Eng.
E4   J. Miller  Programmer       E2   M. Smith  Syst. Anal.
E7   R. Davis   Mech. Eng.       E5   B. Casey  Syst. Anal.
                                 E6   L. Chu    Elect. Eng.
                                 E8   J. Jones  Syst. Anal.
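The EMP1 and EMP2 fragments above follow from the PAY1 and PAY2 fragments of the PHF example: each member fragment is the semijoin of the member relation EMP with one owner fragment over link L1 (attribute TITLE). A minimal sketch, in which the function name `semijoin` and the tuple layout are assumptions made for illustration:

```python
from typing import List, Set, Tuple

EmpRow = Tuple[str, str, str]  # (ENO, ENAME, TITLE)

EMP: List[EmpRow] = [
    ("E1", "J. Doe", "Elect. Eng."),
    ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."),
    ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."),
    ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."),
    ("E8", "J. Jones", "Syst. Anal."),
]

# TITLE values that appear in each owner fragment of PAY.
PAY1_TITLES: Set[str] = {"Mech. Eng.", "Programmer"}
PAY2_TITLES: Set[str] = {"Elect. Eng.", "Syst. Anal."}

def semijoin(member: List[EmpRow], owner_titles: Set[str]) -> List[EmpRow]:
    """Keep the member tuples whose TITLE matches some tuple of the owner fragment."""
    return [row for row in member if row[2] in owner_titles]

EMP1 = semijoin(EMP, PAY1_TITLES)
EMP2 = semijoin(EMP, PAY2_TITLES)
```

Because every TITLE in EMP belongs to exactly one PAY fragment, the derived fragments are complete and disjoint.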
Fragmentation: DHF (cont’d)
Complication in DHF
» there can be multiple links on the target (i.e. member) relation
y i.e., there can be several ways of DHF
– Criteria to decide which DHF
» fragmentation used on more applications
y try to focus on the heavy users
» fragmentation with better join characteristics
y joins can be performed on smaller relations
y joins can be performed in a distributed fashion (i.e., as a distributed join)
Distributed join
» sub-joins between horizontally fragmented relations
– efficiency of distributed join:
» affected by the nature of a join graph
y simple join graph between fragments
y complex join graph between fragments
Fragmentation: DHF (cont’d)
[Figure: join graphs between the fragments R1 … R4 of R and the fragments of S; in a simple join graph each Ri is linked to exactly one Si]
Fragmentation: VF
X Vertical fragmentation
– has been studied within the centralized context
» physical clustering for the most active sub-relations
– number of alternatives is very large
» for m non-primary key attributes, the possible number of
fragments is B(m), i.e., the m-th Bell number
» for large m, B(m) ≈ m^m
y e.g., B(10) ≈ 115,000, B(15) ≈ 10^9, B(30) ≈ 10^23
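The growth of B(m) can be checked directly. The sketch below computes Bell numbers with the Bell triangle; the function name `bell` is an illustrative choice.

```python
def bell(m: int) -> int:
    """Return the m-th Bell number B(m), computed with the Bell triangle."""
    row = [1]                      # row 0 of the triangle; its first entry is B(0)
    for _ in range(m):
        nxt = [row[-1]]            # each row starts with the last entry of the previous row
        for value in row:
            nxt.append(nxt[-1] + value)
        row = nxt
    return row[0]

b10 = bell(10)
b15 = bell(15)
b30 = bell(30)
```

Here `b10` is 115975 and `b15` is 1382958545, in line with the approximations above, and `b30` has 24 digits.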
Fragmentation: VF (cont’d)
Information requirements
– attribute affinity
y a measure indicating how closely the attributes are related
y can be obtained from more primitive usage data
Fragmentation: VF (cont’d)
(Example cont’d)
– Let A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC.
attribute usage matrix
      A1  A2  A3  A4
q1     1   0   1   0
q2     0   1   1   0
q3     0   1   0   1
q4     0   0   1   1
Fragmentation: VF (cont’d)
(Example cont’d)
– assume
» each query accesses the attributes once during each execution.
» following frequencies of queries at three sites
matrix for query frequencies at three sites
      S1  S2  S3
q1    15  20  10
q2     5   0   0
q3    25  25  25
q4     3   0   0
Fragmentation: VF (cont’d)
(Example cont’d)
– Then, the attribute affinity matrix AA is
      A1  A2  A3  A4
A1    45   0  45   0
A2     0  80   5  75
A3    45   5  53   3
A4     0  75   3  78
» e.g., aff(A1, A3) = 15*1 + 20*1 + 10*1 = 45
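The whole matrix follows from the usage matrix and the site frequencies: aff(Ai, Aj) sums, over every query that uses both attributes, that query's frequency over all sites. A short sketch using the example's numbers, relying on the stated assumption that each query accesses its attributes once per execution; the function and variable names are illustrative.

```python
from typing import Dict, List

# Attribute usage matrix: USE[q][k] == 1 if query q uses attribute A(k+1).
USE: Dict[str, List[int]] = {
    "q1": [1, 0, 1, 0],
    "q2": [0, 1, 1, 0],
    "q3": [0, 1, 0, 1],
    "q4": [0, 0, 1, 1],
}
# Query frequencies at sites S1, S2, S3.
FREQ: Dict[str, List[int]] = {
    "q1": [15, 20, 10],
    "q2": [5, 0, 0],
    "q3": [25, 25, 25],
    "q4": [3, 0, 0],
}

def affinity_matrix(use: Dict[str, List[int]], freq: Dict[str, List[int]]) -> List[List[int]]:
    n = len(next(iter(use.values())))
    aa = [[0] * n for _ in range(n)]
    for query, row in use.items():
        total = sum(freq[query])          # frequency of the query over all sites
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:     # the query uses both Ai and Aj
                    aa[i][j] += total
    return aa

AA = affinity_matrix(USE, FREQ)
```

The diagonal entry aff(Ai, Ai) is simply the total frequency of all queries that use Ai.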
Fragmentation: VF-Clustering
Fragmentation: VF-Clustering (cont’d)
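– definition of AM
$$AM = \sum_{i=1}^{n} \sum_{j=1}^{n} aff(A_i, A_j)\,[\,aff(A_i, A_{j-1}) + aff(A_i, A_{j+1})\,]$$
$$= \sum_{j=1}^{n} \Big[ \sum_{i=1}^{n} aff(A_i, A_j)\, aff(A_i, A_{j-1}) + \sum_{i=1}^{n} aff(A_i, A_j)\, aff(A_i, A_{j+1}) \Big]$$
» where $aff(A_i, A_0) = aff(A_i, A_{n+1}) = 0$ (boundary columns)
– then, with $bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\, aff(A_z, A_y)$,
$$AM = \sum_{j=1}^{n} [\,bond(A_j, A_{j-1}) + bond(A_j, A_{j+1})\,]$$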
Fragmentation: VF-Clustering (cont’d)
– Consider the following n attributes
A1 A2 … Ai-1 Ai Aj Aj+1 … An
$$AM_{old} = \sum_{l=1}^{i-1} [\,bond(A_l, A_{l-1}) + bond(A_l, A_{l+1})\,] + \ldots$$
» term for l = i: bond(Ai-1, Ai) + bond(Ai, Ai+1)
» term for l = i+1: bond(Ai, Ai+1) + bond(Ai+1, Ai+2)
– after placing Ak between Ai and Aj
A1 A2 … Ai-1 Ai Ak Aj Aj+1 … An
Fragmentation: VF-Clustering (cont’d)
AA =                        CA =
      A1  A2  A3  A4              A1  A2
A1    45   0  45   0        A1    45   0
A2     0  80   5  75        A2     0  80
A3    45   5  53   3        A3    45   5
A4     0  75   3  78        A4     0  75
(Example cont’d)
Ordering (0-3-1) :
cont(A0, A3, A1) = 2bond(A0, A3) + 2bond(A3, A1) - 2bond(A0, A1)
= 0 + 2(45*45 + 45*53) - 0 = 2*4410 = 8820
Ordering (1-3-2) :
cont(A1, A3, A2) = 2bond(A1, A3) + 2bond(A3, A2) - 2bond(A1, A2)
= 2*4410 + 2(80*5 + 5*53 + 75*3) - 2(45*5)
= 2*4410 + 2*890 - 2*225 = 10150
Ordering (2-3-4) :
cont(A2, A3, A4) = 2bond(A2, A3) + 2bond(A3, A4) - 2bond(A2, A4)
= 2*890 + 0 - 0 = 1780
(here A4 denotes the still-empty fourth column of CA, not attribute A4, so its bonds are 0)
Ordering (1-3-2) gives the largest contribution, so A3 is placed between A1 and A2.
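The three contributions can be recomputed from the AA matrix. In the sketch below, `bond` and `cont` follow the formulas used in the example, and `None` stands for a boundary position (the non-existent column A0, or the still-empty column to the right); that encoding is an assumption made for the illustration.

```python
from typing import Optional

AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]

def bond(x: Optional[int], y: Optional[int]) -> int:
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay); 0 at a boundary."""
    if x is None or y is None:
        return 0
    return sum(AA[z][x - 1] * AA[z][y - 1] for z in range(len(AA)))

def cont(left: Optional[int], mid: int, right: Optional[int]) -> int:
    """Net contribution of placing attribute `mid` between `left` and `right`."""
    return 2 * bond(left, mid) + 2 * bond(mid, right) - 2 * bond(left, right)

ordering_0_3_1 = cont(None, 3, 1)   # A3 to the left of A1
ordering_1_3_2 = cont(1, 3, 2)      # A3 between A1 and A2
ordering_2_3_4 = cont(2, 3, None)   # A3 to the right of A2
best = max(ordering_0_3_1, ordering_1_3_2, ordering_2_3_4)
```

The maximum is reached by ordering (1-3-2), which matches the column order A1 A3 A2 used in the next step.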
Fragmentation: VF-Clustering (cont’d)
(Example cont’d)
      A1  A3  A2
A1    45  45   0
A2     0   5  80
A3    45  53   5
A4     0   3  75
(Example cont’d)
      A1  A3  A2  A4
A1    45  45   0   0
A2     0   5  80  75
A3    45  53   5   3
A4     0   3  75  78
Fragmentation: VF-Clustering (cont’d)
(Example cont’d)
– Row ordering
» the final form of the CA matrix (after row ordering) is
A1 A3 A2 A4
A1 45 45 0 0
A3 45 53 5 3
A2 0 5 80 75
A4 0 3 75 78
Fragmentation: VF-Partitioning
X Partitioning algorithm
– divide a set of clustered attributes {A1, A2, …,An} into two (or
more) sets {A1, A2, …, Ai} and {Ai+1, …, An}
» such that these sets of attributes are accessed
y solely, or
y for the most part, by distinct applications
Fragmentation: VF-Partitioning (cont’d)
[Figure: clustered affinity matrix with rows and columns A1 … An, split at Ai. The upper-left block (A1 … Ai) is TA, the top attributes; the lower-right block (Ai+1 … An) is BA, the bottom attributes.]
z Sets of applications
TQ = set of applications that access only TA
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
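For a given split point, the three sets can be read off the attribute usage matrix. The sketch below classifies the four queries of the running example for the clustered order A1 A3 A2 A4; the function name `classify` and the set-based encoding of the usage matrix are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Attributes used by each query, taken from the attribute usage matrix.
USE: Dict[str, Set[str]] = {
    "q1": {"A1", "A3"},
    "q2": {"A2", "A3"},
    "q3": {"A2", "A4"},
    "q4": {"A3", "A4"},
}

def classify(use: Dict[str, Set[str]], ta: Set[str], ba: Set[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split the queries into TQ (only TA), BQ (only BA), and OQ (both)."""
    tq, bq, oq = [], [], []
    for query, attrs in use.items():
        if attrs <= ta:
            tq.append(query)
        elif attrs <= ba:
            bq.append(query)
        else:
            oq.append(query)
    return tq, bq, oq

clustered = ["A1", "A3", "A2", "A4"]
split = 2                                   # split point after the second column
TA, BA = set(clustered[:split]), set(clustered[split:])
TQ, BQ, OQ = classify(USE, TA, BA)
```

With the split after A3, q1 touches only the top attributes and q3 only the bottom ones, while q2 and q4 span both fragments.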
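Fragmentation: Hybrid
X Hybrid fragmentation
– vertical fragmentation (VF) may be followed by horizontal fragmentation (HF), or vice versa
» producing a tree-structured partitioning
[Figure: tree-structured partitioning. R is split by HF into R1 and R2, and each of these is split further by VF.]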
Fragment Allocation
X Allocation problem
» allocation of resources across the network has been much studied
y however, most of this work is about placing individual files, rather than about distributed database (DDB) design
– Given
F = {F1, F2, …, Fn} fragments
S = {S1, S2, …, Sm} network sites
Q = {q1, q2, …, qq} applications
– Find the “optimal” distribution of F to S.
Definition of optimality
1. Minimal cost
» communication cost +
» storage cost +
» processing cost (read & update)
2. Performance
» response time and/or
» throughput
Fragment Allocation (cont’d)
X Information requirements
– Database information
» selectivity of fragment Fj with respect to query qi
y # of tuples in Fj that need to be accessed for qi
» size of a fragment
– Application information
» number of read accesses of a query to a fragment
» number of update accesses of a query to a fragment
» a matrix indicating which queries update which fragments
» a similar matrix for retrievals
» originating site of each query
Fragment Allocation (cont’d)
– Site information
» storage capacity
» processing capacity
» unit cost of storing data at a site
» unit cost of processing at a site
– Network information
» communication cost per frame between two sites
» frame size
X Allocation model
» minimize the total cost of processing and storage
y while trying to meet response time constraints
– min(Total cost)
» subject to
y response time constraint
y storage constraint
y processing constraint
– decision variable
$$x_{ij} = \begin{cases} 1 & \text{if fragment } F_i \text{ is stored at site } S_j \\ 0 & \text{otherwise} \end{cases}$$
Fragment Allocation (cont’d)
Total cost
1. Processing component
» access cost + integrity enforcement cost + concurrency control cost
– access cost
$$\sum_{\text{all sites}} \sum_{\text{all fragments}} (\#\text{ of read accesses} + \#\text{ of update accesses}) \cdot x_{ij} \cdot (\text{local processing cost at a site})$$
– simple assumption
y read cost = update cost
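The access cost term can be evaluated directly for a candidate allocation. The sketch below is a minimal illustration under the simple assumption above (read cost = update cost); the function name `access_cost`, the per-fragment access counts, and all numbers are hypothetical.

```python
from typing import List, Tuple

def access_cost(
    accesses: List[Tuple[int, int]],   # accesses[i] = (# of reads, # of updates) on fragment Fi
    x: List[List[int]],                # x[i][j] = 1 if fragment Fi is stored at site Sj
    local_cost: List[float],           # local_cost[j] = unit processing cost at site Sj
) -> float:
    """Sum (reads + updates) * x_ij * local processing cost over all sites and fragments."""
    total = 0.0
    for i, (reads, updates) in enumerate(accesses):
        for j, unit_cost in enumerate(local_cost):
            total += (reads + updates) * x[i][j] * unit_cost
    return total

# Hypothetical instance: two fragments, two sites.
accesses = [(100, 20), (50, 10)]
local_cost = [1.0, 2.0]
x = [
    [1, 0],   # F1 stored only at S1
    [1, 1],   # F2 replicated at S1 and S2
]
cost = access_cost(accesses, x, local_cost)
```

Replicating a fragment adds one term per copy, which is how the model charges for the extra processing that replicas cause.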
Fragment Allocation (cont’d)
2. Transmission component
» cost for updates + cost for retrievals
$$\sum_{\text{all sites}} \sum_{\text{all fragments}} (\text{cost of update message}) \cdot x_{ij} + \ldots$$
Constraints
– response time constraint
» (execution time of a query) ≤ (maximum allowable response time
for that query)
Fragment Allocation (cont’d)
X Solution methods
y FAP (file allocation problem) is NP-complete
y DAP (database allocation problem) is also NP-complete
» so heuristic methods are needed
– heuristics commonly adopted for FAP and DAP
» knapsack problem solutions
» branch and bound techniques
» network flow problem solutions