Distributed Database Design: Data Placement and Fragmentation

Distributed Database Design
TOPICS
Distributed database design concept,
objective of Data Distribution,
Data Fragmentation,
The allocation of fragment ,
Transparencies in Distributed Database Design
Design Problem
In the general setting :
Making decisions about the placement of data

and programs across the sites of a computer
network as well as possibly designing the
network itself.
In Distributed DBMS, the placement of
applications entails
placement of the distributed DBMS software; and
placement of the applications that run on the
database
Dimensions of the Problem

Access pattern behavior
dynamic
static
data
data +
program
Level of sharing
partial
information
Level of knowledge
complete
information
Distribution Design
Top-down
mostly in designing systems from scratch
mostly in homogeneous systems
Bottom-up
when the databases already exist at a
number of sites
Top-Down Design
Requirements
Analysis
Objectives
User Input
Conceptual
Design
View Integration
View Design
Access
Information
GCS
Distribution
Design
LCSs
Physical
Design
LISs
ESs
User Input
Distribution Design Issues

Why fragment at all?
How to fragment?
How much to fragment?
How to test correctness?
How to allocate?
Information requirements?
Fragmentation
Can't we just distribute relations?
What is a reasonable unit of distribution?
relation
views are subsets of relations locality
extra communication
fragments of relations (sub-relations)

concurrent execution of a number of transactions
that access different portions of a relation
views that cannot be defined on a single fragment
will require extra processing
semantic data control (especially integrity
enforcement) more difficult
Fragmentation Alternatives
Horizontal
PROJ
PROJ1 : projects with budgets less

than $200,000
PROJ2 : projects with budgets
greater than or equal to
$200,000
PROJ1
PNO
PNO
PNAME
BUDGET
P1 Instrumentation 150000
P2 Database Develop.135000
P3 CAD/CAM
250000
P4 Maintenance
310000
P5 CAD/CAM
500000
LOC
Montreal
New York
New York
Paris
Boston
PROJ2
PNAME
P1 Instrumentation
BUDGET
LOC
15000 Montreal
0
P2 Database Develop.135000 New York
PNO
P3
PNAME
CAD/CAM
BUDGET
LOC
250000 New York
P4 Maintenance
310000 Paris
P5
500000 Boston
CAD/CAM
Fragmentation Alternatives
Vertical
PROJ
PROJ1: information about

project budgets
PROJ2: information about
project names and
locations
PNO
PNAME
BUDGET
P1 Instrumentation 150000
P2 Database Develop.135000
P3
CAD/CAM
250000
P4 Maintenance
310000
P5
CAD/CAM
500000
PROJ1
PROJ2
PNO
BUDGET
PNO
P1
P2
P3
P4
P5
150000
135000
250000
310000
500000
PNAME
LOC
P1 Instrumentation Montreal
P2 Database Develop. New York
P3
CAD/CAM
New York
P4 Maintenance
Paris
P5 CAD/CAM
Boston
LOC
Montreal
New York
New York
Paris
Boston
Degree of Fragmentation
finite number of alternatives
tuples
or
attributes
relations
Finding the suitable level of partitioning within

this range
Correctness of Fragmentation
Completeness
Decomposition of relation R into fragments R1, R2, ...,
Rn is complete if and only if each data item in R can

also be found in some Ri
Reconstruction
If relation R is decomposed into fragments R1, R2, ...,
Rn, then there should exist some relational operator

such that
R = 1inRi
Disjointness
If relation R is decomposed into fragments R1, R2, ...,
Rn, and data item di is in Rj, then di should not be in

any other fragment Rk (k j ).
Allocation Alternatives
Non-replicated
partitioned : each fragment resides at only
one site
Replicated
fully replicated : each fragment at each site

partially replicated : each fragment at some
of the sites
If read-only queries << 1, replication is advantageous,
update queries
otherwise replication may cause problems
Rule of thumb:
Comparison of Replication
Alternatives
Full-replication
Partial-replication
Partitioning
QUERY
PROCESSING
Easy
Same Difficulty
DIRECTORY
MANAGEMENT
Easy or
Non-existant
Same Difficulty
CONCURRENCY
CONTROL
Moderate
Difficult
Easy
RELIABILITY
Very high
High
Low
Possible
application
Realistic
Possible
application
REALITY
Information Requirements
Four categories:
Database information
Application information
Communication network information
Computer system information
PHF Information Requirements

Database Information
relationship
SKILL
TITLE, SAL
L1
EMP
ENO, ENAME, TITLE
PROJ
PNO, PNAME, BUDGET, LOC
ASG
ENO, PNO, RESP, DUR
cardinality of each relation: card(R)
Application Information
1. Qualitative Information
. The fundamental qualitative information consists of the
predicates used in user queries.
. Analyze user queries based on 80/20 rule: 20% of user
queries account for 80% of the total data access.
One should investigate the more important queries
2. Quantitative Information
. Minterm Selectivity sel(mi): number of tuples that
would be accessed by a query specified according to
a given minterm predicate.
. Access Frequency acc(mi): the access frequency of a
given minterm predicate in a given period.
Fragmentation
Horizontal Fragmentation (HF)
Primary Horizontal Fragmentation
(PHF)
Derived Horizontal Fragmentation
(DHF)
Vertical Fragmentation (VF)
Hybrid Fragmentation (HF)
Primary Horizontal
Fragmentation
EMP table
Three branch
offices, with each

employee
working at only
one office
Create table MPLS_EMPS as

Select * From EMP Where Loc =
Minneapolis;
Create table LA_EMPS as
Select *From EMP Where Loc = LA;
Create table NY_EMPS as Select *
From EMP Where Loc = New York;
After fragmentation
Select * from MPLS_EMPS
Union
Select * from LA_EMPS)
Union
Select * from NY_EMPS;
Example 1
AP1:looking for those employees who work in Los
Angeles (LA).
Pr = {p1: Loc= LA}
M= {m1: Loc = LA, m2: Loc<>LA}
is a minimal and complete set of minterm predicates for AP1

Fragment F1: Create table LA_EMPS as
Select * from EMP Where Loc = "LA";

Fragment F2: Create table NON_LA_EMPS as
Select * from EMP Where Loc <> "LA";
Minimal :the rows are accessed differently by at least one
application.
Complete :the rows have the same probability of being
accessed by any application.
AP1:Exclude any employee whose salary was less
than or equal to 30000

Pr = {p1: Loc = "LA,p2: salary > 30000}
M = {m1: Loc = "LA" Sal > 30000,
m2: Loc = "LA" Sal <= 30000,

m3: Loc <>"LA" Sal > 30000,
m4: Loc <>"LA" Sal <= 30000}
Any fragmentation must satisfy the following rules as defined :
Rule 1: Completeness. Decomposition of R into R1, R2, . . . , Rn is complete if and only if
each data item in R can also be found in some Ri.

Rule 2: Reconstruction. If R is decomposed into R1, R2, . . . , Rn, then there should exist
some relational operator, , such that R = 1in Ri.
Rule 3: Disjointness. If R is decomposed into R1, R2, . . . , Rn, and di is a tuple in Rj, then
di should not be in any other fragment, such as Rk, where k = j.
NOTE: N simple predicates in Pr, M will have 2N minterm

predicates.
PHF - Information
Requirements
simple predicates : Given R[A1, A2, , An], a simple
predicate pj is
pj : Ai Value
where {=,<,,>,,}, Value Di and Di is the domain

of Ai.
For relation R we define Pr = {p1, p2, ,pm}
Example :
PNAME = "Maintenance"
BUDGET 200000
minterm predicates : Given R and Pr = {p1, p2, ,pm}
define M = {m1,m2,,mr} as
M = { mi | mi =
pjPrpj* }, 1jm, 1iz
where pj* = pj or pj* = (pj).
PHF Information Requirements

minterm selectivities: sel(mi)
The number of tuples of the relation that would be
accessed by a user query which is specified

according to a given minterm predicate mi.
access frequencies: acc(qi)

The frequency with which a user application qi
accesses data.
Access frequency for a minterm predicate can also be
defined.
Primary Horizontal Fragmentation

Definition :
Rj = Fj(R), 1 j w
where Fj is a selection formula, which is (preferably) a
minterm predicate.
Therefore,
A horizontal fragment Ri of relation R consists of all the
tuples of R which satisfy a minterm predicate mi.
Given a set of minterm predicates M, there are as many

horizontal fragments of relation R as there are minterm
predicates.
Set of horizontal fragments also referred to as minterm
fragments.
PHF Algorithm
Given: A relation R, the set of simple
predicates Pr
Output:The set of fragments of R = {R1, R2,
,Rw} which obey the fragmentation
rules.
Preliminaries :
Pr should be complete
Pr should be minimal
Completeness of Simple
Predicates
A set of simple predicates Pr is said to be
complete if and only if the accesses to the tuples

of the minterm fragments defined on Pr requires
that two tuples of the same minterm fragment
have the same probability of being accessed by
any application.
Example 2:
Assume PROJ[PNO,PNAME,BUDGET,LOC] has two
applications defined on it.

Find the budgets of projects at each location. (1)
Find projects with budgets less than $200000. (2)
Completeness of Simple Predicates

According to (1),
Pr={LOC=Montreal,LOC=New York,LOC=Paris}
which is not complete with respect to (2).

Modify
Pr ={LOC=Montreal,LOC=New York,LOC=Paris,
BUDGET200000,BUDGET>200000}
which is complete.
Find projects with budgets less than $200000. (2)
Minimality of Simple
Predicates
If a predicate influences how fragmentation
is performed, (i.e., causes a fragment f to

be further fragmented into, say, fi and fj)
then there should be at least one
application that accesses fi and fj
differently.
In other words, the simple predicate should
be relevant in determining a fragmentation.
If all the predicates of a set Pr are relevant,
then Pr is minimal.
acc(mi ) acc(mj )
card( fi ) card( fj )
Minimality of Simple
Predicates
Example :
Pr ={LOC=Montreal,LOC=New York,
LOC=Paris,
BUDGET200000,BUDGET>200000}
is minimal (in addition to being complete).

However, if we add
PNAME = Instrumentation
then Pr is not minimal.
COM_MIN Algorithm
Given: a relation R and a set of simple
predicates Pr
Output: a complete and minimal set of
simple predicates Pr' for Pr
Rule 1: a relation or fragment is partitioned
into at least two parts which are
accessed differently by at least one
application.
COM_MIN Algorithm
Initialization :
find a pi Pr such that pi partitions R according to
Rule 1
set Pr' = pi ; Pr Pr {pi} ; F {fi}
Iteratively add predicates to Pr' until it is complete
find a pj Pr such that pj partitions some fk defined
according to minterm predicate over Pr' according to
Rule 1
set Pr' = Pr' {pj }; Pr Pr {pj }; F F {fi}
if pk Pr' which is nonrelevant then
Pr' Pr {pk}
F F {fk}
PHORIZONTAL Algorithm
Makes use of COM_MIN to perform fragmentation.
Input: a relation R and a set of simple predicates
Pr
Output: a set of minterm predicates M according to
which relation R is to be fragmented
Pr' COM_MIN (R,Pr)
determine the set M of minterm predicates
determine the set I of implications among pi Pr
eliminate the contradictory minterms from M
PHF Example 3
Two candidate relations : PAY and PROJ.
Fragmentation of relation PAY
Application: Check the salary info and determine
raise.
Employee records kept at two sites application
run at two sites
Simple predicates
p1 : SAL 30000
p2 : SAL > 30000
Pr = {p1,p2} which is complete and minimal Pr'=Pr
Minterm predicates
m1 : (SAL 30000)
m2 : NOT(SAL 30000) (SAL > 30000)
PHF Example 3
PAY1
TITLE
PAY2
SAL
TITLE
SAL
Mech. Eng. 27000
Elect. Eng.
40000
Programmer 24000
Syst. Anal.
34000
PHF Example 3
Fragmentation of relation PROJ
Applications:
Find the name and budget of projects given their no.
Issued
at three sites
Access project information according to budget

one
site accesses 200000 other accesses >200000
Simple predicates
For application (1)
p1 : LOC = Montreal
p2 : LOC = New York
p3 : LOC = Paris
For application (2)

p4 : BUDGET 200000
p5 : BUDGET > 200000
Pr = Pr' = {p1,p2,p3,p4,p5}
PHF Example 3
Fragmentation of relation PROJ continued
Minterm fragments left
m1 : (LOC = Montreal) (BUDGET

200000)
m2 : (LOC = Montreal) (BUDGET > 200000)
m3 : (LOC = New York) (BUDGET 200000)
m4 : (LOC = New York) (BUDGET > 200000)
m5 : (LOC = Paris) (BUDGET 200000)
m6 : (LOC = Paris) (BUDGET > 200000)
PHF Example
PROJ2
PROJ1
PNO
P1
PNAME
BUDGET
Instrumentation150000
LOC
Montrea
l
PROJ4
PNO
PNAME
P3
CAD/CAM
PNO
P2
PNAME
BUDGET
LOC
Database
Develop.
135000 New York
PROJ6
BUDGET
250000
LOC
New
York
PNO
P4
PNAME
BUDGET
LOC
Maintenance
310000
Paris
PHF Correctness
Completeness
Since Pr' is complete and minimal, the selection predicates
are complete
Reconstruction
If relation R is fragmented into FR = {R1,R2,,Rr}
R =
Ri FR Ri
Disjointness
Minterm predicates that form the basis of fragmentation
should be mutually exclusive.
Derived Horizontal
Fragmentation
Defined on a member relation of a link according to a
selection operation specified on its owner.

Each link is an equijoin.
Equijoin can be implemented by means of semijoins.
SKILL
TITLE, SAL
L1
EMP
PROJ
ENO, ENAME, TITLE

L2
PNO, PNAME, BUDGET, LOC

L3
ASG
ENO, PNO, RESP, DUR
DHF Definition
Given a link L where owner(L)=S and
member(L)=R, the derived horizontal
fragments of R are defined as
Ri = R F Si, 1iw
where w is the maximum number of

fragments that will be defined on R and
Si = Fi(S)
where Fi is the formula according to which

the primary horizontal fragment Si is defined.
DHF Example
Given link L1 where owner(L1)=SKILL/PAY and member(L1)=EMP
Group engineers into two groups according to their salary: those making less
than or equal to $30,000, and those making more than $30,000.
EMP1 = EMP SKILL/PAY1
EMP2 = EMP SKILL/PAY2
where
SKILL/PAY1 = SAL30000(SKILL/PAY)
SKILL/PAY2 = SAL>30000(SKILL/PAY)
EMP1
EMP2
ENO
ENAME
E3
E4
E7
A. Lee
J. Miller
R. Davis
TITLE
Mech. Eng.
Programmer
Mech. Eng.
ENO
ENAME
TITLE
E1
E2
E5
E6
E8
J. Doe
M. Smith
B. Casey
L. Chu
J. Jones
Elect. Eng.
Syst. Anal.
Syst. Anal.
Elect. Eng.
Syst. Anal.
DHF Correctness
Completeness
Referential integrity
Let R be the member relation of a link whose owner is relation S which is fragmented
as FS = {S1, S2, ..., Sn}. Furthermore, let A be the join attribute between R and S. Then,
for each tuple t of R, there should be a tuple t' of S such that
t[A] = t' [A]
Reconstruction
Same as primary horizontal fragmentation.
Disjointness
Simple join graphs between the owner and the member fragments.
Vertical Fragmentation
Group the columns of a table into fragments.
Because each fragment contains a subset of the total set of columns in the table, VF can be
used to enforce security and/or privacy of data.

More difficult than horizontal, because more alternatives exist.
In the case of vertical partitioning, if a relation has m non-primary key attributes, the
number of possible fragments is equal to B(m), which is the mth Bell number
Two approaches :
grouping
attributes to fragments
first step creates as many vertical fragments as the number of non-key columns in the
table. Then grouping approach uses joins across the primary key, to group some of
these fragments together, and continues as needed
Not usually considered a valid approach
splitting
relation to fragments
placing each non-key column in one and only one fragment
Need to design affinity or closeness

If a table has 15 columns, then the number of
possible vertical fragments is 109 and

If the number of vertical fragments for a table
with 30 columns is 1023.
Evaluating is not practical.
Sol : Find the closeness/affinity between the
attributes to decide whether to group them into
same fragment or not
VF Information
Requirements
Attribute affinities
a measure that indicates how closely related the attributes are
This is obtained by: access frequency + usage pattern
Access freq: how many times an application/query runs in a given
period of time at different sites
Usage pattern:Indicates whether a column is used by an
application/query.
Attribute usage values
Given a set of queries Q = {q1, q2,, qq} that will run on the
relation
R[A1 , A2,, An],
1 if attribute Aj is referenced by query qi
use(qi,Aj) =
0 otherwise
use(qi,) can be defined accordingly
VF Definition of use(qi,Aj)
Consider the following 4 queries for relation PROJ
q1: SELECT BUDGET q2: SELECT PNAME,BUDGET
FROM PROJ
FROM PROJ
WHERE
PNO=Value
q3: SELECT PNAMEq4: SELECT SUM(BUDGET)
FROM PROJ
FROM PROJ
WHERE
LOC=Value
WHERE
LOC=Value
Let A1= PNO, A2= PNAME, A3= BUDGET, A4=

LOC
A1
A2
A3
A4
q1
q2
q3
q4
VF Affinity Measure
af(Ai,Aj)
The attribute affinity measure between two
attributes Ai and Aj of a relation R[A1, A2, , An]
with respect to the set of applications Q = (q1, q2,
, qq) is defined as follows :
af (Ai, Aj)
(query access)
all queries that access A and A

i
query access
access
access frequency of a query
execution
all sites
VF Calculation of af(Ai, Aj)

Example: Assume each query in the previous example
accesses the attributes once during each execution.
S1 S2 S3
Also assume the access frequencies
q1
q1
q2
q2
q3
q3
q4
q4
15 20
5
10
25 25
25
Then
af(A1, A3) = 15*1 + 20*1+10*1
= 45
and the attribute affinity matrix AA is
A1
A2
A3
A4
A1 A2 A3 A4
45 0 45 0
5 75
0 80
45 5 53 3
3 78
0 75
VF Clustering Algorithm
Take the attribute affinity matrix AA and
reorganize the attribute orders to form

clusters where the attributes in each
cluster demonstrate high affinity to one
another.
Bond Energy Algorithm (BEA) has been
used for clustering of entities. BEA finds an
ordering of entities (in our case attributes)
such that the global affinity measure is
maximized.
Bond Energy Algorithm

Input: The AA affinity matrix
Output: The clustered affinity matrix CA which
is a perturbation of AA
Initialization: Place and fix one of the columns of
AA in CA.
Iteration: Place the remaining n-i columns in the
remaining i+1 positions in the CA matrix. For
each column, choose the placement that makes
the most contribution to the global affinity
measure.
Row order: Order the rows according to the
column ordering.
Bond Energy Algorithm

Best placement? Define contribution of a
placement:
cont(Ai, Ak, Aj) = 2bond(Ai, Ak)+2bond(Ak, Al)
2bond(Ai, Aj)
n
where
af(Az,Ax)af(Az,Ay)
bond(Ax,Ay
)=
z 1
BEA Example
Consider the following AA matrix and the corresponding CA matrix
where A1 and A2 have been placed. Place A3:
Ordering (0-3-1) :
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)2bond(A0 , A1)
= 2* 0 + 2* 4410 2*0 = 8820
Ordering (1-3-2) :
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)2bond(A1,A2)
= 2* 4410 + 2* 890 2*225 = 10150
Ordering (2-3-4) :
cont (A2,A3,A4)= 1780
BEA Example
Therefore, the CA matrix has the form
A1 A3 A2
45 45
0
5 80
45 53
0
3 75
When A is placed, the final form of the CA

4
matrix (after row
A1 A3 A2 A4
organization)
is 45 0 0
A1 45
A3 45 53
A2
5 80 75
A4
3 75 78
VF Algorithm
How can you divide a set of clustered attributes {A1, A2, , An} into two (or more) sets {A1, A2, , Ai} and {Ai, , An} such that there are no (or minimal)
applications that access both (or more than one) of the sets.
the function will produce fragments that are balanced.
For Best partitioning split the columns into a one-column BC and n 1 column TC first , and then repeatedly add columns from TC to BC until TC
is
left with only one column.
choose the splitting that has the highest Z value.
Z can be positive if total accesses to only one fragment are maximized while the total accesses to both fragments are minimized
Disadvantage :Not able to carve out an embedded or inner block of columns as a partition.
Sol: Overcome by adding a shift operation(moves the topmost row of the matrix to the bottom and then it moves the leftmost column of the matrix to the
extreme right)
A1 A2 A3 Ai Ai+1. . .Am
...
A1
A2
TA
Ai
...
Ai+1
Am
BA
SHIFT OPERATION
VF ALgorithm
Define
TQ = set of applications that access only TA(Top corner
attributes)
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications
that access only TA
CBQ = total number of accesses to attributes by applications
that access only BA
COQ = total number of accesses to attributes by applications
that access both TA and BA
Then find the point along the diagonal that maximizes

Goal Function: Z =CTQCBQCOQ2
Example
1)
TC: all the applications that access one of the TC columns (C4, C1, or
C3) but do not access any BC
[i.e AP1, AP2, and AP4]
No application is BC-only.
AP3 accesses both TC and BC columns,
TCW = AFF(AP1) + AFF(AP2) + AFF(AP4) = 3 + 7 + 3 = 13

BCW = none = 0
BOCW = AFF(AP3) = 4
Z = 13*0 42 = 16
2) TC
TCW = AFF(AP1) + AFF(AP2) = 3 + 7 = 10
BCW = AFF(AP3) = 4
BOCW = AFF(AP4) = 3
Z = 4*10 32 = 40 9 = 31
Result :The two vertical fragments will be defined as VF1(C, C4) and
VF2(C, C1, C2, C3).
3)
TCW = AFF(AP2) = 7
BCW = AFF(AP3) + AFF(AP4) = 4 + 3 = 7
BOCW = AFF(AP1) = 3
Z = 7*7 32 = 49 9 = 40
4)
TCW = AFF(AP4) = 3
BCW = AFF(AP2) = 7
BOCW = AFF(AP1) + AFF(AP3) = 3 + 4 =
7
Z = 3*7 72 = 21 49 = 28
VF Algorithm
Two problems :
Cluster forming in the middle of the CA matrix
Shift a row up and a column left and apply the
algorithm to find the best partitioning point

Do this for all possible shifts
Cost O(m2)
More than two clusters

m-way partitioning
try 1, 2, , m1 split points along diagonal and try to
find the best point for each of these

Cost O(2m)
VF Correctness
A relation R, defined over attribute set A and key K,
generates the vertical partitioning FR = {R1, R2, , Rr}.
Completeness
The following should be true for A:
A=
AR i
Reconstruction
Reconstruction can be achieved by
R=
K Ri, Ri FR
Disjointness
TID's are not considered to be overlapping since they are
maintained by the system

Duplicated keys are not considered to be overlapping
Hybrid Fragmentation
R
HF
HF
R1
R2
VF
VF
R11
R12
VF
R21
VF
R22
VF
R23
Fragment Allocation
Problem Statement
Given
F = {F1, F2, , Fn}
fragments
S ={S1, S2, , Sm}
network sites
Q = {q1, q2,, qq}
applications
Find the "optimal" distribution of F to S.

Optimality
Minimal cost
Communication + storage + processing (read & update)
Cost in terms of time (usually)
Performance
Response time and/or throughput
Constraints
Per site constraints (storage & processing)
Information Requirements
Database information
selectivity of fragments
size of a fragment
Application information
access types and numbers

access localities
Communication network information

unit cost of storing data at a site
unit cost of processing at a site
Computer system information

bandwidth
latency
communication overhead
Allocation
File Allocation (FAP) vs Database Allocation
(DAP):
Fragments are not individual files
relationships have to be maintained
Access to databases is more complicated

remote file access model not applicable
relationship between allocation and query processing
Cost of integrity enforcement should be
considered
Cost of concurrency control should be
considered
Allocation Information
Requirements
Database Information
selectivity of fragments
size of a fragment
number of read accesses of a query to a fragment
number of update accesses of query to a fragment
A matrix indicating which queries updates which fragments
A similar matrix for retrievals
originating site of each query
Site Information
unit cost of storing data at a site
unit cost of processing at a site
Network Information
communication cost/frame between two sites
frame size
Allocation Model
General Form
min(Total Cost)
subject to
response time constraint
storage constraint
processing constraint
Decision Variable
xij
1 if fragment Fi is stored at site Sj

0 otherwise
Allocation Model
Total Cost
query processing cost
all queries
cost of storing a fragment at a site
all sites all fragments
Storage Cost (of fragment Fj at Sk)

(unit storage cost at Sk) (size of Fj) xjk
Query Processing Cost (for one query)
processing component + transmission component
Allocation Model
Query Processing Cost
Processing component
access cost + integrity enforcement cost +
concurrency control cost
Access cost
(no. of update accesses+ no. of read accesses)
xij local processing cost at a site
Integrity enforcement and concurrency
control costs
Can be similarly calculated
Allocation Model
Query Processing Cost
Transmission component
cost of processing updates + cost of processing
retrievals
Cost of updates
update message cost

acknowledgment cost
Retrieval Cost
minall sites
(cost of retrieval command
all fragments
cost of sending back the result)
Allocation Model
Constraints
Response Time
execution time of query max. allowable response
time for that query
Storage Constraint (for a site)
storage requirement of a fragment at that

site at that site
storage capacity
all fragments
Processing constraint (for a site)
processing load of a query at that site
all queries
processing capacity of that site
Allocation Model
Solution Methods
FAP is NP-complete
DAP also NP-complete
Heuristics based on
single commodity warehouse location (for
FAP)
knapsack problem
branch and bound techniques
network flow
Allocation Model
Attempts to reduce the solution space
assume all candidate partitionings known;
select the best partitioning

ignore replication at first
sliding window on fragments

Distributed Database Design: Data Placement and Fragmentation

Transféré par

Informations du document

Description originale:

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Distributed Database Design: Data Placement and Fragmentation

Transféré par

Droits d'auteur :

Formats disponibles

Distributed Database Design

Making decisions about the placement of data

Dimensions of the Problem

Distribution Design Issues

fragments of relations (sub-relations)

PROJ1 : projects with budgets less

250000 New York

PROJ1: information about

Finding the suitable level of partitioning within

Rn is complete if and only if each data item in R can

Rn, then there should exist some relational operator

Rn, and data item di is in Rj, then di should not be in

partitioned : each fragment resides at only

fully replicated : each fragment at each site

otherwise replication may cause problems

Communication network information

Computer system information

PHF Information Requirements

cardinality of each relation: card(R)

offices, with each

Create table MPLS_EMPS as

M= {m1: Loc = LA, m2: Loc<>LA}

is a minimal and complete set of minterm predicates for AP1

Select * from EMP Where Loc = "LA";

AP1:Exclude any employee whose salary was less

than or equal to 30000

m2: Loc = "LA" Sal <= 30000,

each data item in R can also be found in some Ri.

NOTE: N simple predicates in Pr, M will have 2N minterm

where {=,<,,>,,}, Value Di and Di is the domain

minterm predicates : Given R and Pr = {p1, p2, ,pm}

pjPrpj* }, 1jm, 1iz

where pj* = pj or pj* = (pj).

PHF Information Requirements

accessed by a user query which is specified

access frequencies: acc(qi)

Primary Horizontal Fragmentation

Given a set of minterm predicates M, there are as many

complete if and only if the accesses to the tuples

applications defined on it.

Completeness of Simple Predicates

which is not complete with respect to (2).

Find projects with budgets less than $200000. (2)

is performed, (i.e., causes a fragment f to

is minimal (in addition to being complete).

then Pr is not minimal.

Mech. Eng. 27000

Access project information according to budget

site accesses 200000 other accesses >200000

For application (2)

m1 : (LOC = Montreal) (BUDGET

135000 New York

should be mutually exclusive.

selection operation specified on its owner.

ENO, ENAME, TITLE

PNO, PNAME, BUDGET, LOC

where w is the maximum number of

where Fi is the formula according to which

used to enforce security and/or privacy of data.

Need to design affinity or closeness

possible vertical fragments is 109 and

use(qi,) can be defined accordingly