Lecture12 - Vertical Fragmentation - II
where
$\mathrm{bond}(A_x, A_y) = \sum_{z=1}^{n} \mathrm{aff}(A_z, A_x)\,\mathrm{aff}(A_z, A_y)$
Ordering (0-3-1) :
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)–2bond(A0 , A1)
= 2* 0 + 2* 4410 – 2*0 = 8820
Ordering (1-3-2) :
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)–2bond(A1,A2)
= 2* 4410 + 2* 890 – 2*225 = 10150
Ordering (2-3-4) :
cont (A2,A3,A4) = 1780
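The three contributions above can be checked numerically. The following is a minimal sketch rather than the lecture's own code: the affinity values are read off the CA matrix of the BEA example, and the names `AFF`, `bond`, and `cont` are illustrative. An empty position (A0, or A4 before it has been placed) is modeled as `None` and assumed to contribute a bond of 0.

```python
# Attribute affinity values, read off the CA matrix of the BEA example.
ATTRS = ["A1", "A2", "A3", "A4"]
AFF = {
    "A1": {"A1": 45, "A2": 0, "A3": 45, "A4": 0},
    "A2": {"A1": 0, "A2": 80, "A3": 5, "A4": 75},
    "A3": {"A1": 45, "A2": 5, "A3": 53, "A4": 3},
    "A4": {"A1": 0, "A2": 75, "A3": 3, "A4": 78},
}


def bond(x, y):
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay)."""
    if x is None or y is None:  # empty slot: assumed to bond with nothing
        return 0
    return sum(AFF[z][x] * AFF[z][y] for z in ATTRS)


def cont(left, new, right):
    """Net contribution of placing `new` between `left` and `right`."""
    return 2 * bond(left, new) + 2 * bond(new, right) - 2 * bond(left, right)


print(cont(None, "A3", "A1"))  # ordering (0-3-1)
print(cont("A1", "A3", "A2"))  # ordering (1-3-2)
print(cont("A2", "A3", None))  # ordering (2-3-4), A4 not yet placed
```

The largest of the three values belongs to ordering (1-3-2), which is why A3 ends up between A1 and A2.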
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/4
BEA – Example
• Therefore, the CA matrix has the form
        A1   A3   A2
   A1   45   45    0
   A2    0    5   80
   A3   45   53    5
   A4    0    3   75
• When A4 is placed, the final form of the CA matrix (after row organization) is
        A1   A3   A2   A4
   A1   45   45    0    0
   A3   45   53    5    3
   A2    0    5   80   75
   A4    0    3   75   78
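The row organization step amounts to applying the chosen column order to the rows as well. The sketch below illustrates this under one assumption: the matrix `AA` in the original attribute order A1..A4 is obtained by un-permuting the final CA matrix above. The helper name `reorder` is hypothetical.

```python
# Affinity matrix in the original attribute order (assumed: derived by
# un-permuting the final CA matrix shown above).
ATTRS = ["A1", "A2", "A3", "A4"]
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def reorder(matrix, labels, order):
    """Apply the same attribute order to both rows and columns."""
    idx = [labels.index(a) for a in order]
    return [[matrix[i][j] for j in idx] for i in idx]


ORDER = ["A1", "A3", "A2", "A4"]
CA = reorder(AA, ATTRS, ORDER)
for name, row in zip(ORDER, CA):
    print(name, row)
```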
VF – Algorithm
How can you divide a set of clustered attributes {A1, A2, …, An}
into two (or more) sets {A1, A2, …, Ai} and {Ai+1, …, An} such that
no (or a minimal number of) applications access both (or more
than one) of the sets?
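[Figure: clustered affinity matrix with attributes A1 … Am on both axes; the upper-left block over A1 … Ai is labeled TA and the lower-right block over Ai+1 … Am is labeled BA.]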
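Split quality: $CTQ \cdot CBQ - COQ^2$, where CTQ, CBQ, and COQ are the total accesses by applications that use only TA, only BA, and both TA and BA, respectively.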
• Completeness
➡ The following should hold for the attribute set A: $A = \bigcup A_{R_i}$
• Reconstruction
➡ Reconstruction can be achieved by
$R = \bowtie_K R_i, \quad \forall R_i \in F_R$
• Disjointness
➡ TIDs are not considered to be overlapping since they are maintained by the
system
➡ Duplicated keys are not considered to be overlapping
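The completeness and disjointness rules can be expressed as simple set checks. The sketch below is illustrative only: the relation, its key `K`, and the two fragments are hypothetical, and the function names are not from the lecture. Disjointness ignores the key, in line with the rule that duplicated keys do not count as overlap.

```python
def is_complete(all_attrs, fragments):
    """Completeness: the union of the fragments' attributes equals A."""
    return set().union(*fragments) == set(all_attrs)


def is_disjoint(fragments, key):
    """Disjointness: no non-key attribute appears in two fragments."""
    seen = set()
    for frag in fragments:
        non_key = set(frag) - set(key)
        if seen & non_key:
            return False
        seen |= non_key
    return True


# Hypothetical relation with key K and two vertical fragments.
A = {"K", "A1", "A2", "A3", "A4"}
KEY = {"K"}
FRAGMENTS = [{"K", "A1", "A3"}, {"K", "A2", "A4"}]

print(is_complete(A, FRAGMENTS))
print(is_disjoint(FRAGMENTS, KEY))
```

Because every fragment carries the key, the original relation can be rebuilt by joining the fragments on `K`.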
ii. The BEA then converts this similarity matrix to a BOND matrix in which the
entries represent a type of nearest neighbor bonding based on probability of
co-access. The BEA algorithm rearranges rows or columns so that similar
attributes appear close together in the matrix.
iii. Finally, the designer draws boxes around regions in the matrix with high
similarity.
The resulting matrix, modified from, is illustrated in Figure. The two shaded
boxes represent the attributes that have been grouped together into two
clusters.
Given the following access characteristics and access frequencies for Q1,...,Q4, calculate the optimal
vertical splitting using the Bond Energy Algorithm (BEA).
Steps:
1. Prepare an affinity matrix.
2. Apply BEA algorithm.
3. Perform vertical splitting by maximizing the split quality.
Query   Attributes accessed   Frequency
q1      A1 A2 A3              21
q2      A3 A4                 24
q3      A2 A4 A5              90
q4      A3 A5                 11
Affinity matrix:
        A1    A2    A3    A4    A5
   A1   21    21    21     0     0
   A2   21   111    21    90    90
   A3   21    21    56    24    11
   A4    0    90    24   114    90
   A5    0    90    11    90   101
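Step 1 can be sketched as follows: the affinity of two attributes is the summed frequency of all queries that access both of them. The query sets and frequencies come from the exercise above; the names `QUERIES` and `affinity_matrix` are illustrative, not part of the original material.

```python
# Query -> (attributes accessed, access frequency), from the exercise.
QUERIES = {
    "q1": ({"A1", "A2", "A3"}, 21),
    "q2": ({"A3", "A4"}, 24),
    "q3": ({"A2", "A4", "A5"}, 90),
    "q4": ({"A3", "A5"}, 11),
}
ATTRS = ["A1", "A2", "A3", "A4", "A5"]


def affinity_matrix(queries, attrs):
    """aff(a, b) = total frequency of the queries that access both a and b."""
    return [
        [
            sum(freq for used, freq in queries.values() if a in used and b in used)
            for b in attrs
        ]
        for a in attrs
    ]


AFF = affinity_matrix(QUERIES, ATTRS)
for name, row in zip(ATTRS, AFF):
    print(name, row)
```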
Place attributes:
place A1
contribution at pos 0 = 2121
contribution at pos 1 = -1598
contribution at pos 2 = 2058
attribute A1 is placed at pos 0: [A1, A5, A3]
place A2
contribution at pos 0 = 3213
contribution at pos 1 = 28503
contribution at pos 2 = 28732
contribution at pos 3 = 7098
attribute A2 is placed at pos 2: [A1, A5, A2, A3]
place A4
contribution at pos 0 = 2394
contribution at pos 1 = 27987
contribution at pos 2 = 29157
contribution at pos 3 = 28716
contribution at pos 4 = 6960
attribute A4 is placed at pos 2: [A1, A5, A4, A2, A3]
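The placement trace above can be reproduced with a short script. This is a sketch under two assumptions inferred from the trace itself: the starting order is [A5, A3] (what remains of [A1, A5, A3] once A1 is taken out), and the contribution is computed as bond(left, new) + bond(new, right) - bond(left, right), without the factor 2 used in the cont formula earlier in the lecture. All function names are illustrative.

```python
ATTRS = ["A1", "A2", "A3", "A4", "A5"]
# Affinity matrix of the exercise (rows and columns in ATTRS order).
AFF = [
    [21, 21, 21, 0, 0],
    [21, 111, 21, 90, 90],
    [21, 21, 56, 24, 11],
    [0, 90, 24, 114, 90],
    [0, 90, 11, 90, 101],
]


def bond(x, y):
    """Sum over all rows z of aff(z, x) * aff(z, y); 0 at the matrix border."""
    if x is None or y is None:
        return 0
    i, j = ATTRS.index(x), ATTRS.index(y)
    return sum(row[i] * row[j] for row in AFF)


def contribution(left, new, right):
    # Assumed form: matches the trace's numbers (no factor 2).
    return bond(left, new) + bond(new, right) - bond(left, right)


def place(order, new):
    """Try every insertion position and keep the one with the highest score."""
    padded = [None] + order + [None]
    scores = [
        contribution(padded[p], new, padded[p + 1]) for p in range(len(order) + 1)
    ]
    best = scores.index(max(scores))
    return order[:best] + [new] + order[best:], scores


order = ["A5", "A3"]  # assumed starting order, inferred from the trace
history = []
for attr in ["A1", "A2", "A4"]:
    order, scores = place(order, attr)
    history.append(scores)
    print(f"place {attr}: {scores} -> {order}")
```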
find fragments:
split at [A1, A2, A3, A4] | [A5]
accesses frag1 alone: 45
accesses frag2 alone: 0
accesses frag1 and frag2: 101
split quality = -10201
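The split quality reported above can be recomputed from the query table. The sketch below assumes the measure is (accesses to frag1 alone) × (accesses to frag2 alone) − (accesses to both)², which is consistent with the numbers in the trace. The names `split_quality`, `ctq`, `cbq`, and `coq` are illustrative.

```python
# Query -> (attributes accessed, access frequency), from the exercise.
QUERIES = {
    "q1": ({"A1", "A2", "A3"}, 21),
    "q2": ({"A3", "A4"}, 24),
    "q3": ({"A2", "A4", "A5"}, 90),
    "q4": ({"A3", "A5"}, 11),
}


def split_quality(queries, frag1, frag2):
    """Return (frag1 alone, frag2 alone, both, quality) for one split."""
    ctq = sum(f for used, f in queries.values() if used <= frag1)
    cbq = sum(f for used, f in queries.values() if used <= frag2)
    coq = sum(f for used, f in queries.values() if used & frag1 and used & frag2)
    return ctq, cbq, coq, ctq * cbq - coq ** 2


ctq, cbq, coq, quality = split_quality(
    QUERIES, {"A1", "A2", "A3", "A4"}, {"A5"}
)
print(ctq, cbq, coq, quality)
```

Evaluating the same function for every candidate split point and keeping the maximum is one way to carry out step 3.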