Lecture12 - Vertical Fragmentation - II

Features and uses

Uploaded by

Butt

Vertical Fragmentation

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/1

Bond Energy Algorithm
Input: The AA matrix
Output: The clustered affinity matrix CA, which is a permutation of the rows
and columns of AA
• Initialization: Place and fix one of the columns of AA in CA.
• Iteration: Place each of the remaining n-i columns in one of the i+1
candidate positions in the CA matrix, where i is the number of columns already
placed. For each column, choose the placement that makes the largest
contribution to the global affinity measure.
• Row order: Order the rows according to the column ordering.


Bond Energy Algorithm
“Best” placement? Define contribution of a placement:

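$\mathrm{cont}(A_i, A_k, A_j) = 2\,\mathrm{bond}(A_i, A_k) + 2\,\mathrm{bond}(A_k, A_j) - 2\,\mathrm{bond}(A_i, A_j)$

where

$\mathrm{bond}(A_x, A_y) = \sum_{z=1}^{n} \mathrm{aff}(A_z, A_x)\,\mathrm{aff}(A_z, A_y)$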


BEA – Example
Consider the following AA matrix and the corresponding CA matrix where
A1 and A2 have been placed. Place A3:

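Ordering (0-3-1):
cont(A0, A3, A1) = 2bond(A0, A3) + 2bond(A3, A1) - 2bond(A0, A1)
= 2*0 + 2*4410 - 2*0 = 8820
Ordering (1-3-2):
cont(A1, A3, A2) = 2bond(A1, A3) + 2bond(A3, A2) - 2bond(A1, A2)
= 2*4410 + 2*890 - 2*225 = 10150
Ordering (2-3-4):
cont(A2, A3, A4) = 2bond(A2, A3) + 2bond(A3, A4) - 2bond(A2, A4)
= 2*890 + 2*0 - 2*0 = 1780
Here A0 and A4 denote the empty positions to the left of the first placed
column and to the right of the last one (A4 is not the attribute A4), so every
bond involving them is 0.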
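The three contributions can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the AA values are an assumption in the sense that they are read off the final CA matrix given later in this example, and `None` is a convention chosen here for the empty boundary positions A0 and A4.

```python
# Attribute affinity matrix for A1..A4 (assumed: values read off the final
# CA matrix of this example, rearranged into the original attribute order).
AA = [
    [45, 0, 45, 0],   # A1
    [0, 80, 5, 75],   # A2
    [45, 5, 53, 3],   # A3
    [0, 75, 3, 78],   # A4
]

def bond(aa, x, y):
    """Sum over z of aff(Az, Ax) * aff(Az, Ay); None marks an empty slot."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)

def cont(aa, i, k, j):
    """Contribution of placing column k between columns i and j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)

A1, A2, A3 = 0, 1, 2  # zero-based column indices

print(cont(AA, None, A3, A1))  # ordering (0-3-1) -> 8820
print(cont(AA, A1, A3, A2))    # ordering (1-3-2) -> 10150
print(cont(AA, A2, A3, None))  # ordering (2-3-4) -> 1780
```

Ordering (1-3-2) gives the largest value, so A3 goes between A1 and A2.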
BEA – Example
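• Ordering (1-3-2) has the largest contribution, so A3 is placed between A1 and
A2. The CA matrix therefore has the form (rows not yet reordered):

      A1  A3  A2
A1    45  45   0
A2     0   5  80
A3    45  53   5
A4     0   3  75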

• When A4 is placed, the final form of the CA matrix (after row organization)
is

      A1  A3  A2  A4
A1    45  45   0   0
A3    45  53   5   3
A2     0   5  80  75
A4     0   3  75  78
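The whole placement loop can be sketched in Python as below. This is an illustrative sketch under stated assumptions rather than the lecture's own code: it fixes the first two columns (A1, A2) as the starting point, breaks ties by taking the leftmost position, needs at least two attributes, and takes the AA values from the CA matrix above. The function names are hypothetical.

```python
def bond(aa, x, y):
    """Sum over z of aff(Az, Ax) * aff(Az, Ay); None marks an empty slot."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)

def cont(aa, i, k, j):
    """Contribution of placing column k between columns i and j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)

def bea_order(aa):
    """Return the column order chosen by greedy placement (zero-based)."""
    order = [0, 1]  # assumption: the first two columns are placed and fixed
    for k in range(2, len(aa)):
        candidates = []
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            candidates.append((cont(aa, left, k, right), pos))
        # max() keeps the first (leftmost) position among equal contributions
        best_pos = max(candidates, key=lambda c: c[0])[1]
        order.insert(best_pos, k)
    return order

def clustered_matrix(aa, order):
    """Permute columns, then rows, according to the chosen order."""
    return [[aa[r][c] for c in order] for r in order]

AA = [
    [45, 0, 45, 0],   # A1
    [0, 80, 5, 75],   # A2
    [45, 5, 53, 3],   # A3
    [0, 75, 3, 78],   # A4
]

order = bea_order(AA)
CA = clustered_matrix(AA, order)
print([f"A{i + 1}" for i in order])  # ['A1', 'A3', 'A2', 'A4']
for row in CA:
    print(row)
```

For this input the sketch reproduces the ordering A1 A3 A2 A4 and the CA matrix shown above.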
VF – Algorithm
How can you divide a set of clustered attributes {A1, A2, …, An}
into two (or more) sets {A1, A2, …, Ai} and {Ai+1, …, An} such that
there are no (or minimal) applications that access both (or more
than one) of the sets?

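[Figure: the clustered affinity matrix with rows and columns A1 … Am. The
upper-left block over A1 … Ai is labelled TA and the lower-right block over
Ai+1 … Am is labelled BA.]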


VF – Algorithm
Define
TQ = set of applications that access only TA
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications
that access only TA
CBQ = total number of accesses to attributes by applications
that access only BA
COQ = total number of accesses to attributes by applications
that access both TA and BA
Then find the point along the diagonal that maximizes

$CTQ \times CBQ - COQ^2$
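A small Python sketch of this measure follows. It is an illustration under assumptions, not the lecture's code: queries are modelled as (attribute set, access frequency) pairs, the data is borrowed from the worked exercise later in this document, and the function names are hypothetical.

```python
# (attributes accessed, access frequency); data taken from the worked
# exercise later in this document.
QUERIES = [
    ({"A1", "A2", "A3"}, 21),
    ({"A3", "A4"}, 24),
    ({"A2", "A4", "A5"}, 90),
    ({"A3", "A5"}, 11),
]

def split_quality(top, bottom, queries):
    """Return CTQ * CBQ - COQ^2 for the fragments TA=top and BA=bottom."""
    top, bottom = set(top), set(bottom)
    ctq = sum(f for attrs, f in queries if attrs <= top)      # only TA
    cbq = sum(f for attrs, f in queries if attrs <= bottom)   # only BA
    coq = sum(f for attrs, f in queries
              if attrs & top and attrs & bottom)              # both
    return ctq * cbq - coq ** 2

def best_diagonal_split(order, queries):
    """Try every split point along the diagonal of a clustered ordering."""
    splits = [(split_quality(order[:i], order[i:], queries), i)
              for i in range(1, len(order))]
    return max(splits)  # (best quality, number of attributes in TA)

order = ["A1", "A5", "A4", "A2", "A3"]
print(best_diagonal_split(order, QUERIES))  # (-441, 1)
```

With this ordering the best point puts only A1 in TA. On equal quality, `max` on the tuples prefers the larger split index, which is an arbitrary choice of this sketch.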


VF – Algorithm
Two problems:
Cluster forming in the middle of the CA matrix
➡ Shift a row up and a column left and apply the algorithm to find the “best”
partitioning point
➡ Do this for all possible shifts
➡ Cost $O(m^2)$

More than two clusters

➡ m-way partitioning
➡ try 1, 2, …, m–1 split points along diagonal and try to find the best point for
each of these
➡ Cost $O(2^m)$


VF – Correctness
A relation R, defined over attribute set A and key K, generates the vertical
partitioning FR = {R1, R2, …, Rr}.
• Completeness
➡ The following should be true for A:

$A = \bigcup A_{R_i}$
• Reconstruction
➡ Reconstruction can be achieved by

$R = \bowtie_K R_i, \quad \forall R_i \in F_R$

• Disjointness
➡ TID's are not considered to be overlapping since they are maintained by the
system
➡ Duplicated keys are not considered to be overlapping
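The three rules can be checked mechanically on a toy relation. The sketch below is hypothetical throughout: the key `ID`, the sample rows, the chosen fragments, and the helper names are assumptions made for illustration. It treats the key as replicated in every fragment and ignores it when testing disjointness.

```python
def is_complete(all_attrs, fragments):
    """Completeness: the union of the fragments' attributes equals A."""
    return set().union(*fragments) == set(all_attrs)

def is_disjoint(fragments, key):
    """Disjointness: no non-key attribute appears in two fragments."""
    seen = set()
    for frag in fragments:
        non_key = set(frag) - set(key)
        if seen & non_key:
            return False
        seen |= non_key
    return True

def project(rows, attrs):
    """Vertical fragment: keep only the listed attributes of each row."""
    return [{a: row[a] for a in attrs} for row in rows]

def join_on_key(left, right, key):
    """Reconstruction: join two fragments on the shared key K."""
    index = {tuple(r[k] for k in key): r for r in right}
    return [{**l, **index[tuple(l[k] for k in key)]} for l in left]

KEY = ["ID"]                                   # hypothetical key K
ATTRS = ["ID", "Name", "Position", "Location"]
rows = [                                       # hypothetical sample data
    {"ID": 1, "Name": "Ann", "Position": "Engineer", "Location": "Oslo"},
    {"ID": 2, "Name": "Bo", "Position": "Analyst", "Location": "Lima"},
]
fragments = [["ID", "Name"], ["ID", "Position", "Location"]]

r1, r2 = (project(rows, f) for f in fragments)
rebuilt = join_on_key(r1, r2, KEY)
print(is_complete(ATTRS, fragments), is_disjoint(fragments, KEY), rebuilt == rows)
```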


Extra Stuff


The basic steps of this clustering algorithm are:
i. Create an attribute affinity matrix in which each entry indicates the affinity
between the two associated attributes. The entries in the similarity matrix are
based on the frequency of common usage of attribute pairs.

ii. The BEA then converts this similarity matrix to a BOND matrix in which the
entries represent a type of nearest neighbor bonding based on probability of
co-access. The BEA algorithm rearranges rows or columns so that similar
attributes appear close together in the matrix.

iii. Finally, the designer draws boxes around regions in the matrix with high
similarity.
The resulting matrix, modified from, is illustrated in Figure. The two shaded
boxes represent the attributes that have been grouped together into two
clusters.


Vertical Splitting: Bond Energy Algorithm

Given the following access characteristics and access frequencies for Q1,...,Q4, calculate the optimal
vertical splitting using the Bond Energy Algorithm (BEA).
Steps:
1. Prepare an affinity matrix.
2. Apply BEA algorithm.
3. Perform vertical splitting by maximizing the split quality.

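Attribute usage (1 = the query accesses the attribute):

      Name  Family  Age   Position  Location
      (A1)  (A2)    (A3)  (A4)      (A5)
Q1    1     1       1     0         0
Q2    0     0       1     1         0
Q3    0     1       0     1         1
Q4    0     0       1     0         1

Attributes accessed and access frequency per query:

q1: A1 A2 A3    frequency 21
q2: A3 A4       frequency 24
q3: A2 A4 A5    frequency 90
q4: A3 A5       frequency 11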
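Step 1 can be sketched in Python from the usage matrix and frequencies above. This is a minimal sketch under one assumption: each listed frequency is taken as the query's total access count, so the affinity of two attributes is the sum of the frequencies of the queries that use both. The names `USE`, `FREQ`, and `affinity_matrix` are hypothetical.

```python
# Usage matrix: one row per query, columns A1..A5 (1 = attribute is accessed).
USE = {
    "q1": [1, 1, 1, 0, 0],
    "q2": [0, 0, 1, 1, 0],
    "q3": [0, 1, 0, 1, 1],
    "q4": [0, 0, 1, 0, 1],
}
FREQ = {"q1": 21, "q2": 24, "q3": 90, "q4": 11}

def affinity_matrix(use, freq, n):
    """aff(Ai, Aj) = sum of the frequencies of queries that access both."""
    aa = [[0] * n for _ in range(n)]
    for query, row in use.items():
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += freq[query]
    return aa

AA = affinity_matrix(USE, FREQ, 5)
for label, row in zip(["A1", "A2", "A3", "A4", "A5"], AA):
    print(label, row)
```

The diagonal entry for an attribute is the total frequency of all queries that touch it, for example 21 + 90 = 111 for A2.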

      A1   A2   A3   A4   A5
A1    21   21   21    0    0
A2    21  111   21   90   90
A3    21   21   56   24   11
A4     0   90   24  114   90
A5     0   90   11   90  101

Place attributes:
In the trace below, A5 and A3 are already placed, and each contribution is
reported as bond(Ai, Ak) + bond(Ak, Aj) - bond(Ai, Aj), that is, without the
factor 2 used in cont. The ranking of the positions is the same.

place A1
contribution at pos 0 = 2121
contribution at pos 1 = -1598
contribution at pos 2 = 2058
attribute A1 is placed at pos 0: [A1, A5, A3]

place A2
contribution at pos 0 = 3213
contribution at pos 1 = 28503
contribution at pos 2 = 28732
contribution at pos 3 = 7098
attribute A2 is placed at pos 2: [A1, A5, A2, A3]

place A4
contribution at pos 0 = 2394
contribution at pos 1 = 27987
contribution at pos 2 = 29157
contribution at pos 3 = 28716
contribution at pos 4 = 6960
attribute A4 is placed at pos 2: [A1, A5, A4, A2, A3]

resulting order: [A1, A5, A4, A2, A3]

find fragments:
split at [A1, A2, A3, A4] | [A5]
accesses frag1 alone: 45
accesses frag2 alone: 0
accesses frag1 and frag2: 101
split quality = -10201

split at [A1, A2, A3] | [A4, A5]

accesses frag1 alone: 21
accesses frag2 alone: 0
accesses frag1 and frag2: 125
split quality = -15625

split at [A1, A3] | [A2, A4, A5]

accesses frag1 alone: 0
accesses frag2 alone: 90
accesses frag1 and frag2: 56
split quality = -3136

split at [A1] | [A2, A3, A4, A5]

accesses frag1 alone: 0
accesses frag2 alone: 125
accesses frag1 and frag2: 21
split quality = -441

split at [A1, A2, A3, A5] | [A4]

accesses frag1 alone: 32
accesses frag2 alone: 0
accesses frag1 and frag2: 114
split quality = -12996

split at [A1, A3, A5] | [A2, A4]

accesses frag1 alone: 11
accesses frag2 alone: 0
accesses frag1 and frag2: 135
split quality = -18225

split at [A1, A5] | [A2, A3, A4]

accesses frag1 alone: 0
accesses frag2 alone: 24
accesses frag1 and frag2: 122
split quality = -14884

split at [A1, A3, A4, A5] | [A2]

accesses frag1 alone: 35
accesses frag2 alone: 0
accesses frag1 and frag2: 111
split quality = -12321

split at [A1, A4, A5] | [A2, A3]
accesses frag1 alone: 0
accesses frag2 alone: 0
accesses frag1 and frag2: 146
split quality = -21316

split at [A1, A2, A4, A5] | [A3]

accesses frag1 alone: 90
accesses frag2 alone: 0
accesses frag1 and frag2: 56
split quality = -3136

optimal split(s) (sq = -441):

[A1] | [A2, A3, A4, A5]
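The listing above can be reproduced with a short Python sketch. It is an illustration under assumptions, not the tool that produced the trace: the resulting order is treated as circular, so every contiguous run of attributes (including runs that wrap around, which corresponds to the row and column shifts) is tried as one fragment. The helper names are hypothetical.

```python
QUERIES = [
    ({"A1", "A2", "A3"}, 21),
    ({"A3", "A4"}, 24),
    ({"A2", "A4", "A5"}, 90),
    ({"A3", "A5"}, 11),
]
ORDER = ["A1", "A5", "A4", "A2", "A3"]  # resulting order from the BEA trace

def split_quality(frag1, frag2, queries):
    """(accesses to frag1 alone) * (accesses to frag2 alone) - (both)^2."""
    only1 = sum(f for attrs, f in queries if attrs <= frag1)
    only2 = sum(f for attrs, f in queries if attrs <= frag2)
    both = sum(f for attrs, f in queries if attrs & frag1 and attrs & frag2)
    return only1 * only2 - both ** 2

def cyclic_splits(order):
    """Yield each way to cut the circular order into two contiguous runs."""
    n = len(order)
    seen = set()
    for start in range(n):
        for length in range(1, n):
            part = frozenset(order[(start + k) % n] for k in range(length))
            rest = frozenset(order) - part
            pair = frozenset((part, rest))
            if pair not in seen:  # skip the mirror image of an earlier split
                seen.add(pair)
                yield part, rest

results = [(split_quality(a, b, QUERIES), sorted(a), sorted(b))
           for a, b in cyclic_splits(ORDER)]
best = max(results, key=lambda r: r[0])
print(len(results))  # 10 candidate splits
print(best)          # (-441, ['A1'], ['A2', 'A3', 'A4', 'A5'])
```

For five attributes this gives the same ten candidate splits as the listing, with [A1] | [A2, A3, A4, A5] as the best one.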
