Advance Database Management Systems :24
Prof Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer
Data Fragmentation
• Distributed database systems provide distribution transparency: users see the data as if it were stored in a single database, even though it is spread over several databases. This is achieved through the concept of data fragmentation.
• Data fragmentation means dividing the data into pieces and distributing those pieces over the network to the different databases.
• Initially, all the databases and data are designed according to the standards of any database system, by applying normalization and denormalization.
• A distributed system, however, divides these normalized tables further, because the main goal of a DDBMS is to deliver data to users from the location nearest to them, as fast as possible.
• Hence the data in a table are divided according to their location or according to users' requirements.
Distributed Database
• A distributed database (DDB) is an integrated collection of databases that
is physically distributed across sites in a computer network.
• A distributed database management system (DDBMS) is the software
system that manages a distributed database such that the distribution
aspects are transparent to the users.
• To form a distributed database system (DDBS), the files must be
structured, logically interrelated, and physically distributed across multiple
sites. In addition, there must be a common interface to access the
distributed data.
Data Fragmentation, Replication and Allocation
• Data Fragmentation
– Split a relation into logically related and correct
parts. A relation can be fragmented in two ways:
• Horizontal Fragmentation
• Vertical Fragmentation
Data Fragmentation, Replication and Allocation
• Horizontal fragmentation
– A horizontal fragment is a subset of the tuples of a relation, namely those tuples that satisfy a selection condition.
– Consider the Employee relation with the selection condition (DNO = 5). The tuples that satisfy this condition form a subset, which is a horizontal fragment of the Employee relation.
– A selection condition may be composed of several conditions connected by AND or OR.
– Derived horizontal fragmentation: the horizontal partitioning of a primary relation is also applied to the secondary relations that are related to it through foreign keys.
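The following Python sketch illustrates primary and derived horizontal fragmentation on a few in-memory rows. Only the attribute DNO comes from the slide; the sample tuples, the Employee attributes Ssn, Name and Salary, and the Department attributes Dnumber, Dname and Site are hypothetical. Relations are modelled as lists of dicts, so this is a teaching sketch, not how a DDBMS actually stores fragments.

```python
from typing import Callable

Row = dict[str, object]
Relation = list[Row]

# Hypothetical sample data; only DNO is taken from the slide.
EMPLOYEE: Relation = [
    {"Ssn": "111", "Name": "Asha",  "DNO": 5, "Salary": 40000},
    {"Ssn": "222", "Name": "Ravi",  "DNO": 4, "Salary": 35000},
    {"Ssn": "333", "Name": "Meena", "DNO": 5, "Salary": 52000},
    {"Ssn": "444", "Name": "Kiran", "DNO": 1, "Salary": 30000},
]
DEPARTMENT: Relation = [
    {"Dnumber": 1, "Dname": "Headquarters", "Site": "Ajmer"},
    {"Dnumber": 4, "Dname": "Admin",        "Site": "Jaipur"},
    {"Dnumber": 5, "Dname": "Research",     "Site": "Ajmer"},
]

def select(r: Relation, cond: Callable[[Row], bool]) -> Relation:
    """Horizontal fragment sigma_cond(R): the tuples that satisfy cond."""
    return [t for t in r if cond(t)]

def semijoin(r: Relation, s: Relation, fk: str, pk: str) -> Relation:
    """Tuples of r whose foreign key fk matches the key pk of some tuple in s."""
    keys = {t[pk] for t in s}
    return [t for t in r if t[fk] in keys]

# Primary horizontal fragment with a simple condition (DNO = 5) ...
emp_d5 = select(EMPLOYEE, lambda t: t["DNO"] == 5)
# ... and with a compound condition connected by AND.
emp_d5_high = select(EMPLOYEE, lambda t: t["DNO"] == 5 and t["Salary"] > 45000)

# Derived horizontal fragmentation: fragment DEPARTMENT by site, then place
# each EMPLOYEE tuple with the fragment of the department it references.
dept_ajmer = select(DEPARTMENT, lambda t: t["Site"] == "Ajmer")
emp_ajmer = semijoin(EMPLOYEE, dept_ajmer, fk="DNO", pk="Dnumber")

print([t["Name"] for t in emp_d5])       # ['Asha', 'Meena']
print([t["Name"] for t in emp_d5_high])  # ['Meena']
print([t["Name"] for t in emp_ajmer])    # ['Asha', 'Meena', 'Kiran']
```

The derived fragment `emp_ajmer` follows the Department fragmentation rather than a condition on Employee itself, which is why it also picks up the employee of department 1.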
Data Fragmentation, Replication and Allocation
• Vertical fragmentation
– A vertical fragment is a subset of a relation created by keeping a subset of its columns. Thus a vertical fragment contains the values of the selected columns only. No selection condition is used in vertical fragmentation.
– Consider the Employee relation. A vertical fragment can be created by keeping the values of Name, Bdate, Sex, and Address (plus the primary key, for the reason given next).
– Because a vertical fragment keeps only some of the columns, each fragment must include the primary key attribute of the parent relation Employee; otherwise the fragments could not be joined back together. In this way all vertical fragments of a relation are connected.
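A minimal Python sketch of vertical fragmentation and reconstruction, assuming a hypothetical primary key Ssn and hypothetical Salary and DNO columns for a second, work-related fragment. Each fragment is a projection that keeps the key, and joining the fragments on that key restores the original tuples.

```python
# Hypothetical Employee tuples; Ssn (key), Salary and DNO are assumed columns.
Row = dict[str, object]

EMPLOYEE: list[Row] = [
    {"Ssn": "111", "Name": "Asha", "Bdate": "1990-01-12", "Sex": "F",
     "Address": "Ajmer", "Salary": 40000, "DNO": 5},
    {"Ssn": "222", "Name": "Ravi", "Bdate": "1988-07-03", "Sex": "M",
     "Address": "Jaipur", "Salary": 35000, "DNO": 4},
]
PK = "Ssn"

def project(r: list[Row], attrs: list[str]) -> list[Row]:
    """Vertical fragment pi_attrs(R): keep only the listed columns of every tuple."""
    return [{a: t[a] for a in attrs} for t in r]

def join_on_key(f1: list[Row], f2: list[Row], key: str) -> list[Row]:
    """Join two vertical fragments on their shared primary key."""
    by_key = {t[key]: t for t in f2}
    return [{**t, **by_key[t[key]]} for t in f1 if t[key] in by_key]

# Both fragments carry the primary key, so they can be joined back together.
personal = project(EMPLOYEE, [PK, "Name", "Bdate", "Sex", "Address"])
work = project(EMPLOYEE, [PK, "Salary", "DNO"])

rebuilt = join_on_key(personal, work, PK)
print(rebuilt == EMPLOYEE)  # True
```

If `PK` were dropped from either projection list, `join_on_key` would have no common attribute to match on, which is exactly why every vertical fragment must carry the key.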
Data Fragmentation, Replication and Allocation
• Representation
– Horizontal fragmentation
• Each horizontal fragment of a relation R can be specified by a $\sigma_{C_i}(R)$ operation in the relational algebra.
• Complete horizontal fragmentation
• A set of horizontal fragments whose conditions C1, C2, …, Cn include all the tuples in R; that is, every tuple in R satisfies (C1 OR C2 OR … OR Cn).
• Disjoint complete horizontal fragmentation: no tuple in R satisfies (Ci AND Cj) for any i ≠ j.
• To reconstruct R from a complete horizontal fragmentation, the UNION of all the fragments is taken.
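The sketch below checks the completeness and disjointness conditions in Python and rebuilds R with a UNION. The conditions and sample tuples are hypothetical. Note that these checks only examine the tuples currently in the relation; in a real design, completeness and disjointness are properties of the conditions themselves, for every tuple the relation could ever hold.

```python
from typing import Callable

Row = dict[str, object]
Condition = Callable[[Row], bool]

# Hypothetical Employee tuples (attributes other than DNO are assumptions).
EMPLOYEE: list[Row] = [
    {"Ssn": "111", "DNO": 5}, {"Ssn": "222", "DNO": 4},
    {"Ssn": "333", "DNO": 5}, {"Ssn": "444", "DNO": 1},
]

# Conditions C1..C3 defining the horizontal fragments.
conditions: list[Condition] = [
    lambda t: t["DNO"] == 5,
    lambda t: t["DNO"] == 4,
    lambda t: t["DNO"] not in (4, 5),
]

def is_complete(r: list[Row], conds: list[Condition]) -> bool:
    """Every tuple satisfies (C1 OR C2 OR ... OR Cn)."""
    return all(any(c(t) for c in conds) for t in r)

def is_disjoint(r: list[Row], conds: list[Condition]) -> bool:
    """No tuple satisfies (Ci AND Cj) for i != j, i.e. at most one Ci holds."""
    return all(sum(c(t) for c in conds) <= 1 for t in r)

def union(frags: list[list[Row]]) -> list[Row]:
    """UNION of the fragments with set semantics (duplicate tuples removed)."""
    seen, result = set(), []
    for frag in frags:
        for t in frag:
            key = tuple(sorted(t.items()))
            if key not in seen:
                seen.add(key)
                result.append(t)
    return result

fragments = [[t for t in EMPLOYEE if c(t)] for c in conditions]
print(is_complete(EMPLOYEE, conditions), is_disjoint(EMPLOYEE, conditions))  # True True

rebuilt = union(fragments)
print(sorted(t["Ssn"] for t in rebuilt) == sorted(t["Ssn"] for t in EMPLOYEE))  # True
```

Dropping the third condition would leave the tuple with DNO = 1 in no fragment, so `is_complete` would return False and the UNION would no longer rebuild R.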
Data Fragmentation, Replication and Allocation
• Representation
– Vertical fragmentation
• A vertical fragment of a relation R can be specified by a $\pi_{L_i}(R)$ operation in the relational algebra.
• Complete vertical fragmentation
• A set of vertical fragments whose projection lists L1, L2, …, Ln include all the attributes in R but share only the primary key of R. In this case the projection lists satisfy the following two conditions:
• $L_1 \cup L_2 \cup \dots \cup L_n = \mathrm{ATTRS}(R)$
• $L_i \cap L_j = \mathrm{PK}(R)$ for any $i \neq j$, where ATTRS(R) is the set of attributes of R and PK(R) is the primary key of R.
• To reconstruct R from a complete vertical fragmentation, an OUTER UNION (equivalently, a FULL OUTER JOIN on the primary key) is applied to the fragments.
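The two conditions on the projection lists can be checked mechanically with set operations, as in this Python sketch. The attribute names and the key Ssn are hypothetical.

```python
# Hypothetical attribute set for Employee; Ssn as primary key is an assumption.
ATTRS = {"Ssn", "Name", "Bdate", "Sex", "Address", "Salary", "DNO"}
PK = {"Ssn"}

projection_lists = [
    {"Ssn", "Name", "Bdate", "Sex", "Address"},
    {"Ssn", "Salary", "DNO"},
]

def is_complete_vertical(lists: list[set[str]], attrs: set[str], pk: set[str]) -> bool:
    """L1 ∪ ... ∪ Ln = ATTRS(R) and Li ∩ Lj = PK(R) for every pair i != j."""
    covers_all = set().union(*lists) == attrs
    share_only_pk = all(
        lists[i] & lists[j] == pk
        for i in range(len(lists))
        for j in range(i + 1, len(lists))
    )
    return covers_all and share_only_pk

print(is_complete_vertical(projection_lists, ATTRS, PK))  # True
```

A fragmentation that leaves out an attribute fails the first condition, and one where two fragments also share a non-key attribute such as Name fails the second.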
Data Fragmentation, Replication and Allocation
• Representation
– Mixed (Hybrid) fragmentation
• A combination of vertical fragmentation and horizontal fragmentation.
• This is achieved by a SELECT-PROJECT operation, represented by $\pi_{L}(\sigma_{C}(R))$.
• If C = TRUE (all tuples are selected) and L ≠ ATTRS(R), we get a vertical fragment; if C ≠ TRUE and L = ATTRS(R), we get a horizontal fragment; and if C ≠ TRUE and L ≠ ATTRS(R), we get a mixed fragment.
• If C = TRUE and L = ATTRS(R), then R itself can be considered a fragment.
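A Python sketch of the general SELECT-PROJECT form. Here C = TRUE is represented by passing no condition at all, because an arbitrary predicate cannot be tested for being always true. The sample tuples and the attributes other than DNO are hypothetical.

```python
from typing import Callable, Optional

Row = dict[str, object]
Condition = Optional[Callable[[Row], bool]]  # None stands for C = TRUE

# Hypothetical Employee tuples; attribute names other than DNO are assumptions.
EMPLOYEE: list[Row] = [
    {"Ssn": "111", "Name": "Asha",  "Salary": 40000, "DNO": 5},
    {"Ssn": "222", "Name": "Ravi",  "Salary": 35000, "DNO": 4},
    {"Ssn": "333", "Name": "Meena", "Salary": 52000, "DNO": 5},
]
ATTRS_R = {"Ssn", "Name", "Salary", "DNO"}

def fragment(r: list[Row], attrs: list[str], cond: Condition = None) -> list[Row]:
    """pi_L(sigma_C(R)): select the tuples first, then keep the listed columns."""
    selected = r if cond is None else [t for t in r if cond(t)]
    return [{a: t[a] for a in attrs} for t in selected]

def fragment_kind(attrs: list[str], cond: Condition) -> str:
    """Classify pi_L(sigma_C(R)) by whether C = TRUE and whether L = ATTRS(R)."""
    all_attrs = set(attrs) == ATTRS_R
    if cond is None:
        return "whole relation" if all_attrs else "vertical"
    return "horizontal" if all_attrs else "mixed"

def in_dept_5(t: Row) -> bool:
    return t["DNO"] == 5

# Mixed fragment: key, name and salary of the employees in department 5.
mixed = fragment(EMPLOYEE, ["Ssn", "Name", "Salary"], in_dept_5)
print(fragment_kind(["Ssn", "Name", "Salary"], in_dept_5))  # mixed
print(mixed)
```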
Data Fragmentation, Replication and Allocation
• Fragmentation schema
– A definition of a set of fragments (horizontal, vertical, or both) that includes all attributes and tuples in the database and satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of OUTER UNION (or OUTER JOIN) and UNION operations.
• Allocation schema
– It describes the allocation of fragments to the sites of the distributed database. The allocation can be fully replicated, partially replicated, or partitioned.
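An allocation schema can be pictured as a mapping from each fragment to the set of sites that store a copy of it. The Python sketch below, with hypothetical fragment and site names, classifies such a mapping as fully replicated, partially replicated, or partitioned.

```python
# Hypothetical sites and fragment names, for illustration only.
SITES = {"Ajmer", "Jaipur", "Delhi"}

def classify_allocation(allocation: dict[str, set[str]], sites: set[str]) -> str:
    """Classify an allocation schema (fragment -> sites that store a copy)."""
    if any(not s for s in allocation.values()):
        raise ValueError("every fragment must be allocated to at least one site")
    if all(s == sites for s in allocation.values()):
        return "fully replicated"      # every fragment is stored at every site
    if all(len(s) == 1 for s in allocation.values()):
        return "partitioned"           # exactly one copy of each fragment
    return "partially replicated"      # some fragments have several copies

partitioned = {"EMP_D5": {"Ajmer"}, "EMP_D4": {"Jaipur"}, "EMP_OTHER": {"Delhi"}}
partial = {"EMP_D5": {"Ajmer", "Delhi"}, "EMP_D4": {"Jaipur"}, "EMP_OTHER": {"Delhi"}}
full = {name: set(SITES) for name in partitioned}

for label, schema in [("partitioned", partitioned), ("partial", partial), ("full", full)]:
    print(label, "->", classify_allocation(schema, SITES))
```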
Data Fragmentation, Replication and Allocation
• Data Replication
– In full replication the entire database is replicated to all sites; in partial replication some selected parts are replicated to some of the sites.
– The replication of fragments is described by a replication schema.
• Data Distribution (Data Allocation)
– This is relevant only in the case of partial replication or partition.
– The selected portion of the database is distributed to the
database sites.
Advantages of Fragmentation
• Easy usage of data: Fragmentation places the most frequently accessed data near the users who need it, so they can access it easily whenever required.
• Efficiency: Queries become more efficient because they work on a smaller subset of the table, and that subset is available with less network access time.
• Security: Each site holds only the records that are valid and useful for its own users. The database near a user contains no unwanted data, only the information that user needs, so data irrelevant to a site is not exposed there.
Advantages of Fragmentation
• Parallelism: Fragmentation allows users at different locations to access the same table at the same time. Each user works with the fragment of the table stored at their own location, which holds the data meant for them. If the whole table were stored at one location, all users would have to wait for each other's locks before performing their transactions.
Advantages of Fragmentation
• Reliability: Fragmentation makes fetching data more reliable. If users at many different locations all access a single database, the network load is heavy, and there is less assurance that the correct records are fetched and returned to each user. Accessing the fragment stored in the nearest database reduces the risk of data loss and of incorrect data.
• Balanced storage: Data can be distributed evenly among the databases in the DDB instead of being concentrated at one site.
Assignment
• Explain the concept of data fragmentation
• What are the advantages of fragmentation?
