Fragmentation in Distributed DBMS
Last Updated :
29 May, 2023
Fragmentation is a process of dividing the whole or full database into various subtables or sub relations so that data can be stored in different systems. The small pieces or sub relations or subtables are called fragments. These fragments are called logical data units and are stored at various sites. It must be made sure that the fragments are such that they can be used to reconstruct the original relation (i.e, there isn’t any loss of data).
In the fragmentation process, let’s say, If a table T is fragmented and is divided into a number of fragments say T1, T2, T3….TN. The fragments contain sufficient information to allow the restoration of the original table T. This restoration can be done by the use of UNION or JOIN operation on various fragments. This process is called data fragmentation. All of these fragments are independent which means these fragments can not be derived from others. The users needn’t be logically concerned about fragmentation which means they should not concerned that the data is fragmented and this is called fragmentation Independence or we can say fragmentation transparency.
Advantages :
- As the data is stored close to the usage site, the efficiency of the database system will increase
- Local query optimization methods are sufficient for some queries as the data is available locally
- In order to maintain the security and privacy of the database system, fragmentation is advantageous
Disadvantages :
- Access speeds may be very high if data from different fragments are needed
- If we are using recursive fragmentation, then it will be very expensive
We have three methods for data fragmenting of a table:
- Horizontal fragmentation
- Vertical fragmentation
- Mixed or Hybrid fragmentation
Let’s discuss them one by one.
Horizontal fragmentation –
Horizontal fragmentation refers to the process of dividing a table horizontally by assigning each row (or a group of rows) of relation to one or more fragments. These fragments can then be assigned to different sites in the distributed system. Some of the rows or tuples of the table are placed in one system and the rest are placed in other systems. The rows that belong to the horizontal fragments are specified by a condition on one or more attributes of the relation. In relational algebra horizontal fragmentation on table T, can be represented as follows:
σp(T)
where, σ is relational algebra operator for selection
p is the condition satisfied by a horizontal fragment
Note that a union operation can be performed on the fragments to construct table T. Such a fragment containing all the rows of table T is called a complete horizontal fragment.
For example, consider an EMPLOYEE table (T) :
Eno |
Ename |
Design |
Salary |
Dep |
101 |
A |
abc |
3000 |
1 |
102 |
B |
abc |
4000 |
1 |
103 |
C |
abc |
5500 |
2 |
104 |
D |
abc |
5000 |
2 |
105 |
E |
abc |
2000 |
2 |
This EMPLOYEE table can be divided into different fragments like:
EMP 1 = σDep = 1 EMPLOYEE
EMP 2 = σDep = 2 EMPLOYEE
These two fragments are: T1 fragment of Dep = 1
Eno |
Ename |
Design |
Salary |
Dep |
101 |
A |
abc |
3000 |
1 |
102 |
B |
abc |
4000 |
1 |
Similarly, the T2 fragment on the basis of Dep = 2 will be :
Eno |
Ename |
Design |
Salary |
Dep |
103 |
C |
abc |
5500 |
2 |
104 |
D |
abc |
5000 |
2 |
105 |
E |
abc |
2000 |
2 |
Now, here it is possible to get back T as T = T1 ∪ T2 ∪ …. ∪ TN
Vertical Fragmentation
Vertical fragmentation refers to the process of decomposing a table vertically by attributes or columns. In this fragmentation, some of the attributes are stored in one system and the rest are stored in other systems. This is because each site may not need all columns of a table. In order to take care of restoration, each fragment must contain the primary key field(s) in a table. The fragmentation should be in such a manner that we can rebuild a table from the fragment by taking the natural JOIN operation and to make it possible we need to include a special attribute called Tuple-id to the schema. For this purpose, a user can use any super key. And by this, the tuples or rows can be linked together. The projection is as follows:
πa1, a2,…, an (T)
where, π is relational algebra operator
a1…., an are the attributes of T
T is the table (relation)
For example, for the EMPLOYEE table we have T1 as :
Eno |
Ename |
Design |
Tuple_id |
101 |
A |
abc |
1 |
102 |
B |
abc |
2 |
103 |
C |
abc |
3 |
104 |
D |
abc |
4 |
105 |
E |
abc |
5 |
For the second. sub table of relation after vertical fragmentation is given as follows :
Salary |
Dep |
Tuple_id |
3000 |
1 |
1 |
4000 |
2 |
2 |
5500 |
3 |
3 |
5000 |
1 |
4 |
2000 |
4 |
5 |
This is T2 and to get back to the original T, we join these two fragments T1 and T2 as πEMPLOYEE (T1 ⋈ T2)
Mixed Fragmentation
The combination of vertical fragmentation of a table followed by further horizontal fragmentation of some fragments is called mixed or hybrid fragmentation. For defining this type of fragmentation we use the SELECT and the PROJECT operations of relational algebra. In some situations, the horizontal and the vertical fragmentation isn’t enough to distribute data for some applications and in that conditions, we need a fragmentation called a mixed fragmentation.
Mixed fragmentation can be done in two different ways:
- The first method is to first create a set or group of horizontal fragments and then create vertical fragments from one or more of the horizontal fragments.
- The second method is to first create a set or group of vertical fragments and then create horizontal fragments from one or more of the vertical fragments.
The original relation can be obtained by the combination of JOIN and UNION operations which is given as follows:
σP(πa1, a2..,an(T))
πa1,a2….,an (σp(T))
For example, for our EMPLOYEE table, below is the implementation of mixed fragmentation is πEname, Design (σEno < 104(EMPLOYEE))
The result of this fragmentation is:
Ename |
Design |
A |
abc |
B |
abc |
C |
abc |
Similar Reads
Types of Distributed DBMS
A system that is used for managing the storage and retrieval of data across multiple interconnected databases is called a Distributed Database Management System(DDBMS). In this case, the interconnected databases are situated in different geographical areas. In DDBMS, one can access and store data tr
4 min read
Query Processing in Distributed DBMS
Query processing in a distributed database management system requires the transmission of data between the computers in a network. A distribution strategy for a query is the ordering of data transmissions and local data processing in a database system. Generally, a query in Distributed DBMS requires
5 min read
Functions of Distributed Database System
Distributed database systems play an important role in modern data management by distributing data across multiple nodes. This article explores their functions, including data distribution, replication, query processing, and security, highlighting how these systems optimize performance, ensure avail
10 min read
Distributed Database System
A distributed database is basically a database that is not limited to one system, it is spread over different sites, i.e, on multiple computers or over a network of computers. A distributed database system is located on various sites that don't share physical components. This may be required when a
5 min read
Decomposition In DBMS
Decomposition refers to the division of tables into multiple tables to produce consistency in the data. In this article, we will learn about the Database concept. This article is related to the concept of Decomposition in DBMS. It explains the definition of Decomposition, types of Decomposition in D
4 min read
Disadvantages of Distributed DBMS
Distributed Database Systems is a kind of DBMS where databases are present at different locations and connected via a network. Each site in a Distributed Database is capable of accessing and processing local data as well as remote data. Although, distributed DBMS is capable of effective communicatio
2 min read
Starvation in DBMS
Starvation in DBMS is a problem that happens when some processes are unable to get the resources they need because other processes keep getting priority. This can happen in situations like locking or scheduling, where some processes keep getting the resources first, leaving others waiting indefinite
8 min read
Data Distribution in Cassandra
In this article we will discussed Data distribution in Cassandra and how data distributes over cluster. So, let's have a look. In Cassandra data distribution and replication go together. In Cassandra distribution and replication depending on the three thing such that partition key, key value and Tok
2 min read
What is "Transparent DBMS"?
A transparent Database Management System (DBMS) has been designed to give users seamless access to data while hiding the complexities of data storage, management, and retrieval. Such a system offers various forms of transparency such as location, fragmentation, replication, concurrency and failure t
6 min read
Single-Master and Multi-Master Replication in DBMS
Prerequisite - Data Replication Data replication is simply process of copying data from one database server to another database server so that all users can share same data without any inconsistency. Before understanding various models of replicating data it is very important to know why we need to
4 min read