PRESENTATION
ON
DISTRIBUTED DATABASE
Submitted BY
Rosemelyne Wartde
MTech IT 1st Semester
Roll No- 20MTechIT02
What is a Distributed Database?
A distributed database is a collection of multiple interconnected databases
that are spread physically across various locations and communicate via a
computer network.
Advantages of Distributed Databases
It is easier to expand.
Data can be arranged according to different levels of
transparency.
It is cheaper to create a network of smaller systems, each containing a
part of the database, than to build one large centralized system.
Even if some of the data nodes go offline, the rest of the
database can continue to function normally.
Disadvantages of Distributed Database Systems
It is quite complex to design and manage.
It is more expensive to operate and maintain.
It is difficult to provide security across many sites.
There can also be data redundancy in the
database.
Types of Distributed Databases
Distributed Database Environment
    Homogeneous
        Autonomous
        Non-Autonomous
    Heterogeneous
        Federated
        Multidatabase
Homogeneous Distributed Databases
All the sites use an identical DBMS and operating system.
Properties:
The sites use very similar software.
The sites use an identical DBMS or DBMSs from the same vendor.
Each site is aware of all other sites and cooperates with them
to process user requests.
The database is accessed through a single interface as if it were a
single database.
Types of Homogeneous Distributed Database
Autonomous − Each database is independent and functions on
its own. The databases are integrated by a controlling application and
use message passing to share data updates.
Non-autonomous − Data is distributed across the homogeneous
nodes, and a central or master DBMS coordinates data updates
across the sites.
Heterogeneous Distributed Databases
Different sites have different operating systems, DBMS products, and data
models.
Properties:
Different sites use dissimilar schemas and software.
The system may be composed of a variety of DBMSs, such as relational, network,
hierarchical, or object-oriented.
Query processing is complex due to dissimilar schemas.
Transaction processing is complex due to dissimilar software.
A site may not be aware of other sites, so there is limited cooperation in
processing user requests.
Types of Heterogeneous Distributed Databases
Federated − The heterogeneous database systems are
independent in nature but are integrated so that they
function as a single database system.
Un-federated (multidatabase) − The database systems employ a central
coordinating module through which the databases are accessed.
Distributed Data Storage
Consider a relation r that is to be stored in the database. There are two
approaches to storing this relation in the distributed database:
1. Data Replication - Data replication is the process of storing separate
copies of the database at two or more sites.
2. Fragmentation - Fragmentation is the task of dividing a table into a set
of smaller tables.
The subsets of the table are called fragments.
There are three types of fragmentation: horizontal, vertical, and hybrid.
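Replication can be illustrated with a minimal sketch. The class name ReplicatedStore, its methods, and the in-memory dictionaries standing in for sites are all hypothetical; a real system would also have to deal with network failures and concurrent writers.

```python
class ReplicatedStore:
    """Toy model of full replication: every site keeps a complete copy."""

    def __init__(self, site_names):
        # One dictionary per site stands in for that site's copy of the data.
        self.sites = {name: {} for name in site_names}

    def write(self, key, value):
        # A write must reach every replica so that the copies stay identical.
        for copy in self.sites.values():
            copy[key] = value

    def read(self, key, site):
        # A read can be served by any single site.
        return self.sites[site][key]


store = ReplicatedStore(["Pune", "Baroda", "Delhi"])
store.write("A_101", 50000)
balance_at_delhi = store.read("A_101", "Delhi")
```

The sketch shows the basic trade-off: reads stay local to one site, while every write costs one update per replica.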
Horizontal Fragmentation
Divides a relation horizontally into groups of rows to create
subsets of the table.
Example:
Account (Acc_No, Balance, Branch_Name). Suppose the Branch_Name column
holds the values Pune, Baroda, and Delhi.
The fragment for the Baroda branch can be written as:
SELECT * FROM Account
WHERE Branch_Name = 'Baroda';
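As a rough sketch of horizontal fragmentation, the following uses Python's built-in sqlite3 module with an in-memory Account table. The sample rows, including the Delhi account A_103, are assumptions made for illustration. It builds one fragment per branch and checks that the union of the fragments gives back the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT)"
)
# Sample rows are assumed for illustration only.
conn.executemany(
    "INSERT INTO Account VALUES (?, ?, ?)",
    [("A_101", 50000, "Pune"), ("A_102", 40000, "Baroda"), ("A_103", 30000, "Delhi")],
)

# One horizontal fragment per branch: each fragment is a subset of the rows.
branches = ["Pune", "Baroda", "Delhi"]
fragments = {
    branch: conn.execute(
        "SELECT * FROM Account WHERE Branch_Name = ?", (branch,)
    ).fetchall()
    for branch in branches
}

# The union of all fragments reconstructs the original relation.
reconstructed = sorted(row for rows in fragments.values() for row in rows)
original = conn.execute("SELECT * FROM Account ORDER BY Acc_No").fetchall()
```

Because each row satisfies exactly one branch predicate, the fragments are disjoint and together cover the whole table.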
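Vertical Fragmentation
Divides a relation vertically into groups of columns to create subsets
of the table.
Acc_No    Balance    Branch_Name
A_101     50000      Pune
A_102     40000      Baroda
Fragmentation1:
SELECT Acc_No, Balance FROM Account;
Fragmentation2:
SELECT Acc_No, Branch_Name FROM Account;
Each fragment keeps the key Acc_No so that the original table can be
rebuilt by joining the fragments.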
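A minimal sketch of vertical fragmentation, again using sqlite3 with an in-memory Account table. It assumes that every fragment keeps the key column Acc_No; the fragment table names Account_Balance and Account_Branch are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT)"
)
conn.executemany(
    "INSERT INTO Account VALUES (?, ?, ?)",
    [("A_101", 50000, "Pune"), ("A_102", 40000, "Baroda")],
)

# Each vertical fragment holds a subset of the columns plus the key Acc_No.
conn.execute("CREATE TABLE Account_Balance AS SELECT Acc_No, Balance FROM Account")
conn.execute("CREATE TABLE Account_Branch AS SELECT Acc_No, Branch_Name FROM Account")

# Joining the fragments on the shared key rebuilds the original rows.
rejoined = conn.execute(
    "SELECT b.Acc_No, b.Balance, r.Branch_Name "
    "FROM Account_Balance AS b "
    "JOIN Account_Branch AS r ON b.Acc_No = r.Acc_No "
    "ORDER BY b.Acc_No"
).fetchall()
original = conn.execute("SELECT * FROM Account ORDER BY Acc_No").fetchall()
```

Without the repeated key there would be no way to tell which balance belongs to which branch, so the fragments could not be rejoined.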
Hybrid Fragmentation
A combination of horizontal and vertical fragmentation techniques
is used.
Consider the following table of employee information (assume it is
named Employee):
Emp_ID    Emp_Name    Emp_Address    Emp_Age    Emp_Salary
101       Raj         Pune           37         15000
102       Maya        Baroda         40         12000
Fragmentation1:
SELECT Emp_ID, Emp_Name FROM Employee WHERE Emp_Age < 40;
Fragmentation2:
SELECT Emp_ID, Emp_Name FROM Employee WHERE Emp_Address = 'Pune' AND
Emp_Salary < 14000;
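The hybrid case can be sketched the same way. The table name Employee and the choice to project Emp_ID and Emp_Name are assumptions; the two rows are the ones shown in the table above. Each fragment combines a row filter (horizontal) with a column subset (vertical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "Emp_ID INTEGER PRIMARY KEY, Emp_Name TEXT, Emp_Address TEXT, "
    "Emp_Age INTEGER, Emp_Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [(101, "Raj", "Pune", 37, 15000), (102, "Maya", "Baroda", 40, 12000)],
)

# Fragment 1: two columns, restricted to employees younger than 40.
fragment1 = conn.execute(
    "SELECT Emp_ID, Emp_Name FROM Employee WHERE Emp_Age < 40"
).fetchall()

# Fragment 2: the same columns, restricted to Pune employees earning under 14000.
fragment2 = conn.execute(
    "SELECT Emp_ID, Emp_Name FROM Employee "
    "WHERE Emp_Address = 'Pune' AND Emp_Salary < 14000"
).fetchall()
```

With the two sample rows, the first fragment contains only Raj, and the second is empty because the only Pune employee earns 15000.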
Distributed Query Processing
It is the procedure of answering queries in a distributed
environment where data is managed at multiple sites.
It involves transforming a high-level query into a query execution plan
and then executing that plan.
The goal is to produce a plan that is equivalent to the original
query and efficient, i.e. one that minimizes resource consumption such as
total cost or response time.
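A heavily simplified sketch of plan selection follows. It assumes one relation per site, measures cost only as the number of bytes shipped over the network, and ignores local processing and the cost of returning the result. The function names and relation sizes are hypothetical.

```python
def transfer_cost(join_site, relation_sizes):
    """Bytes that must be shipped if the join is executed at join_site."""
    # Every relation not already stored at join_site must be sent there.
    return sum(size for site, size in relation_sizes.items() if site != join_site)


def choose_join_site(relation_sizes):
    """Pick the candidate plan (join site) with the lowest transfer cost."""
    return min(relation_sizes, key=lambda site: transfer_cost(site, relation_sizes))


# Assumed sizes in bytes of the relation stored at each site.
relation_sizes = {"Pune": 1_000_000, "Baroda": 20_000}
best_site = choose_join_site(relation_sizes)
```

Here every candidate plan produces the same answer, so the optimizer's only job is to compare costs; shipping the small Baroda relation to Pune moves far less data than the reverse.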
Transaction
A transaction is a program consisting of a collection of database
operations, executed as a logical unit of data processing.
The operations performed in a transaction include inserting, deleting,
updating, or retrieving data.
Each high-level operation can be divided into a number of low-level
tasks or operations. For example, a data update operation
can be divided into three tasks −
read_item()
modify_item()
write_item()
Transaction Operations
The low level operations performed in a transaction are −
begin_transaction
read_item or write_item
end_transaction
commit − A signal to specify that the transaction has been successfully completed
in its entirety and will not be undone.
rollback − A signal to specify that the transaction has been unsuccessful and so all
temporary changes in the database are undone. A committed transaction cannot
be rolled back.
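Commit and rollback can be sketched with Python's sqlite3 module. The Account table, its balances, and the transfer function are assumptions made for illustration. The debit is deliberately written before the funds check so that the rollback has a temporary change to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER)")
conn.executemany(
    "INSERT INTO Account VALUES (?, ?)", [("A_101", 50000), ("A_102", 40000)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True if committed."""
    try:
        # read_item: fetch the current balance of the source account.
        (balance,) = conn.execute(
            "SELECT Balance FROM Account WHERE Acc_No = ?", (src,)
        ).fetchone()
        # write_item: the first UPDATE implicitly begins the transaction.
        conn.execute(
            "UPDATE Account SET Balance = Balance - ? WHERE Acc_No = ?", (amount, src)
        )
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE Account SET Balance = Balance + ? WHERE Acc_No = ?", (amount, dst)
        )
        conn.commit()  # commit: both updates become permanent together.
        return True
    except ValueError:
        conn.rollback()  # rollback: the temporary debit is undone.
        return False


def balances(conn):
    """Return the current balances as a dictionary keyed by account number."""
    return dict(conn.execute("SELECT Acc_No, Balance FROM Account"))


ok = transfer(conn, "A_101", "A_102", 10000)
failed = transfer(conn, "A_101", "A_102", 999999)
```

After the second call the balances are the same as after the first, because the failed transfer left no partial update behind.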
Transaction States
Active − The initial state of every transaction. The transaction
remains in this state while it is executing read, write, or other operations.
Partially Committed − The transaction enters this state after the last statement of the
transaction has been executed.
Committed − The transaction enters this state after it has completed successfully
and the system checks have issued the commit signal.
Failed − The transaction goes from the partially committed state or the active state to the failed
state when it is discovered that normal execution can no longer proceed or system checks
fail.
Aborted − This is the state after the transaction has been rolled back following a failure and
the database has been restored to the state it was in before the transaction began.
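The states and the transitions between them can be sketched as a small state machine. The names State, ALLOWED, and advance are hypothetical; the transition table encodes only the moves described above.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"


# Transitions described in the state definitions; committed and aborted are final.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),
    State.ABORTED: set(),
}


def advance(current, target):
    """Return target if the move is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = State.ACTIVE
state = advance(state, State.PARTIALLY_COMMITTED)
state = advance(state, State.COMMITTED)
```

Trying advance(State.COMMITTED, State.ABORTED) raises ValueError, which mirrors the rule that a committed transaction cannot be rolled back.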
Desirable Properties of Transactions
ACID Properties:
Atomicity − This property states that a transaction is an atomic unit of processing, that
is, either it is performed in its entirety or not performed at all. No partial update should
exist.
Consistency − A transaction should take the database from one consistent state to
another consistent state. It should not adversely affect any data item in the database.
Isolation − A transaction should be executed as if it is the only one in the system. There
should not be any interference from the other concurrent transactions that are
simultaneously running.
Durability − If a committed transaction brings about a change, that change should be
durable in the database and not lost in case of any failure.
Deadlock
What are Deadlocks?
A deadlock occurs when two or more processes are each waiting for a
resource held by another of those processes, so that none of them
can complete its execution.
Coffman Conditions
A deadlock can occur only if all four of the following conditions hold at the same time:
1. Mutual Exclusion - There is a resource that can be held
by only one process at a time.
2. Hold and Wait - A process holds at least one resource while
requesting additional resources that are held by other processes.
3. No Preemption - A resource cannot be taken from a process by force. A
process can only release a resource voluntarily.
4. Circular Wait - A process is waiting for a resource held by a second
process, which is waiting for a resource held by a third process, and so on,
until the last process is waiting for a resource held by the first process.
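The circular wait condition can be checked with a short sketch. It assumes a wait-for graph given as a dictionary that maps each process to the processes whose resources it is waiting for; the function name find_cycle and the process names are hypothetical. A depth-first search looks for a cycle in that graph.

```python
def find_cycle(wait_for):
    """Return a list of processes forming a circular wait, or None."""
    visiting = set()  # processes on the current search path
    done = set()      # processes fully explored without finding a cycle
    path = []

    def visit(process):
        visiting.add(process)
        path.append(process)
        for other in wait_for.get(process, ()):
            if other in visiting:
                # Reached a process already on the path: circular wait found.
                return path[path.index(other):]
            if other not in done:
                cycle = visit(other)
                if cycle:
                    return cycle
        visiting.discard(process)
        done.add(process)
        path.pop()
        return None

    for process in wait_for:
        if process not in done:
            cycle = visit(process)
            if cycle:
                return cycle
    return None


# P1 waits for P2, P2 waits for P3, and P3 waits for P1.
deadlocked = find_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]})
safe = find_cycle({"P1": ["P2"], "P2": []})
```

A cycle in the wait-for graph corresponds to condition 4; the other three conditions are properties of how resources are granted and are not checked here.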
THANK YOU