Chapter - 7 Distributed Database System


Uploaded by

dawod
Copyright
© © All Rights Reserved
Outline
1 Distributed Database Concepts

2 Data Fragmentation, Replication and Allocation

3 Types of Distributed Database Systems

4 Query Processing

5 Concurrency Control and Recovery

6 3-Tier Client-Server Architecture


Distributed Database Concepts
• A distributed database (DDB) is a collection of multiple
logically related databases distributed over a computer
network.
• A distributed database management system (DDBMS) is a
software system that manages a distributed database while
making the distribution transparent to the user.
Advantages of DDS
1. Management of distributed data with different levels of
transparency:
– Distribution transparency:
• The physical placement of data (files, relations, etc.) is
hidden from the user.

[Figure: Sites 1 to 5 connected by a communications network]
Examples
The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented
horizontally and stored with possible replication as shown below.
The sites are connected by a communications network.
• Chicago (headquarters)
– EMPLOYEES - All
– PROJECTS - All
– WORKS_ON - All
• New York
– EMPLOYEES - New York
– PROJECTS - All
– WORKS_ON - New York Employees
• San Francisco
– EMPLOYEES - San Francisco and LA
– PROJECTS - San Francisco
– WORKS_ON - San Francisco Employees
• Los Angeles
– EMPLOYEES - LA
– PROJECTS - LA and San Francisco
– WORKS_ON - LA Employees
• Atlanta
– EMPLOYEES - Atlanta
– PROJECTS - Atlanta
– WORKS_ON - Atlanta Employees
Advantages of DDS
− Network transparency: Users do not have to worry about operational details of
the network.
− Location transparency: A command can be issued from any location
without affecting how it works.
− Naming transparency: Allows access to any named object (files, relations, etc.)
from any location.
− Replication transparency: Allows copies of data to be stored at multiple sites.
• This is done to minimize access time to the required data.
− Fragmentation transparency: Allows a relation to be segmented horizontally
(creating a subset of the tuples of a relation) or vertically (creating a subset of
the columns of a relation).
Advantages of DDS
2. Increased reliability and availability:
− Reliability refers to system uptime, that is, the system is running most
of the time.
− Availability is the probability that the system is continuously available (usable
or accessible) during a time interval.
− A distributed database system has multiple nodes (computers), and if one fails
the others are still available to do the job.
3. Improved performance:
− A DDBMS fragments the database to keep data closer to where it is needed
most.
− This reduces data management (access and modification) time significantly.
4. Easier expansion (scalability):
− Allows new nodes (computers) to be added at any time without changing the entire
configuration.
Disadvantages of DDS
– Complexity

– Cost

– Security

– Integrity control more difficult

– Lack of standards

– Lack of experience

– Database design more complex


Types of Distributed Database Systems
Homogeneous
• All sites of the database system have an identical setup, i.e., the same database
system software.
• The underlying operating systems can be a mixture of Linux, Windows,
Unix, etc.
• For example, all sites run Oracle, or DB2, or Sybase, or some other database
system.
Advantages
• Easy to use
• Easy to manage
• Easy to design
Disadvantages
• Difficult for most organizations to force a homogeneous environment
[Figure: Sites 1 to 5 connected by a communications network; every site runs Oracle, on Windows, Unix, or Linux]
Heterogeneous
• Different data centers may run different DBMS products, with possibly different underlying data models.
• Translations are required to allow for:
o Different hardware.
o Different DBMS products.
o Different hardware and different DBMS products.
[Figure: Sites 1 to 5 connected by a communications network; the sites run object-oriented, relational, hierarchical, and network DBMSs on Unix, Windows, and Linux]
Heterogeneous
• Advantages
– Large volumes of data from different data centers can be stored in one global center.
– Remote access is done using the global schema.
– Different DBMSs may be used at each node.
• Disadvantages
– Difficult to manage.
– Difficult to design.
Federated Database Management Systems

• A federated database system (FDBS) is a collection of cooperating
database systems that are autonomous and possibly heterogeneous.
• Differences in data models: relational, object-oriented,
hierarchical, network, etc.
• Differences in constraints: Each site may have its own data
access and processing constraints.
• Differences in query language: Some sites may use SQL-89,
some may use SQL-92, and so on.

Multidatabase system (MDBS): A distributed DBMS in which each site
maintains complete autonomy.
Distributed Processing and Distributed Database
Centralized Database Management System
Fully Distributed Database Management System
DDBMS Components
Computer workstations
• Form the network system.
Network hardware and software
• Components that reside in each workstation.
Communications media
• Carry the data from one workstation to another.
Transaction processor (TP)
• Receives and processes the application’s data requests.
Data processor (DP)
• Stores and retrieves data located at the site.
• Also known as the data manager (DM).
Distributed Database System Components
DDBMS protocol
• determines how the DDBMS will:
– Interface with the network to transport data and commands
between DPs and TPs.
– Synchronize all data received from DPs (TP side) and route
retrieved data to the appropriate TPs (DP side).
– Ensure common database functions in a distributed system:
security, concurrency control, backup, and recovery.
Levels of Data & Process Distribution

• Single-Site Processing, Single-Site Data (SPSD)
– All processing is done on a single CPU or host computer.
– All data are stored on the host computer’s local disk.
– The DBMS is accessed by dumb terminals.
– Typical of most mainframe and minicomputer DBMSs.
– Typical of the first generation of single-user microcomputer databases.
[Figure: Nondistributed (centralized) DBMS]
Levels of Data & Process Distribution

Multiple-Site Processing, Single-Site Data (MPSD)


− Typically, MPSD requires a network file server on which
conventional applications are accessed through a LAN.
− A variation of the MPSD approach is known as a
client/server architecture.

Figure 6.7
Levels of Data & Process Distribution

• Multiple-Site Processing, Multiple-Site Data (MPMD)
– Fully distributed DBMS with support for multiple DPs and
TPs at multiple sites.
– Homogeneous DDBMS
• Integrates only one type of centralized DBMS over the network.
– Heterogeneous DDBMS
• Integrates different types of centralized DBMSs over a network.
Distributed DB Transparency

• DDBMS transparency features share the common property of
allowing each end user to feel like the database’s only
user.
– Distribution transparency
– Transaction transparency
– Failure transparency
– Performance transparency
– Heterogeneity transparency
Distribution Transparency
• Distribution transparency allows us to manage a physically
dispersed database as though it were a centralized database.
• Three Levels of Distribution Transparency
– Fragmentation transparency
– Location transparency
– Local mapping transparency

Table 6.2
Distribution Transparency
• Example :
Employee data (EMPLOYEE) are distributed over three locations: New York,
Atlanta, and Miami.
Depending on the level of distribution transparency support, three different cases of
queries are possible:

Figure 6.9 Fragment Locations


Distribution Transparency
• Case 1: DB Supports Fragmentation Transparency
SELECT * FROM EMPLOYEE WHERE EMP_DOB < '01-JAN-1940';

• Case 2: DB Supports Location Transparency
SELECT * FROM E1 WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E2 WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E3 WHERE EMP_DOB < '01-JAN-1940';

• Case 3: DB Supports Local Mapping Transparency
SELECT * FROM E1 NODE NY WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E2 NODE ATL WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E3 NODE MIA WHERE EMP_DOB < '01-JAN-1940';
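
The three cases differ only in how much the caller has to know. The following Python sketch imitates them under stated assumptions: three in-memory SQLite databases stand in for the NY, ATL, and MIA sites, the CATALOG dictionary, the helper names, and the sample rows are all hypothetical, and dates are stored as ISO strings so that string comparison orders them correctly. SQLite has no NODE clause, so the site is chosen in Python instead.

```python
import sqlite3

# Hypothetical catalog: fragment name -> site that stores it.
CATALOG = {"E1": "NY", "E2": "ATL", "E3": "MIA"}

# Made-up sample rows (EMP_NUM, EMP_NAME, EMP_DOB as an ISO date string).
ROWS = {
    "E1": [(1, "Ann", "1935-04-12"), (2, "Bob", "1962-09-30")],
    "E2": [(3, "Cal", "1938-01-05")],
    "E3": [(4, "Dee", "1971-07-19")],
}


def make_sites():
    """Create one in-memory database per site and load its fragment."""
    sites = {}
    for fragment, site in CATALOG.items():
        conn = sqlite3.connect(":memory:")
        conn.execute(
            f"CREATE TABLE {fragment} (EMP_NUM INTEGER, EMP_NAME TEXT, EMP_DOB TEXT)"
        )
        conn.executemany(f"INSERT INTO {fragment} VALUES (?, ?, ?)", ROWS[fragment])
        sites[site] = conn
    return sites


def query_fragment_at(sites, fragment, site, cutoff):
    """Local mapping transparency: the caller names the fragment and the site."""
    sql = f"SELECT * FROM {fragment} WHERE EMP_DOB < ?"
    return sites[site].execute(sql, (cutoff,)).fetchall()


def query_fragment(sites, fragment, cutoff):
    """Location transparency: the caller names the fragment; the catalog finds the site."""
    return query_fragment_at(sites, fragment, CATALOG[fragment], cutoff)


def query_employee(sites, cutoff):
    """Fragmentation transparency: the caller sees a single EMPLOYEE table."""
    result = []
    for fragment in CATALOG:
        result.extend(query_fragment(sites, fragment, cutoff))
    return sorted(result)
```

Calling query_employee(make_sites(), "1940-01-01") plays the role of Case 1, while the other two helpers require the caller to supply the fragment name (Case 2) or the fragment name and the site (Case 3).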
Transaction Transparency
• Transaction transparency ensures that database transactions
will maintain the database’s integrity and consistency.
• Related Concepts:
– Remote Requests
– Remote Transactions
– Distributed Transactions
– Distributed Requests
A Remote Request
• Accesses data to be processed by a single remote database
processor.
A Remote Transaction
• Is composed of several requests, but accesses data at only a single
site.
A Distributed Transaction
• Allows a transaction to reference several different (local or
remote) DP sites.
A Distributed Request
• References data from several remote DP sites.
• Allows a single request to reference a physically partitioned table.

Example 2: Distributed Request
Transaction Transparency

• Two-Phase Commit Protocol
– The two-phase commit protocol requires a
• DO-UNDO-REDO protocol and
• write-ahead protocol.
– The DO-UNDO-REDO protocol is used by the DP to roll back
and/or roll forward transactions with the help of the system’s
transaction log entries.
Transaction Transparency
• Two-Phase Commit Protocol
– DO performs the operation and records the “before” and “after” values
in the transaction log.
– UNDO reverses an operation, using the log entries written by the DO
portion of the sequence.
– REDO redoes an operation, using the log entries written by the DO
portion of the sequence.
– The write-ahead protocol forces the log entry to be written to permanent
storage before the actual operation takes place.
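
How the log drives DO, UNDO, and REDO can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the LoggedStore class is hypothetical, a Python list stands in for the log on permanent storage, and stored values are assumed never to be None (None marks "no previous value").

```python
class LoggedStore:
    """Toy data processor: a dict of values plus an append-only log."""

    def __init__(self):
        self.data = {}
        self.log = []  # stands in for the transaction log on permanent storage

    def do(self, key, after):
        """DO: record the before and after values, then apply the change."""
        before = self.data.get(key)  # None means the key did not exist yet
        self.log.append((key, before, after))  # write-ahead: log entry first
        self.data[key] = after  # only now is the operation itself performed

    def undo(self):
        """UNDO: walk the log backwards and restore every 'before' value."""
        for key, before, _after in reversed(self.log):
            if before is None:
                self.data.pop(key, None)
            else:
                self.data[key] = before

    def redo(self):
        """REDO: walk the log forwards and reapply every 'after' value."""
        for key, _before, after in self.log:
            self.data[key] = after
```

Because every change is logged before it is applied, the log alone is enough to roll the data back to its starting state or forward to its final state.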
Two-Phase Commit Protocol
• The two-phase commit protocol defines the operations between two
types of nodes:
• The coordinator, and
• One or more subordinates (cohorts).


Two-Phase Commit Protocol
• The protocol is implemented in two phases:
• Phase 1: Preparation
– The coordinator sends a PREPARE TO COMMIT message to all
subordinates.
– The subordinates receive the message, write the transaction log
using the write-ahead protocol, and send an acknowledgement
message to the coordinator.
– The coordinator makes sure that all nodes are ready to commit, or
it aborts the transaction.
Two-Phase Commit Protocol
• Phase 2: The Final Commit
– The coordinator broadcasts a COMMIT message to all
subordinates and waits for the replies.
– Each subordinate receives the COMMIT message, then updates
the database using the DO protocol.
– The subordinates reply with a COMMITTED or NOT COMMITTED
message to the coordinator.
– If one or more subordinates did not commit, the coordinator sends
an ABORT message, thereby forcing them to UNDO all
changes.
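
The message flow of the two phases can be sketched as follows. This is a simplified, single-process Python illustration, not a real network protocol: the Subordinate class, its can_prepare and can_commit flags (used to simulate failures), and the two_phase_commit function are all hypothetical names, and method calls stand in for messages.

```python
class Subordinate:
    """Toy subordinate (cohort) that logs each step before acting on it."""

    def __init__(self, name, can_prepare=True, can_commit=True):
        self.name = name
        self.can_prepare = can_prepare  # simulated vote in phase 1
        self.can_commit = can_commit    # simulated outcome in phase 2
        self.log = []
        self.state = "INIT"

    def prepare(self):
        """Phase 1: write the log entry, then acknowledge (or refuse)."""
        self.log.append("PREPARE")
        self.state = "PREPARED" if self.can_prepare else "NOT READY"
        return self.can_prepare

    def commit(self):
        """Phase 2: apply the update and report the result."""
        if not self.can_commit:
            return "NOT COMMITTED"
        self.log.append("COMMIT")
        self.state = "COMMITTED"
        return "COMMITTED"

    def abort(self):
        """Undo all changes made on behalf of the transaction."""
        self.log.append("ABORT")
        self.state = "ABORTED"


def two_phase_commit(subordinates):
    """Coordinator logic: returns 'COMMITTED' or 'ABORTED'."""
    # Phase 1: send PREPARE TO COMMIT to everyone and collect the replies.
    votes = [s.prepare() for s in subordinates]
    if not all(votes):
        for s in subordinates:
            s.abort()
        return "ABORTED"
    # Phase 2: broadcast COMMIT and wait for the replies.
    replies = [s.commit() for s in subordinates]
    if any(reply != "COMMITTED" for reply in replies):
        for s in subordinates:
            s.abort()  # force every subordinate to undo its changes
        return "ABORTED"
    return "COMMITTED"
```

The sketch ignores timeouts, lost messages, and coordinator failure, which a real implementation has to handle.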
Query Optimization

• The objective of a query optimization routine is to minimize the
total cost associated with the execution of a request.
• The costs associated with a request are a function of the:
– Access time (I/O) cost - involved in accessing the physical
data stored on disk.
– Communication cost - associated with the transmission of
data among nodes in distributed database systems.
– CPU time cost - associated with the processing overhead of
managing distributed transactions.
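
As a rough illustration of the three cost components, the Python sketch below adds them up for two candidate execution plans and picks the cheaper one. The Plan class, the plan names, and every number are made up for the example, and a plain sum is only one possible way to combine the costs.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """One candidate way to execute a request (all figures are hypothetical)."""
    name: str
    io_cost: float    # access time (I/O) cost
    comm_cost: float  # communication cost between nodes
    cpu_cost: float   # CPU time cost

    @property
    def total(self):
        # Assumption: the total cost is the plain sum of the three components.
        return self.io_cost + self.comm_cost + self.cpu_cost


def cheapest(plans):
    """Return the plan with the lowest total cost."""
    return min(plans, key=lambda plan: plan.total)


plans = [
    Plan("ship the whole table to the requesting site", 10, 50, 2),
    Plan("filter at the remote site, ship only matching rows", 12, 5, 3),
]
best = cheapest(plans)
```

With these made-up figures the second plan wins because its much lower communication cost outweighs its slightly higher I/O and CPU costs.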
Performance Transparency and
Query Optimization

• Query optimization must provide distribution transparency as well
as replica transparency.
• Replica transparency refers to the DDBMS’s ability to hide the
existence of multiple copies of data from the user.
• Query optimization algorithms are based on two principles:
– Selection of the optimum execution order
– Selection of sites to be accessed to minimize communication
costs
Performance Transparency and
Query Optimization
• Operation Modes of Query Optimization
– Automatic query optimization means that the DDBMS finds
the most cost-effective access path without user intervention.
– Manual query optimization requires that the optimization be
selected and scheduled by the end user or programmer.
• Timing of Query Optimization
– Static query optimization takes place at compilation time.
– Dynamic query optimization takes place at execution time.
Performance Transparency and
Query Optimization
• Optimization Techniques by Type of Information Used
– Statistically based query optimization
• Uses statistical information about the database.
– Rule-based query optimization
• Based on a set of user-defined rules to determine the best
query access strategy.
Distributed Database Design

• The design of a distributed database introduces three new issues:
– How to partition the database into fragments.
– Which fragments to replicate.
– Where to locate those fragments and replicas.
Data Fragmentation
• Data fragmentation allows us to break a single object
into two or more segments or fragments.
• Three Types of Fragmentation Strategies:
– Horizontal fragmentation
– Vertical fragmentation
– Mixed fragmentation
Data Fragmentation
• Horizontal fragment - Consists of a subset of the tuples
of a relation.
– Each fragment is the equivalent of a SELECT statement with
a WHERE clause on a single attribute.
Data Fragmentation
• Vertical fragment - Consists of a subset of the attributes of a
relation.
– Equivalent to the PROJECT operation.
Data Fragmentation
• Mixed fragment - Consists of a horizontal
fragment that is subsequently vertically
fragmented, or a vertical fragment that is
then horizontally fragmented.
– A mixed fragment is defined using the
Selection and Projection operations of the
relational algebra.
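
The three strategies map directly onto selection and projection. The Python sketch below shows this on a small in-memory relation; the EMPLOYEE rows, the attribute names, and the function names are hypothetical and chosen only for illustration.

```python
# Made-up relation: each dict is one tuple.
EMPLOYEE = [
    {"EMP_NUM": 1, "EMP_NAME": "Ann", "LOCATION": "New York", "SALARY": 50},
    {"EMP_NUM": 2, "EMP_NAME": "Bob", "LOCATION": "Atlanta", "SALARY": 40},
    {"EMP_NUM": 3, "EMP_NAME": "Cal", "LOCATION": "New York", "SALARY": 60},
]


def horizontal(relation, predicate):
    """Horizontal fragment: a subset of the tuples (selection)."""
    return [row for row in relation if predicate(row)]


def vertical(relation, attributes):
    """Vertical fragment: a subset of the attributes (projection)."""
    return [{name: row[name] for name in attributes} for row in relation]


def mixed(relation, predicate, attributes):
    """Mixed fragment: a horizontal fragment that is then split vertically."""
    return vertical(horizontal(relation, predicate), attributes)


ny_employees = horizontal(EMPLOYEE, lambda row: row["LOCATION"] == "New York")
pay_data = vertical(EMPLOYEE, ["EMP_NUM", "SALARY"])
ny_pay_data = mixed(
    EMPLOYEE, lambda row: row["LOCATION"] == "New York", ["EMP_NUM", "SALARY"]
)
```

The vertical fragments here keep the key attribute EMP_NUM, since each vertical fragment needs the key if the original relation is to be rebuilt by joining the fragments.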
Data Replication

• Data replication refers to the storage of data copies at multiple
sites served by a computer network.
– Enhances data availability and response time, reducing
communication and total query costs.
Data Replication
• Mutual Consistency Rule
– All copies of a data fragment must be identical.
– DDBMS must ensure that a database update is performed at all
sites where replicas exist.
• Replication Conditions
– Fully Replicated database stores multiple copies of all database
fragments at multiple sites.
– Partially Replicated database stores multiple copies of some
database fragments at multiple sites.
• Factors for Data Replication Decision
– Database Size
– Usage Frequency
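
The mutual consistency rule can be illustrated with a small Python sketch in which every write is applied to all replicas while a read uses only the local copy. The ReplicatedFragment class and the site names are hypothetical, and the sketch ignores site failures and concurrent updates.

```python
class ReplicatedFragment:
    """Toy fragment with one full copy per site."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}

    def write(self, key, value):
        """Apply the update at every site that holds a replica."""
        for copy in self.copies.values():
            copy[key] = value

    def read(self, key, site):
        """Read from the copy stored at the given site."""
        return self.copies[site][key]

    def mutually_consistent(self):
        """True when all copies are identical."""
        copies = list(self.copies.values())
        return all(copy == copies[0] for copy in copies)
```

The sketch also hints at the trade-off behind the replication decision: reads become local and cheap, while every update has to reach all sites that hold a copy.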
Data Allocation
• Data allocation describes the process of deciding where to locate
data.
• Data Allocation Strategies
– Centralized
The entire database is stored at one site.
– Partitioned
The database is divided into several disjoint parts (fragments) and
stored at several sites.
– Replicated
Copies of one or more database fragments are stored at several
sites.
• Data allocation algorithms
– Data allocation algorithms take into consideration a variety of
factors:
• Performance and data availability goals.
• Size, number of rows, and the number of relations that an entity
maintains with other entities.
• Types of transactions to be applied to the database, and the
attributes accessed by each of those transactions.
Database system architectures
• Parallel versus Distributed Architectures
– The main types of multiprocessor system architectures are:
• Shared memory (tightly coupled) architecture. Multiple processors share secondary
(disk) storage and also share primary memory.
• Shared disk (loosely coupled) architecture. Multiple processors share secondary
(disk) storage but each has its own primary memory.
• Shared nothing (massively parallel processing, MPP) architecture. A multiple-processor
architecture in which each processor is part of a complete system, with its own memory
and disk storage.
Some different database system architectures:
• Parallel database architectures:
(a) shared memory;
(b) shared disk;
(c) shared nothing.
• Centralized database architecture.
• A truly distributed database architecture.
[Figures: shared nothing architecture; centralized database; distributed database]
Client/Server vs. DDBMS
• Client/server architecture refers to the way in which computers
interact to form a system.
Reference architecture for a DDBMS
Questions?
