BTech(CSE) III Year I Semester Academic Year 2025-26
Advanced Databases
UNIT I ASSIGNMENT-1 QUESTIONS
I. Multiple Choice Questions (MCQs)
1. In a Distributed Database System (DDBS), data is:
a) Stored in a single central location
b) Always replicated at every site
c) Stored across multiple sites and accessed via a network
d) Stored only on client machines
Ans: c) Stored across multiple sites and accessed via a network
2. Which of the following is a promise of DDBMS?
a) Increased data redundancy
b) Improved reliability and availability
c) Guaranteed zero communication cost
d) Centralized control
Ans: b) Improved reliability and availability
3. Which architectural model has no central coordinating site?
a) Client-server architecture
b) Peer-to-peer architecture
c) Multi-tier architecture
d) Centralized server model
Ans: b) Peer-to-peer architecture
4. Fragmentation in DDBMS refers to:
a) Breaking large tables into smaller parts stored at different sites
b) Making multiple copies of a table
c) Allocating all data to one site
d) Combining multiple databases into one
Ans: a) Breaking large tables into smaller parts stored at different sites
5. Allocation in DDBMS is concerned with:
a) Deciding where to store each fragment
b) Deciding how to combine fragments
c) Deciding the query optimization strategy
d) Deciding data formats
Ans: a) Deciding where to store each fragment
II. Fill in the Blanks
1. A Distributed Database System is a collection of multiple databases logically
interrelated but physically ___________ across a network.
Answer: distributed
2. Allocation refers to deciding the _______ where each data fragment should be stored
Answer: site(s)
3. In client-server architecture, the ______ manages the database while _______
request services
Answer: server, clients
4. The ___________defines the logical view of the entire distributed database.
Answer: global schema
5. ___________ means storing copies of the same data at multiple sites
Answer: Replication
III. Write short answers to the following questions
1) a) Define Distributed Database b) Define Distributed DBMS
Distributed Database
A distributed database is a collection of multiple, logically interrelated
databases distributed over a computer network
OR
A distributed database is a collection of data which belong logically to
the same system but are spread over the sites of a computer network.
Figure: Current Distribution – Geographically Distributed Data Centers
Distributed DBMS Environment
Two equally important aspects of a distributed database are emphasized:
1. Distribution
2. Logical Correlation
1. Distribution: i.e., the fact that the data are not resident at the same
site (processor).
2. Logical Correlation: i.e., the fact that the data have some properties that tie
them together.
OR
Working Definition of distributed database
A distributed database is a collection of data which are distributed over
different computers of a computer network.
Each site of the network has autonomous processing capability and can
perform local applications.
Each site also participates in the execution of at least one global
application, which requires accessing data at several sites using a
communication subsystem.
Figure: Distributed Database System Architecture
b) Define Distributed DBMS
A distributed database management system (Distributed DBMS) is the software
that manages the DDB and provides an access mechanism that makes this
distribution transparent to the users.
Detailed DDBMS
DISTRIBUTED DATABASE MANAGEMENT SYSTEMS (DDBMSs)
A distributed database management system supports the creation and
maintenance of distributed databases.
Several commercially available distributed systems were developed by the
vendors of centralized database management systems.
They contain additional components which extend the capabilities of centralized
DBMSs by supporting communication and cooperation between several
instances of DBMSs which are installed at different sites of a computer
network.
The software components which are typically necessary for building a
distributed database in this case are:
1. The database management component (DB)
2. The data communication component (DC)
3. The data dictionary (DD), which is extended to represent information
about the distribution of data in the network
4. The distributed database component (DDB)
These components are connected as shown in Figure 1.6 for a two-site
network.
We will use the term “distributed database management system” to
refer to the above set of four components, while DDB is only the
specialized distributed database component. In a similar way, we will use the term “database
management system” to refer to the set of components which serve to
manage a nondistributed database, i.e., the DB, DC, and DD
components.
2) a) What is transparency in DDBMS? b) State fragmentation transparency.
a) What is transparency in DDBMS
Transparency
• Refers to the separation of the higher-level semantics of the system
from the lower-level implementation issues.
• A transparent system “hides” the implementation details from the
users.
• A fully transparent DBMS provides high-level support for the
development of complex applications.
b) State fragmentation transparency
Fragmentation transparency ensures that the user is not aware of
and is not involved in the fragmentation of the data.
• The user is not involved in finding query processing strategies over
fragments or formulating queries over fragments.
– The evaluation of a query that is specified over an entire relation but
now has to be performed on top of the fragments requires an
appropriate query evaluation strategy
• Fragmentation is commonly done for reasons of performance,
availability, and reliability
• Two fragmentation alternatives
– Horizontal fragmentation: divide a relation into subsets of tuples (rows)
– Vertical fragmentation: divide a relation by columns
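The two fragmentation alternatives can be illustrated with a minimal Python sketch, assuming a relation is held in memory as a list of dicts (one dict per tuple). The EMP-like relation, its CITY attribute and all values are hypothetical; horizontal fragments are selections on whole rows, while vertical fragments are projections that keep the key ENO so that the relation can be rebuilt.

```python
# Hypothetical EMP-like relation: a list of dicts, one dict per tuple.
EMP = [
    {"ENO": "E1", "ENAME": "Ann", "TITLE": "Programmer", "CITY": "Paris"},
    {"ENO": "E2", "ENAME": "Bob", "TITLE": "Analyst", "CITY": "Boston"},
    {"ENO": "E3", "ENAME": "Cid", "TITLE": "Engineer", "CITY": "Paris"},
]


def horizontal_fragment(relation, predicate):
    """Selection: keep whole tuples (rows) that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def vertical_fragment(relation, attributes, key="ENO"):
    """Projection: keep some columns, always repeating the key for reconstruction."""
    columns = [key] + [a for a in attributes if a != key]
    return [{a: t[a] for a in columns} for t in relation]


# Horizontal fragmentation of the rows by CITY.
emp_paris = horizontal_fragment(EMP, lambda t: t["CITY"] == "Paris")
emp_others = horizontal_fragment(EMP, lambda t: t["CITY"] != "Paris")

# Vertical fragmentation of the columns, each fragment keeping ENO.
emp_names = vertical_fragment(EMP, ["ENAME"])
emp_jobs = vertical_fragment(EMP, ["TITLE", "CITY"])
```

With fragmentation transparency, a user querying EMP never sees emp_paris or emp_jobs; the DDBMS decides which fragments to read.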
3) What is Fragmentation and list out the fragmentation rules.
Fragmentation is the process of dividing a global relation (table) into smaller
pieces called fragments, which are stored at different sites in a distributed
system.
Each global relation can be split into several nonoverlapping portions which are
called fragments.
Fragments are logical portions of global relations which are physically located
at one or several sites of the network.
The goal is to improve performance, reliability, and manageability while
ensuring that the database still satisfies correctness conditions (completeness,
reconstruction, disjointness).
Example (figure): a global relation and its horizontal fragmentation, each fragment being defined by a selection operation on the global relation.
Completeness condition: All the data of the global relation must be
mapped into the fragments; i.e., it must not happen that a data item
which belongs to a global relation does not belong to any fragment.
Completeness. If a relation instance R is decomposed into fragments $F_R = \{R_1, R_2, \ldots, R_n\}$,
each data item that is in R can also be found in one or more of the $R_i$'s.
This property, which is identical to the lossless decomposition property of normalization,
is also important in fragmentation since it ensures that the data in a global relation is
mapped into fragments without any loss.
Reconstruction condition: It must always be possible to reconstruct
each global relation from its fragments. The necessity of this
condition is obvious: in fact, only fragments are stored in the
distributed database, and global relations have to be built through this
reconstruction operation if necessary.
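Reconstruction. If a relation R is decomposed into fragments $F_R = \{R_1, R_2, \ldots, R_n\}$, it
should be possible to define a relational operator $\nabla$ such that
$R = \nabla R_i, \quad \forall R_i \in F_R$
The operator $\nabla$ will be different for different forms of fragmentation (for example, union for
horizontal fragmentation and a join on the key for vertical fragmentation); it is important,
however, that it can be identified.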
The reconstructability of the relation from its fragments ensures that constraints defined
on the data in the form of dependencies are preserved.
Disjointness condition: it is convenient that fragments be disjoint, so
that the replication of data can be controlled explicitly at the
allocation level.
However, this condition is useful mainly with horizontal
fragmentation, while for vertical fragmentation we will sometimes
allow this condition to be violated.
Disjointness. If a relation R is horizontally decomposed into fragments $F_R = \{R_1, R_2, \ldots, R_n\}$
and data item $d_i$ is in $R_j$, it is not in any other fragment $R_k$ $(k \neq j)$. This criterion
ensures that the horizontal fragments are disjoint. If relation R is vertically decomposed,
its primary key attributes are typically repeated in all its fragments (for reconstruction).
Therefore, in case of vertical partitioning, disjointness is defined only on the nonprimary
key attributes of a relation.
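The three correctness rules can be checked mechanically on a small instance. The following is a minimal sketch, assuming relations are Python sets of tuples; the relation R(ENO, ENAME, TITLE), its values, and the split point "M" are hypothetical. For horizontal fragments the reconstruction operator ▽ is union; for vertical fragments it is a join on the repeated key ENO.

```python
# Hypothetical relation R(ENO, ENAME, TITLE) as a set of tuples.
R = {("E1", "Ann", "Programmer"), ("E2", "Bob", "Analyst"), ("E3", "Cid", "Engineer")}

# Horizontal fragments defined by two complementary predicates on TITLE.
R1 = {t for t in R if t[2] < "M"}
R2 = {t for t in R if t[2] >= "M"}

# Vertical fragments: each keeps the key ENO (position 0) plus one attribute.
V1 = {(eno, name) for eno, name, _ in R}
V2 = {(eno, title) for eno, _, title in R}


def complete(relation, fragments):
    """Every tuple of the relation appears in at least one fragment."""
    return all(any(t in f for f in fragments) for t in relation)


def disjoint(fragments):
    """No tuple appears in two different horizontal fragments."""
    seen = set()
    for f in fragments:
        if seen & f:
            return False
        seen |= f
    return True


def reconstructs_by_union(relation, fragments):
    """Horizontal fragmentation: the operator ▽ is union."""
    return set().union(*fragments) == relation


def reconstructs_by_join(relation, v1, v2):
    """Vertical fragmentation: the operator ▽ is a join on the key ENO."""
    joined = {(e1, name, title) for e1, name in v1 for e2, title in v2 if e1 == e2}
    return joined == relation
```

Here the vertical fragments share only ENO, so they are disjoint on the non-key attributes, as the definition above requires.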
4) State Replication
Replication means storing copies of the same data at multiple sites (nodes).
This ensures reliability, fault-tolerance, and faster local access to data.
Types of Replication
1. Full Replication
o Entire database is replicated at every site.
o High availability & reliability.
o Expensive to update (because every copy must be synchronized).
2. Partial Replication
o Only some relations or fragments are replicated at selected sites.
o Balance between performance and storage.
3. No Replication (Pure Fragmentation)
o Each fragment is stored at exactly one site.
o Saves space.
o Less fault-tolerant.
Advantages of Replication
Increased availability (system keeps working even if some sites fail).
Improved reliability (no single point of failure).
Faster query performance (local access to replicated data).
Disadvantages
Update overhead (must keep all copies consistent).
Storage cost (extra space for replicas).
Complex concurrency control in case of multiple updates.
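The trade-off between the three replication types can be sketched by treating an allocation as a map from each fragment to the set of sites that hold a copy. This is a minimal sketch; the fragments F1 and F2 and the site numbers are hypothetical. The update cost grows with the number of copies, while a fragment stays readable as long as any site holding it is up.

```python
# Hypothetical allocations: fragment -> set of sites holding a copy.
FULL = {"F1": {1, 2, 3}, "F2": {1, 2, 3}}      # full replication (3 sites)
PARTIAL = {"F1": {1, 2}, "F2": {3}}            # partial replication
NO_REPLICATION = {"F1": {1}, "F2": {3}}        # each fragment at exactly one site


def update_cost(allocation, fragment):
    """Number of copies that must be written to keep one fragment consistent."""
    return len(allocation[fragment])


def readable(allocation, fragment, up_sites):
    """True if at least one site holding a copy of the fragment is up."""
    return bool(allocation[fragment] & up_sites)


def storage(allocation):
    """Total number of stored fragment copies."""
    return sum(len(sites) for sites in allocation.values())
```

For example, if only site 1 is up, F2 is still readable under full replication but not under no replication.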
5) List out the types of Distributed Databases
Types of Distributed Databases
o Homogeneous Distributed Databases
o Heterogeneous Distributed Databases
1. Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS
software (the same type of DBMS) and identical operating
systems.
All sites have identical software and share a common global
schema (the overall design of a database is called its schema).
Data can be accessed and modified simultaneously on
several databases in the network.
Homogeneous distributed systems are easy to handle.
Properties
The sites use very similar software.
The sites use identical DBMS or DBMS from the same vendor.
Each site is aware of all other sites and cooperates with other
sites to process user requests.
The database is accessed through a single interface as if it is a
single database.
2. Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have
different operating systems, DBMS products and data models.
Different sites can have different schemas and software.
In this system, data can be accessed across several databases in the
network with the help of generic connectivity (ODBC and
JDBC).
Potentially different DBMSs are used at different sites.
Properties
Different sites use dissimilar schemas and software.
The system may be composed of a variety of DBMSs like
relational, network, hierarchical or object oriented.
Query processing is complex due to dissimilar schemas.
Transaction processing is complex due to dissimilar software.
A site may not be aware of other sites and so there is limited
co-operation in processing user requests
IV. Write long answers to the following questions
1)Explain Architectural Models for DDBMS with diagrams.
i)client-server ii)peer to peer iii)Multi-database.
Architecture: The architecture of a system defines its structure:
The components of the system are identified
The function of each component is specified
The interrelationships and interactions among the components are defined.
• Autonomy: Refers to the distribution of control (not of data) and indicates the
degree to which individual DBMSs can operate independently.
Autonomy has different dimensions:
Design autonomy: each individual DBMS is free to use the data models and
transaction management techniques that it prefers.
Communication autonomy: each individual DBMS is free to decide what information to
provide to the other DBMSs.
Execution autonomy: each individual DBMS can execute the transactions that are submitted
to it in any way that it wants to.
• Distribution: Refers to the physical distribution of data over multiple sites.
– No distribution: No distribution of data at all
– Client/Server distribution:
∗ Data are concentrated on the server, while clients provide application environment/user interface
∗ First attempt to distribution
– Peer-to-peer distribution (also called full distribution):
∗ No distinction between client and server machine
∗ Each machine has full DBMS functionality
Client-Server Architecture for DDBMS (Data-based)
This is a two-level architecture where the functionality is divided into
servers and clients.
The server functions primarily encompass data management, query
processing, optimization and transaction management.
Client functions mainly include the user interface.
However, clients may also have some functions, such as consistency checking and
transaction management.
General idea: Divide the functionality into two classes:
– Server functions: mainly data management, including query processing, optimization,
transaction management, etc.
– Client functions: might also include some data management functions (consistency checking,
transaction management, etc.), not just the user interface
Provides a two-level architecture
More efficient division of work
Different types of client/server architecture:
– Multiple client/single server
– Multiple client/multiple server
Figure:Multiple client/multiple server
Peer-to-Peer Architecture for DDBMS (Data-based)
In these systems, each peer acts both as a client and a server for
imparting database services.
The peers share their resources with other peers and co-ordinate their
activities.
This architecture generally has four levels of schemas -
Local internal schema (LIS)
Describes the local physical data organization (which might be different on each
machine)
Local conceptual schema (LCS)
Describes logical data organization at each site
Required since the data are fragmented and replicated
Global conceptual schema (GCS)
Describes the global logical view of the data
Union of the LCSs
External schema (ES)
Describes the user/application view on the data
Figure: Peer-to-Peer Architecture for DDBMS (Data-based)
Multi - DBMS Architectures
This is an integrated database system formed by a collection of two or
more autonomous database systems.
Multi-DBMS can be expressed through six levels of schemas -
• Multi-database View Level - Depicts multiple user views comprising
subsets of the integrated distributed database.
• Multi-database Conceptual Level - Depicts the integrated multi-database
that comprises global logical multi-database structure
definitions.
• Multi-database Internal Level - Depicts the data distribution across
different sites and multi-database to local data mapping.
• Local database View Level - Depicts public view of local data.
• Local database Conceptual Level - Depicts local data organization
at each site.
• Local database Internal Level - Depicts physical data organization at
each site.
There are two design alternatives for multi-DBMS:
• Model with multi-database conceptual level.
• Model without multi-database conceptual level.
Figure: Model with multi-database conceptual level
2) Explain the Reference Architecture in distributed databases.
The reference architecture of a Distributed Database Management System
(DDBMS) explains how data is logically and physically organized, fragmented,
and allocated across multiple sites, while ensuring transparency, redundancy
control, and independence from local DBMSs.
Figure : A reference architecture for distributed databases.
In the above figure, there is the ‘Global schema’ at the top level. The global schema defines all the
data which are contained in the distributed database as if the database were not distributed at
all.
The Global schema consists of the definition of a set of global relations.
Each global relation can be split into several nonoverlapping portions which are called
fragments. The mapping between global relations and fragments is defined in the
fragmentation schema. This mapping is one to many.
Fragments are logical portions of global relations which are physically located at one or
several sites of the network. The allocation schema defines at which site(s) a fragment is
located.
The Local mapping schema maps fragments in the allocation schema onto external objects in
the local database.
The mapping between global relations and fragments is defined in the fragmentation schema.
This mapping is one to many; i.e., several fragments correspond to one global relation, but
only one global relation corresponds to one fragment. Fragments are indicated by a global
relation name with an index (fragment index);
for example, $R_i$ indicates the ith fragment of global relation R.
The type of mapping defined in the allocation schema determines whether the distributed
database is redundant or nonredundant.
All the fragments which correspond to the same global relation R and are located at the same
site j constitute the physical image of global relation R at site j. There is therefore a one to
one mapping between a physical image and a pair (global relation, site); physical images can
be indicated by a global relation name and a site index.
To distinguish them from fragments, we will use a superscript; for example, $R^j$ indicates the
physical image of the global relation R at site j.
An example of the relationship between the object types defined above is shown in Figure
below. A global relation R is split into four fragments R1, R2, R3, and R4.
These four fragments are allocated redundantly at the three sites of a computer network, thus
building three physical images.
We refer to a copy of a fragment at a given site, and denote it using the global relation name and two
indexes (a fragment index and a site index).
For example, in Figure 3.2, the notation $R_2^3$ indicates the copy of fragment $R_2$ which is
located at site 3. Finally, two physical images can be identical. In this case a physical image
is a copy of another physical image.
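The relationship between fragments, physical images and copies can be sketched in a few lines of Python. This is a minimal sketch; the allocation of R1 to R4 over three sites is hypothetical (chosen so that only R2 is replicated, at sites 1 and 3), not the allocation of Figure 3.2. The physical image $R^j$ is computed as the set of fragments of R located at site j, and each (fragment, site) pair is one copy $R_i^j$.

```python
# Hypothetical fragmentation schema (one-to-many) and allocation schema.
FRAGMENTATION = {"R": ["R1", "R2", "R3", "R4"]}
ALLOCATION = {"R1": {1}, "R2": {1, 3}, "R3": {2}, "R4": {3}}  # fragment -> sites


def physical_image(relation, site):
    """R^j: the set of fragments of `relation` stored at `site`."""
    return {f for f in FRAGMENTATION[relation] if site in ALLOCATION[f]}


def copies(relation):
    """All copies R_i^j, as (fragment, site) pairs."""
    return {(f, s) for f in FRAGMENTATION[relation] for s in ALLOCATION[f]}


def is_redundant(relation):
    """The database is redundant if some fragment is allocated to more than one site."""
    return any(len(ALLOCATION[f]) > 1 for f in FRAGMENTATION[relation])
```

With this allocation, the physical images at sites 1 and 3 overlap exactly in the replicated fragment R2, which is the kind of explicit redundancy control the reference architecture aims for.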
The three most important objectives of this architecture are the separation of data
fragmentation and allocation, the control of redundancy, and the independence from local
DBMSs.
1. Separating the concept of data fragmentation from the concept of data allocation. This
separation allows us to distinguish two different levels of distribution transparency, namely
fragmentation transparency and location transparency.
Fragmentation transparency is the highest degree of transparency and consists of the fact that
the user or application programmer works on global relations.
Location transparency is a lower degree of transparency and requires the user or application
programmer to work on fragments instead of global relations.
2. Explicit control of redundancy. The reference architecture provides explicit control of
redundancy at the fragment level.
For example, in Figure 3.2, two of the physical images are overlapping; i.e., they contain
common data.
The definition of disjoint fragments as building blocks of physical images allows us to refer
explicitly to this overlapping part: the replicated fragment R2.
3. Independence from local DBMSs. This feature, called local mapping transparency, allows
us to study several problems of distributed database management without having to take into
account the specific data models of local DBMSs.
In a homogeneous system it is possible that the site-independent schemata are defined using
the same data model as the local DBMSs, thus reducing the complexity of this mapping.
Another type of transparency which is strictly related to location transparency is replication
transparency.
Replication transparency means that the user is unaware of the replication of fragments.
3) Discuss the various alternative design strategies for distributed databases.
In designing a Distributed Database (DDB), the major task is to decide how data should be fragmented,
replicated, and allocated across multiple sites.
Different design strategies exist, each focusing on different goals:
o Performance
o Reliability
o Transparency
Two major strategies that have been identified for designing distributed databases
are the top-down approach and the bottom-up approach.
The top-down approach is more suitable for tightly integrated, homogeneous distributed
DBMSs, while bottom-up design is more suited to multidatabases.
Design Strategies
Top-down approach
Designing systems from scratch
Homogeneous systems
Bottom-up approach
The databases already exist at a number of sites
The databases should be connected to solve common tasks
Top-down design process
A framework for top-down design process is shown in Figure . The activity begins with a
requirements analysis that defines the environment of the system and “elicits
both the data and processing needs of all potential database users”.
The requirements study also specifies where the final system is expected to stand with respect
to the objectives of a distributed DBMS as identified. These objectives are defined with
respect to performance, reliability and availability, economics, and expandability (flexibility).
The requirements document is input to two parallel activities: view design and conceptual
design.
The view design activity deals with defining the interfaces for end users. The conceptual
design, on the other hand, is the process by which the enterprise is examined to determine
entity types and relationships among these entities. One can possibly divide this process into
two related activity groups: entity analysis and functional analysis.
Entity analysis is concerned with determining the entities, their attributes, and the
relationships among them.
Functional analysis, on the other hand, is concerned with determining the fundamental
functions with which the modeled enterprise is involved.
The results of these two steps need to be cross-referenced to get a better understanding of
which functions deal with which entities.
There is a relationship between the conceptual design and the view design. In one sense, the
conceptual design can be interpreted as being an integration of user views.
Even though this view integration activity is very important, the conceptual model should
support not only the existing applications, but also future applications. View
integration should be used to ensure that entity and relationship requirements for all the views
are covered in the conceptual schema.
In conceptual design and view design activities the user needs to specify the data entities and
must determine the applications that will run on the database as well as
statistical information about these applications.
The global conceptual schema (GCS) and access pattern information collected as a result of
view design are inputs to the distribution design step.
The objective is to design the local conceptual schemas (LCSs) by distributing the entities
over the sites of the distributed system.
Given that we use the relational model, the entities correspond to relations. It is quite
common to divide them into subrelations, called fragments, which are then distributed.
Thus, the distribution design activity consists of two steps: fragmentation and allocation.
The last step in the design process is the physical design, which maps the local conceptual
schemas to the physical storage devices available at the corresponding
sites. The inputs to this process are the local conceptual schema and the access pattern
information about the fragments in them.
Because design is an ongoing process, the resulting system is then observed and monitored.
The result is some form of feedback, which may result in backing up to one of the earlier
steps in the design.
• Bottom-up design strategy
In a bottom-up approach for distributed databases, the design process starts with existing
databases at individual sites, and the goal is to connect and integrate these databases to solve
common tasks or address shared requirements. This approach acknowledges the decentralized
nature of the databases and aims to establish connectivity and coordination among them.
The bottom-up approach recognizes the autonomy of existing databases at individual sites
and focuses on creating a distributed solution that addresses specific collaboration or task
requirements. This approach often involves practical integration steps and connectivity
measures to bring together databases that were not initially designed to work collaboratively.
4. Consider the global relations:
PATIENT(NUMBER, NAME, SSN, AMOUNT-DUE, DEPT,DOCTOR,
MED-TREATMENT)
DEPARTMENT(DEPT, LOCATION, DIRECTOR)
STAFF(STAFFNUM, DIRECTOR, TASK)
Define their fragmentation as follows:
(a) DEPARTMENT has a horizontal fragmentation by LOCATION, with two locations; each
department is conducted by one DIRECTOR.
(b) There are several staff members for each department, led by the department’s director.
STAFF has a horizontal fragmentation derived from that of DEPARTMENT and a semi-join
on the DIRECTOR attribute. Which assumption is required in order to assure completeness?
And disjointness?
(c) PATIENT has a mixed fragmentation: attributes NUMBER, NAME, SSN, and
AMOUNT-DUE constitute a vertical fragment used for accounting purposes; attributes
NUMBER, NAME, DEPT, DOCTOR, and MED-TREATMENT constitute a vertical
fragment used for describing cares.
This last fragment has a horizontal fragmentation derived from that of DEPARTMENT and a
semi-join on the DEPT attribute.
Which assumption is required in order to assure completeness? And disjointness?
Give also the reconstruction of global relations from fragments.
Solution
The assumptions needed for completeness & disjointness, and the reconstruction
formulas.
1) Global relations
PATIENT(NUMBER, NAME, SSN, AMOUNT-DUE, DEPT,
DOCTOR, MED-TREATMENT)
DEPARTMENT(DEPT, LOCATION, DIRECTOR)
STAFF(STAFFNUM, DIRECTOR, TASK)
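One standard way to write out the fragmentation, the assumptions and the reconstruction is sketched below. The location values L1 and L2 and the fragment names PA (accounting fragment) and PC (care fragment) are our own labels, not part of the question.
• DEPARTMENT: complete if every department has LOCATION equal to L1 or L2 (no other or null value); disjoint automatically, since each tuple has a single LOCATION.
• STAFF: complete if every STAFF.DIRECTOR value appears as a DIRECTOR in DEPARTMENT (a foreign key that is never null); disjoint if DIRECTOR → LOCATION holds, i.e., no director directs departments at both locations (true, for instance, if each director directs only one department).
• PATIENT: the two vertical fragments together cover every attribute and both repeat the key NUMBER, so a join on NUMBER reconstructs PATIENT; NAME is also repeated, so disjointness on the non-key attributes is violated for NAME (a controlled redundancy). The horizontal fragmentation of PC is complete if every PATIENT.DEPT value appears in DEPARTMENT (a foreign key that is never null), and disjoint because DEPT is the key of DEPARTMENT, so DEPT → LOCATION.

```latex
\begin{aligned}
\mathrm{DEPT}_1 &= \sigma_{\mathrm{LOCATION}=\text{"L1"}}(\mathrm{DEPARTMENT}), \qquad
\mathrm{DEPT}_2 = \sigma_{\mathrm{LOCATION}=\text{"L2"}}(\mathrm{DEPARTMENT})\\
\mathrm{STAFF}_i &= \mathrm{STAFF} \ltimes_{\mathrm{DIRECTOR}} \mathrm{DEPT}_i, \qquad i = 1, 2\\
\mathrm{PA} &= \pi_{\mathrm{NUMBER,\,NAME,\,SSN,\,AMOUNT\text{-}DUE}}(\mathrm{PATIENT})\\
\mathrm{PC} &= \pi_{\mathrm{NUMBER,\,NAME,\,DEPT,\,DOCTOR,\,MED\text{-}TREATMENT}}(\mathrm{PATIENT})\\
\mathrm{PC}_i &= \mathrm{PC} \ltimes_{\mathrm{DEPT}} \mathrm{DEPT}_i, \qquad i = 1, 2\\[4pt]
\mathrm{DEPARTMENT} &= \mathrm{DEPT}_1 \cup \mathrm{DEPT}_2\\
\mathrm{STAFF} &= \mathrm{STAFF}_1 \cup \mathrm{STAFF}_2\\
\mathrm{PATIENT} &= \mathrm{PA} \bowtie_{\mathrm{NUMBER}} (\mathrm{PC}_1 \cup \mathrm{PC}_2)
\end{aligned}
```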
These formulas, plus the listed assumptions (FKs and FDs), ensure
completeness and disjointness of the fragments and allow exact
reconstruction of the global relations.
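The derived horizontal fragmentation of STAFF can be tried out on a toy instance. This is a minimal Python sketch; the department codes, location values L1 and L2, director names and staff tuples are all hypothetical. Each STAFF fragment is the semi-join of STAFF with one DEPARTMENT fragment on DIRECTOR.

```python
# Hypothetical sample instances (not from the assignment).
DEPARTMENT = [
    {"DEPT": "D1", "LOCATION": "L1", "DIRECTOR": "Rossi"},
    {"DEPT": "D2", "LOCATION": "L2", "DIRECTOR": "Bianchi"},
]
STAFF = [
    {"STAFFNUM": 1, "DIRECTOR": "Rossi", "TASK": "nurse"},
    {"STAFFNUM": 2, "DIRECTOR": "Bianchi", "TASK": "clerk"},
    {"STAFFNUM": 3, "DIRECTOR": "Rossi", "TASK": "porter"},
]


def select(rel, attr, value):
    """Primary horizontal fragmentation: selection on one attribute."""
    return [t for t in rel if t[attr] == value]


def semijoin(rel, other, attr):
    """rel ⋉ other on attr: keep tuples of rel whose attr value occurs in other."""
    keys = {t[attr] for t in other}
    return [t for t in rel if t[attr] in keys]


def complete(rel, frags):
    """Every tuple lands in at least one fragment."""
    return all(any(t in f for f in frags) for t in rel)


def disjoint(rel, frags):
    """No tuple lands in more than one fragment."""
    return all(sum(t in f for f in frags) <= 1 for t in rel)


DEPT1 = select(DEPARTMENT, "LOCATION", "L1")
DEPT2 = select(DEPARTMENT, "LOCATION", "L2")
STAFF1 = semijoin(STAFF, DEPT1, "DIRECTOR")   # derived fragment for location L1
STAFF2 = semijoin(STAFF, DEPT2, "DIRECTOR")   # derived fragment for location L2
```

On this data the fragments are complete and disjoint. Adding a department at L2 that is also directed by Rossi makes the staff fragments overlap (DIRECTOR → LOCATION no longer holds), and a staff tuple whose DIRECTOR appears in no department lands in no fragment (the foreign-key assumption fails).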
OR
5. Consider the following global, fragmentation, and allocation schemata (SL denotes selection):
Global schema: STUDENT(NUMBER, NAME, DEPT)
Fragmentation schema: STUDENT1 = $SL_{\mathrm{DEPT}=\text{"EE"}}$ STUDENT
STUDENT2 = $SL_{\mathrm{DEPT}=\text{"CS"}}$ STUDENT
Allocation schema: STUDENT1 at sites 1, 2
STUDENT2 at sites 3, 4
(Assume that “EE” and “CS” are the only possible values for DEPT.)
(a) Write an application that requires the student number from the terminal and
outputs the name and department, at levels 1, 2, and 3 of transparency.
(b) Write an application that moves the student having number 232 from department “EE” to
department “CS”, at levels 1, 2, and 3 of transparency.
(c) Write an application that moves a student whose number and department are given at the
terminal to the other department, at level 2 of transparency.
(d) Consider the case in which application 1 is repeated for many possible values of the
student number. Write the application:
(i) accessing the database for each student number given at the terminal;
(ii) accessing the database after having collected several inputs from the terminal;
(iii) accessing the database before collecting inputs from the terminal.
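Applications (a), (b) and (c) can be sketched in Python to show what the programmer must know at each level: at level 1 (fragmentation transparency) only the global relation STUDENT, at level 2 (location transparency) the fragments STUDENT1 and STUDENT2, and at level 3 (local mapping transparency) also the sites. This is a minimal sketch under stated assumptions: the sites are in-memory dicts, the site_/frag_/global_ helpers are hypothetical stand-ins for the DDBMS, terminal input is replaced by function parameters, and the student names are invented.

```python
# Hypothetical in-memory sites: site -> {fragment -> {NUMBER: (NAME, DEPT)}}.
SITES = {
    1: {"STUDENT1": {232: ("Rossi", "EE")}},
    2: {"STUDENT1": {232: ("Rossi", "EE")}},
    3: {"STUDENT2": {101: ("Verdi", "CS")}},
    4: {"STUDENT2": {101: ("Verdi", "CS")}},
}
ALLOCATION = {"STUDENT1": [1, 2], "STUDENT2": [3, 4]}
FRAG_OF_DEPT = {"EE": "STUDENT1", "CS": "STUDENT2"}


# ---- Hypothetical stand-ins for what the DDBMS offers at each level ----
def site_read(site, frag, number):            # level 3: fragment AND site named
    return SITES[site][frag].get(number)


def site_write(site, frag, number, row):
    SITES[site][frag][number] = row


def site_delete(site, frag, number):
    SITES[site][frag].pop(number, None)


def frag_read(frag, number):                  # level 2: any copy will do
    return site_read(ALLOCATION[frag][0], frag, number)


def frag_write(frag, number, row):            # the system updates every copy
    for site in ALLOCATION[frag]:
        site_write(site, frag, number, row)


def frag_delete(frag, number):
    for site in ALLOCATION[frag]:
        site_delete(site, frag, number)


def global_read(number):                      # level 1: global relation only
    for frag in FRAG_OF_DEPT.values():
        row = frag_read(frag, number)
        if row is not None:
            return row
    return None


def global_update_dept(number, new_dept):     # the system moves the tuple
    name, old_dept = global_read(number)
    frag_delete(FRAG_OF_DEPT[old_dept], number)
    frag_write(FRAG_OF_DEPT[new_dept], number, (name, new_dept))


# ---- Application (a): return (NAME, DEPT) for a given NUMBER ----
def app_a_level1(number):
    return global_read(number)


def app_a_level2(number):
    return frag_read("STUDENT1", number) or frag_read("STUDENT2", number)


def app_a_level3(number):
    return site_read(1, "STUDENT1", number) or site_read(3, "STUDENT2", number)


# ---- Application (b): move a student from "EE" to "CS" ----
def app_b_level1(number=232):
    global_update_dept(number, "CS")


def app_b_level2(number=232):
    name, _ = frag_read("STUDENT1", number)
    frag_write("STUDENT2", number, (name, "CS"))
    frag_delete("STUDENT1", number)


def app_b_level3(number=232):
    name, _ = site_read(1, "STUDENT1", number)
    for site in (3, 4):                       # the application touches every copy
        site_write(site, "STUDENT2", number, (name, "CS"))
    for site in (1, 2):
        site_delete(site, "STUDENT1", number)


# ---- Application (c), level 2: move a student to the other department ----
def app_c_level2(number, dept):
    new_dept = "CS" if dept == "EE" else "EE"
    src, dst = FRAG_OF_DEPT[dept], FRAG_OF_DEPT[new_dept]
    name, _ = frag_read(src, number)
    frag_write(dst, number, (name, new_dept))
    frag_delete(src, number)
```

The lower the level of transparency, the more the application must encode: at level 2 it chooses fragments and moves the tuple between them itself, and at level 3 it must also update every copy explicitly.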
Conceptual Schema
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET, LOC)
PAY(TITLE, SAL)
ASG(ENO, PNO, RESP, DUR)
Problem 6 (2.1) Given relation EMP as in Fig. 2.2, let p1: TITLE < “Programmer” and p2: TITLE >
“Programmer” be two simple predicates. Assume that character strings have an order among them, based on the
alphabetical order.
(a) Perform a horizontal fragmentation of relation EMP with respect to {p1, p2}.
(b) Explain why the resulting fragmentation (EMP1, EMP2) does not fulfill the correctness rules of
fragmentation.
(c) Modify the predicates p1 and p2 so that they partition EMP obeying the correctness rules of fragmentation.
To do this, modify the predicates, compose all minterm predicates and deduce the corresponding implications,
and then perform a horizontal fragmentation of EMP based on these minterm predicates.
Finally, show that the result has completeness, reconstruction, and disjointness properties.
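The reasoning for parts (a) to (c) can be checked with a small Python sketch. The EMP tuples below are hypothetical sample data, not the contents of Fig. 2.2; Python compares strings lexicographically, which matches the alphabetical order the problem assumes for these titles. The sketch shows that a tuple with TITLE exactly "Programmer" falls in no fragment under the original predicates, and that changing p2 to TITLE ≥ "Programmer" leaves two non-contradictory minterms whose fragments are complete, disjoint and reconstructible by union.

```python
# Hypothetical EMP sample (ENO, ENAME, TITLE); not the data of Fig. 2.2.
EMP = [
    ("E1", "Ann", "Elect. Eng."),
    ("E2", "Bob", "Syst. Anal."),
    ("E3", "Cid", "Mech. Eng."),
    ("E4", "Dee", "Programmer"),
]


def fragment(rel, pred):
    """Horizontal fragment: tuples whose TITLE satisfies pred."""
    return [t for t in rel if pred(t[2])]


def complete(rel, frags):
    return all(any(t in f for f in frags) for t in rel)


def disjoint(frags):
    return all(not (set(a) & set(b)) for i, a in enumerate(frags) for b in frags[i + 1:])


def reconstructs(rel, frags):
    """Reconstruction by union of the fragments."""
    return set().union(*map(set, frags)) == set(rel)


# (a) Original simple predicates.
def p1(title):
    return title < "Programmer"


def p2(title):
    return title > "Programmer"


EMP1, EMP2 = fragment(EMP, p1), fragment(EMP, p2)
# (b) "Programmer" satisfies neither p1 nor p2, so E4 is in no fragment:
# completeness fails, and with it reconstruction.


# (c) Modified predicates: p2 becomes TITLE >= "Programmer".
def q1(title):
    return title < "Programmer"


def q2(title):
    return title >= "Programmer"


# Minterms: q1 AND q2, and NOT q1 AND NOT q2, are contradictory; the remaining
# two are q1 AND NOT q2 (equivalent to q1) and NOT q1 AND q2 (equivalent to q2).
def m1(title):
    return q1(title) and not q2(title)


def m2(title):
    return not q1(title) and q2(title)


EMP1c, EMP2c = fragment(EMP, m1), fragment(EMP, m2)
```

Because q2 is the negation of q1, every title satisfies exactly one minterm, which is why completeness and disjointness hold for any EMP instance, not only this sample.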
OR
Problem 7. (2.2) Consider relation ASG in Fig. 2.2. Suppose there are two applications that access ASG.
The first is issued at five sites and attempts to find the duration of assignment of employees given their numbers.
Assume that managers, consultants, engineers, and programmers are located at four different sites.
The second application is issued at two sites where the employees with an assignment duration of less than 20
months are managed at one site, whereas those with longer duration are managed at a second site.
Derive the primary horizontal fragmentation of ASG using the foregoing information.
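The derivation can be sketched in Python, assuming (our assumption) that RESP takes only the four values Manager, Consultant, Engineer and Programmer; if ASG contains other RESP values, a further simple predicate is needed for them. The simple predicates are one RESP equality per job type (first application) plus DUR < 20 (second application). The sketch composes all minterms with itertools.product and discards the contradictory ones by testing them on representative values: 12 and 24 months lie on either side of the only DUR threshold, so this test is exact here. Eight minterm fragments remain, such as RESP = "Manager" AND DUR < 20. The ASG tuples are hypothetical.

```python
from itertools import product

RESP_VALUES = ["Manager", "Consultant", "Engineer", "Programmer"]  # assumed domain

# Simple predicates: one per RESP value (first application, four sites)
# plus DUR < 20 (second application, two sites).
SIMPLE = [(f"RESP = {r!r}", lambda t, r=r: t["RESP"] == r) for r in RESP_VALUES]
SIMPLE.append(("DUR < 20", lambda t: t["DUR"] < 20))

# Representative values used to discard contradictory minterms.
DOMAIN = [{"RESP": r, "DUR": d} for r in RESP_VALUES for d in (12, 24)]


def minterms(preds):
    """Compose every minterm (each predicate or its negation) and keep the satisfiable ones."""
    result = []
    for signs in product([True, False], repeat=len(preds)):
        def m(t, signs=signs):
            return all(p(t) == s for (_, p), s in zip(preds, signs))
        if any(m(x) for x in DOMAIN):
            label = " AND ".join(n if s else f"NOT ({n})" for (n, _), s in zip(preds, signs))
            result.append((label, m))
    return result


MINTERMS = minterms(SIMPLE)

# Hypothetical ASG tuples, fragmented by the surviving minterms.
ASG = [
    {"ENO": "E1", "PNO": "P1", "RESP": "Manager", "DUR": 12},
    {"ENO": "E2", "PNO": "P1", "RESP": "Programmer", "DUR": 24},
    {"ENO": "E3", "PNO": "P2", "RESP": "Engineer", "DUR": 48},
]
FRAGMENTS = {label: [t for t in ASG if m(t)] for label, m in MINTERMS}
```

Each tuple satisfies exactly one surviving minterm, so the resulting primary horizontal fragments of ASG are complete and disjoint, and ASG is reconstructed as their union.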