501 Data Base Exercises Solution
i. Object Identifier (OID): A unique identifier assigned to each object in an object-oriented database that
remains constant throughout the object's lifetime, regardless of changes to the object's
attributes or state. The OID is used by the system to locate and reference objects.
ii. Atomic Attributes: Attributes that are considered indivisible units of data within an
object. These are the simplest form of attributes that cannot be broken down into smaller
components. Examples include integers, strings, and boolean values.
iii. Polymorphism: The ability of different classes to respond to the same message or method
invocation in different ways. It allows objects of different classes to be treated as objects of a
common superclass, enabling a single interface to represent different underlying
implementations.
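As a minimal Java sketch of (iii): the hypothetical Shape, Circle, and Rectangle classes below (illustrative names, not part of any particular OODBMS) respond to the same area() call in different ways:

// Minimal polymorphism sketch: each subclass responds to area() in its own way.
abstract class Shape {
    abstract double area();
}

class Circle extends Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    @Override double area() { return Math.PI * radius * radius; }
}

class Rectangle extends Shape {
    private final double width, height;
    Rectangle(double width, double height) { this.width = width; this.height = height; }
    @Override double area() { return width * height; }
}

public class PolymorphismDemo {
    public static void main(String[] args) {
        Shape[] shapes = { new Circle(2.0), new Rectangle(3.0, 4.0) };
        for (Shape s : shapes) {
            // The same call resolves to a different implementation at run time.
            System.out.println(s.getClass().getSimpleName() + " area = " + s.area());
        }
    }
}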
1. Complex Data Representation: Modern applications deal with complex data structures that
are difficult to represent in traditional relational models.
2. Real-world Modeling: Object-oriented systems better represent real-world entities and
relationships through inheritance, encapsulation, and polymorphism.
3. Application Integration: They provide a seamless integration between application programs
and database systems by using the same data model.
4. Performance: For complex operations and queries, OODBMS can outperform relational
systems by avoiding complex joins.
5. Extensibility: They allow for easier extension of data types and operations.
6. Multimedia Support: Better handling of multimedia and large binary objects.
7. Software Engineering Benefits: Reduced impedance mismatch between programming
languages and database systems.
i. Operator Overloading: The ability to redefine the behavior of operators such as +, -, *, and / for
custom objects. This allows operators to perform different operations based on the types of
operands, enhancing code readability and expressiveness.
ii. Entry Point to a Database: A designated object or collection that serves as the starting
point for navigating the database. From this entry point, applications can traverse
relationships to access other objects. Examples include named root objects or extent
collections.
iii. Substitutability: The principle that allows an object of a derived class to be used
wherever an object of its base class is expected. This is a key aspect of polymorphism that
enables flexible design and implementation of object-oriented systems.
Question 5/8: What is the shortcoming of using an object's attribute for OID?
What is object persistence? Explain the ways in which it is handled in a
typical OOD system.
1. Attributes may change during the object's lifetime, while OIDs should remain constant.
2. Attributes may not be unique across all objects in the system.
3. If composite attributes are used, complexity increases in accessing and managing OIDs.
4. Performance issues can arise when searching for objects by attribute-based OIDs.
Object Persistence refers to the ability of objects to continue existing beyond the execution
of the program that created them, allowing them to be stored in non-volatile memory and
later retrieved.
1. Persistence by Reachability: Objects that are reachable from designated persistent roots are
automatically made persistent.
2. Explicit Persistence: Objects are made persistent through explicit operations (save, store).
3. Persistent Classes: Some systems designate certain classes as persistent, making all their
instances automatically persistent.
4. Orthogonal Persistence: Persistence is independent of object type and is handled
transparently by the system.
5. Serialization: Converting objects to a storable format and deserializing them when needed (see the sketch after this list).
6. Object-Relational Mapping: Mapping objects to relational database tables.
7. Native Storage: Using specialized storage engines designed for object data models.
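As a minimal sketch of approach 5 (serialization), the small Java program below writes a hypothetical Customer object to a file and reads it back in a later run; the class name, fields, and file name are illustrative and not tied to any particular OODBMS:

import java.io.*;

// Hypothetical persistence-capable class (illustrative only).
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    double creditLimit;
    Customer(String name, double creditLimit) { this.name = name; this.creditLimit = creditLimit; }
}

public class SerializationDemo {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Customer c = new Customer("Ada", 5000.0);

        // Persist: convert the in-memory object into a storable byte stream.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("customer.ser"))) {
            out.writeObject(c);
        }

        // Retrieve: reconstruct the object in a later (or different) program run.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("customer.ser"))) {
            Customer restored = (Customer) in.readObject();
            System.out.println(restored.name + " " + restored.creditLimit);
        }
    }
}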
Question 6/32: Rationalize the necessity for object-orientation in state-of-the-
art integrated data management systems.
1. Collection: The base interface for all collection types, providing common operations
like add, remove, contains, and iteration over elements. It represents a group of
objects without specifying order or uniqueness constraints.
2. Set: A collection that cannot contain duplicate elements. It models the mathematical
concept of a set where each element occurs exactly once. Set operations like union,
intersection, and difference are supported.
3. Bag: A collection that can contain duplicate elements, similar to a multiset in
mathematics. It allows multiple occurrences of the same element and tracks the count
of each element.
4. Array: An ordered collection where elements are accessed by their position (index).
Arrays have a fixed length and support random access to elements based on their
numerical index.
5. Dictionary: A collection that maps keys to values, also known as an associative array
or map. Each key in a dictionary can map to exactly one value, and keys must be
unique. Elements are accessed by their keys rather than by position.
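The correspondence between these collection kinds and ordinary programming-language collections can be sketched in Java as follows; the standard-library classes used are rough analogues chosen for illustration (Java has no built-in Bag, so a List is used to allow duplicates):

import java.util.*;

public class CollectionKindsDemo {
    public static void main(String[] args) {
        // Set: no duplicates; adding "red" twice keeps one occurrence.
        Set<String> colours = new HashSet<>(Arrays.asList("red", "blue", "red"));

        // Bag (multiset): duplicates allowed; approximated here with a List.
        List<String> bag = new ArrayList<>(Arrays.asList("red", "blue", "red"));

        // Array: ordered, indexed access by position.
        String[] array = { "first", "second", "third" };

        // Dictionary: unique keys mapped to values, accessed by key rather than position.
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("apples", 3);
        dictionary.put("pears", 5);

        System.out.println(colours.size());           // 2 (duplicate removed)
        System.out.println(bag.size());               // 3 (duplicate kept)
        System.out.println(array[1]);                 // "second"
        System.out.println(dictionary.get("apples")); // 3
    }
}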
Question 10: Describe how collection and large object types are handled in
object-relational databases.
1. Array Types: Supported through specialized array data types that can be stored in a single
column.
2. Nested Tables: Collections implemented as separate tables linked to the parent table.
3. VARRAYs: Variable-length arrays with a maximum size constraint.
4. Multi-valued Attributes: Represented using specialized collection types or through
normalization.
5. User-Defined Collection Types: Custom collection types defined using the database's type
system.
1. BLOBs (Binary Large Objects): For storing unstructured binary data like images or videos.
2. CLOBs (Character Large Objects): For storing large text data.
3. Reference Mechanisms: Storing references to external files rather than the objects
themselves.
4. Streaming Interfaces: APIs for efficiently accessing and manipulating portions of large
objects.
5. Specialized Storage Engines: Using optimized storage techniques for large objects.
6. Chunking: Breaking large objects into manageable chunks for efficient storage and retrieval.
7. Compression: Automatic compression and decompression of large objects.
8. Caching Mechanisms: Special caching strategies for large object data.
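As a minimal JDBC sketch of points 1 and 4 (a BLOB column accessed through streaming interfaces): the connection URL, the table photos(id, image), and the file name below are hypothetical, the table is assumed to already exist, and LOB behaviour differs between database products:

import java.io.*;
import java.sql.*;

public class BlobDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection URL and table: photos(id INT, image BLOB).
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./demo", "sa", "")) {

            // Write a large object by streaming it from a file rather than loading it whole.
            try (PreparedStatement ins = conn.prepareStatement(
                     "INSERT INTO photos (id, image) VALUES (?, ?)");
                 InputStream img = new FileInputStream("photo.jpg")) {
                ins.setInt(1, 1);
                ins.setBinaryStream(2, img);   // streamed, not materialized in memory
                ins.executeUpdate();
            }

            // Read it back in chunks through the Blob's stream interface.
            try (PreparedStatement sel = conn.prepareStatement(
                     "SELECT image FROM photos WHERE id = ?")) {
                sel.setInt(1, 1);
                try (ResultSet rs = sel.executeQuery()) {
                    if (rs.next()) {
                        Blob blob = rs.getBlob(1);
                        try (InputStream in = blob.getBinaryStream()) {
                            byte[] chunk = new byte[8192];
                            long total = 0;
                            for (int n; (n = in.read(chunk)) != -1; ) total += n;
                            System.out.println("Read " + total + " bytes");
                        }
                    }
                }
            }
        }
    }
}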
Question 11/29: What are the differences and similarities between objects and
literals in the ODMG Object Model? / Give three differences between objects
and literals.
Differences:
1. Identity: Objects have a unique identity (OID) independent of their state, while literals do
not have identity and are identified solely by their value.
2. Reference: Objects can be referenced by other objects, whereas literals cannot be
referenced—they are always embedded within objects.
3. Mutability: Objects are typically mutable (their state can change while maintaining the same
identity), while literals are immutable.
4. Storage: Objects are stored independently and can be shared, while literals are stored as
part of the objects that contain them.
5. Lifecycle: Objects have independent lifecycles and can persist independently, while literals
exist only as part of their containing objects.
Similarities:
1. Both objects and literals are categorized by types in the ODMG type system and can be atomic, structured, or collection-valued.
2. Both represent state (values) that can be stored in the database; literals typically appear as the values of object attributes.
3. Both are defined and manipulated through the same ODMG languages (ODL for definition, OQL for querying).
Question 12.
Speedup:
Definition: Measures how much faster a system performs when additional processors are
added while keeping the workload constant.
Formula: Speedup(n) = Time(1) / Time(n), where Time(1) is execution time on a single
processor and Time(n) is execution time on n processors.
Focus: Reducing response time for a fixed-size problem.
Ideal case: Linear speedup, where doubling the processors halves the execution time.
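For instance, with purely illustrative numbers: if a query takes Time(1) = 120 seconds on one processor and Time(4) = 30 seconds on four processors, then Speedup(4) = 120 / 30 = 4, the ideal linear case; a measured Time(4) of 40 seconds would instead give a sub-linear speedup of 3.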
Scaleup:
Definition: Measures the ability to maintain performance as both the system size (number of processors) and the workload grow proportionally.
Formula: Scaleup(n) = Time(small problem on small system) / Time(n-times larger problem on n-times larger system).
Focus: Handling a proportionally larger problem in the same elapsed time by enlarging the system.
Ideal case: Linear scaleup, where the ratio stays at 1 as system and workload grow together.
Factors that work against efficient parallel operation and can diminish both speedup and scaleup:
1. Startup Costs: Time spent initializing parallel tasks, setting up environments, and
distributing work.
2. Interference: Contention for shared resources (memory, disks, network) causes
processes to wait, reducing efficiency.
3. Skew: Uneven distribution of workload among processors, causing some to finish
early while others become bottlenecks.
4. Communication Overhead: Time and resources spent on message passing and
coordination between parallel processes.
5. Serialization: Portions of algorithms that cannot be parallelized, limiting maximum
potential speedup (Amdahl's Law; see the worked example after this list).
6. System Overhead: Additional OS and management tasks required to maintain the
parallel environment.
7. Data Dependencies: When tasks must wait for results from other tasks, limiting
parallelization opportunities.
8. Diminishing Returns: As more processors are added, the incremental benefit per
processor decreases due to coordination costs.
9. Hardware Limitations: Memory bandwidth, interconnect capacity, and network
latency can all become bottlenecks.
10. Software Limitations: Database management systems may not be optimized for
highly parallel execution.
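As a worked illustration of factor 5 (with illustrative numbers): if 90% of a job can be parallelized, Amdahl's Law bounds the achievable speedup by 1 / ((1 - 0.9) + 0.9/n); with n = 100 processors this is roughly 9.2, and no number of processors can push it beyond 10.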
Question 13
Question 14
1. Storage Management: Organizing and managing physical data storage, including file
structures, indices, and storage allocation.
2. Buffer Management: Coordinating the use of memory buffers to minimize disk I/O
operations and improve performance.
3. Access Method Implementation: Providing efficient ways to retrieve data through
various access paths like indexing, hashing, and sequential scans.
4. Data Replication: Managing copies of data across multiple locations to improve
availability and performance.
5. Physical Data Organization: Optimizing how data is physically stored on disk,
including clustering related data, partitioning large tables, and managing data
compression.
1. User-Defined Types (UDTs): SQL was extended to allow the definition of custom
data types, complete with attributes and methods, similar to classes in object-oriented
programming. This enables the representation of complex data structures within the
relational model.
2. Table Inheritance: SQL extensions allow tables to inherit from other tables,
mirroring the inheritance concept in object-oriented programming. Child tables inherit
columns from parent tables and can add their own specific columns.
3. Reference Types and Collections: SQL extensions include the ability to define
reference types (REF) that point to row objects in tables, similar to object references
in OO languages. Additionally, collection types like arrays, nested tables, and varying
arrays were added to support multi-valued attributes.
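A minimal sketch of these three extensions, written as SQL:1999-style DDL issued through JDBC: the type names, table names, and connection URL are invented for illustration, and the exact keywords (NOT FINAL, UNDER, REF ... SCOPE, ARRAY) vary between products, so the statements are indicative rather than portable:

import java.sql.*;

public class ObjectRelationalDemo {
    public static void main(String[] args) throws SQLException {
        // Illustrative SQL:1999-style statements; adjust the syntax for your DBMS.
        String[] ddl = {
            // 1. User-defined type with attributes (methods omitted for brevity).
            "CREATE TYPE person_t AS (name VARCHAR(40), birthdate DATE) NOT FINAL",
            // 2. Subtype and typed tables: employees inherits the structure of person_t.
            "CREATE TYPE employee_t UNDER person_t AS (salary DECIMAL(10,2)) NOT FINAL",
            "CREATE TABLE people OF person_t",
            "CREATE TABLE employees OF employee_t UNDER people",
            // 3. Reference and collection types: a REF column pointing into people,
            //    and an ARRAY column holding a multi-valued attribute.
            "CREATE TABLE departments (" +
            "  dname   VARCHAR(30)," +
            "  manager REF(person_t) SCOPE people," +
            "  phones  VARCHAR(15) ARRAY[3])"
        };

        // "jdbc:mydb:demo" is a placeholder URL, not a real driver string.
        try (Connection conn = DriverManager.getConnection("jdbc:mydb:demo");
             Statement stmt = conn.createStatement()) {
            for (String s : ddl) {
                stmt.execute(s);   // each DDL statement executed in turn
            }
        }
    }
}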
Question 16:
Data Fragmentation in Distributed Databases
Data fragmentation is a technique used in distributed database systems where relations
(tables) are divided into smaller pieces called fragments, which are then distributed across
different sites in a network.
1. Horizontal Fragmentation: the relation is divided into subsets of its rows (tuples), usually by a selection predicate, so that each fragment holds the rows most relevant to one site.
Fragment 1 (New York office): SELECT * FROM Employee WHERE Location = 'New York'
2. Vertical Fragmentation: the relation is divided into subsets of its columns (attributes), with the primary key repeated in every fragment so the original relation can be reconstructed by joining the fragments. For example, a payroll site might hold a projection such as SELECT EmpID, Name, Salary FROM Employee.
There is also a third type called Hybrid/Mixed Fragmentation, which combines both
horizontal and vertical fragmentation techniques.
1. Improved Performance: Data is stored close to where it's most frequently used,
reducing network traffic and access time.
2. Parallel Execution: Multiple fragments can be processed in parallel, increasing
throughput.
3. Security: Sensitive data can be stored at sites with appropriate security measures.
4. Local Autonomy: Local sites have control over their data while still participating in
the distributed system.
5. Reduced Impact of Failures: If one site fails, only its fragments become unavailable,
not the entire database.
6. Storage Optimization: Only relevant data is stored at each site, optimizing storage
use.
Question 17:
Functions of the Global System Manager:
1. Global Query Processing: Analyzes, optimizes, and coordinates the execution of queries that
span multiple sites.
2. Global Transaction Management: Ensures ACID properties for transactions that involve data
from multiple sites.
3. Global Directory Management: Maintains metadata about the distribution of data across
sites (fragmentation schema, replication schema).
4. Global Optimization: Determines the most efficient execution strategy for distributed
queries.
5. Access Control: Enforces security and access policies across the distributed system.
6. Data Integration: Combines results from different sites into a coherent response.
7. Distributed Concurrency Control: Coordinates locking or versioning across multiple sites.
8. Global Recovery Management: Manages recovery procedures for multi-site transactions
after failures.
Functions of the Local System Manager (the local DBMS at each site):
1. Local Query Processing: Handles queries that can be executed entirely at the local site.
2. Local Transaction Management: Maintains ACID properties for local transactions.
3. Local Storage Management: Controls the physical storage of data fragments at each site.
4. Local Recovery: Manages the recovery of the local database after failures.
5. Local Concurrency Control: Implements concurrency control for local transactions.
6. Local Resource Management: Optimizes use of local resources (CPU, memory, I/O).
7. Communication Interface: Interacts with the Global System Manager and other sites.
8. Data Access Methods: Implements efficient methods for accessing and manipulating local
data.
Question 18
Network Failures
Link Failures: Communication links between sites can break, resulting in network
partitioning. When this happens, sites continue to operate but cannot communicate with each
other, potentially leading to inconsistent states if updates occur on both sides of the partition.
Message Loss: Messages between sites may be lost in transmission due to network
congestion or packet drops. This can cause transaction coordination issues if
acknowledgments or commit messages fail to reach their destination.
Network Congestion: High traffic can cause significant delays, making messages arrive too
late to be useful or causing timeouts, which may be mistakenly interpreted as failures.
Site Failures
Site Crashes: Individual sites may crash while others remain operational. This partial system
failure requires special recovery mechanisms to maintain global consistency.
Byzantine Failures: Some sites may continue operating but produce incorrect results or
behave unpredictably. These are particularly challenging because the faulty sites appear
active but cannot be trusted.
Clock and Timing Failures
Clock Drift: Physical clocks at different sites run at slightly different rates, causing
timestamps to become increasingly misaligned over time. This affects time-based protocols
and can lead to incorrect ordering of events.
Clock Skew: Even when clocks run at the same rate, they may show different absolute times,
complicating timestamp-based coordination.
Transaction and Coordination Failures
Coordinator Failures: In two-phase commit and similar protocols, if the coordinator fails
during the protocol execution, participating sites may be left in an uncertain state, waiting
indefinitely.
Blocking Problems: After a failure, some protocols can leave the system in a state where
operational sites cannot proceed with their work until failed components recover.
Data-Related Failures
Replication Inconsistency: When data is replicated across multiple sites, failures during
update propagation can lead to inconsistent copies of the same data.
Lost Updates: In distributed environments, concurrent updates combined with failures can
lead to lost updates if proper coordination mechanisms aren't in place.
These failure types require specialized mechanisms beyond those used in centralized systems, such as distributed commit protocols (e.g., two-phase commit), termination and recovery protocols, timeout-based failure detection, and quorum-based techniques for replicated data.
Question 19
The Two-Phase Commit (2PC) protocol is a distributed algorithm used to ensure atomicity in
distributed transactions. When a transaction T completes execution across multiple sites and
the coordinator Ci initiates the 2PC protocol, the following occurs:
Phase 1 (Prepare/Voting): Ci adds a "prepare" record for T to its log and sends a prepare message to every participating site. Each participant determines whether it can commit its part of T: if so, it force-writes a "prepare" (ready) record to its log and votes ready; otherwise it writes an "abort" record and votes abort.
Phase 2 (Commit/Abort): If every participant votes ready, Ci writes a "commit" record to its log and sends commit messages to all participants; if any participant votes abort or fails to respond within a timeout, Ci writes an "abort" record and sends abort messages. Each participant records the decision in its log, carries it out, and acknowledges the coordinator.
Recovery Procedures:
If a participant fails during the protocol, it consults its log upon recovery:
o If it has a "commit" record, it commits the transaction
o If it has an "abort" record, it aborts the transaction
o If it has a "prepare" record but no decision record, it must contact the
coordinator
If the coordinator fails:
o After writing a "commit" or "abort" record, it completes that action upon
recovery
o If it fails before deciding, it can abort the transaction upon recovery
The 2PC protocol ensures atomicity but can lead to blocking if the coordinator fails after the
prepare phase, as participants may need to wait for the coordinator to recover.
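A highly simplified sketch of the coordinator's side of 2PC follows; Participant is a hypothetical interface standing in for the real messaging layer, and logging is reduced to a print statement, so it shows only the control flow of the two phases, not a production implementation:

import java.util.List;

// Hypothetical stand-in for a remote participant site.
interface Participant {
    boolean prepare();          // phase 1: vote "ready" (true) or "abort" (false)
    void commit();              // phase 2: make the transaction's effects permanent
    void abort();               // phase 2: undo the transaction's effects
}

class TwoPhaseCommitCoordinator {
    // Returns true if the transaction was committed at all sites.
    boolean run(String txId, List<Participant> participants) {
        // Phase 1: ask every participant to prepare and collect the votes.
        boolean allReady = true;
        for (Participant p : participants) {
            if (!p.prepare()) { allReady = false; break; }
        }

        // Record the global decision (stands in for forcing it to the coordinator's log).
        System.out.println("log: <" + (allReady ? "commit" : "abort") + " " + txId + ">");

        // Phase 2: tell every participant the decision.
        for (Participant p : participants) {
            if (allReady) p.commit(); else p.abort();
        }
        return allReady;
    }
}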
Persistent messaging protocols ensure reliable message delivery even when the underlying
infrastructure is unreliable. Key protocols include:
Store-and-Forward Protocol:
The sending site first writes each message to stable storage (a log or a special messages relation) before attempting delivery; the message is retransmitted until the destination acknowledges receipt, and only then is the stored copy deleted. This guarantees that a message survives sender crashes and transient link failures.
Once-and-Only-Once Delivery:
Combines unique message IDs with acknowledgments.
Receivers track IDs of previously processed messages.
Messages with already-processed IDs are acknowledged but not processed again.
Transaction-Based Messaging:
Sending and receiving of messages are performed within database transactions, so a message is actually sent only if the sending transaction commits, and its processing at the receiver becomes permanent only if the receiving transaction commits.
These protocols ensure that messages are delivered reliably despite node failures, network
issues, or message loss.
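The duplicate-suppression idea behind once-and-only-once delivery can be sketched as follows; the message format and the in-memory set of processed IDs are illustrative (a real system would keep the set in stable storage and expire old entries):

import java.util.HashSet;
import java.util.Set;

class MessageReceiver {
    // IDs of messages that have already been processed (kept in stable storage in practice).
    private final Set<String> processedIds = new HashSet<>();

    // Returns an acknowledgement in both cases, but processes the payload only once.
    String receive(String messageId, String payload) {
        if (processedIds.contains(messageId)) {
            return "ACK " + messageId + " (duplicate, not reprocessed)";
        }
        process(payload);
        processedIds.add(messageId);
        return "ACK " + messageId;
    }

    private void process(String payload) {
        System.out.println("processing: " + payload);
    }

    public static void main(String[] args) {
        MessageReceiver r = new MessageReceiver();
        System.out.println(r.receive("m-42", "debit account A by 100"));
        // A retransmission of the same message is acknowledged but has no further effect.
        System.out.println(r.receive("m-42", "debit account A by 100"));
    }
}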
Single Lock-Manager (Centralized) Approach:
A single site acts as the central lock manager for all replicated data.
All lock requests are directed to this central site.
If the primary copy of a data item is unavailable, the transaction can access a replica.
Advantages: Simple implementation, centralized control.
Disadvantages: Single point of failure if the lock manager fails.
Majority Protocol:
A transaction must obtain locks from a majority of sites that hold replicas.
Updates are propagated to all available replicas.
If a majority of sites are available, the transaction can proceed.
This ensures that any two transactions accessing the same data item will have at least
one site in common.
Advantages: Can continue processing even if some sites are unavailable.
Disadvantages: Higher overhead due to multiple lock acquisitions, potential for
deadlocks.
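The majority test itself is simple; in the sketch below, ReplicaSite is a hypothetical interface standing in for remote lock requests, and the transaction proceeds only if strictly more than half of the replica sites grant the lock:

import java.util.List;

// Hypothetical stand-in for a site holding a replica of the data item.
interface ReplicaSite {
    boolean requestLock(String itemId);   // true if this site grants the lock
}

class MajorityLocking {
    // Returns true if locks were obtained at a majority of the replica sites.
    static boolean lockWithMajority(String itemId, List<ReplicaSite> replicas) {
        int granted = 0;
        for (ReplicaSite site : replicas) {
            if (site.requestLock(itemId)) {
                granted++;
            }
        }
        // Any two majorities of the same replica set overlap in at least one site,
        // which is what prevents two conflicting transactions from both proceeding.
        return granted > replicas.size() / 2;
    }
}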
1. Partitioning (Fragmentation):
o Horizontal: Table rows distributed across sites
o Vertical: Table columns distributed across sites
o Mixed: Combination of horizontal and vertical
2. Replication:
o Full replication: Complete copy at each site
o Partial replication: Subset of data at each site
o No replication: Data stored at only one site
3. Allocation Strategies:
o Centralized: All data at one site
o Distributed: Data spread across sites
o Hybrid: Combination of centralized and distributed
Processor Architecture:
Storage System:
Network Infrastructure:
These underlying computer system aspects drive key database design decisions, including
buffer management strategies, query optimization techniques, indexing structures, and
parallelization approaches.
Question 30: Describe how practical usage of the basic concepts of object-
orientation in databases can be realized from the abstractive knowledge of the
subject by expressing them in some languages in two ways.
i. Substitutability: The principle that a derived class object can be used anywhere a base
class object is expected. This enables polymorphic behavior and is fundamental to
implementing inheritance hierarchies effectively in databases.
ii. Factory Object: A special object responsible for creating instances of other objects.
Factory objects encapsulate object creation logic, allowing the system to decide which
concrete class to instantiate based on various factors at runtime. This pattern is useful for
implementing complex object creation processes in databases (see the sketch after these definitions).
iii. Polymorphism: The ability of objects of different classes to respond to the same message
or method invocation differently. In the context of OODBs, polymorphism allows operations
to behave differently depending on the actual type of the object, even when accessed through
a common interface or base class.
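A minimal sketch of (ii) in Java, using invented Document, Invoice, and Report classes: the factory hides which concrete class is instantiated, so callers depend only on the abstract type:

// Hypothetical product hierarchy used only for illustration.
abstract class Document {
    abstract String describe();
}

class Invoice extends Document {
    @Override String describe() { return "Invoice"; }
}

class Report extends Document {
    @Override String describe() { return "Report"; }
}

// Factory object: encapsulates the decision of which concrete class to instantiate.
class DocumentFactory {
    Document create(String kind) {
        switch (kind) {
            case "invoice": return new Invoice();
            case "report":  return new Report();
            default: throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }
}

public class FactoryDemo {
    public static void main(String[] args) {
        DocumentFactory factory = new DocumentFactory();
        Document d = factory.create("invoice");   // the caller never names the concrete class
        System.out.println(d.describe());         // prints "Invoice"
    }
}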
Question 33:
a. (i) Caption for Figure 1.0: "Class Hierarchy Diagram for Person, showing Employee and
Customer subclasses with their respective specializations"
// Employee class
class Employee extends Person {
    Float salary;
    Date hireDate;
    String department;

    Float getSalary();
    void setSalary(Float newSalary);
    String getDepartment();
    void setDepartment(String newDepartment);
}

// Customer class
class Customer extends Person {
    String customerID;
    Float creditLimit;

    String getCustomerID();
    Float getCreditLimit();
    void setCreditLimit(Float newLimit);
}
// Additional specialization from Figure 1.0 (its class declaration is omitted above)
    Date getContractEndDate();
    void setContractEndDate(Date newEndDate);
    Boolean isContractActive();
}
// Another specialization from Figure 1.0 (its class declaration is omitted above)
    String getOfficeLocation();
    void setOfficeLocation(String newLocation);
    void addSupervisor(Employee supervisor);
    Employee[] getSupervisorsAssisted();
}
b. Two main differences between the semi-structured model and the object model:
1. Schema Flexibility: Semi-structured models (like XML) have flexible schemas where
structure can vary and be partially defined or missing, while object models typically have
rigid schemas with well-defined classes and relationships.
2. Identity Concept: Object models rely heavily on object identity (OIDs), while semi-structured
models often use path expressions or value-based identification.
XML Schema: A language for defining the structure, content, and semantics of XML
documents. It provides facilities for defining elements, attributes, data types, and
constraints. XML Schema is more powerful than DTDs, offering features like inheritance,
user-defined types, and namespaces.
XPath: A query language used to navigate and select nodes in an XML document. It uses
path expressions to identify and process nodes based on various criteria including element
names, attribute values, and hierarchical relationships. XPath is fundamental to other XML
technologies like XSLT and XQuery.
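A small illustration of XPath using the standard Java XML APIs (javax.xml.xpath); the XML snippet and the path expression /bookstore/book/title are invented for the example:

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<bookstore>"
                   + "  <book category=\"db\"><title>Database Systems</title></book>"
                   + "  <book category=\"web\"><title>Learning XML</title></book>"
                   + "</bookstore>";

        // Parse the document, then select every <title> element under /bookstore/book.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList titles = (NodeList) xpath.evaluate(
                "/bookstore/book/title", doc, XPathConstants.NODESET);

        for (int i = 0; i < titles.getLength(); i++) {
            System.out.println(titles.item(i).getTextContent());
        }
    }
}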
Question 34: Give two main differences between the structured and semi-
structured model.
1. Schema Definition: Structured models (like relational databases) require a predefined
schema that strictly defines the structure before data can be stored, while semi-
structured models allow data to be stored without a predefined schema or with a
flexible schema that can vary across instances.
2. Data Homogeneity: In structured models, all records of the same type follow
identical structure, while in semi-structured models, similar entities may have
different attributes or structures, allowing for heterogeneity in data representation.
Other differences include the following:
Structured models typically have fixed data types, while semi-structured models allow more flexible typing.
Structured models use rigid, predefined relationships, while semi-structured models use more flexible, often hierarchical relationships.
Query languages for structured models are more standardized (e.g., SQL) than those for semi-structured models.
Question 12: Two important issues in studying parallelism are speedup and
scaleup mechanism. Differentiate the two. A number of factors work against
efficient parallel operation and can diminish both speedup and scaleup.
Enumerate these factors.
Speedup:
Refers to how much faster a task executes on a larger system compared to a smaller one
Measured as: time on smaller system / time on larger system for the same workload
Focuses on reducing response time for a fixed problem size
Formula: Speedup(n) = T(1) / T(n), where T(1) is time on one processor and T(n) is time on n
processors
Scaleup:
Refers to the ability to maintain performance as both system size and workload increase
proportionally
Measured as: throughput on larger system / through