501 Data Base Exercises Solution

The document discusses key concepts in object-oriented databases, including unique object identifiers, atomic attributes, and polymorphism. It highlights the need for object-oriented integrated data management systems due to their ability to handle complex data, model real-world entities, and improve performance. Additionally, it covers data fragmentation in distributed databases and various extensions made to SQL for object-oriented capabilities.

Database Concepts: Comprehensive Answers

Object-Oriented Database Concepts

Question 1: Briefly describe the following in object-oriented database paradigm:

i. Object Id: A unique identifier assigned to each object in an object-oriented database that
remains constant throughout the object's lifetime, regardless of changes to the object's
attributes or state. The OID is used by the system to locate and reference objects.

ii. Atomic Attributes: Attributes that are considered indivisible units of data within an
object. These are the simplest form of attributes that cannot be broken down into smaller
components. Examples include integers, strings, and boolean values.

iii. Polymorphism: The ability of different classes to respond to the same message or method
invocation in different ways. It allows objects of different classes to be treated as objects of a
common superclass, enabling a single interface to represent different underlying
implementations.
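
As a minimal Java sketch of polymorphism (the class names here are hypothetical examples, not taken from any particular OODBMS), code written against a common supertype dispatches to whichever subclass implementation the object actually has:

// Hypothetical classes illustrating polymorphism: the caller works with the
// supertype, while each subclass supplies its own implementation.
abstract class MediaObject {
    abstract long sizeInBytes();          // same message, different answers
}

class ImageObject extends MediaObject {
    private final int width, height;
    ImageObject(int w, int h) { this.width = w; this.height = h; }
    long sizeInBytes() { return (long) width * height * 3; } // uncompressed RGB estimate
}

class TextObject extends MediaObject {
    private final String text;
    TextObject(String text) { this.text = text; }
    long sizeInBytes() { return text.getBytes().length; }
}

public class PolymorphismDemo {
    public static void main(String[] args) {
        MediaObject[] objects = { new ImageObject(640, 480), new TextObject("hello") };
        for (MediaObject o : objects) {
            // Single interface, different underlying implementations.
            System.out.println(o.getClass().getSimpleName() + ": " + o.sizeInBytes());
        }
    }
}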

Question 2: Justify the need for object-oriented integrated data management system in contemporary data processing.

Object-oriented integrated data management systems are necessary in contemporary data processing for several reasons:

1. Complex Data Representation: Modern applications deal with complex data structures that
are difficult to represent in traditional relational models.
2. Real-world Modeling: Object-oriented systems better represent real-world entities and
relationships through inheritance, encapsulation, and polymorphism.
3. Application Integration: They provide a seamless integration between application programs
and database systems by using the same data model.
4. Performance: For complex operations and queries, OODBMS can outperform relational
systems by avoiding complex joins.
5. Extensibility: They allow for easier extension of data types and operations.
6. Multimedia Support: Better handling of multimedia and large binary objects.
7. Software Engineering Benefits: Reduced impedance mismatch between programming
languages and database systems.

Question 3: List means to handle complexities arising from naming ambiguities in Multiple Inheritance.

1. Explicit Qualification: Using class names to qualify ambiguous attribute or method
references (e.g., ClassA::method() vs ClassB::method()).
2. Precedence Rules: Establishing a predefined order for resolving conflicts (e.g., left-to-right
search in inheritance hierarchy).
3. Method Renaming: Explicitly renaming conflicting methods during inheritance.
4. Virtual Inheritance: Using virtual base classes in languages like C++ to ensure a single
instance of a base class.
5. Interface Implementation: Using interfaces (as in Java) instead of multiple inheritance of
implementation.
6. Delegation: Using composition and delegation instead of inheritance.
7. Linearization: Creating a linear ordering of inherited classes (as in Python's Method
Resolution Order).
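
A small Java sketch combining two of the strategies above, conflict detection enforced by the compiler and explicit qualification of the inherited methods; the interfaces are hypothetical:

// Two supertypes define the same method name with default implementations.
interface Printable {
    default String describe() { return "printable"; }
}

interface Persistable {
    default String describe() { return "persistable"; }
}

// The compiler forces Document to resolve the conflict explicitly.
class Document implements Printable, Persistable {
    @Override
    public String describe() {
        // Explicit qualification: pick (or combine) the inherited behaviours.
        return Printable.super.describe() + " and " + Persistable.super.describe();
    }
}

public class MultipleInheritanceDemo {
    public static void main(String[] args) {
        System.out.println(new Document().describe()); // printable and persistable
    }
}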

Question 4/7: Briefly describe the following in object-oriented database:

i. Operator Overload: The ability to redefine the behavior of operators like +, -, *, / for
custom objects. This allows operators to perform different operations based on the types of
operands, enhancing code readability and expressiveness.

ii. Entry Point to a Database: A designated object or collection that serves as the starting
point for navigating the database. From this entry point, applications can traverse
relationships to access other objects. Examples include named root objects or extent
collections.

iii. Substitutability: The principle that allows an object of a derived class to be used
wherever an object of its base class is expected. This is a key aspect of polymorphism that
enables flexible design and implementation of object-oriented systems.

Question 5/8: What is the shortcoming of using an object's attribute for OID?
What is object persistence? Explain ways by which they are handled in a
typical OOD system.

Shortcomings of using an object's attribute for OID:

1. Attributes may change during the object's lifetime, while OIDs should remain constant.
2. Attributes may not be unique across all objects in the system.
3. If composite attributes are used, complexity increases in accessing and managing OIDs.
4. Performance issues can arise when searching for objects by attribute-based OIDs.

Object Persistence refers to the ability of objects to continue existing beyond the execution
of the program that created them, allowing them to be stored in non-volatile memory and
later retrieved.

Ways persistence is handled in OOD systems:

1. Persistence by Reachability: Objects that are reachable from designated persistent roots are
automatically made persistent.
2. Explicit Persistence: Objects are made persistent through explicit operations (save, store).
3. Persistent Classes: Some systems designate certain classes as persistent, making all their
instances automatically persistent.
4. Orthogonal Persistence: Persistence is independent of object type and is handled
transparently by the system.
5. Serialization: Converting objects to a storable format and deserializing them when needed.
6. Object-Relational Mapping: Mapping objects to relational database tables.
7. Native Storage: Using specialized storage engines designed for object data models.
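
As a small illustration of the serialization approach (item 5 above), the sketch below writes a Java object to stable storage and re-materializes it later; the Customer class and file name are made-up examples, not part of any specific OOD system:

import java.io.*;

// Simplified sketch of object persistence via serialization.
// The Customer class and the file name are hypothetical examples.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    int creditLimit;
    Customer(String name, int creditLimit) { this.name = name; this.creditLimit = creditLimit; }
}

public class PersistenceDemo {
    public static void main(String[] args) throws Exception {
        // Explicit persistence: the application decides when to save the object.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("customer.bin"))) {
            out.writeObject(new Customer("Alice", 5000));
        }
        // Later (possibly in another program run) the object is read back.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("customer.bin"))) {
            Customer c = (Customer) in.readObject();
            System.out.println(c.name + " " + c.creditLimit);
        }
    }
}
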
Question 6/32: Rationalize the necessity for object-orientation in state-of-the-art integrated data management system.

1. Natural Modeling: Object-orientation provides a more natural way to model real-world
entities and relationships.
2. Complex Data Handling: Modern applications require handling of complex data types like
multimedia, geospatial data, and time series, which is better supported by object models.
3. Reduced Impedance Mismatch: OO databases reduce the gap between programming
languages and database systems.
4. Extensibility: They allow for easier extension with new data types and operations.
5. Encapsulation: Business logic can be encapsulated with data, improving maintainability and
reusability.
6. Support for Modern Applications: Web, mobile, and IoT applications benefit from object-
oriented representations.
7. Performance for Complex Operations: Certain complex operations perform better in an OO
model than in relational models.
8. Integration with OO Development: Seamless integration with object-oriented development
methodologies and frameworks.
9. Inheritance and Polymorphism: Support for these concepts enables more flexible and
powerful data models.
10. Schema Evolution: OO systems often provide better support for evolving schemas over
time.

Question 9: Describe these built-in interfaces of the ODMG Object Model: Collection, Set, Bag, Array and Dictionary.

1. Collection: The base interface for all collection types, providing common operations
like add, remove, contains, and iteration over elements. It represents a group of
objects without specifying order or uniqueness constraints.
2. Set: A collection that cannot contain duplicate elements. It models the mathematical
concept of a set where each element occurs exactly once. Set operations like union,
intersection, and difference are supported.
3. Bag: A collection that can contain duplicate elements, similar to a multiset in
mathematics. It allows multiple occurrences of the same element and tracks the count
of each element.
4. Array: An ordered collection where elements are accessed by their position (index).
Arrays have a fixed length and support random access to elements based on their
numerical index.
5. Dictionary: A collection that maps keys to values, also known as an associative array
or map. Each key in a dictionary can map to exactly one value, and keys must be
unique. Elements are accessed by their keys rather than by position.
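
The ODMG collection interfaces map naturally onto the standard Java collection types. The sketch below is only an analogy for illustration, not the actual ODMG Java binding:

import java.util.*;

// Rough Java analogues of the ODMG built-in collection interfaces.
public class CollectionAnalogues {
    public static void main(String[] args) {
        Collection<String> collection = new ArrayList<>();   // Collection: base interface
        Set<String> set = new HashSet<>();                    // Set: no duplicates
        Map<String, Integer> bag = new HashMap<>();           // Bag: element -> occurrence count
        String[] array = new String[4];                       // Array: fixed length, index access
        Map<String, String> dictionary = new TreeMap<>();     // Dictionary: unique keys -> values

        set.add("a"); set.add("a");                           // second add has no effect
        bag.merge("a", 1, Integer::sum);                      // Bag keeps a count instead
        bag.merge("a", 1, Integer::sum);
        array[0] = "first";                                   // positional access
        dictionary.put("id42", "Alice");                      // keyed access

        System.out.println(set.size() + " " + bag.get("a") + " " + array[0] + " " + dictionary.get("id42"));
    }
}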

Question 10: Describe how collection and large object types are handled in
object-relational databases.

Collection Types Handling:

1. Array Types: Supported through specialized array data types that can be stored in a single
column.
2. Nested Tables: Collections implemented as separate tables linked to the parent table.
3. VARRAYs: Variable-length arrays with a maximum size constraint.
4. Multi-valued Attributes: Represented using specialized collection types or through
normalization.
5. User-Defined Collection Types: Custom collection types defined using the database's type
system.

Large Object Types Handling:

1. BLOBs (Binary Large Objects): For storing unstructured binary data like images or videos.
2. CLOBs (Character Large Objects): For storing large text data.
3. Reference Mechanisms: Storing references to external files rather than the objects
themselves.
4. Streaming Interfaces: APIs for efficiently accessing and manipulating portions of large
objects.
5. Specialized Storage Engines: Using optimized storage techniques for large objects.
6. Chunking: Breaking large objects into manageable chunks for efficient storage and retrieval.
7. Compression: Automatic compression and decompression of large objects.
8. Caching Mechanisms: Special caching strategies for large object data.
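
Applications commonly work with BLOB/CLOB columns through streaming interfaces (item 4 above). A minimal JDBC-style sketch is given below, assuming a hypothetical documents table and connection URL:

import java.io.*;
import java.sql.*;

// Minimal sketch of streaming a large object into and out of a BLOB column.
// Table name, column names, and the JDBC URL are hypothetical.
public class LargeObjectDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:example://localhost/testdb")) {
            // Write: stream a file into the BLOB column instead of loading it fully into memory.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO documents(id, content) VALUES (?, ?)");
                 InputStream in = new FileInputStream("report.pdf")) {
                ps.setInt(1, 1);
                ps.setBinaryStream(2, in);
                ps.executeUpdate();
            }
            // Read: process the BLOB in chunks via a stream.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT content FROM documents WHERE id = ?")) {
                ps.setInt(1, 1);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        try (InputStream content = rs.getBinaryStream("content")) {
                            byte[] chunk = new byte[8192];
                            int n;
                            while ((n = content.read(chunk)) != -1) {
                                // process each chunk (e.g., write it to a local file)
                            }
                        }
                    }
                }
            }
        }
    }
}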

Question 11/29: What are the differences and similarities between objects and
literals in the ODMG Object Model? / Give three differences between objects
and literals.

Differences:

1. Identity: Objects have a unique identity (OID) independent of their state, while literals do
not have identity and are identified solely by their value.
2. Reference: Objects can be referenced by other objects, whereas literals cannot be
referenced—they are always embedded within objects.
3. Mutability: Objects are typically mutable (their state can change while maintaining the same
identity), while literals are immutable.
4. Storage: Objects are stored independently and can be shared, while literals are stored as
part of the objects that contain them.
5. Lifecycle: Objects have independent lifecycles and can persist independently, while literals
exist only as part of their containing objects.

Similarities:

1. Both can have structured types with multiple attributes.
2. Both can be used to represent data in the object model.
3. Both can have operations/methods associated with them.
4. Both can be organized in type hierarchies.

Question 12.

Speedup vs. Scaleup

Speedup:
 Definition: Measures how much faster a system performs when additional processors are
added while keeping the workload constant.
 Formula: Speedup(n) = Time(1) / Time(n), where Time(1) is execution time on a single
processor and Time(n) is execution time on n processors.
 Focus: Reducing response time for a fixed-size problem.
 Ideal case: Linear speedup, where doubling the processors halves the execution time.

Scaleup:

 Definition: Measures a system's ability to maintain performance when both system
resources and workload increase proportionally.
 Formula: Scaleup(n) = Performance(n) / Performance(1), where n is the factor by which both
the system size and the workload are increased.
 Focus: Maintaining consistent response times as both system and problem size grow.
 Ideal case: Linear scaleup, where system performance remains constant as both data volume
and resources increase proportionally.
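
A small numeric sketch of the two measures (with made-up timing figures), including the Amdahl's Law bound that limits speedup whenever part of the work is inherently serial:

// Toy numbers illustrating the speedup and scaleup formulas above,
// plus the Amdahl's Law bound for a partially serial workload.
public class SpeedupScaleupDemo {
    public static void main(String[] args) {
        // Speedup: fixed workload, more processors.
        double time1 = 120.0;   // seconds on 1 processor (made-up figure)
        double time8 = 20.0;    // seconds on 8 processors (made-up figure)
        double speedup = time1 / time8;                 // Speedup(8) = Time(1) / Time(8)
        System.out.printf("Speedup(8) = %.2f (ideal would be 8.00)%n", speedup);

        // Scaleup: workload grows in proportion to the system; here expressed as the
        // common time-ratio form, which equals 1.0 in the ideal (linear) case.
        double smallJobOnSmallSystem = 100.0;           // seconds
        double bigJobOnBigSystem = 110.0;               // seconds, 8x job on 8x system
        double scaleup = smallJobOnSmallSystem / bigJobOnBigSystem;
        System.out.printf("Scaleup = %.2f (ideal would be 1.00)%n", scaleup);

        // Amdahl's Law: if fraction s of the work is serial, speedup on n processors
        // is bounded by 1 / (s + (1 - s) / n).
        double s = 0.05, n = 8;
        double bound = 1.0 / (s + (1.0 - s) / n);
        System.out.printf("Amdahl bound for s=%.2f, n=%.0f: %.2f%n", s, n, bound);
    }
}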

Factors Diminishing Speedup and Scaleup

1. Startup Costs: Time spent initializing parallel tasks, setting up environments, and
distributing work.
2. Interference: Contention for shared resources (memory, disks, network) causes
processes to wait, reducing efficiency.
3. Skew: Uneven distribution of workload among processors, causing some to finish
early while others become bottlenecks.
4. Communication Overhead: Time and resources spent on message passing and
coordination between parallel processes.
5. Serialization: Portions of algorithms that cannot be parallelized, limiting maximum
potential speedup (Amdahl's Law).
6. System Overhead: Additional OS and management tasks required to maintain the
parallel environment.
7. Data Dependencies: When tasks must wait for results from other tasks, limiting
parallelization opportunities.
8. Diminishing Returns: As more processors are added, the incremental benefit per
processor decreases due to coordination costs.
9. Hardware Limitations: Memory bandwidth, interconnect capacity, and network
latency can all become bottlenecks.
10. Software Limitations: Database management systems may not be optimized for
highly parallel execution.

Question 13

Reasons for Building Distributed Database Systems


1. Data Locality: Storing data closer to where it's most frequently accessed, reducing
network latency and improving response times.
2. Reliability and Availability: Enabling continued operation even if some sites fail, by
replicating data across multiple locations.
3. Performance Improvement: Distributing workload across multiple sites to avoid
bottlenecks and increase throughput.
4. Scalability: Adding new sites and resources incrementally as the system grows,
without major redesign.
5. Economic Factors: Leveraging cheaper computational resources at multiple small
sites compared to a single large mainframe system.
6. Organizational Structure: Reflecting the distributed nature of many organizations
with autonomous departments or branches.
7. Incremental Growth: Expanding the system by adding new sites without disrupting
existing operations.
8. Local Autonomy: Allowing local sites to maintain control over their data while still
participating in the distributed system.
9. Communication Cost Reduction: Minimizing data transfer across expensive or slow
network links.
10. Query Processing Optimization: Executing queries at sites where data resides rather
than transferring large datasets.
11. Load Balancing: Distributing query processing across multiple sites to avoid
overloading any single site.
12. Improved Resource Sharing: Enabling users at different sites to access data from the
entire organization.

Question 14

Two Broad Categories of Server Systems

i. Transaction Servers and Data Servers

1. Transaction Servers: Transaction servers focus on processing client requests and
managing transaction execution. They handle the application logic, transaction
management, and coordinate with data servers to fulfill client requests. They're
primarily concerned with business logic and transaction processing rather than data
storage.
2. Data Servers: Data servers are responsible for data storage, retrieval, and
management. They focus on maintaining the physical data, handling storage
structures, and providing efficient data access methods. These servers respond to data
requests from transaction servers.

ii. Five Processes Involved in Each Category

Transaction Server Processes:

1. Transaction Management: Coordinating the execution of transactions, ensuring
ACID properties (Atomicity, Consistency, Isolation, Durability) are maintained
throughout transaction processing.
2. Query Processing: Parsing, validating, and optimizing client queries before sending
them to data servers for execution.
3. Connection Management: Handling client connections, authentication, and session
management to maintain the state of ongoing client interactions.
4. Concurrency Control: Managing simultaneous access to shared resources,
implementing locking mechanisms, and preventing conflicts between concurrent
transactions.
5. Recovery Management: Handling system failures, implementing logging
mechanisms, and ensuring that the system can recover to a consistent state after
failures.

Data Server Processes:

1. Storage Management: Organizing and managing physical data storage, including file
structures, indices, and storage allocation.
2. Buffer Management: Coordinating the use of memory buffers to minimize disk I/O
operations and improve performance.
3. Access Method Implementation: Providing efficient ways to retrieve data through
various access paths like indexing, hashing, and sequential scans.
4. Data Replication: Managing copies of data across multiple locations to improve
availability and performance.
5. Physical Data Organization: Optimizing how data is physically stored on disk,
including clustering related data, partitioning large tables, and managing data
compression.

Question 15: Describe three object database extensions made to SQL.

1. User-Defined Types (UDTs): SQL was extended to allow the definition of custom
data types, complete with attributes and methods, similar to classes in object-oriented
programming. This enables the representation of complex data structures within the
relational model.
2. Table Inheritance: SQL extensions allow tables to inherit from other tables,
mirroring the inheritance concept in object-oriented programming. Child tables inherit
columns from parent tables and can add their own specific columns.
3. Reference Types and Collections: SQL extensions include the ability to define
reference types (REF) that point to row objects in tables, similar to object references
in OO languages. Additionally, collection types like arrays, nested tables, and varying
arrays were added to support multi-valued attributes.

Other notable extensions include:

 Method definitions within UDTs


 Polymorphic table hierarchies
 Row types (structured types)
 Object identifiers through the use of REF types

Question 16:
Data Fragmentation in Distributed Databases
Data fragmentation is a technique used in distributed database systems where relations
(tables) are divided into smaller pieces called fragments, which are then distributed across
different sites in a network.

Types of Data Fragmentation

1. Horizontal Fragmentation

Horizontal fragmentation divides a relation by rows, creating subsets of tuples. Each
fragment contains certain rows of the relation based on specific conditions.

Illustration: Consider a table Employee(EmpID, Name, Salary, DeptID, Location):

Fragment 1 (New York office): SELECT * FROM Employee WHERE Location = 'New York'

EmpID | Name    | Salary | DeptID | Location
---------------------------------------------
101   | Alice   | 75000  | 10     | New York
105   | Charlie | 82000  | 20     | New York

Fragment 2 (London office): SELECT * FROM Employee WHERE Location = 'London'

EmpID | Name  | Salary | DeptID | Location
-------------------------------------------
102   | Bob   | 65000  | 30     | London
106   | Diana | 70000  | 10     | London

2. Vertical Fragmentation

Vertical fragmentation divides a relation by columns, splitting attributes across fragments.
The primary key must be included in each fragment to allow reconstruction.

Illustration: For the same Employee table:

Fragment 1 (Personal info):

EmpID | Name    | Location
---------------------------
101   | Alice   | New York
102   | Bob     | London
105   | Charlie | New York
106   | Diana   | London

Fragment 2 (Employment info):

EmpID | Salary | DeptID
------------------------
101   | 75000  | 10
102   | 65000  | 30
105   | 82000  | 20
106   | 70000  | 10

There's also a third type called Hybrid/Mixed Fragmentation, which combines both
horizontal and vertical fragmentation techniques.
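
To make the reconstruction property concrete, the sketch below shows how the original Employee relation could be rebuilt from the fragments above: the horizontal fragments are combined with UNION, the vertical fragments with a JOIN on the shared primary key. Fragment table names and the connection URL are hypothetical; a real distributed DBMS performs this transparently in its global query processor.

import java.sql.*;

// Sketch of reconstructing the Employee relation from its fragments (names are hypothetical).
public class FragmentReconstructionDemo {
    public static void main(String[] args) throws Exception {
        // Horizontal fragments: the full relation is the union of the row subsets.
        String horizontalReconstruction =
            "SELECT * FROM Employee_NewYork " +
            "UNION ALL " +
            "SELECT * FROM Employee_London";

        // Vertical fragments: the full relation is the join of the column subsets
        // on the primary key that both fragments retain.
        String verticalReconstruction =
            "SELECT p.EmpID, p.Name, e.Salary, e.DeptID, p.Location " +
            "FROM Employee_Personal p JOIN Employee_Employment e ON p.EmpID = e.EmpID";

        try (Connection conn = DriverManager.getConnection("jdbc:example://globalsite/hr");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(horizontalReconstruction)) {
            while (rs.next()) {
                System.out.println(rs.getInt("EmpID") + " " + rs.getString("Name"));
            }
        }
    }
}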

Advantages of Data Fragmentation

1. Improved Performance: Data is stored close to where it's most frequently used,
reducing network traffic and access time.
2. Parallel Execution: Multiple fragments can be processed in parallel, increasing
throughput.
3. Security: Sensitive data can be stored at sites with appropriate security measures.
4. Local Autonomy: Local sites have control over their data while still participating in
the distributed system.
5. Reduced Impact of Failures: If one site fails, only its fragments become unavailable,
not the entire database.
6. Storage Optimization: Only relevant data is stored at each site, optimizing storage
use.

Disadvantages of Data Fragmentation

1. Complexity in Query Processing: Queries spanning multiple fragments need
coordination and reconstruction.
2. Increased Overhead: Managing and maintaining fragments adds system overhead.
3. Integrity Constraints: Enforcing database constraints across fragments is more
complex.
4. Reconstruction Costs: Combining fragments to answer queries can be expensive.
5. Design Challenges: Proper fragmentation design requires thorough analysis of access
patterns.
6. Potential for Imbalance: Workload may not be evenly distributed across fragments.
7. Join Operations: Performing joins across fragments at different sites is complex and
potentially expensive.

Question 17:

System Structure of a Distributed Database


Two Main Components

1. Global System Manager (GSM)

Functions:

1. Global Query Processing: Analyzes, optimizes, and coordinates the execution of queries that
span multiple sites.
2. Global Transaction Management: Ensures ACID properties for transactions that involve data
from multiple sites.
3. Global Directory Management: Maintains metadata about the distribution of data across
sites (fragmentation schema, replication schema).
4. Global Optimization: Determines the most efficient execution strategy for distributed
queries.
5. Access Control: Enforces security and access policies across the distributed system.
6. Data Integration: Combines results from different sites into a coherent response.
7. Distributed Concurrency Control: Coordinates locking or versioning across multiple sites.
8. Global Recovery Management: Manages recovery procedures for multi-site transactions
after failures.

2. Local System Manager (LSM)

Functions:

1. Local Query Processing: Handles queries that can be executed entirely at the local site.
2. Local Transaction Management: Maintains ACID properties for local transactions.
3. Local Storage Management: Controls the physical storage of data fragments at each site.
4. Local Recovery: Manages the recovery of the local database after failures.
5. Local Concurrency Control: Implements concurrency control for local transactions.
6. Local Resource Management: Optimizes use of local resources (CPU, memory, I/O).
7. Communication Interface: Interacts with the Global System Manager and other sites.
8. Data Access Methods: Implements efficient methods for accessing and manipulating local
data.

Question 18

Basic Failure Types in Distributed Database Systems


While distributed systems face traditional failures like hardware malfunctions and software
bugs, they also encounter unique failure types due to their distributed nature. Here are the key
additional failure types in distributed environments:

Network Failures

Link Failures: Communication links between sites can break, resulting in network
partitioning. When this happens, sites continue to operate but cannot communicate with each
other, potentially leading to inconsistent states if updates occur on both sides of the partition.

Message Loss: Messages between sites may be lost in transmission due to network
congestion or packet drops. This can cause transaction coordination issues if
acknowledgments or commit messages fail to reach their destination.

Network Congestion: High traffic can cause significant delays, making messages arrive too
late to be useful or causing timeouts, which may be mistakenly interpreted as failures.

Site Failures

Site Crashes: Individual sites may crash while others remain operational. This partial system
failure requires special recovery mechanisms to maintain global consistency.

Byzantine Failures: Some sites may continue operating but produce incorrect results or
behave unpredictably. These are particularly challenging because the faulty sites appear
active but cannot be trusted.

Clock Synchronization Problems

Clock Drift: Physical clocks at different sites run at slightly different rates, causing
timestamps to become increasingly misaligned over time. This affects time-based protocols
and can lead to incorrect ordering of events.

Clock Skew: Even when clocks run at the same rate, they may show different absolute times,
complicating timestamp-based coordination.

Transaction and Coordination Failures

Coordinator Failures: In two-phase commit and similar protocols, if the coordinator fails
during the protocol execution, participating sites may be left in an uncertain state, waiting
indefinitely.

Blocking Problems: After a failure, some protocols can leave the system in a state where
operational sites cannot proceed with their work until failed components recover.

Data-Related Failures

Replication Inconsistency: When data is replicated across multiple sites, failures during
update propagation can lead to inconsistent copies of the same data.

Lost Updates: In distributed environments, concurrent updates combined with failures can
lead to lost updates if proper coordination mechanisms aren't in place.

Dealing with Distributed Failures

These failure types require specialized mechanisms beyond those used in centralized systems:

 Distributed commit protocols (e.g., two-phase commit)


 Timeout-based failure detection
 Replication strategies
 Partition-aware algorithms
 Distributed consensus protocols (e.g., Paxos, Raft)
 Vector clocks and logical timestamps
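
The last item above, vector clocks, can be sketched compactly. This is a minimal illustration with hypothetical site names, not a production implementation:

import java.util.*;

// Minimal sketch of a vector clock, used to order events across sites
// without relying on synchronized physical clocks.
public class VectorClockDemo {
    static class VectorClock {
        final Map<String, Integer> entries = new HashMap<>();

        // A site increments its own entry on every local event or message send.
        void tick(String site) { entries.merge(site, 1, Integer::sum); }

        // On message receipt: element-wise maximum with the sender's clock, then tick locally.
        void mergeFrom(VectorClock sender, String receivingSite) {
            sender.entries.forEach((site, count) -> entries.merge(site, count, Math::max));
            tick(receivingSite);
        }

        // "Happened before": less than or equal in every component, strictly less in at least one.
        boolean happenedBefore(VectorClock other) {
            boolean strictlySmaller = false;
            Set<String> sites = new HashSet<>(entries.keySet());
            sites.addAll(other.entries.keySet());
            for (String site : sites) {
                int mine = entries.getOrDefault(site, 0);
                int theirs = other.entries.getOrDefault(site, 0);
                if (mine > theirs) return false;
                if (mine < theirs) strictlySmaller = true;
            }
            return strictlySmaller;
        }
    }

    public static void main(String[] args) {
        VectorClock a = new VectorClock(), b = new VectorClock();
        a.tick("siteA");            // event at site A
        b.mergeFrom(a, "siteB");    // site B receives a message carrying A's clock
        System.out.println(a.happenedBefore(b));   // true: A's event precedes B's receive
    }
}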

Question 19

19. Two-Phase Commit (2PC) Protocol

The Two-Phase Commit (2PC) protocol is a distributed algorithm used to ensure atomicity in
distributed transactions. When a transaction T completes execution across multiple sites and
the coordinator Ci initiates the 2PC protocol, the following occurs:

Phase 1 - Prepare Phase:

 Coordinator Ci sends a "prepare" message to all participant sites.


 Each participant site determines if it can commit its part of the transaction.
 If a site can commit, it writes a "prepare" record to its log, forces the log to stable
storage, and sends a "ready" message to the coordinator.
 If a site cannot commit, it sends an "abort" message to the coordinator.

Phase 2 - Commit/Abort Phase:

 If all participants respond "ready," the coordinator decides to commit:


o Writes a "commit" record to its log
o Sends "commit" messages to all participants
o Participants commit their transactions locally and send acknowledgments
 If any participant responds "abort" or times out, the coordinator decides to abort:
o Writes an "abort" record to its log
o Sends "abort" messages to all participants
o Participants abort their transactions locally

Recovery Procedures:

 If a participant fails during the protocol, it consults its log upon recovery:
o If it has a "commit" record, it commits the transaction
o If it has an "abort" record, it aborts the transaction
o If it has a "prepare" record but no decision record, it must contact the
coordinator
 If the coordinator fails:
o After writing a "commit" or "abort" record, it completes that action upon
recovery
o If it fails before deciding, it can abort the transaction upon recovery

The 2PC protocol ensures atomicity but can lead to blocking if the coordinator fails after the
prepare phase, as participants may need to wait for the coordinator to recover.
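
A deliberately simplified, single-process sketch of the coordinator's decision logic follows. The Participant interface is a stand-in; a real implementation exchanges network messages and force-writes log records to stable storage at each step.

import java.util.*;

// Simplified, in-process sketch of the coordinator side of two-phase commit.
public class TwoPhaseCommitDemo {
    interface Participant {
        boolean prepare();          // vote: true = "ready", false = "abort"
        void commit();
        void abort();
    }

    static void runTransaction(List<Participant> participants) {
        // Phase 1: ask every participant to prepare and collect votes.
        boolean allReady = true;
        for (Participant p : participants) {
            if (!p.prepare()) { allReady = false; break; }
        }

        // (The coordinator would force-write its commit/abort decision to its log here.)

        // Phase 2: broadcast the decision to all participants.
        if (allReady) {
            for (Participant p : participants) p.commit();
            System.out.println("Transaction committed at all sites");
        } else {
            for (Participant p : participants) p.abort();
            System.out.println("Transaction aborted at all sites");
        }
    }

    public static void main(String[] args) {
        Participant ok = new Participant() {
            public boolean prepare() { return true; }
            public void commit() { System.out.println("site commits"); }
            public void abort() { System.out.println("site aborts"); }
        };
        runTransaction(Arrays.asList(ok, ok));
    }
}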

20. Persistent Messaging Protocols

Persistent messaging protocols ensure reliable message delivery even when the underlying
infrastructure is unreliable. Key protocols include:

Store-and-Forward Protocol:

 Messages are stored persistently at each hop in the communication path.


 Each node acknowledges receipt only after the message is committed to stable
storage.
 If a node fails, messages are recovered from storage upon restart.

Message Queuing Protocol:

 Messages are placed in durable queues at the sender.


 A message remains in the sender's queue until an acknowledgment is received from
the receiver.
 Periodic retransmission occurs if acknowledgments are not received.

Message Logging and Recovery:

 All messages are logged to persistent storage before transmission.


 Message IDs are used to detect and eliminate duplicates.
 Sequence numbers track message order and identify gaps.

Once-and-Only-Once Delivery:
 Combines unique message IDs with acknowledgments.
 Receivers track IDs of previously processed messages.
 Messages with already-processed IDs are acknowledged but not processed again.

Transaction-Based Messaging:

 Message sending/receiving is part of a transaction.


 Messages are only considered sent when the transaction commits.
 If a transaction aborts, the message is not sent or is "rolled back."

These protocols ensure that messages are delivered reliably despite node failures, network
issues, or message loss.
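
A toy sketch of the duplicate-elimination idea behind once-and-only-once delivery follows; the message IDs and the processing step are placeholders, and a real system would keep the processed-ID set in stable storage.

import java.util.*;

// Toy receiver illustrating once-and-only-once delivery: every message carries
// a unique ID; duplicates are acknowledged but not processed a second time.
public class ExactlyOnceReceiver {
    private final Set<String> processedIds = new HashSet<>();

    // Returns true if the message was processed, false if it was a duplicate.
    boolean receive(String messageId, String payload) {
        if (!processedIds.add(messageId)) {
            sendAck(messageId);                 // re-acknowledge so the sender stops retrying
            return false;
        }
        process(payload);                       // apply the message exactly once
        sendAck(messageId);
        return true;
    }

    private void process(String payload) { System.out.println("processing: " + payload); }
    private void sendAck(String messageId) { System.out.println("ack " + messageId); }

    public static void main(String[] args) {
        ExactlyOnceReceiver r = new ExactlyOnceReceiver();
        r.receive("m1", "debit account 42");    // processed
        r.receive("m1", "debit account 42");    // duplicate retransmission: only re-acked
    }
}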

21. Replication Protocols for High Availability

i. Single Lock-Manager Approach:

 A single site acts as the central lock manager for all replicated data.
 All lock requests are directed to this central site.
 If the primary copy of a data item is unavailable, the transaction can access a replica.
 Advantages: Simple implementation, centralized control.
 Disadvantages: Single point of failure if the lock manager fails.

ii. Distributed Lock Manager:

 Lock management is distributed across multiple sites.


 Each site manages locks for the data items it stores.
 For replicated data, a transaction must obtain locks from all sites holding replicas.
 If a site is unavailable, its locks can be obtained when it recovers.
 Advantages: No single point of failure, more scalable.
 Disadvantages: More complex, requires coordination among lock managers.

iii. Majority Protocol:

 A transaction must obtain locks from a majority of sites that hold replicas.
 Updates are propagated to all available replicas.
 If a majority of sites are available, the transaction can proceed.
 This ensures that any two transactions accessing the same data item will have at least
one site in common.
 Advantages: Can continue processing even if some sites are unavailable.
 Disadvantages: Higher overhead due to multiple lock acquisitions, potential for
deadlocks.
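
A minimal sketch of the majority check: a transaction proceeds only if it obtains locks from more than half of all replica sites. Site names are hypothetical.

import java.util.*;

// Minimal sketch of the quorum test used by the majority protocol.
public class MajorityProtocolDemo {
    static boolean acquireMajorityLock(List<String> replicaSites, Set<String> availableSites) {
        int granted = 0;
        for (String site : replicaSites) {
            if (availableSites.contains(site)) {
                granted++;                      // assume an available site grants the lock
            }
        }
        // Majority: strictly more than half of ALL replicas, available or not.
        return granted > replicaSites.size() / 2;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("siteA", "siteB", "siteC");
        Set<String> up = new HashSet<>(Arrays.asList("siteA", "siteC")); // siteB is down
        // 2 of 3 locks granted -> the transaction may proceed despite one failed site.
        System.out.println(acquireMajorityLock(replicas, up));           // true
    }
}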

22. Distributed Data Storage and Replication

Distributed Data Storage Approaches:

1. Partitioning (Fragmentation):
o Horizontal: Table rows distributed across sites
o Vertical: Table columns distributed across sites
o Mixed: Combination of horizontal and vertical
2. Replication:
o Full replication: Complete copy at each site
o Partial replication: Subset of data at each site
o No replication: Data stored at only one site
3. Allocation Strategies:
o Centralized: All data at one site
o Distributed: Data spread across sites
o Hybrid: Combination of centralized and distributed

Advantages of Data Replication:

1. Improved availability: System can function even if some sites fail


2. Increased parallelism: Queries can be processed at multiple sites simultaneously
3. Reduced network traffic: Data can be accessed locally instead of remotely
4. Load balancing: Queries can be directed to less busy sites
5. Faster query response: Data can be accessed from the nearest site

Disadvantages of Data Replication:

1. Increased storage requirements: Multiple copies consume more space


2. Complex update propagation: Changes must be applied to all replicas
3. Potential inconsistency: Replicas may become out of sync
4. Increased update overhead: Updates must be propagated to all replicas
5. Concurrency control complexity: Coordinating updates across multiple sites

23. System Structure of Distributed Database

See Question 17, which is answered above.

24. Basic Failure Types in Distributed Systems

See Question 18 which I've already answered.

25. 2PC Protocol Discussion

See Question 19 which I've already answered.

26. Influence of Computer System on Database Architecture

The architecture of a database system is influenced by the underlying computer system in several key aspects:

Processor Architecture:

 CPU speed and number of cores affect query processing capabilities


 SIMD/vectorization support impacts analytical query performance
 Multi-threading capabilities influence concurrent transaction handling
 Instruction set architecture affects database algorithm implementation
Memory Hierarchy:

 Cache sizes and levels impact database buffer management


 Memory bandwidth affects data movement efficiency
 RAM capacity influences in-memory processing capabilities
 Non-volatile memory technologies create new storage tier options

Storage System:

 Disk characteristics (SSD vs. HDD) affect I/O performance


 RAID configurations impact reliability and performance
 Storage interconnect (SATA, SAS, NVMe) influences data transfer rates
 Storage virtualization affects database file management

Network Infrastructure:

 Network bandwidth affects distributed query performance


 Network latency impacts distributed transaction processing
 Network topology influences partition planning
 Network reliability affects replication strategies

Operating System Features:

 File system capabilities affect database file management


 Virtual memory management impacts buffer pool design
 Process/thread scheduling affects query execution
 I/O scheduling influences disk access patterns

These underlying computer system aspects drive key database design decisions, including
buffer management strategies, query optimization techniques, indexing structures, and
parallelization approaches.

27. Persistent Messaging Protocols

See Question 20 which I've already answered.

28. Transaction Server and Data Server

i. Two Broad Categories:

 Transaction Server: Handles transaction processing, client requests, and business
logic. It manages the execution of transactions and coordinates with data servers.
 Data Server: Manages data storage, retrieval, and physical data organization. It
focuses on efficient data access and storage management.

ii. Five Processes Involved in Transaction Server:

1. Connection Management: Handles client connections, authentication, and session
maintenance.
2. Transaction Coordination: Manages transaction boundaries, ensures atomicity, and
coordinates distributed transactions.
3. Query Processing: Parses queries, optimizes execution plans, and coordinates query
execution across data servers.
4. Concurrency Control: Implements locking, timestamping, or versioning to manage
concurrent access to data.
5. Recovery Management: Handles logging, checkpointing, and recovery procedures to
ensure durability and consistency after failures.

iii. Five Interesting Issues in Data Server:

1. Storage Layout Optimization: Balancing data access patterns with storage
characteristics to optimize I/O performance.
2. Buffer Management Policies: Determining which data pages to keep in memory and
which to evict based on access patterns.
3. Index Selection and Maintenance: Choosing appropriate indexes and maintaining
them efficiently as data changes.
4. Data Compression Strategies: Balancing storage savings with decompression
overhead during query processing.
5. Physical Data Organization: Managing data clustering, partitioning, and file
organization to improve access efficiency.

Question 30: Describe how practical usage of the basic concepts of object-
orientation in databases can be realized from the abstractive knowledge of the
subject by expressing them in some languages in two ways.

Two primary ways to express object-orientation concepts in databases are:

1. Through Object-Oriented Database Languages:


o ODMG ODL/OQL: Object Definition Language and Object Query Language provide a
standardized way to define and query object-oriented databases.
o Example in ODL:
class Person {
    attribute string name;
    attribute Date birthDate;
    relationship Set<Person> children inverse Person::parents;
    relationship Set<Person> parents inverse Person::children;
};

o Example in OQL:

select p.name
from Persons p
where p.birthDate > "1990-01-01"

2. Through Object-Relational Mapping in Traditional Languages:
o SQL Extensions: Using object-relational features in SQL.
o Example in SQL with object extensions:

CREATE TYPE PersonType AS OBJECT (
    id INTEGER,
    name VARCHAR(50),
    birthDate DATE,
    MEMBER FUNCTION getAge RETURN NUMBER
);

CREATE TABLE Persons OF PersonType;

o Programming Language Integration: Using ORM frameworks in languages like Java,
Python, etc.
o Example in Java with JPA:

@Entity
public class Person {
    @Id @GeneratedValue
    private Long id;
    private String name;
    private Date birthDate;
    @OneToMany(mappedBy = "parent")
    private List<Person> children;
    @ManyToOne
    private Person parent;
}

Question 31: Briefly describe the following in object-oriented database paradigm:

i. Substitutability: The principle that a derived class object can be used anywhere a base
class object is expected. This enables polymorphic behavior and is fundamental to
implementing inheritance hierarchies effectively in databases.

ii. Factory Object: A special object responsible for creating instances of other objects.
Factory objects encapsulate object creation logic, allowing the system to decide which
concrete class to instantiate based on various factors at runtime. This pattern is useful for
implementing complex object creation processes in databases.

iii. Polymorphism: The ability of objects of different classes to respond to the same message
or method invocation differently. In the context of OODBs, polymorphism allows operations
to behave differently depending on the actual type of the object, even when accessed through
a common interface or base class.

Question 33:

a. (i) Caption for Figure 1.0: "Class Hierarchy Diagram for Person, showing Employee and
Customer subclasses with their respective specializations"

(ii) Pseudocode for the classes:

// Employee class
class Employee extends Person {
    Float salary;
    Date hireDate;
    String department;

    Float getSalary();
    void setSalary(Float newSalary);
    String getDepartment();
    void setDepartment(String newDepartment);
}

// Customer class
class Customer extends Person {
    String customerID;
    Float creditLimit;

    String getCustomerID();
    Float getCreditLimit();
    void setCreditLimit(Float newLimit);
}

// Temporary class (subclass of Employee)
class Temporary extends Employee {
    Date contractEndDate;

    Date getContractEndDate();
    void setContractEndDate(Date newEndDate);
    Boolean isContractActive();
}

// Secretary class (subclass of Employee)
class Secretary extends Employee {
    String officeLocation;
    Employee[] supervisorsAssisted;

    String getOfficeLocation();
    void setOfficeLocation(String newLocation);
    void addSupervisor(Employee supervisor);
    Employee[] getSupervisorsAssisted();
}

b. Two main differences between the semi-structured model and the object model:

1. Schema Flexibility: Semi-structured models (like XML) have flexible schemas where
structure can vary and be partially defined or missing, while object models typically have
rigid schemas with well-defined classes and relationships.
2. Identity Concept: Object models rely heavily on object identity (OIDs), while semi-structured
models often use path expressions or value-based identification.

c. XML Schema and XPath:

 XML Schema: A language for defining the structure, content, and semantics of XML
documents. It provides facilities for defining elements, attributes, data types, and
constraints. XML Schema is more powerful than DTDs, offering features like inheritance,
user-defined types, and namespaces.
 XPath: A query language used to navigate and select nodes in an XML document. It uses
path expressions to identify and process nodes based on various criteria including element
names, attribute values, and hierarchical relationships. XPath is fundamental to other XML
technologies like XSLT and XQuery.

Question 34: Give two main differences between the structured and semi-structured model.
1. Schema Definition: Structured models (like relational databases) require a predefined
schema that strictly defines the structure before data can be stored, while semi-
structured models allow data to be stored without a predefined schema or with a
flexible schema that can vary across instances.
2. Data Homogeneity: In structured models, all records of the same type follow
identical structure, while in semi-structured models, similar entities may have
different attributes or structures, allowing for heterogeneity in data representation.

Additional differences include:

 Structured models typically have fixed data types while semi-structured models have more
flexible typing
 Structured models use rigid relationships while semi-structured models use more flexible,
often hierarchical relationships
 Query languages for structured models are typically more standardized (SQL) than for semi-
structured models

Distributed Database Concepts

Question 12: Two important issues in studying parallelism are speedup and
scaleup mechanism. Differentiate the two. A number of factors work against
efficient parallel operation and can diminish both speedup and scaleup.
Enumerate these factors.

Speedup vs. Scaleup:

Speedup:

 Refers to how much faster a task executes on a larger system compared to a smaller one
 Measured as: time on smaller system / time on larger system for the same workload
 Focuses on reducing response time for a fixed problem size
 Formula: Speedup(n) = T(1) / T(n), where T(1) is time on one processor and T(n) is time on n
processors

Scaleup:

 Refers to the ability to maintain performance as both system size and workload increase
proportionally
 Measured as: throughput on larger system / throughput on smaller system, with the workload
scaled in proportion to the system size
