Distributed Database: Module#3
CONCURRENCY CONTROL
Concurrency control in centralized database systems:
Concurrency control in a centralized database system refers to the mechanisms and techniques used
to ensure the correct execution of simultaneous transactions while maintaining data integrity and
consistency. It is essential to handle conflicts and ensure that transactions do not interfere with one
another in a way that causes data anomalies.
Why is Concurrency Control Important?
In a centralized database system, multiple users or applications often access the database
concurrently. Without proper control, concurrency can lead to the following issues:
1. Lost Updates: When two transactions overwrite each other’s changes, causing the loss of
one update.
o Example: Two bank clerks updating the same account balance simultaneously.
2. Dirty Reads: When one transaction reads uncommitted data from another transaction that
later fails or rolls back.
o Example: A report shows incorrect data because a transaction has not yet
completed.
3. Unrepeatable Reads: When a transaction reads the same data twice and gets different
results due to updates by other transactions.
o Example: Checking inventory levels twice during an order placement process and
finding inconsistent results.
4. Phantom Reads: When a transaction reads a set of rows but finds new rows added by
another transaction during subsequent reads.
o Example: Counting rows that match a condition and seeing a different count due to
concurrent inserts.
Concurrency Control Mechanisms:
Several techniques are used to ensure safe and efficient transaction execution:
1. Lock-Based Protocols
Locks restrict access to data items to prevent conflicts:
• Shared Lock (Read Lock): Allows multiple transactions to read the same data simultaneously
but prevents write access.
• Exclusive Lock (Write Lock): Allows a transaction to read and write the data while preventing
others from accessing it.
Types of Locking Protocols:
• Two-Phase Locking (2PL):
o Divides the transaction into two phases:
1. Growing Phase: Locks are acquired but not released.
2. Shrinking Phase: Locks are released but no new locks are acquired.
o Ensures serializability but may lead to deadlocks.
• Strict Two-Phase Locking:
o All locks are held until the transaction commits or aborts, preventing cascading
rollbacks.
• Deadlock Handling:
o Deadlock Prevention: Avoid situations that could lead to deadlocks (e.g., using
timeouts or ordering locks).
o Deadlock Detection: Detect and resolve deadlocks by aborting one or more
transactions.
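The locking rules above can be sketched in code. This is a minimal illustrative sketch of strict two-phase locking (the class and method names are ours, not from any particular system): a lock manager grants shared (S) and exclusive (X) locks during the growing phase, and a transaction releases everything only at commit or abort, collapsing the shrinking phase into a single step.

```python
# Minimal sketch of strict two-phase locking (illustrative, not production code).
# Locks are acquired one by one (growing phase) and released all at once at
# commit/abort (the single-step shrinking phase that makes the protocol strict).

class LockManager:
    def __init__(self):
        self.locks = {}  # item -> (mode, set of holding txns), mode in {"S", "X"}

    def acquire(self, txn, item, mode):
        """Try to grant `txn` a lock on `item`; return True on success."""
        current = self.locks.get(item)
        if current is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = current
        if mode == "S" and held_mode == "S":
            holders.add(txn)                  # shared locks are compatible
            return True
        if holders == {txn}:
            self.locks[item] = (mode, {txn})  # lock upgrade by the sole holder
            return True
        return False                          # conflict: caller must wait or abort

    def release_all(self, txn):
        """Strict 2PL: release every lock only at commit/abort time."""
        for item in list(self.locks):
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

lm = LockManager()
assert lm.acquire("T1", "x", "S")      # T1 reads x
assert lm.acquire("T2", "x", "S")      # shared with T1: allowed
assert not lm.acquire("T2", "x", "X")  # upgrade blocked while T1 holds S
lm.release_all("T1")
assert lm.acquire("T2", "x", "X")      # now T2 may write x
```

A caller that receives False would either block and retry (risking deadlock, handled as described above) or abort.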
2. Timestamp-Based Protocols
• Assigns a unique timestamp to each transaction.
• Ensures serializability by executing conflicting operations in timestamp order.
• Rules:
o A transaction may read or write a data item only if the operation does not violate the timestamp order; otherwise the transaction is aborted and restarted with a new timestamp.
o Older transactions (those with smaller timestamps) take priority.
3. Optimistic Concurrency Control
• Assumes conflicts are rare and does not use locks.
• Transactions pass through three phases:
1. Read Phase: Transactions read data and perform computations locally.
2. Validation Phase: Before committing, the system checks for conflicts with other
transactions.
3. Write Phase: If validation succeeds, changes are committed; otherwise, the
transaction rolls back.
Advantages:
• Efficient when conflicts are minimal.
• Avoids overhead of locking.
Disadvantages:
• Rollbacks can occur frequently if conflicts are common.
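The three phases can be sketched as follows. This is an illustrative sketch of backward validation (class and field names are ours): a transaction records its read and write sets locally, and at commit time its read set is checked against the write sets of transactions that committed while it was running.

```python
# Illustrative sketch of optimistic concurrency control with backward
# validation: reads take no locks; at commit time a transaction is aborted
# if any transaction that committed during its lifetime wrote an item it read.

class OCCManager:
    def __init__(self):
        self.committed = []   # list of (commit_ts, write_set)
        self.clock = 0

    def begin(self):
        return {"start_ts": self.clock, "reads": set(), "writes": set()}

    def read(self, txn, item):
        txn["reads"].add(item)        # read phase: no locks taken

    def write(self, txn, item):
        txn["writes"].add(item)       # buffered locally until commit

    def commit(self, txn):
        # Validation phase: did a later committer write something we read?
        for commit_ts, write_set in self.committed:
            if commit_ts > txn["start_ts"] and write_set & txn["reads"]:
                return False          # conflict: abort and retry
        # Write phase: install the writes and record the commit.
        self.clock += 1
        self.committed.append((self.clock, frozenset(txn["writes"])))
        return True

mgr = OCCManager()
t1, t2 = mgr.begin(), mgr.begin()
mgr.read(t2, "x")
mgr.write(t1, "x")
assert mgr.commit(t1)      # T1 validates against an empty history and commits
assert not mgr.commit(t2)  # T2 read x, which T1 overwrote, so T2 aborts
```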
4. Multiversion Concurrency Control (MVCC)
• Maintains multiple versions of a data item.
• Transactions are assigned timestamps, and each transaction reads the most recent version
valid at its timestamp.
• Advantages:
o Prevents read-write conflicts.
o Supports high concurrency.
• Disadvantages:
o Higher storage and processing overhead due to maintaining multiple versions.
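The version-selection rule can be sketched as below (an illustrative sketch, not any particular engine's implementation): each write appends a version tagged with the writer's timestamp, and a reader sees the newest version no later than its own timestamp.

```python
# Minimal sketch of multiversion reads: writers append new versions instead
# of overwriting, so readers at an earlier timestamp are never blocked and
# never see writes that happened after they started.

class MVStore:
    def __init__(self):
        self.versions = {}  # item -> list of (write_ts, value), kept sorted

    def write(self, item, ts, value):
        self.versions.setdefault(item, []).append((ts, value))
        self.versions[item].sort()

    def read(self, item, ts):
        """Return the latest version with write_ts <= the reader's ts."""
        visible = [v for wts, v in self.versions.get(item, []) if wts <= ts]
        return visible[-1] if visible else None

store = MVStore()
store.write("balance", 1, 100)
store.write("balance", 5, 80)
assert store.read("balance", 3) == 100  # a reader at ts=3 misses the ts=5 write
assert store.read("balance", 7) == 80   # a later reader sees the new version
```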
5. Serializability: Ensures the result of executing concurrent transactions is equivalent to a serial
(one-at-a-time) execution.
• Types of serializability:
o Conflict Serializability: The schedule can be transformed into a serial schedule by swapping non-conflicting operations; it holds exactly when the precedence (conflict) graph of the transactions is acyclic.
o View Serializability: Ensures the output of concurrent transactions matches the
output of some serial execution.
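Conflict serializability can be tested mechanically. The sketch below (function and variable names are ours) builds the precedence graph, adding an edge Ti → Tj whenever an operation of Ti conflicts with and precedes an operation of Tj on the same item, then checks the graph for cycles.

```python
# Sketch of a conflict-serializability test: two operations conflict if they
# are from different transactions, touch the same item, and at least one is a
# write. The schedule is conflict-serializable iff the precedence graph built
# from these conflicts is acyclic.

def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) tuples with op in {'R', 'W'}."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    def has_cycle(node, path, done):
        # Depth-first search; a node revisited on the current path is a cycle.
        if node in path:
            return True
        if node in done:
            return False
        path.add(node)
        cyclic = any(has_cycle(n, path, done) for n in graph.get(node, ()))
        path.discard(node)
        done.add(node)
        return cyclic

    done = set()
    return not any(has_cycle(n, set(), done) for n in list(graph))

# Serializable: T1 finishes with x before T2 touches it.
ok = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]
# Not serializable: conflicts run T1 -> T2 and T2 -> T1, forming a cycle.
bad = [("T1", "R", "x"), ("T2", "W", "x"), ("T1", "W", "x")]
assert conflict_serializable(ok)
assert not conflict_serializable(bad)
```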
Challenges in Concurrency Control
1. Deadlocks: Circular waiting on resources brings the transactions involved to a standstill.
2. Starvation: Some transactions are repeatedly denied access due to higher-priority
transactions.
3. Performance Overhead: Locking and validation mechanisms can reduce system throughput.
4. Complexity: Implementing and maintaining efficient concurrency control mechanisms is
challenging.
Benefits of Concurrency Control
1. Data Integrity: Prevents data anomalies during concurrent transactions.
2. System Efficiency: Optimizes resource utilization by allowing concurrent access.
3. User Experience: Reduces wait times and improves responsiveness for users accessing the
system simultaneously.
Concurrency control in DDBSs:
Concurrency control in Distributed Database Systems (DDBSs) is the process of managing
simultaneous transactions across multiple database nodes in a way that ensures consistency,
correctness, and reliability of data. This is especially important in distributed systems because data is
stored across multiple locations, and transactions often involve multiple nodes. Without proper
control, distributed databases may face data inconsistencies and conflicts.
Key Objectives of Concurrency Control in DDBSs:
1. Consistency: Ensure that the database remains in a consistent state after concurrent
transactions, just as it would in a centralized system.
2. Isolation: Prevent interference between transactions running simultaneously across different
nodes.
3. Global Serializability: Ensure the result of executing concurrent transactions is equivalent to
their execution in some serial (one-at-a-time) order across the entire distributed system.
4. Fault Tolerance: Ensure that even in case of node failures, concurrency control mechanisms
maintain consistency.
Challenges in Distributed Concurrency Control:
1. Data Distribution: Data is distributed across multiple nodes, making it difficult to coordinate
transactions.
2. Network Delays: Communication delays between nodes can cause conflicts or performance
issues.
3. Replication: Maintaining consistency among replicated data adds complexity.
4. Failures: Node failures or network issues can interrupt ongoing transactions, requiring
recovery mechanisms.
5. Scalability: As the number of nodes and transactions increases, maintaining concurrency
becomes more challenging.
Ensuring Global Serializability:
• Local Serializability: Ensure that each individual node enforces serializability for its local
transactions.
• Global Coordination: Use global concurrency control protocols (e.g., 2PL, timestamp
ordering) to synchronize transactions across nodes.
• Replication Consistency: Ensure that updates to replicated data are propagated in a
consistent manner.
Distributed concurrency control algorithms:
Distributed concurrency control algorithms are methods used to manage and coordinate concurrent
transactions in a distributed database system (DDBS). These algorithms ensure that transactions are
executed in a consistent, reliable, and serializable manner across multiple nodes, even when
transactions access and modify data located in different parts of the distributed system.
Key Goals of Distributed Concurrency Control:
1. Global Serializability: Ensure that the result of executing concurrent transactions is
equivalent to their serial execution across the distributed system.
2. Consistency: Maintain database consistency by avoiding conflicts between transactions.
3. Non-Blocking Execution: Allow concurrent transactions to proceed without unnecessary
delays.
4. Fault Tolerance: Handle system failures gracefully while ensuring transaction integrity.
Types of Distributed Concurrency Control Algorithms:
1. Lock-Based Algorithms: Lock-based algorithms ensure that only one transaction can access a data
item in a conflicting manner at a time.
a.) Distributed Two-Phase Locking (Distributed 2PL)
• Overview:
o Extends the two-phase locking protocol to a distributed environment.
o Each node enforces the two-phase locking protocol locally (growing and shrinking
phases).
• How It Works:
o A transaction must obtain locks on data from multiple nodes.
o Coordination between nodes ensures global serializability.
• Challenges:
o Deadlocks: Cycles in the wait-for graph across nodes can cause deadlocks.
o Communication Overhead: Requires extensive communication between nodes to
acquire and release locks.
• Deadlock Resolution:
o Timeout: Transactions are aborted if they exceed a predefined time.
o Wait-For Graph: Global graphs are analyzed to detect and resolve deadlocks.
2. Timestamp-Based Algorithms: These algorithms assign a unique timestamp to each transaction and order conflicting operations according to those timestamps.
a.) Basic Timestamp Ordering
• Overview:
o Transactions execute in the order of their timestamps.
o Each data item has two timestamps:
▪ Read Timestamp (RTS): The most recent time the item was read.
▪ Write Timestamp (WTS): The most recent time the item was written.
• Rules:
o A transaction can read/write a data item only if it does not violate timestamp order.
• Advantages:
o No locks are needed, avoiding deadlocks.
• Disadvantages:
o Transactions may be aborted frequently if they conflict with others.
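The RTS/WTS rules above can be sketched per data item (an illustrative sketch using the basic rejection rule, not the Thomas write rule): a read is rejected if a younger transaction already wrote the item, and a write is rejected if a younger transaction already read or wrote it.

```python
# Sketch of basic timestamp ordering on a single data item. A rejected
# operation means the transaction would be aborted and restarted with a
# fresh (larger) timestamp.

class TOItem:
    def __init__(self):
        self.rts = 0  # timestamp of the youngest transaction that read the item
        self.wts = 0  # timestamp of the youngest transaction that wrote the item

    def read(self, ts):
        if ts < self.wts:
            return False             # item was already overwritten by a younger txn
        self.rts = max(self.rts, ts)
        return True

    def write(self, ts):
        if ts < self.rts or ts < self.wts:
            return False             # would invalidate what a younger txn saw
        self.wts = ts
        return True

x = TOItem()
assert x.write(ts=2)      # T2 writes x
assert not x.read(ts=1)   # older T1 arrives late: its read is rejected
assert x.read(ts=3)       # younger T3 may read
assert not x.write(ts=2)  # T2 may not write behind T3's read
```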
b.) Distributed Timestamp Ordering
• Overview:
o Extends timestamp ordering to distributed systems.
o Nodes synchronize their timestamps to maintain global order.
• Challenges:
o Synchronizing clocks across nodes is difficult and can cause delays.
3. Optimistic Concurrency Control (OCC): Optimistic algorithms assume that conflicts are rare and
allow transactions to execute without restrictions during their initial phase.
• Phases:
1. Read Phase: Transactions read data and perform computations locally.
2. Validation Phase: Before committing, transactions are validated to ensure no
conflicts occurred with other transactions.
3. Write Phase: If validation is successful, changes are committed; otherwise, the
transaction is aborted.
• Advantages:
o High concurrency as no locks are used.
• Disadvantages:
o Expensive rollbacks if conflicts are detected during validation.
• Distributed OCC:
o Validation is performed globally to ensure that transactions do not conflict across
nodes.
4. Multiversion Concurrency Control (MVCC): MVCC algorithms allow multiple versions of a data
item to coexist, enabling concurrent transactions to read different versions of the same data.
• How It Works:
o Each transaction accesses the version of the data item valid at its start time.
o Writers create new versions of the data without affecting readers.
• Advantages:
o Readers and writers do not block each other.
o High concurrency is possible.
• Challenges:
o Requires additional storage for maintaining multiple versions.
o Synchronization of versions across nodes in a distributed system is complex.
5. Quorum-Based Algorithms: Quorum-based algorithms are used in replicated databases to ensure consistency by requiring overlapping subsets (quorums) of replicas to agree on each read and write.
• Read and Write Quorums:
o A read quorum is the minimum number of nodes that must agree for a read
operation.
o A write quorum is the minimum number of nodes that must agree for a write
operation.
• Example:
o With 5 replicas, a write quorum of 3 and a read quorum of 3 satisfy R + W > N (3 + 3 > 5), so every read quorum overlaps every write quorum and is guaranteed to include at least one replica holding the latest committed write.
• Advantages:
o Scalable and fault-tolerant.
• Disadvantages:
o Increased latency due to the need for agreement.
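The quorum intersection condition can be stated in a few lines (an illustrative sketch; the function name is ours): with N replicas, choosing R + W > N guarantees every read quorum overlaps every write quorum, and 2W > N additionally guarantees that two concurrent writes overlap.

```python
# Sketch of the quorum consistency check for a replicated data item.

def quorums_consistent(n, r, w):
    """True if read quorum r and write quorum w over n replicas guarantee
    that every read overlaps the latest write, and writes overlap writes."""
    return r + w > n and 2 * w > n

# With 5 replicas, R = 3 and W = 3 works:
assert quorums_consistent(n=5, r=3, w=3)
# R = 2 and W = 3 does not: a read of 2 replicas can miss all 3 writers.
assert not quorums_consistent(n=5, r=2, w=3)
```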
6. Hybrid Algorithms: Hybrid algorithms combine features of multiple concurrency control
techniques to balance performance and consistency.
• Example: Combining locking and timestamp ordering for better efficiency in distributed
systems.
Advantages:
1. Data Consistency: Ensure consistency across all nodes in the distributed database.
2. High Concurrency: Allow multiple transactions to execute simultaneously, improving
performance.
3. Fault Tolerance: Handle failures gracefully while preserving transaction integrity.
4. Scalability: Support large-scale distributed systems.
Challenges:
1. Communication Overhead: Coordination between nodes increases communication costs.
2. Deadlocks: Lock-based methods can lead to deadlocks in distributed environments.
3. Aborts and Rollbacks: Optimistic and timestamp-based methods may lead to frequent
transaction aborts.
4. Complexity: Algorithms are complex to implement and maintain in distributed
environments.
Deadlock management:
A deadlock occurs in a distributed database system when a set of transactions is unable to proceed
because each transaction is waiting for a resource that another transaction in the set is holding.
Deadlocks can halt the progress of transactions, so managing them effectively is crucial for
maintaining the system's efficiency and reliability.
Deadlock management is essential for ensuring the smooth operation of distributed database
systems. Each method—prevention, detection, or avoidance—has its strengths and weaknesses, and
the choice of technique depends on the specific requirements of the system, such as performance,
complexity, and resource availability. In practice, a combination of techniques may be used to
balance efficiency and reliability in distributed environments.
Causes of Deadlock:
1. Resource Contention: Two or more transactions compete for the same resource, such as a
lock on a data item.
2. Circular Wait: Transactions form a circular chain where each transaction is waiting for a
resource held by the next transaction in the chain.
3. Distributed Nature: In distributed systems, deadlocks are harder to detect and resolve due
to limited global knowledge of the system state.
Deadlock Management Techniques:
Deadlock management can be categorized into deadlock prevention, deadlock detection, and
deadlock avoidance. Here's an explanation of these techniques:
1. Deadlock Prevention: Deadlock prevention involves designing the system in such a way that
deadlocks cannot occur. This is achieved by ensuring that one of the necessary conditions for a
deadlock is never satisfied.
Techniques for Deadlock Prevention
• Resource Ordering:
o All resources are assigned a unique global order.
o Transactions must request resources in this predefined order, preventing circular
wait.
• Preemption:
o If a transaction cannot acquire a resource, the system preempts the resource from
another transaction or aborts one of the transactions.
• No Wait Policy:
o A transaction must request all required resources upfront.
o If all resources are not available, the transaction is aborted and retried later.
• Timeouts:
o A transaction is aborted if it waits for a resource longer than a predefined time.
2. Deadlock Detection: In this approach, the system allows deadlocks to occur but detects and
resolves them when they happen.
Techniques for Deadlock Detection
• Wait-For Graph (WFG):
o Nodes represent transactions, and edges represent "waiting for" relationships
between transactions.
o A cycle in the graph indicates a deadlock.
o In a distributed system, local wait-for graphs are created at each node, and these are
periodically combined to detect global deadlocks.
• Centralized Detection:
o A central coordinator collects wait-for graphs from all nodes to detect cycles.
• Distributed Detection:
o Nodes communicate with each other to share wait-for graphs and detect deadlocks
collaboratively.
Resolution:
• Abort one or more transactions in the deadlock cycle to break the circular wait.
• Select transactions to abort based on factors like priority, transaction age, or amount of work
completed.
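Cycle detection on the wait-for graph can be sketched as below (an illustrative sketch; the graph representation is ours). In a distributed setting, each site would build its local graph and a coordinator would union them before running this check.

```python
# Sketch of deadlock detection on a wait-for graph: nodes are transactions,
# an edge T1 -> T2 means "T1 waits for a lock held by T2", and any cycle
# is a deadlock whose members are candidates for abortion.

def find_deadlock_cycle(wait_for):
    """wait_for: dict mapping a txn to the set of txns it waits on.
    Returns a list of txns forming a cycle, or None if there is no deadlock."""
    def visit(node, path):
        if node in path:
            return path[path.index(node):]       # cycle found
        for nxt in wait_for.get(node, ()):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

# T1 waits for T2, T2 for T3, T3 for T1: a (possibly cross-node) deadlock.
wfg = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
assert find_deadlock_cycle(wfg) is not None
assert find_deadlock_cycle({"T1": {"T2"}, "T2": set()}) is None
```

Once a cycle is returned, one of its members is aborted (the victim chosen by priority, age, or work done, as described above), which removes its edges and breaks the circular wait.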
3. Deadlock Avoidance: Deadlock avoidance involves making transaction execution decisions
dynamically to ensure that deadlocks do not occur.
Techniques for Deadlock Avoidance
• Wait-Die Scheme:
o A transaction can only wait for another transaction if it has a higher priority (e.g.,
based on a timestamp).
o If the transaction has lower priority, it is aborted ("dies") and retried later.
• Wound-Wait Scheme:
o A transaction with a higher priority can preempt ("wound") a lower-priority
transaction, forcing it to abort.
o If the higher-priority transaction must wait, it does so.
• Banker's Algorithm (adapted for DDBS):
o Checks if granting a resource request would keep the system in a "safe state" where
deadlocks are impossible.
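The wait-die and wound-wait decisions reduce to a timestamp comparison, sketched below (function names are ours). A smaller start timestamp means an older, higher-priority transaction; each function decides what the requester does when the lock it wants is held by another transaction.

```python
# Sketch of the wait-die and wound-wait decision rules. Both schemes avoid
# deadlock because waiting is only ever permitted in one timestamp direction,
# so no circular wait can form.

def wait_die(requester_ts, holder_ts):
    """Older requester waits; younger requester dies (aborts, retries later
    with its original timestamp so it eventually becomes the oldest)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Older requester wounds (aborts) the holder; younger requester waits."""
    return "wound holder" if requester_ts < holder_ts else "wait"

# T1 (ts=1, older) and T2 (ts=2, younger) contend for the same lock:
assert wait_die(1, 2) == "wait"            # old T1 may wait for young T2
assert wait_die(2, 1) == "die"             # young T2 aborts instead of waiting
assert wound_wait(1, 2) == "wound holder"  # old T1 preempts young T2
assert wound_wait(2, 1) == "wait"          # young T2 waits for old T1
```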
Comparison of Techniques:
• Deadlock Prevention:
o Advantages: Simple and guarantees no deadlocks.
o Disadvantages: May cause transaction aborts and inefficiency.
• Deadlock Detection:
o Advantages: Efficient use of resources; resolves deadlocks only when needed.
o Disadvantages: Complex in distributed systems; adds overhead for detection.
• Deadlock Avoidance:
o Advantages: Avoids deadlocks proactively.
o Disadvantages: Computationally expensive; requires detailed information about resource requirements.