Understanding Distributed Transactions

The document discusses distributed transactions in distributed database systems, highlighting their definition, characteristics, and the challenges they face, such as maintaining data consistency and fault tolerance. It explains the role of commit protocols and transaction coordinators in ensuring that distributed transactions are executed atomically and consistently across multiple sites. Additionally, it covers concurrency control techniques to manage simultaneous transaction executions and prevent issues like lost updates and dirty reads.


DISTRIBUTED TRANSACTIONS IN DISTRIBUTED DATABASE SYSTEM

Introduction

In earlier computer systems, data was stored and processed at a single central computer. As
technology advanced and organizations grew larger, the need for storing large amounts of
data and providing fast access to users at different locations increased. To meet these
requirements, databases were distributed across multiple computers connected by a network.
Such systems are called Distributed Database Systems.

In a distributed database system, data is not stored in one location but is spread across several
sites. Each site has its own local database system that manages data independently while
cooperating with other sites. Users can access data from any site as if it were stored locally,
even though it may actually be located in another place.

A transaction is a logical unit of work that performs one or more operations such as
insertion, deletion, or update on data. When a transaction involves data from only one
database, it is called a local transaction. However, when a transaction needs to access or
modify data stored at multiple sites, it becomes a distributed transaction.

A distributed transaction may involve reading data from one site, updating data at another
site, and confirming the result at a third site. This makes management of transactions more
difficult because each participating site must coordinate with others to ensure that all
operations are successfully completed.

The biggest challenge in a distributed transaction system is maintaining data consistency. It must be ensured that either all databases involved reflect the changes made by the transaction or none of them do. Partial execution may lead to data inconsistency, which can cause serious problems such as incorrect balances, lost data, or system failure.

Another important issue is fault tolerance. Failures can occur due to:

 System crash
 Network failure
 Disk errors
 Power failure
 Coordinator failure

When a failure happens at any site, the transaction must either restart or rollback completely
from all sites. This requires strong failure-handling mechanisms.

Distributed transactions also raise issues related to concurrency control because multiple
users may access the same data at the same time from different locations. Without proper
control, this can lead to incorrect results. Therefore, distributed databases use locking
techniques and timestamp ordering to avoid conflicts.

To solve these challenges, distributed transaction systems use:

 Commit protocols
 Logging techniques
 Recovery algorithms
 Transaction coordinators
 Failure detection mechanisms

The aim of distributed transactions is to make a group of separate computers behave like a
single unified system. From the user’s point of view, the transaction should appear as if it
were executed on a single system, even though it is actually running on multiple servers
located at different places.

Thus, distributed transactions allow organizations to:

 Process large amounts of data efficiently


 Share resources across locations
 Improve availability and reliability
 Support global business operations
 Achieve better performance and scalability

Definition of Distributed Transaction


A Distributed Transaction is a type of transaction that performs operations on data stored at
more than one geographic location in a distributed database system. In such a transaction,
databases located on different computers, servers, or networks take part in a single logical
unit of work.

Formal Definition:

A distributed transaction is a transaction that accesses and updates data on more than one site
in a distributed database system and guarantees ACID properties across all participating
locations.

In simple terms, when one transaction performs insert, update, or delete operations on
multiple databases situated at different sites, it is called a distributed transaction.

For example, in a banking system, when money is transferred from one account to another,
one database may deduct the amount while another database credits it. Since more than one
database is involved, the transaction becomes distributed.

The most important feature of a distributed transaction is atomic execution. This means the
transaction must execute completely at all sites or not execute at all. Partial execution is not
allowed. If one site fails while performing an operation, all other completed operations at
other sites must be undone through a rollback process.
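This all-or-nothing behavior can be sketched in a few lines of Python. Two in-memory dictionaries stand in for two sites, and the account names and balances are purely illustrative — this is a simulation of the rollback idea, not a real distributed protocol:

```python
# Two "sites", each holding one account (illustrative data).
site_a = {"alice": 5000}
site_b = {"bob": 1000}

def transfer(src, dst, src_key, dst_key, amount):
    """Debit at one site and credit at another; undo completed work if any step fails."""
    undo = []
    try:
        if src[src_key] < amount:
            raise ValueError("insufficient funds")
        src[src_key] -= amount
        # Record how to reverse the debit in case a later step fails.
        undo.append(lambda: src.__setitem__(src_key, src[src_key] + amount))
        dst[dst_key] += amount
        return True
    except Exception:
        for action in reversed(undo):  # rollback already-completed operations
            action()
        return False

transfer(site_a, site_b, "alice", "bob", 1000)   # succeeds: 4000 / 2000
transfer(site_a, site_b, "alice", "bob", 9000)   # fails: both sites unchanged
```

After the failed second transfer, neither site shows a partial update — exactly the atomicity requirement described above.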

Distributed transactions require special software called a Transaction Coordinator, which controls the execution process and ensures that all sites agree on the final result. The coordinator sends instructions, collects responses, and makes the final commit or rollback decision.
Another important objective of distributed transactions is maintaining consistency. All
databases must reflect the same result. If some databases commit while others fail, it may
lead to data corruption and system failure. Therefore, strict coordination is necessary.

Distributed transactions also ensure isolation, so that multiple transactions running at the
same time do not interfere with each other. Locks or timestamps are used to control
concurrent execution.

Finally, durability ensures that once a distributed transaction is successfully completed, the
changes remain saved permanently, even if a system crash occurs.

Characteristics of Distributed Transactions

Distributed transactions have the following features:

1. Multiple databases are involved


2. Transaction is executed at different locations
3. Coordinator controls the transaction
4. Communication between sites is required
5. Atomicity must be maintained
6. Recovery mechanism is required
7. All sites must agree before commit
8. Locking is used to avoid conflicts
9. Failure at one site affects all
10. Decision must be common at all nodes

Types of Transactions
In a distributed database environment, transactions are classified based on the number of
database sites they access. Mainly, there are two types of transactions:

(a) Local Transaction


A Local Transaction is a transaction that accesses and updates data stored at only one site in
a distributed database system. All operations of the transaction such as insert, update, delete,
and read are performed within a single database.

In this type of transaction, no other database participates, and there is no communication with
other systems over the network. The entire transaction is completed locally using the database
management system of that particular site.

Characteristics of Local Transactions:

 Executes at only one database site


 No coordination with other sites required
 Faster execution
 Less communication cost
 Easier recovery
 Lower complexity
Example of Local Transaction:

Updating the salary of an employee stored in a local database system is a local transaction
because it does not involve any other database.

(b) Distributed Transaction


A Distributed Transaction is a transaction that accesses and updates data located on more
than one site in a distributed database system. It involves coordination among multiple
database servers to ensure correctness and consistency.
In a distributed transaction, each participating site executes part of the transaction and
cooperates with other sites through a transaction coordinator. All sites must successfully
complete their part for the transaction to commit successfully.
If any site fails while executing its operation, all other sites must rollback their changes to
maintain consistency.
Characteristics of Distributed Transactions:

 Involves multiple sites


 Requires data communication across network
 Coordinator controls execution
 Commit decision must be common
 More complex than local transaction
 Needs commit protocols
 High reliability mechanisms required

Example of Distributed Transaction:

In an online banking system, when money is transferred from one account to another across
different banks, deducting money from one account and crediting it to another happens at two
different sites. This is a distributed transaction.

Commit Protocols in Distributed Transactions


Meaning of Commit Protocol

In a distributed database system, a single transaction may involve several databases located at
different sites. Each site performs a part of the transaction independently. The major
challenge is to ensure that all participating databases reach the same final decision about the
transaction.

The final decision can only be one of two options:

 Commit the transaction


 Rollback the transaction

A Commit Protocol is a technique or procedure used to coordinate all the participating sites
and make sure that every site executes the same final action at the end of the transaction.
Definition of Commit Protocol

A commit protocol is a structured set of rules that coordinates multiple database sites to
determine whether a distributed transaction should be committed or rolled back, while
ensuring atomicity and consistency.

Purpose of Commit Protocol

The main goals of a commit protocol are:

1. To guarantee that all sites come to the same decision


2. To ensure that partial execution never occurs
3. To maintain data consistency
4. To preserve atomicity
5. To detect failures and take recovery actions
6. To handle network breakdowns
7. To prevent data corruption
8. To ensure reliable transaction processing

Why Commit Protocol is Required

In distributed systems, failures can occur at many levels:

 Node failure
 Message loss
 Network failure
 Crash of coordinator
 Disk failure

Without a commit protocol, one site may commit while another may rollback. This would
result in a database inconsistency problem which can seriously damage the integrity of the
system.

Commit protocols prevent such problems and ensure safe and reliable transaction execution.

Key Requirements of Commit Protocols

A good commit protocol must satisfy the following:

 All sites must participate equally


 Final decision must be unanimous
 Communication must be reliable
 Logs must be properly maintained
 Reasonable performance
 Fault tolerance must exist
 Must support recovery
Role of Transaction Coordinator

A commit protocol uses a special component known as the Transaction Coordinator.

The coordinator:

 Sends messages to all sites


 Collects votes
 Makes final decision
 Broadcasts outcome
 Handles failure situations

Commit protocols are essential in a distributed database system. Without them, it is impossible to ensure that databases located at multiple sites maintain consistency. Commit protocols make distributed systems reliable and trustworthy by enforcing a single final decision on all sites and ensuring correct transaction execution even when failures occur.

Types of Commit Protocols


Mainly two commit protocols are used:

1. Two Phase Commit Protocol (2PC)


2. Three Phase Commit Protocol (3PC)

6.1 Two Phase Commit Protocol (2PC)


The Two Phase Commit Protocol is the most widely used commit protocol in distributed
databases.

There are two main roles:


 Coordinator – controls the transaction
 Participants – database sites that execute the transaction
Working of Two Phase Commit Protocol
It consists of two stages:
Phase 1: Prepare Phase (Voting Phase)
In this phase, the coordinator sends a message to all participants asking:
"Are you ready to commit?"
Each site:

 Executes its operations

 Stores updates in log

 Checks for errors

 Sends response:

o YES → ready to commit


o NO → cannot commit
Phase 2: Commit Phase (Decision Phase)
If all sites reply YES:
 Coordinator sends COMMIT command
 All sites commit changes
If any site replies NO:
 Coordinator sends ROLLBACK command
 All sites undo changes
2PC Diagram

              COORDINATOR
                   |
       +-----------+-----------+
       |           |           |
    Site A      Site B      Site C

Phase 1 → Vote (YES / NO)
Phase 2 → Commit / Rollback
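The two phases can be sketched as follows. The Participant and Coordinator classes here are simplified stand-ins for illustration — real 2PC uses network messages and persistent logs, which this sketch omits:

```python
class Participant:
    """A database site taking part in the transaction."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: execute operations, store updates in the log, check for errors.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit        # vote YES or NO

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

class Coordinator:
    def __init__(self, participants):
        self.participants = participants

    def run(self):
        # Phase 1 (voting): ask every site "Are you ready to commit?"
        votes = [p.prepare() for p in self.participants]
        # Phase 2 (decision): commit only if every vote is YES.
        if all(votes):
            for p in self.participants:
                p.commit()
            return "COMMIT"
        for p in self.participants:
            p.rollback()
        return "ROLLBACK"

ok = Coordinator([Participant("A"), Participant("B")]).run()           # "COMMIT"
bad = Coordinator([Participant("A"), Participant("B", False)]).run()   # "ROLLBACK"
```

A single NO vote forces every site to roll back, which is how 2PC guarantees the common decision required above.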
Advantages of Two Phase Commit
 Ensures data consistency
 Guarantees atomicity
 Simple logic
 Widely implemented
Disadvantages of Two Phase Commit
 Coordinator failure blocks system
 Slow process
 Network overhead
 Participants remain locked
 Blocking protocol
6.2 Three Phase Commit Protocol (3PC)
To overcome the blocking problem of 2PC, Three Phase Commit was introduced.
Working of 3PC
It has three steps:
Phase 1: Can Commit Phase
Coordinator asks:

"Can you commit?"

Participants reply YES or NO.

Phase 2: Pre-Commit Phase


Coordinator tells all sites to get ready to commit.
Participants:
 Save data in stable storage
 Prepare for final commit
Phase 3: Do Commit Phase
Coordinator sends final commit instruction.
All participants commit the transaction.
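The three-phase decision flow can be mocked in the same style. This is a simplified sketch of the message sequence only — real 3PC additionally relies on timeouts so participants can reach a decision even if the coordinator fails, which is what removes the blocking problem:

```python
def three_phase_commit(votes):
    """Simplified 3PC decision flow for a list of YES/NO votes."""
    # Phase 1 (can-commit): coordinator collects votes.
    if not all(votes):
        return "ABORT"
    # Phase 2 (pre-commit): every site saves its data to stable storage
    # and acknowledges that it is ready for the final commit.
    acks = [True for _ in votes]   # assume all acknowledgements arrive
    if not all(acks):
        return "ABORT"
    # Phase 3 (do-commit): coordinator sends the final commit instruction.
    return "COMMIT"

three_phase_commit([True, True, True])    # "COMMIT"
three_phase_commit([True, False, True])   # "ABORT"
```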

Advantages of Three Phase Commit


 No blocking problem
 Better failure handling
 Coordinator failure does not stop system
Disadvantages of Three Phase Commit
 More messages
 More time
 Complex design
 High communication cost

CONCURRENCY CONTROL IN DDBMS

Meaning of Concurrency Control


Concurrency control in a Distributed Database Management System (DDBMS) is a technique
used to control the simultaneous execution of transactions so that the database remains
correct and consistent.

In distributed systems, many users may access the same data from different locations at the
same time. If concurrency is not properly managed, this may lead to incorrect results, data
loss, or system failure.

Concurrency control ensures that even though transactions run in parallel, the final result is
the same as if they were executed one by one.

Definition
Concurrency Control is the process of managing multiple transactions in a distributed
database system in such a way that data remains accurate, consistent, and reliable.

Need for Concurrency Control in DDBMS


Concurrency control is required because:

 Many users may update the same data at the same time
 Transactions are executed at different sites
 Network delays may cause improper execution
 System failures may occur
 Resources are shared
 Data must not become inconsistent
 Isolation property must be maintained
 Database integrity must be protected
Problems Without Concurrency Control (With Real-World Examples)

If concurrency control is not applied in a Distributed Database Management System (DDBMS), multiple transactions may interfere with each other and produce incorrect results. This can lead to data corruption, wrong reports, and system failure.

The major problems that occur without concurrency control are discussed below:

(a) Lost Update Problem


The lost update problem occurs when two transactions update the same data item at the same
time, and one update is overwritten by the other. As a result, one transaction’s result is lost.
Example (Bank System):
Suppose account balance = ₹5000
Two users withdraw money at the same time:
Transaction T1: Withdraw ₹1000
Transaction T2: Withdraw ₹2000
Both read initial balance = ₹5000.
T1 calculates: 5000 – 1000 = 4000
T2 calculates: 5000 – 2000 = 3000
Now:
 T1 writes ₹4000
 After that, T2 writes ₹3000
Final result becomes ₹3000.
Correct balance should be:
5000 – 1000 – 2000 = ₹2000
Here, T1 update is lost. This is called Lost Update Problem.
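The interleaving above can be replayed directly in code: each transaction reads the balance first and later writes a result computed from its own stale copy, so the second write silently discards the first:

```python
balance = 5000

# Both transactions read the same initial balance.
t1_read = balance
t2_read = balance

# Each computes its new balance from its own (now stale) copy.
t1_result = t1_read - 1000   # 4000
t2_result = t2_read - 2000   # 3000

# T1 writes, then T2 overwrites it: T1's update is lost.
balance = t1_result
balance = t2_result

print(balance)   # 3000, but the correct balance is 2000
```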

(b) Dirty Read Problem


Dirty read occurs when one transaction reads data written by another transaction that has not
yet been committed.
Example (Online Shopping System):
Transaction T1: Updates product price from ₹500 to ₹800 (not yet committed).
Transaction T2: Reads updated price as ₹800 and shows it to customer.
Later, T1 fails and rolls back. Price returns to ₹500.
But customer already saw ₹800.
This incorrect reading is called Dirty Read because uncommitted data was read.

(c) Inconsistent Retrieval Problem


Inconsistent retrieval happens when a transaction reads some values before another
transaction updates them and some values after the update, leading to incorrect results.
Example (Bank Report System):
Suppose:
Account A = ₹3000
Account B = ₹5000
Transaction T1: Transfer ₹1000 from A to B.
Transaction T2: Calculates total balance.
T2 reads A = ₹3000
Then T1 updates A = ₹2000 and B = ₹6000
Now T2 reads B = ₹6000.
Total seen by T2 = 3000 + 6000 = ₹9000
Actual balance = 2000 + 6000 = ₹8000
Report is incorrect due to inconsistent data.

(d) Phantom Read Problem


Phantom read occurs when a transaction gets different results for the same query executed
twice because another transaction inserts or deletes rows.
Example (University Database):
Transaction T1:
SELECT all students where marks > 70.
Result: 5 students.
Transaction T2 inserts another student with marks = 75.
T1 runs query again and gets 6 students.
The new row appears like a “ghost” result → Phantom Read.

Techniques of Concurrency Control in DDBMS

Concurrency control in a Distributed Database Management System (DDBMS) is required to ensure that multiple transactions do not interfere with one another while accessing shared data. Since data is stored at different sites, control is necessary to avoid inconsistency.

Distributed Locking Protocol


In this technique, before a transaction can access a data item, it must obtain a lock on that
item. Other transactions cannot change that data until the lock is released.

Types of Locks
(a) Shared Lock (Read Lock)
Allows only reading of data. Multiple transactions can hold this lock at the same time.

(b) Exclusive Lock (Write Lock)


Allows modification. Only one transaction is allowed.

Working

1. The transaction requests a lock.


2. If lock is available, it is granted.
3. If lock is busy, the transaction waits.
4. After completion, the lock is released.

Real-World Example (Bank System)


Customer A withdraws money.
Customer B deposits money at the same time.
The system locks the account.
Customer B must wait until A finishes.
Correct balance is maintained.
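The account lock can be simulated in a single process with Python's threading.Lock standing in for an exclusive (write) lock. With the lock held around each read-modify-write, the withdrawal and deposit serialize and no update is lost:

```python
import threading

balance = 5000
lock = threading.Lock()

def withdraw(amount):
    global balance
    with lock:                      # exclusive lock on the account
        current = balance           # read
        balance = current - amount  # write; no other writer can interleave

def deposit(amount):
    global balance
    with lock:
        balance = balance + amount

threads = [threading.Thread(target=withdraw, args=(1000,)),
           threading.Thread(target=deposit, args=(500,))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)   # 4500 regardless of which thread runs first
```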

Advantages
 Simple technique

 Avoids inconsistency

 Provides safety

Disadvantages
 Deadlocks

 Delays

 Performance decrease

Timestamp Ordering Protocol


Each transaction is given a timestamp when it starts execution. Transactions are processed in
timestamp order to maintain consistency.
Rules
 Older transaction is executed first.
 If newer transaction violates rule, it is restarted.
Real-World Example (Ticket Booking System)
User A books ticket at 10:00 AM.
User B books ticket at 10:01 AM.
A is processed first.
If B conflicts, it is restarted.
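A minimal sketch of the write rule in basic timestamp ordering: each data item remembers the largest read and write timestamps that have touched it, and an operation arriving "too late" in timestamp order is rejected so its transaction can restart (the class and method names here are illustrative):

```python
class DataItem:
    def __init__(self):
        self.read_ts = 0    # largest timestamp that has read this item
        self.write_ts = 0   # largest timestamp that has written this item
        self.value = None

    def read(self, ts):
        if ts < self.write_ts:            # a younger transaction already wrote it
            return "RESTART"
        self.read_ts = max(self.read_ts, ts)
        return self.value

    def write(self, ts, value):
        if ts < self.read_ts or ts < self.write_ts:
            return "RESTART"              # would violate timestamp order
        self.write_ts = ts
        self.value = value
        return "OK"

item = DataItem()
item.write(ts=2, value="B")          # transaction with timestamp 2 writes: OK
late = item.write(ts=1, value="A")   # older timestamp arrives too late: RESTART
```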
Advantages
 No deadlock
 High parallel processing
Disadvantages
 Starvation
 Extra restarts

Optimistic Concurrency Control


This method assumes conflicts are rare. Transactions proceed without locking and are
checked only at commit time.
Phases
1. Read Phase – Data read into memory.
2. Validation Phase – Conflict checked.
3. Write Phase – Data saved if valid.
Real-World Example (Online Examination System)
Students attempt questions freely.
Final submission is validated before saving.

If conflict occurs, submission is rejected.
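The three phases can be sketched with a version number per record: each transaction remembers the version it read, and validation at commit time rejects it if the version has changed in the meantime (the Record/begin/commit names are illustrative):

```python
class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0

def begin(record):
    # Read phase: copy the value and remember the version we saw.
    return {"seen_version": record.version, "local": record.value}

def commit(record, txn, new_value):
    # Validation phase: has anyone committed since we read?
    if record.version != txn["seen_version"]:
        return "REJECTED"
    # Write phase: install the new value.
    record.value = new_value
    record.version += 1
    return "COMMITTED"

r = Record(100)
t1 = begin(r)
t2 = begin(r)
commit(r, t1, 150)           # "COMMITTED"
late = commit(r, t2, 120)    # "REJECTED" — r changed after t2's read
```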


Advantages

 No locks required
 High speed if conflicts are low
Disadvantages
 Rollbacks
 High validation cost
Quorum-Based Protocol
In this technique, a transaction can read or write data only after receiving permission from a
minimum number of database replicas.
Types
 Read quorum
 Write quorum
Real-World Example (Cloud Storage System)
File is updated only after approval from multiple servers.
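At its core the quorum rule is just counting: with N replicas, a read quorum R and write quorum W are chosen so that R + W > N, guaranteeing every read overlaps the most recent write. A sketch, with replica approvals simulated as booleans:

```python
def quorum_ok(votes, quorum):
    """An operation proceeds only if at least `quorum` replicas approve it."""
    return sum(votes) >= quorum

N = 5            # number of replicas
R, W = 2, 4      # read and write quorums; R + W > N guarantees overlap
assert R + W > N

write_votes = [True, True, True, True, False]   # 4 of 5 replicas reachable
read_votes = [True, True, False, False, False]  # 2 of 5 reachable

quorum_ok(write_votes, W)                         # True: write allowed
quorum_ok(read_votes, R)                          # True: read allowed
quorum_ok([True, False, False, False, False], W)  # False: write blocked
```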
Advantage
 Reduces network load
Disadvantage
 Communication delays

DISTRIBUTED QUERY PROCESSING

Meaning of Distributed Query Processing


Distributed Query Processing is the method used to process SQL queries when data is stored
across multiple sites in a distributed database system.

When a user submits a query, the system:

 Finds where the data is located


 Divides the query into parts
 Sends sub-queries to appropriate sites
 Collects results
 Combines them into a final answer

Definition
Distributed Query Processing is the process of analyzing, optimizing, and executing queries
in a distributed database system where data is stored at different network locations.

Objectives of Distributed Query Processing


The main objectives are:

 Minimize data transfer cost


 Reduce response time
 Improve performance
 Efficient use of network bandwidth
 Locate data quickly
 Execute queries in parallel
 Generate correct results

Phases of Distributed Query Processing


Distributed query processing is completed in three main phases:

4.1 Query Decomposition


The query is broken into smaller parts and converted into internal form.

Steps:

 Parsing the query


 Syntax checking
 Converting to relational algebra
 Removing redundant data

4.2 Query Optimization


Best execution plan is selected.

Optimization decides:

 Where to process query


 How much data to transfer
 Whether to send query or data
 Which site should execute which part

Types of Optimization:

(a) Static Optimization

Plan decided before execution.

(b) Dynamic Optimization

Plan decided during execution.

4.3 Query Execution


After optimization, query is executed.

Steps:
 Sub-queries sent to sites
 Local queries executed
 Results transferred
 Final result assembled

Query Processing Techniques in Distributed Database

In a Distributed Database Management System (DDBMS), data is stored at different sites (branches, servers, locations).
When a user fires a query, the system must decide:

 Where to execute the query?


 Which site will process which part?
 Should we send data to query or query to data?

For this, different query processing techniques are used.

Data Localization
Meaning

Data Localization means converting a global query (written as if data is in one big database)
into local queries for each site where the data is actually stored.

The user writes:

SELECT * FROM STUDENT WHERE CITY = 'Amritsar';

User doesn’t care where data is stored.


The DDBMS internally:

 Finds which sites have STUDENT data


 Breaks query into smaller parts
 Sends each part to appropriate site
 Collects and merges the results

This conversion of one global query into multiple local queries is called data localization.

Steps in Data Localization

1. Identify data location


o Check which sites have required relations or fragments.
2. Rewrite global query
o Convert global tables into fragments (horizontal / vertical).
3. Generate local sub-queries
o One query for each participating site.
4. Execute locally at each site
o Each site processes its own sub-query.
5. Combine all partial results
o Final result sent back to user.

Example (University Database)

Assume STUDENT table is horizontally fragmented:

 Site A (Amritsar Campus): STUDENT where CITY = 'Amritsar'


 Site B (Jalandhar Campus): STUDENT where CITY = 'Jalandhar'
 Site C (Ludhiana Campus): STUDENT where CITY = 'Ludhiana'

Global Query:

SELECT NAME, CITY FROM STUDENT WHERE CITY = 'Amritsar' OR CITY = 'Jalandhar';

Data Localization:

 At Site A:
SELECT NAME, CITY FROM STUDENT WHERE CITY = 'Amritsar';
 At Site B:
SELECT NAME, CITY FROM STUDENT WHERE CITY = 'Jalandhar';
 At Site C:
No query needed (Ludhiana campus not required).

The results from Site A and Site B are combined and shown to the user as one result.
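The localization steps can be demonstrated with two in-memory SQLite databases standing in for Site A and Site B (the student names and table layout are illustrative):

```python
import sqlite3

# Each site holds one horizontal fragment of STUDENT.
site_a = sqlite3.connect(":memory:")
site_b = sqlite3.connect(":memory:")
for site, rows in [(site_a, [("Ravi", "Amritsar"), ("Simran", "Amritsar")]),
                   (site_b, [("Harpreet", "Jalandhar")])]:
    site.execute("CREATE TABLE STUDENT (NAME TEXT, CITY TEXT)")
    site.executemany("INSERT INTO STUDENT VALUES (?, ?)", rows)

# Global query: CITY = 'Amritsar' OR CITY = 'Jalandhar'.
# Localization sends each site only the predicate its fragment can satisfy;
# Site C (Ludhiana) receives no query at all.
local_queries = [
    (site_a, "SELECT NAME, CITY FROM STUDENT WHERE CITY = 'Amritsar'"),
    (site_b, "SELECT NAME, CITY FROM STUDENT WHERE CITY = 'Jalandhar'"),
]

# Execute each sub-query locally, then merge the partial results.
result = []
for site, sql in local_queries:
    result.extend(site.execute(sql).fetchall())

print(result)   # rows from both fragments, merged into one answer
```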

Advantages of Data Localization

 Reduces amount of unnecessary data transfer


 Uses local processing power
 Keeps global query independent of physical distribution
 Better performance

Disadvantages

 Requires knowledge of fragmentation and data distribution


 Query rewriting is complex

Centralized Processing
Meaning

In Centralized Processing, all required data is brought to one central site, and the entire
query is processed at that single site.

Here, we send data to query, not query to data.


How It Works

1. User sends query to central server (Coordinator site).


2. Central site identifies which sites contain required data.
3. Data from those sites is transferred to central site.
4. Central site executes full query.
5. Final result returned to user.

Example (Bank Head Office System)

Assume:

 Branch A, B, C each store local ACCOUNT data.


 Head office (Central site) wants:

SELECT BRANCH_NAME, SUM(BALANCE)
FROM ACCOUNT
GROUP BY BRANCH_NAME;

Centralized Approach:

 Branch A sends its ACCOUNT data to Head Office


 Branch B sends its ACCOUNT data to Head Office
 Branch C sends its ACCOUNT data to Head Office
 Head Office executes the GROUP BY and SUM
 Head Office sends final summary report to manager

All processing of query is done at Head Office only.
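A sketch of the centralized approach with SQLite: every branch's rows are shipped to the head-office database first, and only then does the central site run the GROUP BY (the branch names and balances are illustrative):

```python
import sqlite3

# Each branch's local ACCOUNT data (illustrative).
branches = {
    "A": [("A", 1000), ("A", 2000)],
    "B": [("B", 3000)],
    "C": [("C", 500), ("C", 1500)],
}

# Head office: all raw data is moved here before any processing happens.
head_office = sqlite3.connect(":memory:")
head_office.execute("CREATE TABLE ACCOUNT (BRANCH_NAME TEXT, BALANCE INT)")
for rows in branches.values():
    head_office.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", rows)

# The full query executes at the central site only.
summary = dict(head_office.execute(
    "SELECT BRANCH_NAME, SUM(BALANCE) FROM ACCOUNT GROUP BY BRANCH_NAME"
).fetchall())

print(summary)   # per-branch totals computed entirely at head office
```

Note that every raw row crossed the network, which is exactly the high data-movement cost listed among the disadvantages below.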

Advantages of Centralized Processing

 Implementation is simple
 Head Office controls everything
 No need for complex distributed algorithms
 Easier to manage security and access control

Disadvantages

 High network cost (large data movement)


 Central site becomes a bottleneck
 If central site fails → whole system stops
 Poor scalability for big data

Distributed Processing
Meaning

In Distributed Processing, each site executes its own part of the query, and only the
necessary intermediate results are sent over the network.
Here we send query to data, not data to query.

Processing is shared among multiple sites → parallel execution.

How It Works

1. Global query is decomposed into several sub-queries.


2. Sub-queries are sent to different sites where required data is stored.
3. Each site executes its part locally (using its own DBMS).
4. Partial results are sent back to a coordinator site.
5. Coordinator combines these partial results to produce the final output.

Example (Online Shopping System)

Suppose tables:

 CUSTOMER at Site A
 ORDERS at Site B
 PAYMENT at Site C

Query:

SELECT C.NAME, O.ORDER_ID, P.AMOUNT
FROM CUSTOMER C, ORDERS O, PAYMENT P
WHERE C.CUST_ID = O.CUST_ID AND O.ORDER_ID = P.ORDER_ID
AND C.CITY = 'Dinanagar';

Distributed Processing:

 At Site A (CUSTOMER):
Filter customers from Dinanagar
 At Site B (ORDERS):
Get orders related to those customers
 At Site C (PAYMENT):
Get payment details for those orders

Then partial results are joined and final output is prepared at coordinator site.
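A sketch of that distributed plan in plain Python: each "site" filters its own table locally, only the reduced partial results travel onward, and the coordinator performs the final joins (the table contents and column names are illustrative):

```python
# Site A: CUSTOMER (cust_id, name, city)
customers = [(1, "Gurpreet", "Dinanagar"), (2, "Meena", "Delhi")]
# Site B: ORDERS (order_id, cust_id)
orders = [(10, 1), (11, 2), (12, 1)]
# Site C: PAYMENT (order_id, amount)
payments = [(10, 500), (11, 900), (12, 250)]

# Step 1, at Site A: filter locally, forward only matching customers.
dinanagar = [(cid, name) for cid, name, city in customers if city == "Dinanagar"]
wanted_cust = {cid for cid, _ in dinanagar}

# Step 2, at Site B: keep only orders for those customers.
their_orders = [(oid, cid) for oid, cid in orders if cid in wanted_cust]
wanted_orders = {oid for oid, _ in their_orders}

# Step 3, at Site C: keep only payments for those orders.
their_payments = {oid: amt for oid, amt in payments if oid in wanted_orders}

# Coordinator: join the small partial results into the final answer.
names = dict(dinanagar)
result = [(names[cid], oid, their_payments[oid]) for oid, cid in their_orders]
print(result)   # [('Gurpreet', 10, 500), ('Gurpreet', 12, 250)]
```

Only the filtered rows ever leave their sites, which is why this approach transfers far less data than the centralized one.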

Advantages of Distributed Processing

 Uses all sites’ CPU power (parallelism)


 Reduces amount of data transferred
 Faster response for large queries
 Scalable and efficient

Disadvantages

 More complex query planning and optimization


 Requires good coordination between sites
 Network failure can disturb execution

6. Query Decomposition Steps in Distributed Query Processing

Query Decomposition is the first and most important stage of distributed query processing. In
this stage, the user’s SQL query is converted into an internal form so that it can be executed
efficiently over distributed sites.

The main aim is to:

 Check correctness
 Reduce complexity
 Improve performance
 Generate an optimized query

6.1 Normalization
Meaning
Normalization in query decomposition means converting the SQL query into a standard
internal form, usually in the form of relational algebra or a query tree.
This allows the system to understand the query clearly and apply optimization techniques
easily.
Example
SQL Query:
SELECT * FROM STUDENT WHERE (MARKS > 60 AND CITY = 'Delhi') OR (CITY =
'Delhi' AND MARKS > 60);
Normalized Form:
SELECT * FROM STUDENT WHERE MARKS > 60 AND CITY = 'Delhi';
Redundant condition is removed and query is written in clear form.
Importance of Normalization
 Removes duplicate conditions
 Converts query into simple format
 Makes query easy to optimize
 Avoids repeated work
6.2 Analysis
Meaning
In this step, the syntax and logic of the query is checked.
The system ensures:
 Table names are valid

 Column names are correct

 Data types match

 Conditions make sense logically

 No ambiguity is present
Example
Query:
SELECT NAME FROM STUDENT WHERE AGE = 'abc';
This query is incorrect because AGE is numeric.
During analysis, this error is detected and query is rejected.
Importance of Analysis
 Finds errors early

 Prevents crash

 Ensures only valid queries are executed

 Avoids incorrect results


6.3 Simplification
Meaning: In this stage, the query is simplified by:

 Removing unnecessary conditions

 Eliminating impossible conditions

 Rewriting redundant operations


Example

Query:
SELECT * FROM EMPLOYEE
WHERE SALARY > 5000 AND SALARY > 3000;

Simplified Query:
SELECT * FROM EMPLOYEE
WHERE SALARY > 5000;

Second condition is useless and removed.


Another Example
SELECT * FROM STUDENT WHERE ROLLNO = 10 AND ROLLNO = 11;
This condition is impossible. The query is simplified to return no result.
Importance of Simplification
 Reduces work
 Improves performance
 Avoids waste of resources
 Makes query execution faster

6.4 Restructuring
Meaning
In restructuring, the query is rewritten into a form that is more efficient to execute.
This includes:
 Reordering joins
 Moving conditions closer to data
 Selecting smaller result sets
 Choosing better execution path
Example
Original Query:
SELECT *
FROM EMPLOYEE E, DEPARTMENT D
WHERE E.DEPT_ID = D.DEPT_ID
AND D.NAME = 'Sales';
Restructured Query:
SELECT *
FROM EMPLOYEE E,
(SELECT * FROM DEPARTMENT WHERE NAME = 'Sales') D
WHERE E.DEPT_ID = D.DEPT_ID;
Filtering DEPARTMENT first reduces result size and speeds up join.
