
Chapter 5: Overview of Query Processing

• Query Processing Overview

• Query Optimization
• Distributed Query Processing Steps

Acknowledgements: I am indebted to Arturas Mazeika for providing me with his slides for this course.

DDB 2008/09 J. Gamper Page 1

Query Processing Overview

• Query processing: A 3-step process that transforms a high-level query (of relational
calculus/SQL) into an equivalent and more efficient lower-level query (of relational
algebra).
1. Parsing and translation
– Check the syntax and verify the relations.
– Translate the query into an equivalent relational algebra expression.
2. Optimization
– Generate an optimal evaluation plan (with the lowest cost) for the query.
3. Evaluation
– The query-execution engine takes an (optimal) evaluation plan, executes that plan, and returns the answers to the query.


Query Processing . . .

• The success of RDBMSs is due, in part, to the availability
– of declarative query languages that allow users to easily express complex queries without knowing the details of the physical data organization, and
– of advanced query processing technology that transforms the high-level user/application queries into efficient lower-level query execution strategies.

• The query transformation should achieve both correctness and efficiency

– The main difficulty is achieving efficiency
– This is also one of the most important tasks of any DBMS

• Distributed query processing: Transform a high-level query (of relational calculus/SQL) on a distributed database (i.e., a set of global relations) into an equivalent and efficient lower-level query (of relational algebra) on relation fragments.

• Distributed query processing is more complex

– Fragmentation/replication of relations
– Additional communication costs
– Parallel execution


Query Processing Example

• Example: Transformation of an SQL-query into an RA-query.

Relations: EMP(ENO, ENAME, TITLE), ASG(ENO,PNO,RESP,DUR)
Query: Find the names of employees who have an assignment with a duration (DUR) greater than 37.

– High-level query

SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO AND DUR > 37

– Two possible transformations of the query are:

∗ Expression 1: $\Pi_{ENAME}(\sigma_{DUR>37 \wedge EMP.ENO=ASG.ENO}(EMP \times ASG))$
∗ Expression 2: $\Pi_{ENAME}(EMP \bowtie_{ENO} (\sigma_{DUR>37}(ASG)))$
– Expression 2 avoids the expensive and large intermediate Cartesian product, and
therefore typically is better.
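To make the difference concrete, the following Python sketch evaluates both expressions on a tiny, made-up instance of EMP and ASG (the tuples and names are hypothetical, not from the slides) and reports the size of the intermediate result each one builds. Relations are plain lists of tuples, and projection returns a set, i.e., it eliminates duplicates.

```python
from itertools import product

# Hypothetical sample data: EMP(ENO, ENAME, TITLE), ASG(ENO, PNO, RESP, DUR)
EMP = [("E1", "J. Doe", "Elect. Eng."),
       ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "A. Lee", "Mech. Eng.")]
ASG = [("E1", "P1", "Manager", 12),
       ("E2", "P1", "Analyst", 24),
       ("E2", "P2", "Analyst", 40),
       ("E3", "P3", "Consultant", 48)]


def expression1(emp, asg):
    """Pi_ENAME(sigma_{DUR>37 and EMP.ENO=ASG.ENO}(EMP x ASG))."""
    cartesian = list(product(emp, asg))  # every EMP tuple paired with every ASG tuple
    selected = [(e, a) for e, a in cartesian if a[3] > 37 and e[0] == a[0]]
    names = {e[1] for e, _ in selected}  # projection on ENAME (set removes duplicates)
    return names, len(cartesian)


def expression2(emp, asg):
    """Pi_ENAME(EMP join_ENO sigma_{DUR>37}(ASG))."""
    asg_sel = [a for a in asg if a[3] > 37]  # selection applied to ASG first
    joined = [(e, a) for e in emp for a in asg_sel if e[0] == a[0]]
    names = {e[1] for e, _ in joined}
    return names, len(asg_sel)


names1, size1 = expression1(EMP, ASG)
names2, size2 = expression2(EMP, ASG)
print(names1 == names2)  # True: both expressions return the same names
print(size1, size2)      # 12 2: intermediate result sizes
```

The Cartesian product in Expression 1 grows with |EMP| · |ASG|, whereas Expression 2 keeps only the qualifying ASG tuples before joining.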


Query Processing Example . . .

• We make the following assumptions about the data fragmentation

– Data is (horizontally) fragmented:
∗ Site 1: $ASG_1 = \sigma_{ENO \leq \text{"E3"}}(ASG)$
∗ Site 2: $ASG_2 = \sigma_{ENO > \text{"E3"}}(ASG)$
∗ Site 3: $EMP_1 = \sigma_{ENO \leq \text{"E3"}}(EMP)$
∗ Site 4: $EMP_2 = \sigma_{ENO > \text{"E3"}}(EMP)$
∗ Site 5: Result
– Relations ASG and EMP are fragmented in the same way
– Relations ASG and EMP are locally clustered on attributes DUR and ENO, respectively
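The fragmentation assumptions can be checked with a short Python sketch. It uses hypothetical sample tuples, splits both relations with the same predicate ENO ≤ "E3", and verifies that the fragments are disjoint, that their union reconstructs each relation, and that, because both relations are fragmented the same way, no ASG tuple has a join partner in the "other" EMP fragment. ENO values are compared as strings, which matches the intended order only for single-digit keys such as E1 to E9 (an assumption of the sample data).

```python
# Hypothetical sample data: EMP(ENO, ENAME), ASG(ENO, PNO, DUR)
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"),
       ("E4", "J. Miller"), ("E5", "B. Casey"), ("E6", "L. Chu")]
ASG = [("E1", "P1", 12), ("E2", "P1", 24), ("E3", "P3", 48),
       ("E4", "P2", 18), ("E5", "P2", 40), ("E6", "P4", 36)]


def fragment(relation, predicate):
    """Horizontal fragmentation: tuples that satisfy the predicate, and the rest."""
    first = [t for t in relation if predicate(t)]
    second = [t for t in relation if not predicate(t)]
    return first, second


def low(t):
    # ENO <= "E3" (string comparison; correct for single-digit keys only)
    return t[0] <= "E3"


EMP1, EMP2 = fragment(EMP, low)  # Sites 3 and 4
ASG1, ASG2 = fragment(ASG, low)  # Sites 1 and 2

for whole, f1, f2 in [(EMP, EMP1, EMP2), (ASG, ASG1, ASG2)]:
    assert sorted(f1 + f2) == sorted(whole)  # reconstruction by union
    assert not set(f1) & set(f2)             # fragments are disjoint

# Same fragmentation on ENO: matching tuples are co-located,
# so the join can be computed fragment by fragment.
assert not any(a[0] == e[0] for a in ASG1 for e in EMP2)
assert not any(a[0] == e[0] for a in ASG2 for e in EMP1)
print(len(EMP1), len(EMP2), len(ASG1), len(ASG2))  # 3 3 3 3
```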


Query Processing Example . . .

• Now consider the expression $\Pi_{ENAME}(EMP \bowtie_{ENO} (\sigma_{DUR>37}(ASG)))$
• Strategy 1 (partially parallel execution):
– Produce $ASG'_1 = \sigma_{DUR>37}(ASG_1)$ at Site 1 and move it to Site 3
– Produce $ASG'_2 = \sigma_{DUR>37}(ASG_2)$ at Site 2 and move it to Site 4
– Join $ASG'_1$ with $EMP_1$ at Site 3 and move the result to Site 5
– Join $ASG'_2$ with $EMP_2$ at Site 4 and move the result to Site 5
– Union the results at Site 5

• Strategy 2:
– Move ASG1 and ASG2 to Site 5
– Move EMP1 and EMP2 to Site 5
– Select and join at Site 5

• For simplicity, the final projection is omitted.
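The two strategies can be simulated in Python to confirm that they compute the same join while shipping very different amounts of data. This is only an illustrative sketch: the sample tuples are hypothetical, the "sites" are just variables, and the quantity counted is the number of tuples moved between sites (the final projection is omitted, as on the slide).

```python
# Hypothetical sample data: EMP(ENO, ENAME), ASG(ENO, PNO, DUR)
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"),
       ("E4", "J. Miller"), ("E5", "B. Casey"), ("E6", "L. Chu")]
ASG = [("E1", "P1", 12), ("E2", "P1", 40), ("E3", "P2", 48),
       ("E4", "P3", 10), ("E5", "P2", 39), ("E6", "P4", 20)]


def split(rel):
    """Horizontal fragmentation on ENO <= "E3" (string comparison)."""
    return [t for t in rel if t[0] <= "E3"], [t for t in rel if t[0] > "E3"]


EMP1, EMP2 = split(EMP)  # stored at Sites 3 and 4
ASG1, ASG2 = split(ASG)  # stored at Sites 1 and 2


def select(asg):
    return [a for a in asg if a[2] > 37]  # sigma_{DUR>37}


def join(emp, asg):
    return [e + a[1:] for e in emp for a in asg if e[0] == a[0]]  # join on ENO


def strategy1():
    shipped = 0
    asg1p, asg2p = select(ASG1), select(ASG2)      # at Sites 1 and 2
    shipped += len(asg1p) + len(asg2p)             # ASG'_1 -> Site 3, ASG'_2 -> Site 4
    r1, r2 = join(EMP1, asg1p), join(EMP2, asg2p)  # in parallel at Sites 3 and 4
    shipped += len(r1) + len(r2)                   # partial results -> Site 5
    return r1 + r2, shipped                        # union at Site 5


def strategy2():
    shipped = len(ASG1) + len(ASG2) + len(EMP1) + len(EMP2)  # everything -> Site 5
    return join(EMP1 + EMP2, select(ASG1 + ASG2)), shipped


res1, cost1 = strategy1()
res2, cost2 = strategy2()
print(sorted(res1) == sorted(res2), cost1, cost2)  # True 6 12
```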


Query Processing Example . . .
• Calculate the cost of the two strategies under the following assumptions:
– Tuples are uniformly distributed over the fragments; 20 tuples satisfy DUR > 37
– size(EMP) = 400, size(ASG) = 1000
– tuple access cost = 1 unit; tuple transfer cost = 10 units
– ASG and EMP have a local index on DUR and ENO, respectively
• Strategy 1
– Produce ASG′s: (10+10) * tuple access cost = 20
– Transfer ASG′s to the sites of the EMPs: (10+10) * tuple transfer cost = 200
– Produce EMP′s: (10+10) * tuple access cost * 2 = 40
– Transfer EMP′s to the result site: (10+10) * tuple transfer cost = 200
– Total cost = 460
• Strategy 2
– Transfer EMP1, EMP2 to Site 5: 400 * tuple transfer cost = 4,000
– Transfer ASG1, ASG2 to Site 5: 1,000 * tuple transfer cost = 10,000
– Select tuples from ASG1 ∪ ASG2: 1,000 * tuple access cost = 1,000
– Join EMP and ASG′: 400 * 20 * tuple access cost = 8,000
– Total cost = 23,000
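The arithmetic above can be reproduced with a small Python function, which makes it easy to vary the assumptions (for example, the transfer cost). The constants come from the slide; the breakdown simply mirrors the listed items, including the factor 2 for producing the EMP′ tuples exactly as the slide gives it.

```python
TUPLE_ACCESS = 1     # cost units per tuple access
TUPLE_TRANSFER = 10  # cost units per tuple transfer
SIZE_EMP, SIZE_ASG = 400, 1000
SELECTED = 10 + 10   # tuples with DUR > 37 in ASG1 and ASG2


def strategy1_cost(access=TUPLE_ACCESS, transfer=TUPLE_TRANSFER):
    produce_asg = SELECTED * access      # select via local index on DUR
    transfer_asg = SELECTED * transfer   # ASG' tuples to Sites 3 and 4
    produce_emp = SELECTED * access * 2  # join via local index on ENO (factor 2 from the slide)
    transfer_emp = SELECTED * transfer   # EMP' tuples to Site 5
    return produce_asg + transfer_asg + produce_emp + transfer_emp


def strategy2_cost(access=TUPLE_ACCESS, transfer=TUPLE_TRANSFER):
    transfer_emp = SIZE_EMP * transfer
    transfer_asg = SIZE_ASG * transfer
    select_asg = SIZE_ASG * access       # scan all of ASG1 union ASG2 at Site 5
    join = SIZE_EMP * SELECTED * access  # every EMP tuple against the 20 ASG' tuples
    return transfer_emp + transfer_asg + select_asg + join


print(strategy1_cost(), strategy2_cost())  # 460 23000
```

In this model, even with transfer=1 Strategy 1 costs 100 while Strategy 2 costs 10,400, since Strategy 2 still scans and joins the full relations at Site 5.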


Query Optimization

• Query optimization is a crucial and difficult part of the overall query processing
• The objective of query optimization is to minimize the following cost function:
I/O cost + CPU cost + communication cost

• Two different scenarios are considered:

– Wide area networks
∗ Communication cost dominates
· low bandwidth
· low speed
· high protocol overhead
∗ Most algorithms ignore all other cost components
– Local area networks
∗ Communication cost is not as dominant
∗ The total cost function should be considered
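The cost function and the two network scenarios can be sketched as follows. The plan names and per-plan cost components are hypothetical numbers, chosen only to show that the same pair of equivalent plans can be ranked differently depending on whether only the communication cost (WAN) or the total cost (LAN) is minimized.

```python
from dataclasses import dataclass


@dataclass
class PlanCost:
    io: float    # I/O cost
    cpu: float   # CPU cost
    comm: float  # communication cost


def cost(c: PlanCost, network: str) -> float:
    """WAN: only communication cost counts; LAN: I/O + CPU + communication."""
    if network == "WAN":
        return c.comm
    return c.io + c.cpu + c.comm


# Hypothetical cost estimates for two equivalent plans.
plans = {"A": PlanCost(io=500, cpu=200, comm=100),
         "B": PlanCost(io=50, cpu=50, comm=300)}

best = {net: min(plans, key=lambda name: cost(plans[name], net))
        for net in ("WAN", "LAN")}
print(best)  # {'WAN': 'A', 'LAN': 'B'}
```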


Query Optimization . . .

• The ordering of relational algebra operators is crucial for efficient query processing
• Rule of thumb: move expensive operators to the end of query processing
• Cost of RA operations (complexity in the number of input tuples n):
– Select, Project (without duplicate elimination): O(n)
– Project (with duplicate elimination), Group: O(n log n)
– Join, Semi-join, Division, Set Operators: O(n log n)
– Cartesian Product: O(n²)
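The O(n log n) entries correspond to sort-based evaluation, which is one common way of implementing these operators (the slides do not prescribe a specific algorithm). The Python sketch below shows projection with duplicate elimination by sorting, and an equi-join by sort-merge; the sample relations are hypothetical.

```python
# Hypothetical sample data: EMP(ENO, ENAME), ASG(ENO, PNO)
EMP = [("E2", "M. Smith"), ("E1", "J. Doe")]
ASG = [("E2", "P2"), ("E1", "P1"), ("E2", "P1")]


def project_distinct(relation, attrs):
    """Projection with duplicate elimination: sort (O(n log n)), then one linear pass."""
    projected = sorted(tuple(t[i] for i in attrs) for t in relation)
    result = []
    for row in projected:
        if not result or result[-1] != row:  # duplicates are adjacent after sorting
            result.append(row)
    return result


def sort_merge_join(r, s, key_r, key_s):
    """Equi-join: sort both inputs (O(n log n)), then merge; extra time is proportional to the output."""
    r = sorted(r, key=lambda t: t[key_r])
    s = sorted(s, key=lambda t: t[key_s])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        kr, ks = r[i][key_r], s[j][key_s]
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            j_end = j  # find the group of s tuples with this key value
            while j_end < len(s) and s[j_end][key_s] == kr:
                j_end += 1
            while i < len(r) and r[i][key_r] == kr:  # pair each matching r tuple with the group
                out.extend(r[i] + s[k] for k in range(j, j_end))
                i += 1
            j = j_end
    return out


print(project_distinct(ASG, [1]))            # [('P1',), ('P2',)]
print(len(sort_merge_join(EMP, ASG, 0, 0)))  # 3
```

By contrast, the Cartesian product has to emit every pair of tuples, so no evaluation strategy can avoid its O(n²) output.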


Query Optimization Issues

Several issues have to be considered in query optimization

• Types of query optimizers

– wrt the search techniques (exhaustive search, heuristics)
– wrt the time when the query is optimized (static, dynamic)

• Statistics
• Decision sites
• Network topology
• Use of semijoins
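The last item can be illustrated with a Python sketch of the semijoin R ⋉ S (the tuples of R that have a join partner in S), used as a reducer before shipping data between sites, which is one common way semijoins are applied in distributed query processing. The two-site layout, the sample tuples, and the cost measure (number of tuples or join-attribute values shipped) are simplifying assumptions for illustration.

```python
# Hypothetical sample data: EMP(ENO, ENAME) at site A, ASG'(ENO, PNO, DUR) at site B.
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"),
       ("E4", "J. Miller"), ("E5", "B. Casey"), ("E6", "L. Chu")]
ASG_SEL = [("E2", "P1", 40), ("E3", "P2", 48), ("E2", "P3", 45)]


def semijoin(r, s, key_r=0, key_s=0):
    """R semijoin S: tuples of R with at least one matching key value in S."""
    keys = {t[key_s] for t in s}
    return [t for t in r if t[key_r] in keys]


def join(r, s):
    return [e + a[1:] for e in r for a in s if e[0] == a[0]]  # join on ENO


# Plan without semijoin: ship all of EMP from site A to site B.
direct = join(EMP, ASG_SEL)
shipped_direct = len(EMP)

# Semijoin plan: ship the distinct ENO values of ASG' to site A,
# reduce EMP there, and ship only the reduced EMP back to site B.
eno_values = {a[0] for a in ASG_SEL}
emp_reduced = semijoin(EMP, [(v,) for v in eno_values])
reduced = join(emp_reduced, ASG_SEL)
shipped_semijoin = len(eno_values) + len(emp_reduced)

print(sorted(direct) == sorted(reduced), shipped_direct, shipped_semijoin)  # True 6 4
```

The semijoin plan pays for an extra shipment of join-attribute values, so it only pays off when the reduction removes enough tuples.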