Query Optimization
Heuristic Optimization
Lecturer: Dr Hui Ma
Engineering and Computer Science
Outline
Query Processing in DBMS
Query optimization techniques
Heuristic query optimization
Translating SQL into relational algebra
Reordering query operations
Transformation rules for relational algebra operations
Readings from the textbook:
Chapter 19: Algorithms for Query Processing and
Optimization
Chapters 17: Disk Storage, Basic File Structures, and
Hashing (Sections: 17.2, to 17.8)
Chapter 18: Indexing Structures for Files
(Sections: 18.1 to 18.5)
SWEN 304 Lect10: Heuristic Query Optimization 2
Query Processing in DBMS
Users/applications submit queries to the DBMS
The DBMS processes queries before evaluating them
Recall: DBMS mainly use declarative query languages (such
as SQL)
Queries can often be evaluated in different ways
SQL queries do not determine how to evaluate them
SWEN 304 Lect10: Heuristic Query Optimization 3
Query Processing in DBMS
SWEN 304 Lect10: Heuristic Query Optimization 4
Query Optimisation
Query preparation:
Decompose query into query blocks containing
Exactly one SELECT and FROM clause
At most one WHERE, GROUP BY and HAVING clause
Nested queries within a query are identified as separate
query blocks
Aggregate operators in SQL must be included in the extended
algebra
SWEN 304 Lect10: Heuristic Query Optimization 5
Heuristic Query Optimization: An Example
Student({Lname, Fname, StudId, Major})
Grades({StudId, CrsId, Grade})
Course({Cname, CrsId, Points, Dept})
SQLquery:
SELECT StudId, Lname
FROM Student s, Grades g, Course c
WHERE s.StudId=g.StudId AND g.CrsId= c.CrsId
AND Grade = ‘A+’ AND Dept = ‘Comp’ AND Major = ‘Math’;
•Relational Algebra:
StudId, LName (σ s.StudId=g.StudId ˄ g.CrsId=c.CrsId ˄ Grade=‘A+’ ˄ Major = ‘Math’
˄ Dept = ‘Comp’ (Course × (Student × Grades)))
SWEN 304 Lect10: Heuristic Query Optimization 6
Query Tree
Query tree: A tree data structure that corresponds to a
relational algebra expression
It represents the input relations of the query as leaf nodes
of the tree, and
Represents the relational algebra operations as internal
nodes
The initial query tree for the example SQL query:
Lower level nodes, starting from leaves, contain Cartesian
product operators
These are applied onto relations from the SQL FROM
clause
After that are (relational) select and join conditions from
SQL WHERE clause to upper tree nodes applied
Finally is project operator of the SELECT clause attribute
list to tree root applied
SWEN 304 Lect10: Heuristic Query Optimization 7
Query Tree Example
• Relational Algebra:
StudId, LName (σ s.StudId=g.StudId ˄ g.CrsId=c.CrsId ˄ Grade=‘A+’ ˄ Major= ‘Math’ ˄
Dept = ‘Comp’ (Course × (Student × Grades)))
SWEN 304 Lect10: Heuristic Query Optimization 8
Query Tree Exercise
• Draw query trees for the following relational algebra expressions
StudId (σGrade=‘A+’ (Grades))
StudId, LName (σGrade=‘A+’ (Student * Grades))
(StudId,(σ CrsId = ‘M214’ (Grades))) - (StudId,(σ CrsId = ‘C201’ (Grades)))
SWEN 304 Lect10: Heuristic Query Optimization 9
Query Execution Plan
For each query tree, computation proceeds bottom-up:
Child nodes must be executed before their parent nodes
But there can exist multiple methods of executing sibling
nodes, e.g.
Process sequentially
Process in parallel
A query execution plan consists of a query tree with
additional annotation at each node indicating the
implementation method for each RA operator
The query optimizer determines which query execution
plan is optimal, using a variety of algorithms
Realistically, we cannot expect to always find the best
plan but we expect to consistently find a plan that is
good (near-optimal)
SWEN 304 Lect10: Heuristic Query Optimization 10
Query Optimisation
Query optimizer rewrites the naïve (canonical) query plan
into a more efficient evaluation plan
Choosing a suitable query execution plan is called Query
Optimization and it mainly means making decisions
On the order of the execution of basic relational algebra
operations and
On data access methods
SWEN 304 Lect10: Heuristic Query Optimization 11
Query Optimization Techniques
Two main query optimization techniques
One relies on the heuristic reordering of the relational algebra
operations,
The other involves systematic estimating the cost of different
execution plans, and choosing one with the lowest cost
Heuristics vs. cost-based optimization
General heuristics allow to improve performance of most
queries
Costs estimated from statistics allow for a good optimization
of each specific query
Most DBMS use a hybrid approach between heuristics and
cost estimations
SWEN 304 Lect10: Heuristic Query Optimization 12
Query Optimization Techniques
Enumerating potential query plans:
SWEN 304 Lect10: Heuristic Query Optimization 13
Heuristic Query Optimization
Process for heuristics optimization
1. The parser of a high-level query generates an initial
internal representation
2. Apply heuristics rules to optimize the internal
representation
3. A query execution plan is generated to execute
groups of operations based on the access paths
available on the files involved in the query
The main heuristic is to apply first the operations that
reduce the size of intermediate results
E.g., Apply SELECT and PROJECT operations before
applying the JOIN or other binary operations
SWEN 304 Lect10: Heuristic Query Optimization 14
Using Heuristics in Query Optimization (1)
General Transformation Rules for Relational Algebra
Operations:
1. Cascade of : A conjunctive selection condition can be
broken up into a cascade (sequence) of individual
operations:
c1 ʌ c2 ʌ ... ʌ cn(R) = c1 (c2 (...(cn(R))...) )
2. Commutativity of : The operation is commutative:
c1 (c2(R)) = c2 (c1(R))
3. Cascade of : In a cascade (sequence) of operations,
all but the last one can be ignored:
List1 (List2 (...(Listn(R))...) ) = List1(R)
4. Commuting with : If the selection condition c involves
only the attributes A1, ..., An in the projection list, the
two operations can be commuted:
A1, A2, ..., An (c (R)) = c (A1, A2, ..., An (R))
SWEN 304 Lect10: Heuristic Query Optimization 15
Using Heuristics in Query Optimization (2)
General Transformation Rules for Relational Algebra
Operations (contd.):
5. Commutativity of ( and x ):
R1 C R2 = R2 CR1; R1 x R2 = R2 x R1
6. Commuting with (or x ): If all the attributes in the selection
condition c involve only the attributes of one of the relations
being joined—say, R1—the two operations can be commuted
as follows:
c (R1 R2 ) = c (R1) R2
Alternatively, if the selection condition c can be written as (c1
and c2), where condition c1 involves only the attributes of R1
and condition c2 involves only the attributes of R2, the
operations commute as follows:
c(R1 R2) = c1(R1) c2 (R2)
SWEN 304 Lect10: Heuristic Query Optimization 16
Using Heuristics in Query Optimization (3)
General Transformation Rules for Relational Algebra
Operations (contd.):
7. Commuting with (or x): Suppose that the projection list
is AL = {A1, ..., An, B1, ..., Bm}, where A1, ..., An are
attributes of R1 and B1, ..., Bm are attributes of R2. If the join
condition c involves only attributes in AL, the two operations
can be commuted as follows:
AL (R1 C R2) = (A1, ..., An (R1)) C( B1, ..., Bm (R2))
If the join condition C contains additional attributes not in
AL, these must be added to the projection list, and a final
operation is needed.
AL(R1 CR2) = AL(AL1(R1) CAL2(R2)),
where ALi contains the common attributes in Ri and AL and
the common attributes of R1 and R2
SWEN 304 Lect10: Heuristic Query Optimization 17
Using Heuristics in Query Optimization (4)
General Transformation Rules for Relational Algebra
Operations (contd.):
8. Commutativity of set operations: The set operations ∪
and ∩ are commutative but “–” is not.
9. Associativity of , x, ∪, and ∩ : These four operations
are individually associative; that is, if stands for any
one of these four operations (throughout the
expression), we have
(R1 R2) R3 = R1 (R2 R3)
10. Commuting with set operations: The operation
commutes with ∪, ∩, and –. If stands for any one of
these three operations, we have
c (R1 R2) = (c (R1)) (c (R2))
SWEN 304 Lect10: Heuristic Query Optimization 18
Using Heuristics in Query Optimization (5)
General Transformation Rules for Relational Algebra
Operations (contd.):
11. The operation commutes with ∪.
AL (R1 ∪ R2) = (AL (R1)) ∪ (AL (R2))
12. Converting a (, x) sequence into : If the condition c
of a that follows a x corresponds to a join condition,
convert the (, x) sequence into a as follows:
(C (R1 x R2)) = R1 C R2
SWEN 304 Lect10: Heuristic Query Optimization 19
Using Heuristics in Query Optimization (6)
Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations
that reduce the size of intermediate results.
2. Perform select operations as early as possible to
reduce the number of tuples and perform project
operations as early as possible to reduce the number
of attributes. (This is done by moving select and
project operations as far down the tree as possible,
e.g. rule 6, 7, 10, 11)
3. The select and join operations that are most
restrictive should be executed before other similar
operations. (This is done by reordering the leaf
nodes of the tree among themselves and adjusting
the rest of the tree appropriately.)
SWEN 304 Lect10: Heuristic Query Optimization 20
An Example Initial Query Tree:
Relational algebra query
StudId, LName (σGrade=‘A+’ ˄ Major = ‘Math’ ˄ Dept = ‘Comp’ ˄
s.StudId=g.StudId ˄ g.CrsId=c.CrsId (Course × (Student × Grades)))
Query tree
SWEN 304 Lect10: Heuristic Query Optimization 21
Analysis of the Initial Query Tree
According to the structure of the initial query
tree, two Cartesian products should be executed
first
The main heuristic: Apply SELECT and PROJECT
operations before applying the JOIN or other
binary operations
Hence, move select operations down the tree
SWEN 304 Lect10: Heuristic Query Optimization 22
Query Tree After Moving Down Selects
Move select operations down the tree (Rule 6)
SWEN 304 Lect10: Heuristic Query Optimization 23
Query Tree After Introducing Joins
Replacing each Cartesian product followed by a select
according to a join condition with a join operator (Rule 12)
SWEN 304 Lect10: Heuristic Query Optimization 24
Query Tree After Switching Join Nodes
Assume there are less Comp courses than Math major students
switching Course and Student so that the very restrictive select
operation could be applied as early as possible (Rule 9)
SWEN 304 Lect10: Heuristic Query Optimization 25
Query Tree after Pushing down Project
keeping in intermediate relations only the attributes needed by
subsequent operations by applying project () operations as early as
possible (Rule 7)
SWEN 304 Lect10: Heuristic Query Optimization 26
Effect of Project Operations
Performing project operations as early as
possible brings an improvement in the efficiency
of a query execution
tuples get shorter after projection, more of them
will fit into a file block of the same size
the same number of tuples will be contained in a
smaller number of blocks
As there will be less blocks to be processed by
subsequent operations, the query execution will
be faster
SWEN 304 Lect10: Heuristic Query Optimization 27
Query Optimisation
All transformations can be applied to the canonical
evaluation plan
However, there is no best operator sequence that is
always optimal
Efficiency depends on the current data instance, the
actual implementation of base operations, access paths
and indexes, etc.
Idea: assign average costs to operators (nodes) and
aggregate costs for each query plan
SWEN 304 Lect10: Heuristic Query Optimization 28
Summary
Heuristic optimization converts a declarative
query to a canonical algebraic query tree that is
then gradually transformed using transformation
rules
The main heuristics is to perform unary
relational operations (selection and projection)
before binary operations (joins, set theoretic),
and aggregate functions with (or without)
grouping
SWEN 304 Lect10: Heuristic Query Optimization 29
Next
Cost-based query optimization
Readings:
Chapter 19: Algorithms for Query Processing and
Optimization
Chapters 17: Disk Storage, Basic File Structures, and
Hashing (Sections: 17.2, to 17.8)
Chapter 18: Indexing Structures for Files
(Sections: 17.1 to 17.5)
Supposed knowledge: File Organization – COMP261
SWEN 304 Lect10: Heuristic Query Optimization 30