04 Advanced Database System Chap 02 [RVUNC]

Chapter Two discusses query processing and optimization, detailing the steps of parsing, optimization, and evaluation of SQL queries. It explains the internal representations of queries, such as query trees and graphs, and the importance of indexes in improving query execution time. Additionally, the chapter covers algorithms for sorting and implementing SELECT and JOIN operations, emphasizing the role of heuristics and cost estimates in query optimization.

Uploaded by

tagesseabate887


CHAPTER TWO

Query Processing and Optimization

Chapter Outline
• Introduction to Query Processing
• Translating SQL Queries into Relational Algebra
• Basic algorithms
  – Sorting: internal sorting and external sorting
  – Implementing the SELECT operation
  – Implementing the JOIN operation
  – Implementing the PROJECT operation
• Using Heuristics in Query Optimization
• Using Selectivity and Cost Estimates in Query Optimization
• Semantic Query Optimization

4/11/2022 1
Introduction to Query Processing

• A query expressed in a high-level query language such as

SQL must first be scanned, parsed, and validated.
• The scanner identifies the language tokens—such as
SQL keywords, attribute names, and relation names—in
the text of the query.
• The parser checks the query syntax to determine whether it
is formulated according to the syntax rules (rules of grammar)
of the query language.
• The query must also be validated, by checking that all
attribute and relation names are valid and semantically
meaningful names in the schema of the particular database
being queried.

Cont…

• An internal representation of the query is then created, usually

as a tree data structure called a query tree.

• It is also possible to represent the query using a graph data

structure called a query graph.

• The DBMS must then devise an execution strategy for

retrieving the result of the query from the database files.

• A query typically has many possible execution strategies, and

the process of choosing a suitable one for processing a
query is known as query optimization.

Cont…

• Query optimization: The process of choosing a suitable execution

strategy for processing a query.

• Two internal representations of a query:
  – Query tree
  – Query graph
• The query optimizer module has the task of producing an execution plan,
and the code generator generates the code to execute that plan.
• The runtime database processor has the task of running the query code,
whether in compiled or interpreted mode, to produce the query result.
• If a runtime error results, an error message is generated by the runtime
database processor.

Cont…

• There are three phases that a query passes through during the DBMS
processing of that query:
  – Parsing and translation
  – Optimization
  – Evaluation

• Most queries submitted to a DBMS are in a high-level language such as

SQL.

• During the parsing and translation stage, the human readable form of
the query is translated into forms usable by the DBMS.

• These forms include a relational algebra expression, a query tree, and a
query graph.

Parsing and Translating the Query
• The first step in processing a query submitted to a DBMS is to convert the
query into a form usable by the query processing engine.
• High-level query languages such as SQL represent a query as a string, or
sequence, of characters.
• Certain sequences of characters represent various types of tokens such as
keywords, operators, operands, literal strings, etc. Like all languages, there are
rules (syntax and grammar) that govern how the tokens can be combined into
understandable (i.e. valid) statements.
• The primary job of the parser is to extract the tokens from the raw string
of characters and translate them into the corresponding internal data
elements (i.e. relational algebra operations and operands) and structures (i.e.
query tree, query graph).
• The last job of the parser is to verify the validity and syntax of the original
query string.
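
The scanning step described above can be sketched in a few lines of Python. This is only an illustration: the token classes, the tiny keyword list, and the `scan` function are assumptions made for this example and do not correspond to any particular DBMS.

```python
import re

# Hypothetical token classes for a tiny SQL subset (an assumption for illustration).
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:SELECT|FROM|WHERE|AND|OR)\b"),
    ("NUMBER", r"\d+"),
    ("STRING", r"'[^']*'"),
    ("IDENT", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OP", r"<=|>=|<>|=|<|>"),
    ("PUNCT", r"[(),;*.]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def scan(query):
    """Split the raw query string into (token_class, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            # A character that fits no token class makes the query invalid.
            raise ValueError(f"unexpected character {query[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = scan("SELECT LNAME FROM EMPLOYEE WHERE DNO = 5;")
```

A parser would then check this token sequence against the grammar and build the internal representation from it.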
Optimizing the Query

• In this stage, the query processor applies rules to the internal data structures
of the query to transform these structures into equivalent, but more efficient
representations.

• The rules can be based upon mathematical models of the relational

algebra expression and tree (heuristics), upon cost estimates of different
algorithms applied to operations or upon the semantics within the query
and the relations it involves.

• Selecting the proper rules to apply, when to apply them, and how they are
applied is the function of the query optimization engine.

Evaluating the Query
• The final step in processing a query is the evaluation phase. The
best evaluation plan candidate generated by the optimization engine is
selected and then executed.
• Note that there can be multiple methods of executing a query. Besides
processing a query in a simple sequential manner, some of a query's
individual operations can be processed in parallel, either as independent
processes or as interdependent pipelines of processes or threads.
• Regardless of the method chosen, the actual results should be the same.
• The term optimization is actually a misnomer because in some cases the
chosen execution plan is not the optimal (best) strategy—it is just a
reasonably efficient strategy for executing the query.
• Finding the optimal strategy is usually too time-consuming except for the
simplest of queries and may require information on how the files are
implemented and even on the contents of the files—information that may
not be fully available in the DBMS catalog.
• Hence, planning of an execution strategy may be a more accurate
description than query optimization.
THE ROLE OF INDEXES
• The utilization of indexes can dramatically reduce the execution time of various
operations such as select and join.
• Let us review some of the types of index file structures and the roles they play
in reducing execution time and overhead:
• Dense index: The data file is ordered by the search key and every search-key
value has a separate index record.
• This structure requires only a single seek to find the first occurrence of a set of
contiguous records with the desired search value.
• Sparse index: The data file is ordered by the index search key and only some of
the search-key values have corresponding index records. Each index record's
data-file pointer points to the first data-file record with that search-key value.
• While a sparse index can be less efficient (in terms of number of disk
accesses) than a dense index for finding the desired records, it requires less
storage space and less overhead during insertion and deletion operations.
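
A sparse-index lookup can be sketched as follows. The block layout, the sample keys, and the function name `sparse_lookup` are hypothetical; the sketch assumes one index entry per data block, holding the first key stored in that block.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of (key, value) records, ordered by search key.
blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]
# Sparse index: one entry per block, holding the first key in that block.
sparse_index = [block[0][0] for block in blocks]


def sparse_lookup(key):
    """Locate the block through the sparse index, then scan inside that block."""
    # Last index entry whose key is <= the search key.
    i = bisect_right(sparse_index, key) - 1
    if i < 0:
        return None  # key is smaller than every key in the file
    for k, value in blocks[i]:
        if k == key:
            return value
    return None
```

Only keys 10, 30, and 50 have index entries here, so finding key 40 costs one index probe plus a scan of the second block. A dense index would point at the record directly, at the price of six index entries instead of three.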

Cont…

• Primary index: The data file is ordered by the attribute that is also the
search key in the index file. Primary indices can be dense or sparse. This
is also referred to as an index-sequential file. For scanning through a
relation's records in sequential order by a key value, this is one of the
fastest and most efficient structures: locating a record has a cost of 1
seek, and the contiguous makeup of the records in sorted order
minimizes the number of blocks that have to be read.
• However, after large numbers of insertions and deletions, the
performance can degrade quite quickly, and the only way to restore the
performance is to reorganize the file.
Cont…
• Secondary Index: The data file is ordered by an attribute that is different
from the search key in the index file. Secondary indices must be dense.

• Multi-level index: An index structure consisting of 2 or more tiers of
records where an upper tier's records point to associated index records of
the tier below. The bottom tier's index records contain the pointers to the
data-file records. Multi-level indices can be used, for instance, to reduce the
number of disk block reads needed during a binary search.
Cont…

Clustering Index: A two-level index structure where the records in the first
level contain the clustering field value in one field and a second field pointing
to a block [of 2nd level records] in the second level.
The records in the second level have one field that points to an actual data file
record or to another 2nd level block.

Translating SQL Queries into Relational Algebra
• SQL is the query language that is used in most commercial RDBMSs.

• An SQL query is first translated into an equivalent extended relational

algebra expression—represented as a query tree data structure—that is then
optimized.

• Typically, SQL queries are decomposed into query blocks, which form the
basic units that can be translated into the algebraic operators and optimized.

• A query block contains a single SELECT-FROM-WHERE expression, as
well as GROUP BY and HAVING clauses if these are part of the block.

• Hence, nested queries within a query are identified as separate query

blocks.

Cont…

For example, consider the COMPANY relational database schema.

Cont…

Consider the following SQL query on the EMPLOYEE relation:

SELECT LNAME, FNAME FROM EMPLOYEE
WHERE SALARY > (SELECT MAX (SALARY)
FROM EMPLOYEE WHERE DNO=5);
• The outer block is
SELECT LNAME, FNAME FROM EMPLOYEE
WHERE SALARY > c
where c represents the result returned from the inner block.
• The inner block is
SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO=5
• The inner block could be translated into the extended relational algebra
expression
ℱ_{MAX SALARY}(σ_{DNO=5}(EMPLOYEE))
where ℱ denotes the aggregate function operator, and the outer block into the
expression
π_{LNAME, FNAME}(σ_{SALARY>c}(EMPLOYEE))

Cont…

• The query optimizer would then choose an execution plan for each block.

• We should note that in the above example, the inner block needs to be
evaluated only once to produce the maximum salary, which is then used—
as the constant c—by the outer block.

• This is called an uncorrelated nested query.

• It is much harder to optimize the more complex correlated nested
queries, where a tuple variable from the outer block appears in the
WHERE clause of the inner block.
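
The evaluation order for the uncorrelated case can be illustrated with a small Python sketch. The EMPLOYEE tuples below are made-up sample data and the function names are hypothetical; the point is only that the inner block runs once and its result is reused as the constant c.

```python
# Hypothetical EMPLOYEE tuples; the values are made up for illustration.
employees = [
    {"LNAME": "Smith", "FNAME": "John", "SALARY": 30000, "DNO": 5},
    {"LNAME": "Wong", "FNAME": "Franklin", "SALARY": 40000, "DNO": 5},
    {"LNAME": "Borg", "FNAME": "James", "SALARY": 55000, "DNO": 1},
    {"LNAME": "Wallace", "FNAME": "Jennifer", "SALARY": 43000, "DNO": 4},
]


def inner_block(rows):
    """SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO=5 (evaluated only once)."""
    return max(row["SALARY"] for row in rows if row["DNO"] == 5)


def outer_block(rows, c):
    """SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > c."""
    return [(row["LNAME"], row["FNAME"]) for row in rows if row["SALARY"] > c]


c = inner_block(employees)          # the constant c used by the outer block
result = outer_block(employees, c)  # outer block scans EMPLOYEE once
```

In a correlated nested query the inner block would refer to the current outer tuple, so a naive plan would have to re-run it for each outer tuple.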

Cont…

Ex. 1: Let R = (A, B, C) and S = (D, E, F), and let relations r(R) and s(S) be given.
Each SQL query below is shown with its equivalent relational algebra expression.

A. SELECT DISTINCT A FROM r                      π_{A}(r)
B. SELECT * FROM r WHERE B = 17                  σ_{B=17}(r)
C. SELECT DISTINCT * FROM r, s                   r × s
D. SELECT DISTINCT A, F FROM r, s WHERE C = D    π_{A,F}(σ_{C=D}(r × s))

Ex. 2: Let R = (A, B, C), and let r1 and r2 both be relations on schema R.

A. (SELECT * FROM r1) UNION (SELECT * FROM r2)                    r1 ∪ r2
B. SELECT * FROM r1 WHERE (A, B, C) IN (SELECT * FROM r2)         r1 ∩ r2
C. SELECT * FROM r1 WHERE (A, B, C) NOT IN (SELECT * FROM r2)     r1 − r2

Cont…
• Example
For every project located in ‘Stafford’, retrieve the project number, the
controlling department number and the department manager’s last name,
address and birth date.
• SQL query:
SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND
P.PLOCATION='STAFFORD';
• Relational algebra:
π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((σ_{PLOCATION='STAFFORD'}(PROJECT))
⋈_{DNUM=DNUMBER} (DEPARTMENT)) ⋈_{MGRSSN=SSN} (EMPLOYEE))

Algorithms for External Sorting
• Sorting is one of the primary algorithms used in query processing.

• For example, whenever an SQL query specifies an ORDER BY clause,
the query result must be sorted.

• Sorting is also a key component in sort-merge algorithms used for
JOIN and other operations (such as UNION and INTERSECTION), and
in duplicate elimination algorithms for the PROJECT
operation (when an SQL query specifies the DISTINCT option
in the SELECT clause).

External sorting:

– Refers to sorting algorithms that are suitable for large files of records
stored on disk that do not fit entirely in main memory, such as most
database files.
• Sort-Merge strategy:
– Starts by sorting small subfiles (runs) of the main file and then merges
the sorted runs, creating larger sorted subfiles that are merged in turn.

• The sort-merge algorithm, like other database algorithms,

requires buffer space in main memory, where the actual
sorting and merging of the runs is performed.
• The basic algorithm consists of two phases: the sorting
phase and the merging phase.
Cont…
– Sorting phase: nR = ⌈b/nB⌉
– Merging phase: dM = min(nB − 1, nR); nP = ⌈log_dM(nR)⌉
– nR: number of initial runs; b: number of file blocks;
– nB: available buffer space (in blocks); dM: degree of merging;
– nP: number of merge passes.
• In the sorting phase, runs (portions or pieces) of the file that can fit in the
available buffer space are read into main memory, sorted using an internal
sorting algorithm, and written back to disk as temporary sorted subfiles (or
runs).
• The size of each run and the number of initial runs (nR) are dictated by
the number of file blocks (b) and the available buffer space (nB).

For example

• If the number of available main memory buffers is nB = 5 disk
blocks and the size of the file is b = 1024 disk blocks, then

• nR = ⌈b/nB⌉ = ⌈1024/5⌉ = 205 initial runs, each of size 5 blocks
(except the last run, which will have only 4 blocks).

• Hence, after the sorting phase, 205 sorted runs (or 205 sorted
subfiles of the original file) are stored as temporary subfiles on
disk.

Cont…

• In the merging phase, the sorted runs are merged during one or more
merge passes. Each merge pass can have one or more merge steps.

• The degree of merging (dM) is the number of sorted subfiles that can be
merged in each merge step.

• During each merge step, one buffer block is needed to hold one disk block
from each of the sorted subfiles being merged, and one additional buffer is
needed for containing one disk block of the merge result, which will
produce a larger sorted file that is the result of merging several smaller
sorted subfiles.

• Hence, dM is the smaller of (nB − 1) and nR, and the number of merge
passes is ⌈log_dM(nR)⌉.

Cont…

• In our example where nB = 5, dM = 4 (four-way merging), so the 205

initial sorted runs would be merged 4 at a time in each step into 52 larger
sorted subfiles at the end of the first merge pass.

• These 52 sorted files are then merged 4 at a time into 13 sorted files, which
are then merged into 4 sorted files, and then finally into 1 fully sorted file,
which means that four passes are needed.
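
The arithmetic of this example can be checked with a short Python sketch. The function name `external_sort_plan` is hypothetical; it only counts runs and passes from b and nB as defined above and does not sort any data.

```python
def external_sort_plan(b, nB):
    """Return (nR, dM, runs_after_each_pass) for a file of b blocks and nB buffers."""
    if nB < 3:
        # With fewer than 3 buffers the degree of merging would be below 2.
        raise ValueError("at least 3 buffer blocks are needed to merge runs")
    nR = -(-b // nB)        # ceiling of b / nB: number of initial sorted runs
    dM = min(nB - 1, nR)    # degree of merging (one buffer is kept for output)
    runs_after_each_pass = []
    runs = nR
    while runs > 1:
        runs = -(-runs // dM)  # each merge step combines up to dM runs
        runs_after_each_pass.append(runs)
    return nR, dM, runs_after_each_pass


nR, dM, runs_after_each_pass = external_sort_plan(b=1024, nB=5)
nP = len(runs_after_each_pass)  # number of merge passes
```

For b = 1024 and nB = 5 this reproduces the sequence from the slides: 205 initial runs, then 52, 13, 4, and 1 after the four merge passes.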

Algorithms for SELECT and JOIN Operations
Implementing the SELECT Operation
• Examples:
– (OP1): σ_{SSN='123456789'}(EMPLOYEE)
– (OP2): σ_{DNUMBER>5}(DEPARTMENT)
– (OP3): σ_{DNO=5}(EMPLOYEE)
– (OP4): σ_{DNO=5 AND SALARY>30000 AND SEX='F'}(EMPLOYEE)
– (OP5): σ_{ESSN='123456789' AND PNO=10}(WORKS_ON)
Search Methods for Simple Selection:
S1 Linear search (brute force): Retrieve every record in the file,
and test whether its attribute values satisfy the selection
condition.

Cont…

S2 Binary search: If the selection condition involves an equality
comparison on a key attribute on which the file is ordered,
binary search (which is more efficient than linear search) can
be used. (See OP1.)
S3 Using a primary index or hash key to retrieve a single
record: If the selection condition involves an equality
comparison on a key attribute with a primary index (or a hash
key), use the primary index (or the hash key) to retrieve the
record.
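
Methods S1 and S2 can be contrasted with a minimal in-memory sketch. The sample records and function names are hypothetical, and a Python list stands in for a file that is ordered on SSN; a real DBMS would search over disk blocks rather than individual records.

```python
# Hypothetical EMPLOYEE file, ordered on the key attribute SSN.
employee_file = [
    {"SSN": "123456789", "LNAME": "Smith"},
    {"SSN": "333445555", "LNAME": "Wong"},
    {"SSN": "666884444", "LNAME": "Narayan"},
    {"SSN": "987654321", "LNAME": "Wallace"},
]


def linear_search(records, attr, value):
    """S1: test every record against the selection condition."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, attr, value):
    """S2: equality on a key attribute on which the file is ordered."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        current = sorted_records[mid][attr]
        if current == value:
            return sorted_records[mid]
        if current < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # no record with that key value
```

Both functions answer OP1, but the binary search inspects about log2(n) records instead of all n, which is why it is preferred when the file is ordered on the selection attribute.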

Cont…

S4 Using a primary index to retrieve multiple records: If the comparison

condition is >, ≥, <, or ≤ on a key field with a primary index, use the index to find
the record satisfying the corresponding equality condition, then retrieve all
subsequent records in the (ordered) file.
S5 Using a clustering index to retrieve multiple records: If the selection
condition involves an equality comparison on a non-key attribute with a clustering
index, use the clustering index to retrieve all the records satisfying the selection
condition.
S6 Using a secondary (B+-tree) index: On an equality comparison, this search
method can be used to retrieve a single record if the indexing field has unique
values (is a key) or to retrieve multiple records if the indexing field is not a key.
In addition, it can be used to retrieve records on conditions involving >, >=, <, or
<= (that is, for range queries).

Search Methods for Complex Selection:
S7 Conjunctive selection: If an attribute involved in any single
simple condition in the conjunctive condition has an access path
that permits the use of one of the methods S2 to S6, use that
condition to retrieve the records and then check whether each
retrieved record satisfies the remaining simple conditions in the
conjunctive condition.
S8 Conjunctive selection using a composite index
If two or more attributes are involved in equality conditions in the
conjunctive condition and a composite index (or hash structure)
exists on the combined field, we can use the index directly.
Cont…

S9 Conjunctive selection by intersection of record pointers:
– This method is possible if secondary indexes are available on all (or
some of) the fields involved in equality comparison conditions in the
conjunctive condition and if the indexes include record pointers (rather
than block pointers).
– Each index can be used to retrieve the record pointers that satisfy the
individual condition.
– The intersection of these sets of record pointers gives the record
pointers that satisfy the conjunctive condition, which are then used to
retrieve those records directly.
– If only some of the conditions have secondary indexes, each
retrieved record is further tested to determine whether it satisfies the
remaining conditions.
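
Method S9 can be sketched for OP4 as follows. The records, the record ids used as "record pointers", and the helper names are all hypothetical; the sketch assumes secondary indexes exist on DNO and SEX but not on SALARY, so the salary condition is tested on the retrieved records.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE records keyed by record id (standing in for record pointers).
records = {
    1: {"DNO": 5, "SEX": "M", "SALARY": 30000},
    2: {"DNO": 5, "SEX": "F", "SALARY": 40000},
    3: {"DNO": 4, "SEX": "F", "SALARY": 43000},
    4: {"DNO": 5, "SEX": "F", "SALARY": 25000},
}


def build_secondary_index(attr):
    """Map each attribute value to the set of record pointers having that value."""
    index = defaultdict(set)
    for rid, rec in records.items():
        index[rec[attr]].add(rid)
    return index


dno_index = build_secondary_index("DNO")
sex_index = build_secondary_index("SEX")

# Intersect the pointer sets for the two indexed equality conditions.
candidates = dno_index.get(5, set()) & sex_index.get("F", set())
# Fetch only the candidates and test the remaining (non-indexed) condition.
result = sorted(rid for rid in candidates if records[rid]["SALARY"] > 30000)
```

Only the records in the intersection are fetched, so the non-indexed condition is evaluated on two records here rather than on the whole file.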

Cont…

– Whenever a single condition specifies the selection, we can

only check whether an access path exists on the attribute involved
in that condition.
• If an access path exists, the method corresponding to that
access path is used; otherwise, the “brute force” linear search
approach of method S1 is used. (See OP1, OP2 and OP3)
– For conjunctive selection conditions, whenever more than
one of the attributes involved in the conditions have an access
path, query optimization should be done to choose the access path
that retrieves the fewest records in the most efficient way.
Implementing the JOIN Operation:

– Join (EQUIJOIN, NATURAL JOIN)

• Two-way join: a join on two files

• e.g. R ⋈_{A=B} S

• Multi-way join: a join involving more than two files

• e.g. R ⋈_{A=B} S ⋈_{C=D} T

• Examples

– (OP6): EMPLOYEE ⋈_{DNO=DNUMBER} DEPARTMENT

– (OP7): DEPARTMENT ⋈_{MGRSSN=SSN} EMPLOYEE

Methods for implementing joins:

– J1 Nested-loop join (brute force):

• For each record t in R (outer loop), retrieve every record s
from S (inner loop) and test whether the two records satisfy
the join condition t[A] = s[B].
– J2 Single-loop join (Using an access structure to retrieve the
matching records):
• If an index (or hash key) exists for one of the two join
attributes say, B of S — retrieve each record t in R, one at a
time, and then use the access structure to retrieve directly all
matching records s from S that satisfy s[B] = t[A].
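
Methods J1 and J2 can be compared in a minimal in-memory sketch for OP6. The sample tuples and function names are hypothetical, and a Python dictionary stands in for the index (or hash key) that J2 assumes already exists on the join attribute B of S.

```python
from collections import defaultdict

# Hypothetical sample data for EMPLOYEE (R) and DEPARTMENT (S).
employee = [
    {"LNAME": "Smith", "DNO": 5},
    {"LNAME": "Wong", "DNO": 5},
    {"LNAME": "Wallace", "DNO": 4},
]
department = [
    {"DNAME": "Research", "DNUMBER": 5},
    {"DNAME": "Administration", "DNUMBER": 4},
    {"DNAME": "Headquarters", "DNUMBER": 1},
]


def nested_loop_join(R, S, a, b):
    """J1: compare every record t of R with every record s of S."""
    return [(t, s) for t in R for s in S if t[a] == s[b]]


def single_loop_join(R, S, a, b):
    """J2: one pass over R, probing an access structure on S.B."""
    index = defaultdict(list)  # stands in for an existing index on S.B
    for s in S:
        index[s[b]].append(s)
    return [(t, s) for t in R for s in index.get(t[a], [])]


j1 = nested_loop_join(employee, department, "DNO", "DNUMBER")
j2 = single_loop_join(employee, department, "DNO", "DNUMBER")
```

Both produce the same pairs. J1 performs |R| × |S| comparisons, while J2 performs one index probe per record of R.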

Cont…
– J3 Sort-merge join:

• If the records of R and S are physically sorted (ordered) by value of

the join attributes A and B, respectively, we can implement the join in
the most efficient way possible.

• Both files are scanned in order of the join attributes, matching the
records that have the same values for A and B.

• In this method, the records of each file are scanned only once each
for matching with the other file—unless both A and B are non-key
attributes, in which case the method needs to be modified slightly.
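
A sort-merge join can be sketched as below. Two points are assumptions of this sketch rather than part of the slides: the inputs are sorted inside the function (the method itself presumes they are already physically sorted), and groups of equal values on both sides are paired in full, which is one way to handle the non-key case mentioned above.

```python
def sort_merge_join(R, S, a, b):
    """J3: scan both inputs in join-attribute order, pairing equal values."""
    R = sorted(R, key=lambda t: t[a])  # assumption: sort here for the sketch
    S = sorted(S, key=lambda s: s[b])
    result = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][a] < S[j][b]:
            i += 1
        elif R[i][a] > S[j][b]:
            j += 1
        else:
            value = R[i][a]
            # Find the end of the group of equal values on each side.
            i_end = i
            while i_end < len(R) and R[i_end][a] == value:
                i_end += 1
            j_end = j
            while j_end < len(S) and S[j_end][b] == value:
                j_end += 1
            # Pair every record of one group with every record of the other.
            for t in R[i:i_end]:
                for s in S[j:j_end]:
                    result.append((t, s))
            i, j = i_end, j_end
    return result


# Hypothetical sample data with a duplicate join value on the R side.
r_rows = [{"A": 2, "x": "r2a"}, {"A": 1, "x": "r1"}, {"A": 2, "x": "r2b"}, {"A": 4, "x": "r4"}]
s_rows = [{"B": 2, "y": "s2"}, {"B": 3, "y": "s3"}, {"B": 4, "y": "s4"}]
pairs = [(t["x"], s["y"]) for t, s in sort_merge_join(r_rows, s_rows, "A", "B")]
```

After the sort, each input is read once from start to end; only the groups with matching values are revisited.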

Cont…

– J4 Hash-join:
• The records of files R and S are both hashed to the same hash
file, using the same hashing function on the join attributes A of R
and B of S as hash keys.
• A single pass through the file with fewer records (say, R)
hashes its records to the hash file buckets.
• A single pass through the other file (S) then hashes each of its
records to the appropriate bucket, where the record is combined
with all matching records from R.
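
The two passes of the hash join can be sketched in memory as follows. A Python dictionary stands in for the hash file and its buckets, which is an assumption of this sketch; the sample rows and the function name are hypothetical.

```python
from collections import defaultdict


def hash_join(R, S, a, b):
    """J4: hash the smaller file R into buckets, then probe with each record of S."""
    buckets = defaultdict(list)
    for t in R:                 # first pass: hash R on join attribute A
        buckets[t[a]].append(t)
    result = []
    for s in S:                 # second pass: hash each S record on B and probe
        for t in buckets.get(s[b], []):
            result.append((t, s))
    return result


# Hypothetical sample data: DEPARTMENT is the smaller file, EMPLOYEE the larger one.
dept_rows = [{"DNUMBER": 5, "DNAME": "Research"}, {"DNUMBER": 4, "DNAME": "Administration"}]
emp_rows = [
    {"LNAME": "Smith", "DNO": 5},
    {"LNAME": "Wallace", "DNO": 4},
    {"LNAME": "Borg", "DNO": 1},
]
joined = [(t["DNAME"], s["LNAME"]) for t, s in hash_join(dept_rows, emp_rows, "DNUMBER", "DNO")]
```

Each file is read exactly once, which is why the method works best when the hashed file fits in the available buffer space.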

Cont…

• Factors affecting JOIN performance

– Available buffer space
– Join selection factor
– Choice of inner VS outer relation

Using Heuristics in Query Optimization

• Process for heuristics optimization

1. The parser of a high-level query generates an initial internal
representation;
2. Apply heuristic rules to optimize the internal representation.
3. A query execution plan is generated to execute groups of operations
based on the access paths available on the files involved in the query.
• The main heuristic is to apply first the operations that reduce the size of
intermediate results.
– E.g., Apply SELECT and PROJECT operations before applying the
JOIN or other binary operations.
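
The effect of the main heuristic can be seen in a small sketch that evaluates the same query under two plans. The PROJECT and DEPARTMENT tuples are made-up sample data, and `join` and `select` are hypothetical helpers written for this example.

```python
def join(R, S, a, b):
    """Nested-loop equijoin that merges matching tuples into one dictionary."""
    return [{**t, **s} for t in R for s in S if t[a] == s[b]]


def select(R, predicate):
    """Keep only the tuples that satisfy the selection condition."""
    return [t for t in R if predicate(t)]


# Hypothetical sample data.
projects = [
    {"PNUMBER": 1, "PLOCATION": "Bellaire", "DNUM": 5},
    {"PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNUMBER": 20, "PLOCATION": "Houston", "DNUM": 1},
    {"PNUMBER": 30, "PLOCATION": "Stafford", "DNUM": 4},
]
departments = [
    {"DNUMBER": 1, "MGRSSN": "888665555"},
    {"DNUMBER": 4, "MGRSSN": "987654321"},
    {"DNUMBER": 5, "MGRSSN": "333445555"},
]


def in_stafford(t):
    return t["PLOCATION"] == "Stafford"


# Plan 1: JOIN first, then SELECT.
intermediate_1 = join(projects, departments, "DNUM", "DNUMBER")
plan_1 = select(intermediate_1, in_stafford)

# Plan 2: SELECT first, then JOIN (the heuristic ordering).
intermediate_2 = select(projects, in_stafford)
plan_2 = join(intermediate_2, departments, "DNUM", "DNUMBER")
```

Both plans return the same tuples, but the intermediate result of the second plan holds 2 tuples instead of 4. On realistic file sizes that difference is what the heuristic is after.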

Cont…

• Query tree:
– A tree data structure that corresponds to a relational algebra expression.
– It represents the input relations of the query as leaf nodes of the tree, and
represents the relational algebra operations as internal nodes.
– An execution of the query tree consists of executing an internal node
operation whenever its operands are available and then replacing that internal
node by the relation that results from executing the operation.
• Query graph:
– A graph data structure that corresponds to a relational calculus expression.
– It does not indicate the order in which operations are to be performed.
– There is only a single graph corresponding to each query.

Cont…
• Example: For every project located in ‘Stafford’, retrieve the
project number, the controlling department number and the
department manager’s last name, address and birthdate.
• Relational algebra:
π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((σ_{PLOCATION='STAFFORD'}(PROJECT))
⋈_{DNUM=DNUMBER} (DEPARTMENT)) ⋈_{MGRSSN=SSN} (EMPLOYEE))

• SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND
D.MGRSSN=E.SSN AND
P.PLOCATION='STAFFORD';
Cont…
• Heuristic Optimization of Query Trees:
– The same query could correspond to many different
relational algebra expressions — and hence many different
query trees.
– The task of heuristic optimization of query trees is to find a
final query tree that is efficient to execute.
• Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND
PNUMBER=PNO AND ESSN=SSN
AND BDATE > '1957-12-31';

Using Selectivity and Cost Estimates in Query Optimization
• A query optimizer does not depend solely on heuristic rules; it
also estimates and compares the costs of executing a query
using different execution strategies and algorithms, and it then
chooses the strategy with the lowest cost estimate.
• For this approach to work, accurate cost estimates are required
so that different strategies can be compared fairly and
realistically.
• In addition, the optimizer must limit the number of execution
strategies to be considered; otherwise, too much time will be
spent making cost estimates for the many possible execution
strategies.
• Hence, this approach is more suitable for compiled queries where
the optimization is done at compile time and the resulting execution
strategy code is stored and executed directly at runtime.
Cont…

• Cost-based query optimization:

– Estimate and compare the costs of executing a query using different
execution strategies and choose the strategy with the lowest cost
estimate. (Compare to heuristic query optimization)
• Issues
– Cost function
– Number of execution strategies to be considered
• Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
Note: Different database systems may focus on different cost
components.
Cont…
• Examples of Cost Functions for SELECT
• Notation: b = number of file blocks; r = number of records in the file;
bfr = blocking factor (records per block); s = selection cardinality (number of
records satisfying the condition); x = number of index levels; b_I1 = number of
first-level index blocks.
• S1. Linear search (brute force) approach
– C_S1a = b. For an equality condition on a key, C_S1b = (b/2) on average if the
record is found; otherwise C_S1b = b.
• S2. Binary search: C_S2 = log2 b + ⌈s/bfr⌉ − 1. For an equality condition on a
unique (key) attribute, C_S2 = log2 b.
• S3. Using a primary index (S3a) or hash key (S3b) to retrieve a single
record: C_S3a = x + 1; C_S3b = 1 for static or linear hashing;
– C_S3b = 2 for extendible hashing.
• S4. Using an ordering index to retrieve multiple records: For the
comparison condition on a key field with an ordering index, C_S4 = x + (b/2).
• S5. Using a clustering index to retrieve multiple records:
– C_S5 = x + ⌈s/bfr⌉
• S6. Using a secondary (B+-tree) index:
– For an equality comparison, C_S6a = x + s. For a comparison condition
such as >, <, >=, or <=, C_S6b = x + (b_I1/2) + (r/2).
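
The cost functions can be turned into a small calculator to compare access paths. The function names and the sample statistics below (b, r, bfr, x, s, b_I1) are hypothetical values chosen for illustration; rounding log2 b up to whole block accesses is also an assumption of this sketch.

```python
import math


def cost_linear(b):
    return b                                   # C_S1a

def cost_linear_key_found(b):
    return b / 2                               # C_S1b, average when the record exists

def cost_binary(b, s=1, bfr=1):
    # C_S2; with s = 1 this reduces to log2 b for a unique key.
    return math.ceil(math.log2(b)) + math.ceil(s / bfr) - 1

def cost_primary_index(x):
    return x + 1                               # C_S3a

def cost_ordering_index_range(x, b):
    return x + b / 2                           # C_S4

def cost_clustering_index(x, s, bfr):
    return x + math.ceil(s / bfr)              # C_S5

def cost_secondary_equality(x, s):
    return x + s                               # C_S6a

def cost_secondary_range(x, b_i1, r):
    return x + b_i1 / 2 + r / 2                # C_S6b


# Hypothetical file statistics (assumed for this example only).
b, r, bfr = 2000, 10000, 5
costs = {
    "S1 linear": cost_linear(b),
    "S2 binary (key)": cost_binary(b),
    "S3a primary index (x=4)": cost_primary_index(4),
    "S5 clustering (x=2, s=80)": cost_clustering_index(2, 80, bfr),
    "S6a secondary equality (x=2, s=80)": cost_secondary_equality(2, 80),
    "S6b secondary range (x=2, b_I1=50)": cost_secondary_range(2, 50, r),
}
cheapest = min(costs, key=costs.get)
```

With these numbers a range condition through the secondary index (5027 block accesses) is more expensive than a full linear scan (2000), which illustrates why the optimizer compares estimates instead of always preferring an index.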
Semantic Query Optimization :
– Uses constraints specified on the database schema in order to modify one query
into another query that is more efficient to execute.
• Consider the following SQL query:
SELECT E.LNAME, M.LNAME
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.SUPERSSN=M.SSN AND E.SALARY>M.SALARY;
• Explanation:
– Suppose that we had a constraint on the database schema that stated that no
employee can earn more than his or her direct supervisor. If the semantic query
optimizer checks for the existence of this constraint, it need not execute the query
at all because it knows that the result of the query will be empty. Techniques
known as theorem proving can be used for this purpose.

System Hardware and Data Handling
No ratings yet
System Hardware and Data Handling
25 pages
Praveen Team PPT Stls AIA
No ratings yet
Praveen Team PPT Stls AIA
14 pages
20110520101701
100% (2)
20110520101701
27 pages
Nanoheal Software and Provision User Guide.
No ratings yet
Nanoheal Software and Provision User Guide.
23 pages
HP ProBook 455 G4 Quickspecs
No ratings yet
HP ProBook 455 G4 Quickspecs
34 pages

04 Advanced Database System Chap 02 [RVUNC]

Uploaded by

04 Advanced Database System Chap 02 [RVUNC]

Uploaded by


