CHAPTER - 02 - Query Processing - CS 2nd Year - 2016
Query processing requires that the DBMS identify and execute a strategy for
retrieving the results of the query. Query optimization is necessary to determine
the optimal alternative to process a query.
The first approach is to use a rule based or heuristic method for ordering the
operations in a query execution strategy.
The second approach estimates the cost of different execution strategies and
chooses the best solution.
The scanner identifies the query tokens such as SQL keywords, attribute
names, and relation names that appear in the text of the query.
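The scanning step can be sketched as follows. This is a minimal illustration, not the scanner of any particular DBMS; the keyword set, the token kinds, and the function name `scan` are assumptions made for the example.

```python
import re

# Hypothetical keyword set for illustration; a real SQL scanner knows many more.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}

# One token per match: a number, a word, or an operator/punctuation symbol.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(\d+)|([A-Za-z_][A-Za-z_0-9]*)|(<=|>=|<>|[=<>*,();.]))"
)

def scan(query):
    """Split a query string into (kind, text) tokens."""
    tokens = []
    query = query.strip()
    pos = 0
    while pos < len(query):
        match = TOKEN_PATTERN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {query[pos]!r}")
        number, word, symbol = match.groups()
        if number is not None:
            tokens.append(("NUMBER", number))
        elif word is not None:
            # Words are keywords if they appear in KEYWORDS, otherwise names.
            kind = "KEYWORD" if word.upper() in KEYWORDS else "IDENTIFIER"
            tokens.append((kind, word))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = match.end()
    return tokens

tokens = scan("SELECT name FROM Employee WHERE salary > 9000")
```

Here `name`, `Employee`, and `salary` come out as identifiers; deciding whether they are valid attribute or relation names is left to the later validation step.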
The query must also be validated by checking that all attribute and
relation names are valid and semantically meaningful names in the
schema of the particular database being queried.
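As a rough sketch of this validation step, the names used in a query can be checked against a catalog. The `SCHEMA` dictionary and the `validate` function below are hypothetical and stand in for the DBMS system catalog.

```python
# Hypothetical catalog: relation name -> set of attribute names.
SCHEMA = {"Employee": {"name", "age", "salary"}}

def validate(relations, attributes, schema=SCHEMA):
    """Return a list of error messages; an empty list means all names are valid."""
    errors = []
    for relation in relations:
        if relation not in schema:
            errors.append(f"unknown relation: {relation}")
    # Collect every attribute that belongs to one of the known relations.
    known = set()
    for relation in relations:
        known |= schema.get(relation, set())
    for attribute in attributes:
        if attribute not in known:
            errors.append(f"unknown attribute: {attribute}")
    return errors

ok = validate(["Employee"], ["salary"])
bad = validate(["Employee"], ["bonus"])
```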
The DBMS must then formulate an execution strategy or query plan for
retrieving the results of the query from the database files.
Typically, SQL queries are decomposed into query blocks, which form the
basic units that can be translated into the algebraic operators and
optimized.
Query Processing: the procedure of converting a query written in a high-level language (e.g., SQL)
into a correct and efficient execution plan expressed in a low-level language, which is used for
data manipulation.
Query Processing is the activity performed in extracting data from the database.
Before retrieving, updating or deleting data in the database, a query goes through a series of query
compilation steps. These steps produce what is known as an execution plan.
Execution Plan: the basic algorithm used for each operation in the query.
In query processing, the first phase is transformation, in which the parser checks the syntax of
the query and also checks that the relations and attributes used in the query are defined in the
database.
After the syntax is checked and the relations are verified, the query is transformed into an
equivalent expression that is more efficient to execute.
The next step is to validate the user privileges and ensure that the query does not
disobey the relevant integrity constraints.
The query processor first checks the syntax and the existence of the relations and their
attributes in the database.
After validation, the query processor transforms the query into an equivalent and more efficient
expression.
For example, the query is converted into a standard internal format that the parser
can manipulate.
The order of tokens is also maintained to make sure that all the
rules of the language grammar are followed.
It starts with the high-level query that is transformed into low level
correct.
For example, SQL Query is decomposed into blocks like Select block,
The type specification of the query qualifier and result is also checked at
this stage.
A query tree is constructed using tree data structure that corresponds to the relational
algebra expression.
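A query tree can be sketched with a small node type, where leaves are relations and internal nodes are relational algebra operators. The `Node` class, its field names, and the bracketed output notation are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    op: str                          # "relation", "select", "project", or "join"
    arg: str = ""                    # relation name, predicate, or attribute list
    left: Optional["Node"] = None
    right: Optional["Node"] = None   # only used by binary operators such as join

def to_algebra(node):
    """Render the tree as a relational algebra expression string."""
    if node.op == "relation":
        return node.arg
    if node.op == "select":
        return f"σ[{node.arg}]({to_algebra(node.left)})"
    if node.op == "project":
        return f"π[{node.arg}]({to_algebra(node.left)})"
    if node.op == "join":
        return f"({to_algebra(node.left)} ⋈[{node.arg}] {to_algebra(node.right)})"
    raise ValueError(f"unknown operator: {node.op}")

# Tree for: SELECT name FROM Employee WHERE salary > 9000
tree = Node("project", "name",
            Node("select", "salary > 9000",
                 Node("relation", "Employee")))
expression = to_algebra(tree)
```

The leaf is evaluated first and the root last, which mirrors the order in which the operations would be executed.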
Query Graph Notation: Graph data structure is also used for internal
representation of query.
Query can be converted into one of the following two normal forms:
(1) Conjunctive normal form: a sequence of conjuncts connected with the ‘AND’ operator; each
conjunct contains one or more terms connected with ‘OR’.
A conjunctive selection contains only those tuples that satisfy all conjuncts.
(2) Disjunctive normal form: a sequence of disjuncts connected with the ‘OR’ operator; each
disjunct contains one or more terms connected with ‘AND’.
A disjunctive selection contains those tuples that satisfy any one of the disjuncts.
Disjunctive normal form is often more useful, as it allows the query to be broken into a series of
independent sub-queries linked by union.
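The difference between the two selections can be illustrated on a tiny in-memory relation. The sample rows and function names are made up for this sketch; the disjunctive version is written as a union of independent sub-queries, one per disjunct.

```python
# Hypothetical sample data.
employees = [
    {"name": "A", "age": 20, "salary": 9500},
    {"name": "B", "age": 20, "salary": 4000},
    {"name": "C", "age": 35, "salary": 12000},
    {"name": "D", "age": 40, "salary": 3000},
]

def conjunctive_select(rows, conjuncts):
    """Keep the rows that satisfy every conjunct (AND)."""
    return [row for row in rows if all(p(row) for p in conjuncts)]

def disjunctive_select(rows, disjuncts):
    """Run one sub-query per disjunct and take the union of the results (OR)."""
    selected = []
    for predicate in disjuncts:
        for row in rows:
            if predicate(row) and row not in selected:
                selected.append(row)
    return selected

is_20 = lambda row: row["age"] == 20
high_pay = lambda row: row["salary"] > 9000

both = [r["name"] for r in conjunctive_select(employees, [is_20, high_pay])]
either = [r["name"] for r in disjunctive_select(employees, [is_20, high_pay])]
```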
(D) Query Simplifier: The major tasks of query simplifier are as follows:
It introduces integrity constraints and view definitions into the query graph representation.
It eliminates any query that violates an integrity constraint, without accessing the database.
rules are applied to restructure the query to give a more efficient implementation.
expression into different ways and query optimizer choose the most
information.
Heuristic rules are applied to the query in the form of a query tree or query graph
structure.
The optimizer starts with an initial query tree and transforms it into an equivalent
and more efficient query tree using transformation rules.
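One widely used transformation rule, not named in the text above and shown here only as an assumed example, is to apply selections before joins so that intermediate results stay small. The sketch below uses made-up relations and compares the two equivalent strategies.

```python
# Hypothetical sample relations.
employees = [("E1", 1), ("E2", 2), ("E3", 1)]   # (employee name, dept_id)
departments = [(1, "CS"), (2, "IT")]            # (dept_id, dept name)

def join_then_select():
    """Initial strategy: join everything, then filter on the department name."""
    joined = [(e, d) for e in employees for d in departments if e[1] == d[0]]
    result = [(e[0], d[1]) for e, d in joined if d[1] == "CS"]
    return result, len(joined)

def select_then_join():
    """Transformed strategy: filter departments first, then join."""
    cs_only = [d for d in departments if d[1] == "CS"]
    joined = [(e, d) for e in employees for d in cs_only if e[1] == d[0]]
    result = [(e[0], d[1]) for e, d in joined]
    return result, len(joined)

result_a, intermediate_a = join_then_select()
result_b, intermediate_b = select_then_join()
```

Both strategies return the same rows, but the second one builds a smaller intermediate result, which is the kind of gain a heuristic optimizer looks for.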
In cost based query optimization, optimizer estimates the cost of running of all
The alternative that uses the fewest resources has the minimum cost.
The cost of a query operation depends mainly on its selectivity, i.e., the proportion
of the input relations that forms the output.
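Selectivity, as defined above, can be computed directly when the data is at hand. The function name and the sample salaries are assumptions for this sketch; a real optimizer estimates selectivity from catalog statistics rather than by scanning the data.

```python
def selectivity(rows, predicate):
    """Fraction of input rows that satisfy the predicate (0.0 for empty input)."""
    if not rows:
        return 0.0
    matching = sum(1 for row in rows if predicate(row))
    return matching / len(rows)

# Hypothetical salaries; two of the four values exceed 9000.
salaries = [3000, 9500, 12000, 4000]
fraction = selectivity(salaries, lambda s: s > 9000)
```

A value close to 0 means the operation filters out most of its input, which generally makes it cheap for the operations that follow it.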
results (tables or files) that are generated by the execution strategy for the
query.
Compiled by: Temesgen Tilahun 33
Cost based Query Optimization
From all the above components, the most important is access cost to secondary
storage because secondary storage is comparatively slower than other devices.
The optimizer tries to minimize computation cost for small databases, as most of the
data files are stored in main memory.
For large databases, it tries to minimize the access cost to secondary storage and
D. Primary access method for each file and attributes for each file.
E. Number of levels of each multi-level index on an attribute A, denoted I_A.
The linear search costs 2000, and the condition salary > 9000 gives a
cost estimate of I_salary + (B/2) = 2 + (2000/2) = 2 + 1000 = 1002, so for
σ Age = 20 AND salary > 9000 (Employee) the total cost is 32 + 1002 = 1034.
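The arithmetic above can be reproduced in a few lines. The variable and function names are assumptions; B = 2000, I_salary = 2, and the cost of 32 for the Age = 20 condition are taken from the example as given, since the derivation of the 32 is not shown here.

```python
B = 2000          # value of B used in the example
I_SALARY = 2      # levels of the index on salary
AGE_COST = 32     # cost of the Age = 20 condition, taken as given

def linear_search_cost(blocks):
    """Linear search reads every block."""
    return blocks

def range_cost_with_index(index_levels, blocks):
    """Estimate used in the example: index levels plus half of the blocks."""
    return index_levels + blocks // 2

linear = linear_search_cost(B)
salary_cost = range_cost_with_index(I_SALARY, B)
total = AGE_COST + salary_cost
```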
Cost Function for Join Operation
R1 = 7, 2, 9, 8, 3, 9, 1, 3, 6
R2 = 8, 4, 2, 1, 3, 2, 7, 3
?
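Assuming the two lists above are the join-attribute values of R1 and R2 and that the intended operation is an equi-join, the sketch below compares a nested-loop join with a hash join. Both function names are hypothetical, and counting comparisons is only a rough stand-in for a real cost function.

```python
from collections import defaultdict

R1 = [7, 2, 9, 8, 3, 9, 1, 3, 6]
R2 = [8, 4, 2, 1, 3, 2, 7, 3]

def nested_loop_join(outer, inner):
    """Compare every outer value with every inner value."""
    pairs, comparisons = [], 0
    for a in outer:
        for b in inner:
            comparisons += 1
            if a == b:
                pairs.append((a, b))
    return pairs, comparisons

def hash_join(build, probe):
    """Build a hash table on one input, then probe it with the other."""
    buckets = defaultdict(list)
    for b in build:
        buckets[b].append(b)
    pairs = []
    for a in probe:
        for b in buckets.get(a, []):
            pairs.append((a, b))
    return pairs

nl_pairs, nl_comparisons = nested_loop_join(R1, R2)
hj_pairs = hash_join(R2, R1)
```

The nested-loop version performs len(R1) * len(R2) comparisons regardless of how many values match, while the hash join only visits the matching bucket for each probe value; both produce the same set of pairs.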