0% found this document useful (0 votes)
1 views39 pages

8 Query Optimization

The document covers disk storage and indexing in databases, detailing the types of storage, file organizations, and indexing methods such as primary, clustering, and secondary indexes. It emphasizes the importance of indexing for efficient data retrieval and query optimization, which reduces resource usage and improves performance. Additionally, it discusses query optimization techniques, including cost-based and rule-based optimization, to enhance database query execution strategies.

Uploaded by

hudaaghazi22
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
1 views39 pages

8 Query Optimization

The document covers disk storage and indexing in databases, detailing the types of storage, file organizations, and indexing methods such as primary, clustering, and secondary indexes. It emphasizes the importance of indexing for efficient data retrieval and query optimization, which reduces resource usage and improves performance. Additionally, it discusses query optimization techniques, including cost-based and rule-based optimization, to enhance database query execution strategies.

Uploaded by

hudaaghazi22
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 39

University of Salahaddin

College of Engineering
Software & Informatics Engineering Department

DISK STORAGE AND


INDEXING
2ND STAGE
Lecturer:
Dr. Hanan Kamal
2

Content
• Introduction
• Storage of Database
• Files
• Types of Single-Level Ordered Indexes
• Primary Indexes
• Clustering Indexes
• Secondary Indexes
3

Introduction
• Computerized database → (physically) computer storage
medium.
• Computer storage media form a storage hierarchy that
includes two main categories:
1. Primary storage
2. Secondary and tertiary storage

• Usually Primary :
• Provides fast access to data but is of limited storage capacity.
• More expensive and have less storage capacity than secondary
and tertiary storage devices.
4

Storage of Databases
• Databases typically store large amounts of data that must
persist over long periods of time, and hence is often
referred to as persistent data.
• Transient data Parts of data are accessed and processed
repeatedly during this period.
• Databases are stored permanently (or persistently) on
magnetic disk, for the following reasons:
• Size
• losses
• The cost
5

Storage of Databases (cont.)


• The data stored on disk is organized as files of records.
• Each record is a collection of data values that can be
interpreted as facts about entities, their attributes, and
their relationships.
• There are several primary file organizations
• heap file (or unordered file)
• sorted file (or sequential file) (sort key).
• hashed file (hash key)
• B-trees
6

Files
Unordered (heap) files Ordered (Sorted) Files

1. New records are 1. File records are kept


inserted at the end of sorted .
the file. 2. A binary search can be
2. To search for a record, used
a linear search . 3. Insertion is expensive:
3. Record insertion is records must be
quite efficient. inserted in the correct
4. Reading the records in order.
order of a particular 4. Reading the records in
field requires sorting order of the ordering
the file records. field is quite efficient.
7

• Some blocks of
an ordered
(sequential) file
of EMPLOYEE
records with
Name as the
ordering key
field.
What is Indexing?

• A data structure to quickly locate records without scanning


the entire table.
• Acts like a book index for faster access.
• Improves read/query performance significantly.
Why Indexing Matters

• Reduces I/O cost of query execution.


• Improves performance for SELECT, JOIN, and ORDER
BY.
• Supports constraints like UNIQUE and PRIMARY KEY.
10

Types of Single-Level Ordered Indexes


• Indexing idea (book index)
• An index access structure is usually defined on a single field of
a file, called an indexing field (or indexing attribute).

• Indexes can also be characterized as dense or


sparse.
• A dense index has an index entry for every search
key value (and hence every record) in the data file.

• A sparse (or nondense) index, has index entries


for only some of the search values.
11

Primary Indexes
• A primary index is an ordered file whose records are of
fixed length with two fields
• Index entry (or index record) in the index file, will refer to
the two field values of index entry i as <K(i), P(i)>.

• K(i) is of t he same data type as the ordering


key Field called the primary key of the data file
• P(i) is a pointer to a disk block (a block
address).
12
13

Clustering Indexes
• If file records are physically ordered on a nonkey field
which does not have a distinct value for each record that
field is called the clustering field and the data file is
called a clustered file.

• A clustering index is also an ordered file with two


fields:
1. field is of the same type as the clustering field of the
data file
2. disk block pointer.
14
15

Secondary Indexes
• A secondary index provides a secondary means of
accessing a data file for which some primary access
already exists.
• The data file records could be ordered, unordered.

• The secondary index may be created on:


1. Candidate key and has a unique value in every record
2. Nonkey field with duplicate values.
16

Secondary index on a key, nonordering field of a


file.
17

Secondary index on a nonkey, nonordering field


of a file.
18

Summary of Indexes
19

Thank you
20

1. What is indexing?
2. Talk about
A.Primary Indexes
B.Clustering Indexes
C.Secondary Indexes
University of Salahaddin
College of Engineering
Software & Informatics Engineering Department

QUERY OPTIMIZATION
ND
2 STAGE

Lecturer:
Dr. Hanan Kamal
22

Introduction
• In network and hierarchical DBMSs, low-level
procedural query language is generally
embedded in high-level programming language.
• Programmer’s responsibility to select most
appropriate execution strategy.
• With declarative languages such as SQL, user
specifies what data is required rather than how it
is to be retrieved.
• Relieves user of knowing what constitutes good
execution strategy.
23

Introduction
• Also gives DBMS more control over system
performance.

• Two main techniques for query


optimization:
• Heuristic rules that order operations in a query;
• Comparing different strategies based on relative
costs, and selecting one that minimizes
resource usage.

• Disk access tends to be dominant cost in


query processing for centralized DBMS.
24

Query Processing

Activities involved in retrieving data from


the database.

• Aims of QP:
• Transform query written in high-level language
(e.g. SQL), into correct and efficient execution
strategy expressed in low-level language
(implementing RA);
• Execute strategy to retrieve required data.
Introduction to Query Optimization
• Query optimization (QO) is the process of
choosing the most efficient way to execute a SQL
query.

• The importance of QO:


- Reduces response time
- Minimizes resource usage
- Essential for large-scale databases
26

Query Optimization
• A query optimizer evaluates multiple query execution
plans and chooses the best one based on cost
estimates.
• As there are many equivalent transformations of same
high-level query, aim of QO is to choose one that
minimizes resource usage.
• Generally, reduce total execution time of query.
• May also reduce response time of query.
• Problem computationally intractable with large number of
relations, so strategy adopted is reduced to finding near
optimum solution.
Goals of Query Optimization
- Minimize I/O cost
- Reduce CPU usage
- Ensure fastest response time
- Avoid unnecessary computations
Cost-Based Optimization
This method evaluates alternative plans using a
cost model based on:
- Disk I/O
- CPU cost
- Network latency
Rule-Based Optimization
Uses heuristics (rules) to transform queries:
- Push selections early
- Combine projections
- Join order optimization
30

Example 21.1 - Different Strategies

Find all Managers who work at a London


branch.

SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = ‘Manager’ AND b.city = ‘London’);
31

Example 21.1 - Different Strategies


• Three equivalent RA queries are:
(1) (position='Manager')  (city='London') 
(Staff.branchNo=Branch.branchNo) (Staff X Branch)
(2) (position='Manager')  (city='London')(
Staff Staff.branchNo=Branch.branchNo Branch)
(3) (position='Manager'(Staff))
Staff.branchNo=Branch.branchNo
(city='London' (Branch))
Join Optimization
Join operations can be expensive. Types of join
algorithms:
- Nested Loop Join
- Hash Join
- Merge Join
Subquery Optimization
• Rewrite subqueries as joins or EXISTS to
improve performance.
View Materialization
• Views can be materialized (stored) or
virtual (computed per query).
Materialization improves performance at
the cost of storage.
Statistics for Optimization
Database optimizers use table statistics:
- Row counts
- Value distributions
- Index usage
Query Rewriting Techniques
- Remove redundant conditions
- Simplify expressions
- Use EXISTS instead of IN
37

Phases of Query Processing


Summary
✔ Query optimization is crucial for efficient
database systems
✔ Use indexes, rewrite queries, and analyze
execution plans
✔ Know your data and schema
39

Thank you

You might also like