Optimizing SQL Queries in Oracle

This course teaches techniques for optimizing SQL queries in Oracle databases. It covers rearranging table structures, modifying SQL code, using indexes, and partitioning tables. The course begins with basic data optimizations like normalizing tables. It then covers SQL code optimizations and using indexes before discussing how to partition large tables to improve query performance. The instructor uses sample data of candy bar surveys to demonstrate the various optimization strategies taught in the course.

Introduction

Hello and welcome to the Pluralsight course, Optimizing SQL Queries in
Oracle. My name is Scott Hecht and in this course I'll be discussing several
things you can do to speed up your SQL queries. We'll cover everything from
rearranging your table data to modifying your SQL code, as well as how to
use indexes and partitions. Let's get started.

Course Overview

Before we go over the course contents in detail, let's give an overview of the
course in general. We start off describing how to speed up your SQL queries
by first optimizing your table data. That is, we talk about minimizing the
columns and rows in your data using a variety of techniques, such as normal
forms, where to place nulls in a table to save space, and so on. We then
describe how to speed up your SQL queries by focusing on your SQL code
itself. For example, using INTERSECT instead of an inner join can speed up your
queries tremendously in certain instances. Next, we talk about how to
create and use indexes on one or more columns in your tables. Finally, we
discuss how to create and use partitions to break up very large tables into
smaller pieces called partitions, so that only the desired data, based on your
SQL query's WHERE clause, is accessed by Oracle. The reason I have ordered
this course as shown is to keep you from jumping straight to indexes or
partitions as if they were the cure-all for all of your SQL query problems. A
properly structured table or set of tables, along with well-written SQL code,
can go a long way toward speeding up your SQL queries, and it is only at that
point that indexes and partitions should be considered, if your SQL queries
are still taking too long to run. On the other hand, if you are a beginning
database administrator tasked with creating a database, you may want to
consider all of these things upfront while still in the database design phase.
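To make the INTERSECT-versus-inner-join idea concrete, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle so that it runs anywhere; the two survey tables and their rows are hypothetical. It only shows that the two forms can return the same rows. Whether INTERSECT is actually faster depends on the execution plan Oracle chooses for your data.

```python
import sqlite3

# sqlite3 stands in for Oracle; INTERSECT and INNER JOIN are written the
# same way in Oracle SQL. Table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey_2012 (respondent_id INTEGER);
    CREATE TABLE survey_2013 (respondent_id INTEGER);
    INSERT INTO survey_2012 VALUES (1), (2), (3), (3);
    INSERT INTO survey_2013 VALUES (2), (3), (4);
""")

# INTERSECT returns the distinct rows present in both result sets.
intersect_ids = [r[0] for r in conn.execute("""
    SELECT respondent_id FROM survey_2012
    INTERSECT
    SELECT respondent_id FROM survey_2013
    ORDER BY respondent_id
""")]

# The inner join needs DISTINCT to match, because respondent 3 appears
# twice in survey_2012 and would otherwise be returned twice.
join_ids = [r[0] for r in conn.execute("""
    SELECT DISTINCT a.respondent_id
    FROM survey_2012 a
    INNER JOIN survey_2013 b ON a.respondent_id = b.respondent_id
    ORDER BY a.respondent_id
""")]

print(intersect_ids)  # [2, 3]
print(join_ids)       # [2, 3]
```

The DISTINCT in the join version matters. INTERSECT always returns distinct rows, so the two queries are interchangeable only when duplicates are handled the same way.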

Course Contents

Next, let's go over the course contents in more detail. Of course, we start off
with the course introduction, which is the subject of this module. We discuss
basic optimizations as related to data; that is, we talk about how to shrink
down the size of your tables by arranging them in Third Normal Form. We
discuss how NULLs are stored in a table and how arranging your columns
appropriately can also shrink down the size of your table. In the next module
we discuss how you can change your SQL code to get your queries to run
faster. No indexes or partitions are created in this section; only simple
changes to your SQL code are explored. In the next few modules we discuss a
more advanced SQL optimization technique: indexes. We start off with
an overview of indexes. If appropriately applied, indexes can make your SQL
queries execute much, much faster, but with great power comes great
responsibility. Indexes take up space in your database, so you have to
balance speed versus space usage. There are several types of indexes we
will be discussing, such as B-Tree indexes, which are the Boeing 747 of
indexes. That is, they are the index most often used on a table's columns and
are considered a general-purpose index. Bitmap indexes are used when the
column or columns you want to index have a small or medium number of
distinct values, as compared to the total number of rows in the table. For
example, in the data used throughout the course, the RESPONDENT_GENDER
column contains two distinct values, m and f, as compared to the total
number of rows, which is five million. A bitmap index on the
RESPONDENT_GENDER column could be considered, depending on the SQL
queries submitted to the database. We then move on to discuss several
additional types of indexes, such as function-based indexes, used when your
SQL queries frequently contain a specific function applied to a column. Now,
normally applying a function to a column that is indexed will prevent Oracle
from using that index, but function-based indexes fix that. We also briefly
discuss index-organized tables, as well as bitmap join indexes. In the remaining
modules we discuss partitions, which are used to break up very large tables
based on a specific column or columns, so that it appears to Oracle as if the
table is actually several individual tables. For example, we could partition
our data into 10 partitions using the SURVEY_YEAR column, and if your SQL
query subsets by, say, survey year 2004, Oracle knows enough to skip over
the partitions containing the other years. This can speed up your queries
tremendously. Similar to the modules on indexes, we start off this series of
modules with an overview of partitions. We'll then discuss several types of
partitions, such as list, range, and hash partitions. We then move on to
discuss composite partitions, which allow you to partition by one method
and then sub-partition using another method. For example, we can partition
first by SURVEY_YEAR and then sub-partition by RESPONDENT_GENDER within
those partitions. Finally, we discuss the interaction between partitions and
indexes.
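The ideas behind bitmap indexes and partition pruning can be sketched as toy Python data structures. This is an illustrative model under simplifying assumptions, not how Oracle implements either feature internally, and the six rows are hypothetical.

```python
# Toy models for illustration only. Oracle's real bitmap indexes are
# compressed on-disk structures, and real partitions are separate segments.

# Hypothetical rows: (respondent_id, respondent_gender, survey_year)
rows = [
    (1, "m", 2003), (2, "f", 2004), (3, "f", 2003),
    (4, "m", 2004), (5, "f", 2004), (6, "m", 2005),
]

# Bitmap index: one bit-vector per distinct value, where bit i is set when
# row i holds that value. Python ints serve as arbitrary-length bit-vectors.
gender_bitmap = {}
for i, (_, gender, _) in enumerate(rows):
    gender_bitmap[gender] = gender_bitmap.get(gender, 0) | (1 << i)

def positions(bits):
    """Return the row positions whose bit is set."""
    return [i for i in range(len(rows)) if (bits >> i) & 1]

female_positions = positions(gender_bitmap["f"])

# List partitioning by survey_year: each year's rows live in their own bucket.
partitions = {}
for row in rows:
    partitions.setdefault(row[2], []).append(row)

def rows_for_year(year):
    """Partition pruning: only the bucket for the requested year is touched."""
    return partitions.get(year, [])

print(female_positions)                     # [1, 2, 4]
print([r[0] for r in rows_for_year(2004)])  # [2, 4, 5]
```

With only two distinct genders, this toy index is just two bit-vectors however many rows are added. That is the intuition for why low-cardinality columns suit bitmap indexes.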

Module Contents

Let's go over the contents of the remaining part of this particular module.
We discuss who will benefit from this course. We talk about the database
version used to prepare this course. We'll look at the data used throughout
the course. We'll then talk briefly about Tablespaces and how they are used
with tables, indexes, and partitions. We'll also talk about the Oracle query
optimizer, which is used to determine how your query will be executed. We
remind you that several great resources are Oracle's and Pluralsight's
websites themselves. Next, we also remind you that your greatest resource
is your company's database administrator, or DBA, and we end this module
with a brief summary.

Who Will Benefit?

Who will benefit from this course? Firstly, SQL developers will benefit greatly
from this course, especially if they've never used or only casually used
indexes and partitions before. If you connect to Oracle via a third-party tool
such as SAS, SPSS, R, and so on, you can benefit from this course as well.
It's not always necessary to pull all of the data back from Oracle into, say,
SAS datasets in order to then subset the data down. That is, if you're going
to subset the data, why pull it all back into SAS? Why not just subset the
data in Oracle and then pull back only the subsetted data? A clean
separation of tasks between your database and your statistical software will
most likely work in your favor. That is, let Oracle do the data manipulation
and let SAS, SPSS, and R do what they're good at, namely the statistics and
the graphics. Finally, if you're a new database administrator (yes, that's how
you actually appear to people), then you will definitely benefit from this
course, although you will still have to read through the manuals because,
well, that's what good DBAs do.
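The advice to subset in the database rather than in SAS can be sketched with Python's sqlite3 module standing in for Oracle and Python itself standing in for SAS, SPSS, or R. The candybar_survey table and its 1,000 rows are hypothetical. Both approaches end with the same 100 rows, but only the second asks the database to return just those rows.

```python
import sqlite3

# sqlite3 stands in for Oracle and this Python client for SAS, SPSS, or R.
# The candybar_survey table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candybar_survey (respondent_id INTEGER, survey_year INTEGER)")
conn.executemany(
    "INSERT INTO candybar_survey VALUES (?, ?)",
    [(i, 2000 + i % 10) for i in range(1000)],
)

# Approach 1: pull every row back, then subset on the client side.
all_rows = conn.execute(
    "SELECT respondent_id, survey_year FROM candybar_survey"
).fetchall()
client_subset = [r for r in all_rows if r[1] == 2004]

# Approach 2: let the database subset with WHERE; only matching rows return.
db_subset = conn.execute(
    "SELECT respondent_id, survey_year FROM candybar_survey WHERE survey_year = ?",
    (2004,),
).fetchall()

print(len(all_rows), len(client_subset), len(db_subset))  # 1000 100 100
```

Against a real networked Oracle database, the first approach transfers ten times as many rows to the client as the second one in this example.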

Database Version

Let's talk about the Oracle database version used to prepare this course. To
prepare this course I used Oracle 11g Release 2. Although Oracle 12c is out,
Oracle 11g Release 2 suffices for an introductory course such as this one.
The client tool I'm using is Oracle SQL Developer, which is freely available
on Oracle's website. Finally, if you don't have access to an Oracle database
at work (that's sad) or you don't have Oracle SQL Developer, you can
download both from Oracle's wonderful website at www.oracle.com.

Data Used in Course

Let's talk about the data used throughout the course. The table is called
CANDYBAR_HISTORICAL_DATA and contains the fake responses to several
yearly fake surveys on candy bar consumption, as well as fake ratings on
key fake attributes. This data is fake of course, but there's a lot of it, as I've
created five million rows of data. Woo hoo. By the way, I've included the
data in the Pluralsight download section of this course and you can load it
into your own database by yourself or ask your database administrator to
load it in for you. Now, this table will be broken apart into fact, as well as
dimension tables in the first part of module two. The SQL code to do this is
included in the download file as well. I will be going back and forth between
the CANDYBAR_HISTORICAL_DATA table and the fact and dimension tables
during the course of this lecture in order to show SQL runtime comparisons.
Now, let's go over the columns in this table. RESPONDENT_ID, this column
contains a number representing a unique survey respondent.
RESPONDENT_NAME, this column contains the respondent's full name.
RESPONDENT_ADDR, this column contains the respondent's address.
RESPONDENT_CITY, the name of a respondent's city of course.
RESPONDENT_STATE, the respondent's two letter U.S. state code.
RESPONDENT_ZIPCODE, this column contains the respondent's five digit U.S.
zip or postal code. RESPONDENT_PHONE_NUM, the respondent's telephone
number with area code. RESPONDENT_GENDER, the respondent's gender,
that's m for male and f for female. RESPONDENT_DOB, the respondent's
date of birth as a date data type. CANDYBAR_ID, this column contains a
number representing a unique candy bar. For example, one represents the
Three Musketeers candy bar. CANDYBAR_NAME, this is the full name of the
candy bar, for example Three Musketeers. CANDYBAR_MFR_ID, this column
represents the manufacturer of the associated candy bar, for example,
Three Musketeers is manufactured by manufacturer ID number 49.
CANDYBAR_MFR_NAME, this is the full name of the manufacturer of the
associated candy bar. For example, Three Musketeers is manufactured by
the fine folks at Mars. CANDYBAR_WEIGHT_OZ, this is the fake weight in
ounces of the fake candy bar, which the fake survey respondent was never
actually sent. Let's finish up our list of columns in the CANDYBAR_HISTORICAL_DATA
table. SURVEY_DATE, this column is the date the survey was given to the
respondent and is a date data type. SURVEY_YEAR, this column is just the
year associated with the survey date and is a four digit number.
TASTE_RATING, this is the rating of the candy bar on the attribute taste and
ranges from 1-10 where 1 means the candy bar tastes like a dead hamster
and 10 means the candy bar tastes amazoid. APPEARANCE_RATING, this is
the rating of the candy bar on the attribute appearance and ranges from 1-10 where 1 means the candy bar looks awful and 10 means the candy bar
looks wonderful. TEXTURE_RATING, this is the rating of the candy bar on the
attribute texture and ranges from 1-10 where 1 means the candy bar has a
texture of sand and 10 means the candy bar has a pleasing texture.
OVERALL_RATING, this is the overall rating of the candy bar on a scale from
1-10. LIKELIHOOD_PURCHASE, this is the likelihood the respondent would
purchase the candy bar on a scale from 1-10 where 1 indicates the
respondent would only purchase the candy bar at gunpoint and 10 indicates
the respondent would definitely purchase the candy bar with his own money.
NBR_BARS_CONSUMED, this column indicates the number of bars the
respondent consumed during the year. Let's complete our talk about the
data used throughout the course by showing you a single row of data, and
here it is. As you can see, the text columns, such as the address, the name of
the candy bar, and the name of the manufacturer, take up a lot of space in
the CANDYBAR_HISTORICAL_DATA table. In our next module, we'll look into
how to slim down this table in order to speed up our SQL queries.
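For reference, the 22 columns described above can be written as a table definition. The column names come from the course, but the data types and lengths are assumptions chosen in Oracle style (NUMBER, VARCHAR2, DATE). SQLite accepts these type names, so Python's sqlite3 module can run the sketch without an Oracle instance. In Oracle you would run the CREATE TABLE statement directly.

```python
import sqlite3

# Column names come from the course; the data types and lengths are
# assumptions written in Oracle style. SQLite accepts these type names.
DDL = """
CREATE TABLE CANDYBAR_HISTORICAL_DATA (
    RESPONDENT_ID        NUMBER,
    RESPONDENT_NAME      VARCHAR2(100),
    RESPONDENT_ADDR      VARCHAR2(200),
    RESPONDENT_CITY      VARCHAR2(100),
    RESPONDENT_STATE     CHAR(2),
    RESPONDENT_ZIPCODE   CHAR(5),
    RESPONDENT_PHONE_NUM VARCHAR2(20),
    RESPONDENT_GENDER    CHAR(1),
    RESPONDENT_DOB       DATE,
    CANDYBAR_ID          NUMBER,
    CANDYBAR_NAME        VARCHAR2(100),
    CANDYBAR_MFR_ID      NUMBER,
    CANDYBAR_MFR_NAME    VARCHAR2(100),
    CANDYBAR_WEIGHT_OZ   NUMBER(5,2),
    SURVEY_DATE          DATE,
    SURVEY_YEAR          NUMBER(4),
    TASTE_RATING         NUMBER(2),
    APPEARANCE_RATING    NUMBER(2),
    TEXTURE_RATING       NUMBER(2),
    OVERALL_RATING       NUMBER(2),
    LIKELIHOOD_PURCHASE  NUMBER(2),
    NBR_BARS_CONSUMED    NUMBER
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
# PRAGMA table_info returns one row per column; the column name is at index 1.
columns = [row[1] for row in conn.execute("PRAGMA table_info(CANDYBAR_HISTORICAL_DATA)")]
print(len(columns))  # 22
```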

A Comment about Tablespaces

Let's talk briefly about Tablespaces in Oracle. An Oracle Tablespace is where
objects, such as tables and indexes, are stored, and it is completely
transparent to you since your database administrator has associated a
default tablespace with your username or schema. Any table or index you
create will, by default, be placed in this Tablespace, but you may
occasionally want to store tables, indexes, and partitions in separate
tablespaces in order to, among other things, reduce I/O contention
between these objects. First, let's talk about Single-File Databases. A
Single-File Database contains all of your tables and indexes within a single file
stored on disk. If you've ever used Microsoft Access, and I suspect you have,
you know that Access stores your data, as well as indexes, in a single .mdb
file. On the other hand, Oracle makes use of one or more physical files to
store database objects such as tables, indexes, and other good stuff. When
your database administrator creates a new database he or she names one
or more files on disk where the database will store your tables and so on.
These files on disk are physical files, since they actually exist on disk. A
tablespace is associated with one or more files. That is, a tablespace is a
logical construct and is associated with several physical files on disk. A
single physical file can only be associated with one tablespace though. As
mentioned, your tables and indexes are stored within the tablespace your
DBA has setup for your database. Each database can contain several
tablespaces, as well as associated physical files on disk, but your DBA can
create more tablespaces and files, if the need arises. For example, here's a
graphical representation of files, tablespaces, tables, and indexes. We have
two tablespaces, TBS_ONE and TBS_TWO. Take note that TBS_ONE only
currently has two physical files associated with it, whereas TBS_TWO has
three. Also, note that TBS_ONE currently houses two tables and an index,
whereas TBS_TWO contains just two tables. Now, when creating one or more
tables, indexes, or partitions, you can specify a tablespace other than the
default. Normally, SQL developers don't do this, although database
administrators may choose to do this. In any case, I show you how to do this
in several places during the course, but you can ignore it unless you really
need it.
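The relationship between files, tablespaces, and objects in the slide's example can be modeled with two dictionaries. This is a hypothetical toy model. The tablespace names and file counts come from the example above, while the file and object names are made up. It enforces the rule stated above, that a physical file belongs to only one tablespace. In Oracle itself, you request a non-default tablespace with the TABLESPACE clause of CREATE TABLE or CREATE INDEX.

```python
# Hypothetical toy model of the example above. The tablespace names and
# file counts follow the slide; the file and object names are made up.
tablespace_files = {
    "TBS_ONE": ["tbs_one_01.dbf", "tbs_one_02.dbf"],
    "TBS_TWO": ["tbs_two_01.dbf", "tbs_two_02.dbf", "tbs_two_03.dbf"],
}

# Objects are assigned to a tablespace, not to a particular file.
tablespace_objects = {
    "TBS_ONE": ["TABLE_A", "TABLE_B", "INDEX_A"],
    "TBS_TWO": ["TABLE_C", "TABLE_D"],
}

def file_owners(ts_files):
    """Map each physical file to its tablespace, rejecting shared files."""
    owner = {}
    for ts, files in ts_files.items():
        for f in files:
            if f in owner:
                raise ValueError(f"{f} cannot belong to both {owner[f]} and {ts}")
            owner[f] = ts
    return owner

owners = file_owners(tablespace_files)
print(owners["tbs_two_03.dbf"])            # TBS_TWO
print(len(tablespace_objects["TBS_ONE"]))  # 3
```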

Oracle Query Optimizer

Let's talk briefly about the Oracle Query Optimizer. What is the Oracle Query
Optimizer? Let me put it to you like this. When you have a lot of errands to
run what goes through your mind? Usually, you try to figure out the fastest
route to complete your errands. You don't just drive from one place to the
next place randomly do you? You mentally map out a route so that say, the
next stop is nearest to the previous stop along your route, but this isn't
always the case. What happens if you have a signed check from work? You
probably want to get it to the bank first and then do the rest of your errands.
If you have to shop for food, you probably want to do that last, since you
don't want your ice cream to melt while you're at the craft store, and what
happens if one of the roads is closed? All of these factors are taken into
account by your brain, which tosses around the many possible routes you
can take, and then you decide on the best route given the information you
have. Bank first, craft store second, supermarket last, and then home. Your
brain assigns weights, or costs, to each of the routes, and the route with the
lowest total cost is the one you use. A similar thing is done when you submit
a SQL query to Oracle. Oracle analyzes your query and decides the best
possible way to execute it, given the estimated costs of performing tasks
such as a full table scan versus using an index. The chosen way to execute your
query is called the execution plan, and in this case the optimizer is also
known as the cost-based optimizer because, just like deciding the best way
to run errands, Oracle chooses the execution plan with the lowest total
estimated cost. Now, in order to allow Oracle's query optimizer
to determine the best execution plan for your query, you need to help it
along by gathering relevant database information, such as the number of
rows in the tables appearing in your query, along with their average row
length, the number of distinct values within the columns, the number of
nulls in the columns, information about indexes, and so on. These are called
optimizer statistics and are used by the query optimizer to create the best
possible execution plan for your query. We'll talk about how to easily gather
optimizer statistics, if necessary, later on in the course. Now, I'm not saying
your SQL query is bad or wrong; I'm saying that Oracle can determine the
best possible plan of attack to execute your query, just like you planning
the best possible route to get your errands done. For example, the Oracle
Query Optimizer can decide which index to use or to use no indexes at all.
The Oracle Query Optimizer can also decide the best possible way to join
two or more tables together. For those of us SAS programmers out there, when
you merge two SAS datasets together, what do you normally do? Well, usually
you do a sort-merge maneuver. Oracle's Query Optimizer can decide to
perform a sort-merge join as well, or it can decide to use one of
the other join techniques available to Oracle, such as nested loops or hash
joins. In later modules we'll show you how to determine if Oracle is
actually using your indexes or not by taking a look at the execution plan
itself. While we won't go into too much detail, we will point out a few things
that should help you speed up your queries. One thing to watch out for,
though, is that the use of an index is not always the best plan of attack, and
a full table scan may be more appropriate in certain instances.
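The cost-based idea can be illustrated with a deliberately simplified model. The formulas and numbers below are assumptions for illustration, not Oracle's actual costing. The TableStats fields play the role of the optimizer statistics that Oracle gathers through the DBMS_STATS package. The model picks the cheaper of two plans, which shows why an index wins for a highly selective predicate but loses to a full table scan when half the rows match.

```python
from dataclasses import dataclass

# A deliberately simplified cost model, NOT Oracle's formulas. It only
# shows the principle: estimate each plan's cost, then pick the cheapest.

@dataclass
class TableStats:
    num_rows: int  # row count, as gathered in optimizer statistics
    blocks: int    # number of blocks the table occupies

def full_scan_cost(stats, multiblock_read=8):
    # A full scan reads every block, several blocks per read request.
    return stats.blocks / multiblock_read

def index_cost(stats, selectivity, index_height=3):
    # Walk down the index, then (worst case) one block read per matching row.
    return index_height + stats.num_rows * selectivity

def choose_plan(stats, selectivity):
    plans = {
        "FULL TABLE SCAN": full_scan_cost(stats),
        "INDEX RANGE SCAN": index_cost(stats, selectivity),
    }
    return min(plans, key=plans.get)

stats = TableStats(num_rows=5_000_000, blocks=100_000)
print(choose_plan(stats, selectivity=1 / 5_000_000))  # INDEX RANGE SCAN (one row)
print(choose_plan(stats, selectivity=0.5))            # FULL TABLE SCAN (half the rows)
```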

Oracle and Pluralsight Resources

Let's talk about resources such as Oracle and Pluralsight. If you find yourself
still in need of more information on the topics presented in this course,
your first stop should be Oracle's website, as well as Pluralsight.com.
On Oracle's website you can download their entire documentation bundle
containing all their documentation. If you prefer not to download all of that,
you should instead download the following. The Oracle Database SQL
Language Reference: this manual contains all the SQL, DDL, and
DML you are likely to use. The Oracle Database Data Warehousing Guide:
this manual goes into detail about indexes and partitions, as well as a lot
more. The Oracle Database PL/SQL Packages and Types Reference: this
manual details all the packages and procedures available to you, in
particular the DBMS_STATS package, which is used to gather optimizer
statistics. Naturally, don't forget about Pluralsight for more courses.
Specifically, Hugo Kornelis has created a very nice course called Relational
Database Design. It's a database design course, but it also includes several
sections on the normal forms. David Berry has created a course entitled
Oracle Performance Tuning for Developers, which is an advanced course on
the topic I'm about to present to you. Please check out both of these
courses when you have a moment.

Your Database Administrator

Let's talk about your database administrator. If you've tried all the things I've
suggested in this course, all the Oracle database-related Pluralsight courses,
the Oracle manuals, Google, stackoverflow.com, asktom.com, and so on,
and you still can't get your SQL query to execute in a reasonable amount of
time, it's time to email your database administrator. He or she really is the
best resource in your company. Include all the steps you've gone through to
attempt to speed up your query and include all relevant SQL code. The last
thing you should ever do is send an email demanding your DBA make your
SQL code run faster. That stuff's not going to work. The approach outlined
above will most likely get you not only a response to your email, but some
respect from your database administrator, and that's a good thing.

Summary

In summary, although this module mostly covered the course contents, we did
learn a few things. We went over the entire course contents. We talked about
tablespaces and how each tablespace is associated with one or more
physical files on disk and that each tablespace could contain tables, indexes,
partitions, as well as other objects. We talked about the Oracle Query
Optimizer, which takes your query and the optimizer statistics and creates
the best possible execution plan for your SQL code. When all else fails, talk
to your database administrator. He may be schizophrenic, but he's nice
people. In the next module we discuss SQL optimizations related to your
data.

Basic SQL Optimization Part I

Introduction

Hello and welcome back to the Pluralsight course, Optimizing SQL Queries in
Oracle. My name is Scott Hecht and in this module we'll take a look at some
basic methods to optimize your SQL queries with changes applied only to
your data. The information presented in this module should usually be
applied first, before moving on to the other optimization techniques we
talk about later on in the course, so let's get started.

Module Contents

First though, let's go over the module contents. This is the first of two
modules on basic optimizations. The first part deals with optimizations
related to your data. In the next module we'll talk about basic optimizations
related to your SQL code. We start off this module with a basic explanation of
Primary and Foreign Keys. Later on in this module we talk about these in
more detail. I assume that you've been exposed to this before, so it should
just be a reminder. We next describe First, Second, and Third Normal Forms,
which allow you to take a very large table, like our
CANDYBAR_HISTORICAL_DATA table, and slim it down into several related
tables. Databases love slim. We then revisit Primary and Foreign Keys and
explain inline and out-of-line constraints. Although identifying your
tables' Primary and Foreign Keys won't necessarily speed up all of your
queries, it will be very helpful when joining two or more tables together
by these keys. We talk a little bit about removing infrequently accessed
columns, and we end this module with a chat about how NULLs are stored in
an Oracle table and how shifting mostly-NULL columns to the end of the
table can save a little bit of space. Please remember that this
particular module, as well as the next, deals solely with data and SQL
related optimizations. Indexes and partitions are not introduced until later in
the course. With that said, when you create a Primary Key, Oracle either
creates an index automatically or uses an appropriate preexisting index. We
talk more about that later on in the course.
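As a preview of the NULL discussion, here is a simplified model of how column order affects row size. It assumes Oracle's documented behavior that trailing NULL columns are not stored at all, while a NULL followed by a non-NULL column still takes a one-byte length marker. Row headers, multi-byte length prefixes, and character-set details are ignored, and the sample values are hypothetical.

```python
# Simplified model of Oracle's row format (assumptions stated here):
# - each stored column costs a 1-byte length prefix plus its data
#   (values over 250 bytes use a 3-byte prefix; ignored here),
# - a NULL followed by a non-NULL column costs 1 byte,
# - trailing NULLs cost nothing because they are not stored,
# - the row header and block overhead are ignored.

def row_bytes(values):
    """Approximate stored size of one row; None represents NULL."""
    # Drop trailing NULLs: they are not stored at all.
    last = len(values)
    while last > 0 and values[last - 1] is None:
        last -= 1
    size = 0
    for v in values[:last]:
        if v is None:
            size += 1                # length byte marking a NULL
        else:
            size += 1 + len(str(v))  # length byte + data (ASCII text here)
    return size

# The same data in two column orders: mostly-NULL columns in the middle vs at the end.
nulls_in_middle = ["Three Musketeers", None, None, None, "Mars"]
nulls_at_end = ["Three Musketeers", "Mars", None, None, None]
print(row_bytes(nulls_in_middle))  # 25
print(row_bytes(nulls_at_end))     # 22
```

Three bytes per row is small, but across five million rows this model's difference adds up to roughly 15 MB.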

Primary/Foreign Keys

Let's talk about Primary and Foreign Keys. What is a Primary Key? A Primary
Key uniquely identifies each row in a table and can be made up of a single
column or two or more columns in combination. The Primary Key column or
columns must be unique for each row in the table and these columns must
not contain NULLs. For example, here is a table containing two columns,
CANDYBAR_ID, which is an identification number for a specific candy bar, and
the name of the candy bar. The column CANDYBAR_ID is the Primary Key of
this table since it uniquely identifies each candy bar. That is, if I code WHERE
CANDYBAR_ID = 1, I'd get back a single row of data. This is true for each
CANDYBAR_ID in this table. Now, knowing this allows me to safely code, say,
SELECT COUNT(*) on this table, and I will be secure knowing that the number
returned is the number of unique candy bars in that table. If this table
repeated candy bars, my COUNT(*) would return an incorrect value. As
another example, here's a table containing five columns. The columns
TASTE_RATING and OVERALL_RATING are ratings from a particular
respondent for a particular candy bar on a particular survey date. Because
of that, the three columns, RESPONDENT_ID, CANDYBAR_ID, and
SURVEY_DATE, taken together, make up the Primary Key. Just like for the
CANDYBAR_ID in the previous example, given a WHERE clause where, say,
RESPONDENT_ID is 1, CANDYBAR_ID is 79, and SURVEY_DATE is March
21, 2013, I can be sure that I will get back only a single row of data. Now,
occasionally tables contain an additional single column that acts as the
Primary Key. For example, if I create a column that contains a row number
for each row in the table, that could be the Primary Key instead of the three
columns shown. If that's the case, the three columns shown will be known as
the alternate key or alternate Primary Key, whereas the row number column
would be the Primary Key. There's nothing wrong with this, but as far as
understanding the data contained within the table, the three columns in red
are more helpful than the ROW_ID column. Knowing this fact allows us to
perform some more complicated SQL queries, such as computing the average overall
rating for each respondent for each candy bar across all the surveys taken.
As you see here, I can do this by grouping by RESPONDENT_ID and
CANDYBAR_ID just because I know what the Primary Key is. Now that we've
learned what a Primary Key is, what's a Foreign Key? A Foreign Key is a
column that appears in one table but is actually the Primary Key of a
completely different table. The table shown is just the candy bar table from
the previous slide with one additional column, the CANDYBAR_MFR_ID, which
is just the identification number of the manufacturer of the candy bar. As
you see, this column is not the Primary Key of this table, CANDYBAR_ID still
is, but CANDYBAR_MFR_ID is the Primary Key within its own table, which
contains one row for each candy bar manufacturer. Knowing this allows you
to safely join these two tables together as shown without rows repeating
accidentally.
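The Primary and Foreign Key ideas above can be tried out in a small sketch. It uses Python's sqlite3 module as a stand-in for Oracle, so the column types (INTEGER, TEXT) are SQLite's rather than Oracle's, and the rows are hypothetical. The sketch shows a duplicate key and an orphaned Foreign Key being rejected, COUNT(*) on a keyed table, grouping by part of a composite key, and a join that cannot repeat rows.

```python
import sqlite3

# sqlite3 stands in for Oracle. The tables mirror the examples above; the
# rows are hypothetical. SQLite enforces foreign keys only after this PRAGMA.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE candybar_mfr (
        candybar_mfr_id   INTEGER PRIMARY KEY,
        candybar_mfr_name TEXT NOT NULL
    );
    CREATE TABLE candybar (
        candybar_id     INTEGER PRIMARY KEY,
        candybar_name   TEXT NOT NULL,
        candybar_mfr_id INTEGER REFERENCES candybar_mfr (candybar_mfr_id)
    );
    -- Composite Primary Key: the three columns together identify a row.
    CREATE TABLE survey_rating (
        respondent_id  INTEGER NOT NULL,
        candybar_id    INTEGER NOT NULL,
        survey_date    TEXT    NOT NULL,
        overall_rating INTEGER,
        PRIMARY KEY (respondent_id, candybar_id, survey_date)
    );
    INSERT INTO candybar_mfr VALUES (49, 'Mars');
    INSERT INTO candybar VALUES (1, 'Three Musketeers', 49);
    INSERT INTO survey_rating VALUES (1, 1, '2012-03-21', 7);
    INSERT INTO survey_rating VALUES (1, 1, '2013-03-21', 9);
""")

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key_rejected = rejected("INSERT INTO candybar VALUES (1, 'Duplicate', 49)")
orphan_rejected = rejected("INSERT INTO candybar VALUES (2, 'Orphan', 999)")

# Because candybar_id is unique, COUNT(*) counts distinct candy bars.
(bar_count,) = conn.execute("SELECT COUNT(*) FROM candybar").fetchone()

# Knowing the key, we can average across surveys per respondent and candy bar.
averages = conn.execute("""
    SELECT respondent_id, candybar_id, AVG(overall_rating)
    FROM survey_rating
    GROUP BY respondent_id, candybar_id
""").fetchall()

# Each candybar_mfr_id matches exactly one manufacturer row, so the join
# cannot repeat candy bar rows.
joined = conn.execute("""
    SELECT c.candybar_name, m.candybar_mfr_name
    FROM candybar c
    JOIN candybar_mfr m ON c.candybar_mfr_id = m.candybar_mfr_id
""").fetchall()

print(duplicate_key_rejected, orphan_rejected, bar_count)  # True True 1
print(averages)  # [(1, 1, 8.0)]
print(joined)    # [('Three Musketeers', 'Mars')]
```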

Normal Forms - Part I

Let's talk about optimizations related specifically to data. What do I mean by
that? Well, without data you won't have anything to query, so data is a
rather important component of your SQL query. Now, ignoring indexes and
partitions, which are introduced later on in the course, if your tables are
huge with dozens of columns and billions of rows, then you probably expect
your queries to run slowly, but that doesn't mean there aren't ways of
speeding up your queries a little bit. What can we do? Although you've
probably seen this in a college course or read about it in a SQL book,
modifying your data to be in Third Normal Form can decrease your SQL
query runtime by eliminating redundant data or moving columns to other
tables. Recall that our table, CANDYBAR_HISTORICAL_DATA has 22 columns,
some of which are large text strings. 22 columns is a pretty large record
size. Can we do better? Note first that databases love skinny tables, that is,
tables with only a few columns. I'll explain why in just a moment. Now, you
may have thought that Normal Forms were just something you learn in
college and weren't really used in the real world. Well, as we'll see,
transforming the CANDYBAR_HISTORICAL_DATA table into Third Normal
Form significantly reduces the number of columns, which in turn has the
potential to reduce our SQL query runtime. I'll go over the Normal
Forms in detail in just a moment. Finally, just a reminder: databases love
skinny tables, so a query against a skinny table should execute faster than
the same query against a corresponding fat table. Why is that? Let's take a
look at the inside of a typical hard drive. As you can see, there is a spinning
platter, which holds the data, and a read/write armature, which reads from
and writes to the disk. Each row of data in the table is stored on the hard
drive one row after the next. Now, in order to move from one row to the next,
the read/write armature has to wait for the platter to spin around to the
next row. If each row is very large, as it is in our CANDYBAR_HISTORICAL_DATA
table, it takes time to move from row to row. On the other hand, if each row
is tiny, the read/write armature can scan through the rows much more quickly.
Put in Oracle's terms, data is read in fixed-size blocks, so smaller rows
mean more rows per block and fewer blocks to read. It's the goal, or at
least one of the goals, of Third Normal Form to slim down a table so that
each row is as small as possible while still allowing us to perform any
analysis we need to do. Let's take a look at the Normal Forms in more detail.

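A back-of-the-envelope calculation shows why narrow rows help. Oracle reads table data in blocks, and 8 KB is a common block size. The two row sizes below are hypothetical round numbers, not measurements of the course data, and block and row overhead are ignored.

```python
import math

# Assumptions: 8 KB blocks (a common Oracle block size) and two
# hypothetical row sizes; block and row overhead are ignored.
BLOCK_BYTES = 8192
TOTAL_ROWS = 5_000_000

def blocks_needed(row_bytes, total_rows=TOTAL_ROWS, block_bytes=BLOCK_BYTES):
    """Blocks a full table scan must read for rows of the given size."""
    rows_per_block = block_bytes // row_bytes
    return math.ceil(total_rows / rows_per_block)

fat = blocks_needed(400)    # wide row with names, addresses, and so on
skinny = blocks_needed(40)  # narrow fact-table row of IDs and ratings
print(fat, skinny)          # 250000 24510
```

With these assumed sizes, a full scan of the skinny table reads roughly a tenth as many blocks as a full scan of the fat one.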
Let's start off with the First Normal Form. In a nutshell, a table is in First
Normal Form if there are no repeating fields. For example, suppose you had
a table with two columns, RESPONDENT_ID and PHONE_NUM. From the table
you can see that we only have a single phone number per respondent, but
what happens if you need to add a second phone number? One way to
easily capture the data for a second phone number is to add an additional
column called PHONE_NUM2 to the table. This violates First Normal Form
because you are adding an additional column that is similar to a column that
already exists, namely PHONE_NUM. One solution to this problem is to move
the phone numbers out to their own table, along with the RESPONDENT_ID,
and an additional column indicating the type of phone number, say cell
phone, work phone, home phone, and so on. Next, let's talk about Second
Normal Form. For a table to be in Second Normal Form it must be in First
Normal Form and have no non-key columns that are associated with only part of the
Primary Key. In the table shown we have five columns, RESPONDENT_ID,
CANDYBAR_ID, SURVEY_DATE, CANDYBAR_WEIGHT_OZ, and
OVERALL_RATING. As discussed earlier, the first three
columns are the Primary Key of this table. Now, take a look at the column
CANDYBAR_WEIGHT_OZ. You'll note that the weights displayed are all 4.2 ounces when
CANDYBAR_ID equals 1. This makes sense since a candy bar's weight is
associated with the candy bar itself, and with neither the respondent who filled in the
survey nor the date of the survey. To put it another way, the column
CANDYBAR_WEIGHT_OZ is associated with the CANDYBAR_ID column only
and not with the rest of the Primary Key. A candy bar's weight has nothing to
do with a respondent and nothing to do with a survey date. Because of this,
the table is not in Second Normal Form. To fix this situation we move the
column CANDYBAR_WEIGHT_OZ into its own table along with the
CANDYBAR_ID column. We take a look at this more closely in just a moment.
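The Second Normal Form test can be checked mechanically against sample data. The helper functions below are hypothetical and written only for this illustration. `determines` checks whether some columns functionally determine another, and `partial_dependencies` looks for proper subsets of the Primary Key that do so. The sample rows are made up to mirror the table above.

```python
from itertools import combinations

# Hypothetical sample rows shaped like the table above. A sample can only
# disprove a dependency; it cannot prove one holds for all future data.
rows = [
    {"RESPONDENT_ID": 1, "CANDYBAR_ID": 1, "SURVEY_DATE": "2012-03-21",
     "CANDYBAR_WEIGHT_OZ": 4.2, "OVERALL_RATING": 7},
    {"RESPONDENT_ID": 2, "CANDYBAR_ID": 1, "SURVEY_DATE": "2012-03-21",
     "CANDYBAR_WEIGHT_OZ": 4.2, "OVERALL_RATING": 5},
    {"RESPONDENT_ID": 1, "CANDYBAR_ID": 1, "SURVEY_DATE": "2013-03-21",
     "CANDYBAR_WEIGHT_OZ": 4.2, "OVERALL_RATING": 9},
    {"RESPONDENT_ID": 1, "CANDYBAR_ID": 2, "SURVEY_DATE": "2012-03-21",
     "CANDYBAR_WEIGHT_OZ": 2.1, "OVERALL_RATING": 6},
]
PRIMARY_KEY = ("RESPONDENT_ID", "CANDYBAR_ID", "SURVEY_DATE")

def determines(rows, lhs, rhs):
    """True if each combination of lhs values maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def partial_dependencies(rows, pk, column):
    """Smallest proper subsets of the Primary Key that determine `column`."""
    found = []
    for size in range(1, len(pk)):
        for subset in combinations(pk, size):
            # Supersets of a subset already found add no new information.
            if any(set(f) <= set(subset) for f in found):
                continue
            if determines(rows, subset, column):
                found.append(subset)
    return found

print(partial_dependencies(rows, PRIMARY_KEY, "CANDYBAR_WEIGHT_OZ"))  # [('CANDYBAR_ID',)]
print(partial_dependencies(rows, PRIMARY_KEY, "OVERALL_RATING"))      # []
```

A non-empty result flags a candidate Second Normal Form violation: here CANDYBAR_WEIGHT_OZ depends on CANDYBAR_ID alone, while OVERALL_RATING needs the whole key.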
Let's continue our discussion of the Normal Forms by looking at Third
Normal Form. For a table to be in Third Normal Form it must first pass
the First Normal Form test, as well as the Second Normal Form test, and
additionally, every non-key column in the table must depend on the
entire key and only on the key: not on just part of it, and not on another
non-key column. For example, we already saw that
CANDYBAR_WEIGHT_OZ is only associated with the CANDYBAR_ID column,
so we got rid of it and moved it to a different table. Next, the
RESPONDENT_NAME column is associated with the RESPONDENT_ID column
only, that is, a respondent's name doesn't change because of the candy bar
he or she has been assigned to rate, so RESPONDENT_NAME has to go. Finally, let's
look at the OVERALL_RATING. This column is the response given by a
particular respondent, for a particular candy bar, on a particular survey
date. Thus, OVERALL_RATING is associated with the entire Primary Key and
not just part of it, so OVERALL_RATING can stay. What's the big deal about
all this Normal Form stuff? Among other things, placing a large table in Third
Normal Form allows you to move columns out to smaller tables, eliminating
duplicated data. On the next slide we're going to take the
CANDYBAR_HISTORICAL_DATA table and its 22 columns and put it in Third
Normal Form. As you'll see, not only will this reduce the total number
of columns in our main table, but as a consequence the record size for each
row will be smaller and our queries should run faster.

Normal Forms - Part II

Our goal now is to transform our CANDYBAR_HISTORICAL_DATA table into a
single skinny table containing just those columns directly related to the
Primary Key columns. By doing this we'll need to create one or more smaller
tables containing additional columns we've removed from the larger table.
The skinny table is known as the fact table and the additional information
tables are called dimension tables. Taken together, the fact and dimension
tables just recreate the large CANDYBAR_HISTORICAL_DATA table, but
without the repeated data and, as mentioned, skinny tables should allow our
SQL queries to execute faster. Now, by breaking the large table into fact and
dimension tables you aren't losing information; you're just making it more
manageable for the database to process. Again, think back to the image of
the hard drive shown a few slides back. Let's start off by creating the fact
table. Well, based on the definition of Third Normal Form, we need to keep
all of those columns that are directly associated with our Primary Key, so
first we need to identify the Primary Key columns. As stated earlier, the
following columns are good candidates for the Primary Key:
RESPONDENT_ID, CANDYBAR_ID, and SURVEY_DATE. Next, if you take a look
at the list of columns, the following columns are associated completely with
our Primary Key columns: TASTE_RATING, APPEARANCE_RATING,
TEXTURE_RATING, OVERALL_RATING, LIKELIHOOD_PURCHASE, and
NBR_BARS_CONSUMED. It's these columns which allow us to perform our
analysis and that's it. Each one of the analysis columns, such as
TASTE_RATING and so forth, is completely associated with a specific
RESPONDENT_ID, CANDYBAR_ID, and SURVEY_DATE. On the other hand, the
RESPONDENT_NAME is associated only with the RESPONDENT_ID and, as
such, is removed from the fact table and placed into a dimension table. Our
next goal is to create several dimension tables holding the information we
excluded from our fact table. First, we create a dimension table associated
with the respondents. This table will contain the RESPONDENT_ID as the
Primary Key and the RESPONDENT_NAME, RESPONDENT_ADDR, and so on
as the attributes. The next dimension table contains all of those columns
related to the candy bar itself. It would contain the CANDYBAR_ID as the sole
Primary Key, as well as the CANDYBAR_NAME, the CANDYBAR_MFR_ID, and
the CANDYBAR_WEIGHT_OZ, and so on for the rest of the dimension tables.
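The following sketch shows one way to carve the fact table and the candy bar, manufacturer, and respondent dimension tables out of CANDYBAR_HISTORICAL_DATA using CREATE TABLE ... AS SELECT. It assumes that CANDYBAR_HISTORICAL_DATA carries the column names used in this module, including CANDYBAR_MFR_NAME. SELECT DISTINCT collapses the repeated dimension values to one row per key. That only works if each attribute really is constant for its key, which the final query checks. Primary and Foreign Keys would still need to be declared separately.

```sql
-- Fact table: the three Primary Key columns plus the six analysis columns.
CREATE TABLE candybar_fact AS
SELECT respondent_id,
       candybar_id,
       survey_date,
       taste_rating,
       appearance_rating,
       texture_rating,
       overall_rating,
       likelihood_purchase,
       nbr_bars_consumed
FROM   candybar_historical_data;

-- Candy bar dimension: one row per CANDYBAR_ID.
CREATE TABLE candybar_dim AS
SELECT DISTINCT
       candybar_id,
       candybar_name,
       candybar_mfr_id,
       candybar_weight_oz
FROM   candybar_historical_data;

-- Manufacturer dimension: one row per CANDYBAR_MFR_ID.
CREATE TABLE candybar_mfr_dim AS
SELECT DISTINCT
       candybar_mfr_id,
       candybar_mfr_name
FROM   candybar_historical_data;

-- Respondent dimension: one row per RESPONDENT_ID.
CREATE TABLE respondent_dim AS
SELECT DISTINCT
       respondent_id,
       respondent_name,
       respondent_addr
FROM   candybar_historical_data;

-- Sanity check: any CANDYBAR_ID returned here has attributes that vary
-- from row to row, so it is not yet a clean dimension key.
SELECT candybar_id, COUNT(*)
FROM   candybar_dim
GROUP  BY candybar_id
HAVING COUNT(*) > 1;
```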

Normal Forms - Part III

After drawing out our tables on paper, let's take a look at our final database
design. In the center of everything is our fact table called CANDYBAR_FACT,
which contains only the columns associated with the Primary Key columns,
RESPONDENT_ID, CANDYBAR_ID, and SURVEY_DATE, and are shown above
the horizontal line. This table is in Third Normal Form because the remaining
columns, those appearing below the horizontal line, rely directly on all three
columns of the Primary Key and not just part of it. As you can see, we have
significantly fewer columns in this table than in the
CANDYBAR_HISTORICAL_DATA table, 9 columns versus 22 columns. Here we
have the CANDYBAR_DIM table, candy bar dimension table, containing the
name of the candy bar, its manufacturer ID, and its weight in ounces. We also
have the candy bar manufacturer dimension table, CANDYBAR_MFR_DIM,
containing the name of the manufacturer along with its ID. Note that if we had kept the column
CANDYBAR_MFR_NAME in the CANDYBAR_DIM table, the name would have
been associated with the CANDYBAR_MFR_ID column, which is not part of
the Primary Key. This violates Third Normal Form. Besides that, the
CANDYBAR_MFR_NAME would have been repeated several times, since one
manufacturer can produce several types of candy bars. For example, the
Mars corporation produces the Twix bar, the Milky Way bar, the Mars bar, of course,
and so on, thus Mars would have been repeated many times in the
CANDYBAR_DIM table. Next, we have our RESPONDENT_DIM table containing
the RESPONDENT_ID as the Primary Key, as well as everything associated
directly with it, such as the RESPONDENT_NAME, the RESPONDENT_ADDR,
and so on. Finally, here's our dimension table related to dates. As you see,
our Primary Key is the SURVEY_DATE column and we have two attributes
related to the SURVEY_DATE column: SURVEY_YEAR, which is just the four-digit
year associated with the SURVEY_DATE, and SURVEY_MONTH, which is the
number of the month, 1 is January, 2 is February, and so on. At this point,
we have our CANDYBAR_FACT table, as well as several dimension tables.
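You could derive the DATE_DIM table just described from the survey dates using EXTRACT. This is a sketch that assumes the dates are taken from CANDYBAR_FACT and that SURVEY_DATE holds no time-of-day portion. If it does, each distinct timestamp would produce its own row.

```sql
-- One row per distinct survey date, with its year and month number.
-- Assumes SURVEY_DATE carries no time portion (otherwise apply TRUNC first).
CREATE TABLE date_dim AS
SELECT survey_date,
       EXTRACT(YEAR  FROM survey_date) AS survey_year,   -- four-digit year
       EXTRACT(MONTH FROM survey_date) AS survey_month   -- 1 = January, 2 = February, ...
FROM   (SELECT DISTINCT survey_date FROM candybar_fact);
```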
Now, instead of each table being isolated, you can join them together based
on the keys. For example, we can join together the tables CANDYBAR_DIM
and CANDYBAR_MFR_DIM by the CANDYBAR_MFR_ID column. This is
represented by the line as shown. We can do a similar thing with the
CANDYBAR_FACT table. If need be, you can join the three dimension tables,
DATE_DIM, CANDYBAR_DIM, and RESPONDENT_DIM to the CANDYBAR_FACT
table. These joins are represented by the three lines pointing from the
Primary Keys of the dimension tables to the columns that make up the
Primary Key of the fact table. On this slide let's perform a speed test using
the same query on the CANDYBAR_HISTORICAL_DATA table and the
CANDYBAR_FACT table. Here we have a very simple sum on the
TASTE_RATING column, and note that I just submitted the command SET
TIMING ON, so that the elapsed time it takes the query to run will be
displayed in the output, which is always nice to see. Here are the results and
this query ran in just under 10 seconds. Now, here we have the equivalent
query on the TASTE_RATING column, but it uses CANDYBAR_FACT instead of
CANDYBAR_HISTORICAL_DATA and the results are naturally the same, but
the execution time is about 2.5 seconds. This query runs in a quarter of the
time and this is solely due to the record size of the table. That is, we've gone
from a fat table, CANDYBAR_HISTORICAL_DATA containing 22 columns, to a
skinny table, CANDYBAR_FACT with only 9 columns, and I think you now
understand why I mentioned that you shouldn't just jump into indexes and
partitions. This Third Normal Form stuff is really nice.
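You can reproduce the speed test in SQL*Plus along these lines. SET TIMING ON is a SQL*Plus command, not SQL, and your timings will differ from the ones quoted above. The last query shows how the dimension tables join back onto the fact table by their keys. The columns it selects are hypothetical and chosen only for illustration.

```sql
SET TIMING ON

-- Wide table: 22 columns per row.
SELECT SUM(taste_rating) FROM candybar_historical_data;

-- Skinny fact table: 9 columns per row, same answer.
SELECT SUM(taste_rating) FROM candybar_fact;

-- Rejoining the dimension tables to the fact table on the key columns.
SELECT d.survey_year,
       c.candybar_name,
       m.candybar_mfr_name,
       AVG(f.overall_rating) AS avg_overall_rating
FROM   candybar_fact f
       INNER JOIN date_dim d         ON f.survey_date     = d.survey_date
       INNER JOIN candybar_dim c     ON f.candybar_id     = c.candybar_id
       INNER JOIN candybar_mfr_dim m ON c.candybar_mfr_id = m.candybar_mfr_id
GROUP  BY d.survey_year, c.candybar_name, m.candybar_mfr_name;
```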

Primary/Foreign Keys REDUX

Let's revisit Primary and Foreign Keys. Recall that the fact and dimension
tables we created each had one or more columns uniquely identifying
individual rows. For example, in the CANDYBAR_DIM table shown here it's
the CANDYBAR_ID column, which uniquely identifies each one of the 250
rows of data within this table. That is, a CANDYBAR_ID of 1 will only occur
once in this table and only one row will be displayed if that CANDYBAR_ID is
requested, similar for the remaining 249 rows. As another example, the
table CANDYBAR_FACT needs three columns, the RESPONDENT_ID, the
CANDYBAR_ID, and the SURVEY_DATE in order to distinguish each individual
row of data uniquely. That is, RESPONDENT_ID of 1, CANDYBAR_ID of 1, and
SURVEY_DATE of March 21, 2013 will only result in a single row of data being
returned from this table. This is true of all the respondents, candy bar IDs,
and dates. You will only get back a single row of data. These special columns
are called Primary Keys. In the case of the CANDYBAR_DIM table, there is
only one column, the CANDYBAR_ID and it is the Primary Key. In the
CANDYBAR_FACT table it takes three columns to make the Primary Key,
RESPONDENT_ID, CANDYBAR_ID, and SURVEY_DATE. Now, if you look at the
CANDYBAR_DIM table, you'll see the CANDYBAR_MFR_ID column. This
column is a Foreign Key because CANDYBAR_MFR_ID is the Primary Key in its
own table, CANDYBAR_MFR_DIM. As you see here, the column
CANDYBAR_MFR_ID is the Primary Key of the table CANDYBAR_MFR_DIM. Take
note that in our database design all Primary Keys appear above the
horizontal lines. Everything below the line is called an attribute, or non-key
column, and depends solely on the Primary Key. Okay, so what does this
have to do with optimization? When you create your fact and dimension
tables using the CREATE TABLE statement, you specify which column or
columns act as the Primary Key or Foreign Key. These are known as
constraints, and Oracle will automatically put an index on the Primary Key
column or columns, allowing you to join fact and dimension tables together much
more quickly. Oracle does not automatically index the Foreign Key columns within the fact
table, though. Again, we talk about indexes later in the course, but let's take
a look at the code used to specify Primary and Foreign Key constraints. Let's
look at how to create a Primary Key using the CREATE TABLE syntax. In this
example we're telling Oracle that the column SURVEY_DATE is the
Primary Key of the DATE_DIM table. Following the column data type, DATE in
this case, you'll specify the keyword CONSTRAINT followed by the name of
the constraint. In this case, I have called it pk_surveydate. Next, we follow
up with the keywords PRIMARY KEY, indicating that this column is the sole
Primary Key for this table. After the table is created you can then insert data
into it. This type of Primary Key syntax is called the inline constraint syntax
because we're using the constraint keyword on the same line as the
definition of the column. Now, this syntax only allows you to define a single
column as the Primary Key. On the other hand, you can use the out-of-line
constraint syntax in order to indicate that multiple columns act as the
Primary Key. For example, here we're creating the fact table,
CANDYBAR_FACT. Note that after the column NBR_BARS_CONSUMED has
been defined, we follow up with a comma and the constraint keyword
followed by a name for the constraint, followed by the keywords PRIMARY
KEY, and in parentheses we have a comma delimited list of the columns
which make up the entire Primary Key. Let's take a look at how to create a
Foreign Key using the CREATE TABLE Syntax. As I've mentioned, a Foreign
Key is just a Primary Key in another table. As such, that other table must
have a Primary Key defined on it. Here I'm creating that Primary Key on the
column CANDYBAR_MFR_ID in the CANDYMFR_DIM table, in preparation for
creating the Foreign Key constraint in the CANDYBAR_DIM table. Next, I'll create a Foreign Key by
using the constraint keyword along with the references keyword. As you can
see, the column CANDYBAR_MFR_ID in this table is actually the Primary Key
in the table defined above, and it is this table that is specified after the
references keyword, so let's go through it in more detail. First, I specify the
column CANDYBAR_MFR_ID and then I'll immediately follow up with the
CONSTRAINT keyword. Note that I do not specify the data type and I'll
explain why in just a moment. After the CONSTRAINT keyword I give a name
to the CONSTRAINT. Here it's fk_candybarmfrid. I then follow up with the
REFERENCES keyword followed by the name of the table to reference, in this
case, the table CANDYMFR_DIM as shown above, and within parentheses I
place the name of the column in the CANDYMFR_DIM table that is the
Primary Key. It's the data type in this table that's used as the data type in
the CANDYBAR_DIM table. As I mentioned on the previous slide, Oracle does
not automatically place an index on the column CANDYBAR_MFR_ID in the
CANDYBAR_DIM table, unlike for Primary Keys. We talk more about Primary
and Foreign keys in the modules on indexes later on in the course.
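Here is a sketch of the constraint syntax covered in this section. The constraint names pk_surveydate and fk_candybarmfrid and the table name CANDYMFR_DIM come from the narration. The other constraint and index names, and all sizes and precisions, are assumptions. CANDYBAR_MFR_ID in CANDYBAR_DIM has no data type because Oracle lets a column that carries an inline REFERENCES constraint inherit the data type of the referenced key. The last statement adds the Foreign Key index that Oracle does not create for you.

```sql
-- Inline constraint syntax: single-column Primary Key.
CREATE TABLE date_dim (
    survey_date   DATE CONSTRAINT pk_surveydate PRIMARY KEY,
    survey_year   NUMBER(4),
    survey_month  NUMBER(2)
);

-- Out-of-line constraint syntax: three-column Primary Key.
CREATE TABLE candybar_fact (
    respondent_id        NUMBER,
    candybar_id          NUMBER,
    survey_date          DATE,
    taste_rating         NUMBER,
    appearance_rating    NUMBER,
    texture_rating       NUMBER,
    overall_rating       NUMBER,
    likelihood_purchase  NUMBER,
    nbr_bars_consumed    NUMBER,
    CONSTRAINT pk_candybarfact
        PRIMARY KEY (respondent_id, candybar_id, survey_date)
);

-- The referenced table needs its Primary Key first.
CREATE TABLE candymfr_dim (
    candybar_mfr_id    NUMBER CONSTRAINT pk_candybarmfrid PRIMARY KEY,
    candybar_mfr_name  VARCHAR2(100)
);

-- Foreign Key: no data type on CANDYBAR_MFR_ID; it is inherited
-- from CANDYMFR_DIM.CANDYBAR_MFR_ID through the REFERENCES clause.
CREATE TABLE candybar_dim (
    candybar_id         NUMBER CONSTRAINT pk_candybarid PRIMARY KEY,
    candybar_name       VARCHAR2(100),
    candybar_mfr_id     CONSTRAINT fk_candybarmfrid
                        REFERENCES candymfr_dim (candybar_mfr_id),
    candybar_weight_oz  NUMBER(5,2)
);

-- Primary Keys are indexed automatically; Foreign Keys are not.
CREATE INDEX ix_candybardim_mfrid ON candybar_dim (candybar_mfr_id);
```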

Infrequent Columns

Let's talk about columns that will be used infrequently or not at all. You
already know this, but one way to make your SQL queries execute faster is
to include only those columns on the SELECT clause that you'll be using in
your analysis. This may seem obvious, but I've heard many programmers
over the years say, well, I may need that column later on. I'm not dismissing
that, but if you don't need a column don't include it on the SELECT clause
and you should probably avoid using the asterisk in favor of a list of desired
columns. This not only makes you think about the columns you really need,
but helps out anyone who has to modify your SQL code later on. Now, if you
know you'll never use a particular column, just don't load it into the
database; you save the space and your SQL should run a tiny bit faster. If,
on the other hand, you might need such columns someday, you can create a separate table containing the relevant Primary
Key, as well as these infrequently accessed columns. That way, the vast majority
of your analysis will take place on the fact table containing the frequently
accessed columns, and if you do need the infrequently accessed columns,
they're there on another table. For example, if our survey data contained
columns like the version of the survey software used by the respondent, the
timestamp produced by that software, the location of the survey site, the
expiration date of each candy bar eaten by the respondent, and so on, these
columns may not necessarily be used on a day to day basis for analysis, but
may be needed for other reasons later on. These columns can be moved out
to a separate table from the fact table keeping the fact table skinny.
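This sketch moves rarely used columns into an auxiliary table that shares the fact table's Primary Key. The table name CANDYBAR_SURVEY_AUX and all of its non-key columns are hypothetical and are modeled on the examples above.

```sql
-- Rarely used survey metadata, one row per fact-table row (hypothetical columns).
CREATE TABLE candybar_survey_aux (
    respondent_id      NUMBER,
    candybar_id        NUMBER,
    survey_date        DATE,
    survey_sw_version  VARCHAR2(20),
    survey_sw_ts       TIMESTAMP,
    survey_site        VARCHAR2(100),
    candybar_exp_date  DATE,
    CONSTRAINT pk_candybarsurveyaux
        PRIMARY KEY (respondent_id, candybar_id, survey_date)
);

-- Day-to-day analysis touches only the skinny fact table,
-- listing the columns it needs instead of using the asterisk.
SELECT respondent_id, candybar_id, overall_rating
FROM   candybar_fact;

-- The occasional query that needs the extra columns joins them back on the key.
SELECT f.respondent_id, f.overall_rating, a.survey_site
FROM   candybar_fact f
       INNER JOIN candybar_survey_aux a
               ON  a.respondent_id = f.respondent_id
               AND a.candybar_id   = f.candybar_id
               AND a.survey_date   = f.survey_date;
```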

NULLs and Space

Let's talk about NULLs. NULLs can both take up space and not take up space
in a table, so what do I mean by that? If a NULL value appears between two
non-NULL columns, Oracle uses 1 byte of space to indicate that the value in
this column for this row is NULL. If a NULL value appears at the end of a row,
that is it's the last column in the definition of your table, it takes up no space
at all, since Oracle just ends the row at the last non-NULL column and
moves on to the next row. Given these two facts, it's probably a good idea
to define your table using the CREATE TABLE statement with the columns
ordered from fewest NULLs to most NULLs. The columns with the fewest
NULLs will be the Primary Key column or columns, and the
last column in the CREATE TABLE statement should probably be the column
that has the most NULLs in it, and so on and so forth in between. Imagine
how much extra space your table will use if you placed the column with the
most NULLs somewhere in the middle of the other columns. Again,
remember the graphic of the hard drive and the read/write armature. The
smaller the record size the faster you can spin through the table.
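To apply this, you can first measure how many NULLs each column holds and then order the columns accordingly. COUNT(column) skips NULLs, while COUNT(*) counts every row, so their difference is the NULL count. The staging table RESPONDENT_DIM_STAGE and the phone and e-mail columns are hypothetical.

```sql
-- How many NULLs does each candidate column contain?
SELECT COUNT(*) - COUNT(respondent_addr)  AS null_addr,
       COUNT(*) - COUNT(respondent_phone) AS null_phone,
       COUNT(*) - COUNT(respondent_email) AS null_email
FROM   respondent_dim_stage;

-- Key first, mostly populated columns next, most-often-NULL columns last,
-- so that trailing NULLs in a row take up no space.
CREATE TABLE respondent_dim (
    respondent_id     NUMBER CONSTRAINT pk_respondentid PRIMARY KEY,  -- never NULL
    respondent_name   VARCHAR2(100),   -- rarely NULL
    respondent_addr   VARCHAR2(200),   -- sometimes NULL
    respondent_phone  VARCHAR2(20),    -- often NULL
    respondent_email  VARCHAR2(100)    -- NULL most of the time
);
```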

Summary

In summary, so, what did we just learn? In this module we talked about
Primary and Foreign Keys and how to identify them in your tables. We
learned about first, second, and third Normal Forms and learned that
transferring our data into Third Normal Form can not only save space in a
table by moving columns not directly associated with the Primary Key out to
one or more dimension tables, but that doing so can actually significantly
reduce your SQL queries' runtime. We've now looked at how to create
Primary and Foreign Key Constraints by using the inline and out-of-line
constraint syntax. We mentioned that infrequently accessed columns
should be moved out to an auxiliary table. Finally, we talked about how
NULLs take up space when stuck between two non-NULL columns, but do
not take up space when appearing at the end of the column list. In the next
module we'll look at optimizing the SQL code itself.

Basic SQL Optimization Part II

Introduction

Hello and welcome back to the Pluralsight course, Optimizing SQL Queries in
Oracle. My name is Scott Hecht. Recall that in the last module we discussed
changes to your data and saw how much quicker SQL queries can execute
when those changes were made to a large table. In this module we'll take a
look at a few basic methods to optimize your SQL queries themselves with
changes to just the SQL code. As I mentioned before, the information
presented in this module, as well as the last one, should be performed first
before moving on to the more advanced optimization techniques, such as
indexes and partitions, both of which we talk about later on in the course.
Let's get started.

Module Contents

First though, let's go over the module contents. In this module we'll focus on
optimizing your SQL code. That is, things you can do solely to your SQL
code without using indexes and partitions. We start off looking at ways to
optimize your SQL code for non-joins. We first talk very briefly about
neatening up your SQL code. We'll then talk about using the SELECT and WHERE
clauses to restrict the columns and rows you bring back from the database.
Next, we talk about the INTERSECT operator versus an INNER JOIN. We then
move on to the MINUS operator versus a JOIN. Next, we talk about
correlated subqueries and then we talk about the IN Condition versus the
EXISTS Condition. We then compare the Multi-Column IN Condition versus a
series of ANDs and ORs. Next, we talk about the WITH clause and show how
using it can speed up your queries dramatically. We end this section with a
brief chat about the APPEND Hint. Next, we move on to optimizing SQL joins,
again, without the use of indexes and partitions. We start off by discussing
how the order of the tables in the FROM clause can make a difference in
runtime when not using indexes. Next, we talk about ON clauses versus
WHERE clauses. We'll then talk about avoiding Cartesian products if possible
and we end with a summary.

Neaten Up SQL Code

Let's talk about optimizing Non-Joins. First, be sure to neaten up your SQL
code and I know you all know this. Be sure to indent your SQL code. No,
lining up your code does not make the SQL execute faster. With that said,
lining up your code will help you spend less time figuring out where the
subqueries are, what parenthesis goes with what other parenthesis, and so
on. If it takes you 10 minutes to understand a complicated piece of SQL
code that's not nice and neat, then you should add those 10 minutes to the
execution time of your query because you've just blown 10 minutes. I'm
saying that half-jokingly of course. For example, here's some SQL code and I
think you get my point.
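The slide's code isn't reproduced here, so the following is a hypothetical illustration of the same point. It shows one query written on a single line and then the identical query indented, so that the subquery and its parentheses are easy to spot.

```sql
-- Hard to read:
select a.respondent_id,count(*) from candybar_fact a where a.candybar_id in (select candybar_id from candybar_dim where candybar_weight_oz>2) group by a.respondent_id;

-- Same query, neatened up:
SELECT a.respondent_id,
       COUNT(*)
FROM   candybar_fact a
WHERE  a.candybar_id IN (
           SELECT candybar_id
           FROM   candybar_dim
           WHERE  candybar_weight_oz > 2
       )
GROUP  BY a.respondent_id;
```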

Restrict Columns/Rows

Let's talk about how to restrict the number of columns and rows you pull
back from the database. Specifically, let's talk about the SELECT and WHERE
clauses. Use the SELECT clause to bring back only those columns you need
to perform your analysis. The more columns you bring back the more
temporary space your query will use, and because of this, it may perform
slower than if you had fewer columns. Avoid using the asterisk on the
SELECT clause. Yes, I know it's convenient, but if you hand your SQL code
over to another developer the asterisk won't give him or her any indication
as to the columns that are coming back from the tables. Besides, using the
asterisk will pull back all the columns from the table and that means more
time for your query to execute and more disk space taken up. Similar to the
SELECT clause, use the WHERE clause to limit the number of rows coming
back from the database. Again, I know you know this, but I've seen many
programmers over the years pull back much more data than they actually
need, just in case. When comparing a constant, be it a number, string,
or date, to a column, make sure the data type of the constant matches the
data type of the column. For example, comparing a string to a number will
force an implicit conversion and that can slow down your
query. In particular, avoid comparing a text string containing a date to a
column that is a date data type. In this case, you can use a DATE literal or
the TO_DATE function to avoid the implicit conversion. Oracle
recommends trying to structure your WHERE clauses so that they use the
AND keyword, as well as Equijoins, or joins involving the equal sign. Later on
in the course we'll talk about how indexes and the WHERE clause go hand in
hand.
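Here are some illustrative WHERE clauses against CANDYBAR_FACT. They assume that SURVEY_DATE is a DATE column and RESPONDENT_ID is a NUMBER column. The first query depends on the session's NLS_DATE_FORMAT setting to interpret the string. The DATE literal always uses the YYYY-MM-DD form.

```sql
-- Avoid: a string compared to a DATE column forces an implicit conversion
-- whose outcome depends on the session's NLS_DATE_FORMAT.
SELECT respondent_id, overall_rating
FROM   candybar_fact
WHERE  survey_date = '21-MAR-2013';

-- Better: an ANSI DATE literal.
SELECT respondent_id, overall_rating
FROM   candybar_fact
WHERE  survey_date = DATE '2013-03-21';

-- Also fine: TO_DATE with an explicit format mask.
SELECT respondent_id, overall_rating
FROM   candybar_fact
WHERE  survey_date = TO_DATE('2013-03-21', 'YYYY-MM-DD');

-- Match numeric columns with numeric constants.
SELECT respondent_id, overall_rating
FROM   candybar_fact
WHERE  respondent_id = 4;      -- rather than respondent_id = '4'
```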

INTERSECT vs. INNER JOIN

Let's talk about INTERSECT versus an INNER JOIN. The INTERSECT keyword
is the same as the mathematical intersection we learned in grade school.
That is, given two sets of data, the intersection between the two sets
returns a unique list of common elements. As a more real-world example, we
can use INTERSECT to determine a distinct list of, say, RESPONDENT_IDs common
to two tables. INTERSECT usually outperforms the corresponding INNER JOIN. Going
back to our example, we could use an INNER JOIN between the two tables to
determine a distinct list of common RESPONDENT_IDs, but this is generally
slower than INTERSECT. Note that unlike an INNER JOIN, INTERSECT
performs its own DISTINCT without the need for the DISTINCT keyword. For
example, given two tables, RESPONDENTS_1 and RESPONDENTS_2,
containing thousands of respondent IDs, here is the INNER JOIN version of
the code to return a distinct list of common RESPONDENT_IDs. This query,
on my system, took about 2.5 seconds to execute. Now, here is the same
code, but using the INTERSECT keyword. Note that one complete query
appears above the INTERSECT operator and another complete query
appears below. The result set of both queries is intersected and a distinct list
of RESPONDENT_IDs is returned. This query completed in 0.207 seconds, or
less than a tenth of the time taken by the INNER JOIN. Another nice feature of the
INTERSECT is that you're not limited to two tables. All you have to do is add
another INTERSECT operator followed by another full query and so on. The
corresponding INNER JOIN code would require an additional table and a
modified ON clause that is much more complicated.
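The two versions might look like the following sketch, which uses the RESPONDENTS_1 and RESPONDENTS_2 tables from the example. RESPONDENTS_3 is included only to show how a third table is added. There is one subtle difference between the approaches. Set operators such as INTERSECT treat two NULLs as equal, whereas a join condition never matches NULL to NULL. The results can therefore differ if RESPONDENT_ID can be NULL.

```sql
-- INNER JOIN version: DISTINCT is needed to remove duplicate matches.
SELECT DISTINCT a.respondent_id
FROM   respondents_1 a
       INNER JOIN respondents_2 b
               ON a.respondent_id = b.respondent_id;

-- INTERSECT version: each side is a complete query; the result is already distinct.
SELECT respondent_id FROM respondents_1
INTERSECT
SELECT respondent_id FROM respondents_2;

-- A third table is just another INTERSECT and another full query.
SELECT respondent_id FROM respondents_1
INTERSECT
SELECT respondent_id FROM respondents_2
INTERSECT
SELECT respondent_id FROM respondents_3;
```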

MINUS vs. JOIN

Let's talk now about the MINUS operator versus a JOIN. The MINUS operator
is the same as the set minus we learned in grade school, namely A\B or give
me everything in set A that does not appear in set B. Sometimes I become
confused with minus, so I like to think of it this way. Given two tables,
Current Month and Previous Six Months, if I want to find out which
respondents, patients, or customers are new in the Current Month table, as
compared to the Previous Six Months table, I use the MINUS keyword on
those two tables. MINUS generally performs better than the corresponding
join syntax. Similar to INTERSECT, MINUS performs its own DISTINCT. Let's
take a look at some SQL code. Here we're using a LEFT JOIN between two
tables, RESPONDENTS_2 and RESPONDENTS_3. RESPONDENTS_2 contains
all 2,000 RESPONDENT_IDs, whereas RESPONDENTS_3 is a subset of just the
first 100 IDs from 1-100. My goal here is to return a distinct list of
respondents that appear in RESPONDENTS_2, except for those that appear
on RESPONDENTS_3. As you see, this query executes in 0.238 seconds on
my system. Let's conclude our talk about MINUS versus a JOIN. This SQL
code actually uses the MINUS keyword between the two tables. As you can
see, compared with the code on the previous slide, this is a lot easier to look
at, and this code runs in 0.107 seconds, or about half the time of the SQL
code on the previous slide, so the takeaway of the previous two sections is
don't forget about the INTERSECT and MINUS operators. Everyone seems to
remember the UNION and UNION ALL operators, but tend to forget these
two very useful operators, INTERSECT and MINUS.
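Here is a sketch of the two queries compared above. The LEFT JOIN version keeps only the rows of RESPONDENTS_2 that found no match in RESPONDENTS_3. The exact column lists on the slides may differ. Assuming RESPONDENTS_2 holds 2,000 distinct IDs and RESPONDENTS_3 holds 100 of them, both queries should return the remaining 1,900 IDs.

```sql
-- LEFT JOIN version (anti-join): rows with no match have a NULL on the right side.
SELECT DISTINCT a.respondent_id
FROM   respondents_2 a
       LEFT JOIN respondents_3 b
              ON a.respondent_id = b.respondent_id
WHERE  b.respondent_id IS NULL;

-- MINUS version: everything in RESPONDENTS_2 that is not in RESPONDENTS_3.
SELECT respondent_id FROM respondents_2
MINUS
SELECT respondent_id FROM respondents_3;
```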

Correlated Subqueries

Let's talk briefly about correlated subqueries. In a correlated subquery, the
subquery contains one or more columns from the outer query. This
forces the database to perform loops, since the column values from each row in
the outer query have to be fed into the subquery. Correlated subqueries
generally perform poorly, but there may be times when you can't avoid
them. Try to rewrite the query, if possible. For example, here's a correlated
subquery. As you can see, the subquery contains a mention of
A.RESPONDENT_ID, which is from the outer query. This code takes a little
more than 5 seconds to run on my system. Let's complete our talk about
correlated subqueries. Here is the rewritten version of the SQL code on the
previous slide. Take note that I have eliminated the correlated subquery in
favor of an INNER JOIN and the query executes in about 2.6 seconds, or
about half the time, so the takeaway from this is try to avoid correlated
subqueries if you can.
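The narrated query isn't reproduced here, so the pair below is a hypothetical illustration of the same rewrite. It finds ratings above each respondent's own average taste rating. Oracle's optimizer can sometimes unnest a correlated subquery like this on its own, so compare the timings or execution plans before and after the rewrite.

```sql
-- Correlated subquery: evaluated using A.RESPONDENT_ID from each outer row.
SELECT a.respondent_id, a.candybar_id, a.survey_date, a.taste_rating
FROM   candybar_fact a
WHERE  a.taste_rating > (SELECT AVG(b.taste_rating)
                         FROM   candybar_fact b
                         WHERE  b.respondent_id = a.respondent_id);

-- Rewritten: compute every respondent's average once, then INNER JOIN to it.
SELECT a.respondent_id, a.candybar_id, a.survey_date, a.taste_rating
FROM   candybar_fact a
       INNER JOIN (SELECT respondent_id,
                          AVG(taste_rating) AS avg_taste
                   FROM   candybar_fact
                   GROUP  BY respondent_id) b
               ON a.respondent_id = b.respondent_id
WHERE  a.taste_rating > b.avg_taste;
```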

IN vs. EXISTS

Let's talk about the IN condition versus the EXISTS condition. Recall that the
IN condition expects a comma-delimited list of values or a subquery.
These values will be compared to the column appearing to the left of the IN
condition. The EXISTS condition, on the other hand, tests for the existence
of at least one row in the subquery and will return true the moment a row is
found. Note that the subquery within the EXISTS condition needs to be correlated
with the outer query for it to be useful. In both cases, when there's a match
in the subquery the outer query uses that data. In general, if the inner query
is small as compared to the outer query, then try to use the IN condition. On
the other hand, if the inner query is large, as compared to the outer query,
then try to use the EXISTS condition. In one test, I tried all four
permutations in order to compute the number of distinct female respondents
using the CANDYBAR_HISTORICAL_DATA table, which is a large table, versus the
RESPONDENT_DIM table, which is a tiny table. As you can see in our chart,
our general rule actually doesn't work out. For the IN condition the blue
vertical bar is smaller than the red bar indicating that the query ran faster.
This means that the big outer query, indicated by the label BO, paired with a
small inner query, indicated by the label SI, ran faster. For the EXISTS
condition a small outer query with a big inner query ran faster.
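The two faster combinations from the test might be written as follows. This is a sketch that assumes a hypothetical RESPONDENT_GENDER column in RESPONDENT_DIM, because the actual gender column isn't named. In the EXISTS version, COUNT(*) counts distinct respondents because RESPONDENT_ID is the Primary Key of RESPONDENT_DIM.

```sql
-- IN: big outer query (CANDYBAR_HISTORICAL_DATA), small inner query (RESPONDENT_DIM).
SELECT COUNT(DISTINCT h.respondent_id) AS nbr_female_resp
FROM   candybar_historical_data h
WHERE  h.respondent_id IN (SELECT r.respondent_id
                           FROM   respondent_dim r
                           WHERE  r.respondent_gender = 'F');  -- hypothetical column

-- EXISTS: small outer query, big inner query correlated on RESPONDENT_ID.
SELECT COUNT(*) AS nbr_female_resp
FROM   respondent_dim r
WHERE  r.respondent_gender = 'F'                               -- hypothetical column
AND    EXISTS (SELECT 1
               FROM   candybar_historical_data h
               WHERE  h.respondent_id = r.respondent_id);
```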

Multi-Column IN vs. ANDs/ORs

Let's talk about the IN condition versus using ANDs and ORs. Many SQL
programmers use a series of ANDs and ORs in order to pull data based on
lists of values, but Oracle supports a syntax for the IN condition that
allows you to provide vectors of values. This is referred to as the
multi-column IN Condition and we'll show an example of this in just a moment.
Generally, the multi-column IN Condition performs slightly better than a
series of ANDs and ORs. For example, here's a typical SQL query using a
series of ANDs and ORs to pull a specific list of RESPONDENT_IDs,
CANDYBAR_IDs, and SURVEY_DATEs. As you can see, this code is
cumbersome and difficult to read. This query runs in just under 10 seconds
on my system. Let's see an example of the multi-column IN Condition. As
you can see, our WHERE clause starts off with a comma-delimited list of
columns in parentheses. This just lists all the columns we will be
subsetting by using the multi-column IN Condition. We next follow up with
the IN condition and its opening left parenthesis. Next, we have a series of
comma-delimited lists of values, each in its own set of parentheses. Each list is
followed by a comma except for the last list. We end the multi-column IN
Condition with a right parenthesis, closing the IN condition. Now, this is read
like this: where RESPONDENT_ID equals 4 and CANDYBAR_ID equals 70 and
SURVEY_DATE equals 2013-03-12, or RESPONDENT_ID equals 6 and
CANDYBAR_ID equals 70 and SURVEY_DATE equals 2013-03-12, or and so on
and so on. This code runs in 9.632 seconds, or just slightly faster than the
code on the previous slide. Despite not running that much faster, there is
another benefit of using the multi-column IN Condition, and that's how clean
this syntax appears on the page. Which would you rather look at, the code
showing on this slide or the ANDs and ORs code on the previous slide? For
me, it's the code on this slide.
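Here are the two forms side by side, using the value combinations read out above. This is a sketch that assumes the data comes from CANDYBAR_HISTORICAL_DATA. The table and the selected columns on the slides may differ, and a real query would list more value combinations.

```sql
-- Series of ANDs and ORs.
SELECT respondent_id, candybar_id, survey_date, overall_rating
FROM   candybar_historical_data
WHERE  (respondent_id = 4 AND candybar_id = 70 AND survey_date = DATE '2013-03-12')
   OR  (respondent_id = 6 AND candybar_id = 70 AND survey_date = DATE '2013-03-12');

-- Multi-column IN Condition: the same rows, one vector of values per combination.
SELECT respondent_id, candybar_id, survey_date, overall_rating
FROM   candybar_historical_data
WHERE  (respondent_id, candybar_id, survey_date) IN (
           (4, 70, DATE '2013-03-12'),
           (6, 70, DATE '2013-03-12')
       );
```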

The WITH Clause

In this section let's focus on the WITH clause, better known in the Oracle
world as the Subquery Factoring Clause. Now, for those of you who have
never seen this before, I discuss it in my Pluralsight lecture, Advanced SQL
Queries in Oracle and SQL Server. Please take the time to view that lecture
when you have a chance. It'll change your life. What are the benefits of
using the WITH clause? First, the WITH clause neatens up your code
especially if you have very large and complicated subqueries, as we shall
see in just a moment. Second, and more important, if you have the same
repeated SQL code in subqueries, the WITH clause allows Oracle to execute
that code once and use the results multiple times without
rerunning the subquery. If you do not put repeated subqueries in a WITH
clause, then Oracle may execute the subquery over and over again, burning
daylight. For example, here's a large query that does not use the WITH
clause. As you can see, the subquery pulling the male RESPONDENT_IDs,
displayed in bold font, is repeated several times. This query runs in a little
more than 15 minutes. Let's conclude our talk about the WITH clause. In this
SQL code I have pulled out the repeated subquery and placed it in a single
WITH clause called vmMALERESP. I then reference vmMALERESP wherever the
subquery originally appeared, shown in red now. Believe it or not, this query runs in
about 44 seconds, as compared to 15 minutes for the SQL code on the
previous slide. I didn't actually believe this, so I shut down and restarted my
Oracle instance and ran this code again, with a similar runtime. So the
takeaway from this: use the WITH clause if you have a lot of repeated
subqueries, especially large ones.
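The slide's 15-minute query isn't reproduced here. The smaller hypothetical example below shows the same pattern: a male-respondent subquery that appears twice, and then the same subquery factored into vmMALERESP. The RESPONDENT_GENDER column is an assumption. Oracle decides whether to materialize the named subquery as a temporary result or merge it back inline, so the speedup depends on the query.

```sql
-- Without the WITH clause: the same subquery is written (and possibly run) twice.
SELECT (SELECT AVG(taste_rating)
        FROM   candybar_fact
        WHERE  respondent_id IN (SELECT respondent_id
                                 FROM   respondent_dim
                                 WHERE  respondent_gender = 'M')) AS avg_taste_male,
       (SELECT AVG(overall_rating)
        FROM   candybar_fact
        WHERE  respondent_id IN (SELECT respondent_id
                                 FROM   respondent_dim
                                 WHERE  respondent_gender = 'M')) AS avg_overall_male
FROM   dual;

-- With the WITH clause: the subquery is defined once and referenced by name.
WITH vmMALERESP AS (
    SELECT respondent_id
    FROM   respondent_dim
    WHERE  respondent_gender = 'M'      -- hypothetical column
)
SELECT (SELECT AVG(taste_rating)
        FROM   candybar_fact
        WHERE  respondent_id IN (SELECT respondent_id FROM vmMALERESP)) AS avg_taste_male,
       (SELECT AVG(overall_rating)
        FROM   candybar_fact
        WHERE  respondent_id IN (SELECT respondent_id FROM vmMALERESP)) AS avg_overall_male
FROM   dual;
```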

The APPEND Hint

Let's talk about the APPEND Hint to speed up INSERTs into a table. Now,
when you delete rows from your table, Oracle doesn't collapse the table to
squeeze out those blank rows, but when you insert new data into that table,
Oracle will attempt to find the space left by those deleted rows and fill it with the new
rows being inserted. As you can imagine, this does take time. Now, in order
to speed up INSERTs you can use the APPEND Hint. A hint tells
Oracle to perform some action based on information it may not know, but
you do. For example, using the APPEND Hint as shown, after the INSERT
keyword, will prevent Oracle from searching for the deleted rows; instead, it
just slaps the new rows of data onto the end of the table. Now, this may
look like an ordinary comment, but the plus sign after the first
asterisk indicates to Oracle that what follows is a hint. Make sure,
though, that you spell the hint correctly; otherwise, Oracle will ignore it without
so much as a warning. By using the APPEND Hint you may significantly reduce
your overall insertion times. For example, let's insert one million rows into
the table CANDYHIST_1. Here we are not using the APPEND Hint. This took
just under 29 seconds on my machine. Now, here is the same insert
statement, but I've added the APPEND Hint after the INSERT keyword. This
took just under 9 seconds on my machine. Note that I suggest using
the APPEND Hint even if no DELETEs have been performed on the table,
since I've seen insertion times greatly reduced in this case as well.
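Here is a sketch of the two inserts compared above. It assumes that CANDYHIST_1 has the same column layout as CANDYBAR_HISTORICAL_DATA and uses ROWNUM to take one million rows. Keep a few caveats in mind:

- APPEND applies to INSERT ... SELECT. For INSERT ... VALUES, Oracle 11g Release 2 and later provide a separate APPEND_VALUES hint.
- A direct-path insert locks the table exclusively.
- The session must commit before it can read or modify the table again.

```sql
-- Conventional insert: Oracle may reuse free space left by earlier DELETEs.
INSERT INTO candyhist_1
SELECT *                                   -- assumes identical column layout
FROM   candybar_historical_data
WHERE  ROWNUM <= 1000000;
COMMIT;

-- Direct-path insert: new rows go after the existing data, skipping free-space searches.
INSERT /*+ APPEND */ INTO candyhist_1
SELECT *
FROM   candybar_historical_data
WHERE  ROWNUM <= 1000000;
COMMIT;   -- required before this session can query or modify CANDYHIST_1 again
```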

ON Clause vs. WHERE Clause

Let's talk about how to optimize SQL queries involving joins. In this section
let's talk briefly about the ON Clause versus the WHERE Clause. Now, there
is no significant reduction in runtime of one over the other, but please use the
ON Clause to indicate join criteria and the WHERE Clause to indicate
table subsetting criteria. Again, there is no significant runtime savings
between the two, but you will at least give Oracle a fighting chance to
optimize your query when you use the ON and WHERE clauses properly. For
example, here is an INNER JOIN in the old style specifying the JOIN criteria
within the WHERE clause. Note that the tables have a comma between them
instead of the keywords INNER JOIN or LEFT JOIN or RIGHT JOIN, and so on.
Here is the more modern version of the same query, with the JOIN criteria
neatly specified in the ON Clause and the subsetting criteria neatly
specified in the WHERE Clause. As I've mentioned before, if it takes you
longer to understand the old style query versus the more modern style, then
you should add that to your runtime because you just blew that extra time.

Order of Tables on FROM Clause

In this section let's focus on the order of the tables appearing on the FROM
Clause. The Oracle documentation recommends that you place tables with
the fewest rows first, followed by the next fewest, and so on when
performing JOINs. The number of rows, though, is the number remaining after
any subsetting criteria are applied, not just the total number of rows in each
table. Based on my tests, placing the smaller table first in the
FROM Clause significantly reduces the runtime. For example, here is a
simple CREATE TABLE statement joining the CANDYBAR_HISTORICAL_DATA
table against the table RESP_SUBSET, which contains 1,000
RESPONDENT_IDs and, as you see, this runs in just under 21 seconds on my
system. Here we have the same code, but the smaller table, RESP_SUBSET,
appears first in the FROM Clause and, as you can see, this runs in just under
11 seconds or about half the time. Again, these runtimes are without
indexes, just straight data in tables with the table containing the smaller
number of rows appearing to the left of the INNER JOIN keywords. Once we
start playing with indexes the order of the tables in the FROM Clause
shouldn't matter.
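The two CREATE TABLE statements being timed probably look something like the sketch below. The output table names are hypothetical. The only change between them is which table appears first in the FROM Clause.

```sql
-- Larger table first (the slower run in the author's test).
CREATE TABLE subset_big_first AS
SELECT h.*
FROM   candybar_historical_data h
INNER JOIN resp_subset r
        ON h.respondent_id = r.respondent_id;

-- Smaller table first (about half the runtime in the author's test).
CREATE TABLE subset_small_first AS
SELECT h.*
FROM   resp_subset r
INNER JOIN candybar_historical_data h
        ON r.respondent_id = h.respondent_id;
```

Oracle's cost-based optimizer is free to pick its own join order. If you want to pin the order explicitly rather than rely on FROM Clause position, the LEADING hint (for example, /*+ LEADING(r) */ after SELECT) asks it to start with the named table.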

Cartesian Products

Let's talk about Cartesian products. When you use a Cartesian product you
are asking Oracle to produce all combinations of the rows in one table
against another table. As you can imagine, if both tables are large you will
produce a large amount of data, possibly taking up a lot of temporary work
space. Try to avoid Cartesian products if you can. Note that not all Cartesian
products are necessarily bad, but I've seen programmers accidentally create
Cartesian products, especially when using the old JOIN syntax shown in the
previous slide. Again, that's just another reason to use the more modern
JOIN syntax. For example, here's a JOIN using the old syntax, but the WHERE
clause specifying the JOIN condition was left off. In this case, all
combinations of CANDYBAR_HISTORICAL_DATA versus RESP_SUBSET will be
produced: five million times one thousand, or five billion rows of data.
Oops.
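A sketch of how to spot and avoid the accidental case follows. The runaway query is left commented out on purpose. The estimate query multiplies the two row counts, which with the table sizes quoted above should come to five billion. CROSS JOIN of CANDYBAR_DIM against RESP_SUBSET stands in for a hypothetical case where you really do want every combination.

```sql
-- The accidental version (old-style join, WHERE clause forgotten).
-- Do not run: it returns rows(h) * rows(r) rows.
-- SELECT h.*, r.*
-- FROM   candybar_historical_data h, resp_subset r;

-- Estimate the size of that result cheaply before running anything like it.
SELECT (SELECT COUNT(*) FROM candybar_historical_data)
     * (SELECT COUNT(*) FROM resp_subset) AS cartesian_rows
FROM   dual;

-- When every combination is intentional, say so explicitly.
SELECT COUNT(*) AS combinations
FROM   candybar_dim d
CROSS JOIN resp_subset r;
```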

Summary

In summing up, what did we learn in this lecture? In the previous module we
focused on how changing the shape of your data can lead to big savings in
query runtimes. In this module we outlined a few changes to the SQL code
itself that can make small or large differences in query runtimes as well.
Simple changes, such as using the INTERSECT or MINUS operators instead of
JOINs, can lead to significant savings in runtime. We showed that deciding to
use IN versus EXISTS depends on the size of the data in the subquery. If the
inner query is small compared to the outer query, try the IN condition. If
the inner query is large compared to the outer query, try the EXISTS
condition. We saw some modest gains when using the Multi-Column IN
Condition versus the traditional ANDs and ORs construct, not only in terms
of runtime, but in readability. Using the WITH Clause, or Subquery Factoring
Clause, can significantly reduce runtimes if your query refers to the same
subquery repeatedly. Oracle may choose to store the results and just refer to
them the second time, third time, and so on, rather than blindly rerunning it
over and over and over again. We saw that adding the APPEND Hint to your
INSERT statement can significantly reduce INSERT runtimes. The order of the
tables in the FROM Clause, with the smaller table appearing first, can
possibly reduce your runtimes. Note again that this applies only when you're not
using any indexes on your tables. Finally, we talked about Cartesian
products and that you should avoid them, if you can, since they produce all
possible combinations of data from the tables and can yield a very large
result set. In the next module we finally introduce indexes, something
you've all been waiting for.

Overview of Indexes

Introduction

Hello and welcome back to the Pluralsight course Optimizing SQL Queries in
Oracle. My name is Scott Hecht. In the previous two modules we tried to
speed up our SQL queries by transforming our data into Third Normal Form,
as well as modifying the SQL code itself, and we did all of this avoiding
indexes and partitions. Now, that's about to change. In this module we do an
overview of the types of indexes available to the SQL programmer and how
they work. In the following modules we go into more detail about everything
introduced in this module, so let's get started.

Module Contents

First though, let's go over the module contents. First, we start off with an
explanation of what an index is. We then go over some of the varieties of
indexes available to you, namely, the B-Tree index, which is what I call the
747 of indexes, the Bitmap index, which is used when the number of distinct
values appearing within a column is small as compared to the total number
of rows in the table, the Function-based index, which is used if you're
applying a function to a column. We then talk about Index-Organized Tables,
which combine a table and its index into one object. We'll then talk
about Bitmap Join Indexes, used to perform the JOIN between tables upfront.
We'll then talk briefly about why gathering optimizer statistics is useful after
you've created an index. Although the topic is covered more fully in a later module, we
talk a little bit about how indexes and partitions interact. We then show you
a simple example involving indexes, and then we end with a summary.

What is an Index?

What is an index? In this section we go over the basics of indexes. You can
think of an index as similar to the index in the back of a book. You do
remember what a book is? If you want to locate information quickly within a
book you can either scan every single page sequentially one after the other
to locate the desired information, which, as you can imagine, would take
some time, or you can quickly look it up in the index and then flip directly to
the appropriate page, which is much, much faster. "Tell me more," I can hear
you say. Indexes allow you to potentially avoid full table scans. That is, your
query avoids visiting every single row in your table. Indexes allow you to
jump to your desired row or rows just like the index at the back of the book.
Indexes may also help speed up joins between tables. Indexes can also
speed up COUNT(*) because Oracle can count the index entries rather than
the rows of the entire table (provided at least one indexed column is declared
NOT NULL, since a B-Tree index does not store entries whose key columns are
all null). Since the index is slimmer than the entire table, this
process runs faster. Indexes can also help speed up your ORDER BYs, and they
can also help speed up your GROUP BYs. Indexes are also useful with unique
constraints, preventing you from accidentally inserting a duplicate row or
rows. All in all, indexes have the potential to make your queries run faster,
and there are a variety of indexes to choose from.

Now, I can hear you say, "Indexes are great, let's use them on every column
and every table, yeah, that's the ticket." Well, that's not necessarily a good
thing to do. Now, there are programmers out there who go crazy when they first
learn about them, resulting in index overload. Not every column or group of columns in
your table needs to be indexed. First, indexes take up space in the database.
The more indexes you create, the more space you take up. There are
instances where, if you create enough indexes the total space for the
indexes will exceed the total space for the tables. Second, indexes help if
you need to access a small portion of the data in a table. If you're constantly
accessing the entire contents of the table to perform say, some calculation,
then an index probably won't help you and Oracle will ignore it anyway.
Third, indexes and the WHERE Clause go hand in hand: creating indexes on
all possible combinations of the columns of a table is unnecessary since, most
likely, you won't use those combinations of columns in a WHERE Clause. For
example, in our test data there is no need to create an index on the
combination of overall rating and taste rating because I'll never form a
WHERE Clause subsetting by both of those columns at the same time. This
is true for your tables as well. Analyzing the set of WHERE Clause
predicates can give you a very good indication of the indexes to create and
allows you to avoid creating unnecessary indexes. Fourth, as a rough rule of
thumb, when subsetting a table Oracle will only consider using an index if about
15% or less of the data will be returned. Anything more, and Oracle may just scan
the table, ignoring the index. Fifth, changes to the data can be slower when
indexes are on the table than when they are not. That is, INSERTs, UPDATEs, and
DELETEs can all be slower since not only does the table have to be modified, but
the corresponding indexes as well. Finally, over the years hard disk drives have
decreased their access times significantly. For example, your laptop hard
drive probably spins at 5400 or 7200 RPM, whereas the hard drives in the
data center at work spin at 15,000 RPM. Recently, solid-state drives have
come on the scene, decreasing access times even more. As drives become
faster at accessing data the need for indexes to subset your tables
decreases and this is not hypothetical. I've been in companies where some
indexes were dropped because the combination of faster hard drives, along
with an appropriate partitioning scheme, allowed full table scans to occur
faster than when using an index. This doesn't mean, though, that indexes will
go the way of the dinosaur, since they have other attributes, as mentioned
on the previous slide.
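One way to gather the WHERE Clause evidence described above, rather than combing through source code, is sketched below. It assumes Oracle 11g Release 2 or later, where the optimizer records which columns appear in predicates and DBMS_STATS.REPORT_COL_USAGE returns that record for a table as a text report. The report aids the analysis; it doesn't replace it. Usage is written to the data dictionary periodically, so very recent queries may not appear yet.

```sql
-- In SQL*Plus, raise LONG so the CLOB report isn't truncated.
SET LONG 100000

-- Columns of CANDYBAR_FACT that the optimizer has seen used in
-- predicates (equality, range, LIKE, joins, and so on).
SELECT DBMS_STATS.REPORT_COL_USAGE(USER, 'CANDYBAR_FACT') AS col_usage
FROM   dual;
```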

B-Tree Indexes

Let's talk about some of the varieties of indexes available to you. In this
section let's talk about the B-Tree index. The B-Tree index is a general
purpose index that can be used on a variety of columns, whether they are
numeric, date or character. I call this type of index the 747 of indexes
because it is the most general workhorse index available. B-Tree indexes can
be thought of as an upside down representation of a tree whose leaf nodes
contain the row IDs, which point to the rows within the table itself. Let's see
what a B-Tree index looks like, at least conceptually, when placed on the
CANDYBAR_ID column. Recall that there are 250 different candy bars
available in the CANDYBAR_FACT table. Here's the top of our B-Tree, the
CANDYBAR_ID Index. Now, our first node represents CANDYBAR_ID number
1. Hanging off of this node is a series of leaves, each indicating a specific
row number or ROWID in the table. That is, these leaves list all those rows
associated with CANDYBAR_ID number 1. Moving on, here is the same thing
for CANDYBAR_ID number 2. As you can see, the leaves have different row
numbers associated with them because each row in the table can only be
associated with one candy bar, and so on. Here is CANDYBAR_ID number
249, and finally, CANDYBAR_ID number 250, the last candy bar. What
happens when your SQL query has WHERE CANDYBAR_ID=1 in it? Assuming
that you'll be pulling back about 15% or less of the data in the table, based
on the WHERE Clause where CANDYBAR_ID=1, Oracle will consider using the
index to gather the row numbers associated with the CANDYBAR_ID number
1, and then quickly pull only those rows from the table. This can be much,
much quicker than doing a full table scan. Let's continue our talk about
the B-Tree index. In this example let's see what a B-Tree index would look
like, conceptually, if we indexed both the RESPONDENT_ID and the
CANDYBAR_ID columns together. That is, as a composite index rather than
two separate indexes. Here we have the root of our composite index and for
RESPONDENT_ID number 1 we have all of the relevant CANDYBAR_IDs along
with the appropriate row numbers. For RESPONDENT_ID number 2 we have
all of the relevant CANDYBAR_IDs along with the appropriate row numbers,
and so on and so forth for all of the respondents. As you can see, the index
is small compared to the entire table, so when searching for, say,
RESPONDENT_ID=2 or CANDYBAR_ID=249, the database can spin through
the index much faster, gathering together the ROWIDs and then pulling the
data from the table.
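Here is a minimal sketch of the two index shapes just described, on the course's CANDYBAR_FACT table. The index names are hypothetical and follow the IX_ naming convention the author describes later in the course. A plain CREATE INDEX builds a B-Tree unless you ask for another type.

```sql
-- Simple B-Tree index on one column.
CREATE INDEX ix_candyfact_candyid
    ON candybar_fact (candybar_id);

-- Composite B-Tree index: RESPONDENT_ID is the leading column,
-- CANDYBAR_ID the second, matching the conceptual picture above.
CREATE INDEX ix_candyfact_respid_candyid
    ON candybar_fact (respondent_id, candybar_id);

-- A query the simple index can serve: Oracle may walk the index for
-- CANDYBAR_ID = 1, collect the matching ROWIDs, then fetch only those rows.
SELECT *
FROM   candybar_fact
WHERE  candybar_id = 1;
```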

Bitmap Indexes

Let's talk about some more varieties of indexes available to the SQL
programmer. In this section let's talk about the Bitmap index. A Bitmap
index is similar to a B-Tree index in functionality, except a Bitmap index is
stored differently than a B-Tree index. Conceptually, a Bitmap index uses a
continuous series of 1s and 0s, where a 1 indicates row inclusion and a 0
indicates row exclusion, with one bitmap for each value in the column being
indexed. Bitmap indexes are usually placed on a column whose total number
of distinct values is much less than the total number of rows on the table
itself. For example, RESPONDENT_GENDER has two distinct values, as
compared to the five million rows in our fact table. Two is obviously much
less than five million. Bitmap indexes are frequently used in data
warehouses where the tables are not modified that often. As an example,
let's see how Oracle would, at least conceptually, create a Bitmap index on
the CANDYBAR_ID column. Recall that there are 250 distinct IDs in the
CANDYBAR_ID column, while the CANDYBAR_FACT table has 5 million rows,
so 250 is much less than 5 million, so you are justified in using a Bitmap
index. Here we have the top of the Bitmap index tree. For CANDYBAR_ID 7
we have a list of 0s and 1s, 5 million 0s and 1s to be exact. Now, wherever
you see a 1 it indicates that that particular row has CANDYBAR_ID set to 7.
Notice that the first two bits in the Bitmap are set to 1. This means that both
row 1, the first bit, and row 2, the second bit, have CANDYBAR_ID=7.
Continuing on, row 3 and row 4 are set to 0 indicating that those two rows
do not have CANDYBAR_ID=7. Again, these 1s and 0s are indicators of which
rows are or are not a part of the CANDYBAR_ID=7 data. Continuing on, for
CANDYBAR_ID 58 we have another series of 1s and 0s and, again,
wherever there's a 1 it indicates that that row contains CANDYBAR_ID 58,
and so on all the way up to CANDYBAR_ID 250. Now, some of you may be
thinking that storing 5 million bits for each of the 250 candy bars will take
up a lot of space, but 5 million bits is 625,000 bytes, or about 610KB. For all
of those 250 CANDYBAR_IDs that adds up to about 149MB in total (before any compression Oracle applies), and this gives you an
indication as to why Oracle recommends the total number of distinct column
values be much less than the total number of rows or the storage space will
be very large indeed. Finally, if your WHERE Clause contains several
columns being ANDed or ORed together, the use of bitmap indexes can
make the subsetting very fast, since Oracle can perform bitwise ANDs or
bitwise ORs between the bitmaps. This is very fast compared to a similar
maneuver when using B-Tree indexes. It's because of this that Oracle's
documentation states, "Bitmap indexes are primarily designed for data
warehousing or environments in which queries reference many columns in
an ad hoc fashion."
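The sketch below creates bitmap indexes on two low-cardinality columns and runs a query that ANDs them. It also checks the storage arithmetic quoted above. It assumes LIKELIHOOD_PURCHASE (a CANDYBAR_FACT column mentioned later in the course) is a small integer rating. It also assumes no B-Tree index already exists on either column, since Oracle won't keep two visible indexes on the same column list. The BX_ index names and the rating value 10 are hypothetical.

```sql
-- Bitmap indexes on two low-cardinality columns of the fact table.
CREATE BITMAP INDEX bx_candyfact_candyid
    ON candybar_fact (candybar_id);

CREATE BITMAP INDEX bx_candyfact_likelihood
    ON candybar_fact (likelihood_purchase);

-- Both predicates are backed by bitmaps, so Oracle can AND the two
-- bitmaps together before visiting the table.
SELECT COUNT(*)
FROM   candybar_fact
WHERE  candybar_id = 7
AND    likelihood_purchase = 10;

-- Back-of-the-envelope size of 250 uncompressed 5-million-bit bitmaps.
SELECT 5000000 / 8                        AS bytes_per_bitmap,
       ROUND(5000000 / 8 / 1024)          AS kb_per_bitmap,
       ROUND(250 * 5000000 / 8 / 1048576) AS mb_for_250_bitmaps
FROM   dual;
```

The last query returns 625000, 610, and 149, matching the figures in the lecture.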

Function-Based Indexes

Let's talk about some more varieties of indexes available to the SQL
programmer. In this section let's talk about Function-Based indexes. Suppose
you went to a lot of trouble defining an index on, say, the column
CANDYBAR_NAME. If your SQL code frequently uses the UPPER function to
uppercase the CANDYBAR_NAME, the index will be skipped. This is due to
the fact that when the index is created the actual values are stored in the
index and not the uppercased values. The solution is to create an index that
takes into account the function that you will be using, say, the UPPER
function. Now the index can be used when your SQL WHERE Clause asks
for WHERE UPPER(CANDYBAR_NAME) equals the name of some candy
bar. Note that you could also apply an index to a more complicated
expression instead of just a single function, for example, 100 *
(TASTE_RATING/OVERALL_RATING). In order for Oracle's Query Optimizer
to consider using the index, you would have to use that exact function or
formula in your WHERE Clause, like so.
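Here is a sketch of both function-based indexes just described. It assumes CANDYBAR_NAME lives in CANDYBAR_DIM and the two rating columns live in CANDYBAR_FACT. The index names and the candy bar name 'SNICKERS' are hypothetical. The expression index also assumes OVERALL_RATING is never zero; otherwise CREATE INDEX fails with a divide-by-zero error.

```sql
-- Index the uppercased name rather than the stored value.
CREATE INDEX ix_candydim_uppername
    ON candybar_dim (UPPER(candybar_name));

-- The predicate applies the same function, so the optimizer can consider the index.
SELECT *
FROM   candybar_dim
WHERE  UPPER(candybar_name) = 'SNICKERS';

-- Index on a whole expression.
CREATE INDEX ix_candyfact_tasteratio
    ON candybar_fact (100 * (taste_rating / overall_rating));

-- The WHERE Clause must repeat the exact same expression.
SELECT COUNT(*)
FROM   candybar_fact
WHERE  100 * (taste_rating / overall_rating) > 90;
```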

Index-Organized Tables

Let's talk about some more varieties of indexes available to the SQL
programmer. In this section let's talk about Index-Organized Tables. Recall
that for B-Tree and Bitmap indexes, a separate object, the index, is created apart
from the table itself. An index-organized table, in contrast, combines both
the table data and the index into one object. An index-organized
table needs to have a primary key defined on it. In an index-organized
table the rows of data are stored within the index based on the primary key.
That is, unlike a B-Tree index, you are not storing a ROWID pointing to
another table; you are storing the columns of data themselves. Index-organized
tables can be very useful with dimension tables like our
RESPONDENT_DIM and CANDYBAR_DIM tables, as well as hierarchical tables.
With that said, I've never seen index-organized tables actually used in
practice. Don't let that put you off though, we'll go over index-organized
tables in more detail later on in the course.
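The sketch below builds an index-organized copy of the respondent dimension. The table name, constraint name, and column data types are hypothetical. Only RESPONDENT_ID and RESPONDENT_STATE are carried over from RESPONDENT_DIM, since those are the columns the course mentions.

```sql
-- The PRIMARY KEY is mandatory for an index-organized table;
-- the rows themselves are stored in that key's B-Tree.
CREATE TABLE respondent_dim_iot (
    respondent_id    NUMBER        NOT NULL,
    respondent_state VARCHAR2(50),
    CONSTRAINT pk_respdimiot PRIMARY KEY (respondent_id)
)
ORGANIZATION INDEX;

INSERT INTO respondent_dim_iot (respondent_id, respondent_state)
SELECT respondent_id, respondent_state
FROM   respondent_dim;
COMMIT;

-- A primary-key lookup reads the row straight out of the index;
-- there is no separate table segment to visit through a ROWID.
SELECT respondent_state
FROM   respondent_dim_iot
WHERE  respondent_id = 2;
```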

Bitmap Join Indexes

Let's talk about some more varieties of indexes available to the SQL
programmer. In this section let's talk about Bitmap Join indexes. Normally, a
Bitmap index is applied to one or more columns within a specific table. In
contrast, a Bitmap Join index allows you to create an index that pre-joins two
or more tables together. This allows for faster joins and, consequently, faster
queries. When the bitmap join index is coded you specify the join criteria
between the key columns of the tables. Just like for B-Tree and Bitmap
indexes, you still need to decide on one or more columns that will be used
in the WHERE Clause. For example, we can create a bitmap join index
between the CANDYBAR_FACT and RESPONDENT_DIM tables joining by
RESPONDENT_ID, but our WHERE Clause will use the RESPONDENT_STATE
column in the RESPONDENT_DIM table as the subsetting criterion. As you see in
the graphic, for each state there is a bitmap associated with the
CANDYBAR_FACT table's RESPONDENT_ID, as well as a bitmap associated
with the RESPONDENT_DIM table's RESPONDENT_ID. When requesting WHERE
RESPONDENT_STATE equals AL, Oracle can use both sets of ROWIDs without
having to join the tables when the query is executed. Since you are
effectively storing the resulting joins, queries that take advantage of bitmap
join indexes can be very fast.
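Here is a sketch of the bitmap join index just described. The index name is hypothetical. It assumes RESPONDENT_DIM.RESPONDENT_ID is declared as a primary key or unique key, which Oracle requires for the dimension side of a bitmap join index. It also assumes CANDYBAR_FACT is not partitioned; a partitioned fact table would need the LOCAL keyword.

```sql
-- Index CANDYBAR_FACT rows by a column that lives in RESPONDENT_DIM,
-- with the join between the two tables resolved when the index is built.
CREATE BITMAP INDEX bjx_candyfact_respstate
    ON candybar_fact (respondent_dim.respondent_state)
    FROM candybar_fact, respondent_dim
    WHERE candybar_fact.respondent_id = respondent_dim.respondent_id;

-- A query that can be answered with the pre-joined bitmaps.
SELECT COUNT(*)
FROM   candybar_fact f
INNER JOIN respondent_dim r
        ON f.respondent_id = r.respondent_id
WHERE  r.respondent_state = 'AL';
```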

Gathering Statistics

Let's talk about the importance of gathering statistics on your indexes. As
we've mentioned earlier, Oracle's Query Optimizer uses optimizer statistics
to decide the best execution plan for your SQL query. Let's talk more about
these statistics. After you go through all the trouble of creating indexes, you
must gather statistics on them; otherwise, Oracle's Query Optimizer
may not consider them and you would have wasted your time and space in the
database. These statistics help Oracle create the best execution plan for
your SQL query, so if Oracle thinks an index is the way to go, Oracle will use
the index. If Oracle doesn't think an index is the way to go, Oracle will
ignore the index, but if Oracle has no optimizer statistics about the indexes
in your table, then Oracle won't be able to determine whether using an
in your table, then Oracle won't be able to determine whether using an
index is the way to go or not. Once you gather stats on the indexes on your
table you should see a marked improvement in your SQL queries. Naturally,
this usually, although not always, goes back to the WHERE Clause. Now, in
order to gather statistics you use the procedures provided in the Oracle
DBMS_STATS package. We'll see an example of this in just a moment. Please
don't use the older ANALYZE command. It gathers inferior statistics as
compared to DBMS_STATS causing Oracle's optimizer to create a less than
optimal execution plan. Note that since version 10g, Oracle automatically
gathers statistics when you create or rebuild an index, but this is only true if
the table is not empty. Also, since version 10g, Oracle has an automated
task that runs nightly to gather updated optimizer statistics if your table has
changed significantly, rendering the current statistics stale. Since I don't
know what version of Oracle you're using, I will show you how to use the
DBMS_STATS package just in case and we'll show an example of how to
gather statistics later on in the module.
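Here is a minimal sketch of gathering statistics with DBMS_STATS instead of ANALYZE, on the course's CANDYBAR_FACT table. Setting CASCADE to TRUE asks Oracle to gather statistics on the table's indexes in the same call. AUTO_SAMPLE_SIZE lets Oracle choose how much of the table to sample.

```sql
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => USER,
        tabname          => 'CANDYBAR_FACT',
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
        cascade          => TRUE);  -- also gather stats on every index on the table
END;
/

-- Check when each index on the table was last analyzed.
SELECT index_name, last_analyzed, num_rows, distinct_keys
FROM   user_indexes
WHERE  table_name = 'CANDYBAR_FACT';
```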

Index/Partition Interaction

Let's talk briefly about indexes and partitions. There is an interaction
between indexes and partitions. Partitions can be thought of as taking a
very large table and slicing it up into smaller tables. For example, we could
break up our CANDYBAR_FACT table into separate tables by, say, survey
year, giving us 10 smaller tables, one for each year. If your queries pull data
from within a specific year, you would use only that specific table. In Oracle
partitions are similar to this, but are managed by the database, so you don't
actually have to break up the table into pieces yourself, although you do have to
specify what these pieces are when you code your CREATE TABLE
statement. If you partition your table by, say, survey year, you can still
create indexes on one or more of the columns. First, you can create
indexes within each partition. These indexes are called Local Partitioned
Indexes because they are local to, or within, each individual partition. You can
think about this, again, as if you had 10 separate tables, one for each year.
Each one of those tables could be given a separate index on it. Local
Partitioned Indexes are similar to that. On the other hand, when using
partitions you can create indexes that span all of the partitions;
these are called Global Non-Partitioned Indexes. Finally, we can create
indexes that have a different partitioning scheme to that of the underlying
table. These types of indexes are called Global Partitioned Indexes. We talk
more about the interaction between indexes and partitions in the last
module in the course.
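The sketch below shows each of the three index types on a hypothetical yearly-partitioned copy of CANDYBAR_FACT. It assumes SURVEY_DATE is a DATE. It uses interval partitioning (Oracle 11g and later), so one partition per year is created automatically; the January 1, 2000 starting boundary is arbitrary. The table and index names are hypothetical.

```sql
-- Partitioned copy of the fact table, one partition per survey year.
CREATE TABLE candybar_fact_part
PARTITION BY RANGE (survey_date)
INTERVAL (NUMTOYMINTERVAL(1, 'YEAR'))
(PARTITION p_before_2000 VALUES LESS THAN (DATE '2000-01-01'))
AS
SELECT * FROM candybar_fact;

-- Local Partitioned Index: one index partition per table partition.
CREATE INDEX ix_candyfactpart_candyid
    ON candybar_fact_part (candybar_id) LOCAL;

-- Global Non-Partitioned Index: a single index spanning all partitions
-- (the default when neither LOCAL nor GLOBAL PARTITION BY is given).
CREATE INDEX ix_candyfactpart_respid
    ON candybar_fact_part (respondent_id);

-- Global Partitioned Index: its own partitioning scheme (hash here),
-- independent of the table's yearly partitions. Oracle requires the
-- index to be partitioned on its leading column.
CREATE INDEX ix_candyfactpart_likelihood
    ON candybar_fact_part (likelihood_purchase)
    GLOBAL PARTITION BY HASH (likelihood_purchase) PARTITIONS 4;
```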

Example

Let's see a basic example of how to create indexes on two tables, gather
statistics on those tables, as well as see if the index makes a difference in
the runtimes of our SQL query. In this example we want to pull data from the
CANDYBAR_FACT table by inner joining against the small table containing
several RESPONDENT_IDs. Now, I ran this example three times, once with
only 10 RESPONDENT_IDs, again with 100, and then again with 1,000. Here
is the code to create a table called RESP_SUBSET that holds only the 10
RESPONDENT_IDs and I changed this for the run of 100 respondents and
then again, for the 1,000 respondents. Here is a simple INNER JOIN between
the CANDYBAR_FACT and the respondent subset table. Here are the results
of the three runs without any indexes. As you can see, the runtimes make
sense and range from over 2 seconds to over 5 seconds, but can we do
better? Now, in the code that follows we create indexes on the
RESPONDENT_ID column and gather statistics before running the INNER JOIN
code from the previous slide. First, let's create an index on the
RESPONDENT_ID column of the RESP_SUBSET table. You start with the
keywords CREATE INDEX followed by the name of the index. I normally name
my indexes with the characters IX_ followed by a shortened version of the
table name, followed by an underscore, and followed by a shortened version
of the column or columns I'm indexing. Here my index is named
IX_RESPSUBSET_RESPID. Then, you follow up with the keyword ON, followed
by the name of the table, a left parenthesis, the name of the column you want to
index, followed by the closing parenthesis. Once this CREATE INDEX code runs,
your index has been created. Next, let's gather stats on the index. Since we
are running the procedure GATHER_TABLE_STATS within the DBMS_STATS
package, we start off with the EXEC keyword followed by a space. Next comes
DBMS_STATS.GATHER_TABLE_STATS and an opening parenthesis, the name of the
schema, called the owner here, the name of the table, RESP_SUBSET, and
finally the percentage of the table you want scanned. When gathering
stats you can scan the entire table, a portion of it or you can let Oracle
decide by using the AUTO_SAMPLE_SIZE constant located within the
DBMS_STATS package. Note that attempting to gather stats on 100% of the
table could take quite a while if the table is very large. Oracle recommends
using the AUTO_SAMPLE_SIZE constant as shown. Once this executes, the
statistics for the table RESP_SUBSET are available for Oracle to use. Next,
we create an index on the RESPONDENT_ID of the CANDYBAR_FACT table
and gather statistics on the table CANDYBAR_FACT. Finally, we can run our
code. Again, note that I ran this with 10, 100, and 1,000 RESPONDENT_IDs in
the RESP_SUBSET table, taking note of the runtimes. Here are the results.
For comparison, I've repeated the results from the previous slide on the left.
On the right we have the results from the run with indexes. As you can see,
for 10 RESPONDENT_IDs we have gone from 2.5 seconds to 0.13 seconds, a
very nice improvement. The other two, 100 and 1,000 RESPONDENT_IDs,
show nice improvements as well, but the indexed and non-indexed runtimes get
closer together the more RESPONDENT_IDs you use. This makes sense because you
are pulling back more data and Oracle's optimizer may have decided not to
use the indexes.
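The steps just walked through can be sketched as one SQL*Plus script. The statement that builds RESP_SUBSET is hypothetical, because the course's own version isn't shown. Change 10 to 100 or 1000 for the other runs, and drop any existing RESP_SUBSET first. The index names follow the author's IX_ convention. The owner is given as USER, meaning the current schema.

```sql
-- Hypothetical 10-respondent subset.
CREATE TABLE resp_subset AS
SELECT respondent_id
FROM   respondent_dim
WHERE  ROWNUM <= 10;

-- Index and statistics on the small table.
CREATE INDEX ix_respsubset_respid ON resp_subset (respondent_id);
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'RESP_SUBSET', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE)

-- Index and statistics on the fact table.
CREATE INDEX ix_candyfact_respid ON candybar_fact (respondent_id);
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'CANDYBAR_FACT', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE)

-- The timed join (SET TIMING ON is a SQL*Plus command).
SET TIMING ON
SELECT f.*
FROM   resp_subset r
INNER JOIN candybar_fact f
        ON r.respondent_id = f.respondent_id;
```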

Summary

In summary, what did we just learn? In this module we introduced indexes
and talked briefly about the variety of indexes available to the SQL
programmer. Namely, we talked about the general purpose, B-Tree index,
the Bitmap index, and the Function-Based index. We also talked about
Index-Organized tables where the table is the index, and we talked about
pre-joining tables together using Bitmap Join indexes. We learned that if you
go through the trouble of creating an index, it's extremely important to gather
statistics using an appropriate procedure from the DBMS_STATS package. In this module we
looked at the GATHER_TABLE_STATS procedure, used to gather statistics on
a single table, as well as discussed Oracle's automatic statistics gathering
functionality from Oracle 10g onward. We also talked about the
interaction between indexes and partitions and again, we will come back to
this topic in a later module. We wrapped up the module with a simple
example involving the creation of indexes and gathering of statistics using
the GATHER_TABLE_STATS procedure. We saw that you can get a reduction
in the runtime of your SQL query when using indexes appropriately. Next, we
talk about how to use B-Tree indexes in more detail.

B-Tree Indexes

Introduction

Hello and welcome back to the Pluralsight course, Optimizing SQL Queries in
Oracle. My name is Scott Hecht and in this module I'd like to go over the B-Tree index in a little more detail than what was presented in the overview
module, so let's get started.

Module Contents

First though, let's go over the module contents. We'll talk about what a B-Tree Index is. We'll then talk about when you should use a B-Tree Index.
Next, we'll show you how to create a B-Tree Index on one or more columns.
We'll then show you how to list all the indexes on a table, as well as the
columns that are indexed. Next, in the event that you don't want the index
anymore, we'll show you how to delete the index. As mentioned in the
previous section, gathering statistics on your indexes is a vital step
otherwise, Oracle's Query Optimizer may not consider your indexes when
creating an execution plan. In this section we show you how to gather
statistics on the indexes on your table. With all of this talk about indexes,
how do you know Oracle is using the index when it runs your query? In this
section we give a high-level overview of AUTOTRACE and EXPLAIN PLAN, which
are two ways to determine if Oracle is using your indexes. Finally, we end
with a summary.

What is a B-Tree Index?

What is a B-Tree Index? Recall that we talked a little bit about B-Tree Indexes
in the overview. There, we said that the B-Tree index is a general purpose
index and can be placed on numeric, character, and date data types. This
makes the B-Tree index a very good general purpose index and is the index
available in most, if not all, of the databases out there, such as SQL
Server, SQLite, MySQL, and so on. Let's go over the CANDYBAR_ID column
example. Recall that a B-Tree index is like an inverted tree. Here we have a
root node indicating the top of our index on the CANDYBAR_ID column. Next,
all the ROWIDs for each value of the CANDYBAR_ID are gathered and placed
in the index. Why is this useful? Well, the amount of space used up by the
index, at least in this case, is much smaller than the total space taken up by
say, CANDYBAR_FACT. Also, each index entry is shorter than a table row since we are retaining
fewer pieces of information. Thus, the disk can read through the index
much faster than through the entire table itself. Recall that in
our basic example in the previous module we placed an index on a single
column, the RESPONDENT_ID, but you're not limited to indexing a single
column. You can place an index on two or more columns. An index on a
single column is called a simple index, whereas an index on two or more
columns together is called a composite index. Remember that you should
look at your WHERE Clauses in order to see which columns are being
referenced. If your WHERE Clause only references a single column, then a
simple index on that column should be considered. If your WHERE Clauses
reference two columns, then a composite index on those two columns
should be considered, and so on. Recall from the overview that we showed
what a composite B-Tree index looks like on the two columns,
RESPONDENT_ID and CANDYBAR_ID. As you see, each RESPONDENT_ID is
listed, as well as each CANDYBAR_ID within each RESPONDENT_ID, and the
leaves are of course, the ROWIDs associated with each specific
RESPONDENT_ID, CANDYBAR_ID pair.

When do you use a B-Tree Index?

Let's talk about when to use a B-Tree Index. Let's talk in a little bit more
detail about B-Tree Indexes, namely, when do you use them? All indexes are,
JOINs aside, twinned with your SQL WHERE Clauses. What does that mean?
When you create a table or tables you most likely know what SQL queries
you will be performing on these tables. What you should do is gather
together all of the WHERE Clauses you are likely to issue for the table and
determine if there are some common columns between them. For example,
we may need to subset our CANDYBAR_FACT table by say, the
LIKELIHOOD_PURCHASE column, in order to perform a data analysis on
those respondents who are most likely to purchase a particular candy bar in
the future. In this case, our WHERE Clause has indicated that we should
probably place an index on the column LIKELIHOOD_PURCHASE.

Let's look a little at indexes and the Primary Key constraint. Recall that we
created a Primary Key on the columns RESPONDENT_ID, CANDYBAR_ID, and
SURVEY_DATE. When you specify that one or more columns are the
Primary Key within a table, Oracle will automatically create an index on that
column or those columns. You, in fact, don't have to create the index
separately using the CREATE INDEX syntax, which we show later on. Now,
you're probably saying that you will never create a SQL query that subsets
based on those three columns, but when an index is placed on the Primary
Key columns it prevents duplicated data from accidentally being inserted
into the table. Thus, the index is being used for more than just subsetting,
but for data integrity. Finally, the index placed on the Primary Key helps with
Joins involving the Primary Key columns. Recall that we have the
RESPONDENT_ID column in our CANDYBAR_FACT table, as well as the
RESPONDENT_DIM table, similar, for the CANDYBAR_ID column in both the
CANDYBAR_FACT and CANDYBAR_DIM tables, and finally, SURVEY_DATE, in
both the CANDYBAR_FACT table and the DATE_DIM table. The index on the
Primary Key columns in the CANDYBAR_FACT table help to quickly join the
fact table to the dimension tables, if that is part of your SQL query of course.
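Here is a minimal sketch of both ideas, assuming the course's CANDYBAR_FACT table and a numeric LIKELIHOOD_PURCHASE rating. The index name IX_CBFACT_LIKEPURCH and the value 5 are hypothetical; the last query simply asks the dictionary which index Oracle built for the Primary Key.

```sql
-- Hypothetical index suggested by a WHERE clause that subsets on LIKELIHOOD_PURCHASE
CREATE INDEX IX_CBFACT_LIKEPURCH ON CANDYBAR_FACT (LIKELIHOOD_PURCHASE);

-- The kind of query this index is meant to serve (5 is an illustrative rating)
SELECT RESPONDENT_ID, CANDYBAR_ID
FROM   CANDYBAR_FACT
WHERE  LIKELIHOOD_PURCHASE = 5;

-- No CREATE INDEX is needed for the Primary Key: Oracle built one when the
-- constraint was created, and USER_CONSTRAINTS records its name
SELECT CONSTRAINT_NAME, INDEX_NAME
FROM   USER_CONSTRAINTS
WHERE  TABLE_NAME = 'CANDYBAR_FACT'
AND    CONSTRAINT_TYPE = 'P';
```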
Indexes not only help speed up WHERE Clauses for subsetting, but they can
also help with Joins, as well as data integrity. Next, let's talk about
composite indexes. Recall that a composite index is an index created on two
or more columns and that a simple index is an index created on only one
column. For composite indexes, the order of the columns within the
CREATE INDEX syntax matters. As I mentioned, WHERE Clauses give
you a clue as to what indexes to create on what columns. If you create a
composite index on columns A, B, and C, you don't need to then create a
composite index on A and B, and you don't need to create a simple index on
column A. Oracle will still make use of the composite index on columns A, B,
and C when you reference columns A and B in a WHERE Clause, or just
column A. Note that column A, as well as columns A
and B, are called leading columns. Oracle's optimizer may also use the
composite index on columns A, B, and C when your WHERE clause
references B and C or just C, for example through an index skip scan. That is, Oracle may use
the composite index even when the leading columns aren't referenced. Over the
years I've heard and read a variety of opinions as to the order of the columns in a composite
index. Some gurus say that you should place the column with the
smallest number of distinct values first, followed by the column with the next smallest,
and so on. Other gurus say that the order doesn't matter, and still other
gurus say that it matters if your SQL query is accessing a range of values,
such as WHERE OVERALL_RATING >= 7, but
doesn't matter if you're accessing a specific value, such as WHERE
OVERALL_RATING = 5. With all of that said, my take on this is that
you should order the columns in the index such that you cover as many
WHERE Clause predicates as you can. That is, given the composite index on
columns A, B, and C, I get three WHERE Clause combinations out of it for
free: A, B, and C; A and B; and A alone, because A and A, B are the leading
columns. Finally, B-Tree indexes are considered by the Oracle
Optimizer when used with equality and range conditions. With
not-equal (<> or !=) or IS NOT conditions, B-Tree indexes generally won't be considered. To put it
another way, indexes are used to find data that's there and not data that
isn't there.
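To make the A, B, C discussion concrete, here is a sketch on CANDYBAR_HISTORICAL_DATA, assuming its CANDYBAR_ID and OVERALL_RATING columns are numeric; the index name IX_CBHIST_CIDRATE is hypothetical. The comments say which queries *can* use the index; whether the optimizer actually does depends on your statistics and on how many rows each predicate returns.

```sql
-- Composite index: CANDYBAR_ID is the leading column, OVERALL_RATING the second
CREATE INDEX IX_CBHIST_CIDRATE
    ON CANDYBAR_HISTORICAL_DATA (CANDYBAR_ID, OVERALL_RATING);

-- Leading column only ("A"): can use the composite index
SELECT RESPONDENT_ID FROM CANDYBAR_HISTORICAL_DATA
WHERE  CANDYBAR_ID = 123;

-- Leading column plus a range on the second column ("A and B"): can use both
SELECT RESPONDENT_ID FROM CANDYBAR_HISTORICAL_DATA
WHERE  CANDYBAR_ID = 123
AND    OVERALL_RATING >= 7;

-- Non-leading column only ("B"): the optimizer *may* choose an index skip scan
SELECT RESPONDENT_ID FROM CANDYBAR_HISTORICAL_DATA
WHERE  OVERALL_RATING = 5;

-- Not-equal on the leading column: a full table scan is the likely plan
SELECT RESPONDENT_ID FROM CANDYBAR_HISTORICAL_DATA
WHERE  CANDYBAR_ID <> 123;
```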

Creating a B-Tree Index

Let's take a closer look at the
CREATE INDEX syntax, specifically for the B-Tree index. Recall from our basic
example that we used the CREATE INDEX syntax to create an index on the
RESPONDENT_ID column. This basic syntax creates a B-Tree index by default and
places the index itself in the default tablespace associated with your
schema. This tablespace is set up by the database administrator and was
associated with your schema when it was created. As we've seen, you
create an index by starting with the CREATE INDEX keywords. As the syntax chart
shows, you can also specify the UNIQUE or BITMAP keyword between CREATE and
INDEX. We talk about the BITMAP keyword in the next module. You use
the UNIQUE keyword if the column or columns you are indexing are unique.
That is, a unique value, or set of values, appears on each row in the table for
the column or columns you are indexing. For example, we can specify the
UNIQUE keyword on the RESPONDENT_DIM table for the RESPONDENT_ID
column, since each value of RESPONDENT_ID is unique for each row in that
table. The UNIQUE keyword will prevent you from inserting a duplicate
RESPONDENT_ID into the table. Without it, you'll be able to insert a duplicate
RESPONDENT_ID into that table, which is probably not what you want to
happen. Recall that if you specify the Primary Key constraint on
RESPONDENT_ID in the RESPONDENT_DIM table, then a unique B-Tree index
is created for you and you don't have to use the CREATE INDEX syntax.
Next, you follow up with the name of the index. As I've mentioned, I usually
name my indexes IX_ followed by an abbreviation of the table name,
followed by an underscore, followed by an abbreviation of the column or
columns I'm indexing. Note that you are limited to 30 characters for the
name of an index (Oracle 12.2 and later raise this limit to 128 bytes). Next, you follow with the ON keyword. After that you
follow with the table_index_clause shown on your screen. Here you provide
the name of the table you're going to index, followed by a left parenthesis, a
comma-delimited list of one or more columns to be indexed, the
closing right parenthesis, and a semicolon ending the CREATE INDEX statement.
Now, on the next slide I would like to show you what the index_properties
look like. Shown is the index_properties clause.
We talk about local and global partitioned indexes later on in the course, so
let's concentrate on the index_attributes part. Here are the index_attributes.
We'll concentrate on two attributes. The first is the TABLESPACE attribute. By
default, the index is created in your schema's default tablespace, as I have mentioned, but
you can place your index in another tablespace,
if need be, by using the TABLESPACE keyword followed by the name of the
tablespace. The next useful attribute we'll look at is the parallel_clause. As
you know, today's super-duper powerful computers can contain many
powerful processors. All this means is that more programs can be run in
parallel, allowing for faster runtimes. The CREATE INDEX command can take
advantage of this parallelism if the parallel_clause is included in the syntax.
By default, CREATE INDEX is not performed in parallel. As you can see in the
syntax, you can specify an integer, the degree of parallelism, after the PARALLEL keyword. Either leave
it off and let Oracle choose, or contact your database administrator for advice on this value. We'll
see an example of this in just a bit. Let's see some examples. At its simplest,
we can create a B-Tree index on a single column. Here we're creating an
index named INDEX_RESPDIM_RESPID on the RESPONDENT_ID column
within the RESPONDENT_DIM table. Take note that the column you are
indexing appears in the parentheses to the right of the table name. On the
other hand, since we know that RESPONDENT_ID is unique across the
RESPONDENT_DIM table, we can include the UNIQUE keyword. Note that
since RESPONDENT_ID is the Primary Key you wouldn't normally do this.
Recall that the Primary Key constraint implies uniqueness, as well as non-nullness. Recall also that we specified that the columns RESPONDENT_ID,
CANDYBAR_ID, and SURVEY_DATE make up the Primary Key on the table
CANDYBAR_FACT. If need be, we can create an index on those three
columns. As you see, we specify all three column names within the
parentheses, each separated by a comma, and on my system, this took
almost 27 seconds to run. Now, let's redo the previous example, but in this
case let's provide the PARALLEL keyword. As you can see, the runtime went
down to about 16.5 seconds, which is a considerable savings.
Finally, if your database administrator has created a tablespace called, say,
TBS_INDEXES, and has given you permission to use that tablespace, you can
store your indexes in this tablespace by specifying the TABLESPACE keyword
followed by the name of the tablespace, TBS_INDEXES.
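The examples just described might look roughly like this sketch. The index and tablespace names are the ones mentioned above; the parallel degree of 4 is a hypothetical value, and each numbered variant drops the previous index of the same name first. The ALTER INDEX ... NOPARALLEL step is an addition, not part of the course example.

```sql
-- Sketch only. Each CREATE assumes no existing index (for example, the one Oracle
-- builds for a Primary Key) already covers the same column list; otherwise Oracle
-- raises ORA-01408 "such column list already indexed".

-- 1. Simple B-Tree index on one column, placed in the schema's default tablespace
CREATE INDEX INDEX_RESPDIM_RESPID ON RESPONDENT_DIM (RESPONDENT_ID);

-- 2. Same column, declared UNIQUE so duplicate RESPONDENT_IDs are rejected
DROP INDEX INDEX_RESPDIM_RESPID;
CREATE UNIQUE INDEX INDEX_RESPDIM_RESPID ON RESPONDENT_DIM (RESPONDENT_ID);

-- 3. Composite index; the keys are stored in this column order
CREATE INDEX INDEX_CANDYBAR_FACT_RIDCIDSDT
    ON CANDYBAR_FACT (RESPONDENT_ID, CANDYBAR_ID, SURVEY_DATE);

-- 4. The same index built in parallel (degree 4 is hypothetical; omit it
--    to let Oracle choose)
DROP INDEX INDEX_CANDYBAR_FACT_RIDCIDSDT;
CREATE INDEX INDEX_CANDYBAR_FACT_RIDCIDSDT
    ON CANDYBAR_FACT (RESPONDENT_ID, CANDYBAR_ID, SURVEY_DATE)
    PARALLEL 4;
-- The parallel degree is stored with the index; resetting it avoids
-- unexpectedly parallel queries later
ALTER INDEX INDEX_CANDYBAR_FACT_RIDCIDSDT NOPARALLEL;

-- 5. The same index stored in a tablespace set up by the DBA
DROP INDEX INDEX_CANDYBAR_FACT_RIDCIDSDT;
CREATE INDEX INDEX_CANDYBAR_FACT_RIDCIDSDT
    ON CANDYBAR_FACT (RESPONDENT_ID, CANDYBAR_ID, SURVEY_DATE)
    TABLESPACE TBS_INDEXES;
```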

Listing the Indexes on a Table

During the course of creating tables and indexes you will occasionally forget
the names of the indexes you created, or, when starting a new job as a SQL
programmer, you'd like to know the indexes that are on some of the tables
you'll be using. In this section we'll look at some of Oracle's data dictionary
views, which you can use to list your tables and indexes, as well as which
columns are indexed. Let's first remind you of the dictionary views
ALL_TABLES and USER_TABLES. The first is the view ALL_TABLES, which
lists all of the tables accessible to you. When using this view you normally
subset by selecting rows where the OWNER column is your schema name. On the
other hand, you can use the view USER_TABLES and skip the OWNER
subsetting criterion. For example, here is a SQL query that lists all the tables
in my own SCOTT schema. As you see, I am subsetting WHERE OWNER =
'SCOTT', with SCOTT in single quotes, which returns my own tables. In this query I'm
pulling back the OWNER, TABLE_NAME, and TABLESPACE_NAME columns,
although there are many more columns in both ALL_TABLES and
USER_TABLES. Here are the results. You will no doubt recognize those tables.
Equivalently, you can use the USER_TABLES view and skip the WHERE
clause. This SQL code returns the same list of tables as the SQL code above.
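A sketch of the two queries described above, assuming you are logged in as SCOTT; substitute your own schema name, in uppercase, since the dictionary stores owner names that way. The ORDER BY is an optional addition.

```sql
-- Tables owned by SCOTT, via ALL_TABLES
SELECT OWNER, TABLE_NAME, TABLESPACE_NAME
FROM   ALL_TABLES
WHERE  OWNER = 'SCOTT'
ORDER  BY TABLE_NAME;

-- The same list via USER_TABLES: no WHERE clause and no OWNER column,
-- because every row already belongs to the schema you are logged into
SELECT TABLE_NAME, TABLESPACE_NAME
FROM   USER_TABLES
ORDER  BY TABLE_NAME;
```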
Let's talk about the dictionary views ALL_TAB_COLUMNS and
USER_TAB_COLUMNS. Whereas ALL_TABLES just lists the tables, you can list
the names of the columns associated with those tables, along with their data
types and nullability, by querying ALL_TAB_COLUMNS or USER_TAB_COLUMNS.
For example, let's see the columns that make up my table CANDYBAR_FACT.
Note that we're sorting by COLUMN_ID in order to display the columns in the
order they appear in our CREATE TABLE statement. As you see, you are
given the column name, the data type, the length of the data type, and the
NULLABLE column, which is Y if the column can contain NULLs and N otherwise.
Note that our Primary Key columns have NULLABLE set to N, indicating that
these columns cannot contain NULLs. As stated, that requirement is part of the
Primary Key constraint. Here is an equivalent query using the
USER_TAB_COLUMNS view instead. On this slide let's concentrate on the
ALL_INDEXES and USER_INDEXES dictionary views. These views list all the
indexes accessible to you or just your own indexes, respectively. Note that you just get
back the name of the index and the table it's on, with no indication of the
column or columns the index was created on. We'll look
into that on the next slide. In this SQL code we're pulling SCOTT's
indexes (OWNER = 'SCOTT') and asking for the columns OWNER, INDEX_NAME, TABLE_NAME,
and TABLESPACE_NAME. The results are as follows. As you can see, we have
an index called INDEX_CANDYBAR_FACT_RIDCIDSDT on the table
CANDYBAR_FACT, shown in red, among others. Here is the equivalent code
using the USER_INDEXES dictionary view and skipping the WHERE clause.
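The column and index queries just described might be written as follows, again assuming the SCOTT schema. Including INDEX_TYPE in the last query is an optional addition; it reads NORMAL for a B-Tree index and BITMAP for the Bitmap Indexes covered in the next module.

```sql
-- Columns of CANDYBAR_FACT in CREATE TABLE order, with type, length and nullability
SELECT COLUMN_NAME, DATA_TYPE, DATA_LENGTH, NULLABLE
FROM   ALL_TAB_COLUMNS
WHERE  OWNER = 'SCOTT'
AND    TABLE_NAME = 'CANDYBAR_FACT'
ORDER  BY COLUMN_ID;

-- Same thing from USER_TAB_COLUMNS (no OWNER filter needed)
SELECT COLUMN_NAME, DATA_TYPE, DATA_LENGTH, NULLABLE
FROM   USER_TAB_COLUMNS
WHERE  TABLE_NAME = 'CANDYBAR_FACT'
ORDER  BY COLUMN_ID;

-- Indexes owned by SCOTT: index and table names, but not the indexed columns
SELECT OWNER, INDEX_NAME, TABLE_NAME, TABLESPACE_NAME
FROM   ALL_INDEXES
WHERE  OWNER = 'SCOTT';

-- USER_INDEXES equivalent; INDEX_TYPE distinguishes B-Tree (NORMAL) from BITMAP
SELECT INDEX_NAME, TABLE_NAME, TABLESPACE_NAME, INDEX_TYPE
FROM   USER_INDEXES;
```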
On this slide let's concentrate on ALL_IND_COLUMNS and
USER_IND_COLUMNS. These views are very helpful since they not only list all
the indexes on your tables, but also which columns were involved in the
creation of each index. In this SQL we're pulling SCOTT's data from
the ALL_IND_COLUMNS view and returning the columns TABLE_NAME,
INDEX_NAME, and COLUMN_NAME. Take note that we're sorting by
TABLE_NAME, INDEX_NAME, and COLUMN_POSITION. It is the
COLUMN_POSITION column which allows you to determine the order of the
columns as they appear within the parentheses of the CREATE INDEX
syntax. Here are the results. For the CANDYBAR_FACT table we have the
index INDEX_CANDYBAR_FACT_RIDCIDSDT, which was created by specifying
the columns RESPONDENT_ID, CANDYBAR_ID, and SURVEY_DATE in that
order, and which is displayed across three lines in the output. Finally, here's the
equivalent code using the USER_IND_COLUMNS dictionary view.
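A sketch of these queries follows. Note that ALL_IND_COLUMNS has no plain OWNER column; it has INDEX_OWNER and TABLE_OWNER, so this sketch filters on INDEX_OWNER. The final LISTAGG query is an optional extra (it needs Oracle 11g Release 2 or later) that collapses each index onto one line instead of one line per column.

```sql
-- Indexed columns per index, in the order they appear in CREATE INDEX
SELECT TABLE_NAME, INDEX_NAME, COLUMN_NAME, COLUMN_POSITION
FROM   ALL_IND_COLUMNS
WHERE  INDEX_OWNER = 'SCOTT'
ORDER  BY TABLE_NAME, INDEX_NAME, COLUMN_POSITION;

-- USER_IND_COLUMNS equivalent (your own indexes only)
SELECT TABLE_NAME, INDEX_NAME, COLUMN_NAME, COLUMN_POSITION
FROM   USER_IND_COLUMNS
ORDER  BY TABLE_NAME, INDEX_NAME, COLUMN_POSITION;

-- Optional: one line per index, columns listed in COLUMN_POSITION order
SELECT TABLE_NAME, INDEX_NAME,
       LISTAGG(COLUMN_NAME, ', ') WITHIN GROUP (ORDER BY COLUMN_POSITION)
           AS INDEXED_COLUMNS
FROM   USER_IND_COLUMNS
GROUP  BY TABLE_NAME, INDEX_NAME
ORDER  BY TABLE_NAME, INDEX_NAME;
```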

Dropping a B-Tree Index

Let's talk about how to remove an index from a table. The syntax to drop an
index is DROP INDEX, followed by the name of the index, and ending with a
semicolon. Use this syntax to completely remove the index from the table.
Other indexes will remain on the table untouched. If you did an oops and
need the index again, you'll have to recreate it. There is no undelete
functionality. Don't forget that once you recreate the index you may have to
gather statistics on it again, but recall that from Oracle 10g on, CREATE INDEX
gathers the index statistics for you automatically if the table is not empty.
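Because there is no undelete, it can help to save an index's definition before dropping it. This sketch uses the INDEX_RESPDIM_RESPID index from earlier. DBMS_METADATA.GET_DDL is a standard Oracle package function, but using it here is a suggestion, not part of the course material.

```sql
-- Before dropping, save the DDL so the index can be recreated exactly
-- (in SQL*Plus, SET LONG 10000 first so the returned CLOB isn't truncated)
SELECT DBMS_METADATA.GET_DDL('INDEX', 'INDEX_RESPDIM_RESPID') FROM DUAL;

-- Remove the one index; the table and its other indexes are unaffected
DROP INDEX INDEX_RESPDIM_RESPID;

-- No undo: recreate it from the saved definition
CREATE INDEX INDEX_RESPDIM_RESPID ON RESPONDENT_DIM (RESPONDENT_ID);

-- From 10g on, stats are gathered during CREATE INDEX on a non-empty table;
-- LAST_ANALYZED shows whether that happened
SELECT INDEX_NAME, NUM_ROWS, LAST_ANALYZED
FROM   USER_INDEXES
WHERE  INDEX_NAME = 'INDEX_RESPDIM_RESPID';
```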

Gathering Statistics

Let's talk about gathering statistics on your tables. Recall that in the
overview we introduced the DBMS_STATS package's GATHER_TABLE_STATS procedure,
and we showed an example of how to use it. In this section we'll go into a
little more detail on this particular procedure. You can find the
information presented in this section in the Oracle Database manual
entitled PL/SQL Packages and Types Reference. Remember to download the
PDF file for the version and release of the Oracle database you
are using. There are several procedures available in the DBMS_STATS
package, most of which only a database administrator would use, but there
are two you should be aware of. GATHER_SCHEMA_STATS gathers
statistics on all the tables in a specific schema. We won't go into this
procedure, but I just wanted you to be aware of it. The
GATHER_TABLE_STATS procedure gathers statistics on a specific table. For
the table itself, these procedures gather statistics such as the number of rows
and the average row length. For the columns within the table, the number of
distinct values, the number of NULLs, the minimum and maximum values,
and so on are also captured, and for the indexes on the table, the number of
distinct values, the number of levels in the index, and so on are also
captured. Separately, Oracle can also capture statistics about your
system's I/O and CPU (through DBMS_STATS.GATHER_SYSTEM_STATS). It's the combination of these
statistics that the optimizer uses to determine the best execution plan for your SQL
query. Although we don't go into them here, your statistics are placed in the
dictionary views ALL_/USER_TAB_STATISTICS, ALL_/USER_TAB_COL_STATISTICS,
and ALL_/USER_IND_STATISTICS. Here is the syntax
for this procedure. Most of these parameters have reasonable
defaults, so you'll be ignoring most of them, and really only need to specify
the first two. Note that you must use the named-parameter (=>)
notation if you skip over one or more of the parameters listed. We showed this
notation in the examples earlier. There are some parameters I do want
to point out, though. ownname is the name of the schema where the table is
located. For my test database the owner is SCOTT. For you, it should be the
name you use when logging into the database. This entry, by the way,
should be in uppercase. tabname is the name of the table, in uppercase,
that you want to gather stats on. partname is the name of the partition you
want to gather stats on. estimate_percent is the percentage of the table you
want Oracle to spin through in order to gather statistics. As explained in the
overview, use the constant DBMS_STATS.AUTO_SAMPLE_SIZE to allow
Oracle to decide how much of the table to spin through. In Oracle
versions 9 and 10 the statistics gathered with AUTO_SAMPLE_SIZE were
suboptimal, but this has been improved in version 11 and later, and this
is probably the way to go. The degree parameter tells Oracle whether you want to
gather the stats in parallel. By default, Oracle uses the table's own parallel setting,
which normally means the stats are not gathered in parallel, but you can let Oracle decide by specifying the constant
DBMS_STATS.DEFAULT_DEGREE. Again, you may want to have a
conversation with your database administrator about this particular
parameter. Finally, the cascade parameter determines whether statistics are also gathered
for the indexes on the table. Prior to 10g this defaulted to
FALSE, meaning no index stats were gathered, but from 10g on the default is
DBMS_STATS.AUTO_CASCADE, which lets Oracle decide whether to gather index statistics. Using the
GATHER_TABLE_STATS procedure, let's see some examples. First, let's issue a query
that finds the average OVERALL_RATING just for RESPONDENT_ID
number 1. Note that the table CANDYBAR_HISTORICAL_DATA does not have
an index on the RESPONDENT_ID column, so Oracle has to spin through the
entire table. As you can see, this query took under 11 seconds to run on my
system. Now, let's create an index on the RESPONDENT_ID column and
gather statistics on the table and columns using the AUTO_SAMPLE_SIZE
constant. This will let Oracle decide how much of the table to spin through
to gather a reasonable set of statistics. As you see, the creation of the index
took about 23 seconds and the GATHER_TABLE_STATS procedure took about 35
seconds to run. Now, let's rerun the query at the top of the slide and, as you
can see, it now runs in 0.063 seconds, but can we do better? Let's continue
our example. This time let's remove the AUTO_SAMPLE_SIZE constant and
force in 100 to indicate that Oracle should scan the entire table, all 5 million
rows of it, in order to gather the best possible statistics. As you can see, this
procedure ran in 2 minutes and 18 seconds, which is considerably longer
than the 34 seconds for AUTO_SAMPLE_SIZE. Reissuing that query now,
you can see it did run faster, at 0.005 seconds versus 0.063 seconds. Next, let's replace the
100 with a 1 to scan 1% of the table. The procedure ran in about 22 seconds
and our query runs worse than both previous queries, at 0.209 seconds. The
takeaway: stick with AUTO_SAMPLE_SIZE and, as I've mentioned many
times in this course, don't hesitate to talk to your database administrator for
advice.
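A sketch of this statistics example, assuming the SCOTT schema and the CANDYBAR_HISTORICAL_DATA table. The index name IX_CBHIST_RESPID is hypothetical, and the timings quoted above will of course differ on your system. The call uses named (=>) notation so that every parameter not shown keeps its default.

```sql
-- Baseline: average rating for respondent 1 (a full table scan without an index)
SELECT AVG(OVERALL_RATING)
FROM   CANDYBAR_HISTORICAL_DATA
WHERE  RESPONDENT_ID = 1;

-- Index the filter column (hypothetical name)
CREATE INDEX IX_CBHIST_RESPID ON CANDYBAR_HISTORICAL_DATA (RESPONDENT_ID);

-- Gather table, column, and index statistics
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',                     -- schema, uppercase
    tabname          => 'CANDYBAR_HISTORICAL_DATA',  -- table, uppercase
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -- or 100 / 1, as timed above
    degree           => DBMS_STATS.DEFAULT_DEGREE,   -- optional: Oracle picks parallelism
    cascade          => TRUE);                       -- explicitly include the indexes
END;
/

-- Inspect what was recorded
SELECT NUM_ROWS, AVG_ROW_LEN, SAMPLE_SIZE, LAST_ANALYZED
FROM   USER_TAB_STATISTICS
WHERE  TABLE_NAME = 'CANDYBAR_HISTORICAL_DATA';

SELECT COLUMN_NAME, NUM_DISTINCT, NUM_NULLS
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME = 'CANDYBAR_HISTORICAL_DATA';

SELECT INDEX_NAME, BLEVEL, DISTINCT_KEYS
FROM   USER_IND_STATISTICS
WHERE  TABLE_NAME = 'CANDYBAR_HISTORICAL_DATA';
```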

Is my Index Being Used?

With all of this creating of indexes and gathering of statistics you'd think
Oracle would always use the index no matter what, but that's not the case.
Recall I said earlier that Oracle will consider using the indexes when you're
pulling about 15% or less of the data from a table. Anything more, Oracle
will most likely not use the index and scan the table. That's why I mentioned
that if your SQL queries always chug through the entire table, that is, no
subsetting is done, then there's no need for an index, except for maybe
JOINs and data integrity. In this section I want to show you two methods,
AUTOTRACE and EXPLAIN PLAN, you can use to determine if Oracle is using
indexes with your SQL query. Since this is not a course specifically on how to
use these two features, which would be a course on its own, we show you just
enough to help you determine if you might need an index or to determine if
your index is actually being used. First, before we use either of these
methods, you may need to ask your database administrator to create the
table PLAN_TABLE in your schema. If you are the administrator, or are
following along at home, you can execute the CREATE TABLE statement in
the Oracle-supplied file UTLXPLAN.SQL, found in $ORACLE_HOME/rdbms/admin. (From Oracle 10g on,
a shared PLAN_TABLE is usually already available through a public synonym.) Now, while both AUTOTRACE and
EXPLAIN PLAN get you similar output, AUTOTRACE executes the SQL query,
whereas EXPLAIN PLAN does not. This doesn't make too much difference if
your queries run fast, but if they take a long time to run, you may want to
stick with EXPLAIN PLAN, and I'll show both methods and you can decide. To
turn on AUTOTRACE, enter SET AUTOTRACE ON EXPLAIN in SQL*Plus and submit the
code, then run your SQL query. In the query shown I've decided to run a
query that will not use an index. In this case, I'm subsetting by
CANDYBAR_ID=123, but we did not place an index on CANDYBAR_ID, so
Oracle will have to spin through the entire CANDYBAR_HISTORICAL_DATA
table, and here is the output of AUTOTRACE. For beginners I usually stick to
the Operation and Name columns. In the Operation column I usually look for
the word FULL or, in this case, TABLE ACCESS FULL. This lets me know that
Oracle is going to perform a full table scan on the table appearing in the
Name column, CANDYBAR_HISTORICAL_DATA, in this case. This indicates to
me that there may be a need to add an index on the CANDYBAR_ID column
because it appears in the WHERE clause, as well as the predicate
information section below. Take note of the Rows column. As you see, Oracle
estimates that it will pull back 20,200 rows from the
CANDYBAR_HISTORICAL_DATA table where CANDYBAR_ID=123. The
actual value is 20,000, not 20,200. Because our statistics were estimated from a
sample rather than the whole table, Oracle's row count is slightly off. This isn't a bad
thing, but take note that if your Rows column estimates are way off, you
may need to gather statistics again. Finally, when you're done with the
AUTOTRACE submit SET AUTOTRACE OFF and any subsequent query you
submit will not display the AUTOTRACE. On this slide, as well as the next,
let's look at EXPLAIN PLAN. Recall that AUTOTRACE actually executes the
SQL query. EXPLAIN PLAN, on the other hand, does not execute the query,
but gives you similar output to AUTOTRACE. Now, using EXPLAIN PLAN is a
multistep process. First, you enter the code EXPLAIN PLAN SET
STATEMENT_ID= and a name, and here I'm using TEST1 as the name. This is
used in the PLAN_TABLE and identifies the rows associated with each run of
the EXPLAIN PLAN. Next, we follow with the keywords FOR and your SQL
query. There is no output from this statement because EXPLAIN PLAN data is
actually inserted into the PLAN_TABLE. The next step is to issue this query,
which pulls out the plan information associated with your statement ID,
here, TEST1. The results are displayed on the next slide. Okay, let's see the
output from the PLAN_TABLE. Despite the formatting differences, this is
similar to the AUTOTRACE output. Under the Operation column I'm looking
for TABLE ACCESS, in this case, the TABLE ACCESS is by INDEX ROWID. To
the right is the name of the table being accessed and below is the name of
the index being used. This is in contrast to the previous example, which
indicated that Oracle was going to perform a TABLE ACCESS FULL, or a full
scan of the table, without the help of an index. Finally, unlike
AUTOTRACE, you will have to delete the relevant rows from the PLAN_TABLE.
Here I'm deleting rows WHERE STATEMENT_ID is TEST1. Note that there are
other methods of obtaining the EXPLAIN PLAN, so please check out the
Oracle manuals for more on this topic.
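Both methods can be sketched as follows for SQL*Plus. The two queries are illustrative. The first assumes no index involving CANDYBAR_ID, and the second assumes the index on RESPONDENT_ID created earlier in this module. The hierarchical PLAN_TABLE query is one common way to read a plan and is not necessarily the query on the slide. DBMS_XPLAN.DISPLAY is shown as one of the "other methods".

```sql
-- ---- AUTOTRACE: the query IS executed, then its plan is shown ----
SET AUTOTRACE ON EXPLAIN

SELECT AVG(OVERALL_RATING)
FROM   CANDYBAR_HISTORICAL_DATA
WHERE  CANDYBAR_ID = 123;      -- unindexed column: look for TABLE ACCESS FULL

SET AUTOTRACE OFF

-- ---- EXPLAIN PLAN: the query is NOT executed; the plan goes into PLAN_TABLE ----
EXPLAIN PLAN SET STATEMENT_ID = 'TEST1' FOR
SELECT AVG(OVERALL_RATING)
FROM   CANDYBAR_HISTORICAL_DATA
WHERE  RESPONDENT_ID = 1;      -- indexed column: look for TABLE ACCESS BY INDEX ROWID

-- Read the plan back, indented by depth
SELECT LPAD(' ', 2 * (LEVEL - 1)) || OPERATION || ' ' || OPTIONS AS PLAN_STEP,
       OBJECT_NAME,
       CARDINALITY AS EST_ROWS
FROM   PLAN_TABLE
START  WITH ID = 0 AND STATEMENT_ID = 'TEST1'
CONNECT BY PRIOR ID = PARENT_ID AND STATEMENT_ID = 'TEST1';

-- Alternative: let DBMS_XPLAN format the same PLAN_TABLE rows
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'TEST1'));

-- Unlike AUTOTRACE, you clean up the PLAN_TABLE rows yourself
DELETE FROM PLAN_TABLE WHERE STATEMENT_ID = 'TEST1';
COMMIT;
```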

Summary

In summary, what did we just learn? We learned about the B-Tree index and
when it can be used. We saw how to create a B-Tree index using the CREATE
INDEX statement. We explored a variety of dictionary views, specifically,
ALL_TABLES, ALL_TAB_COLUMNS, ALL_INDEXES, ALL_IND_COLUMNS, as well
as the schema-specific versions. Although we did not look into them, please
peruse the Oracle documentation for the dictionary views ALL_TAB_STATISTICS,
ALL_TAB_COL_STATISTICS, and ALL_IND_STATISTICS. We looked into the DBMS_STATS
package, specifically the GATHER_TABLE_STATS procedure. Just as a note,
remember to transition over to DBMS_STATS if you are using the older
ANALYZE command, as GATHER_TABLE_STATS does a much better job. As
I have mentioned before, from Oracle 10g on, CREATE INDEX
gathers index stats if the table is not empty, and if your tables have
changed significantly, the automated Oracle task should update your stats
nightly, provided your database administrator has kept this task running. Again,
please have a conversation with your database administrator. We
determined if an index is being used by looking at the output from both
AUTOTRACE and EXPLAIN PLAN and in the next module we look at the
Bitmap Index in more detail.

Bitmap Indexes

Introduction

Hello and welcome back to the Pluralsight course, Optimizing SQL Queries in
Oracle. My name is Scott Hecht and in this module I'd like to go over the
Bitmap Index in a little more detail than what was presented in the overview
module, so let's get started.

Module Contents

First though, let's go over the module contents. We'll talk about what a
Bitmap Index is. We'll then talk about when you should use a Bitmap index.
Next, we'll show you how to create a Bitmap Index on one or more columns, as
well as show you several examples. Next, in the event that you don't want
the index anymore, we'll show you how to drop the index. In the previous
module we introduced AUTOTRACE and EXPLAIN PLAN and in this module
we'll take a look at the AUTOTRACE output for some queries involving
Bitmap Indexes. Finally, we end with a summary.

What is a Bitmap Index?

In this section we'll explain what a Bitmap Index is. You use Bitmap Indexes
when the column you want to index contains a small number of distinct
values, called its cardinality, compared to the total number of rows in the
entire table. For example, the cardinality of the RESPONDENT_GENDER column
is 2, M for male and F for female, and 2, of course, is much less than 5
million, the total number of rows in our CANDYBAR_HISTORICAL_DATA table.
Recall that for B-Tree indexes you can index a single column or multiple columns.
A single-column index is called a simple index, whereas a multicolumn index
is called a composite index. The same is true for Bitmap indexes, and the
syntax, which we'll see in a moment, is similar to that for B-Tree Indexes. As we saw
in the overview, a Bitmap Index is represented a bit differently from a B-Tree
Index. Let's take a look at a Bitmap Index for the CANDYBAR_ID column.
There are 250 distinct CANDYBAR_ID values, ranging from 1 to 250, and for each
one a Bitmap is created, with each bit representing whether the corresponding
row has that particular CANDYBAR_ID value. Internally,
Oracle doesn't store Bitmap Indexes quite like this, but the idea is the same. Now, let's look at how
Oracle handles two Bitmaps when used within one query. Recall from school
that you learned how to perform computations on binary numbers, such as
ANDing two binary numbers together and ORing two binary numbers
together. ANDing two binary numbers together is performed bitwise;
that is, each bit in the first binary number is ANDed with the corresponding
bit in the second binary number, and so on. The same goes for the OR operator. A
similar computation is performed between Bitmap Indexes on two or more
columns in a single SQL query. Again, let's take a look at the representation
of a Bitmap Index. In this graphic we are displaying the case where
CANDYBAR_ID is 7, and in this graphic we're displaying where RESPONDENT_GENDER
is F. Both of these Bitmap Indexes, by the way, are on the table
CANDYBAR_HISTORICAL_DATA. Now, given the WHERE Clause
predicate WHERE CANDYBAR_ID = 7 AND RESPONDENT_GENDER = 'F', if
Oracle's Optimizer decides to use both Bitmap Indexes, the AND operator will
cause Oracle to perform a binary AND between the two Bitmaps. Here is the
result after I ANDed the two Bitmaps shown above together. Anywhere
you see a 1 indicates a row where CANDYBAR_ID is 7 and
RESPONDENT_GENDER is F. This type of binary computation can be
extremely fast. We show a more detailed example of this, as well as the
AUTOTRACE, later on in the module.
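A tiny worked example may help. Assume a hypothetical table of just eight rows; these bits are made up for illustration and are not from the course data. Each position is one row, and the AND and OR are applied position by position:

```latex
\begin{aligned}
\text{CANDYBAR\_ID} = 7:&\quad 1\,0\,0\,1\,0\,1\,1\,0\\
\text{RESPONDENT\_GENDER} = \text{'F'}:&\quad 1\,1\,0\,0\,0\,1\,0\,1\\
\text{AND}:&\quad 1\,0\,0\,0\,0\,1\,0\,0\\
\text{OR}:&\quad 1\,1\,0\,1\,0\,1\,1\,1
\end{aligned}
```

Only rows 1 and 6 satisfy both predicates, so only those two ROWIDs need to be fetched for the AND. The OR version would fetch rows 1, 2, 4, 6, 7, and 8.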

When do you use a Bitmap Index?

Let's go over when you should use a Bitmap Index. Bitmap Indexes are
generally used in Data Warehouses, where you are creating reports or
analyzing the data. If you are building an online transaction processing
database, or OLTP database, then Bitmap Indexes may not be right for you,
because this type of database tends to change very frequently, and updating a
Bitmap Index locks the whole range of rows covered by a bitmap entry rather than a single row. Bitmap
Indexes are useful if your SQL programmers are performing a lot of ad hoc
queries on the database. Generally, this involves a WHERE Clause
containing subsetting criteria on more than one column with ANDs and ORs
between them, as shown on the previous slide. For the AND, as well as the OR,
operators on columns that have Bitmap Indexes on them, binary ANDs
and ORs are performed between the Bitmaps. All of this occurs before the table
is even accessed by the query. Because Oracle can quickly combine several
Bitmaps together, Oracle states that it's usually best to create Bitmap
Indexes on individual columns rather than composite Bitmap Indexes. You should
consider using Bitmap Indexes when the cardinality of a column is
low, or modestly low, compared to the total number of rows in the table.
For example, the number of distinct values in the CANDYBAR_ID column is
250, the number of distinct values in the RESPONDENT_GENDER column is 2, and so
on. In fact, one Oracle document mentions that a column with 10,000
distinct values is a candidate, even for a table containing 1 million rows, that is,
a distinct-value count equal to 1% of the rows. With that said, you should not use Bitmap Indexes on
columns that are unique or have a very large number of distinct values.
Instead, consider using a B-Tree index. One great thing about Bitmap
Indexes is that NULL values appearing within a column are part of the index.
This is not true for B-Tree indexes, which don't store entries whose indexed columns are all NULL. If you have a WHERE Clause predicate
using the IS NULL comparison condition, a Bitmap Index on that column
could be considered. Contrast this with a simple B-Tree index, which won't be considered
if the IS NULL condition is specified. Another benefit of Bitmap
Indexes is that they tend to take up much less space than a corresponding
B-Tree index.
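Before choosing between a Bitmap and a B-Tree index, a quick cardinality check helps. This sketch assumes the CANDYBAR_HISTORICAL_DATA columns used in this module. Note that COUNT(DISTINCT ...) reads the whole table, so on 5 million rows it takes a while; the second query reads the stored estimate instead, which requires gathered statistics.

```sql
-- Compare each column's distinct-value count (cardinality) to the total row count
SELECT COUNT(*)                          AS TOTAL_ROWS,
       COUNT(DISTINCT CANDYBAR_ID)       AS CANDYBAR_ID_DISTINCT,
       COUNT(DISTINCT RESPONDENT_GENDER) AS GENDER_DISTINCT,
       COUNT(DISTINCT RESPONDENT_ID)     AS RESPONDENT_ID_DISTINCT
FROM   CANDYBAR_HISTORICAL_DATA;

-- Cheaper: the optimizer's stored estimates, lowest cardinality first
SELECT COLUMN_NAME, NUM_DISTINCT, NUM_NULLS
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME = 'CANDYBAR_HISTORICAL_DATA'
ORDER  BY NUM_DISTINCT;

-- With a Bitmap Index on RESPONDENT_GENDER (created later in this module),
-- this IS NULL predicate can be answered from the index, because Bitmap
-- Indexes store NULLs
SELECT COUNT(*)
FROM   CANDYBAR_HISTORICAL_DATA
WHERE  RESPONDENT_GENDER IS NULL;
```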

Creating a Bitmap Index

Now, let's take a closer look at the CREATE BITMAP INDEX syntax. As you
can see, the syntax chart is the same as for B-Tree indexes, but in this case
you add the keyword BITMAP between the keywords CREATE and INDEX.
The rest is the same. Here you provide the name of the table, as well as the
column or columns in parentheses. On the next slide we talk about the index
properties. Let's look at the rest of the CREATE BITMAP INDEX syntax. Again,
the rest of the syntax is the same as for B-Tree indexes, and the keywords
TABLESPACE and PARALLEL are available for Bitmap Indexes just as they are for
B-Tree indexes. On this slide, let's see some examples of how to create
Bitmap Indexes. To create a Bitmap Index, the code is similar to how we
created a B-Tree index, but you include the keyword BITMAP between the
keywords CREATE and INDEX, as shown on your screen. Note that if the code also
includes the UNIQUE keyword, it does not work, because you are not allowed to create a unique Bitmap Index.
If you need an index on a column that is unique, then use a B-Tree index as
outlined in the previous module. Here is an example of how to create a
composite Bitmap Index on the columns RESPONDENT_ID,
CANDYBAR_ID, and SURVEY_DATE. This code took about 54 seconds to
complete, but please heed Oracle's advice mentioned earlier: because
Bitmaps can be combined quickly, it's usually best to create simple
indexes rather than composite indexes. Here's the same code as above, but
with the PARALLEL keyword added, so that the index is created in
parallel. This code, in contrast, took about 25 seconds to complete.
Finally, here is an example of how to use the TABLESPACE keyword when
creating a Bitmap Index, if that's something you feel you need to do.
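A sketch of these CREATE BITMAP INDEX variants on CANDYBAR_HISTORICAL_DATA follows. BMIX_CANDYHISTDATA_CANDID is the name used later in this module. The composite example deliberately uses hypothetical columns (CANDYBAR_ID, RESPONDENT_GENDER) rather than the three columns on the slide, and BMIX_CANDYHISTDATA_CIDGEN and BMIX_CANDYHISTDATA_GENDER are hypothetical names.

```sql
-- Simple Bitmap Index on a low-cardinality column
CREATE BITMAP INDEX BMIX_CANDYHISTDATA_CANDID
    ON CANDYBAR_HISTORICAL_DATA (CANDYBAR_ID);

-- Not allowed: Oracle rejects UNIQUE combined with BITMAP
-- CREATE UNIQUE BITMAP INDEX BMIX_CANDYHISTDATA_CANDID
--     ON CANDYBAR_HISTORICAL_DATA (CANDYBAR_ID);

-- Composite Bitmap Index built in parallel (simple ones are usually preferred)
CREATE BITMAP INDEX BMIX_CANDYHISTDATA_CIDGEN
    ON CANDYBAR_HISTORICAL_DATA (CANDYBAR_ID, RESPONDENT_GENDER)
    PARALLEL;

-- Reset the parallel attribute that PARALLEL leaves on the index
ALTER INDEX BMIX_CANDYHISTDATA_CIDGEN NOPARALLEL;

-- Bitmap Index stored in a DBA-provided tablespace
CREATE BITMAP INDEX BMIX_CANDYHISTDATA_GENDER
    ON CANDYBAR_HISTORICAL_DATA (RESPONDENT_GENDER)
    TABLESPACE TBS_INDEXES;
```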

Dropping a Bitmap Index

Let's talk about how to remove a Bitmap Index on a column. The syntax to
drop an index is DROP INDEX, followed by the name of the index, and
ending with a semicolon. Note that there is no DROP BITMAP INDEX
statement, just DROP INDEX. If you did an oops and need the index again,
you'll have to recreate it. There is no undelete functionality. Depending on
the version of the database you are working with, you may need to gather
statistics on the index again after recreating it.

Is my Index Being Used?

Now, let's see how we can determine if a Bitmap Index is being used. As
we've seen before, you use AUTOTRACE or EXPLAIN PLAN to determine if
your SQL code is picking up your index. This is the same for Bitmap Indexes
as it is for B-Tree indexes. I won't go over it again, but I do want to show you
what the execution plan looks like when a Bitmap Index is picked up. First,
here is the SQL code. Note that this SQL code should pick up the Bitmap
Indexes created on the CANDYBAR_ID column shown earlier and here is the
output, trimmed down a bit. As you can see, the telltale words TABLE
ACCESS BY INDEX ROWID indicate that an index is being used on the table
named to the right, CANDYBAR_HISTORICAL_DATA, but which index is
being used? On the last bold line, the name to the right of the text BITMAP
INDEX SINGLE VALUE indicates that the Bitmap Index
BMIX_CANDYHISTDATA_CANDID is being used. Note that if no index is picked
up, bitmap or otherwise, you will see TABLE ACCESS FULL instead, which is a
dead giveaway that an index is not being used at all. Finally, the line
BITMAP CONVERSION TO ROWIDS indicates that Oracle is translating the
bits set to 1 in the bitmap to their corresponding ROWIDS, so that those
particular rows can be accessed directly from the table. On this slide let's
take a look at another AUTOTRACE example. Now, recall that in the section
entitled "What is a Bitmap Index?" earlier in this module, I showed an example
of Oracle binary ANDing two Bitmap Indexes given the WHERE Clause
WHERE CANDYBAR_ID=7 AND RESPONDENT_GENDER='F'. I'd like to show you
what the execution plan looks like when Oracle uses two Bitmap Indexes,
one on CANDYBAR_ID and the other on RESPONDENT_GENDER. As shown, I
create one Bitmap Index on the CANDYBAR_ID column and another on the
RESPONDENT_GENDER column. Now, this SQL code contains the WHERE
Clause predicate WHERE CANDYBAR_ID=7 AND RESPONDENT_GENDER='F',
so what does the execution plan look like in this case? As you can see in the
output, Oracle is performing a TABLE ACCESS BY INDEX ROWID on the
CANDYBAR_HISTORICAL_DATA table, as shown on line 2, but on lines 5 and 6
you'll note that both indexes are being picked up. So far, so good. Next, if
you look at line 4 you will see that Oracle's performing the binary AND
between the two Bitmap Indexes, as indicated by the words, BITMAP AND.
Finally, the conversion from bits to ROWIDS is performed. Note that this
query ran in a little more than 4 seconds, but the corresponding query using
B-Tree indexes ran in about 8 seconds, giving us about 50% savings.
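The sketch below shows both ways of viewing the plan for the two-index query. BMIX_CANDYHISTDATA_CANDID is the index named in the plan output above. The RESPONDENT_GENDER index name is hypothetical. Note that SET AUTOTRACE is a SQL*Plus (and SQL Developer) command rather than SQL.

```sql
-- The two single-column bitmap indexes (skip whichever already exists;
-- the gender index name is hypothetical)
CREATE BITMAP INDEX bmix_candyhistdata_candid
  ON candybar_historical_data (candybar_id);
CREATE BITMAP INDEX bmix_candyhistdata_gender
  ON candybar_historical_data (respondent_gender);

-- Option 1: AUTOTRACE, showing only the execution plan
SET AUTOTRACE TRACEONLY EXPLAIN
SELECT *
  FROM candybar_historical_data
 WHERE candybar_id = 7
   AND respondent_gender = 'F';
SET AUTOTRACE OFF

-- Option 2: EXPLAIN PLAN, then format the stored plan with DBMS_XPLAN
EXPLAIN PLAN FOR
SELECT *
  FROM candybar_historical_data
 WHERE candybar_id = 7
   AND respondent_gender = 'F';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In either output, look for BITMAP AND combining the two BITMAP INDEX SINGLE VALUE steps, followed by BITMAP CONVERSION TO ROWIDS and the access by ROWID (shown as TABLE ACCESS BY INDEX ROWID BATCHED on Oracle 12c and later). The optimizer can still choose a full scan if it estimates that to be cheaper.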

Summary

In summary, what did we just learn? We learned about the Bitmap Index and
when it can be used. We saw how to create a Bitmap Index using the
CREATE BITMAP INDEX statement. We saw an execution plan that used a
Bitmap Index and saw what that looks like. In the next module we look at
additional index types.

Additional Index Types

Introduction

Hello and welcome back to the Pluralsight course, Optimizing SQL Queries in
Oracle. My name is Scott Hecht and in this module I'd like to go over several
additional index types in a little more detail than what was presented in the
overview module. Specifically, we'll talk about Function-Based Indexes,
Bitmap Join Indexes, and finally, Index-Organized Tables. So, let's get started.

Module Contents

First though, let's go over the module contents. We'll start off by talking
about Function-Based Indexes, what they are, and how to create them. We'll
then move on to Bitmap Join Indexes, after which we'll discuss
Index-Organized Tables, and finally, we end with the summary.

Function-Based Indexes

Let's talk first about Function-Based Indexes. What is a Function-Based
Index? Recall that when you create an index on a column you place the
name of the column within the parentheses of the CREATE INDEX or CREATE
BITMAP INDEX syntax. Then, in your SQL, when you use that column, Oracle's
Optimizer decides whether to use that index or not. But if you apply a
function to the column, Oracle's Optimizer does not consider the index at all.
Function-Based Indexes allow you to create an index on a function applied to
a column, as well as on an expression involving one or more columns. This
allows Oracle's Optimizer to consider using the index rather than just
dismissing it outright. In order for the index to be considered, though, the
function or expression must appear as defined in the CREATE INDEX or
CREATE BITMAP INDEX syntax. For example, in the RESPONDENT_DIM table
the column RESPONDENT_NAME is stored in proper case rather than in capitals. Thus,
while the following WHERE Clause will find the appropriate row, the index on
RESPONDENT_NAME will be ignored because you applied the function UPPER
to it, but we can fix that using our old friend, the CREATE INDEX syntax. You
just replace the column name in the parentheses with the function or
expression involving that column or columns. Let's AUTOTRACE a pull from
the RESPONDENT_DIM table using a specific name. At this point, there is no
Function-Based Index on the RESPONDENT_NAME column. Note that the
function UPPER is being applied to the column RESPONDENT_NAME. As you
can see, and probably what you expected, Oracle performs a full table scan.
Continuing our example from the previous slide, let's create an index on the
RESPONDENT_NAME; this will not be a Function-Based Index, though. Here is
the CREATE INDEX statement you're all familiar with by now. Here is the
same SQL from the previous slide using the UPPER function around the
RESPONDENT_NAME and again, because we use the UPPER function, the
index is not used, as you can see in the execution plan. This is because we
did not use a Function-Based Index, just a run-of-the-mill index. Okay, let's
continue our example. Next, let's create a Function-Based Index using the
UPPER function applied to the RESPONDENT_NAME. Here is the CREATE
INDEX statement. As you can see, all you have to do to create a Function-Based Index is to add the function around the name of the column: UPPER,
left paren, RESPONDENT_NAME, right paren. Here is the same SQL from the
previous slide using the UPPER function around the RESPONDENT_NAME.
Note that the syntax shown in the WHERE Clause matches that of the index
and, in this case, the execution plan shows that the table
RESPONDENT_DIM is being accessed by the Function-Based Index
INDEX_RESPDIM_RESPNAME. Let's conclude our example. Let's see an
example of Function-Based Indexes using an expression involving two
columns. Here is the CREATE INDEX syntax and you'll notice that instead of
a function we're providing a mathematical formula involving two columns,
TASTE_RATING and OVERALL_RATING, and here is the SQL I used. Note that
the syntax shown in the WHERE Clause matches that of the index and you'll
notice that our index is being used. On this slide let's look at the syntax to
create a Function-Based Index, as well as drop it. As mentioned before, you
can use the standard CREATE INDEX or CREATE BITMAP INDEX syntax to
create a Function-Based Index and, as shown, here is the standard syntax
we've seen before: CREATE INDEX or CREATE BITMAP INDEX, followed by the
name of the index, the keyword ON, and the name of the table. Then, within
the parentheses, where a plain index would list a column, you specify either
a function surrounding a column or a formula involving one or more columns.
Although not shown here, you can still use the PARALLEL and TABLESPACE
keywords if need be. Finally, you can drop a
Function-Based Index using DROP INDEX, followed by the name of the index.
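The Function-Based Index walkthrough above could be written roughly as follows. RESPONDENT_DIM, RESPONDENT_NAME, INDEX_RESPDIM_RESPNAME, TASTE_RATING, and OVERALL_RATING come from the text. The plain index name, the sample name 'JANE SMITH', the expression index name, the assumption that both rating columns live in CANDYBAR_FACT, and the formula TASTE_RATING + OVERALL_RATING are all hypothetical, since the slides' exact formula is not given.

```sql
-- A plain B-Tree index: the optimizer does not consider it for
-- predicates that wrap the column in UPPER() (name is hypothetical)
CREATE INDEX ix_respdim_respname_plain ON respondent_dim (respondent_name);

-- A Function-Based Index on the expression itself
CREATE INDEX index_respdim_respname ON respondent_dim (UPPER(respondent_name));

-- The WHERE clause repeats UPPER(respondent_name) as indexed,
-- so the optimizer can consider INDEX_RESPDIM_RESPNAME
SELECT *
  FROM respondent_dim
 WHERE UPPER(respondent_name) = 'JANE SMITH';

-- An index on an expression involving two columns; the table and
-- the formula are assumptions
CREATE INDEX ix_candyfact_ratingsum
  ON candybar_fact (taste_rating + overall_rating);

-- The predicate uses the same expression; the threshold 15 is arbitrary
SELECT COUNT(*)
  FROM candybar_fact
 WHERE taste_rating + overall_rating > 15;

-- Function-Based Indexes are dropped like any other index
DROP INDEX index_respdim_respname;
DROP INDEX ix_candyfact_ratingsum;
```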

Bitmap Join Indexes

In this section let's talk about Bitmap Join Indexes. What is a Bitmap Join
Index? A Bitmap Join Index is a bitmap index built across two or more tables
based on a JOIN condition. This effectively pre-joins the tables
together upfront when the index is created, as opposed to joining the
tables together when your SQL is executed. By computing and storing the
JOIN ahead of time, the JOIN can be avoided at execution time, but Bitmap
Join Indexes may take quite a while to build, so please take that into account
if you plan on using them. Now, the creation of the Bitmap Join Index usually
involves a JOIN between a fact table and one or more dimension tables. In
our examples below we use the CANDYBAR_FACT table and the
RESPONDENT_DIM dimension table. Recall that when I've described B-Tree
and Bitmap Indexes, the CREATE INDEX or CREATE BITMAP INDEX code
involves not only the name of the table, but the name of one or more
columns that will appear in the WHERE Clause. This is true as well for
Bitmap Join Indexes. Recall from the overview that I showed a graphical
representation of a Bitmap Join Index using the RESPONDENT_STATE as the
column being indexed. In our examples below I will also use the
RESPONDENT_STATE column. One important caveat before you create a
Bitmap Join Index is that your dimension table, or tables, must have a
Primary Key (or unique) constraint on the column used to join the fact and dimension
tables together. In our examples I will be joining the RESPONDENT_ID
column between the CANDYBAR_FACT and RESPONDENT_DIM tables. You
can either create your dimension table upfront with a Primary Key constraint
or you can use the ALTER TABLE command to do the same thing. Since
we've already discussed the Primary Key constraint on the CREATE TABLE
statement earlier in the course, I'll show you how to use the ALTER TABLE
syntax in just a few minutes. Now, let's first assume that we have no
indexes and no constraints on the CANDYBAR_FACT and RESPONDENT_DIM
tables. Let's submit this query joining the two tables together while
simultaneously subsetting by the RESPONDENT_STATE column for
Connecticut and Washington State. As you probably guessed, both tables
will be accessed by a full table scan, as shown in the bold font in the
execution plan, and this query takes just under 3 seconds to run on my
system. On this slide let's redo the example on the previous slide, but let's
add the appropriate indexes. Note that we are not creating a Bitmap Index
yet. In the code shown we've created an index on the RESPONDENT_ID
column on both the CANDYBAR_FACT and RESPONDENT_DIM tables. Since
we know that we are also going to subset the RESPONDENT_DIM table by
RESPONDENT_STATE, let's add an index on that column as well, shown on
the third line. Again, here is our query. Note that I'm joining by
RESPONDENT_ID and subsetting by RESPONDENT_STATE. Here is the
execution plan. It's a little complicated, but note that table access is BY
INDEX ROWID for both the CANDYBAR_FACT and RESPONDENT_DIM tables,
rather than full table scans. Note also, that the index
IX_CANDYFACT_RESPID, the index on the RESPONDENT_ID in the
CANDYBAR_FACT table is used, as well as IX_RESPDIM_RESPSTATE, the index
on the RESPONDENT_STATE in the RESPONDENT_DIM table. This query took
only 1.935 seconds to execute, which is about 1 second faster than the
query on the previous slide. On this slide let's create a Bitmap Join Index
between the CANDYBAR_FACT and RESPONDENT_DIM tables. First, based on
the caveats for the Bitmap Join Index, we must have a Primary Key on the
dimension table. Here I'm using the ALTER TABLE syntax instead of the
CREATE TABLE syntax, since the table, RESPONDENT_DIM, already exists. In
the code shown I just want to add a Primary Key constraint to the
RESPONDENT_ID, so start off with the keywords, ALTER TABLE, followed by
the name of the table, RESPONDENT_DIM here. Next, follow with the
keywords ADD CONSTRAINT, followed by a name for the Primary Key, here
I've used PK_RESPDIM_RESPID. Next, follow up with the keywords Primary
Key, then the column or columns in parentheses that will serve as the
Primary Key, here I'm providing the column, RESPONDENT_ID, and end with
a semicolon. Execute this code to force the RESPONDENT_ID to be the
Primary Key of the RESPONDENT_DIM table. Next, since I know that I will be
subsetting by RESPONDENT_STATE, I also create an index on this column, as
shown on the previous slide, as well as here. Next, let's create the Bitmap
Join Index, which will link the CANDYBAR_FACT table and the
RESPONDENT_DIM table based on the column RESPONDENT_ID. Start off
with CREATE BITMAP INDEX, followed by the name of the index. Here, it's
BMJI_CANDY_FACT_RESPDIM, where BMJI stands for Bitmap Join Index. Next,
follow up with the keyword ON, as usual, and the name of the fact table, here,
CANDYBAR_FACT. Here is where the syntax veers off a bit. In parentheses, list one
or more comma-delimited columns that will be found in your WHERE Clause.
Here, I'm specifying the RESPONDENT_STATE. Now, you may find this
unusual, since the RESPONDENT_STATE column is not in the
CANDYBAR_FACT table. True, but it is in the RESPONDENT_DIM table, which
we are joining to the CANDYBAR_FACT table. Take note of the alias B. Next,
we follow up with a FROM clause indicating which two tables are to be joined
together. Here we're joining the fact table CANDYBAR_FACT, aliased to A, and
the dimension table RESPONDENT_DIM, aliased to B. Next, we provide the
JOIN criteria, WHERE A.RESPONDENT_ID=B.RESPONDENT_ID. You'll note that
the Bitmap Join Index uses the older notation for joining two tables, commas
instead of INNER JOIN syntax. Once executed, our Bitmap Join Index pre-joins the two tables together by RESPONDENT_ID, allowing you to subset
quickly by the RESPONDENT_STATE. Here is the execution plan for our query.
As you can see, shown in bold, our Bitmap Join Index is being used and our
query ran in just under 1 second, or about 1/3 to 1/2 of the time of the queries on the
previous slides. Let's see how to create and drop Bitmap Join Indexes. Note
that even though we're creating a Bitmap Join Index we still use the same
code as before, CREATE BITMAP INDEX, but with the addition of the JOIN and
a column or columns from the dimension table. In general, to create a
Bitmap Join Index, code CREATE BITMAP INDEX, followed by the name of the
index, for me this begins with BMJI_. Next, follow up with the ON clause
naming the fact table, as well as one or more columns from the dimension
tables in parentheses. Remember that the columns in parentheses are to be
specified in your WHERE Clauses in order to capture the Bitmap Join Index.
Next, follow up with the FROM keyword, followed by the name of your fact
and dimension table. Here I'm aliasing the fact table to A and the dimension
table to B. Finally, provide a WHERE clause used to join these two tables
together. Again, in the ON clause you specify the name of the fact table, but
in parentheses you specify a comma-delimited list of one or more dimension
table columns. Note that you can include more than one dimension table,
although we only show one in the syntax here. Finally, to drop the index,
as usual, it's DROP INDEX, followed by the name of the index.
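Here is a sketch of the full sequence described in this section. The table, column, constraint, and index names come from the text (CANDYBAR_FACT, RESPONDENT_DIM, RESPONDENT_ID, RESPONDENT_STATE, PK_RESPDIM_RESPID, IX_RESPDIM_RESPSTATE, BMJI_CANDY_FACT_RESPDIM). The state codes 'CT' and 'WA' and the OVERALL_RATING column in the test query are assumptions about how the data is stored.

```sql
-- 1. The dimension table's join column must carry a primary key
--    (or unique) constraint
ALTER TABLE respondent_dim
  ADD CONSTRAINT pk_respdim_respid PRIMARY KEY (respondent_id);

-- 2. Index on the column used to subset (skip if it already exists)
CREATE INDEX ix_respdim_respstate ON respondent_dim (respondent_state);

-- 3. The Bitmap Join Index: built ON the fact table, indexing a
--    dimension column, with the join spelled out in FROM/WHERE
CREATE BITMAP INDEX bmji_candy_fact_respdim
  ON candybar_fact (b.respondent_state)
  FROM candybar_fact a, respondent_dim b
 WHERE a.respondent_id = b.respondent_id;

-- 4. A query that can use it: the same join, filtered on the indexed
--    column. 'CT' and 'WA' assume two-letter state codes.
SELECT b.respondent_state, AVG(a.overall_rating) AS avg_overall
  FROM candybar_fact a
  JOIN respondent_dim b ON a.respondent_id = b.respondent_id
 WHERE b.respondent_state IN ('CT', 'WA')
 GROUP BY b.respondent_state;

-- 5. Dropping it is the usual DROP INDEX
DROP INDEX bmji_candy_fact_respdim;
```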

Index-Organized Tables

Let's talk about Index-Organized Tables. What is an Index-Organized Table?
Recall that normally an index is stored separately from its associated table.
Also, an index stores ROWIDs associated with its corresponding table. When
an index is used it's the ROWIDs that are used to pull the data from the
table itself. Now, an Index-Organized Table stores the table data in a
modified B-Tree format. Recall that the first index we were exposed to was
the B-Tree index. This means that an Index-Organized Table is, in fact, its
own index and table combined. One requirement is that an Index-Organized
Table must have a primary key constraint on it. With an Index-Organized
Table, data access is faster since the table and the index are one and the
same. Index-Organized Tables allow for faster primary key access when your
SQL query involves an exact match or a range search on the primary key. Although we
talk about partitioning later on in the course, please note that Index-Organized Tables can be partitioned just like regular tables, which, by the way, are also
called heap-organized tables. Since an Index-Organized Table is
its own index and table, overall storage space will be reduced. On this slide
let's see how easy it is to create a basic Index-Organized Table. As you see,
I'm issuing a CREATE TABLE statement ensuring that I have an appropriate
primary key constraint. Here, the RESPONDENT_ID is being used as the
primary key. Finally, to tell Oracle I want this table to be an Index-Organized
Table, I end the CREATE TABLE statement with the keywords ORGANIZATION
INDEX. In order to add data to this table I issue an INSERT statement from
my original RESPONDENT_DIM table. Note that you can add additional
indexes on an Index-Organized Table and here I added an index on the
RESPONDENT_STATE column. Recall, I mentioned that in the later versions
Oracle will gather statistics when you create an index only if there is data in
the table. Since our Index-Organized Table contains no data and we insert
data afterwards, you should probably run DBMS_STATS to gather the stats
on the table. Let's see an example. Here I'm computing the average
OVERALL_RATING by RESPONDENT_STATE for the state of Connecticut. Take
note that I'm using RESPONDENT_DIM_IOT, our Index-Organized Table now.
As you see, the RESPONDENT_STATE index is being used. This query ran in
0.712 seconds, as compared to 1.19 seconds when using the
RESPONDENT_DIM table instead of the RESPONDENT_DIM_IOT Index-Organized Table. As a comparison, a full table scan between the
CANDYBAR_FACT and RESPONDENT_DIM tables runs in a little over 3
seconds on my machine. As you can see, there is some runtime savings
here. Let's talk about the OVERFLOW keyword for Index-Organized Tables.
Recall I mentioned towards the beginning of the course that thinner tables
are better, since the row length is smaller and the disk's read/write head
has less to scan over, enabling the database to move from row to row much
faster. When creating an Index-Organized Table you can specify the syntax
INCLUDING column_name OVERFLOW. The column named by column_name,
as well as all the columns defined before it, are kept in the Index-Organized Table itself. The remaining columns following column_name are
moved to an overflow area. Now, the Index-Organized Table can be much
thinner than the full table, allowing for faster queries and we'll show an
example of this in just one moment. Finally, there is no difference in your
SQL queries, just in the CREATE TABLE statement, as we'll see next. Here is
our CREATE TABLE statement specifying that the table is an Index-Organized
Table. Take note that I have rearranged the columns in this table so that I
have the primary key RESPONDENT_ID first, the RESPONDENT_STATE
second, followed by the remaining columns. At the bottom of the CREATE
TABLE statement you'll also see that I've included the keywords INCLUDING
RESPONDENT_STATE OVERFLOW. This tells Oracle that any columns, up to
and including RESPONDENT_STATE, are to be in the Index-Organized Table
proper, but the remaining columns, RESPONDENT_NAME,
RESPONDENT_ADDR, and so on, are to be placed in an Oracle-generated
overflow area. What this means is that the RESPONDENT_DIM_IOT table
contains only two columns and is very thin. The remaining columns are still
accessible even though they have been placed in the overflow area. You
don't actually see the overflow area, by the way, but it's there and access to
the remaining columns is transparent to you. Now, when running the code
on the previous slide our runtime is now 0.443 seconds, as compared to
0.172 seconds.
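Here is a sketch of the two Index-Organized Table definitions described above, plus the load and the statistics call. The column list and data types are a hypothetical subset of RESPONDENT_DIM (the real table has more columns). The constraint and secondary index names are made up for illustration.

```sql
-- Variant 1: a basic Index-Organized Table; a primary key is required.
-- Columns and types are an assumed subset of RESPONDENT_DIM.
CREATE TABLE respondent_dim_iot (
  respondent_id    NUMBER        NOT NULL,
  respondent_name  VARCHAR2(100),
  respondent_addr  VARCHAR2(200),
  respondent_state VARCHAR2(30),
  CONSTRAINT pk_respdim_iot PRIMARY KEY (respondent_id)
)
ORGANIZATION INDEX;

-- Load it from the original dimension table
INSERT INTO respondent_dim_iot
       (respondent_id, respondent_name, respondent_addr, respondent_state)
SELECT respondent_id, respondent_name, respondent_addr, respondent_state
  FROM respondent_dim;
COMMIT;

-- Secondary indexes are allowed on an Index-Organized Table
CREATE INDEX ix_respdimiot_respstate ON respondent_dim_iot (respondent_state);

-- The data arrived after the table and index were created, so gather
-- statistics now (cascade => TRUE includes the table's indexes)
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,
                                tabname => 'RESPONDENT_DIM_IOT',
                                cascade => TRUE);
END;
/

-- Variant 2: RESPONDENT_ID and RESPONDENT_STATE stay in the index
-- structure; every column defined after RESPONDENT_STATE goes to the
-- overflow segment. Queries against the table do not change.
DROP TABLE respondent_dim_iot PURGE;
CREATE TABLE respondent_dim_iot (
  respondent_id    NUMBER        NOT NULL,
  respondent_state VARCHAR2(30),
  respondent_name  VARCHAR2(100),
  respondent_addr  VARCHAR2(200),
  CONSTRAINT pk_respdim_iot PRIMARY KEY (respondent_id)
)
ORGANIZATION INDEX
INCLUDING respondent_state
OVERFLOW;
-- Then reload, recreate the secondary index, and gather statistics as above
```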

Summary

In summary, what did we just learn? In this module we learned about
Function-Based Indexes and how they can be built on a function applied to a
column or on an expression involving one or more columns. Using a Function-Based Index
allows Oracle to consider using the index rather than just dismissing it
wholesale. We also learned about Bitmap Join Indexes, which allow us to
pre-join two or more tables together, allowing for faster joins at runtime, but
remember that you still need to choose one or more columns to index.
Recall that we indexed the RESPONDENT_STATE column. Finally, we
discussed Index-Organized Tables. Remember, an Index-Organized Table is
the index and the index is the table. It sounds very poetic. We also briefly discussed
the OVERFLOW keyword, which allows you to separate out the less-used columns
into an overflow area, allowing for faster runtimes because Oracle has to
scan fewer columns. In the next module we present an overview of
partitions.

Overview of Partitions

Introduction

Hello and welcome back to the Pluralsight course, Optimizing SQL Queries in
Oracle. My name is Scott Hecht and in this module I'd like to present an
overview of Oracle partitions, so let's get started.

Module Contents

First though, let's go over the module contents. We start off talking about
what a partition is, why you would use one, and where partitions are stored. We then talk
about the varieties of partitions available: simple partitions, such as
the range and list partitions, and composite partitions, such as
the range partition with a list sub-partition, and so on. Another partitioning option
available is hash partitioning, but we don't go into it in this lecture, so please see the
Oracle manuals for more on this option. We talk briefly about
Partition-Wise Joins, which can make joining tables with or without a similar
partitioning scheme much faster. Although covered in more detail in the last
module of the course, we talk briefly about the interaction between indexes
and partitions. We show a simple example and we end with a summary.

What is a Partition?

What is a partition? On this slide let's go over some basic facts about
partitions. Recall that I mentioned smaller tables usually result in faster SQL
execution times. Smaller can refer to the number of columns, the
number of rows, or both. When dealing with partitions it's the large number
of rows we're more concerned with. Given a very large table that defies fast
SQL execution times, one idea you may have is to slice and dice the large
table into several smaller tables. For example, break apart your large table
into one table for each SURVEY_YEAR say. You may have several tables, but
they will be much smaller. The problem with this is that when you write your
SQL code you'll have to UNION several SQL queries together, one for each
year you're trying to pull back. As you can probably imagine, this can be a
management nightmare. Now, partitioning in Oracle is similar, in concept,
to what I just described, except that the act of breaking the table apart is
handled automatically by Oracle. Also, when writing a SQL query pulling by,
say again, SURVEY_YEAR, Oracle handles gathering data from only those
partitions you are requesting, effectively ignoring the unneeded partitions.
When Oracle selects certain partitions based on your SQL query it's called
Partition Pruning. Now, you tell Oracle to partition your table by specifying
an appropriate column or columns, called the partitioning key. In the
example below we'll use the SURVEY_YEAR to partition the table into yearly
partitions. Then, when you load your data, Oracle will break apart the data
into yearly partitions based on the SURVEY_YEAR. One partition for 2004,
one for 2005, and so on until the last partition for 2013. Again, everything is
handled by Oracle behind the scenes, from breaking up the data to pulling
the correct data for a SQL query, all you have to do is tell Oracle how to
partition the data. Now, recall that I talked briefly about tablespaces and
how they are a logical concept associated with the underlying physical
database files used to store tables, indexes, and other fun things. Partitions
can be placed in separate tablespaces themselves allowing the database
administrator to place SURVEY_YEAR 2013 say, in one tablespace located on
one set of disks, SURVEY_YEAR 2012 in another tablespace located on a
completely different set of disks, and so on. This spreads the workload
across many disks allowing for possible faster SQL query execution. Even if
you don't have separate disks for each partition, your SQL code will benefit
from partitioning. Now, you're probably thinking that managing partitions
can be a nightmare. Every time there's a new year you have to add another
partition. However, Oracle allows you to create partitions upfront, even for
data that you don't have yet. Those partitions will be initially empty, of
course, until you have new data and insert it into the partitioned table.
Partitions can also have sub-partitions, effectively, partitions of partitions.
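The points above about creating partitions ahead of time and placing them in separate tablespaces can be illustrated with a short sketch. The table, its columns, and the tablespaces TS_2012 to TS_2015 are hypothetical (the tablespaces must already exist), and only a few years are shown.

```sql
-- Yearly list partitions, each placed in its own tablespace
CREATE TABLE candy_survey_by_year (
  respondent_id  NUMBER,
  candybar_id    NUMBER,
  survey_year    NUMBER(4),
  overall_rating NUMBER
)
PARTITION BY LIST (survey_year) (
  PARTITION p_2012 VALUES (2012) TABLESPACE ts_2012,
  PARTITION p_2013 VALUES (2013) TABLESPACE ts_2013,
  -- Created upfront; stays empty until 2014 data is inserted
  PARTITION p_2014 VALUES (2014) TABLESPACE ts_2014
);

-- A partition for a later year can also be added when it is needed
ALTER TABLE candy_survey_by_year
  ADD PARTITION p_2015 VALUES (2015) TABLESPACE ts_2015;
```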

Varieties of Partitions

Let's talk about the varieties of partitions available to you. In general, there
are two big groups of partitions, simple and composite. Simple partitions
include the range, list, and hash partitions. Range partitioning partitions the
table by range of values. For example, the column SURVEY_DATE would be a
good candidate for a partitioning key. List partitioning partitions the table by
a specific list of values, as opposed to a range of values for range
partitioning. For example, the column SURVEY_YEAR would be a good
candidate for a partitioning key since it takes on the distinct values 2004,
2005, and so on. The previous two partitioning methods rely on partitioning
keys used in a WHERE clause to make use of Partition Pruning, but some
large tables just don't have appropriate candidates for a partitioning key.
These tables may contain a huge amount of additional information removed
from the main fact table because these columns are accessed infrequently,
but you can always access these columns by joining the two tables together
by an appropriate column or columns, such as say, RESPONDENT_ID. Hash
partitioning spreads the table's rows across a selected number of partitions
based on a hash of the partitioning key column, or columns. Now, as
mentioned earlier, we won't go into detail about hash partitioning in this
course, so please see the Oracle manuals for more information. Composite
partitions are partitions of partitions. For example, Range-List partitioning
will partition first by range partitioning and then sub-partition by list
partitioning within each range partition. An example of this may be
SURVEY_DATE, the range partition, and RESPONDENT_GENDER, the list sub-partition. As another example, List-List composite partitioning will partition
by list partitioning using one partitioning key, for example,
LIKELIHOOD_PURCHASE, and then sub-partition by list partition again, using
a different partitioning key, for example, RESPONDENT_GENDER, and of
course, there are many more combinations. I won't list them all here, they
can be found in the Oracle SQL reference manual. Recall that I mentioned
there are two types of partitioning schemes, simple and composite. On this
slide let's concentrate on the range partitioning using the SURVEY_DATE to
break the table into partitions. Here we have a very large table with a
bazillion rows of data. Our SQL queries are running like snails and everyone
is getting up in arms. Since most of the queries that SQL developers are
submitting to the database involve subsetting by range of SURVEY_DATEs,
we decide to partition our very large table by range partitioning with
SURVEY_DATE as the partitioning key. Here is one partition, which contains
just January 2004's data, nothing else, and here is February 2004's data,
March 2004's data, and so on until we get to the last partition of December
2013. When a SQL developer issues this query, subsetting by SURVEY_DATE,
only the relevant partitions are involved and the remaining partitions are
ignored, so in this query, January 1, 2004 to March 31, 2004 means that the
first three partitions are touched by the database and the rest are ignored.
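The monthly range-partitioning scheme just described might be declared as in the sketch below. The table name and columns are hypothetical. Only the first three months and the last month are written out; a real definition would list every month in between.

```sql
CREATE TABLE candy_survey_range (
  respondent_id     NUMBER,
  candybar_id       NUMBER,
  survey_date       DATE,
  respondent_gender VARCHAR2(1)
)
PARTITION BY RANGE (survey_date) (
  -- Upper bounds are exclusive: P_2004_01 holds dates before 1 Feb 2004
  PARTITION p_2004_01 VALUES LESS THAN (DATE '2004-02-01'),
  PARTITION p_2004_02 VALUES LESS THAN (DATE '2004-03-01'),
  PARTITION p_2004_03 VALUES LESS THAN (DATE '2004-04-01'),
  -- The remaining monthly partitions go here; as written, P_2013_12
  -- would hold everything from April 2004 through December 2013
  PARTITION p_2013_12 VALUES LESS THAN (DATE '2014-01-01')
);

-- Only the three partitions covering January to March 2004 are read
SELECT COUNT(*)
  FROM candy_survey_range
 WHERE survey_date >= DATE '2004-01-01'
   AND survey_date <  DATE '2004-04-01';
```

The half-open date range (>= start, < next day) is used instead of BETWEEN so that rows with a time of day on March 31 are not missed.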
Now, despite what you see in the graphic, when you partition the large table
you are not making copies of it. You define the partitioning scheme when
you use the CREATE TABLE syntax upfront and then you load the data into
the table. All partitioning is handled automatically by Oracle using the
partitioning key. Recall that I mentioned there are two types of partitioning
schemes again, simple and composite. On this slide let's concentrate on the
composite Range-List partitioning scheme using the SURVEY_DATE to break
the table into partitions and using the RESPONDENT_GENDER to further
break these partitions into sub-partitions. Here again, we have a very large
table with a bazillion rows of data. Now, most of the queries that SQL
developers are submitting involve subsetting by SURVEY_DATE, as well as
RESPONDENT_GENDER, so we decide to partition our very large table by
month range of SURVEY_DATE, as we did on the last slide, as well as the
RESPONDENT_GENDER. Here is one partition, which contains just January
2004's data, nothing else, and here is where our sub-partitions come in. The
partition, January 2004, is further partitioned into males and females, and
here is February 2004's data, March 2004's data, and so on until we get to
the last partition of December 2013. Now, when the developer issues this
query, subsetting by SURVEY_DATE and RESPONDENT_GENDER, Oracle will
only scan the appropriate partitions and sub-partitions and skip the rest.
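Here is a sketch of a Range-List version of the same idea, sub-partitioning each month by RESPONDENT_GENDER. The table and columns are hypothetical, 'M' and 'F' are assumed gender codes, and the DEFAULT sub-partition is an addition that catches any other value.

```sql
CREATE TABLE candy_survey_rl (
  respondent_id     NUMBER,
  survey_date       DATE,
  respondent_gender VARCHAR2(1)
)
PARTITION BY RANGE (survey_date)
SUBPARTITION BY LIST (respondent_gender)
-- Every monthly partition gets these sub-partitions
SUBPARTITION TEMPLATE (
  SUBPARTITION sp_m     VALUES ('M'),
  SUBPARTITION sp_f     VALUES ('F'),
  SUBPARTITION sp_other VALUES (DEFAULT)
)
(
  -- Only three months shown; add further months for later dates
  PARTITION p_2004_01 VALUES LESS THAN (DATE '2004-02-01'),
  PARTITION p_2004_02 VALUES LESS THAN (DATE '2004-03-01'),
  PARTITION p_2004_03 VALUES LESS THAN (DATE '2004-04-01')
);

-- Can be satisfied from the January 2004 / 'F' sub-partition alone
SELECT COUNT(*)
  FROM candy_survey_rl
 WHERE survey_date >= DATE '2004-01-01'
   AND survey_date <  DATE '2004-02-01'
   AND respondent_gender = 'F';
```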

Partition-Wise Joins

Let's talk about Partition-Wise Joins. As you know, Oracle can run a query in
parallel, splitting the work of a single query across several processes that
run concurrently. Given two partitioned tables, say an orders table and an
inventories table being joined together, Oracle can break this join up into
several smaller joins based on the partitioning key, which can then run in
parallel. Now, if both tables are partitioned the same way on the join key, then
this type of join is called a full partition-wise join. If only one table is
partitioned on the join key, and the other is partitioned differently or not
partitioned at all, then Oracle redistributes the other table's rows on the fly
to match, and this type of join is called a partial partition-wise join. In either
case, Oracle may attempt to join these two tables together in parallel,
reducing the overall runtime of your SQL query.
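Here is a sketch of a full partition-wise join candidate. The two tables are hypothetical stand-ins for the orders and inventories example. They are list-partitioned identically on SURVEY_YEAR and joined on that key. Whether the optimizer actually chooses a partition-wise join depends on statistics and settings, so check the execution plan.

```sql
-- Two hypothetical tables with identical partitioning on SURVEY_YEAR
CREATE TABLE survey_responses_p (
  respondent_id  NUMBER,
  survey_year    NUMBER(4),
  overall_rating NUMBER
)
PARTITION BY LIST (survey_year) (
  PARTITION p_2012 VALUES (2012),
  PARTITION p_2013 VALUES (2013)
);

CREATE TABLE survey_comments_p (
  respondent_id NUMBER,
  survey_year   NUMBER(4),
  comment_text  VARCHAR2(400)
)
PARTITION BY LIST (survey_year) (
  PARTITION p_2012 VALUES (2012),
  PARTITION p_2013 VALUES (2013)
);

-- The join includes the partitioning key, so 2012 rows only need to
-- meet 2012 rows and 2013 rows only 2013 rows; each pair can be joined
-- independently, here with a parallel degree of 2 (an assumed value)
SELECT /*+ PARALLEL(r 2) PARALLEL(c 2) */
       r.survey_year, COUNT(*) AS matched_rows
  FROM survey_responses_p r
  JOIN survey_comments_p  c
    ON c.respondent_id = r.respondent_id
   AND c.survey_year   = r.survey_year
 GROUP BY r.survey_year;
```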

Partition/Index Interaction

Let's talk about indexes and partitions. Although we discuss them in a
separate module in the course, indexes and partitions do, in fact, interact
with each other. Recall that you can consider partitions as breaking up a
large table into individual smaller tables. Now, there are three ways indexes
can be applied to partitions. One way is to create an index within each
individual partition. These are called Local Partitioned Indexes. Recall I
mentioned at the beginning of this module that you could take a very large
table and break it apart into several smaller tables. Indexing each of
these smaller tables is similar to creating a Local Partitioned Index
on a partitioned table, that is, one index per partition. As another option,
you can create one index spanning all the partitions. This is called
a Global Non-Partitioned Index and is similar to creating a single index on
the very large table. Finally, you can create an index with its own
partitioning scheme, independent of the partitioning scheme used by the
underlying table; this is called a Global Partitioned Index.
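The three index options can be sketched against CANDYHIST_PART, the list-partitioned table built in this module's example. The index names are hypothetical. The sketch assumes CANDYHIST_PART carries CANDYBAR_ID, RESPONDENT_ID, and a DATE column SURVEY_DATE, and the 2009 boundary is arbitrary.

```sql
-- Local Partitioned Index: one index segment per table partition
CREATE INDEX ix_candyhistpart_candid_lcl
  ON candyhist_part (candybar_id)
  LOCAL;

-- Global Non-Partitioned Index: a single index spanning all partitions
CREATE INDEX ix_candyhistpart_respid_glb
  ON candyhist_part (respondent_id);

-- Global Partitioned Index: its own range scheme on SURVEY_DATE,
-- independent of the table's list partitioning on SURVEY_YEAR.
-- The highest partition of a global range-partitioned index must use MAXVALUE.
CREATE INDEX ix_candyhistpart_sdate_gp
  ON candyhist_part (survey_date)
  GLOBAL PARTITION BY RANGE (survey_date) (
    PARTITION ip_pre2009 VALUES LESS THAN (DATE '2009-01-01'),
    PARTITION ip_rest    VALUES LESS THAN (MAXVALUE)
  );
```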

Examples

Let's do an example of partitions, specifically list partitioning. Let's create a
new table that will be filled in with the data from the
CANDYBAR_HISTORICAL_DATA table and let's List Partition our new table by
SURVEY_YEAR. Currently, we have 10 SURVEY_YEARs ranging from 2004 to
2013. In the example below we create one partition for each SURVEY_YEAR,
as well as an additional partition to hold data for the SURVEY_YEAR 2014,
even though we don't have that data yet. Here is the code. It's the standard
CREATE TABLE that contains all the columns and data types to mimic the
CANDYBAR_HISTORICAL_DATA table, but here we call it CANDYHIST_PART.
After the closing right paren of the column definitions we provide the
keywords PARTITION BY LIST to indicate that we are partitioning the table
using list partitioning. Next, in parentheses we provide the name of the
column that serves as our partitioning key, SURVEY_YEAR, in this case. Now,
this column should be used in your SQL queries' WHERE clauses more often
than not, which is one of the reasons for partitioning by it. Next, we follow
up with a left paren and then define each partition, giving it a unique
name. For example, we start off with the keyword PARTITION, followed by
the partition name, P_2004. Next, we tell Oracle the value or values that
make up each partition by entering the VALUES keyword along with one or
more comma-delimited values for SURVEY_YEAR in parentheses. In this case, we're entering
a single value, 2004, and so on for each year. Now, although I don't show it, I
follow up with an INSERT INTO statement inserting the data from the original
CANDYBAR_HISTORICAL_DATA table into the partitioned table,
CANDYHIST_PART. Let's conclude our simple example. Now, I'll show some
test SQL in just a minute, but I want to remind you before I do that there are
no indexes on either CANDYBAR_HISTORICAL_DATA or CANDYHIST_PART. Any
indexes that were there were removed for this test. Now, here's a simple
SQL query, but note that I'm filtering with WHERE SURVEY_YEAR = 2004. Since
this SQL query is accessing the CANDYBAR_HISTORICAL_DATA table, Oracle
will have to do a full table scan to find all the 2004 data. Note that this
query runs in just under 10 seconds on my machine. Next, the same exact
query, but accessing CANDYHIST_PART, our partitioned table, instead. Note
that this query, in comparison, runs in under 2 seconds on my machine.
Since there are no indexes, the difference is due to Oracle skipping over
all the partitions except the one associated with 2004, and note that we
didn't have to create individual tables ourselves. Oracle does all of this
under the hood using partitions. Note that when I added a Bitmap Index on
the SURVEY_YEAR column of the CANDYBAR_HISTORICAL_DATA table and ran the
first query again, its performance still did not reach that of the query
on the partitioned table. Finally, be aware that you can combine partitioning
with Third Normal Form, as well as with indexes on your partitions, in order
to speed up your queries even further.
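
The slide code for this overview example is not reproduced in the text, so here is a minimal sketch of it. The column list (SURVEY_YEAR, SURVEY_DATE, RESPONDENT_GENDER, LIKELIHOOD_PURCHASE) and the data types are hypothetical stand-ins for the full CANDYBAR_HISTORICAL_DATA layout. The COUNT(*) queries are likewise stand-ins for the course's own test query.

```sql
-- Hypothetical, abbreviated column list; the real table has more columns.
CREATE TABLE candyhist_part (
  survey_year          NUMBER(4),
  survey_date          DATE,
  respondent_gender    VARCHAR2(1),
  likelihood_purchase  NUMBER(2)
)
PARTITION BY LIST (survey_year) (
  PARTITION p_2004 VALUES (2004),
  PARTITION p_2005 VALUES (2005),
  PARTITION p_2006 VALUES (2006),
  PARTITION p_2007 VALUES (2007),
  PARTITION p_2008 VALUES (2008),
  PARTITION p_2009 VALUES (2009),
  PARTITION p_2010 VALUES (2010),
  PARTITION p_2011 VALUES (2011),
  PARTITION p_2012 VALUES (2012),
  PARTITION p_2013 VALUES (2013),
  PARTITION p_2014 VALUES (2014)   -- stays empty until 2014 data arrives
);

-- Copy the existing rows; Oracle routes each row to the partition whose
-- VALUES list contains its SURVEY_YEAR.
INSERT INTO candyhist_part
  (survey_year, survey_date, respondent_gender, likelihood_purchase)
SELECT survey_year, survey_date, respondent_gender, likelihood_purchase
FROM   candybar_historical_data;
COMMIT;

-- Same query against both tables: with no indexes, the first scans the whole
-- table, while the second only needs to read partition P_2004.
SELECT COUNT(*) FROM candybar_historical_data WHERE survey_year = 2004;
SELECT COUNT(*) FROM candyhist_part           WHERE survey_year = 2004;
```

A row whose SURVEY_YEAR matches no partition makes the INSERT fail with ORA-14400. This is one motivation for the DEFAULT partition discussed in the next module.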

Summary

In summary, what did we just learn? We learned that partitioning a table is
very similar to breaking a very large table up into separate, smaller tables,
except that Oracle handles all of this under the hood. We learned about
simple partitions, such as the range and list partitioning schemes, and we
also mentioned the hash partitioning scheme. We learned that you can
combine two partitioning schemes to get a composite partitioning
scheme, such as a range partition with a list sub-partition. We also learned
that Oracle can execute a join between two partitioned tables partition by
partition, and in parallel, which is called a Partition-Wise Join, resulting in
faster SQL query runtimes. We talked briefly
about the interaction between indexes and partitions and we'll learn more
about this in the last module in the course, but in the next module we'll look
into simple partitions in much more detail.

Simple Partitions

Introduction

Hello and welcome back to the Pluralsight course, Optimizing SQL Queries in
Oracle. My name is Scott Hecht and in this module I'd like to take an in-depth look at list and range partitions, so let's get started.

Module Contents

First though, let's go over the module contents. We start off discussing the
List Partition. We describe what a list partition is, outline how to create list
partitions, and then show you how to add a partition to a preexisting list-partitioned table, as
well as how to delete one or more partitions from it. We'll then move on to the
range partition, describe what it is, how to create it, and how to add to and
delete from it, and we end with a summary.

What is a List Partition?

Let's talk about list partitioning. What is list partitioning? List partitioning
allows you to create partitions based on the discrete values of a single
column. Unlike range partitioning, list partitioning only allows a single
column to serve as the partition key (Oracle 12c Release 2 later added
multi-column list partitioning). For example, you can partition by, say,
the LIKELIHOOD_PURCHASE column, which takes on 10 distinct values. You
can also list partition by, say, SURVEY_YEAR, since that column takes only
the discrete values 2004 through 2013. Recall that our example in the overview
used the SURVEY_YEAR column to list partition a copy of the
CANDYBAR_HISTORICAL_DATA table. Note that a single partition can also
hold more than one list value. In our overview example we
created one partition for each individual SURVEY_YEAR, but we could have
created a list partition with two, three or more years, and we see an
example of this later on in the module. You can also create additional
partitions based on a discrete value that doesn't currently exist. For
example, we can set up partitions for SURVEY_YEAR 2014, 2015, 2016, and
so on even though they don't yet exist in our table. Once you load the new
year's data, it'll be placed in the correct partition without you having to tell
Oracle what to do. When using list partitioning you can use the
DEFAULT keyword instead of a specific value or list of values. This tells
Oracle that this particular partition is to hold all of the data that is not
covered by the other partitions. For example, if we have a partition defined
with the DEFAULT keyword instead of the list of values 2014, 2015, and 2016,
then when we load a new year's data into the table it will automatically be
placed in this DEFAULT partition. Recall that we've talked about tablespaces
several times in this course. When using partitions you can place each
partition in its own tablespace, if need be. Don't forget that one of the goals
of partitioning is to speed up your queries. This means that you should
choose as the partitioning key a column that appears in the majority of your
WHERE clauses. When this column is used in your WHERE clauses, Oracle will
perform Partition Pruning based on its values.
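
One way to confirm that pruning happens is to look at the execution plan. Here is a sketch, assuming CANDYHIST_PART is list partitioned on SURVEY_YEAR as in the overview example. It uses Oracle's standard EXPLAIN PLAN statement and the DBMS_XPLAN.DISPLAY function.

```sql
-- Filter on the partitioning key, then explain the query.
EXPLAIN PLAN FOR
  SELECT COUNT(*)
  FROM   candyhist_part
  WHERE  survey_year = 2004;

-- For a partitioned table the plan output includes Pstart and Pstop columns.
-- When both point at a single partition, every other partition was pruned.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- For contrast: a filter on a non-key column typically shows a
-- PARTITION LIST ALL step, meaning every partition is read.
EXPLAIN PLAN FOR
  SELECT COUNT(*)
  FROM   candyhist_part
  WHERE  likelihood_purchase = 5;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```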

Creating a List Partition

Let's take a look at the Oracle syntax chart for creating list partitions. Within
the CREATE TABLE statement, after the last column has been defined and
after the ending right paren, you type, PARTITION BY LIST, followed by a
single column in parentheses. You then follow up with a left paren and one
or more partition clauses used to give a name to each partition. Next,
you define the values associated with each partition, shown as the
list_values_clause in the syntax chart. The list_values_clause is defined
as follows: you enter the keyword VALUES and a left paren, then one or more
comma-delimited values of the partitioning key (or the keyword DEFAULT),
and you end with a right paren. A final right paren closes the list of
partitions.

Examples of List Partitions

Let's see some examples. First, let's see a basic list partition example using
the SURVEY_YEAR as the partitioning key. You start by defining your table,
here, CANDYHIST_PART, along with your columns. After the last column is
defined, and after the closing right paren, you code PARTITION BY LIST
(SURVEY_YEAR), and in parentheses you specify one or more PARTITION
clauses. Here I have one PARTITION for each SURVEY_YEAR. After the
PARTITION clause I specify the name of the PARTITION, for example, P_2004,
P_2005, and so on. Take note that even though 2014 does not appear in our
test data, I can define a partition for it upfront. This partition will remain
empty until SURVEY_YEAR 2014 data is inserted into the table. Now, let's
see an example in which the partitions are defined with more than one
value. Also, let's see the DEFAULT keyword in action. As you can see, we
again follow our CREATE TABLE statement with a PARTITION BY LIST clause
and specify the column SURVEY_YEAR in parentheses. We then specify each
PARTITION clause, followed by the name of the partition, as well as the
VALUES keyword. In parentheses we specify the values 2004, 2005, and 2006 for
the first partition, 2007, 2008, and 2009 for the second partition, and the
keyword DEFAULT for the last partition. Note that this last partition, called
P_DEF, will contain the data for all years not specified in the
other partitions. Now, let's see the previous example with each
partition placed in a different tablespace. As you can see, it's the same
syntax as before with the addition of the TABLESPACE keyword, followed by
the name of the tablespace, after each partition's VALUES clause.
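
A sketch of the multi-value, DEFAULT and TABLESPACE version just described follows. The partition names P_456, P_789 and P_DEF match the names this module uses later. The column list and the tablespace names TS_OLD, TS_MID and TS_DEF are hypothetical, and those tablespaces would need to exist already.

```sql
-- Hypothetical columns; TS_OLD, TS_MID and TS_DEF must already exist and
-- your user needs quota on them.
CREATE TABLE candyhist_part (
  survey_year          NUMBER(4),
  survey_date          DATE,
  respondent_gender    VARCHAR2(1),
  likelihood_purchase  NUMBER(2)
)
PARTITION BY LIST (survey_year) (
  PARTITION p_456 VALUES (2004, 2005, 2006) TABLESPACE ts_old,
  PARTITION p_789 VALUES (2007, 2008, 2009) TABLESPACE ts_mid,
  -- Catchall: every SURVEY_YEAR not listed above, including future years
  PARTITION p_def VALUES (DEFAULT)          TABLESPACE ts_def
);
```

Removing the three TABLESPACE clauses gives the plain multi-value and DEFAULT version, with every partition stored in your default tablespace.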

Adding to a List Partition

Let's talk about how to add an additional List Partition to a table that is
already List Partitioned. That is, you've already submitted your CREATE
TABLE code with a PARTITION BY LIST clause, you've loaded the table
with data, and now you want to add an additional partition. No problem.
As I've mentioned before, you can create partitions in your CREATE TABLE
statement for data that does not yet exist. Recall from our examples that we
added an additional partition for 2014, even though that data does not yet
exist. Now though, let's see how we can tell which partitions exist on a
partitioned table. There are two nice dictionary views you can use. The first is
ALL_TAB_PARTITIONS, which shows the partitions of all the tables you have access
to, and the second is USER_TAB_PARTITIONS, which focuses only on your own tables and
their partitions. For example, let's determine what partitions are on the table
CANDYHIST_PART. As you can see, we SELECT the TABLE_NAME, the
PARTITION_NAME, the HIGH_VALUE column, as well as the
PARTITION_POSITION. The PARTITION_NAME is the name you gave the
partition. For list partitioning, the HIGH_VALUE column displays the values
used in the VALUES clause when you defined the partitions, and
PARTITION_POSITION is an integer representing the order of the partitions in
the CREATE TABLE statement. I always sort by this last column and, as you
can see, CANDYHIST_PART is partitioned into 10 pieces ranging from
SURVEY_YEAR 2004 on forward. Let's continue our talk on how to add
partitions to a partitioned table. To add a partition to a partitioned table, use
the ALTER TABLE statement along with its ADD PARTITION clause. Let's add a
partition to hold the 2014 data using ALTER TABLE. Note that this assumes
that we don't already have a partition for 2014, and here is the code to do
that. You start with ALTER TABLE followed by the name of the table you want
to add a partition to, you follow up with the keywords ADD PARTITION,
followed by the name of the new partition, P_2014 in this case, then you
follow up with the VALUES clause and in parens you place the value or
values that will make up the new partition. In this case, I specify only 2014.
Note that this code assumes that you did not use the DEFAULT keyword
when you first defined the partitioned table. We talk about that on the next
slide. If we now look at the USER_TAB_PARTITIONS dictionary view, you'll see
that P_2014 has indeed been added as one of the partitions of the table.
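
Here is a hedged sketch of the two statements just described. It assumes CANDYHIST_PART is list partitioned by SURVEY_YEAR with partitions P_2004 through P_2013 only and no DEFAULT partition. The dictionary views store unquoted table names in upper case, which is why the filter compares against 'CANDYHIST_PART'.

```sql
-- List the table's partitions in position order
SELECT table_name, partition_name, high_value, partition_position
FROM   user_tab_partitions
WHERE  table_name = 'CANDYHIST_PART'
ORDER  BY partition_position;

-- Add an empty partition for SURVEY_YEAR 2014. This only works when the
-- table has no DEFAULT partition.
ALTER TABLE candyhist_part
  ADD PARTITION p_2014 VALUES (2014);

-- Re-running the dictionary query should now also list P_2014.
```
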
Let's conclude our talk on how to add partitions to a partitioned table. The
code to add a partition is a little different if you've used the DEFAULT
partition keyword. You still use ALTER TABLE, but in this case you must use
SPLIT PARTITION to split the DEFAULT partition into two pieces: the
desired new partition and the new DEFAULT partition. The reason is that
the DEFAULT partition could already contain data that belongs in the new
partition you are defining, say SURVEY_YEAR 2014's data. By splitting the
DEFAULT partition, your new partition will receive its data, if any, and the
remaining data, say SURVEY_YEAR 2015, will be placed in the new DEFAULT
partition. Start with ALTER TABLE followed by the name of the table. Follow
up with SPLIT PARTITION followed by the name of the DEFAULT partition,
P_DEF, in this case. Next, specify the VALUES keyword along with one or
more values you want to split off from this DEFAULT partition. Here, it's
2010, 2011, and 2012. Next, follow up with the INTO keyword and, in
parens, name the new partition, as well as the new
DEFAULT partition. Note that I just reused the same name, P_DEF, for the
DEFAULT partition, but I called my new partition P_012. Taking a look at
USER_TAB_PARTITIONS now, you'll see that we have a new partition called
P_012 in the third position, as well as the DEFAULT partition, P_DEF, in the
fourth. Take note of the values appearing in the HIGH_VALUE column for
P_012: 2010, 2011, and 2012.
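
Here is a minimal sketch of the split. It assumes CANDYHIST_PART currently has the partitions P_456 (2004 to 2006), P_789 (2007 to 2009) and the DEFAULT partition P_DEF from the earlier DEFAULT example. A plain ADD PARTITION would be rejected on such a table, which is why SPLIT PARTITION is used.

```sql
-- Rows for 2010-2012 move out of P_DEF into the new partition P_012.
-- The first partition named in INTO receives the listed values; the second
-- (reusing the name P_DEF) remains the DEFAULT partition.
ALTER TABLE candyhist_part
  SPLIT PARTITION p_def VALUES (2010, 2011, 2012)
  INTO (PARTITION p_012, PARTITION p_def);

-- Expect P_456, P_789, P_012, P_DEF in positions 1 to 4
SELECT partition_name, high_value, partition_position
FROM   user_tab_partitions
WHERE  table_name = 'CANDYHIST_PART'
ORDER  BY partition_position;
```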

Dropping a List Partition

On this slide let's see how we can drop a list partition from a table. You have
two choices here. The first drops the partition, as well as all the data in that
partition. This may not be what you want, so caution is advised. The code is
very simple. ALTER TABLE, followed by the name of the partition table, the
keywords DROP PARTITION, and finally, the name of the partition you want
dropped, and as you can see, P_DEF is no longer listed in
USER_TAB_PARTITIONS for our table. The second, less drastic, option is
to merge two partitions together. This has the effect of reducing the number
of partitions by one and merging the data from both partitions into one
partition. The syntax for that is ALTER TABLE, followed by the name of the
partitioned table, the keywords MERGE PARTITIONS, and a comma-delimited
list of the existing partitions you want merged, in this case P_456 and
P_789. Next, follow up with the keywords INTO PARTITION and the name of
the single, new merged partition. In this case, I call the new merged
partition P_456789. Finally, a quick look at USER_TAB_PARTITIONS and you
see that P_456 and P_789 are no longer there and they have been replaced
with the new partition, P_456789, containing the years 2004 to 2009, as
shown in the HIGH_VALUE column.
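
Here are both options as a sketch. They start from the layout described in the split example (P_456, P_789, P_012 and the DEFAULT partition P_DEF).

```sql
-- Option 1 (destructive): removes partition P_DEF and every row stored in it.
ALTER TABLE candyhist_part DROP PARTITION p_def;

-- Option 2 (keeps the data): combines two partitions into one new partition
-- whose value list is the union of both (2004 through 2009 here).
ALTER TABLE candyhist_part
  MERGE PARTITIONS p_456, p_789 INTO PARTITION p_456789;
```

Once the DEFAULT partition is gone, inserting a SURVEY_YEAR that appears in no remaining VALUES list fails with ORA-14400.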

What is a Range Partition?

Let's talk about range partitions. What is a range partition? Range partitioning
allows you to create partitions based on a range of values. A good
partitioning key would be the column SURVEY_DATE, since it contains a wide
range of values, as opposed to SURVEY_YEAR, which contains only a handful of
discrete values. For range partitioning you use the VALUES LESS THAN
clause instead of the VALUES clause used in list partitioning. According to
the Oracle syntax charts, range partitions can be defined using one or more
columns. Contrast this with list partitioning, which only allows you to use a
single column as the partitioning key. Recall that we used the DEFAULT
keyword in list partitioning to create a catchall partition. For range
partitioning you use the MAXVALUE keyword instead. We show an example
of this in just a moment. Just like for list partitioning, you can create
partitions for values that do not yet exist in the database. Again, just like for
list partitioning, each range partition can be placed in a separate
tablespace if need be. Don't forget that one of the goals of partitioning is to
speed up your queries. This means that you should choose as the partitioning
key a column or columns that appear in the majority of your WHERE
clauses. When this column or these columns are used in your WHERE clauses,
Oracle will perform Partition Pruning based on their values.

Creating a Range Partition

Now, let's take a look at the Oracle syntax chart for creating range
partitions. Within the CREATE TABLE statement, after the last column has
been defined, you type PARTITION BY RANGE, followed by one or more
comma delimited columns in parentheses to serve as the partitioning key.
Follow up with a left paren and then with one or more partition clauses used
to give a name to each partition. You then define the values based on the
range_values_clause. The range_values_clause is defined as follows: you
enter the keywords VALUES LESS THAN, followed by a left paren, then enter
one or more comma-delimited constant values, one for each column
you've chosen for your partitioning key. If SURVEY_DATE is your
partitioning key, you can enter either date literals or use the
TO_DATE function to specify a date constant. Next, end with a right
paren to close the VALUES LESS THAN clause. Continue to do this until the
last VALUES LESS THAN clause. Finally, close the PARTITION BY RANGE
clause with a final right paren followed by a semicolon. Now, be
careful here, because the value specified in the VALUES LESS THAN clause is
not included in the partition. For example, if I specify January 1, 2014, only
rows with values strictly less than January 1, 2014 are placed in the
partition; a row dated January 1, 2014 belongs to the next partition up. We
see a detailed example of this in just a moment.
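
To see the exclusive upper bound in action, here is a small, self-contained check. It uses a hypothetical two-partition table, DEMO_RANGE, rather than CANDYHIST_PART.

```sql
-- Hypothetical table used only to demonstrate the VALUES LESS THAN boundary
CREATE TABLE demo_range (survey_date DATE)
PARTITION BY RANGE (survey_date) (
  PARTITION p_2013 VALUES LESS THAN (DATE '2014-01-01'),
  PARTITION p_2014 VALUES LESS THAN (DATE '2015-01-01')
);

INSERT INTO demo_range VALUES (DATE '2013-12-31');  -- just below the bound
INSERT INTO demo_range VALUES (DATE '2014-01-01');  -- exactly on the bound
COMMIT;

-- A partition-extended table name reads a single partition.
-- P_2013 holds only 2013-12-31; the boundary date 2014-01-01 lands in P_2014.
SELECT survey_date FROM demo_range PARTITION (p_2013);
SELECT survey_date FROM demo_range PARTITION (p_2014);
```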

Examples of Range Partitions

Let's see some examples. First, let's see a basic range partition example.
You start by defining your table here, CANDYHIST_PART, along with your
columns. After the last column is defined you code PARTITION BY RANGE
(SURVEY_DATE) and in parentheses you specify one or more PARTITION
clauses. As you can see, I'm specifying one PARTITION clause for each year
associated with the SURVEY_DATE column. After the PARTITION clause I
specify the name of the partition, for example P_2004, P_2005, and so on.
Next, you follow up with VALUES LESS THAN and in parentheses you specify
the value below which rows will be placed in the partition. For example, for P_2004 I
am specifying DATE '2005-01-01', or January 1, 2005. This means that rows
containing a SURVEY_DATE less than January 1, 2005, that is, everything from
2004 and back, will be placed in partition P_2004. Now, for P_2005 I specify
DATE '2006-01-01', or January 1, 2006, indicating that rows containing a
SURVEY_DATE less than January 1, 2006 will be placed in the partition
named P_2005. Note that partition P_2005 will not contain the data for
P_2004, even though that partition's date range is also below January 1, 2006.
Oracle requires range partitions to be listed in ascending order of their
bounds, as shown, so each partition holds only the values from the previous
partition's bound up to, but not including, its own. Take note that even though
2014 does not appear in our test data, I
can define a partition for 2014. This partition will remain empty until data
from 2014 is inserted into the table. Next, let's see how you can create a
catchall partition when using range partitioning. In this code I specify an
additional partition called P_REST using the keyword MAXVALUE in the
VALUES LESS THAN clause. This indicates that any survey date from January
1, 2015 on forward, inclusive, will be placed in this catchall partition. Next,
let's see an example of how to specify a tablespace when using range
partitioning. In this code I follow the VALUES LESS THAN clause with the
keyword TABLESPACE along with the name of the tablespace I want this data
loaded into. Here I'm only specifying the TABLESPACE keyword for partition
P_2004, but you get the general idea.
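
A sketch combining the three range examples just described follows. It uses one partition per year, a MAXVALUE catchall, and a tablespace on P_2004 only. The column list and the tablespace name TS_OLD are hypothetical. P_2005 uses TO_DATE to show that it works as well as a DATE literal.

```sql
-- Hypothetical columns; TS_OLD must already exist.
CREATE TABLE candyhist_part (
  survey_year          NUMBER(4),
  survey_date          DATE,
  respondent_gender    VARCHAR2(1),
  likelihood_purchase  NUMBER(2)
)
PARTITION BY RANGE (survey_date) (
  -- Everything before 2005-01-01; stored in its own tablespace
  PARTITION p_2004 VALUES LESS THAN (DATE '2005-01-01') TABLESPACE ts_old,
  -- TO_DATE is an alternative to the DATE literal
  PARTITION p_2005 VALUES LESS THAN (TO_DATE('2006-01-01', 'YYYY-MM-DD')),
  PARTITION p_2006 VALUES LESS THAN (DATE '2007-01-01'),
  PARTITION p_2007 VALUES LESS THAN (DATE '2008-01-01'),
  PARTITION p_2008 VALUES LESS THAN (DATE '2009-01-01'),
  PARTITION p_2009 VALUES LESS THAN (DATE '2010-01-01'),
  PARTITION p_2010 VALUES LESS THAN (DATE '2011-01-01'),
  PARTITION p_2011 VALUES LESS THAN (DATE '2012-01-01'),
  PARTITION p_2012 VALUES LESS THAN (DATE '2013-01-01'),
  PARTITION p_2013 VALUES LESS THAN (DATE '2014-01-01'),
  PARTITION p_2014 VALUES LESS THAN (DATE '2015-01-01'),  -- empty for now
  -- Catchall for any SURVEY_DATE on or after 2015-01-01
  PARTITION p_rest VALUES LESS THAN (MAXVALUE)
);
```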

Adding to a Range Partition

Now, let's talk about how to add an additional partition to an already
partitioned table. You've already submitted your CREATE TABLE code with a
PARTITION BY RANGE clause, you've loaded the table with data, and now
you want to add an additional partition. No worries. Just as a reminder, don't
forget that you can specify additional partitions in your CREATE TABLE
statement upfront, even if the data doesn't exist yet. Those partitions will
remain empty, of course, until you insert the appropriate data into the
table. Let's see the output of the USER_TAB_PARTITIONS view when the table
is range partitioned. Here is the same code we saw in the list partitioning
section and, as you can see, CANDYHIST_PART is partitioned into 11 pieces
based on the SURVEY_DATE column. Note that the HIGH_VALUE column now
contains a TO_DATE expression equivalent to the date literal we used for each
partition. Oracle does this automatically, so you don't have to. The last
partition, P_REST, is just specified as MAXVALUE, indicating that this is the
catchall partition for SURVEY_DATEs from January 1, 2015 on forward. Now,
let's see how to add an additional partition to an already partitioned table.
Similar to adding a partition when a table is list partitioned, you can use the
ALTER TABLE ADD PARTITION clause. Now, assuming that we do not have a
MAXVALUE catchall partition, let's add a range partition for the year 2014.
Start with ALTER TABLE followed by the name of the range-partitioned table.
Next, follow up with the keywords ADD PARTITION followed by the name of
the partition, P_2014, in this case. Next, follow up with the VALUES LESS
THAN clause and in parentheses specify the value below which rows should
appear in this new partition. In this case, I'm specifying January 1, 2015,
indicating that any rows of data with a SURVEY_DATE prior to this will be
placed in this new partition. Note that data already associated with a
previously defined partition will not be included in this new partition. Now,
taking a look at USER_TAB_PARTITIONS, you can see that our new
partition, P_2014, has been added. Let's complete our talk on how to add an
additional partition to a partitioned table. If you used the MAXVALUE
keyword, adding a new partition is slightly different than shown in the
previous slide. This is similar to list partitioning when the DEFAULT keyword
is used. In this case, you use the SPLIT PARTITION clause of ALTER TABLE
together with the AT keyword. For
example, start with ALTER TABLE followed by the name of the table to split.
Next, follow with SPLIT PARTITION and the name of an existing partition that
you want to split up. In this case, I am splitting the catchall partition,
P_REST, into two pieces. Now, follow up with the AT keyword and, in parens,
the value below which you want the split to occur. In this case, I am
specifying AT (DATE '2015-01-01'), because I want my new partition to
contain all values below January 1, 2015 that may already exist in the
partition P_REST. Next, follow up with the keyword INTO and, in parens,
specify the name of the new partition, P_2014 here, and the name of the
new catchall partition. In this case, I'm just using P_REST again. Looking at
the output of USER_TAB_PARTITIONS you'll see that P_2014 is defined
with a HIGH_VALUE of January 1, 2015 and P_REST, the catchall partition, is
set to MAXVALUE.
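
Here are the two alternatives as a sketch. Run only the one that matches your table. Case 1 assumes P_2013 (bound 2014-01-01) is the highest partition and there is no MAXVALUE partition. Case 2 assumes the table instead ends with the catchall P_REST, whose range starts at 2014-01-01. ADD PARTITION can only append above the current highest bound, and no bound is higher than MAXVALUE, which is why Case 2 needs SPLIT.

```sql
-- Case 1: no MAXVALUE partition; append a new highest partition.
ALTER TABLE candyhist_part
  ADD PARTITION p_2014 VALUES LESS THAN (DATE '2015-01-01');

-- Case 2: the table ends with P_REST (MAXVALUE). Split it: rows below the
-- AT value go to P_2014, the rest stay in the new catchall, still named P_REST.
ALTER TABLE candyhist_part
  SPLIT PARTITION p_rest AT (DATE '2015-01-01')
  INTO (PARTITION p_2014, PARTITION p_rest);
```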

Dropping a Range Partition

Now, let's see how we can drop a range partition from a table. Similar to list
partitioning, you have two choices here. The first drops the partition, as well
as all the data in that partition, and again, this may not be what you want,
so again, caution is advised. The code is very simple, ALTER TABLE followed
by the name of the partition table, the keywords, DROP PARTITION, and
finally, the name of the partition you want axed. As you can see, P_REST is
no longer listed in USER_TAB_PARTITIONS for our table. The second, less
drastic, option is to merge two partitions together. This has the effect of
reducing the number of partitions by one and merging the data from both
partitions into one partition. For range partitions, the two partitions being
merged must be adjacent. The syntax for this is ALTER TABLE followed by
the name of the partitioned table. Next, the keywords MERGE PARTITIONS
followed by a comma-delimited list of partitions, in this case P_2004 and
P_2005. Next, follow up with the keywords INTO PARTITION and the name of
the new merged partition; in this case, I called the new merged partition
P_OLD_DATA. Finally, a quick look at USER_TAB_PARTITIONS shows that
P_2004 and P_2005 have been merged and replaced by P_OLD_DATA, and
the HIGH_VALUE column reflects this.
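
Here are both range options as a sketch. They assume the range-partitioned CANDYHIST_PART with partitions P_2004 through P_2014 plus the catchall P_REST.

```sql
-- Option 1 (destructive): removes the catchall partition and its rows.
-- Afterwards, inserting a SURVEY_DATE at or above the highest remaining
-- bound fails with ORA-14400.
ALTER TABLE candyhist_part DROP PARTITION p_rest;

-- Option 2 (keeps the data): merge two adjacent range partitions. The result
-- takes the higher of the two bounds (2006-01-01 here).
ALTER TABLE candyhist_part
  MERGE PARTITIONS p_2004, p_2005 INTO PARTITION p_old_data;
```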

Summary

In summary, what did we just learn? In this module we learned about simple
partitioning using list and range partitioning. You use list partitioning when
your WHERE clauses frequently specify a column containing discrete values,
such as our SURVEY_YEAR column, and you use range partitioning when your
WHERE clauses frequently specify a column that contains many values,
such as our SURVEY_DATE column. We saw how to add additional partitions
to a list- or range-partitioned table. You use ALTER TABLE
ADD PARTITION when the DEFAULT and MAXVALUE keywords were not used.
When either DEFAULT or MAXVALUE was used, you use the ALTER
TABLE SPLIT PARTITION syntax instead. To drop partitions for list and range
partitioning, use the ALTER TABLE DROP PARTITION syntax if you want the
partition, as well as its data, to be dropped. Use the ALTER TABLE
MERGE PARTITIONS syntax to merge two partitions together without dropping
the data itself. In the next module we discuss composite partitions.

Composite Partitions

Introduction

Hello and welcome back to the Pluralsight course, Optimizing SQL Queries in
Oracle. My name is Scott Hecht and recall from the last module that we
talked about basic partitions and how they can speed up your queries using
Partition Pruning. In this module I'd like to talk about composite partitions or
partitions within partitions, so let's get started.

Module Contents

First though, let's go over the module contents. We'll explain what a
composite partition is. We'll list some of the varieties of composite
partitions. For example, the List/List composite partition, the Range/List
composite partition, and we'll list a few of the other composite partitions
available. In the previous module we went through, in detail, how to create,
add to, and drop partitions. Since you get the idea by now, in this module
we'll just show you two creation examples. The first is how to create a List/List
composite partition, that is, a list partition with a list subpartition, and the
second is how to work with a Range/List composite partition, that is, a range
partition with a list subpartition. In a third example we'll look at output
from AUTOTRACE when joining two likewise partitioned tables together,
and we end with a summary.

What is a Composite Partition?

Let's talk about composite partitions. What is a composite partition? A
composite partition is a partition that itself contains partitions. The partitions
contained within the partitions are also known as subpartitions. For
example, we can partition by SURVEY_YEAR, as we did in the previous
module, and then subpartition by RESPONDENT_GENDER, assuming
that these two columns occur in your SQL WHERE clauses much of the time.
We use these two columns in our example of composite List/List partitioning
below. As another example, we can partition by SURVEY_DATE and then
subpartition by the LIKELIHOOD_PURCHASE column, assuming these two
columns occur in your SQL WHERE Clauses much of the time. This is an
example of composite Range/List partitioning. As usual with partitions, you
can create subpartitions upfront that don't exist in the data yet. Oracle has a
nice feature called the Subpartition Template. This allows you to code your
subpartitions in one spot and Oracle will apply them within all partitions.
We'll see an example of this shortly. Now, you can override the Subpartition
Template feature for any partition you want. We see an example of this as
well. Note that you don't have to use the Subpartition Template at all, but it
makes coding the subpartitions much easier, especially if you want every
partition to have the same subpartitions. Although I've said it many times
already, you are partitioning and now subpartitioning your table, so that
your queries can hit the smallest amount of data possible. This is usually
based on your WHERE Clauses, so be sure that your partition columns, as
well as your subpartition columns appear in the vast majority of your
WHERE Clauses. If you partition by a column that isn't used in your WHERE
Clauses, Partition Pruning just won't occur.
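
Here is a hedged sketch of a Subpartition Template and a per-partition override. It uses a Range/List table partitioned by SURVEY_DATE and subpartitioned by LIKELIHOOD_PURCHASE. The table name CANDYHIST_RL, its two columns and the 10-point groupings (4 to 7 and 8 to 10) are hypothetical. The 2004 override follows this module's 7-point scenario (1 and 2, 3 and 4, 5 to 7). Override subpartitions get explicit names, since subpartition names must be unique across the whole table.

```sql
CREATE TABLE candyhist_rl (
  survey_date          DATE,
  likelihood_purchase  NUMBER(2)
)
PARTITION BY RANGE (survey_date)
SUBPARTITION BY LIST (likelihood_purchase)
-- Applied to every partition that does not list its own subpartitions
SUBPARTITION TEMPLATE (
  SUBPARTITION p_not_likely      VALUES (1, 2, 3),
  SUBPARTITION p_somewhat_likely VALUES (4, 5, 6, 7),
  SUBPARTITION p_very_likely     VALUES (8, 9, 10)
)
(
  -- Override: these explicit subpartitions replace the template for 2004
  PARTITION p_2004 VALUES LESS THAN (DATE '2005-01-01') (
    SUBPARTITION p_2004_not_likely      VALUES (1, 2),
    SUBPARTITION p_2004_somewhat_likely VALUES (3, 4),
    SUBPARTITION p_2004_very_likely     VALUES (5, 6, 7)
  ),
  -- These partitions receive the three template subpartitions
  PARTITION p_2005 VALUES LESS THAN (DATE '2006-01-01'),
  PARTITION p_2006 VALUES LESS THAN (DATE '2007-01-01')
);
```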

Varieties of Composite Partitions

Let's talk about the varieties of composite partitions available. Oracle's
composite partitioning allows for nearly any combination of partitioning
and subpartitioning schemes. For example, a composite
List/List partitioning scheme allows for a list partition with list subpartitions.
The composite Range/List partitioning scheme allows for a range partition
with list subpartitions. There are others, of course, but they depend on your
version of Oracle. For Oracle 10g release 1 and release 2 you could have a
range partition with either a list or hash subpartition. For Oracle 11g release
1 you could have a range or list partition with a range, list or hash
subpartition. For Oracle 11g release 2 on forward you can have a range, list
or hash partition with a range, list or hash subpartition. Again, as I've said
before, make sure that your choice of composite partitions reflects the vast
majority of WHERE Clauses specified in your queries in order for Partition
Pruning to be successful.

Example #1 - List/List

Let's see an example of composite List/List partitioning. In this example we
will list partition by SURVEY_YEAR and list subpartition by
RESPONDENT_GENDER. Here is our CREATE TABLE statement containing
columns and data types. Now, at the end of it we enter PARTITION BY LIST
(SURVEY_YEAR), but instead of following up with a left paren, as we did in the
last module, we follow up with the subpartition scheme, SUBPARTITION BY
LIST (RESPONDENT_GENDER). We immediately follow up with the keywords
SUBPARTITION TEMPLATE and a left paren. Now, within the SUBPARTITION
TEMPLATE we specify our list subpartitions in a similar way to how we created list
partitions in the last module, but instead of using the PARTITION keyword we
use the SUBPARTITION keyword followed by the name of the subpartition,
P_MALE, P_FEMALE, and P_OTHER in this case, followed by
the VALUES clause and a comma-delimited list of the values to be associated
with that subpartition. We close the template with a right paren, and then we
code the list partitions themselves. Here I'm creating two list partitions,
P_OLD, associated with survey years 2004 to 2008, and P_NEW, associated
with survey years 2009 to 2013, each with its own VALUES clause,
and we end everything with a right paren, as usual. Now, each partition
will have exactly three subpartitions because of how we defined the
Subpartition Template. Let's continue our composite List/List partitioning
example. Now, recall from the previous module that we enumerated the
partitions by querying the ALL_TAB_PARTITIONS or USER_TAB_PARTITIONS
views. When using subpartitions these views only get you so far. For
example, let's first see what USER_TAB_PARTITIONS tells us. Note that I'm
not bringing in the table name to save space in the output, but I am including
the COMPOSITE column, which returns YES if the table contains
subpartitions and NO otherwise. I'm also including the column
SUBPARTITION_COUNT, which indicates how many subpartitions there are
under a particular partition, and here is the output. As you see, the
COMPOSITE column is YES for both partitions, P_OLD and P_NEW. Also,
SUBPARTITION_COUNT displays a value of 3, indicating that there are 3
subpartitions under each partition. Unfortunately, this doesn't tell us what
the subpartitions are called or how they're defined. Let's continue with our
composite List/List partition example. Now, in order to enumerate the
subpartitions of a table you must use the ALL_TAB_SUBPARTITIONS or
USER_TAB_SUBPARTITIONS dictionary views. In this query I'm including the
columns TABLE_NAME, PARTITION_NAME, SUBPARTITION_NAME, the
HIGH_VALUE column, as well as the SUBPARTITION_POSITION column, which
is similar to the PARTITION_POSITION column. Take note, though, that I'm
sorting the output by PARTITION_NAME and SUBPARTITION_POSITION. Note
that we have three rows for each of the two partitions, P_OLD and P_NEW.
Now, despite the names we gave our subpartitions in the Subpartition
Template, Oracle names them as the partition name followed by an
underscore followed by the name we gave the subpartition. For example,
P_NEW_P_MALE and so on.
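
Here is a sketch of the List/List example and its two dictionary queries. The table name CANDYHIST_LL, its two columns and the gender codes 'M' and 'F' are hypothetical. Using DEFAULT for P_OTHER is also an assumption, since the text does not say which values P_OTHER holds.

```sql
CREATE TABLE candyhist_ll (
  survey_year        NUMBER(4),
  respondent_gender  VARCHAR2(1)
)
PARTITION BY LIST (survey_year)
SUBPARTITION BY LIST (respondent_gender)
SUBPARTITION TEMPLATE (
  SUBPARTITION p_male   VALUES ('M'),
  SUBPARTITION p_female VALUES ('F'),
  SUBPARTITION p_other  VALUES (DEFAULT)   -- any other gender code
)
(
  PARTITION p_old VALUES (2004, 2005, 2006, 2007, 2008),
  PARTITION p_new VALUES (2009, 2010, 2011, 2012, 2013)
);

-- Partition level: COMPOSITE and SUBPARTITION_COUNT, but no subpartition names
SELECT partition_name, composite, subpartition_count
FROM   user_tab_partitions
WHERE  table_name = 'CANDYHIST_LL'
ORDER  BY partition_position;

-- Subpartition level: names built as partition name + '_' + template name
SELECT partition_name, subpartition_name, high_value, subpartition_position
FROM   user_tab_subpartitions
WHERE  table_name = 'CANDYHIST_LL'
ORDER  BY partition_name, subpartition_position;
```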

Example #2 - Range/List

Let's see an example of composite Range/List partitioning. In this example
we partition by range using the SURVEY_DATE column and list subpartition
using the LIKELIHOOD_PURCHASE column. Here is the CREATE TABLE
statement. We start off defining our columns and data types as usual. We
then specify PARTITION BY RANGE (SURVEY_DATE) followed by the
subpartitioning scheme, SUBPARTITION BY LIST (LIKELIHOOD_PURCHASE).
Next, we follow up with the keywords SUBPARTITION TEMPLATE and a left
paren. Within the SUBPARTITION TEMPLATE we specify our subpartitions by
providing the keyword SUBPARTITION, the name of the subpartition, and
the VALUES clause. Note that, in this case, instead of having one
subpartition for each value of the LIKELIHOOD_PURCHASE column, which
ranges from 1 to 10 by the way, I'm setting up subpartitions based on how my
SQL developers are likely to pull the data for their queries. In this case, I
have one subpartition, called P_NOT_LIKELY, for the values 1, 2, and 3.
That is, the respondent is not likely to purchase the candy bar,
and so on and so forth. You get the point. Finally, we end the template with
a right paren and then provide all of our range partitions, just as we did in
the previous module. Let's continue our composite Range/List example. Let's
look at the output from USER_TAB_PARTITIONS first and then
USER_TAB_SUBPARTITIONS. Take note that I have removed some stuff in
order to fit the output under the slide, but the output from
USER_TAB_PARTITIONS does not show us the subpartition definitions. On the
other hand, the output from USER_TAB_SUBPARTITIONS displays the
partitions, subpartitions, and the high values for each subpartition. Again,
take note that the SUBPARTITION NAME is a concatenation of our partition
names and our subpartition names. Let's continue our composite Range/List
example. Now, let's suppose that in 2004 the column
LIKELIHOOD_PURCHASE was on a 7 point scale instead of the 10 point scale.
If that's the case, our subpartition definitions from the template will be
wrong, at least for 2004. We can override the Subpartition Template by
providing a parenthesized list of subpartitions after the definition of the
2004 range partition P_2004. Here is the code, and it is very similar to the
code on the previous slide: I have the Subpartition Template just as before,
but I have also overridden it under partition P_2004, shown in red. Take note
that the subpartition P_NOT_LIKELY is associated with the values 1 and 2.
P_SOMEWHAT_LIKELY is associated with the values 3 and 4, and finally,
P_VERY_LIKELY is associated with the values 5, 6, and 7. Although we have
the same number of subpartitions, three (which is not a requirement, by the
way), we have ensured that the subpartitions associated with 2004 are
meaningfully defined. Based on the SQL code on the previous slide, let's now
look at USER_TAB_SUBPARTITIONS. As you can see in red, partition P_2004 has
three subpartitions defined under it, and the HIGH_VALUE column shows the
correct values of the LIKELIHOOD_PURCHASE column. The rest of the partitions
are defined exactly as our Subpartition Template specified.

Let's continue our composite Range/List example. Note that the definition of
partition P_2004 on the previous few slides did not include a catchall
subpartition. On this slide, let's add that catchall subpartition to partition
P_2004. This is very similar to adding a partition, as shown in the previous
module. As usual, we start with ALTER TABLE followed by the name of the
table. When working with composite partitions we use MODIFY PARTITION to
indicate that we're making a modification to a specific partition, P_2004 in
this case. Next, we use the ADD SUBPARTITION keywords to add a subpartition
under partition P_2004. We call the new subpartition P_DEFAULT and define it
using the VALUES clause with the keyword DEFAULT in parens. As you can see in
red, we now have a new subpartition called P_DEFAULT with the HIGH_VALUE
column set to DEFAULT. Please see the Oracle manuals for more on how to add
subpartitions to, or drop them from, a partition.
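Putting the Range/List pieces together, the statements could look something like the sketch below. Several details are assumptions, not the course's actual code: the table name CANDYHIST_RL is hypothetical, only the two columns named in the text are declared, the template groupings after 1-3 (4-7 and 8-10) are illustrative, and only a few range partitions are shown. The P_2004 override and the catchall subpartition follow the values given in the text.

```sql
-- Hypothetical table name; only the two columns named in the text are shown.
CREATE TABLE candyhist_rl (
  survey_date          DATE,
  likelihood_purchase  NUMBER(2)
)
PARTITION BY RANGE (survey_date)
SUBPARTITION BY LIST (likelihood_purchase)
SUBPARTITION TEMPLATE (
  -- 10-point scale; the groupings after 1-3 are illustrative assumptions
  SUBPARTITION p_not_likely      VALUES (1, 2, 3),
  SUBPARTITION p_somewhat_likely VALUES (4, 5, 6, 7),
  SUBPARTITION p_very_likely     VALUES (8, 9, 10)
)
(
  -- 2004 used a 7-point scale, so this partition overrides the template
  PARTITION p_2004 VALUES LESS THAN (DATE '2005-01-01') (
    SUBPARTITION p_not_likely      VALUES (1, 2),
    SUBPARTITION p_somewhat_likely VALUES (3, 4),
    SUBPARTITION p_very_likely     VALUES (5, 6, 7)
  ),
  -- These partitions receive the template subpartitions,
  -- named for example P_2005_P_NOT_LIKELY
  PARTITION p_2005 VALUES LESS THAN (DATE '2006-01-01'),
  PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);

-- Add a catchall subpartition under partition P_2004 only.
ALTER TABLE candyhist_rl
  MODIFY PARTITION p_2004
  ADD SUBPARTITION p_default VALUES (DEFAULT);
```

Because the P_2004 subpartitions are named explicitly, they keep those exact names, while subpartitions generated from the template get the partition-name prefix.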

Example #3 - AUTOTRACE

In this example let's take a look at what AUTOTRACE displays, first when
subsetting from a partitioned table and then when joining two likewise
partitioned tables together. On this slide, let's subset from a partitioned
table. Here is a very simple query pulling data from the CANDYHIST_PART table,
subsetting for SURVEY_YEAR 2010. Since this table is partitioned, the output
from AUTOTRACE, as well as from EXPLAIN PLAN, will be slightly different. As
you can see in this abbreviated output, the words PARTITION LIST SINGLE
appear, indicating that Oracle is pulling from a single partition. There are
other keywords you may encounter, such as PARTITION LIST ITERATOR,
PARTITION RANGE SINGLE, and so on, but the telltale sign that Oracle is
pulling from one or more partitions is the keyword PARTITION.
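A minimal way to reproduce this check is sketched below. SET AUTOTRACE is a SQL*Plus (and SQLcl) command rather than SQL, and TRACEONLY EXPLAIN shows the plan without printing the rows. The EXPLAIN PLAN variant uses DBMS_XPLAN.DISPLAY, whose output for partitioned tables also includes Pstart/Pstop columns identifying the partitions accessed. SELECT * is a simplification; the course's actual select list isn't shown.

```sql
-- SQL*Plus / SQLcl: show the execution plan without fetching rows.
SET AUTOTRACE TRACEONLY EXPLAIN

SELECT *
FROM   candyhist_part
WHERE  survey_year = 2010;   -- filter on the partitioning key

SET AUTOTRACE OFF

-- Equivalent check using EXPLAIN PLAN and DBMS_XPLAN.
EXPLAIN PLAN FOR
SELECT *
FROM   candyhist_part
WHERE  survey_year = 2010;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Look for an operation whose name begins with PARTITION, such as the PARTITION LIST SINGLE described above.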
On this slide let's take a look at what AUTOTRACE displays when joining two
likewise partitioned tables together. Here I'm joining the CANDYHIST_PART
table, which is list partitioned by SURVEY_YEAR, to the DATEDIM_PART table,
which is likewise list partitioned by SURVEY_YEAR. Both partitioning schemes
are the same, and here is the query. As you can see in this abbreviated
output, the words PX COORDINATOR and so on are displayed, indicating that
Oracle is performing this join in parallel using a partition-wise join.
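The join being traced might be sketched as follows. Assumptions: the join condition is only on SURVEY_YEAR (the common partitioning key; the course's query may join on more columns), and the PARALLEL hints with a degree of 4 are one arbitrary way to request parallel execution, since the text doesn't say how parallelism was enabled. Whether the optimizer actually chooses a partition-wise join depends on statistics and system settings.

```sql
SET AUTOTRACE TRACEONLY EXPLAIN

-- Both tables are list partitioned on SURVEY_YEAR with the same scheme,
-- and the join is on that partitioning key, which is what makes a
-- partition-wise join possible.
SELECT /*+ PARALLEL(c 4) PARALLEL(d 4) */ *
FROM   candyhist_part c
JOIN   datedim_part   d
  ON   d.survey_year = c.survey_year;

SET AUTOTRACE OFF
```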

Summary

In summary, what did we just learn? We learned that you can subpartition
partitions using the SUBPARTITION BY keywords. We saw that the combinations
of partitioning and subpartitioning methods available to you are version
dependent, so please check the Oracle SQL Language Reference for your Oracle
version and release. We learned how to easily create subpartitions by
specifying a Subpartition Template, and we also learned how to override it
within a partition definition. We saw how to add a catchall subpartition
using the ALTER TABLE MODIFY PARTITION ADD SUBPARTITION syntax. Finally, we
saw what the output of AUTOTRACE looks like when subsetting from a
partitioned table and when joining two likewise partitioned tables together.
We saw that keywords beginning with PARTITION, such as PARTITION LIST
SINGLE, as well as keywords beginning with PX, indicated that Oracle was
taking the partitioning scheme into account for subsetting, as well as for
joins. In the next module we discuss the interaction between indexes and
partitions.

Partition/Index Interaction

Introduction

Hello and welcome back to the Pluralsight course, Optimizing SQL Queries in
Oracle. My name is Scott Hecht and in this module I'd like to talk about how
indexes and partitions interact. That is, how do you create an index on a
partitioned table, and how does a change in the partitioning structure affect
its indexes? Let's get started.

Module Contents

First, though, let's go over the module contents. We'll start off by
describing how partitions and indexes interact, and then move on to how you
can create indexes on partitioned tables, talking specifically about Global
Non-Partitioned Indexes, Local Partitioned Indexes, and Global Partitioned
Indexes. Next, we'll talk about how a change in a table's partitioning scheme
can affect its indexes, and how to repair broken indexes when the partitions
have changed. We end with a summary.

Partition/Index Interaction

Let's talk about the partition/index interaction. How do partitions and
indexes interact? Well, there are three indexing schemes you can create on
partitioned tables, all of which use either the same or similar code to our
familiar CREATE INDEX syntax. The first indexing scheme is called Global
Non-Partitioned Indexes. Despite the fancy name, these are just the indexes
we learned about in the earlier modules. Here we have a table that's been
partitioned into five pieces. When you create an index on a column in this
table using the syntax we showed you earlier, you're creating an index that
spans the entire table regardless of the partitioning scheme, exactly as if
the table were not partitioned at all. The Oracle documentation mentions that
this indexing scheme is used with online transaction processing, or OLTP,
databases, as opposed to data warehouses. With that said, check your WHERE
clauses and see whether this type of indexing scheme might be useful to you.

The next indexing scheme is called Local Partitioned Indexes. Given our table
again, partitioned into five pieces, when you create a Local Partitioned Index
on this partitioned table, one index is created within each individual
partition. That is, the index inherits the exact same partitioning scheme you
placed on the table itself. The Oracle documentation recommends this
indexing scheme for data warehouses, and it is most likely the one you want
to use. To create a Local Partitioned Index, you add the keyword LOCAL to the
CREATE INDEX syntax. We'll show an example of this in just a moment.

The last indexing scheme is called Global Partitioned Indexes. Again, given
our table partitioned into five pieces, a Global Partitioned Index is
partitioned independently of the table's underlying partitioning scheme. That
is, the index has its own partitioning scheme that doesn't necessarily match
that of the table. Oracle's documentation recommends this for OLTP rather
than data warehousing. We won't discuss this indexing scheme in this lecture,
so please peruse the appropriate Oracle documentation for more about it.
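Although the course doesn't cover Global Partitioned Indexes, a short syntax sketch may help make the "independent partitioning scheme" idea concrete. The index name and the range boundaries below are hypothetical. Because an index on LIKELIHOOD_PURCHASE is created later in this module, you would normally create this one instead of, not alongside, those indexes.

```sql
-- Illustration only: a global range-partitioned index on CANDYHIST_PART.
-- The index is partitioned on LIKELIHOOD_PURCHASE, independently of the
-- table's list partitions on SURVEY_YEAR. Boundaries are arbitrary.
CREATE INDEX gpix_candyhistpart_likprc
  ON candyhist_part (likelihood_purchase)
  GLOBAL PARTITION BY RANGE (likelihood_purchase) (
    PARTITION p_low  VALUES LESS THAN (4),
    PARTITION p_mid  VALUES LESS THAN (8),
    -- the highest partition of a global range-partitioned index
    -- must be bounded by MAXVALUE
    PARTITION p_high VALUES LESS THAN (MAXVALUE)
  );
```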

Global Non-Partitioned Indexes

Let's talk about applying indexes to partitions. On this slide let's focus on
Global Non-Partitioned Indexes. As I mentioned, despite its fancy name, a
Global Non-Partitioned Index is the same as the indexes we learned about
earlier in the course. That is, a Global Non-Partitioned Index placed on a
column of a table spans the entire table, whether that table is partitioned or
not. Let's take a quick look at how to do this. Recall that in the previous
module we created the list partitioned table CANDYHIST_PART, which is
partitioned BY LIST on the SURVEY_YEAR column. Partition P_456 is associated
with the values 2004, 2005, and 2006. Partition P_789 is associated with the
values 2007, 2008, and 2009. Finally, partition P_DEF is associated with the
keyword DEFAULT, indicating that the remaining values, 2010, 2011, 2012, and
2013, will be loaded into this partition. To create an index, or a Global
Non-Partitioned Index to use its full and glorious name, on the column
LIKELIHOOD_PURCHASE, you just code the displayed CREATE INDEX statement.
This is the same CREATE INDEX code we've used in the modules on indexes.
Note that, depending on the version of Oracle you're running, you may need
to gather stats after this statement completes.

Let's conclude our chat about Global Non-Partitioned Indexes. On this slide I
want to show you the output from the system views USER_INDEXES and
USER_IND_COLUMNS, and the new view USER_IND_PARTITIONS. We'll talk about
that last one in just a moment. The output from USER_INDEXES for the table
CANDYHIST_PART just displays the name of the index, as you would expect. The
output from USER_IND_COLUMNS, as a reminder, displays the columns that are
indexed on a table. Here the column LIKELIHOOD_PURCHASE is indexed on the
table CANDYHIST_PART, and that index is, of course,
GNIX_CANDYHISTPART_LIKPRC. So far, so good. Now, let's talk about the views
ALL_IND_PARTITIONS and USER_IND_PARTITIONS. These views display one row for
each partition of a partitioned index. As you see, there is no output for the
index GNIX_CANDYHISTPART_LIKPRC because we created that index as a Global
Non-Partitioned Index, so it stands to reason that it wouldn't appear in this
view.
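Here is a sketch of the steps in this section. The CREATE TABLE is a recap of the previous module's table, reduced to the two columns named in the text (the real table has more), so skip it if the table already exists. The partition values and the index name come from the text.

```sql
-- Recap: list-partitioned table (only the columns named in the text).
CREATE TABLE candyhist_part (
  survey_year          NUMBER(4),
  likelihood_purchase  NUMBER(2)
)
PARTITION BY LIST (survey_year) (
  PARTITION p_456 VALUES (2004, 2005, 2006),
  PARTITION p_789 VALUES (2007, 2008, 2009),
  PARTITION p_def VALUES (DEFAULT)          -- 2010 through 2013 land here
);

-- Global non-partitioned index: ordinary CREATE INDEX syntax.
CREATE INDEX gnix_candyhistpart_likprc
  ON candyhist_part (likelihood_purchase);

-- Dictionary checks.
SELECT index_name, table_name
FROM   user_indexes
WHERE  table_name = 'CANDYHIST_PART';

SELECT index_name, column_name, column_position
FROM   user_ind_columns
WHERE  table_name = 'CANDYHIST_PART';

-- Returns no rows: a global non-partitioned index has no index partitions.
SELECT index_name, partition_name, status
FROM   user_ind_partitions
WHERE  index_name = 'GNIX_CANDYHISTPART_LIKPRC';
```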

Local Partitioned Indexes

Let's continue our talk about applying indexes to partitions. Next, let's talk
about Local Partitioned Indexes. As a reminder, this type of index inherits
the same partitioning scheme as the table itself. For reference, here is our
CREATE TABLE statement along with the PARTITION BY LIST clause. To create a
Local Partitioned Index, all you have to do is include the LOCAL keyword at
the end of the CREATE INDEX statement, as shown.

Let's conclude our talk about Local Partitioned Indexes. On this slide let's
see the output of the view USER_IND_PARTITIONS. As you see, there are three
rows displayed, one row for each partition. Contrast this with the Global
Non-Partitioned Index, which produced no output at all. This makes sense,
since a Local Partitioned Index is partitioned using the same scheme as the
table itself. Now, please take a look at the column labeled STATUS. It
indicates whether the index associated with that particular partition is
USABLE or UNUSABLE. If it is marked USABLE, Oracle can use this index to,
hopefully, pull data faster from within that partition. If it is marked
UNUSABLE, Oracle cannot use the index and may scan all of the rows within
that partition. As you see, all of our partitions are marked USABLE, but let's
see under what circumstances one or more index partitions will be marked
UNUSABLE.
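A sketch of creating the local index and checking its status follows. It assumes the global non-partitioned index from the previous section is dropped first, because creating a second index on the same column list would normally fail with ORA-01408. The index name LPIX_CANDYHISTPART_LIKPUC is the one used later in this module.

```sql
-- Drop the global non-partitioned index on the same column first.
DROP INDEX gnix_candyhistpart_likprc;

-- LOCAL: one index partition per table partition (P_456, P_789, P_DEF).
CREATE INDEX lpix_candyhistpart_likpuc
  ON candyhist_part (likelihood_purchase)
  LOCAL;

-- One row per index partition; STATUS shows USABLE or UNUSABLE.
SELECT index_name, partition_name, partition_position, status
FROM   user_ind_partitions
WHERE  index_name = 'LPIX_CANDYHISTPART_LIKPUC'
ORDER  BY partition_position;
```

By default, the index partitions of a local index take the names of the corresponding table partitions, which is why the output lines up partition for partition with the table.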

Changes in Partitions Affect Indexes

Let's talk about how changing a partition affects indexes. What's the effect
on an index when there's a change in the partitioning scheme? Well, recall
that we have the following CREATE TABLE statement, partitioning by list and
creating three partitions. Next, I'll create a Local Partitioned Index on the
LIKELIHOOD_PURCHASE column, as shown, with the LOCAL keyword. As we've
seen, this is the output, with all partitions marked as USABLE. Fine so far;
what could possibly go wrong? Let's see the effect of adding a partition to our
table, CANDYHIST_PART. As we've seen before, here's the code to split the
default partition, P_DEF, into partition P_012, holding survey years 2010,
2011, and 2012, and a new P_DEF partition holding the remaining survey year,
2013. By submitting this code we effectively alter the partitioning scheme of
the table. Now, let's take a look at the output from USER_IND_PARTITIONS. As
you can see, the index partitions for our two new partitions, P_012 and
P_DEF, are marked as UNUSABLE. That just can't be good. How do you turn
unusable indexes into usable indexes? Using the ALTER INDEX statement on our
index LPIX_CANDYHISTPART_LIKPUC, we can use the REBUILD PARTITION keywords
to rebuild the index for each affected partition. Once complete, the index
will be usable. Note that you have to rebuild each partition one at a time, as
shown, and the output from USER_IND_PARTITIONS now indicates that the index
is usable across all partitions.

Let's conclude this section with a brief discussion of gathering statistics on
a partition. After all these changes to our partitioned table, you will still
need to gather stats on any new partition using the DBMS_STATS procedure
GATHER_TABLE_STATS. To do this, add the PARTNAME parameter to the
GATHER_TABLE_STATS procedure call and provide the name of the partition you
want to gather stats on. As shown, I have added partname => 'P_012' to the
GATHER_TABLE_STATS call in order to gather stats for the partition P_012.
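Here is a sketch of the full sequence described in this section: splitting the default list partition, rebuilding the local index partitions that become UNUSABLE, and gathering statistics on the new partition. It assumes the table and the local index LPIX_CANDYHISTPART_LIKPUC from the earlier sections exist in the current schema.

```sql
-- Split the DEFAULT partition: 2010-2012 go to P_012; the second new
-- partition (reusing the name P_DEF) remains the DEFAULT partition.
ALTER TABLE candyhist_part
  SPLIT PARTITION p_def VALUES (2010, 2011, 2012)
  INTO (PARTITION p_012, PARTITION p_def);

-- The matching local index partitions are now UNUSABLE; check them.
SELECT partition_name, status
FROM   user_ind_partitions
WHERE  index_name = 'LPIX_CANDYHISTPART_LIKPUC';

-- Rebuild each affected index partition, one at a time.
ALTER INDEX lpix_candyhistpart_likpuc REBUILD PARTITION p_012;
ALTER INDEX lpix_candyhistpart_likpuc REBUILD PARTITION p_def;

-- Gather statistics for the new partition only.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname  => USER,               -- current schema
    tabname  => 'CANDYHIST_PART',
    partname => 'P_012'
  );
END;
/
```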

Summary

In summary, what did we just learn? We learned that there are three types
of indexes you can place on a partitioned table. A Global Non-Partitioned
Index is just our familiar index from earlier modules, in that the index is
created across the entire table regardless of any partitioning scheme. A Local
Partitioned Index is similar to creating one index per partition, and a Global
Partitioned Index is an index with its own partitioning scheme, separate from
that of the table. To create a Local Partitioned Index, use the LOCAL keyword
on the CREATE INDEX statement. To determine which partitions are indexed, as
well as which index partitions are usable, we looked at the dictionary views
ALL_IND_PARTITIONS and USER_IND_PARTITIONS. If an index already exists on a
partitioned table, modifying the table's partitioning scheme marks the index
partitions for the affected partitions UNUSABLE. To fix this, use the ALTER
INDEX REBUILD PARTITION syntax on each affected partition.
