Oracle Database 10g OLAP Performance Tips
by Mark Rittman
The OLAP Option to Oracle 10g gives you the ability to store multidimensional cubes of
data in your Oracle database, and perform OLAP queries on them using OLAP DML,
regular SQL or query tools such as Oracle Discoverer Plus OLAP. Parts 1 and 2 of this
article provide tips and best practices for designing, loading, aggregating and querying
Oracle OLAP cubes, and also take a look at some of the new features coming with
10g Release 2.
Introduction
The OLAP Option for Oracle Database 10g 10.1 is a recent addition to Oracle's data
warehousing capabilities, and gives you the ability to define multidimensional cubes of
data and store them in special OLAP datatypes. Derived from Oracle Express Server, the
OLAP Option builds into the Oracle database the capabilities of stand-alone OLAP
servers such as Oracle Express Server, Hyperion Essbase, and Microsoft Analysis
Services. When you copy your data into an OLAP cube and then run a routine to
aggregate the data, you can provide fast, flexible reporting for your users that frees them
from the constraints of traditional relational reporting.
Just as with regular relational data warehousing, though, you need to give some thought
as to the best way to design your OLAP cube, and how you should go about configuring
the database, specifying storage, loading the cube, aggregating it, and then providing fast,
predictable query responses for your users. Parts 1 and 2 of this article look at some tips
and techniques for obtaining the best performance from the OLAP Option to Oracle
Database 10g Release 1, and look forward to some of the new features coming in 10g
Release 2.
Sample dataset
The techniques and examples within this article use sample data and analytic workspace
definition that you can download and install into a suitably configured Oracle database.
To use this dataset and follow along with the examples, you will need to ensure that the
following software is installed:
The sample data and AWM10g template files can be downloaded from here:
and are based on the Global Widgets sample OLAP dataset used in the forthcoming
Oracle Press book, Oracle Discoverer 10g Handbook, authored by Michael and Darlene
Armstrong-Smith, and with contributions from the author of this article.
Instructions on how to install the sample data are provided in the readme.htm document
that accompanies the sample data. The sample data download includes scripts to create
user accounts and load them with source data, together with a number of Analytic
Workspace Manager 10g template files that will be referred to later in the article. When
using these template files, you should ensure that you connect to the GSW_AW account
so that the source data and the analytic workspace are kept separate.
The following tables are created and populated in the GSW schema:
Table Name
GS_CUSTOMER_DIM
GS_PRODUCT_DIM
GS_CHANNEL_DIM
GS_TIME_DIM
GS_SALES_LOAD_2

Dimension    Type            Hierarchy
Customer     User Standard   Customer > City > District > Region > All Customers
Product      User Standard
Channel      User Standard
Time         Time Standard

Cube Measures                          Dimensions Used
Sales Order Quantity, Ship Quantity    Customer, Product, Channel, Promotion, Time
The sample data setup scripts create two users:
GSW, the schema that holds the tables containing the source data
for the Global Widgets analytic workspace, and
GSW_AW, the schema that will contain the analytic workspace.
All timings within this article were taken using the following hardware and Oracle
versions:
This article will now take you through the steps that need to be considered when building
an efficient Oracle Database 10g OLAP analytic workspace.
dimension members but also those at higher levels of aggregation. With these figures it is
possible to create the Cartesian product, which reflects all possible combinations.
To take the Global Widgets Customer dimension as an example, this dimension has a
single hierarchy that has five levels, with the data for each level being sourced from
columns in the GS_CUSTOMER_DIM table, like this:
Level       Source column
Customer
City        GS_CUSTOMER_DIM.CITY_NAME
District    GS_CUSTOMER_DIM.DISTRICT_NAME
Region      GS_CUSTOMER_DIM.REGION_NAME
Next, these totals need to be multiplied together to give the total number of possible cells
for each measure:
You can therefore calculate how sparse or dense the cube is likely to be by dividing the
number of actual dimension combinations by the number of possible combinations in the
source data. (Note: This does not consider any pre-summarisation that you might build
into your cube.)
(11,401 actual / 251,494,308 potential) * 100 = 0.005% dense
There is no set rule for when a cube is considered sparse or dense, but I usually
consider 30 percent to be a good cut-off point, and with a cube as sparse as this one
you would want to use one of the sparsity-handling features in Oracle 10g OLAP
to reduce the amount of space used to store null (or NA, as it's termed in Oracle OLAP)
data.
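The density arithmetic above can be sketched in Python as follows. This is a minimal illustration, not part of Analytic Workspace Manager: the per-dimension member totals that feed the Cartesian product are not reproduced in this excerpt, so `potential_cells` is shown only as the multiplication step, and the example starts from the 251,494,308 figure. The 30 percent cut-off is the author's rule of thumb, passed in as a parameter.

```python
from math import prod


def potential_cells(member_counts):
    """Cartesian product of the member counts of each dimension,
    counting members at every level (leaf and aggregate)."""
    return prod(member_counts)


def density_pct(actual, potential):
    """Percentage of the potential cells that actually contain data."""
    return actual / potential * 100


def is_sparse(actual, potential, cutoff_pct=30.0):
    """Rule of thumb from the text: below roughly 30 percent dense is sparse."""
    return density_pct(actual, potential) < cutoff_pct


# Figures from the Global Widgets example above.
ACTUAL = 11_401
POTENTIAL = 251_494_308

print(f"{density_pct(ACTUAL, POTENTIAL):.3f}% dense")  # 0.005% dense
print("sparse" if is_sparse(ACTUAL, POTENTIAL) else "dense")  # sparse
```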
The template file GSW_AW_SPARSE.XML contains a logical dimensional model for the
Global Widgets cube in which some of the dimensions are marked as being sparse.
dimension, but very few for all the others, and nine times out of ten, it is the Time
dimension that is considered dense, with all the other dimensions considered sparse.
Analytic Workspace Manager generates a composite: a structure that holds just the
dimension member combinations for the dimensions you designate as sparse, together
with an index between the composite and the base dimension values. The measures in
your cube are then dimensioned by this composite rather than by the individual sparse
dimensions, which considerably reduces the amount of disk space that your measures
take up. Note that Analytic Workspace Manager 10.1 creates one composite per measure
in your cube, but Analytic Workspace Manager 10.2 gives you the additional option to
create a global composite, a single composite that covers all measures in a cube.
Creating a global composite reduces the build time and storage space required, but is
only suitable when your measures share the same sparsity characteristics, the aggregated
composite will not be too large (fewer than 50 million values), multi-writer support is
not being used, and compression (described below) is not being used.
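To make the idea of a composite concrete, the following Python sketch models it as a dictionary keyed by the combinations of sparse-dimension members that actually occur. It is a simplified, hypothetical model for illustration only: the dimension members and sales figures are invented, and Oracle OLAP's internal composite and index structures are not a Python dictionary. It shows why dimensioning a measure by the composite plus a dense Time dimension needs far fewer cells than dimensioning it by every dimension individually.

```python
from math import prod

# Hypothetical members: three sparse dimensions and a dense Time dimension.
customers = ["C1", "C2", "C3", "C4"]
products = ["P1", "P2", "P3"]
channels = ["Web", "Retail"]
months = ["Jan", "Feb", "Mar"]

# Hypothetical source rows: only a few combinations actually have data.
facts = {
    ("C1", "P1", "Web", "Jan"): 10,
    ("C1", "P1", "Web", "Feb"): 12,
    ("C3", "P2", "Retail", "Jan"): 5,
}

# Dense storage: one cell for every possible combination, mostly NA.
dense_cells = prod(len(d) for d in (customers, products, channels, months))

# Composite: each distinct (customer, product, channel) tuple that occurs
# gets a position, playing the role of the composite's index.
composite = {}
for customer, item, channel, _month in facts:
    composite.setdefault((customer, item, channel), len(composite))

# The measure is now dimensioned by <composite, time>.
composite_cells = len(composite) * len(months)

print(dense_cells, composite_cells)  # 72 6
```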
For cubes that are particularly sparse (the Global Widgets cube would fall into this
category), you can also make use of a new feature introduced with version 10g of Oracle
OLAP, known as compression, or Compressed Composites. Choosing to use
compression will tell Analytic Workspace Manager to create the composite with the
compression option, and this will, for certain types of sparse data, again significantly
reduce the amount of storage required for the cube. With Oracle Database 10g Release 1,
there are a number of restrictions on using compression: for example, you cannot
partially summarise your cube, and SUM is the only permitted aggregation method.
These restrictions have largely been lifted in Oracle Database 10g Release 2.
An additional benefit of marking cubes as sparse and using compression where
appropriate is that the amount of time required to load and aggregate your cube can be
significantly decreased.
To give you an idea of the space and maintenance time savings that sparsity handling can
produce, using the Global Widgets sample data and with all dimension levels
presummarised, the final size of the cubes after loading with a full set of data was as
follows:
Cube Description   Template Name
Dense cube         GSW_AW_DENSE.XML
Sparse cube        GSW_AW_SPARSE.XML
46 minutes
1.074 Gb
16 minutes
3 minutes
As you can see, handling sparsity effectively can significantly reduce the amount of disk
space and time that is required to maintain your cubes, and if you are able to take
advantage of the new compression feature in Oracle Database 10g, this can again bring
your disk and time requirements down to a fraction of what was originally planned.
You should not, however, define all cubes as sparse regardless of the degree to which data
is sparse. The composites that Oracle OLAP uses to handle sparsity combine members
from the dimensions that you mark as sparse and require unravelling when used in
queries. Although Oracle OLAP handles this process transparently, it still takes up some
time, and the denser your cube is, the longer the process will take. Therefore, if you have
a cube which is for example 80 percent dense, adding an unnecessary composite will
decrease both maintenance and query performance.
Accepted good practice is usually to leave the Time dimension as dense, and therefore
outside the composite, as this ensures that all time values are clustered together,
improving the runtime performance of time-series analysis. However, this can lead to a
big increase in the size of the analytic workspace if daily data is used. Indeed, contacts
within Oracle OLAP development tell me that their current practice is to define all
dimensions as sparse, including Time, which involves a small increase in query time
but a big decrease in build time and required disk space.
Ordering of dimensions
You might have noticed on the Implementation Details tab of the Create Cube dialog
(Figure 1, above) that you can specify the order of dimensions within a cube. The order
of dimensions is important, as it can have a significant impact on the time it takes to
load and query your cube.
You should put the dimension with the most members (the fastest varying dimension)
at the top of your dimension list, followed by the next fastest varying, and so on, so that
the last dimension in the list has the fewest members (the slowest varying dimension).
Note, however, that compressed composites automatically reorder dimensions to get
optimal build performance. This reordering is not possible with all types of aggregation,
as it would sometimes change the results (for example, with the LAST
aggregation operator).
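The ordering rule is simple enough to express as a sort. The sketch below is a hypothetical helper (it is not an AWM feature) that takes member counts per dimension and returns the dimension names most-members-first; the counts in the example are invented. As noted above, compressed composites do their own reordering, so a helper like this is only relevant when you set the order yourself.

```python
def order_dimensions(member_counts):
    """Return dimension names from most members (fastest varying, listed
    first) to fewest members (slowest varying, listed last).
    Ties keep their original order because sorted() is stable."""
    return sorted(member_counts, key=member_counts.get, reverse=True)


# Hypothetical member counts, for illustration only.
counts = {"Channel": 4, "Customer": 300, "Product": 60, "Time": 1200}
print(order_dimensions(counts))  # ['Time', 'Customer', 'Product', 'Channel']
```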
When Oracle OLAP comes to store values for the measures in your cube, it clusters
together in pages at the start of the available storage area those values relating to the
first dimension in your list - the fastest varying one, the dimension with the most
members - and then gradually fills out the rest of the storage space with measure values
corresponding to the other, progressively slower varying dimensions. Typically, you
choose the Time dimension as your fastest varying dimension, and by ordering it at the
start of your dimension list you ensure that measure values corresponding to all the
different time dimension members (days of the week, or months in a year, for example)
are physically stored close together, making data retrieval faster.
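One way to picture this clustering is as a linear offset in which the first-listed dimension has a stride of 1. The Python sketch below assumes that simplified model (the real page layout inside an analytic workspace is more involved) and uses a hypothetical cube of 12 months, 3 products and 2 channels, with Time listed first: the twelve monthly values for any one product and channel land in consecutive cells.

```python
def cell_offset(indexes, sizes):
    """Offset of a cell when the first dimension varies fastest.
    The stride of dimension i is the product of the sizes of the
    dimensions listed before it."""
    offset, stride = 0, 1
    for idx, size in zip(indexes, sizes):
        offset += idx * stride
        stride *= size
    return offset


# Hypothetical cube: Time (12), Product (3), Channel (2), Time listed first.
sizes = [12, 3, 2]

# All twelve months for product 1, channel 0.
offsets = [cell_offset([month, 1, 0], sizes) for month in range(12)]
print(offsets[0], offsets[-1])  # 12 23
```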
It also makes sense to try to ensure that the order in which your source data is held and
then loaded corresponds to the order in which your cube dimensions have been listed. If
you have control over the order of records in the source data file, you can create the
data file to match the dimension order of the cubes in your analytic workspace.
Alternatively, you can create a view over your source tables and use an ORDER BY
clause in its SELECT statement to reorder the source data; otherwise, you may need to
choose between optimizing for loads and optimizing for queries when defining the
dimension order of cubes in your analytic workspace.
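Under the same simplified layout model (first-listed dimension varying fastest), writing cells in storage order means sorting the source rows by the last-listed, slowest varying dimension first. The sketch below builds such an ORDER BY list for a source view and checks that the sorted rows visit increasing offsets. The column names TIME_ID, PRODUCT_ID and CHANNEL_ID are hypothetical, and whether this ordering actually helps a particular load is an assumption worth confirming by timing it.

```python
import random


def cell_offset(indexes, sizes):
    """Offset of a cell when the first dimension varies fastest."""
    offset, stride = 0, 1
    for idx, size in zip(indexes, sizes):
        offset += idx * stride
        stride *= size
    return offset


def order_by_clause(cube_dim_columns):
    """ORDER BY list for a source view, given the cube's dimension order
    (fastest varying first): the slowest varying column sorts first."""
    return "ORDER BY " + ", ".join(reversed(cube_dim_columns))


def storage_sort_key(row_indexes):
    """Sort key equivalent to that ORDER BY: the reversed index tuple."""
    return tuple(reversed(row_indexes))


# Hypothetical cube dimension order and source columns.
columns = ["TIME_ID", "PRODUCT_ID", "CHANNEL_ID"]
sizes = [12, 3, 2]

rows = [(t, p, c) for t in range(12) for p in range(3) for c in range(2)]
random.shuffle(rows)
rows.sort(key=storage_sort_key)

print(order_by_clause(columns))  # ORDER BY CHANNEL_ID, PRODUCT_ID, TIME_ID
```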
Storage considerations
Analytic Workspaces are held in LOBs within regular relational tables which can be
identified through their AW$ prefix. Like any other tables, they are created within a
tablespace which has one or more datafiles associated with it. Operations on Analytic
Workspaces also make use of temporary tablespaces to store changes to multidimensional
objects before the changes are committed, and therefore you will need to factor the use of
regular and temporary tablespaces into your physical database implementation plans.
It is good practice to create separate tablespaces for each analytic workspace, and to
specify an initial size for the datafile corresponding to the likely size of your Analytic
Workspace, taking into account how you will handle sparsity. By pre-creating your
datafile to the likely size for your Analytic Workspace, you will avoid the overhead
associated with extending the datafile when your Analytic Workspace is loaded and
maintained.
For the Global Widgets sample data, a suitable tablespace definition would be as follows:
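CREATE TABLESPACE gsw_aw_data
  DATAFILE 'gsw_aw_data.dbf'
    SIZE 2000M REUSE          -- preallocate close to the expected AW size
    AUTOEXTEND ON NEXT 8M     -- grow in 8MB steps if the estimate is exceeded
    MAXSIZE UNLIMITED
  NOLOGGING                   -- default logging mode for objects in this tablespace
  EXTENT MANAGEMENT LOCAL AUTOALLOCATE
  SEGMENT SPACE MANAGEMENT AUTO;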
Note the following clauses to the CREATE TABLESPACE statement:
The schema that will contain your analytic workspace can either use the default
temporary tablespace, or you can create a new one for use with analytic workspace
operations. This can be an individual temporary tablespace for each schema that contains
analytic workspaces, or you can create one that is used by all schemas that hold analytic
workspaces. The advantage of separating this usage out is that if the temporary tablespace
becomes very big (during an EIF file import, for example) you can create a replacement
one, alter the relevant users to make this their new default temporary tablespace, then
drop the old one, without affecting all the other non-OLAP using users in the database.
A typical temporary tablespace definition for use with analytic workspaces would be as
follows:
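CREATE TEMPORARY TABLESPACE gsw_aw_temp
  TEMPFILE 'gsw_aw_temp.tmp'
    SIZE 5000M REUSE          -- preallocate temporary space for AW operations
    AUTOEXTEND ON NEXT 5M
    MAXSIZE UNLIMITED
  EXTENT MANAGEMENT LOCAL
  UNIFORM SIZE 1M;            -- fixed-size extents rather than automatic sizing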
Oracle OLAP uses the default temporary tablespace for the user to store all changes to the
data in an analytic workspace, whether the changes are the result of a data load, what-if
analysis, forecasting, aggregation, or some other analysis. An OLAP DML UPDATE
command moves the changes into the permanent tablespace and clears the temporary
tablespace. Oracle OLAP also uses temporary tablespace to maintain a private session
view of the analytic workspace to accommodate multiple users running queries and
changing dimension status at the same time. Note that you can minimize the amount of
temporary tablespace used by connecting to your Analytic Workspace in read-only mode,
rather than read-write or read-write exclusive mode.
Note that you should specify a uniform extent size for the temporary tablespace, rather
than letting Oracle size each extent automatically. In version 9i of Oracle OLAP, each
analytic workspace was held in a small number of LOBs, so the recommendation was to
make the uniform extent size between 1MB and 8MB. With Oracle Database 10g,
however, each OLAP object is held in its own LOB, and as AW/XML templates can
create thousands of objects per analytic workspace, each of which now requires its own
TEMP segment, the uniform extent size should now be
between 512K and 2MB.
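The reason for the smaller extents is arithmetic: if every object being changed needs its own temporary segment, and each segment occupies at least one uniform extent, the extent size is multiplied by the number of objects. The Python sketch below works through that lower bound for a hypothetical 2,000 changed objects; the object count is an assumption, not a measured figure.

```python
def min_temp_mb(objects_changed, uniform_extent_kb):
    """Lower bound on temporary space when every changed object needs its
    own temporary segment of at least one uniform extent."""
    return objects_changed * uniform_extent_kb / 1024


def extent_in_10g_range(uniform_extent_kb):
    """The 512K to 2MB range recommended above for Oracle Database 10g."""
    return 512 <= uniform_extent_kb <= 2048


OBJECTS = 2_000  # hypothetical number of AW objects changed in one session

for extent_kb in (512, 1024, 8192):
    print(extent_kb, min_temp_mb(OBJECTS, extent_kb), extent_in_10g_range(extent_kb))
# 512 1000.0 True
# 1024 2000.0 True
# 8192 16000.0 False
```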
To illustrate the benefits of preallocating disk space to datafiles, the following
maintenance times were observed for the Global Widgets cube. Note that these timings
are not directly comparable with the other timings in this article and should only be
compared with each other.
Template File   Initial datafile size   Initial tempfile size   Tablespace creation time   Maintenance time
GSW_AW_SPARSE   50 Mb                   50 Mb                   4 seconds                  23 minutes
GSW_AW_SPARSE   1000 Mb                 900 Mb                  41 seconds                 14 minutes
The conclusion from this is that although the preallocation of space can take a small
additional amount of time, it is faster to allocate disk space at this stage than to do so
incrementally as the cube is maintained.
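Putting the two rows of the table side by side shows the size of the trade-off. The short Python calculation below uses only the timings from the table, converted to seconds; it simply restates them, so the same caveat about comparability applies.

```python
def to_seconds(minutes=0, seconds=0):
    return minutes * 60 + seconds


# Timings from the table above.
small = {"create": to_seconds(seconds=4), "maintain": to_seconds(minutes=23)}
large = {"create": to_seconds(seconds=41), "maintain": to_seconds(minutes=14)}

extra_creation = large["create"] - small["create"]  # 37 seconds
maintenance_saved = small["maintain"] - large["maintain"]  # 540 seconds
net_saving = maintenance_saved - extra_creation  # 503 seconds

print(f"net saving: {net_saving} s (about {net_saving / 60:.1f} minutes)")
# net saving: 503 s (about 8.4 minutes)
```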
The System Global Area (SGA) is a shared memory region that contains data and control
information for your Oracle instance. The most important components within the SGA
are the Buffer cache, the Shared pool, the Java pool and the Large pool, and from Oracle
Database 10g onwards you can have Oracle automatically manage and tune the sizes of
these components using Automatic Shared Memory Management (ASMM), which is
enabled via the SGA_TARGET parameter (below).
The Program Global Area (PGA) also contains data and control information, but for a
single server process. It is private memory created when a server process is started, and
when you use Oracle OLAP in relational mode it is used by sorts, ORDER BYs, hash joins
and so on. Just as ASMM manages the SGA, PGA memory can be managed automatically
for you by the PGA Advisor when the WORKAREA_SIZE_POLICY database parameter
is set to AUTO, which is the default with Oracle Database 10g.
When you are using Oracle OLAP in multidimensional mode (i.e. working with analytic
workspaces) you will need to be aware of an area of memory called the OLAP Page Pool
that is used as the paging cache; generally you will want this paging cache to be sized
such that as much of your OLAP work is done in memory as possible, rather than being
paged to disk.
When you are running your instance in dedicated server mode, the OLAP Page Pool is
part of the User Global Area (UGA), which is in turn part of the PGA. If
WORKAREA_SIZE_POLICY is set to AUTO, the PGA Advisor will automatically size
the OLAP Page Pool up to 50 percent of the PGA_AGGREGATE_TARGET value,
and once this limit is reached, every subsequent user will acquire the bare operating
minimum of around 4MB. For large OLAP applications (8GB and above, for example)
you may want to monitor V$AW_CALC (detailed later) to ensure that users do not exceed
200MB to 500MB, depending on OS, hardware and database configuration. If they do
exceed this figure and performance has degraded, you may wish to set the
OLAP_PAGE_POOL_SIZE parameter manually for a session, which defines the
minimum size for this area.
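The sizing behaviour described above can be summarised in a few lines. The Python sketch below is a simplified, hypothetical model: `page_pool_cap` applies the limit of 50 percent of PGA_AGGREGATE_TARGET, and `sessions_to_review` flags sessions whose pool size (as reported in the POOL_SIZE column of V$AW_CALC, in bytes) exceeds a threshold you choose in the 200MB to 500MB range. The session IDs and pool sizes in the example are invented.

```python
MB = 1024 * 1024


def page_pool_cap(pga_aggregate_target_bytes):
    """Upper bound on auto-sized OLAP page pools in dedicated server mode
    with WORKAREA_SIZE_POLICY = AUTO: 50 percent of PGA_AGGREGATE_TARGET."""
    return pga_aggregate_target_bytes // 2


def sessions_to_review(pool_sizes, threshold_mb=200):
    """Return {session_id: pool_size} for sessions above the threshold."""
    limit = threshold_mb * MB
    return {sid: size for sid, size in pool_sizes.items() if size > limit}


# Hypothetical V$AW_CALC snapshot: session_id -> pool_size in bytes.
snapshot = {141: 5 * MB, 152: 350 * MB, 167: 120 * MB}

print(page_pool_cap(400 * MB) // MB)  # 200
print(sorted(sessions_to_review(snapshot)))  # [152]
```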
When you are running in shared server mode, however, the UGA is part of the SGA, and any
changes to its default size have to be made manually by setting OLAP_PAGE_POOL_SIZE.
In addition, regardless of whether you are operating in dedicated or shared server mode,
when the OLAP Page Pool is full, Oracle will use the Buffer Cache (part of the SGA) as
swap space, which at a certain point can in fact improve performance for large batch jobs
(a tip from Oracle development, untested by the author). Once the Buffer Cache is full,
however, data will start to be swapped to disk, as with any process; it is therefore good
practice to size the SGA and PGA appropriately if you are looking to optimise the
performance of your OLAP application.
To specify values for the PGA, the SGA and (in shared server mode) the OLAP Page Pool,
you can use the following database parameters:
SGA_TARGET
When set to any value greater than 0, this defines the target amount of memory that
ASMM can assign to the Buffer Cache, Shared pool and other automatically tuned SGA
components. When working with OLAP data, set this parameter to between 50 percent
and 60 percent of the total available memory on your server.
SGA_MAX_SIZE
This parameter sets the upper boundary for SGA_TARGET. Make sure that you increase
it accordingly to accommodate any increases to SGA_TARGET.
PGA_AGGREGATE_TARGET
This parameter sets a target for all PGA memory that Oracle will try to stay within.
PGA_AGGREGATE_TARGET is important to OLAP developers because, in dedicated
server mode, it defines (via the PGA Advisor) the size of the OLAP Page Pool; you will
therefore want to increase it (to, say, 200MB to 400MB, or up to 40 percent of total
memory if necessary) when working with analytic workspaces in this scenario. Note that
PGA_AGGREGATE_TARGET is initially set to 20 percent of SGA_TARGET when you
first enable ASMM.
WORKAREA_SIZE_POLICY
Used in conjunction with PGA_AGGREGATE_TARGET, this enables the PGA Advisor,
which automatically sizes areas such as the OLAP Page Pool within the PGA (when in
dedicated server mode).
OLAP_PAGE_POOL_SIZE
As noted above, the OLAP Page Pool size is managed for you automatically by the PGA
Advisor when running in dedicated server mode, but is set manually when running in
shared server
mode.
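Before changing these parameters, it can help to check a proposed configuration against the guidelines above. The Python function below is a hypothetical checklist, not an Oracle tool: it encodes the 50 to 60 percent range for SGA_TARGET, the requirement that SGA_MAX_SIZE is at least SGA_TARGET, and the 40 percent ceiling for PGA_AGGREGATE_TARGET, and it reports the implied page pool cap for dedicated server mode. The 2GB server in the example is an assumption.

```python
def check_olap_memory(total_mb, sga_target_mb, sga_max_size_mb, pga_target_mb):
    """Return (warnings, auto page pool cap in MB) for a proposed setup."""
    warnings = []
    if not 0.50 * total_mb <= sga_target_mb <= 0.60 * total_mb:
        warnings.append("SGA_TARGET is outside 50-60% of server memory")
    if sga_max_size_mb < sga_target_mb:
        warnings.append("SGA_MAX_SIZE is below SGA_TARGET")
    if pga_target_mb > 0.40 * total_mb:
        warnings.append("PGA_AGGREGATE_TARGET is above 40% of server memory")
    # Dedicated server, WORKAREA_SIZE_POLICY = AUTO: pool capped at 50% of PGA.
    page_pool_cap_mb = pga_target_mb / 2
    return warnings, page_pool_cap_mb


# Hypothetical 2GB server with SGA_TARGET = 1G and PGA_AGGREGATE_TARGET = 400M.
print(check_olap_memory(2048, 1024, 1024, 400))  # ([], 200.0)
```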
SQL> show parameter sga_target
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sga_target                           big integer 1G
SQL> show parameter pga_aggregate_target
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target                 big integer 400M
SQL> show parameter olap_page_pool_size
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
olap_page_pool_size                  big integer 0
SQL> SELECT 'OLAP Pages Occupying: '
       || ROUND ((  (SELECT SUM (NVL (pool_size, 1)) FROM v$aw_calc)
                  / (SELECT VALUE FROM v$pgastat WHERE NAME = 'total PGA inuse')),
                 2)
          * 100
       || '%' info
  FROM DUAL
UNION
SELECT 'Total PGA Inuse Size: ' || VALUE / 1024 || ' KB' info
  FROM v$pgastat
 WHERE NAME = 'total PGA inuse'
UNION
SELECT 'Total OLAP Page Size: '
       || ROUND (SUM (NVL (pool_size, 1)) / 1024, 0)
       || ' KB' info
  FROM v$aw_calc
ORDER BY info DESC;
INFO
---------------------------------------------------------------
Total PGA Inuse Size: 11462 KB
Total OLAP Page Size: KB
OLAP Pages Occupying: %
SQL> SELECT vs.username, vs.SID,
       ROUND (pga_used_mem / 1024 / 1024, 2) || ' MB' pga_used_mb,
       ROUND (pga_max_mem / 1024 / 1024, 2) || ' MB' pga_max_mb,
       ROUND (pool_size / 1024 / 1024, 2) || ' MB' olap_mb,
       ROUND (100 * (pool_hits - pool_misses) / pool_hits, 2) || ' %' olap_ratio
  FROM v$process vp, v$session vs, v$aw_calc va
 WHERE session_id = vs.SID AND addr = paddr;
no rows selected
Now attach the GSW_AW analytic workspace:
SQL> exec dbms_aw.execute('aw attach gsw_aw.gsw_aw');
PL/SQL procedure successfully completed.
Now run the above statements again. You can see from the statement output how much
space is taken up by the OLAP Page Pool as a whole for the instance.
SQL> SELECT 'OLAP Pages Occupying: '
       || ROUND ((  (SELECT SUM (NVL (pool_size, 1)) FROM v$aw_calc)
                  / (SELECT VALUE FROM v$pgastat WHERE NAME = 'total PGA inuse')),
                 2)
          * 100
       || '%' info
  FROM DUAL
UNION
SELECT 'Total PGA Inuse Size: ' || VALUE / 1024 || ' KB' info
  FROM v$pgastat
 WHERE NAME = 'total PGA inuse'
UNION
SELECT 'Total OLAP Page Size: '
       || ROUND (SUM (NVL (pool_size, 1)) / 1024, 0)
       || ' KB' info
  FROM v$aw_calc
ORDER BY info DESC;
INFO
---------------------------------------------------------------
Total PGA Inuse Size: 21107 KB
Total OLAP Page Size: 5271 KB
OLAP Pages Occupying: 25%
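The OLAP Pages Occupying figure is simply the page pool total divided by the PGA in use, rounded to two decimal places and scaled to a percentage. The Python sketch below mirrors that ROUND(..., 2) * 100 expression using the KB figures from the output above; the query itself works in bytes, but the ratio is the same apart from rounding. Python's round() operates on binary floats, so in rare halfway cases it can differ from Oracle's decimal ROUND.

```python
def pages_occupying_pct(olap_pool_kb, pga_inuse_kb):
    """Mirror of ROUND(pool_size_total / pga_inuse, 2) * 100 from the query."""
    return round(olap_pool_kb / pga_inuse_kb, 2) * 100


print(f"{pages_occupying_pct(5271, 21107):.0f}%")  # 25%
```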
You can also run the following query to view PGA and OLAP Page Pool usage for a
particular user:
SQL> SELECT vs.username, vs.SID,
       ROUND (pga_used_mem / 1024 / 1024, 2) || ' MB' pga_used_mb,
       ROUND (pga_max_mem / 1024 / 1024, 2) || ' MB' pga_max_mb,
       ROUND (pool_size / 1024 / 1024, 2) || ' MB' olap_mb,
       ROUND (100 * (pool_hits - pool_misses) / pool_hits, 2) || ' %' olap_ratio
  FROM v$process vp, v$session vs, v$aw_calc va
 WHERE session_id = vs.SID AND addr = paddr;
USERNAME   SID        PGA_USED_M PGA_MAX_MB OLAP_MB    OLAP_RATIO
---------- ---------- ---------- ---------- ---------- ----------
GSW_AW     141        9.72 MB    10.37 MB   5.15 MB    98.53 %
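The OLAP_RATIO column is computed as 100 * (pool_hits - pool_misses) / pool_hits. The Python sketch below reproduces that expression; the hit and miss counts in the example are hypothetical, chosen only to give a figure of the same form as the output above. The guard for zero hits is an addition of this sketch: in SQL the same division would fail with ORA-01476 (divisor is equal to zero).

```python
def olap_ratio(pool_hits, pool_misses):
    """100 * (hits - misses) / hits, rounded to 2 places, as in OLAP_RATIO."""
    if pool_hits == 0:
        return None  # avoid division by zero for sessions with no pool hits yet
    return round(100 * (pool_hits - pool_misses) / pool_hits, 2)


# Hypothetical counters for one session.
print(olap_ratio(10_000, 147))  # 98.53
```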
Conclusion
Oracle OLAP, though based on Express Server technology, is new as an option to the
Oracle RDBMS, and as time progresses, techniques and approaches are being developed to
optimise data loads, aggregation and user queries. This article sets out some tips and best
practices for designing your cube. In part 2 of this article, we will cover loading,
aggregating, and querying Oracle OLAP cubes, and highlight some of the new features
coming with 10g Release 2. As adoption of Oracle OLAP continues, more techniques and
best practices will be documented, and I would be very interested in hearing any
feedback or approaches that readers have used.
-Mark Rittman is a Certified Oracle Professional DBA and works as a consultant on
Oracle BI and Data Warehousing projects. Mark also chairs the UK Oracle User Group
BI & Reporting Tools SIG and is an Oracle ACE.
Mark would also like to thank Heiko Becker, Chris Chiappa, Jameson White, and
Anthony Waite for their contributions to and technical review of this article.