Oracle 8i Analytical Functions Guide
Introduction
The purpose of this article is to introduce some of the analytical functions that are new in ORACLE 8i. After
reading Oracle’s documentation on the functions, I feel certain that most users will have, or did have, trouble understanding
exactly what some of the options are. The windowing clause options, in particular, were poorly documented and required
a lot of testing to determine exactly what the options were and, even more importantly, when they were permitted.
Numerous examples are included in this article to explain the various options.
Not all of the functions are covered here, due to time constraints. The regression analysis functions should be
self-explanatory once the functions covered in this article are understood. Not being a statistician, I avoided some of the
statistical functions like the plague.
Given my special interest in SQL, I found that these new functions also provide a far superior way of specifying complex
queries and of listing aggregates alongside the details used to compute them. Numerous examples are included below, and in
certain cases execution statistics are listed to illustrate the significant performance improvements that can be attained
with the new functions.
Objective of Functions
While these functions can be implemented with standard SQL, their benefits are:
• simplicity of specification
• reduced network traffic
• moving processing to the server
• superior performance over previous SQL techniques
Simplicity
In the early days of Oracle Corporation I would demo ORACLE to prospective clients and tell them that the beauty of the
relational approach is that any query can be formulated with SQL. Fortunately, only one client ever asked me (using
ORACLE’s demo database) to list the sum of salaries by department and compare it against all other departments (i.e.,
what percentage of the total company salaries each department’s sum represented). Having demonstrated ORACLE for
years, I immediately hedged by saying you must first create a view. The following view was required for the solution:
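The original view listing is not reproduced in this copy of the article. A minimal sketch of the kind of view that would have been needed, assuming the standard EMP demo table and a hypothetical view name TOT_SAL (neither the name nor the column is the author's), might look like this:

```sql
-- Hypothetical helper view: one row holding the company-wide salary total.
CREATE VIEW tot_sal AS
SELECT SUM(sal) AS total_sal
FROM emp;

-- Each department's salary sum as a fraction of the company total.
SELECT e.deptno,
       SUM(e.sal)               AS dept_sal,
       SUM(e.sal) / t.total_sal AS pct_of_total
FROM emp e, tot_sal t
GROUP BY e.deptno, t.total_sal;
```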
The problem here was having the appropriate views created in advance.
Another problem encountered years ago was trying to phrase a query to find the top 10 salesmen. To illustrate let’s look at
the top 2 salaries in the EMP table.
www.nyoug.org                                                 1                                             212-978-8890
SELECT * FROM emp e1
WHERE EXISTS
      (SELECT null FROM emp e2
       WHERE e2.sal > e1.sal
        AND e1.rowid != e2.rowid
       HAVING COUNT(*) <2)
SQL 2
Specifying SQL like this is beyond the average user, and it’s inefficient because there is no way to inform ORACLE what
we are trying to achieve.
The following query uses a standard aggregate with a GROUP BY to produce the sum of salaries per group defined by the
same job and deptno.
        DEPTNO         JOB               SUM(SAL)
        10             CLERK             1300
        10             MANAGER           2450
        10             PRESIDENT         5000
        20             ANALYST           6000
        20             CLERK             1900
        20             MANAGER           2975
        30             CLERK             950
        30             MANAGER           2850
        30             SALESMAN          5600
        Table 1
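The query behind Table 1 is not reproduced in this copy. Assuming the standard EMP demo table, a query of roughly this shape would produce those rows (the alias SUM_SAL is an assumption):

```sql
-- Standard aggregate: one output row per (deptno, job) group.
SELECT deptno, job, SUM(sal) AS sum_sal
FROM emp
GROUP BY deptno, job
ORDER BY deptno, job;
```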
The following illustrates an analytical function that produces the same sum but lists it with all the details.
The main difference to recognize at this point is that the analytical aggregates do not compress the groups of rows into a
single row, as the standard aggregate does. That means the analytical functions can also be applied to a SQL module
containing a GROUP BY. However, when the SQL module does have a GROUP BY, the only columns or expressions that
can be referenced by the analytical functions are the columns/expressions that are being grouped, plus the other
aggregates.
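As a rough illustration of both points (not the author's original listing), the first query below lists every EMP row together with the salary sum of its (deptno, job) group, and the second applies an analytical SUM on top of a GROUP BY. The standard EMP demo table is assumed and the aliases are made up for the sketch:

```sql
-- Analytical aggregate: the detail rows are kept, and each row carries its group's sum.
SELECT empno, deptno, job, sal,
       SUM(sal) OVER (PARTITION BY deptno, job) AS sum_sal
FROM emp
ORDER BY deptno, job, empno;

-- With a GROUP BY, the analytical function may reference only the grouped
-- columns and the aggregates; here it totals the per-job sums per department.
SELECT deptno, job,
       SUM(sal)                                 AS sum_sal,
       SUM(SUM(sal)) OVER (PARTITION BY deptno) AS dept_sal
FROM emp
GROUP BY deptno, job;
```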
Partitions
The analytical functions operate on groups of rows called partitions. The syntax for the SUM analytical function is as
follows:
The PARTITION clause is optional. If the PARTITION clause is not used the set of rows operated on by the analytical
function is the entire result set. This is analogous to the standard aggregate when there is no GROUP BY clause. For
example, the following uses the SUM analytical function to retrieve the total salaries for all EMP rows, and lists it with
each individual EMP row, allowing us to determine what percentage of the total salaries an EMP’s salary is.
        EMPNO         PERCENT
        7369          .027562446
        7499          .055124892
        7521          .043066322
        7566          .102497847
        7654          .043066322
        7698          .098191214
        7782          .084409991
        7788          .103359173
        7839          .172265289
        7844          .051679587
        7876          .037898363
        7900          .032730405
        7902          .103359173
        7934          .044788975
        Table 3
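The statement that produced Table 3 is missing from this copy. A sketch that should give the same figures, assuming the standard EMP demo table, uses an empty OVER clause so that the partition is the entire result set:

```sql
-- No PARTITION BY: the SUM is computed over all EMP rows and repeated on each row.
SELECT empno,
       sal / SUM(sal) OVER () AS percent
FROM emp;
```

For EMPNO 7369 this is 800 divided by the company total of 29025, which matches the first row of the table.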
Execution Plan
So what’s really happening within ORACLE? The execution plan for SQL 4 in figure 1 shows the sorting used to produce
the output of the analytical function in step 2. After the normal criteria and grouping (if a GROUP BY is part of the
syntax), a scan and sort is performed on the result set to produce the analytical function output.
Figure 1
SQL 2 above illustrates retrieving the top 2 highest paid employees in the EMP table. ORACLE now provides two
analytical functions that rank the rows in the result set based on a set of columns. There are two functions because
ranking semantics fall into two categories: one where rank values are skipped due to ties and one where they are not. The
functions are RANK and DENSE_RANK. DENSE_RANK is the one that doesn’t skip values.
To illustrate the idea of skipping values, the following SQL ranks the EMP rows by SAL using both functions.
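The original listing is not reproduced here. A query along the following lines, assuming the standard EMP demo table and using the made-up aliases RNK and DENSE_RNK, would show the two functions side by side:

```sql
-- Both functions rank by ascending SAL; they differ only in how ties affect later ranks.
SELECT empno, sal,
       RANK()       OVER (ORDER BY sal) AS rnk,
       DENSE_RANK() OVER (ORDER BY sal) AS dense_rnk
FROM emp;
```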
The results are displayed in Table 4. Check where the SAL values are the same. The first location is highlighted in yellow.
Both EMPNO = 7521 and 7654 have a SAL of 1250. Both RANK and DENSE_RANK give the tied SAL values the
same rank; it’s the subsequent SAL values where the ranking differs. With RANK, since two rows tie for a rank
of 4, the rank of 5 is skipped, making 6 the next rank value, whereas with DENSE_RANK rank values are not skipped, so the
next value is 5. That row (EMPNO = 7934) is highlighted in green (dark shading).
        EMPNO       SAL      RANK      DENSE_RANK
        7369        800      1         1
         7900       950      2         2
         7876       1100     3         3
         7521       1250     4         4
         7654       1250     4         4
         7934       1300     6         5
         7844       1500     7         6
         7499       1600     8         7
         7782       2450     9         8
         7698       2850     10        9
         7566       2975     11        10
         7788       3000     12        11
         7902       3000     12        11
         7839       5000     14        12
        Table 4
Note that giving tied rows the same rank is important, since they have the same value.
The syntax for the RANK and DENSE_RANK functions is the same. The syntax follows:
The RANK function itself does not take an argument. As always, the PARTITION clause, which groups rows of the result
set for input to the analytical function, is optional. If it is omitted, the entire result set is the partition. RANK and
DENSE_RANK require an ORDER BY, since the rows must be sorted by the columns the ranking is applied
to. As with the standard ORDER BY, the sort order can be specified with ASC or DESC for each ORDER BY
column/expression. Also like the standard ORDER BY, nulls can appear last or first for each ORDER BY item. The
default depends on whether you are ordering by ASC or DESC. When ordering by ASC, nulls appear last by default; when
ordering by DESC, they appear first.
Note that the Data Warehousing Guide shows the “[collate clause]”. Who knows what they were thinking, but just
disregard it.
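Since the syntax diagram itself did not survive in this copy, the comment below restates the general shape as described in the text, followed by a small example. The example assumes the standard EMP demo table (whose COMM column contains nulls), and the alias COMM_RNK is made up for the sketch:

```sql
-- General shape (square brackets mark optional parts):
--   RANK() OVER ( [PARTITION BY expr, ...]
--                 ORDER BY expr [ASC | DESC] [NULLS FIRST | NULLS LAST], ... )
-- DENSE_RANK takes the same clauses.

-- Rank commissions within each department, highest first.
-- DESC would put nulls first by default, so NULLS LAST is requested explicitly.
SELECT empno, deptno, comm,
       RANK() OVER (PARTITION BY deptno
                    ORDER BY comm DESC NULLS LAST) AS comm_rnk
FROM emp;
```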
From Query
Ever wonder why ORACLE introduced the ability to place a SQL statement in the FROM clause of a SQL module?
Initially it provided a means of sidestepping the creation of a view. The real significance is the ability to filter the results
of a SQL statement relative to the selected items. This becomes especially important with analytical functions, since they
cannot appear in the WHERE clause. The work-around is to embed the SQL in the FROM clause of another SQL module
and then reference the result set in the WHERE clause. For instance, the analogous SQL to produce the results of SQL 2
appears in SQL 7.
The main query, whose results are ranked, is highlighted in bold. In order to return only the top 2, the query must be
embedded in a FROM clause and have the WHERE clause filter the rows.
First note that the intention is to return the top 2 paid employees. That means we must sort by SAL, and the sort must be
in descending order since the first sort row will get the rank of 1. If the rows are sorted in ascending order the rank of 1
identifies the bottom paid employees.
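The listing being described is missing from this copy. A sketch of the technique, assuming the standard EMP demo table and the alias RANK_VALUE mentioned in the text, could read:

```sql
-- The inner query ranks every employee by salary, highest first;
-- the outer query can then filter on the rank, which a WHERE clause
-- at the same level as the analytical function could not do.
SELECT *
FROM (SELECT empno, ename, sal,
             RANK() OVER (ORDER BY sal DESC) AS rank_value
      FROM emp)
WHERE rank_value <= 2;
```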
To highlight the difference between RANK and DENSE_RANK, consider what SQL 7 would have produced if 2
employees tied for 1st place. Those 2 employees would both have a RANK value of 1, and EMPNO = 7788 and 7902 would
have a RANK value of 3. But if we used the DENSE_RANK function, both EMPNO = 7788 and 7902 would have a
DENSE_RANK of 2. So the criterion “rank_value <= 2” works for the RANK function, but would have produced the
wrong answer if DENSE_RANK had been used.
Certain types of top-bottom queries are more complex when the top or bottom members are based on an aggregate. For
example, the TIME_SHEETS table lists the hours worked per project per employee. To list the top 5 employees who
worked the most hours would require the following solution:
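SELECT *
FROM
     (SELECT emp_seq, SUM(hours) AS sum_hrs
      FROM time_sheets
      GROUP BY emp_seq)
WHERE 5 >
     -- number of employees whose total hours exceed this employee's total;
     -- fewer than 5 such employees means this employee is in the top 5
     (SELECT COUNT(COUNT(*))
      FROM time_sheets
      GROUP BY emp_seq
      HAVING SUM(hours) > sum_hrs)
SQL 8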
Unfortunately SQL 8’s execution would not finish in “your lifetime”. (I executed the SQL for over 24 hours and then
cancelled it.) SQL 8 groups the entire table to total the hours per employee and then, for each employee, the correlated
subquery has to recompute the total hours per employee and filter out those who did not work more hours. The
count is then compared against 5, since we want to list only the top 5 workers. To make this work you have no choice but
to embed the initial GROUP BY in the FROM clause of the main SQL module; otherwise there is no way to reference the
sum of hours for an employee in the correlated subquery.
SELECT *
 FROM (SELECT emp_seq, SUM(hours),
      RANK () OVER (ORDER BY SUM(hours) DESC) AS rnk
       FROM time_sheets
       GROUP BY emp_seq)
 WHERE rnk <= 5
SQL 9
Only one grouping of the data is necessary. And performance is reasonable for a TIME_SHEETS table with 13,939,925
rows. The execution statistics are listed in figure 2.
Figure 2
Note that SQL 8 and 9 did not account for NULLs. In both cases you can simply eliminate the NULLs with a WHERE
clause, or in SQL 8, you can order the results and request NULLs to appear first or last. The ORDER BY clause in SQL 9
allows the same type of NULL handling.
One final example ranks the employees by their hiredate and birthdate in descending order enabling us to obtain the last
10 employees hired, and if there is a tie, the youngest employee is ranked lower. SQL 10 below uses standard SQL.
Specifying SQL 10 is not intuitive, though it does make sense if you consider the request carefully. It’s
basically the subquery that’s difficult. As with the RANK function, if the primary column, HIREDATE, is equal, then the
tie breaker is the BIRTHDATE column. So we OR a criterion stating that if the HIREDATEs are equal, then the
BIRTHDATE of the subquery must be less than that of the outer query. In other words, the subquery returns, for each
employee in the outer query, the number of employees that have a more recent hiredate plus, when the hiredate is the
same, the number of employees with the lesser birthdate. The complexity only increases as the number of columns involved
in the ranking increases. But not so with the RANK function. SQL 11 accomplishes the same task and is trivial compared to
SQL 10. Adding more columns to the ranking only means adding them to the ORDER BY clause of the RANK.
The execution of SQL 10 took over 30 minutes, while using the RANK function in SQL 11 took a fraction of a second.
(The EMPLOYEES table contains 15,000 rows.)
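Neither listing survives in this copy. The RANK-based form is short enough to sketch; the EMPLOYEES table and its HIREDATE and BIRTHDATE columns come from the text, while the key column EMP_SEQ and the direction of the tie breaker are assumptions:

```sql
-- Last 10 employees hired; ties on HIREDATE are broken by BIRTHDATE.
-- Adding another ranking column only means extending the ORDER BY list.
SELECT *
FROM (SELECT emp_seq, hiredate, birthdate,
             RANK() OVER (ORDER BY hiredate DESC, birthdate DESC) AS rnk
      FROM employees)
WHERE rnk <= 10;
```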
Ranking Subtotals
When performing data analysis using the CUBE or ROLLUP functions, often it’s the subtotals and totals that need to be
ranked. The key to specifying the ranking involves the GROUPING function which allows us to determine when the row
contains a subtotal or total. GROUPING of a column that is part of the ORDER BY clause of the RANK function returns
1 when the NULL is due to a subtotal or total.
Using the EMP and DEPT tables the listing of the average salary by department, all departments, job and all jobs is
simple. To filter out the details, use the HAVING clause.
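SELECT DECODE(GROUPING(dname), 1, 'All Departments', dname) AS dname,  -- line reconstructed; the label text is an assumption
      DECODE(GROUPING(job), 1, 'All Jobs', job) AS job,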
      COUNT(*) "Total Empl",    AVG(sal) * 12 "Average Sal",
      RANK() OVER (PARTITION BY GROUPING(dname), GROUPING(job)
            ORDER BY COUNT(*) DESC) AS rnk
FROM emp, dept
WHERE dept.deptno = emp.deptno
GROUP BY CUBE (dname, job)
HAVING GROUPING(dname) = 1 OR GROUPING(job) = 1
SQL 12
Windowing Functions
Certain analytical functions operate on a subset of rows within a partition. These subsets are referred to as windows.
There are two types of windows that can be specified: physical and logical. Physical means a specific number of
rows, whereas logical means the window is based on the ORDER BY value (in certain circumstances only one
column/expression can occur in the ORDER BY). The syntax to specify a window follows the ORDER BY syntax (the ORDER BY
is mandatory):
The ROWS keyword refers to a physical window and RANGE to a logical window. The other keywords are relative to the
current row. But the current row has different meanings for physical and logical windows.
Logical Windows
To better understand the difference between physical and logical windows, let’s start with the logical window, since
physical windows should be simple enough to understand.
The following query uses the EMP table to list the sum of salaries for employees with a lower or equal salary. The logical
window only specifies an upper limit.
         EMPNO        SAL        SUM_SAL
         7369         800        800
         7900         950        1750
         7876         1100       2850
         7521         1250       5350
         7654         1250       5350
         7934         1300       6650
         7844         1500       8150
         7499         1600       9750
         7782         2450       12200
         7698         2850       15050
         7566         2975       18025
         7788         3000       24025
         7902         3000       24025
         7839         5000       29025
         Table 7
The rows in yellow (shading) both have the same SUM_SAL value. This is the key to understanding logical windows.
The point here is that CURRENT ROW refers to all rows that have the same value of the ORDER BY column. Since both
highlighted employees have the same SAL, both values are added to the sum for EMPNO=7521.
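The query that produced Table 7 is missing from this copy. A sketch that should reproduce the SUM_SAL column, assuming the standard EMP demo table, is shown below; the second column (an addition for contrast, with the made-up alias SUM_SAL_ROWS) applies the same bounds as a physical window:

```sql
SELECT empno, sal,
       -- Logical window: CURRENT ROW covers every row with the same SAL,
       -- so the two rows with SAL = 1250 both get 5350.
       SUM(sal) OVER (ORDER BY sal
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_sal,
       -- Physical window: CURRENT ROW is exactly one row, so tied rows get
       -- different sums (which tied row comes first is not defined).
       SUM(sal) OVER (ORDER BY sal
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_sal_rows
FROM emp;
```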
To further illustrate the point, the following query computes the sum of the DEPTNO values (never mind that the query
makes no sense).
The yellow (light shaded) highlighted row in Table 8 shares its DEPTNO value of 30 with other rows, but the window is based
on SAL values less than or equal to the current row’s SAL, since the ORDER BY is on SAL. The red (dark shaded) rows have
the same SAL value, so the SUM_DEPTNO value is the same for both rows.
Date Intervals
If the ORDER BY is over a date column, it would be helpful to specify an interval without having to consider the actual
physical values. When using a logical window specification (and only with logical windows) where the ORDER BY
column/expression is a date, you can easily specify date intervals in terms of days, months or years. This feature gives you
the ability to specify sliding date windows for requests such as summarizing outstanding invoices. Combine this with the
CASE function and you can easily request invoices “30 days outstanding”, “60 days…”, etc.
To illustrate some of the interval syntax, I downloaded historical stock pricing for ORCL from ’01-Dec-00’ to ’14-Dec-01’.
The moving average for 30 days is returned in SQL 15, along with the average for the next 30 days from the current
date.
When BETWEEN is not used, ORACLE treats the supplied value as the start-point and the current row as the end-point.
So PRV_30 averages the stock prices from 30 days preceding the current row through the current row. FOL_30 averages the
prices from the current row through 30 days following.
If you want to compare PRV_30 and FOL_30, embed the SQL in a FROM clause. For example if SQL 15 was embedded
in a FROM clause, a criterion could be applied to the outer query to return only those rows where the difference between
PRV_30 and FOL_30 is more than 25% of PRV_30. Other types of analysis can easily be performed to compare an
increase in the moving average with the change in volume.
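The stock query itself is not reproduced in this copy. The sketch below shows one way the two moving averages and the 25% filter could be written. STOCK_QUOTES, PRV_30 and FOL_30 come from the text; the column names QUOTE_DATE and CLOSE_PRICE are assumptions:

```sql
SELECT *
FROM (SELECT quote_date, close_price,
             -- No BETWEEN: the window runs from 30 days back to the current row.
             AVG(close_price) OVER (ORDER BY quote_date
                 RANGE INTERVAL '30' DAY PRECEDING) AS prv_30,
             -- A forward-looking window needs BETWEEN with an explicit start-point.
             AVG(close_price) OVER (ORDER BY quote_date
                 RANGE BETWEEN CURRENT ROW
                           AND INTERVAL '30' DAY FOLLOWING) AS fol_30
      FROM stock_quotes)
-- The analytical results can only be compared in an outer query.
WHERE ABS(fol_30 - prv_30) > 0.25 * prv_30;
```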
The Data Warehousing Guide illustrates the INTERVAL syntax using DAYS/MONTHS/YEARS. Drop the S in the time
categories to compile without error. I couldn’t find anything in the SQL Reference Manual.
ORACLE provides two other functions to assist in the specification of a time interval: NUMTODSINTERVAL and
NUMTOYMINTERVAL. The syntax is as follows:
The DS in NUMTODSINTERVAL stands for Day to Second, and the YM in NUMTOYMINTERVAL stands for Year to Month. Because
the first parameter is numeric, you can use another numeric column as the first parameter of NUMTODSINTERVAL. Using the
STOCK_QUOTES table, you can specify a logical window as:
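The syntax and the example did not survive in this copy. As a hedged sketch, the comment below gives the general shape of the two functions, and the query uses NUMTODSINTERVAL with a literal 30 in place of the interval literal. The column names QUOTE_DATE and CLOSE_PRICE are assumptions:

```sql
-- General shape:
--   NUMTODSINTERVAL(n, 'DAY' | 'HOUR' | 'MINUTE' | 'SECOND')
--   NUMTOYMINTERVAL(n, 'YEAR' | 'MONTH')

-- 30-day trailing average expressed with a computed interval.
SELECT quote_date, close_price,
       AVG(close_price) OVER (ORDER BY quote_date
           RANGE NUMTODSINTERVAL(30, 'DAY') PRECEDING) AS prv_30
FROM stock_quotes;
```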
SELECT emp_seq, effective_date, sal,
MAX(sal) OVER (ORDER BY effective_date DESC
RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) AS Max_Sal
FROM sal_history
SQL 16
So in the logical world, what does “1 PRECEDING” mean? Based on the earlier finding, which was also not documented
well, CURRENT ROW should refer to the group of rows having the same EFFECTIVE_DATE, since that’s what we
ordered by. Does ‘1 PRECEDING’ mean the previous logical group? The results of the query are displayed in Table 9.
The rows in the same logical group are highlighted with the same color. If ‘1 PRECEDING’ actually meant one logical
row preceding the current row, then MAX_SAL for 1001 should be 500, but instead it’s 300, which is the maximum SAL
for that logical group. The same goes for all the other logical groups.
So to make sense of this, you first have to consider what the rows in the partition are ordered by: a date column. It
turns out that since the sort column is a date column, ‘1 PRECEDING’ means ‘1 DAY PRECEDING’. To check this,
change the 1 to a 5, since ’11-JAN-01’ is 5 days after ’06-JAN-01’.
         Table 10
Now what happens when the ORDER BY column is numeric? The following is similar to SQL 17 except that the ORDER
BY is on SAL.
If you look at the results it’s clear that ‘1 PRECEDING’ doesn’t mean 1 logical row. Just as with the date column, it means
units of SAL. SQL 19 uses a value of 100.
Now the results make sense. Logical appears to always refer to the value of the ORDER BY. That might explain why
logical windows are limited to one ORDER BY column/expression when a specific numeric value is given for the
PRECEDING keyword. The next logical question is what happens when sorting by a character column. This is something else
that is never mentioned in the manuals. I tried the following SQL to see what it would generate.
And all it generated was the error “ORA-00902: invalid datatype”. So I guess we should assume that you just can’t do that;
but as you’ll see, you can sort by character columns when the window is a physical window.
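The listings referred to above are missing from this copy. To make the numeric case concrete, here is a separate sketch on the standard EMP demo table (not the author's SAL_HISTORY query); the alias CNT_WITHIN_100 is made up:

```sql
-- Logical window on a numeric ORDER BY: "100 PRECEDING" means SAL values
-- down to (current SAL - 100), not 100 rows.
SELECT empno, sal,
       COUNT(*) OVER (ORDER BY sal
                      RANGE BETWEEN 100 PRECEDING AND CURRENT ROW) AS cnt_within_100
FROM emp;
```

For the row with SAL = 1300, the window covers SAL values from 1200 to 1300, which in the EMP data are the two 1250 rows plus the row itself, so the count is 3.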
Physical Windows
Physical windows are pretty straightforward, except for when the window is limited by the number of rows. For instance,
you can specify the end-points as either the boundaries of the partition or a specified number of rows. Just use ROWS
instead of RANGE to indicate a physical window. SQL 20 is rewritten below as a physical window instead:
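The rewritten statement is not reproduced in this copy. A sketch of a physical window ordered by a character column, assuming the standard EMP demo table and ordering on ENAME (the author's actual table and columns are unknown), might be:

```sql
-- ROWS counts physical rows, so the datatype of the ORDER BY column
-- does not matter: each sum covers the previous row and the current row.
SELECT empno, ename, sal,
       SUM(sal) OVER (ORDER BY ename
                      ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS sum_sal
FROM emp;
```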
The results are as you would expect. So where would you use a physical window? A good example is historical data. For
example, the SAL_HISTORY table contains a history of all salaries per employee. Determining the amount of each raise
requires sorting the rows per employee in descending order of effective date and then comparing the current row with the
next row. The last row in each partition (by EMP_SEQ) is the first salary assigned to the employee; it has no following
row, so there was no raise and the computed difference is zero. We must eliminate the last row of each partition.
The LAST_VALUE function allows us to select the last row in the window. FIRST_VALUE selects the first row.
The MIN function is included to get the date on which each employee was first given a salary. We can compare that
with the EFFECTIVE_DATE; if they are equal, we don’t return the row. The results in Table 12 illustrate
the data from SQL 22.
Each partition is shaded in a different color. The first SAL_HISTORY row for each employee has the
EFFECTIVE_DATE and FIRST_SAL in bold making it easy to see which row to exclude.
Recall that in order to compare the aggregate with the column we need to embed the query in a FROM clause and then use
a WHERE clause to filter out the first SAL_HISTORY row per employee. The final solution is SQL 23.
SELECT *
FROM (SELECT emp_seq, sal, effective_date,
             sal - LAST_VALUE(sal) OVER
                   (PARTITION BY emp_seq ORDER BY effective_date DESC
                    ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS raise,
             MIN(effective_date) OVER
                   (PARTITION BY emp_seq ORDER BY effective_date) AS first_sal
      FROM sal_history)
WHERE effective_date != first_sal
SQL 23
How would you specify that query without the analytical functions? And more importantly, how much of a performance
gain do you get? SQL 24 performs the same task as SQL 23 but doesn’t use analytical functions. It requires a self-join in
order to get a SAL_HISTORY row joined to the previous SAL_HISTORY row. The self-join isn’t simple because the
EFFECTIVE_DATEs have to be joined via a correlated subquery.
Figure 3 shows the execution statistics where “SQL 1: /TUTORIAL” in the figure is SQL 24 above, and “SQL 2:
/TUTORIAL” is SQL 23 above. The performance is significantly better in all aspects.
Figure 3
Defaults
If you look carefully you’ll find a small note in the Data Warehousing Guide indicating what the default is when the
windowing clause is omitted from a windowing function. The default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
But this only applies to a windowing function. RANK, for instance, is not a windowing function, and some functions
such as SUM, AVG, MIN, etc., can be used either as windowing functions or not. This makes it difficult to know what
will happen by default.
The following SQL uses the SUM function but does not specify a PARTITION, ORDER BY or windowing clause. By
default, the PARTITION is the entire result set.
The result of the SUM function in SQL 25 is the total of all salaries. The default windowing clause does not apply here
because there is no ORDER BY. Recall that to specify a windowing clause, you must have an ORDER BY clause.
By adding an ORDER BY clause to SQL 25, we get SQL 26:
TOT_SAL looks like a running total, but it is not exactly that. Because an ORDER BY is specified, the default windowing is
applied. That means for a given row, all SAL values from the beginning of the partition up to the current row will be
summed.
Because the default window is a logical window, rows with duplicate ORDER BY values are treated as a
single row. For example, the yellow shaded rows have the same SAL, 1250. Therefore both rows have the same
TOT_SAL value, which is the sum of all SAL values for rows prior to the SAL equal to 1250, plus the sum of the rows with
a SAL of 1250.
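The two statements discussed in this section are missing from this copy. The sketch below puts the three cases side by side on the standard EMP demo table; the aliases are made up for the illustration:

```sql
SELECT empno, sal,
       -- No ORDER BY: no window applies, so every row shows the grand total.
       SUM(sal) OVER ()             AS tot_all,
       -- ORDER BY without a windowing clause: the default logical window applies.
       SUM(sal) OVER (ORDER BY sal) AS tot_default,
       -- The same window written out explicitly; it should match tot_default on every row.
       SUM(sal) OVER (ORDER BY sal
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tot_explicit
FROM emp;
```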
RATIO_TO_REPORT
The RATIO_TO_REPORT function computes the ratio of the column/expression to the total of the
column/expression for all rows in the partition. An ORDER BY is not permitted, which in turn means a window clause is
not permitted.
In the following example, we query the total hours per employee per project, and list what portion of an employee’s total
hours was worked on each project. ORACLE must first compute the sum of hours per project
per employee and then total the sums before comparing the hours on a project to the total. Note that the parameter to the
function does not have to be listed separately on the SELECT list, as was done in SQL 27. Also note that the parameter to
RATIO_TO_REPORT is an aggregate.
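SELECT emp_seq, proj_seq, SUM(hours),  -- line reconstructed from the GROUP BY and the surrounding text
      RATIO_TO_REPORT(SUM(hours))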
            OVER ( PARTITION BY emp_seq ) AS ratio
FROM time_sheets
GROUP BY emp_seq, proj_seq
SQL 27
LAG/LEAD
The LAG and LEAD functions are analogous to the FIRST_VALUE and LAST_VALUE functions, in the sense that each
of the functions returns a specific value from another row in a partition. FIRST_VALUE – a windowing function –
references the first row in the window and returns the value of the parameter. LAG – not a windowing function – can
reference any row previous to the current row using an optional offset.
LAG has 1 mandatory and 2 optional parameters. The first parameter is the item in a row to return; the second parameter
is the offset from the current row that identifies the row the value is returned from; the last parameter is the default to
return if the offset moves outside of the partition.
The FIRST_VALUE function in SQL 28 will always return the same value since there is only one partition. LAG will
always return the SAL value from 2 rows previous to the current row.
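The listing is not reproduced in this copy. A sketch matching the description, assuming the standard EMP demo table, an ORDER BY on SAL, and a default of 0 for LAG (all three are assumptions), could be written as follows; LEAD is added only to show the mirror-image function:

```sql
SELECT empno, sal,
       -- One partition, default window: always the SAL of the first row.
       FIRST_VALUE(sal) OVER (ORDER BY sal) AS first_sal,
       -- SAL from 2 rows back; 0 when the offset falls outside the partition.
       LAG(sal, 2, 0)   OVER (ORDER BY sal) AS sal_2_back,
       -- SAL from the next row; NULL on the last row since no default is given.
       LEAD(sal, 1)     OVER (ORDER BY sal) AS next_sal
FROM emp;
```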
CASE
I have to mention this function because it can be a big help in specifying queries that need to group rows based on
complex criteria or specify complex criteria in the WHERE clause. For example, grouping unpaid invoices by the number
of days past due requires subtracting the invoice date from the current date, and then using that value to place the row in
categories such as “30 days late”, “60 days late”, etc.
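-- The SELECT expression below is reconstructed from the GROUP BY expression, which it must match.
SELECT CASE WHEN sysdate-inv_date > 90 THEN '90 days overdue'
            WHEN sysdate-inv_date > 60 THEN '60 days overdue'
            WHEN sysdate-inv_date > 30 THEN '30 days overdue'
            WHEN sysdate-inv_date > 0  THEN 'less than 30 days overdue' END
                  AS period,
       SUM(amount) AS amount
FROM invoices
WHERE paid_date IS NULL
GROUP BY CASE WHEN sysdate-inv_date > 90 THEN '90 days overdue'
              WHEN sysdate-inv_date > 60 THEN '60 days overdue'
              WHEN sysdate-inv_date > 30 THEN '30 days overdue'
              WHEN sysdate-inv_date > 0  THEN 'less than 30 days overdue' END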
SQL 29
         PERIOD                      AMOUNT
         30 days overdue             4301
         60 days overdue             6255
         90 days overdue             1012
         less than 30 days overdue   10302
         Table 15
Now imagine making this request without the CASE function. SQL 30, below, uses the DECODE function to obtain the
same results as SQL 29, but with a great deal more complexity. And imagine someone else trying to figure out what SQL
30 means after you leave.
SELECT DECODE(SIGN(sysdate-inv_date-90), -1, DECODE(SIGN(sysdate-inv_date-60), -1,
         DECODE(SIGN(sysdate-inv_date-30), -1, 'less than 30 days overdue',
                '30 days overdue'), '60 days overdue'), '90 days overdue') AS period,
        SUM(amount) AS amount
FROM invoices
WHERE paid_date IS NULL
GROUP BY DECODE(SIGN(sysdate-inv_date-90), -1, DECODE(SIGN(sysdate-inv_date-60),
        -1, DECODE(SIGN(sysdate-inv_date-30), -1, 'less than 30 days overdue',
        '30 days overdue'), '60 days overdue'), '90 days overdue')
SQL 30
Also note that the CASE function can appear within the WHERE clause allowing the specification of complex criteria.
CUME_DIST
You have all been part of this type of ranking. When you got your SAT scores you learned that you were in the top 10%,
perhaps, or when you pay taxes you might feel better knowing that you’re at least in the top 1% of income.
CUME_DIST is one of the analytical functions used to determine the fraction of values in a sorted list that come before or
are equal to the current value. The exact definition of the function is:
The ORDER BY is mandatory, since a sorted list is required. The value of CUME_DIST ranges from greater than 0 to 1.
Using a simple table of student scores, the following query returns the CUME_DIST of the score for each student.
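The definition and the query are missing from this copy. As a hedged restatement, CUME_DIST for a row is the number of rows whose ORDER BY value precedes or equals that row's value, divided by the number of rows in the partition. The column names below are taken from the result table that follows; the table name SCORES is an assumption:

```sql
-- CUME_DIST = (rows with SCORE <= this row's SCORE) / (rows in the partition)
SELECT student_id, score,
       CUME_DIST() OVER (ORDER BY score) AS cume_dist
FROM scores;
```

With 12 students, the two students who scored 85 are the 8th and 9th rows in sorted order, so both get 9/12 = .75, as the result table shows.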
        STUDENT_ID         SCORE      CUME_DIST
        1                  45         .083333333
        4                  50         .166666667
        7                  58         .25
        3                  63         .333333333
        12                 69         .416666667
        6                  72         .5
        9                  76         .583333333
        2                  85         .75
        8                  85         .75
        10                 87         .833333333
        11                 92         .916666667
        5                  98         1
        Table 16
The highest grade is identified by a CUME_DIST of 1. If the CUME_DIST column is multiplied by 100, we have
the percentile. Student 5 would be in the 100th percentile, meaning that he did as well as or better than 100% of the students.
CUME_DIST differs from RANK in that RANK doesn’t tell you a row’s value relative to
the rest; it simply gives the position of the value in the list. CUME_DIST, on the other hand, is a relative value: with an ASC
ORDER BY it tells you the portion of the set that has a value less than or equal to the row’s value. With the
DESC option it tells you the portion of the set that has a value greater than or equal to the row’s value.
Summary
•   simplicity
•   efficiency
•   able to apply multiple analytical functions with different partitioning of the data
•   ability to display aggregates along with the detail data used to derive the values
•   great way to move details to warehouse while simultaneously storing the aggregates with the details.
•   can list aggregates on the same row where each aggregate can be derived from a different group of rows using
    partition and window clauses.
BIO
Edward Kosciuzko is a principal with Sequel Consulting, Inc., and can be reached at 973-226-7835.