Subquery
A subquery is a SELECT query inside another query. It’s also called a nested query
in SQL.
SELECT salary
FROM some_other_table
When this SELECT query is written inside another query, it is a subquery.
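Let’s say you had to display a report in your application that showed all employees
that had an above-average salary. To find these records, you would need to do two
things:
1. Find the average salary of all employees.
2. Find all employees whose salary is greater than that average.
To find the average salary, you can run this query: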
SELECT AVG(salary)
FROM employee;
If you run this query, you should get this result:
AVG(SALARY)
51200
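The average salary is 51,200. Now, we can use this value to find all of the
employees whose salary is greater than this average of 51,200:
SELECT id, last_name, salary
FROM employee
WHERE salary > 51200;
This query returns these employees: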
ID LAST_NAME SALARY
2 ANDERSON 60000
4 LANDY 82000
7 BROWN 93000
9 CONNOR 52000
This gives us the result we want. However, it takes two steps, and the average has
to be typed into the second query by hand.
Using a subquery lets you do this in one step. That means you won’t need to update
the query with a new average each time an employee is added or a salary changes.
The query will return the correct result every time.
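Using a subquery in a WHERE clause means that we use the result of one query as a
filter in the WHERE clause of another query.
We can do this in a single step using a subquery. Our query would look like this:
SELECT id, last_name, salary
FROM employee
WHERE salary > (
  SELECT AVG(salary)
  FROM employee
);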
Line 1: This shows the SELECT clause. These are the columns that are displayed when
the query is run: id, last_name, and salary.
Line 2: This is the FROM clause. We are selecting data from the employee table.
Line 3: This is the WHERE clause. We want to see records where the salary is
greater than something. We open a bracket here, which is closed later in the query.
Line 4: This is another SELECT clause, which selects the AVG of the salary column.
This is the start of the subquery, a query within a query.
Line 5: Another FROM clause, which belongs to the subquery. It’s also the
employee table.
Line 6: We close the bracket for the subquery, and end with a semicolon.
In this query, lines 4 and 5 are the subquery, or inner query. The rest is the
outer query. The subquery is also indented, to make it easier to read and
easier to identify in your script as a subquery.
Running this query gives this result:
ID LAST_NAME SALARY
2 ANDERSON 60000
4 LANDY 82000
7 BROWN 93000
9 CONNOR 52000
It’s the same result as running the two separate queries we saw earlier. But
it runs as a single query, which means it’s easier to run and it always uses the
current average, even when the data changes.
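To see both approaches side by side, here is a minimal sketch. It assumes an in-memory SQLite database (through Python’s built-in sqlite3 module) standing in for the Oracle database used in this article, loaded with the ten employee rows listed later in the article. It runs the hardcoded two-step version and the subquery version, then adds a hypothetical employee (NEWMAN, made up for this example) to show that only the subquery version picks up the new average.

```python
import sqlite3

# Sample rows from the employee table listed in this article: (id, last_name, salary).
EMPLOYEES = [
    (1, "SMITH", 40000), (2, "ANDERSON", 60000), (3, "JONES", 45000),
    (4, "LANDY", 82000), (5, "CHARLESTON", 21000), (6, "JOHNSON", 51000),
    (7, "BROWN", 93000), (8, "HARDEN", 29000), (9, "CONNOR", 52000),
    (10, "PIERCE", 39000),
]

# An in-memory SQLite database stands in for the article's Oracle database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

# Two-step approach, step 1: look up the average salary.
avg_salary = conn.execute("SELECT AVG(salary) FROM employee").fetchone()[0]
print(avg_salary)  # 51200.0

# Two-step approach, step 2: the average is typed into the query by hand.
HARDCODED_SQL = """
    SELECT id, last_name, salary
    FROM employee
    WHERE salary > 51200
    ORDER BY id
"""

# One-step approach: the subquery calculates the average each time the query runs.
SUBQUERY_SQL = """
    SELECT id, last_name, salary
    FROM employee
    WHERE salary > (
        SELECT AVG(salary)
        FROM employee
    )
    ORDER BY id
"""

two_step = conn.execute(HARDCODED_SQL).fetchall()
one_step = conn.execute(SUBQUERY_SQL).fetchall()
print(one_step == two_step)  # True

# Add a HYPOTHETICAL new employee; the average drops to about 48,364.
conn.execute("INSERT INTO employee VALUES (11, 'NEWMAN', 20000)")
stale = conn.execute(HARDCODED_SQL).fetchall()  # still compares against 51,200
fresh = conn.execute(SUBQUERY_SQL).fetchall()   # compares against the new average
print([row[1] for row in stale])
print([row[1] for row in fresh])
```

The ORDER BY id clauses are only there to make the row order predictable; the technique doesn’t need them.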
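You can also use a subquery to find the employee with the highest salary:
SELECT id, last_name, salary
FROM employee
WHERE salary = (
  SELECT MAX(salary)
  FROM employee
);
The subquery finds the maximum salary from the employee table: SELECT MAX(salary).
The WHERE clause in the outer query says WHERE salary =. This means that the
salary of the employee needs to be equal to the maximum salary returned by the
subquery. The result is: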
ID LAST_NAME SALARY
7 BROWN 93000
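This shows Brown with the highest salary. You can check this by running a query
using ORDER BY:
SELECT id, last_name, salary
FROM employee
ORDER BY salary DESC;
Brown appears at the top of this list, with a salary of 93,000.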
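As a quick sanity check, this sketch (again assuming SQLite through Python’s sqlite3 module and the article’s sample employee rows) runs the MAX subquery and the ORDER BY query and compares their top rows.

```python
import sqlite3

# Sample rows from the employee table listed in this article: (id, last_name, salary).
EMPLOYEES = [
    (1, "SMITH", 40000), (2, "ANDERSON", 60000), (3, "JONES", 45000),
    (4, "LANDY", 82000), (5, "CHARLESTON", 21000), (6, "JOHNSON", 51000),
    (7, "BROWN", 93000), (8, "HARDEN", 29000), (9, "CONNOR", 52000),
    (10, "PIERCE", 39000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

# Subquery version: keep the rows whose salary equals the highest salary.
highest = conn.execute("""
    SELECT id, last_name, salary
    FROM employee
    WHERE salary = (
        SELECT MAX(salary)
        FROM employee
    )
""").fetchall()

# Cross-check: sort all employees by salary, highest first.
ordered = conn.execute("""
    SELECT id, last_name, salary
    FROM employee
    ORDER BY salary DESC
""").fetchall()

print(highest)     # [(7, 'BROWN', 93000)]
print(ordered[0])  # (7, 'BROWN', 93000)
```

One difference worth knowing: if two employees shared the highest salary, the subquery version would return both of them, while reading only the first row of the ORDER BY query would show just one.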
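A subquery used with the = operator must return exactly one value. For example,
this query tries to find employees whose salary matches the salary of an employee
whose last name starts with C:
SELECT id, last_name, salary
FROM employee
WHERE salary = (
  SELECT salary
  FROM employee
  WHERE last_name LIKE 'C%'
);
In Oracle, running this query gives an error:
ORA-01427: single-row subquery returns more than one row
This means that your subquery has returned more than one row. You can run the
subquery by itself to see these results: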
SELECT salary
FROM employee
WHERE last_name LIKE 'C%';
SALARY
21000
52000
If these values were used in your query instead of the subquery, the WHERE clause
would be comparing each employee’s salary to two values with a single = sign.
That’s why you get the error ORA-01427: single-row subquery returns more than one row.
Your subquery has returned more than one row, and the outer query expects a single
value (because of the = sign). Resolve this by using the IN operator or by changing
your subquery so that it returns only one row.
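Not every database reports this problem as an error. SQLite, for example, uses only the first row of a subquery that is used as a single value, so the same query runs without complaint but quietly returns an incomplete result. This sketch (assuming SQLite through Python’s sqlite3 module and the article’s sample employee rows) illustrates SQLite’s behavior, not Oracle’s, and counts the subquery’s rows first, which is one way to spot the issue.

```python
import sqlite3

# Sample rows from the employee table listed in this article: (id, last_name, salary).
EMPLOYEES = [
    (1, "SMITH", 40000), (2, "ANDERSON", 60000), (3, "JONES", 45000),
    (4, "LANDY", 82000), (5, "CHARLESTON", 21000), (6, "JOHNSON", 51000),
    (7, "BROWN", 93000), (8, "HARDEN", 29000), (9, "CONNOR", 52000),
    (10, "PIERCE", 39000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

# Run the subquery on its own: it returns two salaries.
inner_rows = conn.execute("""
    SELECT salary
    FROM employee
    WHERE last_name LIKE 'C%'
""").fetchall()
print(len(inner_rows))  # 2

# The same subquery after "=". Oracle raises ORA-01427 here, but SQLite
# uses only the first row the subquery produces and carries on.
equals_rows = conn.execute("""
    SELECT id, last_name, salary
    FROM employee
    WHERE salary = (
        SELECT salary
        FROM employee
        WHERE last_name LIKE 'C%'
    )
""").fetchall()
print(len(equals_rows))  # 1: one of the two "C" employees is silently missing

# One way to catch the problem: check how many rows the subquery returns.
if len(inner_rows) > 1:
    print("Subquery returned more than one row; use IN or narrow the subquery.")
```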
Example: IN Operator
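You can match several values from a subquery by using the IN operator. The IN
operator checks whether the value you specify matches any of the values inside the
IN operator, which here are the results of the subquery:
SELECT id, last_name, salary
FROM employee
WHERE salary IN (
  SELECT salary
  FROM employee
  WHERE last_name LIKE 'C%'
);
The result is: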
ID LAST_NAME SALARY
5 CHARLESTON 21000
9 CONNOR 52000
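Here is the IN version as a minimal sketch, assuming SQLite through Python’s sqlite3 module and the article’s sample employee rows. ORDER BY id is added only to fix the row order.

```python
import sqlite3

# Sample rows from the employee table listed in this article: (id, last_name, salary).
EMPLOYEES = [
    (1, "SMITH", 40000), (2, "ANDERSON", 60000), (3, "JONES", 45000),
    (4, "LANDY", 82000), (5, "CHARLESTON", 21000), (6, "JOHNSON", 51000),
    (7, "BROWN", 93000), (8, "HARDEN", 29000), (9, "CONNOR", 52000),
    (10, "PIERCE", 39000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

# IN accepts any number of values from the subquery.
# Note: SQLite's LIKE is case-insensitive for ASCII letters by default, unlike
# Oracle's; it makes no difference here because every last_name is uppercase.
in_rows = conn.execute("""
    SELECT id, last_name, salary
    FROM employee
    WHERE salary IN (
        SELECT salary
        FROM employee
        WHERE last_name LIKE 'C%'
    )
    ORDER BY id
""").fetchall()
print(in_rows)  # [(5, 'CHARLESTON', 21000), (9, 'CONNOR', 52000)]
```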
There are all kinds of uses for subqueries. Whenever you think you need to use the
result of one query as an input or a filter in another query, a subquery is
probably the best way to do it.
Let’s say you had this query which showed you the average salary per department:
SELECT dept_id, ROUND(AVG(salary), 2) AS avg_salary
FROM employee
GROUP BY dept_id;
The results are:
DEPT_ID AVG_SALARY
1 55333.33
2 51333.33
3 48000
You have two columns here, dept_id and avg_salary, and three rows. Let’s say you
wanted to treat these results as a table without actually creating a table. You can
do this by using this query as a subquery in the FROM clause:
SELECT dept_id, avg_salary
FROM (
SELECT dept_id, ROUND(AVG(salary), 2) AS avg_salary
FROM employee
GROUP BY dept_id
);
Running this query will give you the same results:
DEPT_ID AVG_SALARY
1 55333.33
2 51333.33
3 48000
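The following sketch runs this inline view in SQLite through Python’s sqlite3 module. The article doesn’t list which department each employee belongs to, so the dept_id values below are hypothetical, chosen only so that the department averages match the results shown above.

```python
import sqlite3

# (id, last_name, salary, dept_id). Names and salaries come from this article.
# The dept_id values are HYPOTHETICAL: the article doesn't list each employee's
# department, so these were chosen to reproduce the averages shown above.
EMPLOYEES = [
    (1, "SMITH", 40000, 2), (2, "ANDERSON", 60000, 3), (3, "JONES", 45000, 1),
    (4, "LANDY", 82000, 1), (5, "CHARLESTON", 21000, 2), (6, "JOHNSON", 51000, 3),
    (7, "BROWN", 93000, 2), (8, "HARDEN", 29000, 3), (9, "CONNOR", 52000, 3),
    (10, "PIERCE", 39000, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER, dept_id INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", EMPLOYEES)

# The grouped query sits in the FROM clause and is read like a table.
# ORDER BY dept_id is added only to make the output order predictable.
dept_averages = conn.execute("""
    SELECT dept_id, avg_salary
    FROM (
        SELECT dept_id, ROUND(AVG(salary), 2) AS avg_salary
        FROM employee
        GROUP BY dept_id
    )
    ORDER BY dept_id
""").fetchall()

for dept_id, avg_salary in dept_averages:
    print(dept_id, avg_salary)
```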
The output hasn’t changed, but you can now join the results of this subquery to other tables:
SELECT
sub.dept_id,
d.dept_name,
sub.avg_salary
FROM
(
SELECT dept_id, ROUND(AVG(salary), 2) AS avg_salary
FROM employee
GROUP BY dept_id
) sub
INNER JOIN department d ON sub.dept_id = d.id;
This query does a few things.
First, we’re selecting columns from two sources: sub and d. The name sub refers to
the result of the subquery, which has the dept_id and avg_salary columns, and d
refers to the department table.
We are then joining that subquery to the department table. The subquery has been
given the alias sub, and it is then treated just like a table or a view. A subquery
used this way is called an inline view.
What is an inline view? It’s a subquery that’s inside the FROM clause of a query.
It’s called an inline view because it acts just like a view object, but no view
object is created on the database.
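To try the join, the sketch below assumes SQLite through Python’s sqlite3 module. It uses hypothetical dept_id values (chosen to reproduce the department averages shown earlier) and a department table whose names (Sales, Marketing, Finance) are placeholders, since the article doesn’t show that table’s contents.

```python
import sqlite3

# (id, last_name, salary, dept_id). The dept_id values are HYPOTHETICAL,
# chosen only so the per-department averages match the article's results.
EMPLOYEES = [
    (1, "SMITH", 40000, 2), (2, "ANDERSON", 60000, 3), (3, "JONES", 45000, 1),
    (4, "LANDY", 82000, 1), (5, "CHARLESTON", 21000, 2), (6, "JOHNSON", 51000, 3),
    (7, "BROWN", 93000, 2), (8, "HARDEN", 29000, 3), (9, "CONNOR", 52000, 3),
    (10, "PIERCE", 39000, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER, dept_id INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", EMPLOYEES)

# HYPOTHETICAL department rows: the article doesn't show this table's contents.
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(1, "Sales"), (2, "Marketing"), (3, "Finance")],
)

# The inline view is given the alias sub and joined like any other table.
rows = conn.execute("""
    SELECT sub.dept_id, d.dept_name, sub.avg_salary
    FROM (
        SELECT dept_id, ROUND(AVG(salary), 2) AS avg_salary
        FROM employee
        GROUP BY dept_id
    ) sub
    INNER JOIN department d ON sub.dept_id = d.id
    ORDER BY sub.dept_id
""").fetchall()

for dept_id, dept_name, avg_salary in rows:
    print(dept_id, dept_name, avg_salary)
```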
If a subquery in the WHERE clause acts as a filter, and a subquery in the FROM
clause acts as a view, then a subquery in the SELECT clause acts like a column that
is to be shown in your output.
To use a subquery in your SELECT clause, you add it in the place of a column.
For example, let’s say you wanted to show the average salary of all employees
alongside each employee record. You can add a subquery to the SELECT clause, in the
place of a column:
SELECT id, last_name, salary, (
SELECT AVG(salary)
FROM employee
) AS avg_salary
FROM employee;
ID LAST_NAME SALARY AVG_SALARY
1 SMITH 40000 51200
2 ANDERSON 60000 51200
3 JONES 45000 51200
4 LANDY 82000 51200
5 CHARLESTON 21000 51200
6 JOHNSON 51000 51200
7 BROWN 93000 51200
8 HARDEN 29000 51200
9 CONNOR 52000 51200
10 PIERCE 39000 51200
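A minimal sketch of the same scalar subquery in SQLite (via Python’s sqlite3 module), loaded with the rows above. Every row gets the same avg_salary value, because the subquery doesn’t depend on the row it sits in.

```python
import sqlite3

# The employee rows shown above: (id, last_name, salary).
EMPLOYEES = [
    (1, "SMITH", 40000), (2, "ANDERSON", 60000), (3, "JONES", 45000),
    (4, "LANDY", 82000), (5, "CHARLESTON", 21000), (6, "JOHNSON", 51000),
    (7, "BROWN", 93000), (8, "HARDEN", 29000), (9, "CONNOR", 52000),
    (10, "PIERCE", 39000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

# The subquery takes the place of a column in the SELECT list.
rows = conn.execute("""
    SELECT id, last_name, salary, (
        SELECT AVG(salary)
        FROM employee
    ) AS avg_salary
    FROM employee
    ORDER BY id
""").fetchall()

print(rows[0])  # (1, 'SMITH', 40000, 51200.0)
```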
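Now, let’s say you wanted to see how each employee’s salary compares to the average. You could adjust your query like this to show each salary as a proportion of the average salary (for example, 1.17 means 17% above the average):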
SELECT id, last_name, salary, (
SELECT AVG(salary)
FROM employee
) AS avg_salary,
ROUND(salary / (
SELECT AVG(salary)
FROM employee
), 2) AS pct_avg_salary
FROM employee;
However, this runs the same subquery twice, which is wasteful. Instead, you can
calculate the average once in a subquery in the FROM clause and use it in the outer
query:
SELECT id,
last_name,
salary,
avg_salary,
ROUND(salary/avg_salary, 2) AS pct_avg_salary
FROM (
SELECT id, last_name, salary, (
SELECT AVG(salary)
FROM employee
) AS avg_salary
FROM employee
);
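The inner query finds all employee records and shows the average salary alongside
each of them, and the outer query then uses that avg_salary column to calculate
pct_avg_salary.
This shows that you can use several levels of subqueries: here, a subquery in the
SELECT clause sits inside a subquery in the FROM clause.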
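The sketch below, assuming SQLite through Python’s sqlite3 module and the sample employee rows from this article, runs the repeated-subquery version and the nested version and checks that they produce the same rows. Note the comma after salary in the outer SELECT of the nested version: without it, SQL treats avg_salary as a column alias for salary, so the output would lose the salary column and label the salary values as avg_salary.

```python
import sqlite3

# Sample rows from the employee table listed in this article: (id, last_name, salary).
EMPLOYEES = [
    (1, "SMITH", 40000), (2, "ANDERSON", 60000), (3, "JONES", 45000),
    (4, "LANDY", 82000), (5, "CHARLESTON", 21000), (6, "JOHNSON", 51000),
    (7, "BROWN", 93000), (8, "HARDEN", 29000), (9, "CONNOR", 52000),
    (10, "PIERCE", 39000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

# Version 1: the same scalar subquery appears twice.
REPEATED_SQL = """
    SELECT id, last_name, salary, (
        SELECT AVG(salary) FROM employee
    ) AS avg_salary,
    ROUND(salary / (
        SELECT AVG(salary) FROM employee
    ), 2) AS pct_avg_salary
    FROM employee
    ORDER BY id
"""

# Version 2: the average is computed once in an inline view,
# and the outer query refers to it by its alias.
NESTED_SQL = """
    SELECT id, last_name, salary, avg_salary,
           ROUND(salary / avg_salary, 2) AS pct_avg_salary
    FROM (
        SELECT id, last_name, salary, (
            SELECT AVG(salary) FROM employee
        ) AS avg_salary
        FROM employee
    )
    ORDER BY id
"""

repeated = conn.execute(REPEATED_SQL).fetchall()
nested = conn.execute(NESTED_SQL).fetchall()
print(repeated == nested)  # True
print(nested[1])
```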
Correlated Subqueries
The final concept on subqueries that I’ll cover is a correlated subquery.
What’s a correlated subquery? It’s when a subquery refers to a column that exists
in the outer query. The subquery and the outer query are said to be correlated, as
they are linked to each other.
So far, our subqueries have been independent queries, with their results used inside
an outer query. However, you can refer to a column in the outer query from within
the subquery.
An example of this would be a query to find the employees with a salary greater
than the average salary in their department.
To find the average salary in each department, we can use this query:
SELECT dept_id, AVG(salary)
FROM employee
GROUP BY dept_id;
The results are:
DEPT_ID AVG(SALARY)
1 55333.33333
2 51333.33333
3 48000
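Now, to find the employees with a salary greater than the average salary in their
department, we can use a subquery. The subquery needs to match one of these
averages to the employee record, based on their department ID:
SELECT e.id, e.last_name, e.salary, e.dept_id
FROM employee e
WHERE e.salary > (
  SELECT AVG(s.salary)
  FROM employee s
  WHERE s.dept_id = e.dept_id
);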
However, this is not just a simple average. It looks for the average where the
dept_id matches the employee’s dept_id from the outer query.
This allows the query to find the average for each employee’s department, and check
if their salary is greater than this average.
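Here is a minimal sketch of a correlated subquery, assuming SQLite through Python’s sqlite3 module. As before, the dept_id values are hypothetical, picked only so the department averages match the ones shown above (55,333.33, 51,333.33 and 48,000). The inner query’s WHERE s.dept_id = e.dept_id is what links it to the outer query.

```python
import sqlite3

# (id, last_name, salary, dept_id). The dept_id values are HYPOTHETICAL,
# chosen only so the per-department averages match the article's results.
EMPLOYEES = [
    (1, "SMITH", 40000, 2), (2, "ANDERSON", 60000, 3), (3, "JONES", 45000, 1),
    (4, "LANDY", 82000, 1), (5, "CHARLESTON", 21000, 2), (6, "JOHNSON", 51000, 3),
    (7, "BROWN", 93000, 2), (8, "HARDEN", 29000, 3), (9, "CONNOR", 52000, 3),
    (10, "PIERCE", 39000, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER, dept_id INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", EMPLOYEES)

# For each outer row e, the subquery averages only that employee's department.
above_dept_avg = conn.execute("""
    SELECT e.id, e.last_name, e.salary, e.dept_id
    FROM employee e
    WHERE e.salary > (
        SELECT AVG(s.salary)
        FROM employee s
        WHERE s.dept_id = e.dept_id   -- refers to the outer query's row
    )
    ORDER BY e.id
""").fetchall()

for row in above_dept_avg:
    print(row)
```

With these hypothetical departments, JOHNSON (51,000) appears in the result even though his salary is below the overall average of 51,200, because it is above his department’s average of 48,000.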
Conclusion
Subqueries are powerful tools to use in your SQL queries. They add to the complexity
of a query, but if your requirements say that you need a certain data set, a
subquery is often the simplest way, and sometimes the only way, to get it.