SQL Aggregates and Joins Explained

The document explains how to group data in SQL with the GROUP BY clause and how to filter the aggregated results with the HAVING clause. It also covers joining tables to combine related data, specifically with INNER JOIN, and stresses qualifying a column with its table name when both tables contain a column of the same name. Finally, it discusses updating and deleting records, emphasizing a careful WHERE clause to avoid unintended changes.

Uploaded by

duffyunoks123

Complex Aggregates, Joins and DML Commands

SQL allows us to organize rows into groups based on the value of a specific column using the GROUP
BY clause. For example, if we want to calculate the average GPA for students in each major, we can
group all rows with the same major together. This grouping happens internally, where the database
collects all rows with identical values in the specified column. Once grouped, aggregate functions like
AVG are applied to calculate a summary value for each group. In this case, SQL computes the average
GPA for every major.

The result of such a query includes both the calculated aggregate (e.g., the average GPA) and the value of
the column used for grouping (e.g., the major). Including the grouping column is crucial, as it gives
context to the calculated value. Without the grouping column, the output would be a list of averages
with no indication of which major each one belongs to.

SELECT AVG(StdGPA) AS "Average GPA", StdMajor FROM Student
GROUP BY StdMajor
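
The grouping described above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and a handful of made-up student rows; only the table and column names come from the text, and the ORDER BY is added here just to make the output order predictable.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StdSSN TEXT PRIMARY KEY, StdMajor TEXT, StdGPA REAL)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [
        ("111", "ACCT", 3.0),
        ("222", "ACCT", 4.0),
        ("333", "IS", 2.5),
        ("444", "IS", 3.5),
        ("555", "FIN", 2.0),
    ],
)

# One output row per major: the grouping column plus its aggregate.
rows = conn.execute(
    'SELECT StdMajor, AVG(StdGPA) AS "Average GPA" '
    "FROM Student GROUP BY StdMajor ORDER BY StdMajor"
).fetchall()

for major, avg_gpa in rows:
    print(major, avg_gpa)
```

Five input rows collapse into three output rows, one per major, which is why each average needs its major printed beside it.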

In SQL, when we use GROUP BY to group rows and calculate values like averages, sometimes we want
to filter the results. For example, we might only want to show groups where the average GPA is above
3.0. To do this, we use the HAVING clause.

WHERE vs. HAVING:

The WHERE clause filters rows before grouping and aggregation.

The HAVING clause filters the results after the groups are formed and calculations are done.

How HAVING Works:

After the rows are grouped and an aggregate (like average GPA) is calculated for each group, the
HAVING clause checks the condition (e.g., AVG(StdGPA) > 3.0).

Only groups that meet this condition will be shown in the result.

Example:
If we want to show the average GPA for each major but only for those with an average GPA above 3.0,
we use HAVING AVG(StdGPA) > 3.0. This filters out the majors where the average GPA is 3.0 or
lower.

So, the HAVING clause is used to filter groups based on aggregate values, and it’s applied after the data
has been grouped and the calculations are done.

SELECT AVG(StdGPA) AS "Average GPA", StdMajor FROM Student
WHERE StdSSN LIKE '1%' OR StdSSN LIKE '%4'
GROUP BY StdMajor
HAVING AVG(StdGPA) > 3.0

In this example, a WHERE clause is added to filter rows before grouping and calculating averages. Here's
how SQL processes the query step-by-step:

Step 1: Apply WHERE Clause

SQL first checks each individual row in the Student table to see if it meets the WHERE clause condition
(e.g., StdSSN LIKE '1%' OR StdSSN LIKE '%4'). Only rows that meet this condition are selected for the
next steps.

Step 2: Group Rows by StdMajor

After filtering the rows, SQL organizes them into groups based on the StdMajor column. For example, all
students in the "Accounting" major will be grouped together.

Step 3: Calculate Aggregates

SQL then calculates the requested aggregate value (e.g., average GPA) for each group.

Step 4: Apply HAVING Clause

After the aggregate calculations, SQL checks if each group meets the condition specified in the HAVING
clause (e.g., AVG(StdGPA) > 3.0). Only the groups that meet this condition will be kept.

Step 5: Output Results

Finally, SQL displays the results for the groups that meet all the conditions, including the selected
columns (e.g., "Average GPA" and "StdMajor").
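
The five steps can be observed with a small sketch, again assuming Python's sqlite3 module. The rows are invented so that each clause removes something: one row fails WHERE, one major disappears entirely, and one group fails HAVING. Running the query with and without the HAVING clause shows what step 4 removes.

```python
import sqlite3

# Hypothetical rows chosen so that every clause filters something out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StdSSN TEXT PRIMARY KEY, StdMajor TEXT, StdGPA REAL)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [
        ("123456789", "ACCT", 4.0),  # passes WHERE (starts with 1)
        ("234567894", "ACCT", 3.0),  # passes WHERE (ends with 4)
        ("345678901", "ACCT", 1.0),  # fails WHERE, never reaches the ACCT group
        ("156789012", "IS", 2.5),    # passes WHERE (starts with 1)
        ("567890124", "IS", 3.0),    # passes WHERE (ends with 4)
        ("678901235", "FIN", 4.0),   # fails WHERE, so no FIN group is formed
    ],
)

base = (
    'SELECT StdMajor, AVG(StdGPA) AS "Average GPA" FROM Student '
    "WHERE StdSSN LIKE '1%' OR StdSSN LIKE '%4' "
    "GROUP BY StdMajor "
)

# Steps 1-3: filter rows, group them, compute one average per group.
grouped = conn.execute(base + "ORDER BY StdMajor").fetchall()

# Step 4: HAVING drops the groups whose average is 3.0 or lower.
filtered = conn.execute(base + "HAVING AVG(StdGPA) > 3.0 ORDER BY StdMajor").fetchall()

print(grouped)
print(filtered)
```

Note that the ACCT average is computed from two rows, not three: the 1.0 GPA was removed by WHERE before the group was formed, which is the practical difference between the two clauses.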

When you need information from more than one table, you can combine them through a process called
"joining" tables. This helps bring together related data from different places. One common way to join
tables is through an inner join.

An inner join connects rows from both tables when they share the same value in a specific column. For
example, imagine you have two tables: one with student details and another with enrollment records. If
both tables use a column like "Student SSN" to identify students, you can join the tables based on this
matching value. This allows you to get both the student information and their corresponding enrollment
details in one result.

If two tables have columns with the same name, you need to specify which table the column comes from,
so there's no confusion. You can do this by naming the table along with the column.

An important point to note is that an inner join only includes rows where there's a match between both
tables. If there’s no matching row in one of the tables, that row won’t appear in the results.

Additionally, if you want to organize the data, you can sort it based on a specific column, such as sorting
students by their SSN.

In the example, we want to get data from both the Student and Enrollment tables. We join these tables
based on the StdSSN column, which is the unique identifier for each student. The query looks like this:

SELECT * FROM Student INNER JOIN Enrollment
ON Student.StdSSN = Enrollment.StdSSN

This query tells SQL to combine the rows from both tables where the StdSSN from the Student table
matches the StdSSN from the Enrollment table.

When two tables have columns with the same name (like StdSSN), you must tell SQL which table the
column is from by using TableName.ColumnName. For example, Student.StdSSN and
Enrollment.StdSSN.

The INNER JOIN will only include rows where both tables have matching data. If a student doesn't have
an enrollment record, that student is excluded from the result.
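
A minimal sketch of this behavior, assuming Python's sqlite3 module and invented sample rows. The student named Norbert has no enrollment record, so the inner join leaves him out, while a student with two enrollments appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (StdSSN TEXT PRIMARY KEY, StdName TEXT);
    CREATE TABLE Enrollment (OfferNo INTEGER, StdSSN TEXT);
    -- Hypothetical sample rows; Norbert has no enrollment record.
    INSERT INTO Student VALUES ('111', 'Wells'), ('222', 'Kendall'), ('333', 'Norbert');
    INSERT INTO Enrollment VALUES (1234, '111'), (5679, '111'), (1234, '222');
    """
)

# StdSSN exists in both tables, so it must be qualified with a table name.
rows = conn.execute(
    "SELECT Student.StdSSN, StdName, OfferNo "
    "FROM Student INNER JOIN Enrollment "
    "ON Student.StdSSN = Enrollment.StdSSN "
    "ORDER BY Student.StdSSN, OfferNo"
).fetchall()

for row in rows:
    print(row)
```

StdName and OfferNo each exist in only one table, so they can be written without a table prefix.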

• ON Clause: Specifies the condition for the join (matching columns).

• ORDER BY: Sorts the results based on a column you choose.

SELECT Enrollment.* FROM Student INNER JOIN Enrollment
ON Student.StdSSN = Enrollment.StdSSN
WHERE StdName = 'Wells'

When you want to get data from one table but need to qualify it using information from another table, you
can still perform a join, but you specify which table’s details you want to see.

For example, if you only want the enrollment details for a student named "Wells" (and not their personal
details like name or SSN), you can join the Student table with the Enrollment table, but only select the
columns from the Enrollment table.

In this case, the query would join the tables based on the StdSSN (the student's SSN) and filter the data by
the student's name, "Wells." However, you only choose to display the columns from the Enrollment table.
This is done by using a wildcard (*) that refers specifically to the Enrollment table, not the Student table.

Here’s what’s happening:

The query combines the Student and Enrollment tables using the matching SSN.

The WHERE clause filters the results to only include rows where the student’s name is "Wells."

The SELECT Enrollment.* ensures that only the columns from the Enrollment table are displayed.

By using the join, you ensure the student’s name is taken from the Student table, but the final output only
shows the enrollment records for that student.
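
The effect of Enrollment.* can be checked with a short sketch, assuming Python's sqlite3 module. The sample rows and the EnrGrade column are hypothetical additions for illustration. The column names of the result are read from the cursor to confirm that nothing from the Student table is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (StdSSN TEXT PRIMARY KEY, StdName TEXT);
    -- EnrGrade is a hypothetical extra column, added only for illustration.
    CREATE TABLE Enrollment (OfferNo INTEGER, StdSSN TEXT, EnrGrade REAL);
    INSERT INTO Student VALUES ('111', 'Wells'), ('222', 'Kendall');
    INSERT INTO Enrollment VALUES (1234, '111', 3.5), (5679, '111', 3.0), (1234, '222', 4.0);
    """
)

cur = conn.execute(
    "SELECT Enrollment.* "
    "FROM Student INNER JOIN Enrollment "
    "ON Student.StdSSN = Enrollment.StdSSN "
    "WHERE StdName = 'Wells' "
    "ORDER BY OfferNo"
)
columns = [d[0] for d in cur.description]  # names of the returned columns
rows = cur.fetchall()

print(columns)
print(rows)
```

StdName is used only in the WHERE clause; it filters the joined rows but is not part of the output.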

When working with more than two tables, the process involves connecting each table one by one,
typically through their primary key (PK) and foreign key (FK) relationships.
For example, suppose you need to retrieve student details, but the data that qualifies the student (like a
course prefix) is stored in another table, such as the Offering table. In this case, you first join the Student
table with the Enrollment table, then join the result to the Offering table. Conceptually, the joins are
performed before any filtering conditions are applied, so the required data is linked across the tables
before rows are tested against the WHERE clause.

If you want to filter specific data from a table joined with others, you can apply conditions such as
filtering by course description. For example, to find faculty members who have taught courses related to
"business," you can join the Faculty table with the Offering table (using the faculty SSN as the link) and
then join the Course table (using the course number) to apply a condition that filters courses based on the
description.
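
The faculty example could look roughly like the following sketch, assuming Python's sqlite3 module. The text names the tables and the linking columns only loosely, so FacName, CrsDesc, and all sample rows here are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- FacName and CrsDesc are assumed column names; all rows are made up.
    CREATE TABLE Faculty (FacSSN TEXT PRIMARY KEY, FacName TEXT);
    CREATE TABLE Course (CourseNo TEXT PRIMARY KEY, CrsDesc TEXT);
    CREATE TABLE Offering (OfferNo INTEGER PRIMARY KEY, CourseNo TEXT, FacSSN TEXT);
    INSERT INTO Faculty VALUES ('098', 'Vince'), ('543', 'Mills'), ('654', 'Fibon');
    INSERT INTO Course VALUES
        ('IS320', 'Fundamentals of Business Programming'),
        ('FIN300', 'Fundamentals of Finance'),
        ('IS480', 'Fundamentals of Database Management');
    INSERT INTO Offering VALUES (1, 'IS320', '098'), (2, 'FIN300', '543'), (3, 'IS480', '098');
    """
)

# Faculty -> Offering via FacSSN, then Offering -> Course via CourseNo.
# SQLite's LIKE ignores ASCII case by default, so 'business' matches 'Business'.
rows = conn.execute(
    "SELECT Faculty.FacSSN, FacName "
    "FROM Faculty INNER JOIN Offering "
    "ON Faculty.FacSSN = Offering.FacSSN "
    "INNER JOIN Course "
    "ON Offering.CourseNo = Course.CourseNo "
    "WHERE CrsDesc LIKE '%business%' "
    "ORDER BY Faculty.FacSSN"
).fetchall()

print(rows)
```

The condition is on a Course column, yet the output comes from Faculty; the Offering table exists in the query only to connect the two.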

When multiple rows might be returned for the same student, you can use the DISTINCT keyword to
eliminate duplicates. This ensures that each student's details are shown only once, even if they appear
multiple times due to matching enrollments. The DISTINCT keyword works by comparing entire rows,
so if all the columns in a row match exactly with another row, only one of those rows will be shown.

SELECT DISTINCT Student.* FROM Student INNER JOIN Enrollment
ON Student.StdSSN = Enrollment.StdSSN
INNER JOIN Offering
ON Enrollment.OfferNo = Offering.OfferNo
WHERE CourseNo LIKE 'ITM%'
ORDER BY Student.StdSSN

By using the INNER JOIN operation, the query will only return students who have a matching record in
all the related tables, ensuring that you get accurate data connected through the primary and foreign keys.
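
To see what DISTINCT changes, the sketch below runs the same three-table join with and without it. It assumes Python's sqlite3 module and invented rows in which one student is enrolled in two ITM offerings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical sample rows; Wells is enrolled in two ITM offerings.
    CREATE TABLE Student (StdSSN TEXT PRIMARY KEY, StdName TEXT);
    CREATE TABLE Offering (OfferNo INTEGER PRIMARY KEY, CourseNo TEXT);
    CREATE TABLE Enrollment (OfferNo INTEGER, StdSSN TEXT);
    INSERT INTO Student VALUES ('111', 'Wells'), ('222', 'Kendall'), ('333', 'Norbert');
    INSERT INTO Offering VALUES (1, 'ITM320'), (2, 'ITM480'), (3, 'FIN300');
    INSERT INTO Enrollment VALUES (1, '111'), (2, '111'), (3, '222'), (1, '333');
    """
)

join_sql = (
    "Student.* FROM Student INNER JOIN Enrollment "
    "ON Student.StdSSN = Enrollment.StdSSN "
    "INNER JOIN Offering "
    "ON Enrollment.OfferNo = Offering.OfferNo "
    "WHERE CourseNo LIKE 'ITM%' "
    "ORDER BY Student.StdSSN"
)

with_duplicates = conn.execute("SELECT " + join_sql).fetchall()
distinct_rows = conn.execute("SELECT DISTINCT " + join_sql).fetchall()

print(with_duplicates)  # Wells appears once per matching enrollment
print(distinct_rows)    # each student appears once
```

Kendall is absent from both results because the only enrollment on record for that student is in a non-ITM course.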

In SQL, you can update the values of multiple columns in a single command. For example, you can
change both a student's major and GPA at the same time using the UPDATE statement. It is important to
include a WHERE clause to specify which rows should be updated. If you forget to include the WHERE
clause, the changes will affect all rows in the table, which is a common mistake. In more advanced
scenarios, you can use commands like COMMIT and ROLLBACK to make changes permanent or undo
them, but this is beyond the scope of basic SQL use.

UPDATE Student SET StdMajor = 'CS', StdGPA = 3.5
WHERE StdSSN = '123456789'
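
The difference a WHERE clause makes can be seen in a small sketch, assuming Python's sqlite3 module and made-up rows. The same SET list is run twice, once with the WHERE clause and once without, and the number of affected rows is read from the cursor each time.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StdSSN TEXT PRIMARY KEY, StdMajor TEXT, StdGPA REAL)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("123456789", "IS", 3.0), ("234567890", "FIN", 2.7), ("345678901", "ACCT", 3.3)],
)

# Targeted update: the WHERE clause limits the change to one student.
cur = conn.execute(
    "UPDATE Student SET StdMajor = 'CS', StdGPA = 3.5 WHERE StdSSN = '123456789'"
)
targeted_count = cur.rowcount
after_targeted = conn.execute(
    "SELECT StdSSN, StdMajor, StdGPA FROM Student ORDER BY StdSSN"
).fetchall()

# Same statement without WHERE: every row in the table is overwritten.
cur = conn.execute("UPDATE Student SET StdMajor = 'CS', StdGPA = 3.5")
unqualified_count = cur.rowcount
majors_after = [r[0] for r in conn.execute("SELECT DISTINCT StdMajor FROM Student")]

print(targeted_count, unqualified_count, majors_after)
```

Checking the affected-row count after an update is a cheap way to notice that a WHERE clause was missing or matched more rows than intended.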

In some cases, you may not want to set a column to a specific value but instead calculate the new value
based on an expression. For instance, instead of assigning a fixed GPA, you could increase the existing
GPA by 0.5 points with SET StdGPA = StdGPA + 0.5. In the same way, you could apply a percentage
increase to a faculty member's salary by multiplying the current salary by a factor (like 1.1 for a 10%
raise), as in the following statement:

UPDATE Faculty SET FacSalary = FacSalary * 1.1
WHERE FacSSN = '987654321'
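
An expression-based update can be tried with the sketch below, which assumes Python's sqlite3 module and two invented faculty rows. The salary is stored as a floating-point number here, so the result is rounded before printing.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Faculty (FacSSN TEXT PRIMARY KEY, FacSalary REAL)")
conn.executemany(
    "INSERT INTO Faculty VALUES (?, ?)",
    [("987654321", 50000.0), ("876543210", 60000.0)],
)

# The right-hand side is evaluated per row, using that row's current salary.
conn.execute("UPDATE Faculty SET FacSalary = FacSalary * 1.1 WHERE FacSSN = '987654321'")

# Each result row is a (FacSSN, FacSalary) pair, so dict() builds a lookup table.
salaries = dict(conn.execute("SELECT FacSSN, FacSalary FROM Faculty"))

print(round(salaries["987654321"], 2))  # raised by 10%
print(salaries["876543210"])            # untouched by the WHERE clause
```

Because the new value is derived from the old one, running the statement twice applies the raise twice, which is another reason to double-check such updates before executing them.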

When you need to remove data from a table, the DELETE statement is used. Similar to the UPDATE
statement, a WHERE clause is crucial to ensure that only the specific rows you want to delete are
affected. Without a WHERE clause, all rows in the table are deleted, which can be disastrous if
it happens unintentionally.

DELETE FROM Faculty
WHERE FacSSN = '987654321'
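
One cautious way to run such a delete is sketched below, assuming Python's sqlite3 module and invented rows (FacName is a hypothetical column). Counting the rows that the WHERE clause matches before deleting is a habit suggested here, not something the statement itself requires.

```python
import sqlite3

# Hypothetical sample data; FacName is an assumed column used for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Faculty (FacSSN TEXT PRIMARY KEY, FacName TEXT)")
conn.executemany(
    "INSERT INTO Faculty VALUES (?, ?)",
    [("987654321", "Colan"), ("876543210", "Mills"), ("765432109", "Fibon")],
)

# Precaution: count the rows the WHERE clause matches before deleting them.
(matching,) = conn.execute(
    "SELECT COUNT(*) FROM Faculty WHERE FacSSN = '987654321'"
).fetchone()

cur = conn.execute("DELETE FROM Faculty WHERE FacSSN = '987654321'")
deleted = cur.rowcount

remaining = [r[0] for r in conn.execute("SELECT FacSSN FROM Faculty ORDER BY FacSSN")]

print(matching, deleted, remaining)
```

If the count from the SELECT is not what you expect, the DELETE with the same WHERE clause would remove the wrong rows too.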

These operations (updating and deleting data) make permanent changes to the database, so it's important
to be cautious when running them, especially when working with real-world data.
