Lecture 6: Joins
E-commerce Data: Link
RECALL QUESTIONS
Q1
Scenario: Suppose you have a simple "students" table containing student
records. You want to find the average age of students per department.
Which SQL statement using GROUP BY is correct?
a. SELECT department, AVG(age) FROM students;
b. SELECT department, age FROM students GROUP BY department;
c. SELECT department, AVG(age) FROM students GROUP BY
department;
RECALL QUESTIONS
Q2
Scenario: You have a "sales" table that may contain null values in the
"region" column. You want to find the total sales per region, including nulls
but give a proper name to identify null values. Which SQL statement is
correct?
a. SELECT region, SUM(sales) FROM sales GROUP BY region;
b. SELECT COALESCE(region, 'Unknown') AS region, SUM(sales) FROM
sales GROUP BY region;
c. SELECT region, SUM(sales) FROM sales WHERE region IS NOT NULL
GROUP BY region;
RECALL QUESTIONS
Q3
Scenario: You have a "sales" table with "category" and "year" columns. You
want to find the total sales per category per year. Which SQL statement
using GROUP BY is correct?
a. SELECT category, year, SUM(sales) FROM sales;
b. SELECT category, year, SUM(sales) FROM sales GROUP BY category,
year;
c. SELECT category, SUM(sales), year FROM sales GROUP BY category;
RECALL QUESTIONS
Q4
Scenario: In a "customers" table, you want to find customers who have
placed more than 3 orders. Which SQL statement using GROUP BY and
HAVING is correct?
a. SELECT customer_id, COUNT(order_id) FROM customers GROUP
BY customer_id HAVING COUNT(order_id) > 3;
b. SELECT customer_id, COUNT(order_id) FROM customers HAVING
COUNT(order_id) > 3;
c. SELECT customer_id, COUNT(order_id) FROM customers GROUP
BY customer_id WHERE COUNT(order_id) > 3;
SESSION AGENDA
❖ Introduction to Joins
❖ INNER JOIN
❖ SELF JOIN
❖ LEFT JOIN
❖ RIGHT JOIN
Consider this: while on holiday, you visited the
library, and the librarian, aware of your data analyst
background, sought your assistance in retrieving a
list of borrowed books along with their authors,
excluding books not borrowed by any library
member. How can you help her?
Library Database: Link
From the database we created, we can easily get the borrowed books from the book_loans table,
but how do we get their authors?
The authors' details are in the authors table, and the ID of the author associated with each book is
stored in the books table.
So we need to combine the data from all three tables to meet the requirement.
● We will get the borrowed books from the book_loans table
● We will get the author ID of each book from the books table
● We will get the author name and details from the authors table
● After combining all these tables, we will have the final result
But the question is: how do we combine these tables?
JOINS
You can seamlessly merge data from multiple tables using SQL joins. Let's see what the solution with
joins looks like:
SELECT
b.book_id,
b.title AS book_title,
a.author_name
FROM
books b
INNER JOIN
authors a ON b.author_id = a.author_id
INNER JOIN
book_loans bl ON b.book_id = bl.book_id;
In this scenario, we have used joins to combine data from three tables to meet the
requirement. Let's understand joins first, and then we will come back to this query.
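To see what this three-table join returns, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the query above. The sample rows and the loan_id column of book_loans are assumptions made up for illustration, not the actual library database.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (author_id INTEGER PRIMARY KEY, author_name TEXT);
CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
CREATE TABLE book_loans (loan_id INTEGER PRIMARY KEY, book_id INTEGER);

INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO books VALUES (10, 'Book X', 1), (11, 'Book Y', 2), (12, 'Book Z', 2);
-- Only books 10 and 12 have been borrowed; book 11 has no loan row.
INSERT INTO book_loans VALUES (100, 10), (101, 12);
""")

# Same join as in the lecture, with ORDER BY added for a stable row order.
library_rows = conn.execute("""
SELECT b.book_id, b.title AS book_title, a.author_name
FROM books b
INNER JOIN authors a ON b.author_id = a.author_id
INNER JOIN book_loans bl ON b.book_id = bl.book_id
ORDER BY b.book_id
""").fetchall()

for row in library_rows:
    print(row)
```

Book 11 is dropped because it has no matching row in book_loans. Note that a book borrowed more than once would appear once per loan; SELECT DISTINCT would collapse those repeats.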
How joins relate to lookups
Joins do in SQL what lookup functions do in Excel. For example, say you have two sheets and you need
to bring the salaries of the customers in Sheet1 into a column in Sheet2.
You can easily do this with a lookup function. As you are already familiar with Excel, try to
perform this task in Excel first.
This is the formula you can use to get the desired output
This is the expected output of the task you are supposed to perform
You can do the same in SQL using joins
ECOMMERCE DATABASE
JOINS
Recall the ERD illustrating table relationships. Now, apply the same concept in DBMS by using SQL
joins to fetch results. For instance, if an ERD depicts "Customers" and "Orders" tables linked through a
one-to-many relationship, you can use a SQL join to obtain a list of customers along with their
corresponding orders.
SELECT Customers.CustomerID, Customers.CustomerName,
Orders.OrderID, Orders.OrderDate
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
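The one-to-many relationship can be tried out with a small sketch in Python's sqlite3 module. The Customers and Orders column names are taken from the query above; the rows themselves are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
INSERT INTO Customers VALUES (1, 'Customer A'), (2, 'Customer B');
-- Customer 1 has two orders (one-to-many); customer 2 has none.
INSERT INTO Orders VALUES (500, 1, '2024-01-05'), (501, 1, '2024-02-10');
""")

order_rows = conn.execute("""
SELECT Customers.CustomerID, Customers.CustomerName,
       Orders.OrderID, Orders.OrderDate
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Orders.OrderID
""").fetchall()

for row in order_rows:
    print(row)
```

Customer A appears once per order, while Customer B, who has no orders, does not appear at all.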
JOINS
Let us understand joins with a simple puzzle analogy.
Imagine real-world databases as intricate puzzles, with each piece of information stored in
different tables.
Sometimes, we need to connect these puzzle pieces to understand the bigger picture and write
meaningful queries.
SQL JOINS
SQL joins function as bridges, connecting data from diverse tables.
Picture tables as jigsaw puzzle pieces; joins assemble these pieces, providing flexibility in deciding how
to connect and revealing the complete picture. Various join types offer flexibility in creating
meaningful queries.
SQL JOINS
There are different types of joins, such as:
INNER JOIN: Returns rows with matching values in both tables based on the specified condition,
excluding rows with no match.
SELF JOIN: Joins a table with itself, useful for comparing rows within the same table.
SQL JOINS
● LEFT JOIN: Returns all rows from the left table and the matching rows from the right table. For
left-table rows with no match, the right table's columns are filled with NULL.
● RIGHT JOIN: The mirror image of LEFT JOIN. It returns all rows from the right table and the matching
rows from the left table. For right-table rows with no match, the left table's columns are filled with NULL.
● FULL JOIN: Returns all rows from both tables, matched where possible. NULL values fill the
columns of whichever side has no match.
● CROSS JOIN: Generates the Cartesian product of two tables, combining each row from the left
with every row from the right. No join condition is required.
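The differences between the join types are easiest to see on two tiny tables. The sketch below uses Python's sqlite3 module with hypothetical tables left_t and right_t (not part of the lecture databases). It covers INNER, LEFT and CROSS JOIN only, because SQLite versions before 3.39.0 do not support RIGHT or FULL JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE left_t (id INTEGER, l_val TEXT);
CREATE TABLE right_t (id INTEGER, r_val TEXT);
-- ids 2 and 3 exist in both tables; 1 is left-only, 4 is right-only.
INSERT INTO left_t VALUES (1, 'L1'), (2, 'L2'), (3, 'L3');
INSERT INTO right_t VALUES (2, 'R2'), (3, 'R3'), (4, 'R4');
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# INNER JOIN: only ids present in both tables.
inner_rows = run("""
SELECT l.id, l_val, r_val FROM left_t l
INNER JOIN right_t r ON l.id = r.id ORDER BY l.id
""")

# LEFT JOIN: every left row; r_val is NULL (None in Python) when unmatched.
left_rows = run("""
SELECT l.id, l_val, r_val FROM left_t l
LEFT JOIN right_t r ON l.id = r.id ORDER BY l.id
""")

# CROSS JOIN: every left row paired with every right row, no condition.
cross_rows = run("SELECT l.id, r.id FROM left_t l CROSS JOIN right_t r")

print(len(inner_rows), len(left_rows), len(cross_rows))
```

With three rows on each side, the cross join yields 3 x 3 = 9 rows, while the inner join keeps only the two shared ids.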
While seeking solace from library duties, you visit
your MIT professor friend, now an HOD with added
responsibilities. Grateful for your arrival, he's
overwhelmed. He seeks your help with analyzing
data from two tables and merging them. How would
you assist him?
Let us understand inner joins now.
INNER JOINS
● To combine (join) tables, we need a common matching ‘key’ between the tables.
● To solve the question, let us consider the following tables that the professor has. The common key
between the two tables is ‘Student ID’, as seen below, and we perform the join based on that.
INNER JOINS
Let us now look at inner joins in more detail.
Let's consider a Department database. The schema is
given in this link.
Let's also consider a Student database. The schema is
given in this link. Let us go through these to get to know
the tables we will work with.
INNER JOINS
As seen, we have two tables to consider here: the Student table (which contains the Student ID, the
student's name and the Department ID) and the Department table (which contains the Department ID and
department name). Since both tables have Department ID as a common column, we can perform an inner
join on them. Here's the syntax to do so:
SELECT table_1.column_name1, table_2.column_name1, ......
FROM table_1
INNER JOIN table_2
ON table_1.matching_column_name =
table_2.matching_column_name;
INNER JOINS
The query below fetches only the records where the Department ID in the Student table
matches a Department ID in the Department table. If a student's Department ID is NULL or has no
match, that record is not retrieved. The query is:
SELECT s.StudentID as Student_ID, s.Name as Student_Name,
d.DepartmentName as Department_Name
FROM Student s
INNER JOIN Department d ON s.DepartmentID = d.DepartmentID;
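The effect of NULL and non-matching Department IDs can be checked with a short sqlite3 sketch in Python. The column names are taken from the query above; the sample students and departments are assumptions for illustration, not the linked schema's actual data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, DepartmentID INTEGER);
INSERT INTO Department VALUES (1, 'Physics'), (2, 'Chemistry');
INSERT INTO Student VALUES
  (101, 'Student A', 1),
  (102, 'Student B', 2),
  (103, 'Student C', NULL),  -- no department assigned
  (104, 'Student D', 9);     -- department 9 does not exist
""")

student_rows = conn.execute("""
SELECT s.StudentID AS Student_ID, s.Name AS Student_Name,
       d.DepartmentName AS Department_Name
FROM Student s
INNER JOIN Department d ON s.DepartmentID = d.DepartmentID
ORDER BY s.StudentID
""").fetchall()

for row in student_rows:
    print(row)
```

Students C and D are left out: a NULL key never equals anything, and department 9 has no row in Department.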
SQL INNER JOIN WITH THREE TABLES
Great! Is it possible to display the faculty name as well in the previous output? Yes, by combining all 3
tables it is possible.
Syntax:
SELECT table_1.column_name1, table_2.column_name1,
table_3.column_name1, ......
FROM table_1
INNER JOIN table_2
ON table_1.matching_column_name =
table_2.matching_column_name
INNER JOIN table_3
ON table_2.matching_column_name =
table_3.matching_column_name;
SQL INNER JOIN WITH THREE TABLES
The solution is given by this SQL command. The result displays the faculty name along with the records
from the Student and Department tables.
SELECT s.StudentID as Student_ID, s.Name as Student_Name,
d.DepartmentName as Department_Name, f.FacultyName as
Faculty_Name
FROM Student s
INNER JOIN Department d ON s.DepartmentID = d.DepartmentID
INNER JOIN Faculty f ON d.DepartmentID = f.DepartmentID;
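Here is a hedged sketch of the three-table join in Python's sqlite3 module. The column names follow the query above; the FacultyID column and all sample rows are assumptions. The data deliberately gives one department two faculty members to show how that affects the row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE Faculty (FacultyID INTEGER PRIMARY KEY, FacultyName TEXT, DepartmentID INTEGER);
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, DepartmentID INTEGER);
INSERT INTO Department VALUES (1, 'Physics'), (2, 'Chemistry');
-- Chemistry has two faculty members, Physics has one.
INSERT INTO Faculty VALUES (1, 'Faculty P', 1), (2, 'Faculty Q', 2), (3, 'Faculty R', 2);
INSERT INTO Student VALUES (101, 'Student A', 1), (102, 'Student B', 2);
""")

faculty_rows = conn.execute("""
SELECT s.StudentID AS Student_ID, s.Name AS Student_Name,
       d.DepartmentName AS Department_Name, f.FacultyName AS Faculty_Name
FROM Student s
INNER JOIN Department d ON s.DepartmentID = d.DepartmentID
INNER JOIN Faculty f ON d.DepartmentID = f.DepartmentID
ORDER BY s.StudentID, f.FacultyName
""").fetchall()

for row in faculty_rows:
    print(row)
```

Student B appears twice, once for each faculty member in Chemistry. Each extra join can multiply rows this way when the joined table has several matches per key.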
INNER JOIN with WHERE and ORDER BY clause
Now, the administration also wants to identify and list high-achieving students along with their
department and faculty information for recognition and awards. They want to filter students with a
percentage greater than or equal to 85 and present this information in descending order of
percentage. The SQL query to do that is:
SELECT s.StudentID as Student_ID, s.Name as Student_Name,
d.DepartmentName as Department_Name, f.FacultyName as
Faculty_Name, s.Percentage
FROM Student s
INNER JOIN Department d ON s.DepartmentID = d.DepartmentID
INNER JOIN Faculty f ON d.DepartmentID = f.DepartmentID
WHERE s.Percentage >= 85
ORDER BY s.Percentage DESC;
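The filter and sort can be verified with the following sqlite3 sketch in Python. It reuses the query above unchanged; the Percentage values and other sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE Faculty (FacultyID INTEGER PRIMARY KEY, FacultyName TEXT, DepartmentID INTEGER);
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT,
                      DepartmentID INTEGER, Percentage REAL);
INSERT INTO Department VALUES (1, 'Physics');
INSERT INTO Faculty VALUES (1, 'Faculty P', 1);
INSERT INTO Student VALUES
  (101, 'Student A', 1, 91.5),
  (102, 'Student B', 1, 78.0),   -- below the 85 threshold
  (103, 'Student C', 1, 85.0);   -- exactly 85, so included by >=
""")

top_rows = conn.execute("""
SELECT s.StudentID AS Student_ID, s.Name AS Student_Name,
       d.DepartmentName AS Department_Name, f.FacultyName AS Faculty_Name,
       s.Percentage
FROM Student s
INNER JOIN Department d ON s.DepartmentID = d.DepartmentID
INNER JOIN Faculty f ON d.DepartmentID = f.DepartmentID
WHERE s.Percentage >= 85
ORDER BY s.Percentage DESC
""").fetchall()

for row in top_rows:
    print(row)
```

The WHERE clause is applied to the joined rows before ORDER BY sorts them, so Student B never reaches the sorting step.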
INNER JOIN with WHERE and ORDER BY clause
Now let's solve some questions on this topic:
1. Retrieve a list of orders along with payment details, excluding
orders without payment information.
2. You want to see a list of customer orders, including customer names
and order dates. Write a query to retrieve this information.
3. You need to find out which products were ordered, along with their
prices, in a specific order with order_id = ‘1013’. Write a query to get
this information for an order with a known order_id.
The answers to these questions are given in the following link.
Here's a question. Getting back to work: the HR
department needs to maintain an up-to-date record
of who reports to whom. Hence, they reach out to
you asking for the employee list along with their
manager names.
How can you solve this?
Self Join
A self join is when a table is joined with itself to combine related rows based on a specified condition.
Let us first create a table Employees in order to solve the previous question.
CREATE TABLE Employees(
employee_id INT PRIMARY KEY,
employee_name VARCHAR(50),
manager_id INT);
INSERT INTO Employees (employee_id,
employee_name, manager_id) VALUES
(1, 'Zaid', 3),
(2, 'Rahul', 3),
(3, 'Raman', 4),
(4, 'Kamran', NULL),
(5, 'Farhan', 4);
Now let us view the contents of this table using this command:
SELECT * FROM Employees;
Self Join
As seen, the ‘Employees’ table holds each employee's name along with their manager's ID. Hence, we
have to look into a single table twice, once for the employee and once for the manager, and then
match them up. How can we do that?
With the help of a self join, we can accomplish this. We can refer to the same table twice by giving it
different aliases, so that one alias refers to the employee and the other to the manager.
SELECT e.employee_name AS employee,
m.employee_name AS manager
FROM Employees AS e
JOIN Employees AS m
ON e.manager_id = m.employee_id;
Self Join
For the previous query, we get the final output as:
employee   manager
Zaid       Raman
Rahul      Raman
Raman      Kamran
Farhan     Kamran
Kamran, who has no manager (his manager_id is NULL), does not appear in this output.
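The self join can be reproduced end to end with Python's sqlite3 module. The sketch below uses the CREATE TABLE and INSERT statements from the lecture as they are; the only addition is an ORDER BY so the row order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees(
  employee_id INT PRIMARY KEY,
  employee_name VARCHAR(50),
  manager_id INT);
INSERT INTO Employees (employee_id, employee_name, manager_id) VALUES
  (1, 'Zaid', 3),
  (2, 'Rahul', 3),
  (3, 'Raman', 4),
  (4, 'Kamran', NULL),
  (5, 'Farhan', 4);
""")

# Alias e plays the role of the employee, alias m the role of the manager.
self_join_rows = conn.execute("""
SELECT e.employee_name AS employee, m.employee_name AS manager
FROM Employees AS e
JOIN Employees AS m ON e.manager_id = m.employee_id
ORDER BY e.employee_id
""").fetchall()

for employee, manager in self_join_rows:
    print(employee, manager)
```

A plain JOIN is an inner join, which is why the row for Kamran (manager_id NULL) finds no match and is dropped.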
But the output above doesn't include the
employees who don't have a manager. Is it possible
to display all the employee names, whether or not
they have a manager?
Left Join
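Yes, this can be done with a LEFT JOIN.
LEFT JOIN retrieves all records from the left
table and the matching records from the right
table.
So, here is the modified query:
SELECT e.employee_name AS employee,
m.employee_name AS manager
FROM Employees AS e
LEFT JOIN Employees AS m
ON e.manager_id = m.employee_id;
The result lists every employee, even those
without a manager; for them, the manager column is NULL.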
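As a sketch of the LEFT JOIN version, the self join from earlier can be rerun in Python's sqlite3 module with JOIN changed to LEFT JOIN. The Employees data is the lecture's own; the ORDER BY is added only to fix the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees(
  employee_id INT PRIMARY KEY,
  employee_name VARCHAR(50),
  manager_id INT);
INSERT INTO Employees (employee_id, employee_name, manager_id) VALUES
  (1, 'Zaid', 3), (2, 'Rahul', 3), (3, 'Raman', 4),
  (4, 'Kamran', NULL), (5, 'Farhan', 4);
""")

# LEFT JOIN keeps every row of the left alias (e), matched or not.
left_join_rows = conn.execute("""
SELECT e.employee_name AS employee, m.employee_name AS manager
FROM Employees AS e
LEFT JOIN Employees AS m ON e.manager_id = m.employee_id
ORDER BY e.employee_id
""").fetchall()

for employee, manager in left_join_rows:
    print(employee, manager)
```

Kamran now appears with None as his manager, which is how Python's sqlite3 module represents SQL NULL.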
Now answer this: the marketing team reaches out to
you asking for the list of customers who haven't
placed any order yet. The schema is shared with you
on the next slide. How would you solve this?
RIGHT OUTER JOIN / RIGHT JOIN
This is the same customer and product schema we have been referring to, which holds customer and
product details. Use it to solve the question of finding the customers who haven't placed any
order yet.
RIGHT OUTER JOIN / RIGHT JOIN
Given below is the query to find customers who have not placed an order yet.
A right join returns all records from the right table and the matched records from the
left table, with NULL values in the left table's columns where there is no match. Filtering on
o.order_id IS NULL therefore keeps only the customers with no orders.
SELECT
c.first_name,
c.last_name,
o.order_id
FROM
orders o
RIGHT JOIN
customers c ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;
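The following Python sqlite3 sketch runs this "customers without orders" query on made-up sample rows. Because SQLite only supports RIGHT JOIN from version 3.39.0, the sketch computes the result with the equivalent LEFT JOIN (tables swapped) and runs the lecture's RIGHT JOIN form only when the installed SQLite is new enough. Only the columns used in the query are created; the real schema may have more.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'First1', 'Last1'), (2, 'First2', 'Last2'), (3, 'First3', 'Last3');
-- Customers 1 and 3 have orders; customer 2 has none.
INSERT INTO orders VALUES (900, 1), (901, 1), (902, 3);
""")

# The lecture's query: customers is the right table, so all customers are kept.
right_join_sql = """
SELECT c.first_name, c.last_name, o.order_id
FROM orders o
RIGHT JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL
ORDER BY c.customer_id
"""

# Equivalent form with the tables swapped, which works on any SQLite version.
left_join_sql = """
SELECT c.first_name, c.last_name, o.order_id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL
ORDER BY c.customer_id
"""

no_order_rows = conn.execute(left_join_sql).fetchall()

# RIGHT JOIN is only available in SQLite 3.39.0 and later.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    assert conn.execute(right_join_sql).fetchall() == no_order_rows

print(no_order_rows)
```

Swapping the table order and using LEFT JOIN gives the same rows, which is why many teams standardise on LEFT JOIN for readability.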
Brain Busters
Q1. Write an SQL query to list all products along with any orders they have been a part
of. If a product has not been included in any orders, it should still appear in your list
with the order details as NULL.(exp: 5 mins)
Brain Busters
Q2. Write an SQL query to find all pairs of customers who have placed orders on the
same date. Exclude pairs of the same customer (i.e., a customer should not be paired
with themselves). The output should include the first and last names of both customers
in each pair and the date on which they placed their orders. (exp: 5 mins)
YAY ! NOW LET'S SUMMARIZE!
❖ JOIN
❖ SELF JOIN
❖ LEFT JOIN
❖ RIGHT JOIN