Lecture - 01 - Upgraded
8311
Instructor:
▪ Eric Maniraguha | [email protected] | LinkedIn Profile
6h00 pm – 9h50 pm
▪ Monday A -G207
▪ Tuesday B-G204
▪ Wednesday E-G207
▪ Thursday F-G307
January 2025
Database Development with PL/SQL
Reference reading
▪ What is a relational database?
▪ What is RDBMS(Relational Database Management System)?
▪ MySQL RDBMS
▪ Normalization in SQL DBMS: 1NF, 2NF, 3NF, and BCNF Examples
▪ Hacker Rank : Skills speak louder than words
▪ SQL indexing best practices | How to make your database FASTER!
▪ Normal Forms Introduction
▪ SQL Window Functions
Types of SQL Commands (Recap)
Objective: Refresh the foundational knowledge of SQL and its command categories.
SQL is a powerful tool primarily used for querying and manipulating data within databases. It enables users to:
▪ Insert new data.
▪ Update existing records.
▪ Delete unnecessary or outdated information.
▪ Retrieve data efficiently.
In addition to handling data, SQL is also used to define and modify database structures—including tables, indexes, and constraints—ensuring smooth database
management.
Data Definition Language (DDL):
1. Used to define or modify the structure of a database.
2. Includes commands:
▪ CREATE – Creates new database objects (e.g., tables).
▪ ALTER – Modifies existing database structures.
▪ DROP – Deletes database objects.
▪ TRUNCATE – Removes all records from a table but keeps its structure.
Data Manipulation Language (DML):
1. Focuses on manipulating data stored in database objects.
2. Includes commands:
▪ INSERT – Adds new records to a table.
▪ UPDATE – Modifies existing records.
▪ DELETE – Removes records from a table.
Data Query Language (DQL):
1. Used to query and retrieve data from the database.
2. Includes the command:
▪ SELECT – Fetches data based on specified criteria.
Data Control Language (DCL):
1. Manages permissions and access control within the database.
2. Includes commands:
▪ GRANT – Assigns permissions to users.
▪ REVOKE – Removes permissions from users.
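The command categories above can be tried end to end in a few lines. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle; the `students` table and its rows are hypothetical. SQLite has no GRANT, REVOKE, or TRUNCATE, so DCL is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: define the structure, then modify it.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("ALTER TABLE students ADD COLUMN email TEXT")

# DML: add, change, and remove rows.
cur.execute("INSERT INTO students (id, name) VALUES (1, 'Alice'), (2, 'Bob')")
cur.execute("UPDATE students SET email = 'alice@example.com' WHERE id = 1")
cur.execute("DELETE FROM students WHERE id = 2")

# DQL: read the data back.
rows = cur.execute("SELECT id, name, email FROM students").fetchall()
print(rows)

# DDL again: remove the object entirely.
cur.execute("DROP TABLE students")
conn.close()
```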
Understanding RDBMS
What is RDBMS?
A Relational Database Management System (RDBMS) is a database management system that uses a relational model to organize and store data. In a
relational database, data is organized into tables, also known as relations, which consist of rows and columns. Each row represents a single record, and
each column represents a specific data field. RDBMS uses a structured query language (SQL) to access and manipulate the data stored in the tables.
SQL allows users to insert, update, delete and query data in the tables. It also allows creating, altering, and deleting tables and other database objects.
The relational model on which an RDBMS is based was introduced by E. F. Codd in 1970.
Database Management System (DBMS):
Workflow, Components, and Functionality
The previous diagram illustrates the architecture and workflow of a Database Management System (DBMS). Here's a detailed explanation of each component:
1. Input Data Sources:
2. Database Management System (DBMS): The core of the system that manages and processes the data. It includes the following functionalities:
3. Business Logic:
4. Output for Users:
Summary:
Understanding RDBMS - Table
What is a Table?
The data in an RDBMS is stored in database objects known as tables. This table is basically a collection of related data entries and it consists of numerous
columns and rows.
Remember, a table is the most common and simplest form of data storage in a relational database. Following is an example of a CUSTOMERS table which
stores customer's ID, FirstName, LastName, Birthdate.
RDBMS Terminologies
Properties of a row:
▪ No two tuples are identical to each other in all their
entries.
▪ All tuples of the relation have the same format and the
same number of entries.
▪ The order of the tuple is irrelevant. They are identified
by their content, not by their position.
Properties of an Attribute:
▪ Every attribute of a relation must have a name.
▪ Null values are permitted for the attributes.
▪ Default values can be specified for an attribute; they are automatically inserted if no other value is specified for the attribute.
▪ Attributes that uniquely identify each tuple of a relation are the primary key.
Source Image: https://www.javatpoint.com/what-is-rdbms
Difference Between RDBMS and NoSQL (Non-Relational Databases)
Definition
▪ Relational Database (RDBMS):
Data is stored in tables with rows and columns, where relationships between data are defined using keys.
▪ Non-relational Database (NoSQL):
Data is stored in hierarchical, document-based, key-value, or graph structures, making it flexible for unstructured, semi-structured, and
structured data.
Data in non-relational databases often resembles dictionary-like structures, akin to those in Python.
NoSQL databases offer greater schema flexibility than RDBMS, which can be advantageous in certain use cases. For example, suppose you are implementing an IoT platform
that stores data from different kinds of sensors: each sensor type can report its own set of fields, and a schema-less store can accept them without altering a table definition.
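To make the dictionary analogy concrete, here is a small sketch in plain Python. The sensor readings and field names are hypothetical; the point is only that each document carries its own fields, where a relational table would need one fixed column list.

```python
import json

# Hypothetical readings: each sensor type reports its own set of fields.
readings = [
    {"sensor_id": "t-01", "type": "temperature", "celsius": 21.5},
    {"sensor_id": "g-07", "type": "gps", "lat": -1.95, "lon": 30.06},
    {"sensor_id": "d-03", "type": "door", "open": True, "battery": {"level": 0.82}},
]

# A document store keeps each reading as-is; there is no shared column list.
documents = [json.dumps(r) for r in readings]

# Fields present in every reading versus all fields that appear anywhere.
common_fields = set.intersection(*(set(r) for r in readings))
all_fields = set.union(*(set(r) for r in readings))
print(sorted(common_fields))
print(sorted(all_fields))
```

A relational design for the same data would need either one wide table with many NULL columns or one table per sensor type.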
Relational vs Non-Relational Databases: Key
Features and Differences
Criteria | Relational Database (RDBMS) | Non-relational Database (NoSQL)
Storage Capacity | Suitable for medium to large datasets. | Ideal for big data and high-volume data storage.
Flexibility | Requires fixed schemas for data storage and retrieval. | Schema-less, allowing dynamic changes in data models.
Performance | Slower for complex queries involving joins. | Faster performance for queries on large datasets.
Query Language | Uses SQL (Structured Query Language) for data manipulation. | Uses NoSQL-specific languages or APIs.
Data Processing | Best for transactional systems requiring structured relationships. | Best for real-time analytics and handling diverse data types.
Backup and Consistency | Easier to maintain backups and consistency. | More complex backup and consistency management.
RDBMS vs NoSQL: Pros, Cons, and Key Use
Cases
1. Advantages (Pros): RDBMS, NoSQL
2. Disadvantages (Cons): RDBMS, NoSQL
RDBMS Examples:
MySQL, SQL Server, Oracle Database, PostgreSQL – Best for transactional systems like banking, inventory, and CRM systems.
NoSQL Examples:
MongoDB, Cassandra, CouchDB, DocumentDB – Ideal for applications handling big data, IoT, content management systems, and real-time analytics.
Key Features of RDBMS (Relational Database
Management System)
Structured Storage
Indexing
ACID Properties in DBMS - Student Notes
1. Atomicity
▪ Ensures transactions are all-or-nothing.
▪ Abort: Rolls back changes if the transaction fails.
▪ Commit: Saves changes if the transaction succeeds.
Example: Transferring 100 from X to Y—either both debit and credit happen, or
none.
2. Consistency
▪ Maintains database correctness before and after a transaction.
▪ Integrity constraints must always be satisfied.
Example: Total balance before and after transfer must remain unchanged.
3. Isolation
▪ Ensures transactions execute independently.
▪ Changes are not visible to others until committed.
Example: Concurrent transactions must avoid interference to prevent inconsistent
results.
4. Durability
▪ Guarantees committed transactions are permanent.
Source Image: https://www.geeksforgeeks.org/acid-properties-in-dbms
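The transfer example under Atomicity can be sketched as follows. This is an illustration only, using Python's sqlite3 module rather than Oracle; the `accounts` table, the starting balances, and the `transfer` helper are assumptions made for the example. A CHECK constraint stands in for the rule that a balance may not go negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("X", 150), ("Y", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok_first = transfer(conn, "X", "Y", 100)   # X has 150, so this commits
after_first = balances(conn)
ok_second = transfer(conn, "X", "Y", 100)  # X has only 50: the debit fails, the credit is undone
after_second = balances(conn)
print(ok_first, after_first, ok_second, after_second)
```

The second call shows the abort path: the credit to Y had already been applied inside the transaction, yet the rollback removes it, so the total balance is unchanged (consistency).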
[Figure: example RDBMS-backed systems (Customer Relationship Management System, Human Resource Management System, Financial Management System; Recruitment, Payroll, Employee Benefits) and their benefits: reduce the paperwork, advanced reporting, ensure data security, complete audit, organizational efficiency, data integrity.]
Key Differences Between Primary Key, Foreign Key and Indexes
Aspect | Primary Key | Foreign Key | Index
Purpose | Enforces data integrity by ensuring uniqueness. | Establishes a relationship between tables. | Speeds up query performance, improves data retrieval.
Null Values | No, must not contain NULL values. | Yes, unless the column is also declared NOT NULL. | Yes, an index does not by itself forbid NULL values.
Index Creation | Automatically created when a Primary Key is defined. | No automatic index creation, index must be created explicitly. | Index must be explicitly created for query optimization.
Enforcement of Data Integrity | Yes, ensures that no duplicate or NULL values exist in the column. | Ensures referential integrity between tables. | No, it is used for optimization purposes only.
Relation to Tables | Unique identifier for rows in the same table. | Links a column in one table to the Primary Key of another table. | Does not enforce any relationship between tables.
Number Allowed per Table | Only one Primary Key per table. | Multiple Foreign Keys can exist in a table. | Multiple indexes can be created per table.
Delete | DELETE FROM Customers WHERE CustomerID = 101; | DELETE FROM Orders WHERE OrderID = 150; | DROP INDEX idx_lastname ON Customers;
Update | UPDATE Customers SET CustomerID = 102 WHERE CustomerID = 101; | UPDATE Orders SET CustomerID = 200 WHERE OrderID = 150; | (No direct update; indexes are automatically updated as data changes)
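The integrity rows of the comparison can be checked directly. The sketch below uses Python's sqlite3 module as a stand-in for Oracle; the column lists and the `rejected` helper are assumptions for illustration, and the table and index names follow the statements in the table above. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    LastName   TEXT NOT NULL
);
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID)
);
CREATE INDEX idx_lastname ON Customers(LastName);
INSERT INTO Customers VALUES (101, 'Doe');
INSERT INTO Orders VALUES (150, 101);
""")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_pk = rejected("INSERT INTO Customers VALUES (101, 'Smith')")     # duplicate primary key
bad_fk = rejected("INSERT INTO Orders VALUES (151, 999)")            # no such customer
null_fk = rejected("INSERT INTO Orders VALUES (152, NULL)")          # NULL foreign key is accepted
dup_indexed = rejected("INSERT INTO Customers VALUES (102, 'Doe')")  # plain index allows duplicates
print(dup_pk, bad_fk, null_fk, dup_indexed)
```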
Data normalization
Data normalization is the process of organizing a database to reduce redundancy and improve data integrity. It involves structuring the database in a way that minimizes
duplication of data and ensures that the data is stored efficiently. The goal is to ensure that the data is stored in such a way that it is easy to maintain, update, and query.
The process of normalization typically follows a series of steps called normal forms. Each normal form builds upon the previous one to ensure that the database structure is optimized.
Key Concepts in Data Normalization:
▪ Eliminate Redundant Data: Store data in such a way that the same information is not repeated in multiple places. This reduces the risk of inconsistency.
▪ Ensure Data Integrity: By organizing data into logical units (tables), normalization ensures that changes to data are consistent across the system and prevents anomalies like insert, update, or delete anomalies.
▪ Improve Query Efficiency: A well-normalized database allows for more efficient queries by reducing the amount of unnecessary data that needs to be processed.
Source Image: https://algodaily.com/lessons/normalization-sql-normal-forms
Normalization progresses from UNF → 1NF →
2NF → 3NF → BCNF
The previous diagram illustrates the process of database normalization, transitioning from an Unnormalized Form (UNF) to the Third Normal Form (3NF) and beyond.
Each step ensures better organization and efficiency of the database.
First Normal Form (1NF) → Second Normal Form (2NF) → Third Normal Form (3NF) → Boyce-Codd Normal Form (BCNF)
2. 1NF:
▪ Repeating groups (e.g., columns F, G, H) are removed.
▪ The data is split into two tables: one for attributes A, F, G, H and another for A, B, C,
D, E.
3. 2NF:
▪ Partial dependencies are removed:
o In A, F, G, H, split to isolate A, F and G, H based on dependencies.
o In A, B, C, D, E, it is already in 2NF if A is the primary key.
4. 3NF:
▪ Transitive dependencies are removed:
▪ In A, B, C, D, E, isolate D, E into a separate table because E depends on D rather than directly on the key A.
▪ Final tables include:
a. A, F
b. F, H
c. A, B, C, D
d. D, E
RULE 1, RULE 2, RULE 3, RULE 4
RULE 1
1. Each column should contain atomic values.
2. Entries like X,Y and W,X violate this rule.
RULE 2
1. A column should contain values that are of the same type.
2. Do not inter-mix different types of values in any column.
RULE 3
1. Each column should have a unique name.
2. Identical names lead to confusion at the time of data retrieval.
RULE 4
1. The order in which data is saved does not matter.
2. Using an SQL query, you can easily fetch data in any order from a table.
Every table in your database should always follow at least the First Normal Form (1NF); otherwise, stop using a database.
Normalization Practice Exercise I | Third Normal Form | Denormalization
Identify functional dependencies in this table and normalize it to Third Normal Form (3NF). Provide the resulting tables with primary and foreign keys clearly identified.
Denormalized Table
StudentID | CourseID | StudentName | Phone Number | CourseName | Grade | Teacher | TeacherEmail
101 | CSE101 | John Doe | 1234567890 | Introduction to CS | A | Dr. Smith | [email protected]
102 | CSE101 | Jane Smith | 9876543210 | Introduction to CS | B | Dr. Smith | [email protected]
101 | MATH201 | John Doe | 1234567890 | Calculus I | C | Prof. Johnson | [email protected]
104 | ENG301 | Alice Brown | 4567891230 | Advanced English | A | Dr. Clark | [email protected]
102 | CSE101 | Jane Smith | 9876543210 | Introduction to CS | B | Dr. Smith | [email protected]
106 | BIO101 | Michael Green | 7891234560 | Biology Basics | A- | Dr. Wilson | [email protected]
107 | PHY101 | Sarah Johnson | 3216549870 | Physics Principles | B+ | Dr. Lewis | [email protected]
104 | ENG301 | Alice Brown | 4567891230 | Advanced English | B | Dr. Clark | [email protected]
106 | MATH201 | Michael Green | 7891234560 | Calculus I | B | Prof. Johnson | [email protected]
101 | CSE101 | John Doe | 1234567890 | Introduction to CS | A+ | Dr. Smith | [email protected]
▪ 1NF: No multivalued attribute
▪ 2NF: No partial dependency
▪ 3NF: No transitive dependency
Normalization Practice Exercise I | Second Normal Form
Student Table: StudentID | StudentName | Phone Number
Student-Course Table: StudentID | CourseID | Grade
Course Table: CourseID | CourseName | Teacher | TeacherEmail
ENG301 | Advanced English | Dr. Clark | [email protected]
BIO101 | Biology Basics | Dr. Wilson | [email protected]
PHY101 | Physics Principles | Dr. Lewis | [email protected]
▪ The partial dependency considered here involves the teacher columns, which depend on CourseID alone.
▪ The Course Table is not in 3NF, as there is a transitive dependency through Teacher.
NB: Our objective is to remove Teacher from the Course Table.
Normalization Practice Exercise I | Third
Normal Form
Student Table: StudentID | StudentName | Phone Number
Student-Course Table: StudentID | CourseID | Grade
We start with a denormalized table containing repeating groups, which violates 3NF due to transitive dependencies. The goal is to normalize it through 1NF, 2NF, 3NF, and finally BCNF (also referred to as 3.5NF in some contexts).
To bring the table into 1NF, we eliminate repeating groups and ensure that each record contains atomic values (no multiple values in a single cell). In this case, we simply separate each subject and professor combination into its own row.
To bring the table into 2NF, we need to eliminate partial dependencies, which occur when a non-key attribute depends on only a part of the composite key.
The composite key here is a combination of Student_ID and Subject, because both are needed to uniquely identify each record.
▪ Professor depends on Subject, but not on the full composite key (Student_ID + Subject).
▪ So, we need to split the table into two:
Dependencies: (Student_ID, Subject) → Professor; Subject → Professor (the subject alone is enough to find the professor).
Students Subjects Table
Student_ID | Subject
101 | PLSQL
101 | RDBMS
101 | Java
Subjects Table
Subject | Professor
PLSQL | Dr. Smith
RDBMS | Dr. Johnson
Java | Dr. Lee
In this step, we've eliminated the partial dependency by splitting the tables such that Professor now depends only on Subject, and Student_ID is now associated
only with the Subject in the Students Subjects Table.
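The 2NF split described above can be sketched in plain Python, using the three rows shown for student 101. The variable names are illustrative. The sketch checks that Subject → Professor really holds and that joining the two resulting tables gives back the original rows.

```python
# Rows of the 1NF table: (Student_ID, Subject, Professor).
enrollment = [
    (101, "PLSQL", "Dr. Smith"),
    (101, "RDBMS", "Dr. Johnson"),
    (101, "Java", "Dr. Lee"),
]

# Professor depends on Subject alone, so it moves to its own table keyed by Subject.
subjects = {}
for _, subject, professor in enrollment:
    # setdefault keeps the first professor seen; a different one would break Subject -> Professor
    assert subjects.setdefault(subject, professor) == professor

# What remains is the pure (Student_ID, Subject) link.
students_subjects = sorted({(sid, subject) for sid, subject, _ in enrollment})

# Joining the two tables on Subject reproduces the original rows (a lossless split).
rejoined = [(sid, subject, subjects[subject]) for sid, subject in students_subjects]
print(subjects)
print(sorted(rejoined) == sorted(enrollment))
```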
Scenario: College Enrollment Table | 3.5 NF or
Boyce-Codd Normal Form (BCNF) (3/4)
Step 4: Third Normal Form (3NF)
To bring the table into 3NF, we must eliminate transitive dependencies, where a non-key attribute depends on another non-key attribute.
In this case:
▪ Professor depends on Subject (i.e., a subject has one professor).
▪ After the 2NF split, Professor is stored only in the table keyed by Subject, so no non-key attribute depends on another non-key attribute.
The two tables therefore already satisfy 3NF:
Students Table (3NF)
Student_ID | Subject
101 | PLSQL
101 | RDBMS
101 | Java
103 | Java
104 | RDBMS
Subjects Table (3NF)
Subject | Professor
PLSQL | Dr. Smith
RDBMS | Dr. Johnson
Java | Dr. Lee
Here, the Subjects Table contains unique subjects with their corresponding professors, and the Students Table just links students with subjects. There is no transitive
dependency anymore, as Professor directly depends on Subject, not indirectly through Student_ID.
Scenario: College Enrollment Table | 3.5 NF or
Boyce-Codd Normal Form (BCNF) (4/4)
Step 5: Boyce-Codd Normal Form (BCNF or 3.5NF)
To bring the table to BCNF (also referred to as 3.5NF), we need to ensure that for every functional dependency, the left-hand side (determinant) is a superkey.
In the Subjects Table:
▪ Subject → Professor. Subject is the key of the Subjects Table, so this dependency already has a superkey on its left-hand side. However, the professor's name is still repeated for every subject that professor teaches.
To remove that repetition, we introduce a Professor_ID and split the tables further.
Students Table (BCNF)
Student_ID | Subject
101 | PLSQL
101 | RDBMS
101 | Java
102 | Big Data
102 | PLSQL
103 | Java
104 | RDBMS
Professors Table (BCNF)
Professor_ID | Professor
1 | Dr. Smith
2 | Dr. Johnson
3 | Dr. Lee
4 | Dr. Brown
Subjects Table (BCNF)
Subject | Professor_ID
PLSQL | 1
RDBMS | 2
Java | 3
Big Data | 1
Conclusion
1. 1NF: We ensured that each field contains atomic values.
2. 2NF: We removed partial dependencies.
3. 3NF: We removed transitive dependencies.
4. BCNF (3.5NF): We ensured that every determinant is a superkey, thus eliminating any remaining redundancies.
This normalized schema ensures consistency, avoids data anomalies, and adheres to Boyce-Codd Normal Form (BCNF or 3.5NF).
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is an advanced version of the Third Normal Form (3NF) used in database normalization. It ensures a higher level of data integrity
by addressing anomalies that 3NF might not handle effectively.
BCNF Rules:
▪ The table must first satisfy 3NF.
▪ For every functional dependency (A → B), A must be a superkey (i.e., it should uniquely identify all rows in the table).
Summary Table
Normal Form | Key Concept | Goal
1NF | Eliminate repeating groups and multivalued cells. | Each cell contains atomic values.
2NF | Eliminate partial dependencies. | Every non-key column depends on the whole primary key.
3NF | Eliminate transitive dependencies. | No non-key column depends on another non-key column.
Determinant: a column or set of columns that functionally determines another column.
Understanding Relationship Types Between
Tables in Database
Relationship Type | Description | Real-Life Example
One-to-One (1:1) | Each record in one table corresponds to one record in another table. | Patients and their medical details in a hospital database; employees and salary details in a firm.
▪ Relationships help organize and connect data across multiple tables, improving database normalization.
▪ Primary keys and foreign keys ensure data integrity and enforce constraints.
▪ Proper design of relationships enhances scalability, flexibility, and performance in relational databases.
These relationship types form the foundation of relational database systems like Oracle, enabling efficient data management and retrieval.
▪ Popular Relational Databases: Common RDBMS include:
Key Components in Databases with Multiple
Tables
Relationships:
▪ Events and Venues: Each event takes place at one venue, and a venue can host
multiple events (one-to-many).
▪ Events and Clients: Each event is organized by one or more clients, and a client can
organize multiple events (many-to-many).
▪ Events and Vendors: Each event may involve multiple vendors, and each vendor can
participate in multiple events (many-to-many).
Summary:
This design allows for efficient data management, facilitating queries about events, their
locations, clients, and associated vendors. By utilizing foreign keys, it establishes
relationships that maintain data integrity and enable relational querying across these
interconnected tables.
Relational databases offer a robust and efficient way to manage structured data, providing
fast, reliable access through powerful indexing and relationship management features.
Relational Database relationships (1:1)
1. One-to-One Relationship
Definition: Each record in Table A is linked to only one record in Table B, and vice versa.
▪ Example: A person’s passport is linked to one unique individual, or a person’s home address may be tied to a specific ZIP code.
▪ Use Case: One-to-one relationships are often used to separate sensitive data for security purposes or to limit access. For example,
separating a patient’s contact information from their medical history allows different levels of access for administrative and medical staff.
Key Points:
▪ Each country has only one capital city, and each capital city
corresponds to exactly one country.
▪ The one-to-one relationship ensures that there is a direct, unique
association between the two tables.
How It Works:
▪ Each CountryID in the Capitals Table uniquely links a capital to a
specific country.
▪ The CountryID in the Capitals table acts as a Foreign Key that
references the Primary Key of the Countries table.
▪ For example, France (CountryID = 1) is linked to Paris, and Germany (CountryID = 2) is linked to Berlin.
Source Image: https://phoenixnap.com/kb/database-relationships
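One common way to enforce the one-to-one rule is to put a UNIQUE constraint on the foreign key column. The sketch below, run through Python's sqlite3 module, assumes column names (`CountryName`, `CapitalID`, `CapitalName`) that are not given in the slide; only `CountryID` and the France/Paris and Germany/Berlin pairs come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Countries (
    CountryID   INTEGER PRIMARY KEY,
    CountryName TEXT NOT NULL
);
CREATE TABLE Capitals (
    CapitalID   INTEGER PRIMARY KEY,
    CapitalName TEXT NOT NULL,
    -- UNIQUE on the foreign key turns one-to-many into one-to-one
    CountryID   INTEGER NOT NULL UNIQUE REFERENCES Countries(CountryID)
);
INSERT INTO Countries VALUES (1, 'France'), (2, 'Germany');
INSERT INTO Capitals VALUES (1, 'Paris', 1), (2, 'Berlin', 2);
""")

pairs = conn.execute("""
    SELECT c.CountryName, k.CapitalName
    FROM Countries c JOIN Capitals k ON k.CountryID = c.CountryID
    ORDER BY c.CountryID
""").fetchall()

try:
    with conn:
        conn.execute("INSERT INTO Capitals VALUES (3, 'Lyon', 1)")  # second capital for France
    second_capital_allowed = True
except sqlite3.IntegrityError:
    second_capital_allowed = False
print(pairs, second_capital_allowed)
```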
Relational Database relationships (1:N)
2. One-to-Many Relationship
Definition: A single record in Table A can be associated with multiple records in Table B, but each record in Table B is linked to only one
record in Table A.
▪ Example:
▪ A customer (Table A) may place many orders (Table B), but each order belongs to only one customer.
▪ A book (Table A) can have multiple authors (Table B), but each author is linked to one book in that context.
▪ Use Case: This is the most common relationship type and is used to represent hierarchical data or parent-child relationships, such as
customers and orders, products and categories, or cities and ZIP codes.
Key Points:
▪ The Primary Key (MotherID) uniquely identifies each record in the
Mothers table.
▪ The Foreign Key (MotherID in the Children table) establishes a link to the
Mothers table.
▪ Multiple records in the Children table can have the same MotherID,
meaning multiple children can belong to the same mother.
How It Works:
▪ The MotherID column in the Children table acts as the foreign key that
connects each child to their respective mother.
▪ Even though multiple children can have the same MotherID, each child is
associated with only one mother, illustrating the one-to-many relationship.
Source Image: https://phoenixnap.com/kb/database-relationships
Relational Database relationships (N:N)
3. Many-to-Many Relationship
Definition: Records in Table A can relate to multiple records in Table B, and records in Table B can relate to multiple records in Table A.
▪ Example:
▪ A student (Table A) can enroll in many courses (Table B), and each course can have many students.
▪ A book (Table A) can belong to many categories (Table B), and each category can include multiple books.
▪ Use Case: To manage many-to-many relationships, you need an intermediate table (often called a junction or linking table) to connect the
two tables. For example, a "StudentCourses" table could link students to the courses they are enrolled in.
Key Points:
▪ Junction Table (BookAuthors) is essential to establish and
manage the many-to-many relationship.
▪ Books: A single book can have multiple authors.
▪ Authors: A single author can write multiple books.
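A junction table such as BookAuthors can be sketched like this, again with Python's sqlite3 module standing in for Oracle. The column names and the sample titles and author names are placeholders invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Books   (BookID   INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
CREATE TABLE BookAuthors (
    BookID   INTEGER NOT NULL REFERENCES Books(BookID),
    AuthorID INTEGER NOT NULL REFERENCES Authors(AuthorID),
    PRIMARY KEY (BookID, AuthorID)   -- each book/author pairing is stored once
);
INSERT INTO Books   VALUES (1, 'Book A'), (2, 'Book B');
INSERT INTO Authors VALUES (1, 'Author X'), (2, 'Author Y');
INSERT INTO BookAuthors VALUES (1, 1), (1, 2), (2, 1);
""")

# One book, many authors.
authors_per_book = conn.execute("""
    SELECT b.Title, COUNT(*) FROM Books b
    JOIN BookAuthors ba ON ba.BookID = b.BookID
    GROUP BY b.BookID ORDER BY b.BookID
""").fetchall()

# One author, many books.
books_per_author = conn.execute("""
    SELECT a.Name, COUNT(*) FROM Authors a
    JOIN BookAuthors ba ON ba.AuthorID = a.AuthorID
    GROUP BY a.AuthorID ORDER BY a.AuthorID
""").fetchall()
print(authors_per_book, books_per_author)
```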
4. Self-Referencing Relationship
Use Case:
▪ Track relationships within a family or genealogy.
▪ Model nested departmental structures (e.g., Head Office → IT → Software Development).
Key Points:
The diagram represents a self-referencing relationship in the Employee table:
Table Structure:
▪ employee_id: Primary key for each employee.
▪ firstname and lastname: Employee's name.
▪ manager_id: Foreign key referencing employee_id within the same table.
Relationship:
▪ One manager (employee_id) can supervise multiple employees (manager_id).
▪ An employee's manager_id refers to their manager's employee_id.
Special Case:
▪ manager_id = NULL indicates an employee without a manager (e.g., CEO).
This design models hierarchical structures, such as employee-manager relationships, in a single table.
Source Image: https://www.viralpatel.net/hibernate-self-join-a one-to-many-mapping/
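The self-referencing design can be queried with a self-join, where the same table is aliased once as the employee and once as the manager. This sketch uses the column names listed above; the four sample employees are hypothetical, and Python's sqlite3 module stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    firstname   TEXT,
    lastname    TEXT,
    manager_id  INTEGER REFERENCES employee(employee_id)
);
INSERT INTO employee VALUES
    (1, 'Ada',  'Chief', NULL),   -- no manager (top of the hierarchy)
    (2, 'Ben',  'Lead',  1),
    (3, 'Cara', 'Dev',   2),
    (4, 'Dan',  'Dev',   2);
""")

# LEFT JOIN keeps the employee whose manager_id is NULL.
report_lines = conn.execute("""
    SELECT e.firstname, m.firstname
    FROM employee e
    LEFT JOIN employee m ON m.employee_id = e.manager_id
    ORDER BY e.employee_id
""").fetchall()
print(report_lines)
```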
Types of Keys in Relational Model (Candidate,
Super, Primary, Alternate and Foreign)
▪ Candidate Key: A set of one or more columns that can uniquely identify a row in a table.
Every table can have multiple candidate keys, but one is chosen as the primary key.
▪ Super Key: A set of columns that uniquely identify a row in a table. A super key may
include additional attributes not necessary for unique identification, unlike a candidate key.
Example: In Table-1, Primary key, Unique key, Alternate key are a subset of Super Keys.
▪ Alternate Key: Any candidate key that is not chosen as the primary key. Alternate keys
are unique and serve as backup identifiers.
▪ Composite Key: A primary or candidate key that consists of two or more columns
combined to uniquely identify a row in a table.
Source Image: https://www.geeksforgeeks.org/types-of-keys-in-relational-model-candidate-super-primary-alternate-and-foreign/
Constraints and Data Integrity
1. Unique Constraint:
▪ This constraint ensures that all values in a specific column (or combination of columns) are unique across the table, preventing
duplicate values.
Example:
CREATE TABLE Users ( UserID INT PRIMARY KEY, Email VARCHAR(100) UNIQUE );
2. Not Null:
▪ A NOT NULL constraint prevents null values from being inserted into a column, ensuring that every record has a valid entry in that
field.
Example:
CREATE TABLE Products ( ProductID INT PRIMARY KEY, ProductName VARCHAR(100) NOT NULL );
3. Check Constraint:
▪ This constraint enforces a specific condition that the data in a column must meet. If the condition is not satisfied, the database will
reject the entry.
Example:
CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, Age INT, CHECK (Age >= 18) );
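To see the three constraints reject bad rows, the CREATE TABLE statements above can be run as they are in SQLite through Python's sqlite3 module. The `error_for` helper and the sample values are assumptions for illustration; the exact error text differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users ( UserID INT PRIMARY KEY, Email VARCHAR(100) UNIQUE );
CREATE TABLE Products ( ProductID INT PRIMARY KEY, ProductName VARCHAR(100) NOT NULL );
CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, Age INT, CHECK (Age >= 18) );
INSERT INTO Users VALUES (1, 'a@example.com');
""")

def error_for(sql):
    """Run one statement; return the constraint error message, or None if accepted."""
    try:
        with conn:
            conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

unique_err = error_for("INSERT INTO Users VALUES (2, 'a@example.com')")  # duplicate email
not_null_err = error_for("INSERT INTO Products VALUES (1, NULL)")        # missing name
check_err = error_for("INSERT INTO Employees VALUES (1, 17)")            # under 18
valid_err = error_for("INSERT INTO Employees VALUES (2, 18)")            # satisfies the CHECK
print(unique_err, not_null_err, check_err, valid_err, sep="\n")
```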
Creating and Modifying Constraints
Defining Constraints During Table Creation:
Constraints can be defined at the time of table creation, as shown in the examples above. By specifying constraints at this stage, you ensure that
only valid data is allowed into the table from the start.
Adding Constraints to an Existing Table:
Constraints can also be added after a table exists by using ALTER TABLE. For example, the following adds a foreign key to Employees:
ALTER TABLE Employees ADD CONSTRAINT fk_department FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID);
SQL Joins
Objective: Teach how to combine data from multiple tables, use subqueries, and apply aggregate functions.
1. SELECT Statement:
▪ The SELECT statement is used to retrieve data from one or more tables in a database. It allows you to specify which columns you want to retrieve.
▪ Subqueries can be used within SELECT, INSERT, UPDATE, and DELETE statements.
▪ Example: SELECT FirstName, LastName FROM Customers;
2. WHERE Clause:
▪ The WHERE clause is used to filter rows in a query based on specified conditions, allowing you to retrieve only the records that meet those criteria.
▪ Example: SELECT * FROM Orders WHERE OrderDate = '2024-08-30';
3. JOIN:
A JOIN combines rows from two or more tables based on a related column. This allows you to link related data stored in different tables.
▪ INNER JOIN
Combines rows from two or more tables and returns only the rows with matching values in the specified columns.
▪ LEFT JOIN (LEFT OUTER JOIN)
Returns all rows from the left table and the matching rows from the right table. For left-table rows with no match, the right table's columns are filled with NULL.
▪ RIGHT JOIN (RIGHT OUTER JOIN)
Returns all rows from the right table and the matching rows from the left table. For right-table rows with no match, the left table's columns are filled with NULL.
▪ FULL JOIN (FULL OUTER JOIN)
Returns all rows from both tables. Where a row has no match in the other table, that table's columns are filled with NULL.
▪ CROSS JOIN
Produces the Cartesian product of two tables, pairing each row from the first table with every row from the second table.
▪ UNION
Not a join: combines the results of two or more SELECT queries into a single result set. Duplicate rows are removed by default unless UNION ALL is used.
▪ UNION ALL
Returns all records, including duplicates. It's often used in reporting and analysis.
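The difference between the join types is easiest to see by counting rows. The sketch below runs through Python's sqlite3 module and uses hypothetical Customers and Events tables with reduced column lists; a fourth customer with no event is added as an assumption so that INNER and LEFT JOIN give different results. RIGHT and FULL joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT);
CREATE TABLE Events (Event_ID INTEGER PRIMARY KEY, Event_Name TEXT, Customer_Name TEXT);
INSERT INTO Customers VALUES (1, 'John Doe'), (2, 'Jane Smith'),
                             (3, 'Bob Johnson'), (4, 'Patrick Kaka');
INSERT INTO Events VALUES (101, 'Wedding Ceremony', 'John Doe'),
                          (102, 'Corporate Meeting', 'Jane Smith'),
                          (103, 'Birthday Party', 'Bob Johnson');
""")

def run(join_clause):
    """Run the same SELECT with a different join clause and return all rows."""
    sql = f"""SELECT c.Customer_Name, e.Event_Name
              FROM Customers c {join_clause}
              ORDER BY c.Customer_ID, e.Event_ID"""
    return conn.execute(sql).fetchall()

inner_rows = run("INNER JOIN Events e ON c.Customer_Name = e.Customer_Name")
left_rows = run("LEFT JOIN Events e ON c.Customer_Name = e.Customer_Name")
cross_rows = run("CROSS JOIN Events e")
print(len(inner_rows), len(left_rows), len(cross_rows))
print(left_rows[-1])  # the customer without an event, padded with NULL (None in Python)
```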
SQL Joins | Inner | Outer | Right | Left | Cross Join
Event_ID | Event_Name | Event_Date | Location | Customer_Name | Email | Phone
101 | Wedding Ceremony | 2024-05-20 | Grand Hall | John Doe | john.doe@gmail.com | 123-456-7890
102 | Corporate Meeting | 2024-06-15 | Conference Room | Jane Smith | jane.smith@gmail.com | 987-654-3210
103 | Birthday Party | 2024-07-10 | Banquet Hall | Bob Johnson | bob.johnson@gmail.com | 456-789-0123
Solution Left & Right Join
Left Join
SELECT c.Customer_ID, c.Customer_Name, c.Email AS Customer_Email,
e.Event_ID, e.Event_Name
FROM Customers c
LEFT JOIN Events e
ON c.Customer_Name = e.Customer_Name;
Result:
Customer_ID | Customer_Name | Customer_Email | Event_ID | Event_Name
1 | John Doe | [email protected] | 101 | Wedding Ceremony
2 | Jane Smith | [email protected] | 102 | Corporate Meeting
3 | Bob Johnson | [email protected] | 103 | Birthday Party
Solution Left & Right Join WHERE B or A IS NULL
Left Join Where B is Null
Right Join Where A is Null
SELECT e.Event_ID, e.Event_Name, e.Location
FROM Customers c
RIGHT JOIN Events e
ON c.Customer_Name = e.Customer_Name
WHERE c.Customer_ID IS NULL;
Result (Event_ID | Event_Name | Location):
No results since all events have associated customers.
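The "join, then keep only the rows where the other side IS NULL" pattern finds rows with no partner. A hedged sketch with Python's sqlite3 module follows. The tables are hypothetical reductions of Customers and Events, and one customer without an event is added as an assumption so the first query returns something. The second query is written as a LEFT JOIN starting from Events, which is equivalent to the RIGHT JOIN form and also runs on SQLite versions without RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT);
CREATE TABLE Events (Event_ID INTEGER PRIMARY KEY, Event_Name TEXT, Customer_Name TEXT);
INSERT INTO Customers VALUES (1, 'John Doe'), (2, 'Jane Smith'),
                             (3, 'Bob Johnson'), (4, 'Patrick Kaka');
INSERT INTO Events VALUES (101, 'Wedding Ceremony', 'John Doe'),
                          (102, 'Corporate Meeting', 'Jane Smith'),
                          (103, 'Birthday Party', 'Bob Johnson');
""")

# Left join where B is NULL: customers that have no event.
customers_without_events = conn.execute("""
    SELECT c.Customer_ID, c.Customer_Name
    FROM Customers c
    LEFT JOIN Events e ON c.Customer_Name = e.Customer_Name
    WHERE e.Event_ID IS NULL
""").fetchall()

# Right join where A is NULL, expressed from the Events side: events that have no customer.
events_without_customers = conn.execute("""
    SELECT e.Event_ID, e.Event_Name
    FROM Events e
    LEFT JOIN Customers c ON c.Customer_Name = e.Customer_Name
    WHERE c.Customer_ID IS NULL
""").fetchall()
print(customers_without_events, events_without_customers)
```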
Solution Full Outer Join | Full Outer Join WHERE B IS NULL OR A IS NULL
Full Outer Join
SELECT c.Customer_ID, c.Customer_Name, e.Event_ID, e.Event_Name
FROM Customers c
FULL OUTER JOIN Events e
ON c.Customer_Name = e.Customer_Name;
Result:
Customer_ID | Customer_Name | Event_ID | Event_Name
1 | John Doe | 101 | Wedding Ceremony
2 | Jane Smith | 102 | Corporate Meeting
3 | Bob Johnson | 103 | Birthday Party
Solution Cross Join
Customer_ID Customer_Name Event_ID Event_Name
Solution Union | Union ALL
Union
SELECT Customer_Name AS Name
FROM Customers
UNION
SELECT Event_Name AS Name
FROM Events;
Result:
Name
John Doe
Jane Smith
Bob Johnson
Patrick Kaka
Union All
SELECT Customer_Name AS Name
FROM Customers
UNION ALL
SELECT Event_Name AS Name
FROM Events;
Result:
Name
John Doe
Jane Smith
Bob Johnson
Patrick Kaka
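UNION and UNION ALL only differ when the two queries return overlapping values. The sketch below is a variation on the queries above, not the slide's exact query: it unions Customer_Name from both tables so that duplicates actually occur. The tables and rows are hypothetical, and Python's sqlite3 module stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT);
CREATE TABLE Events (Event_ID INTEGER PRIMARY KEY, Event_Name TEXT, Customer_Name TEXT);
INSERT INTO Customers VALUES (1, 'John Doe'), (2, 'Jane Smith'),
                             (3, 'Bob Johnson'), (4, 'Patrick Kaka');
INSERT INTO Events VALUES (101, 'Wedding Ceremony', 'John Doe'),
                          (102, 'Corporate Meeting', 'Jane Smith'),
                          (103, 'Birthday Party', 'Bob Johnson');
""")

# UNION removes duplicate names that appear in both tables.
union_rows = conn.execute("""
    SELECT Customer_Name AS Name FROM Customers
    UNION
    SELECT Customer_Name AS Name FROM Events
    ORDER BY Name
""").fetchall()

# UNION ALL keeps every row from both queries.
union_all_rows = conn.execute("""
    SELECT Customer_Name AS Name FROM Customers
    UNION ALL
    SELECT Customer_Name AS Name FROM Events
    ORDER BY Name
""").fetchall()
print(len(union_rows), len(union_all_rows))
```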
▪ Definition: Projection refers to selecting specific columns (attributes) from a table. Instead of retrieving all columns, you can retrieve only those
that are relevant.
▪ How it works: The projection operation is performed using the SELECT clause, where you specify the column names you want to display.
▪ Example: Projecting (selecting) the EmployeeID and Name columns from the Employees table:
SELECT EmployeeID, Name FROM Employees;
In this case, only the EmployeeID and Name columns will be returned, while the other columns (e.g., Salary, DepartmentID) will be ignored.
You can combine both selection and projection in a single query, as shown below:
SELECT EmployeeID, Name FROM Employees WHERE Salary > 50000;
In this example, you are projecting (selecting) only the EmployeeID and Name columns and filtering the rows where Salary is greater than 50,000
(selection).
Key Points:
Aggregate Functions:
▪ AVG(): Calculates the average value in the window.
▪ MAX(): Finds the maximum value in the window.
▪ MIN(): Finds the minimum value in the window.
▪ SUM(): Calculates the total sum of values in the window.
▪ COUNT(): Counts the number of rows in the window.
Ranking Functions:
▪ ROW_NUMBER(): Assigns a unique sequential number to each row in the window.
▪ RANK(): Assigns a rank to each row, with gaps for ties.
▪ DENSE_RANK(): Assigns a rank to each row without gaps for ties.
▪ PERCENT_RANK(): Computes the relative rank of a row as a percentage.
▪ NTILE(): Divides rows into a specified number of groups and assigns a group number.
Value Access Functions:
▪ LAG(): Accesses a value from a previous row in the window.
▪ LEAD(): Accesses a value from a subsequent row in the window.
▪ FIRST_VALUE(): Returns the first value in the window.
▪ LAST_VALUE(): Returns the last value in the window.
▪ NTH_VALUE(): Returns the nth value in the window.
They are called window functions because they operate over a "window" or a specific set of rows in a result set, rather than the entire dataset.
Window Function Definitions
Window Function | Type | Definition | Example
ROW_NUMBER() | Ranking | Assigns a unique row number starting at 1 for each row in the result set. | SELECT ROW_NUMBER() OVER (PARTITION BY column_name ORDER BY another_column) AS row_num, column_name FROM table_name;
RANK() | Ranking | Assigns a rank to each row, but skips ranks when there are ties. | SELECT RANK() OVER (PARTITION BY column_name ORDER BY another_column) AS rank, column_name FROM table_name;
DENSE_RANK() | Ranking | Similar to RANK(), but no ranks are skipped when there are ties. | SELECT DENSE_RANK() OVER (PARTITION BY column_name ORDER BY another_column) AS dense_rank, column_name FROM table_name;
NTILE(n) | Distribution | Divides the result set into n approximately equal parts and assigns each row a bucket number. | SELECT NTILE(4) OVER (PARTITION BY column_name ORDER BY another_column) AS quartile, column_name FROM table_name;
SUM() | Aggregation | Calculates the sum of values over a specified range of rows. | SELECT column_name, SUM(column_name) OVER (PARTITION BY another_column ORDER BY yet_another_column) AS total_sum FROM table_name;
LEAD() | Value Access | Provides access to the next row's value in the result set. | SELECT column_name, LEAD(column_name) OVER (PARTITION BY another_column ORDER BY yet_another_column) AS next_value FROM table_name;
LAG() | Value Access | Provides access to the previous row's value in the result set. | SELECT column_name, LAG(column_name) OVER (PARTITION BY another_column ORDER BY yet_another_column) AS previous_value FROM table_name;
Window Function – Exercises using ROW_NUMBER & OVER
Question
▪ Write a query to retrieve the top two employees (based on EMP_ID in descending order) from each department.
▪ How does the ROW_NUMBER() function with PARTITION BY DEPT_NAME help in ranking employees within each department?
▪ Modify a query to include a ranking column for employees within each department, ordered by EMP_ID in descending order.
▪ Explain the purpose of filtering rows using WHERE X.RN < 3 in the query.
▪ Write a query that selects all details of the top two employees in each department based on their EMP_ID.
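One possible shape for the query these questions refer to is sketched below. It assumes an in-memory SQLite database (version 3.25 or later, which added window functions) as a stand-in for the course database, and an EMPLOYEE_TABLE loaded with a few of the sample rows from the exercise output; the alias X and the column RN follow the wording of the questions.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE_TABLE ("
    "EMP_ID INTEGER, EMP_NAME TEXT, DEPT_NAME TEXT, SALARY INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE_TABLE VALUES (?, ?, ?, ?)",
    [
        (2, "Bob", "Finance", 60000),
        (4, "Diana", "Finance", 62000),
        (9, "Ivy", "Finance", 61000),
        (14, "Nina", "Finance", 64000),
        (1, "Alice", "HR", 50000),
        (5, "Eve", "HR", 52000),
        (8, "Hank", "HR", 49000),
        (13, "Mona", "HR", 53000),
    ],
)

# The inner query numbers employees 1, 2, 3, ... within each department,
# highest EMP_ID first. The outer query keeps row numbers 1 and 2 only.
top_two_sql = """
SELECT X.EMP_ID, X.EMP_NAME, X.DEPT_NAME, X.RN
FROM (
    SELECT E.*,
           ROW_NUMBER() OVER (PARTITION BY DEPT_NAME ORDER BY EMP_ID DESC) AS RN
    FROM EMPLOYEE_TABLE E
) X
WHERE X.RN < 3
ORDER BY X.DEPT_NAME, X.RN
"""
top_two = conn.execute(top_two_sql).fetchall()
for row in top_two:
    print(row)
```

The filter has to sit in the outer query because a window function cannot be referenced in the WHERE clause of the same query level that computes it.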
Window Function – Exercises: FIRST_VALUE()
Question
▪ Write a query to display each employee's details along with the first salary in their department using the FIRST_VALUE function.
▪ How does the FIRST_VALUE function help retrieve the earliest salary in each department?
▪ Write a query to rank employees by salary in their department, ensuring no gaps in rank values for ties, using DENSE_RANK().
▪ Explain the difference between RANK() and DENSE_RANK() in ranking employees.
Sample output for FIRST_VALUE (partial):
EMP_NAME  DEPT_NAME  EMP_ID  SALARY  AVG_SALARY
Bob       Finance    2       60000   60000
Diana     Finance    4       62000   60000
Query for DENSE_RANK():
SELECT E.*,
       DENSE_RANK() OVER (PARTITION BY DEPT_NAME ORDER BY SALARY DESC) AS SALARY_RANK
FROM EMPLOYEE_TABLE E;
Sample output for DENSE_RANK() (partial):
EMP_ID  EMP_NAME  DEPT_NAME  SALARY  SALARY_RANK
14      Nina      Finance    64000   1
4       Diana     Finance    62000   2
9       Ivy       Finance    61000   3
2       Bob       Finance    60000   4
13      Mona      HR         53000   1
5       Eve       HR         52000   2
1       Alice     HR         50000   3
8       Hank      HR         49000   4
12      Leo       IT         80000   1
6       Frank     IT         78000   2
10      Jack      IT         77000   3
3       Charlie   IT         75000   4
15      Oscar     Marketing  55000   3
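The difference between RANK() and DENSE_RANK() only shows up when salaries tie, which the sample output does not contain. The sketch below therefore gives Ivy the same salary as Diana; that tie is a hypothetical change to the data, and SQLite 3.25 or later is assumed as a stand-in for the course database. FIRST_VALUE is ordered by EMP_ID here, which is also an assumption about what "first salary" means.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE_TABLE ("
    "EMP_ID INTEGER, EMP_NAME TEXT, DEPT_NAME TEXT, SALARY INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE_TABLE VALUES (?, ?, ?, ?)",
    [
        (2, "Bob", "Finance", 60000),
        (4, "Diana", "Finance", 62000),
        (9, "Ivy", "Finance", 62000),  # hypothetical tie with Diana
        (14, "Nina", "Finance", 64000),
    ],
)

ranking_sql = """
SELECT EMP_NAME,
       SALARY,
       -- Salary of the first row in the department when ordered by EMP_ID.
       FIRST_VALUE(SALARY) OVER (PARTITION BY DEPT_NAME ORDER BY EMP_ID) AS FIRST_SALARY,
       -- RANK leaves a gap after a tie; DENSE_RANK does not.
       RANK()       OVER (PARTITION BY DEPT_NAME ORDER BY SALARY DESC) AS SALARY_RANK,
       DENSE_RANK() OVER (PARTITION BY DEPT_NAME ORDER BY SALARY DESC) AS SALARY_DENSE_RANK
FROM EMPLOYEE_TABLE
ORDER BY SALARY DESC, EMP_ID
"""
ranked = conn.execute(ranking_sql).fetchall()
for row in ranked:
    print(row)
```

Diana and Ivy share rank 2 under both functions; Bob then receives rank 4 from RANK() and rank 3 from DENSE_RANK().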
Window Function – Exercises: NTILE()
Question
▪ Use the NTILE function to divide employees into 4 salary quartiles and display the quartile number for each employee.
▪ How does NTILE partition employees into groups based on salary?
Sample output (partial):
EMP_ID  EMP_NAME  DEPT_NAME  SALARY  CATEGORY
12      Leo       IT         80000   1
6       Frank     IT         78000   1
10      Jack      IT         77000   1
3       Charlie   IT         75000   1
13      Mona      HR         53000   3
5       Eve       HR         52000   4
1       Alice     HR         50000   4
8       Hank      HR         49000   4
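A small sketch of how NTILE(4) assigns quartiles, assuming SQLite 3.25 or later as a stand-in for the course database. Only eight of the employees from the sample output are loaded, so the bucket numbers differ from the partial output above; with eight rows each quartile receives exactly two.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE_TABLE ("
    "EMP_ID INTEGER, EMP_NAME TEXT, DEPT_NAME TEXT, SALARY INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE_TABLE VALUES (?, ?, ?, ?)",
    [
        (12, "Leo", "IT", 80000),
        (6, "Frank", "IT", 78000),
        (10, "Jack", "IT", 77000),
        (3, "Charlie", "IT", 75000),
        (13, "Mona", "HR", 53000),
        (5, "Eve", "HR", 52000),
        (1, "Alice", "HR", 50000),
        (8, "Hank", "HR", 49000),
    ],
)

# Rows are sorted by salary (highest first) and cut into 4 buckets of
# near-equal size; bucket 1 holds the highest salaries.
quartile_sql = """
SELECT EMP_ID, EMP_NAME, SALARY,
       NTILE(4) OVER (ORDER BY SALARY DESC) AS CATEGORY
FROM EMPLOYEE_TABLE
ORDER BY SALARY DESC
"""
quartiles = conn.execute(quartile_sql).fetchall()
for row in quartiles:
    print(row)
```

When the row count is not divisible by the number of buckets, the earlier buckets each receive one extra row.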
Window Function – Exercises: LAG()
Question
▪ Use the LEAD function to display each employee's salary along with the next higher salary within the same department.
▪ Write a query using the LAG function to show each employee's salary along with the previous salary within their department.
Sample output (partial; the last column holds the neighbouring salary in the department, NULL where none exists):
14  Nina   Finance  64000  NULL
4   Diana  Finance  62000  64000
9   Ivy    Finance  61000  62000
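The sketch below shows LAG and LEAD side by side on the Finance rows from the sample output, assuming SQLite 3.25 or later as a stand-in for the course database. The column aliases PREV_SALARY and NEXT_SALARY are hypothetical names chosen for this illustration.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE_TABLE ("
    "EMP_ID INTEGER, EMP_NAME TEXT, DEPT_NAME TEXT, SALARY INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE_TABLE VALUES (?, ?, ?, ?)",
    [
        (14, "Nina", "Finance", 64000),
        (4, "Diana", "Finance", 62000),
        (9, "Ivy", "Finance", 61000),
        (2, "Bob", "Finance", 60000),
    ],
)

# With salaries sorted from highest to lowest, LAG looks one row back
# (the next higher salary) and LEAD looks one row ahead (the next lower one).
neighbour_sql = """
SELECT EMP_NAME, SALARY,
       LAG(SALARY)  OVER (PARTITION BY DEPT_NAME ORDER BY SALARY DESC) AS PREV_SALARY,
       LEAD(SALARY) OVER (PARTITION BY DEPT_NAME ORDER BY SALARY DESC) AS NEXT_SALARY
FROM EMPLOYEE_TABLE
ORDER BY SALARY DESC
"""
neighbours = conn.execute(neighbour_sql).fetchall()
for row in neighbours:
    print(row)
```

The first row of a partition has no previous row and the last has no next row, so those positions come back as NULL (None in Python).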
SELECT
    ROUND(SUM(ListPrice * Quantity), 2) AS TotalValue,  -- Total value of all products (rounded to 2 decimals)
    ROUND(AVG(ListPrice), 2) AS AveragePrice,           -- Average list price (rounded to 2 decimals)
    COUNT(ProductID) AS ProductCount,                   -- Total number of products
    ROUND(MIN(ListPrice), 2) AS MinimumPrice,           -- Minimum list price (rounded to 2 decimals)
    ROUND(MAX(ListPrice), 2) AS MaximumPrice            -- Maximum list price (rounded to 2 decimals)
FROM Production;
TotalValue  AveragePrice  ProductCount  MinimumPrice  MaximumPrice
58726.3     65.39         7             25            120.75
Aggregate Functions – Production Table
Average list price per month, keeping months whose average exceeds 50:
SELECT
    EXTRACT(MONTH FROM Production) AS Month,
    AVG(ListPrice) AS AvgPrice
FROM
    Production
GROUP BY
    EXTRACT(MONTH FROM Production)
HAVING
    AVG(ListPrice) > 50;
Month  AvgPrice
12     60
1      120.75
3      80.99
5      90.5
Total price per month, keeping months whose total exceeds 5000:
SELECT
    EXTRACT(MONTH FROM Production) AS Month,
    SUM(ListPrice * Quantity) AS TotalPrice
FROM
    Production
GROUP BY
    EXTRACT(MONTH FROM Production)
HAVING
    SUM(ListPrice * Quantity) > 5000;
Month  TotalPrice
9      9000
11     5325
12     18000
1      6037.5
3      9718.8
5      8145
Running Total of TotalPrice Over Months
SELECT
    EXTRACT(MONTH FROM Production) AS Month,
    SUM(ListPrice * Quantity) AS TotalPrice,
    SUM(SUM(ListPrice * Quantity)) OVER (ORDER BY EXTRACT(MONTH FROM Production)) AS RunningTotal
FROM
    Production
GROUP BY
    EXTRACT(MONTH FROM Production)
ORDER BY
    Month;
Month  TotalPrice  RunningTotal
1      6037.5      6037.5
3      9718.8      15756.3
5      8145        23901.3
7      2500        26401.3
9      9000        35401.3
11     5325        40726.3
12     18000       58726.3
90th percentile of total price per month:
SELECT
    EXTRACT(MONTH FROM Production) AS Month,
    PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY ListPrice * Quantity) AS NinetyPercentilePrice
FROM
    Production
GROUP BY
    EXTRACT(MONTH FROM Production);
Month  NinetyPercentilePrice
1      6037.5
5      8145
7      2500
9      9000
11     5325
12     18000
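The monthly totals, the HAVING filter, and the running total can be reproduced on a small scale with the sketch below. Several things here are assumptions: SQLite 3.25 or later stands in for the course database, the date column is given the hypothetical name ProductionDate, the four rows are invented, and strftime replaces EXTRACT because SQLite has no EXTRACT function. The running total is computed over a CTE of monthly totals, which is an equivalent alternative to nesting SUM(SUM(...)) OVER in a single query level.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Production ("
    "ProductID INTEGER, ListPrice REAL, Quantity INTEGER, ProductionDate TEXT)"
)
# Hypothetical rows; the comment on each line is ListPrice * Quantity.
conn.executemany(
    "INSERT INTO Production VALUES (?, ?, ?, ?)",
    [
        (1, 25.0, 100, "2024-01-15"),  # 2500
        (2, 45.0, 200, "2024-01-20"),  # 9000
        (3, 60.0, 300, "2024-03-05"),  # 18000
        (4, 10.0, 50, "2024-05-09"),   # 500
    ],
)

running_total_sql = """
WITH MonthlyTotals AS (
    -- Step 1: one row per month; HAVING drops months totalling 1000 or less.
    SELECT CAST(strftime('%m', ProductionDate) AS INTEGER) AS Month,
           SUM(ListPrice * Quantity) AS TotalPrice
    FROM Production
    GROUP BY CAST(strftime('%m', ProductionDate) AS INTEGER)
    HAVING SUM(ListPrice * Quantity) > 1000
)
-- Step 2: accumulate the surviving monthly totals in month order.
SELECT Month,
       TotalPrice,
       SUM(TotalPrice) OVER (ORDER BY Month) AS RunningTotal
FROM MonthlyTotals
ORDER BY Month
"""
running_totals = conn.execute(running_total_sql).fetchall()
for row in running_totals:
    print(row)
```

Because HAVING is applied before the window function, the month that was filtered out (May, with a total of 500) does not contribute to the running total.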
Multi-Level Aggregation: Total Quantity and Average per Product
WITH ProductTotals AS (
    SELECT
        ProductID,
        Name,
        SUM(Quantity) AS TotalQuantity
    FROM
        Production
    GROUP BY
        ProductID, Name
)
SELECT
    ProductID,
    Name,
    TotalQuantity,
    ROUND(AVG(TotalQuantity) OVER (), 2) AS AvgQuantityAcrossAllProducts
FROM
    ProductTotals;
ProductID  Name         TotalQuantity  AvgQuantityAcrossAllProducts
1          Widget A     100            144.29
2          Gadget B     200            144.29
3          Device C     150            144.29
4          Tool D       300            144.29
5          Machine E    50             144.29
6          Gizmo F      120            144.29
7          Appliance G  90             144.29
Grouping by Price Range and Month
Question
Write a SQL query to extract the month from the SaleDate, categorize the Price into 'Low', 'Medium', and 'High' ranges, calculate the total sales revenue (Price * QuantitySold) for each month and price range, and display the results with Month, PriceRange, and TotalSales.
SELECT
    EXTRACT(MONTH FROM Production) AS Month,
    CASE
        WHEN ListPrice < 50 THEN 'Low'
        WHEN ListPrice BETWEEN 50 AND 100 THEN 'Medium'
        ELSE 'High'
    END AS PriceRange,
    SUM(ListPrice * Quantity) AS TotalPrice
FROM
    Production
GROUP BY
    EXTRACT(MONTH FROM Production),
    CASE
        WHEN ListPrice < 50 THEN 'Low'
        WHEN ListPrice BETWEEN 50 AND 100 THEN 'Medium'
        ELSE 'High'
    END;
Month  PriceRange  TotalPrice
7      Low         2500
9      Low         9000
11     Low         5325
12     Medium      18000
1      High        6037.5
3      Medium      9718.8
SQL Subqueries
A subquery in SQL is a query nested inside another SQL query, often used to perform operations that need a result from a secondary query to complete the primary one.
Definition of a Subquery
▪ A subquery is a SQL query embedded within another SQL statement, often referred to as an "inner query" or "inner select," while the
main query containing it is called the "outer query" or "outer select."
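▪ For a non-correlated subquery, the inner query executes first, and its result is then used by the outer query. A correlated subquery refers to columns of the outer query and is evaluated for each row the outer query examines.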
Locations of a Subquery
Subqueries can be placed in several parts of a SQL statement:
▪ SELECT clause: For calculating values to be used in the result set.
▪ FROM clause: As a derived table.
▪ WHERE clause: To filter rows based on criteria from the inner query.
▪ HAVING clause: To filter groups.
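The four locations listed above can each be illustrated with one short query. The sketch assumes an in-memory SQLite database as a stand-in for the course database, and the Employees rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER, Name TEXT, Salary INTEGER, DepartmentID INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", 50000, 10),
        (2, "Bob", 60000, 20),
        (3, "Charlie", 75000, 20),
        (4, "Diana", 62000, 10),
    ],
)

# SELECT clause: a scalar subquery adds the company-wide maximum to every row.
with_max = conn.execute(
    "SELECT Name, (SELECT MAX(Salary) FROM Employees) AS MaxSalary "
    "FROM Employees ORDER BY EmployeeID"
).fetchall()

# FROM clause: the subquery acts as a derived table of department totals.
dept_totals = conn.execute(
    "SELECT D.DepartmentID, D.TotalSalary "
    "FROM (SELECT DepartmentID, SUM(Salary) AS TotalSalary "
    "      FROM Employees GROUP BY DepartmentID) D "
    "ORDER BY D.DepartmentID"
).fetchall()

# WHERE clause: keep employees earning more than the overall average (61750).
above_avg = conn.execute(
    "SELECT Name FROM Employees "
    "WHERE Salary > (SELECT AVG(Salary) FROM Employees) ORDER BY Name"
).fetchall()

# HAVING clause: keep departments whose total exceeds the average department total.
big_depts = conn.execute(
    "SELECT DepartmentID FROM Employees GROUP BY DepartmentID "
    "HAVING SUM(Salary) > (SELECT AVG(T) FROM "
    "  (SELECT SUM(Salary) AS T FROM Employees GROUP BY DepartmentID))"
).fetchall()

print(with_max, dept_totals, above_avg, big_depts, sep="\n")
```

All four subqueries here are non-correlated: each can be run on its own, and its result does not depend on the row the outer query is processing.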
Subqueries: Guidelines
SQL Subqueries - Exercise
Average of TotalPrice for Products with Highest Total Price in Each Month
SELECT
    SubQuery.Month,
    AVG(SubQuery.TotalPrice) AS AvgTotalPrice
FROM (
    SELECT
        EXTRACT(MONTH FROM Production) AS Month,
        ListPrice * Quantity AS TotalPrice
    FROM
        Products
) SubQuery
-- Keep only the rows whose total price is the highest of their month.
-- This is a row-level condition, so it goes in WHERE rather than HAVING.
WHERE
    SubQuery.TotalPrice = (SELECT MAX(ListPrice * Quantity) FROM Products WHERE
                           EXTRACT(MONTH FROM Production) = SubQuery.Month)
GROUP BY
    SubQuery.Month;
The subquery in the WHERE clause calculates the maximum total price (ListPrice * Quantity) for the same month as the current SubQuery.Month.
Month  AvgTotalPrice
1      6037.50
3      9718.80
5      8145.00
▪ GROUP BY is used when you want to aggregate data across different categories or groups. It combines rows with the same values
in specified columns into a single row. You should apply GROUP BY when you need to calculate aggregate values (like sums or
averages) for each distinct group in your data.
▪ For example, if you want to calculate the total salary for each department, you would group the data by DepartmentID.
▪ HAVING is used to filter groups after aggregation has occurred. It’s similar to WHERE, but while WHERE filters rows before
aggregation, HAVING filters groups after they’ve been created by the GROUP BY clause.
▪ For instance, if you want to display only departments with a total salary exceeding 500,000, you would use HAVING to apply
this condition after the aggregation is done.
Key Points:
▪ Use GROUP BY to specify how data should be grouped before applying aggregate functions.
▪ Use HAVING to filter the results after the aggregation process.
Example: Aggregate Functions with GROUP BY and HAVING in SQL
Example:
In the example below, the GROUP BY clause groups employees by DepartmentID, and the HAVING clause filters out any departments where the total salary is 500,000 or less.
SELECT DepartmentID,
SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID -- Group employees by their department
HAVING SUM(Salary) > 500000; -- Filter departments where total salary exceeds 500,000
1. The GROUP BY clause first groups all rows (in this case, employees) by DepartmentID. This forms subsets (groups) of the data.
2. Aggregate functions (in this case, SUM(Salary)) are applied to each group, calculating the total salary for each department.
3. The HAVING clause then filters the results to display only those departments where the total salary exceeds 500,000.
This ensures that you can both group and filter data effectively in a single query.
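The three steps above can be observed with a small runnable sketch. It assumes an in-memory SQLite database as a stand-in for the course database, and the salary figures are hypothetical, chosen so that one department passes the filter, one falls below it, and one sits exactly on the 500,000 boundary.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, Salary INTEGER, DepartmentID INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        (1, 300000, 10),
        (2, 250000, 10),  # department 10 total: 550000
        (3, 200000, 20),
        (4, 100000, 20),  # department 20 total: 300000
        (5, 500000, 30),  # department 30 total: exactly 500000
    ],
)

# Step 1 and 2: totals per department, before any group filtering.
all_totals = conn.execute(
    "SELECT DepartmentID, SUM(Salary) AS TotalSalary "
    "FROM Employees GROUP BY DepartmentID ORDER BY DepartmentID"
).fetchall()

# Step 3: HAVING keeps only groups whose total is strictly above 500000.
filtered_totals = conn.execute(
    "SELECT DepartmentID, SUM(Salary) AS TotalSalary "
    "FROM Employees GROUP BY DepartmentID "
    "HAVING SUM(Salary) > 500000 ORDER BY DepartmentID"
).fetchall()

print(all_totals)
print(filtered_totals)
```

Department 30 is excluded because the comparison is strict: a total of exactly 500,000 does not satisfy SUM(Salary) > 500000.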
Comparison: Views vs. Indexes
Definition
▪ Views: Virtual tables based on SELECT queries.
▪ Indexes: Data structures that optimize query performance.
Purpose
▪ Views: Simplify queries, enhance security, and customize data views.
▪ Indexes: Speed up data retrieval by creating quick access paths.
Data Storage
▪ Views: Do not store data physically.
▪ Indexes: Require additional storage for index structures.
Performance Impact
▪ Views: Simplifies repeated queries; does not improve performance directly.
▪ Indexes: Improves query performance, especially for large datasets.
Trade-offs
▪ Views: No additional storage, but depends on underlying tables for performance.
▪ Indexes: Slows down write operations and consumes more storage.
Use Cases
▪ Views: Simplifying reports, restricting sensitive data.
▪ Indexes: Filtering, sorting, and joining tables in large datasets.
Summary of Commands
▪ Create View: CREATE VIEW ... AS SELECT ... (Purpose: simplify complex queries or restrict data.)
▪ Alter Index: DROP INDEX ... followed by CREATE INDEX ... (Purpose: update an existing index.)
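The two commands summarised above can be tried with a short sketch. It assumes an in-memory SQLite database as a stand-in for the course database; the view name EmployeeDirectory, the index name idx_employees_department, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, DepartmentID INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", 50000, 10), (2, "Bob", 60000, 20)],
)

# Create View: expose every column except Salary (restricting sensitive data).
conn.execute(
    "CREATE VIEW EmployeeDirectory AS "
    "SELECT EmployeeID, Name, DepartmentID FROM Employees"
)
cursor = conn.execute("SELECT * FROM EmployeeDirectory ORDER BY EmployeeID")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

# Create an index on the column used for filtering and joining.
conn.execute("CREATE INDEX idx_employees_department ON Employees (DepartmentID)")

# "Alter" the index: drop it, then recreate it with a different column list.
conn.execute("DROP INDEX idx_employees_department")
conn.execute("CREATE INDEX idx_employees_department ON Employees (DepartmentID, Salary)")

# PRAGMA index_list returns one row per index; the name is the second field.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('Employees')")]

print(view_columns, view_rows, index_names, sep="\n")
```

The view stores no rows of its own: querying it re-runs the underlying SELECT, whereas the index is a separate structure that the database must maintain on every write to Employees.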
Assignment I (1/2)
SQL Query Development for HR Database: Table Creation, Data Manipulation, and Retrieval
Assignment: HR Database SQL Queries
Use the HR database conceptual design diagram provided to answer the following questions by writing SQL queries.
3. Insert Data
Write SQL queries to insert sample data into all tables, ensuring the foreign key constraints are respected.
Part 3: Data Retrieval
Assignment I (2/2)
Comprehensive SQL Querying with HR Database: Table Creation, Data Manipulation, and Advanced Retrieval
Assignment II: Normalization Question for Students
Problem: A university database stores student and course information in a single table:
Question:
1. Identify the issues with this table structure based on normalization principles. Which normal forms (1NF, 2NF, or 3NF) are violated? Explain why.
2. Follow-Up Task: Design normalized tables that address these violations and ensure the database adheres to 3NF. Write the SQL queries to create these tables.
Thank you!