13 SQL Statements for 90% of Your Data Science Tasks
Abhishek Saud
13 min read · Mar 23
SQL is a powerful tool for data manipulation: filtering, sorting, grouping, and
aggregating data are just a few of the operations it supports. This article covers
13 key SQL statements that handle about 90% of the data science tasks you will
encounter. The statements are simple to understand and apply, and together they
give you a solid foundation for working with SQL.
Whether you are new to SQL or already have some experience with it, you will find
practical guidance here for handling data.
Convert web pages and HTML files to PDF in your applications with the Pdfcrowd HTML to PDF API Printed with Pdfcrowd.com
1. SELECT
The SELECT statement retrieves data from one or more database tables. Become
proficient with SELECT and with the clauses that filter, sort, and group its
results, such as WHERE, ORDER BY, and GROUP BY. The general form of a SELECT
statement is:
SELECT col1, col2, col3
FROM table_name
WHERE conditions;
In this example, table_name is the name of the table that contains the data, and
col1, col2, and col3 are the names of the columns you want to retrieve. The WHERE
clause is optional; it specifies a condition that a row must satisfy to be
included in the result.
Here’s an example that selects all records from a table called “employees” where
the employee’s age is greater than or equal to 35:
SELECT *
FROM employees
WHERE age >= 35;
2. WHERE
The WHERE clause filters data according to a given condition. Become proficient
with WHERE so that you retrieve only the rows that satisfy specific requirements.
Here is an illustration of how to filter data from a table using a WHERE clause
in SQL:
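As a minimal sketch, assuming the employees table from the previous section also
has name, department, and salary columns (hypothetical names, not given in the
article), a filter on two conditions could look like this:

```sql
-- Assumed columns: name, department, salary (illustrative only)
SELECT name, department, salary
FROM employees
WHERE department = 'Sales'   -- keep only rows from one department
  AND salary > 50000;        -- and only those above a salary threshold
```

Conditions can be combined with AND and OR, and a row is returned only when the
whole expression evaluates to true.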
3. GROUP BY
The GROUP BY clause groups rows by one or more columns so that aggregate
functions (like COUNT, SUM, and AVG) can summarize each group. Become proficient
with GROUP BY if you want to categorize data.
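As a minimal sketch, assuming an employees table with department and salary
columns, a query that averages salaries per department could look like this:

```sql
-- Assumed schema: employees(department, salary)
SELECT department,
       AVG(salary) AS avg_salary   -- one average per group
FROM employees
GROUP BY department;               -- one output row per department
```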
This query returns a list of all departments and their average salaries. The
GROUP BY clause groups the employees by department, and the AVG function computes
the average pay for each department, that is, the total of the salaries in the
department divided by its number of employees.
department  | avg_salary
------------+-----------
Sales       | 65000
Marketing   | 55000
Engineering | 80000
Info_tech   | 130000
4. JOIN
The JOIN clause combines data from two or more tables in a database. To retrieve
data from multiple tables, become proficient at using JOIN and at choosing the
correct type of join (e.g., INNER, LEFT, RIGHT, FULL OUTER).
INNER JOIN
An INNER JOIN returns only the rows that have a match between the columns in both
tables. Here’s an illustration:
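As a minimal sketch, assuming an orders table with order_id and customer_id
columns and a customers table with customer_id and customer_name columns, an
inner join could look like this:

```sql
-- Assumed schema: orders(order_id, customer_id),
--                 customers(customer_id, customer_name)
SELECT orders.order_id,
       customers.customer_name
FROM orders
INNER JOIN customers
        ON orders.customer_id = customers.customer_id;
```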
This example joins the orders and customers tables on the customer_id column.
The result contains the order_id and customer_name columns, with one row for each
pair of rows whose customer_id values match in both tables.
LEFT JOIN
A LEFT JOIN returns all of the rows from the left table, along with any matching
rows from the right table. Where the right table has no match, the result
contains NULL values. An illustration would be:
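As a minimal sketch, assuming the same customers and orders tables with a shared
customer_id column, a left join that keeps every customer could look like this:

```sql
-- customers is the left table, orders is the right table
SELECT customers.customer_name,
       orders.order_id              -- NULL for customers with no orders
FROM customers
LEFT JOIN orders
       ON customers.customer_id = orders.customer_id;
```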
In this illustration, the customers table is on the left and the orders table is
on the right. The tables are joined on the customer_id column. The result
includes all the rows from the customers table and their corresponding rows from
the orders table. If a customer has no match in the orders table, NULL appears
in the order_id column.
RIGHT JOIN
A RIGHT JOIN returns all of the rows from the right table and the matching rows
from the left table. Where the left table has no match, the result contains NULL
values. Here’s an illustration:
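As a minimal sketch, assuming the same customers and orders tables, a right join
that keeps every order (with orders placed as the right table) could look like
this:

```sql
-- customers is the left table, orders is the right table
SELECT customers.customer_name,     -- NULL for orders with no matching customer
       orders.order_id
FROM customers
RIGHT JOIN orders
        ON customers.customer_id = orders.customer_id;
```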
In this illustration, the customers table is on the left and the orders table is
on the right. The tables are joined on the customer_id column. The result
includes all of the rows from the orders table and their corresponding rows from
the customers table. If an order has no match in the customers table, NULL
appears in the customer_name column.
OUTER JOIN
An OUTER JOIN returns the matching rows plus the non-matching rows from one or
both tables. There are three kinds of OUTER JOIN: LEFT OUTER JOIN, RIGHT OUTER
JOIN, and FULL OUTER JOIN. The first two are equivalent to the LEFT JOIN and
RIGHT JOIN described above.
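As a minimal sketch, assuming the same customers and orders tables, a LEFT OUTER
JOIN could look like this:

```sql
-- LEFT OUTER JOIN is the long form of LEFT JOIN
SELECT customers.customer_name,
       orders.order_id              -- NULL for customers with no orders
FROM customers
LEFT OUTER JOIN orders
             ON customers.customer_id = orders.customer_id;
```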
In this illustration, the customers table is on the left and the orders table is
on the right. The tables are joined on the customer_id column. The result
includes all the rows from the customers table and their corresponding rows from
the orders table. If a customer has no match in the orders table, NULL appears
in the order_id column.
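As a minimal sketch, assuming the same tables, a RIGHT OUTER JOIN that keeps
every order (with orders placed as the right table) could look like this:

```sql
-- RIGHT OUTER JOIN is the long form of RIGHT JOIN
SELECT customers.customer_name,     -- NULL for orders with no matching customer
       orders.order_id
FROM customers
RIGHT OUTER JOIN orders
              ON customers.customer_id = orders.customer_id;
```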
In this illustration, the customers table is on the left and the orders table is
on the right. The tables are joined on the customer_id column. The result
includes all of the rows from the orders table and their corresponding rows from
the customers table. If an order has no match in the customers table, NULL
appears in the customer_name column.
It’s important to note that some databases might not support RIGHT OUTER JOIN.
You can still get the same outcome by using a LEFT OUTER JOIN and switching the
order of the tables.
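The rewrite can be sketched as follows, assuming the same customers and orders
tables. To keep every order without using RIGHT OUTER JOIN, place orders on the
left side instead:

```sql
-- Same rows as: customers RIGHT OUTER JOIN orders
SELECT customers.customer_name,     -- NULL for orders with no matching customer
       orders.order_id
FROM orders
LEFT OUTER JOIN customers
             ON orders.customer_id = customers.customer_id;
```

Only the order of the tables and the join keyword change; the select list and
the join condition stay the same.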
5. HAVING
The HAVING clause filters data after it has been grouped with the GROUP BY
clause. Become proficient with HAVING to filter grouped data according to
particular criteria.
Here is an example that groups orders by customer and uses the HAVING clause to
filter the results so that they contain only customers whose orders total at
least 50 units:
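As a minimal sketch, assuming an orders table with customer_id and quantity
columns (quantity is a hypothetical column name), such a query could look like
this:

```sql
-- Assumed schema: orders(customer_id, quantity)
SELECT customer_id,
       SUM(quantity) AS total_quantity
FROM orders
GROUP BY customer_id
HAVING SUM(quantity) >= 50;   -- filter applied to each group, after aggregation
```

WHERE cannot be used for this filter, because WHERE is evaluated on individual
rows before the groups and their sums exist.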
This query lists customers together with the total number of products each has
ordered, keeping only customers whose orders total at least 50 units. The GROUP
BY clause groups orders by customer, the SUM function computes the total number
of products each customer has ordered, and the HAVING clause limits the results
to customers whose total is at least 50 units.
customer_id | total_quantity
------------+---------------
123         | 60
456         | 70
6. Window Function
In SQL, window functions carry out calculations across a set of rows that are
related to the current row. These functions are applied over a window, which is
a subset of rows from a table defined by a given condition or partition. Here
are a few SQL window function examples:
1. ROW_NUMBER(): This function assigns a sequential number to each row in the
result set. The syntax for the ROW_NUMBER() function is:
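As a minimal sketch of row numbering with ROW_NUMBER(), using the placeholder
names table_name, column1, and column2, a query could look like this:

```sql
-- table_name, column1, column2 are placeholders
SELECT column1,
       column2,
       ROW_NUMBER() OVER (ORDER BY column1) AS row_num
FROM table_name;
```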
This query will return a result set with an additional column “row_num” that
contains the sequential numbers assigned to each row based on the order of
“column1”.
2. SUM(): This function calculates the sum of a column within a partition. The
syntax for the SUM() function is:
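As a minimal sketch, using the placeholder names table_name, column1, column2,
and column3, a partitioned sum could look like this:

```sql
-- Every row keeps its own values and gains the total for its partition
SELECT column1,
       column2,
       column3,
       SUM(column3) OVER (PARTITION BY column1) AS column3_sum
FROM table_name;
```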
This query will return a result set with an additional column “column3_sum” that
contains the sum of “column3” for each partition based on the values of
“column1”.
3. RANK(): This function assigns a rank to each row within a partition based on
the values of a specified column. The syntax for the RANK() function is:
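As a minimal sketch, using the same placeholder names and assuming the
partitions are defined by column1 (the article does not say which column
defines them), a ranking query could look like this:

```sql
-- Assumption: partitions are formed on column1
SELECT column1,
       column2,
       column3,
       RANK() OVER (PARTITION BY column1 ORDER BY column3 DESC) AS rank_num
FROM table_name;
```

Rows with equal column3 values receive the same rank, and the next rank is
skipped accordingly.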
This query will return a result set with an additional column “rank_num” that
contains the rank of each row within each partition based on the descending
order of “column3”.
4. AVG(): This function calculates the average of a column within a partition. The
syntax for the AVG() function is:
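As a minimal sketch, using the placeholder names table_name, column1, column2,
and column3, a partitioned average could look like this:

```sql
-- Every row keeps its own values and gains the average for its partition
SELECT column1,
       column2,
       column3,
       AVG(column3) OVER (PARTITION BY column1) AS column3_avg
FROM table_name;
```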
This query will return a result set with an additional column “column3_avg” that
contains the average of “column3” for each partition based on the values of
“column1”.
Note that the syntax for window functions may vary depending on the specific
database management system (DBMS) being used.
7. UNION
The UNION operator combines the output of two or more SELECT statements into a
single result set. The SELECT statements must return the same number of columns,
and the data types of corresponding columns must be compatible. UNION removes
duplicate rows from the result set automatically; use UNION ALL to keep them.
Consider two tables named “customers” and “employees,” each with columns for
“name” and “city.” We want to compile a list of everyone who resides in New York
City, including both customers and employees. Two SELECT statements, one for
each table, can be combined using the UNION operator:
SELECT name, city
FROM customers
WHERE city = 'New York'
UNION
SELECT name, city
FROM employees
WHERE city = 'New York';
This query returns a list of everyone who resides in New York City, customers
and employees alike. The first SELECT statement returns all New York City-based
customers, and the second SELECT statement returns all New York City-based
employees. The UNION operator combines the two results and eliminates any
duplicate rows.
name         | city
-------------+---------
John Smith   | New York
Jane Doe     | New York
Bob Johnson  | New York
Samantha Lee | New York
In this example, we can see that there are four people who live in New York City,
two from the “customers” table and two from the “employees” table, and the
UNION operator has combined the results of the two SELECT statements into a
single result set.
8. CREATE
The CREATE statement creates a new database table, view, or other database
object. Become proficient with CREATE so that you can define these objects
yourself. Here is an illustration of how to use the CREATE statement in SQL:
Suppose we want to create a new table called “employees” with columns for “id”,
“name”, “email”, “phone” and “department”. We can use the CREATE statement to
do this:
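As a minimal sketch of such a statement, using the column types and lengths the
article describes for “id,” “name,” “email,” and “phone,” and an assumed length
for “department” (the article does not state one):

```sql
CREATE TABLE employees (
    id         INT PRIMARY KEY,   -- integer primary key
    name       VARCHAR(50),
    email      VARCHAR(100),
    phone      VARCHAR(20),
    department VARCHAR(50)        -- assumption: length not given in the article
);
```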
This query creates a new table called “employees” with the columns “id,” “name,”
“email,” “phone,” and “department.” The “id” column, which is defined as an
integer, serves as the table’s primary key. The “name” column has a maximum
length of 50 characters, while the “email” and “phone” columns have maximum
lengths of 100 and 20 characters, respectively.
After the query is executed, we can insert new rows into the “employees” table
and retrieve data from it:
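As a minimal sketch, assuming the “employees” table described above, inserting a
row and reading it back could look like this. All values are made up for
illustration:

```sql
-- Hypothetical sample values
INSERT INTO employees (id, name, email, phone, department)
VALUES (1, 'Jane Doe', 'jane.doe@example.com', '555-0100', 'Sales');

-- Retrieve everything stored in the table
SELECT *
FROM employees;
```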
In this example, we have used the CREATE statement to create a new table in a
database and inserted a new row into the table.
9. INSERT
The INSERT statement adds rows to a database table. Learn how to use INSERT to
add new information to a table. Here is an illustration of how to use the SQL
INSERT statement:
Consider a table called “students” that contains columns for “id,” “name,”
“major,” and “gpa.” We want to add a new row for the student with ID 1134, name
“Adam Fields,” a major in “Data Science,” and a GPA of 3.7. This can be
accomplished using the INSERT statement:
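A statement matching this description, assuming “id” and “gpa” are numeric
columns and “name” and “major” are text columns, could look like this:

```sql
INSERT INTO students (id, name, major, gpa)
VALUES (1134, 'Adam Fields', 'Data Science', 3.7);
```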
This query would add a new row with the specified values for the “id,” “name,”
“major,” and “gpa” columns to the “students” table. The table name and the list of
columns into which values are to be inserted are both specified in the INSERT
statement. The values we want to insert into each column are then specified using
the VALUES keyword in the order that the columns were listed.
In this example, we have inserted a new row into the “students” table using the
INSERT statement.
10. UPDATE
The UPDATE statement is used to modify existing data in a database table. You
should master using UPDATE to update the values of one or more columns in a
table. Here’s an example of using the UPDATE statement in SQL:
Suppose we have a table named “students” with columns for “id”, “name”, “major”,
and “gpa”. We want to update the major and GPA of a student with an ID of 1134.
We can use the UPDATE statement to do this:
UPDATE students
SET major = 'Mathematics', gpa = 3.3
WHERE id = 1134;
This query would update the “major” and “gpa” columns of the row in the
“students” table with an ID of 1134. The UPDATE statement specifies the name of
the table we want to update, followed by the SET keyword and a list of column-
value pairs that we want to update. We then use the WHERE clause to specify
which rows we want to update. In this case, we want to update the row with an ID
of 1134, so we specify “WHERE id = 1134”.
After the query is executed, the “students” table would have the updated values for
the “major” and “gpa” columns in the row with an ID of 1134:
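One way to confirm the change, as a minimal sketch, is to query the row again:

```sql
SELECT id, name, major, gpa
FROM students
WHERE id = 1134;
```

If the update succeeded, the “major” column of this row reads 'Mathematics' and
the “gpa” column reads 3.3.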
In this example, we have updated the “major” and “gpa” columns of a row in the
“students” table using the UPDATE statement.
11. DELETE
A database table’s rows can be deleted using the DELETE statement. To remove
data from a table, you should become proficient with DELETE. Here is an
illustration of how to use the SQL DELETE statement:
Consider a table called “students” that contains columns for “id,” “name,” “major,”
and “gpa.” A student with the ID of 1134 needs to be removed from the table. To
accomplish this, we can use the DELETE statement.
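A statement matching this description could look like this:

```sql
DELETE FROM students
WHERE id = 1134;
```

Without the WHERE clause, the statement would delete every row in the table, so
it is worth checking the condition before running it.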
This query deletes the row with the ID 1134 from the “students” table. The
DELETE statement names the table we want to delete from, and the WHERE clause
that follows specifies which rows to delete. In this instance, we specify
“WHERE id = 1134” because we want to remove the row with ID 1134.
12. DROP
To remove a database table or other database object, use the DROP statement. To
remove unnecessary tables or other objects from a database, you should become
proficient with the DROP command. Depending on the type of object being
deleted, the syntax for the DROP statement varies, but some typical examples
include:
1. DROP TABLE: This statement is used to delete an existing table along with all
its data and indexes. The syntax for the DROP TABLE statement is:
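As a minimal sketch, with table_name as a placeholder for the table to remove:

```sql
-- Permanently removes the table definition and all of its rows
DROP TABLE table_name;
```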
2. DROP INDEX: This statement is used to delete an existing index from a table.
The syntax for the DROP INDEX statement is:
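As a minimal sketch, with index_name as a placeholder. This form is accepted by
PostgreSQL, SQLite, and Oracle, while MySQL and SQL Server also require the
table to be named with an ON table_name clause:

```sql
-- Removes the index only; the table and its data are unaffected
DROP INDEX index_name;
```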
13. ALTER
The ALTER statement changes the structure of a database table or other database
object. Become proficient with ALTER to add or remove columns, alter data types,
or change other features of a table. The syntax for the ALTER statement varies
with the kind of object being modified, but some typical examples include:
1. Using the ALTER TABLE statement, you can change a table’s structure by
adding or removing columns, altering the data types, or imposing constraints.
The ALTER TABLE statement has the following syntax:
ALTER TABLE table_name
ADD column_name data_type [constraint],
MODIFY column_name data_type [constraint],
DROP column_name,
ADD CONSTRAINT constraint_name constraint_definition,
DROP CONSTRAINT constraint_name;
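2. Using the ALTER INDEX statement, you can change properties of an existing
index, such as its name. The supported operations differ widely between database
systems, and changing an index’s columns or type usually means dropping the
index and creating it again. The ALTER INDEX statement has the following syntax: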
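As one concrete instance, here is a minimal sketch in PostgreSQL syntax that
renames an index. The names index_name and new_index_name are placeholders, and
other systems expose different ALTER INDEX operations:

```sql
-- PostgreSQL form; other DBMSs differ
ALTER INDEX index_name RENAME TO new_index_name;
```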
Note that the exact syntax for the ALTER statement may vary depending on the
specific database management system (DBMS) being used.