SQL__1721960421

The document is a comprehensive study guide on SQL and data retrieval, covering various SQL commands, query structures, and data manipulation techniques. It includes explanations of key concepts such as joins, aggregations, window functions, and table management, along with practical SQL commands and examples. Additionally, it features a detailed table of contents outlining numerous SQL-related topics and questions.

Uploaded by

narayana143


Compilation - Arockia Liborious

15.003 Software Tools - Data Science
Study Guide: Data Retrieval with SQL
Afshine Amidi and Shervine Amidi
Massachusetts Institute of Technology, https://www.mit.edu/~amidi
August 21, 2020

General concepts

• Structured Query Language – Structured Query Language, abbreviated as SQL, is a language that is widely used in industry to query data from databases.

• Query structure – Queries are usually structured as follows:

SQL
    -- Select fields (mandatory)
    SELECT
        col_1,
        col_2,
        ... ,
        col_n
    -- Source of data (mandatory)
    FROM table t
    -- Gather info from other sources (optional)
    JOIN other_table ot
      ON (t.key = ot.key)
    -- Conditions (optional)
    WHERE some_condition(s)
    -- Aggregating (optional)
    GROUP BY column_group_list
    -- Restricting aggregated values (optional)
    HAVING some_condition(s)
    -- Sorting values (optional)
    ORDER BY column_order_list
    -- Limiting number of rows (optional)
    LIMIT some_value

Remark: the SELECT DISTINCT command can be used to ensure that the result has no duplicate rows.

• Condition – A condition is of the following format:

SQL
    some_col some_operator some_col_or_value

where some_operator can be among the following common operations:

Category | Operator                | Command
---------|-------------------------|------------------------
General  | Equality / non-equality | = / !=, <>
General  | Inequalities            | >=, >, <, <=
General  | Belonging               | IN (val_1, ..., val_n)
General  | And / or                | AND / OR
General  | Check for missing value | IS NULL
General  | Between bounds          | BETWEEN val_1 AND val_2
Strings  | Pattern matching        | LIKE '%val%'

• Joins – Two tables table_1 and table_2 can be joined in the following way:

SQL
    ...
    FROM table_1 t1
    type_of_join table_2 t2
      ON (t2.key = t1.key)
    ...

where the different type_of_join commands are summarized in the table below:

Type of join | Illustration
-------------|-----------------------------
INNER JOIN   | [illustration not available]
LEFT JOIN    | [illustration not available]
RIGHT JOIN   | [illustration not available]
FULL JOIN    | [illustration not available]

Remark: joining every row of table 1 with every row of table 2 can be done with the CROSS JOIN command, and is commonly known as the cartesian product.

Aggregations

• Grouping data – Aggregate metrics are computed on grouped data in the following way:

The SQL command is as follows:

SQL
    SELECT
        col_1,
        agg_function(col_2)
    FROM table
    GROUP BY col_1

• Grouping sets – The GROUPING SETS command is useful when there is a need to compute aggregations across different dimensions at a time. Below is an example of how all aggregations across two dimensions are computed:

SQL
    SELECT
        col_1,
        col_2,
        agg_function(col_3)
    FROM table
    GROUP BY GROUPING SETS (
        (col_1),
        (col_2),
        (col_1, col_2)
    )

• Aggregation functions – The table below summarizes the main aggregate functions that can be used in an aggregation query:

Category | Operation                     | Command
---------|-------------------------------|---------------------------
Values   | Mean                          | AVG(col)
Values   | Percentile                    | PERCENTILE_APPROX(col, p)
Values   | Sum / # of instances          | SUM(col) / COUNT(col)
Values   | Max / min                     | MAX(col) / MIN(col)
Values   | Variance / standard deviation | VAR(col) / STDEV(col)
Arrays   | Concatenate into array        | collect_list(col)

Remark: the median can be computed using the PERCENTILE_APPROX function with p equal to 0.5.

• Filtering – The table below highlights the differences between the WHERE and HAVING commands:

WHERE                                       | HAVING
--------------------------------------------|----------------------------------------
Filter condition applies to individual rows | Filter condition applies to aggregates
Statement placed right after FROM           | Statement placed right after GROUP BY

Remark: if WHERE and HAVING are both in the same query, WHERE will be executed first.

Window functions

• Definition – A window function computes a metric over groups and has the following structure:

The SQL command is as follows:

SQL
    some_window_function() OVER(PARTITION BY some_col ORDER BY another_col)

Remark: window functions are only allowed in the SELECT and ORDER BY clauses, not in WHERE, GROUP BY or HAVING.

• Row numbering – The table below summarizes the main commands that rank each row across specified groups, ordered by a specific column:

Command      | Description                                     | Example
-------------|-------------------------------------------------|-----------
ROW_NUMBER() | Ties are given different ranks                  | 1, 2, 3, 4
RANK()       | Ties are given same rank and skip numbers       | 1, 2, 2, 4
DENSE_RANK() | Ties are given same rank and don't skip numbers | 1, 2, 2, 3

• Values – The following window functions make it possible to keep track of specific types of values with respect to the partition:

Command           | Description
------------------|--------------------------------------------
FIRST_VALUE(col)  | Takes the first value of the column
LAST_VALUE(col)   | Takes the last value of the column
LAG(col, n)       | Takes the nth previous value of the column
LEAD(col, n)      | Takes the nth following value of the column
NTH_VALUE(col, n) | Takes the nth value of the column
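The ranking and value window functions described above can be tried out with a small sketch. It uses Python's built-in sqlite3 module only as a convenient way to run the SQL; the scores table, its columns and its values are hypothetical, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical in-memory table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (team TEXT, player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("a", "p1", 10), ("a", "p2", 20), ("a", "p3", 20), ("a", "p4", 30)],
)

# p2 and p3 tie on points, which is where the three ranking functions differ.
ranking_rows = conn.execute(
    """
    SELECT
        player,
        ROW_NUMBER() OVER (PARTITION BY team ORDER BY points, player) AS row_num,
        RANK()       OVER (PARTITION BY team ORDER BY points)         AS rnk,
        DENSE_RANK() OVER (PARTITION BY team ORDER BY points)         AS dense_rnk,
        LAG(points, 1) OVER (PARTITION BY team ORDER BY points, player) AS prev_points
    FROM scores
    ORDER BY points, player
    """
).fetchall()

for row in ranking_rows:
    print(row)
```

The ranks come out as 1, 2, 3, 4 for ROW_NUMBER(), 1, 2, 2, 4 for RANK() and 1, 2, 2, 3 for DENSE_RANK(), matching the Example column of the row numbering table.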

Advanced functions

• SQL tips – In order to keep the query in a clear and concise format, the following tricks are often done:

Operation            | Command                                | Description
---------------------|----------------------------------------|----------------------------------------------------------------------
Renaming columns     | SELECT operation_on_column AS col_name | New column names shown in query results
Abbreviating tables  | FROM table_1 t1                        | Abbreviation used within query for simplicity in notations
Simplifying group by | GROUP BY col_number_list               | Specify column position in SELECT clause instead of whole column names
Limiting results     | LIMIT n                                | Display only n rows

• Sorting values – The query results can be sorted along a given set of columns using the following command:

SQL
    ... [query] ...
    ORDER BY col_list

Remark: by default, the command sorts in ascending order. If we want to sort in descending order, the DESC command needs to be used after the column.

• Column types – In order to ensure that a column or value is of one specific data type, the following command is used:

SQL
    CAST(some_col_or_value AS data_type)

where data_type is one of the following:

Data type        | Description     | Example
-----------------|-----------------|--------------------------
INT              | Integer         | 2
DOUBLE           | Numerical value | 2.0
STRING / VARCHAR | String          | 'teddy bear'
DATE             | Date            | '2020-01-01'
TIMESTAMP        | Timestamp       | '2020-01-01 00:00:00.000'

Remark: if the column contains data of different types, the TRY_CAST() command will convert unknown types to NULL instead of throwing an error.

• Column manipulation – The main functions used to manipulate columns are described in the table below:

Category | Operation                                                | Command
---------|----------------------------------------------------------|--------------------------------------
General  | Take first non-NULL value                                | COALESCE(col_1, col_2, ..., col_n)
General  | Create a new column combining existing ones              | CONCAT(col_1, ..., col_n)
Value    | Round value to n decimals                                | ROUND(col, n)
String   | Converts string column to lower / upper case             | LOWER(col) / UPPER(col)
String   | Replace occurrences of old in col with new               | REPLACE(col, old, new)
String   | Take the substring of col, with a given start and length | SUBSTR(col, start, length)
String   | Remove spaces from the left / right / both sides         | LTRIM(col) / RTRIM(col) / TRIM(col)
String   | Length of the string                                     | LENGTH(col)
Date     | Truncate at a given granularity (year, month, week)      | DATE_TRUNC(time_dimension, col_date)
Date     | Transform date                                           | DATE_ADD(col_date, number_of_days)

• Conditional column – A column can take different values with respect to a particular set of conditions with the CASE WHEN command as follows:

SQL
    CASE WHEN some_condition THEN some_value
         ...
         WHEN some_other_condition THEN some_other_value
         ELSE some_other_value_n END

• Combining results – The table below summarizes the main ways to combine results in queries:

Category     | Command   | Remarks
-------------|-----------|---------------------------------------------------
Union        | UNION     | Guarantees distinct rows
Union        | UNION ALL | Potential newly-formed duplicates are kept
Intersection | INTERSECT | Keeps observations that are in all selected queries

• Common table expression – A common way of handling complex queries is to have temporary result sets coming from intermediary queries, which are called common table expressions (abbreviated CTE), that increase the readability of the overall query. It is done thanks to the WITH ... AS ... command as follows:

SQL
    WITH cte_1 AS (
        SELECT ...
    ),

    ...
    cte_n AS (
        SELECT ...
    )
    SELECT ...
    FROM ...

Table manipulation

• Table creation – The creation of a table is done as follows:

SQL
    CREATE [table_type] TABLE [creation_type] table_name(
        col_1 data_type_1,
        ...,
        col_n data_type_n
    )
    [options];

where [table_type], [creation_type] and [options] are one of the following:

[table not available]

• Data insertion – New data can either append or overwrite already existing data in a given table as follows:

SQL
    WITH ...                           -- optional
    INSERT [insert_type] table_name    -- mandatory
    SELECT ...;                        -- mandatory

where [insert_type] is among the following:

Command   | Description
----------|-------------------------
OVERWRITE | Overwrites existing data
INTO      | Appends to existing data

• Dropping table – Tables are dropped in the following way:

SQL
    DROP TABLE table_name;

• View – Instead of using a complicated query, the latter can be saved as a view which can then be used to get the data. A view is created with the following command:

SQL
    CREATE VIEW view_name AS complicated_query;

Remark: a view does not create any physical table and is instead seen as a shortcut.
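As a rough illustration of common table expressions, data insertion, views and table dropping, the sketch below runs the statements through Python's built-in sqlite3 module. The orders table, the order_totals view and all values are hypothetical. SQLite only offers the appending INSERT INTO ... SELECT form; the OVERWRITE variant is not available there.

```python
import sqlite3

# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10), ("ann", 15), ("bob", 5), ("cat", 40)],
)

# Two chained CTEs: the second one reads from the first.
big_spenders = conn.execute(
    """
    WITH totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    ),
    big AS (
        SELECT customer, total FROM totals WHERE total >= 20
    )
    SELECT customer, total FROM big ORDER BY customer
    """
).fetchall()

# A view stores the query, not the data, so it reflects later inserts.
conn.execute(
    "CREATE VIEW order_totals AS "
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
)
conn.execute("INSERT INTO orders SELECT 'bob', 20")  # appends one row
totals_after_insert = conn.execute(
    "SELECT customer, total FROM order_totals ORDER BY customer"
).fetchall()

# Clean up: drop the view, then the table it depends on.
conn.execute("DROP VIEW order_totals")
conn.execute("DROP TABLE orders")
remaining_objects = conn.execute("SELECT name FROM sqlite_master").fetchall()
```

Because the view is re-evaluated on every read, bob's total changes after the insert even though the view itself was never modified.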

TABLE OF CONTENTS
1. What is a Relational Database Management System
(RDBMS)?
2.What is Structured Query Language?
3.What is a Database?
4.What is primary key?
5.What is a unique key?
6.What is a foreign key?
7.Explain the difference between spreadsheets and
databases.
8.What are table and fields?
9.Explain the various SQL languages.
10. What is normalization?
11. What is denormalization?
12. Explain the different types of normalization.
13. What are views in SQL?
14. What is join? Explain the different types.
15. What are the different types of indexes?
16. What is a cursor in SQL?
17. What is query?
18. What is a subquery?
19. What is a trigger?
20. Differentiate between the DELETE and TRUNCATE
commands.
21. What are local and global variables?
22. What are constraints?
23. What is data integrity?
24. What is auto increment?
25. What is a data warehouse?
26. What is the difference between DROP and TRUNCATE
statements?
27. What are aggregate and scalar functions?
28. What is alias in SQL?
29. What is the difference between OLTP and OLAP?
30. What is collation? What are the various types of
collation sensitivity?
31. How can we create tables in SQL?
32. How can we insert data in SQL?
33. How can we change a table name in SQL?
34. What is SQL server?
35. What is ETL in SQL?
36. What are nested queries?
37. What is the difference between CHAR and
VARCHAR2 data types in SQL server?
38. What is difference between SQL and PL/SQL?
39. What is the difference between SQL and MySQL?
40. What is cross join?
41. What are user defined functions?
42. What is a CLAUSE?
43. What is recursive stored procedure?
44. Explain UNION, MINUS and INTERSECT commands.
45. What TCP/IP port does SQL Server run on?
46. Which operator is used in query for pattern
matching?
47. How can we select unique records from a Table?
48. List and explain each of the ACID properties that
collectively guarantee that database transactions are
processed reliably.
49. What is the main difference in the BETWEEN and IN
condition operators?
50. What are SQL functions used for?
51. What is the need for MERGE statement?
52. List the ways in which dynamic SQL can be
executed.
53. List some case manipulation functions in SQL.
54. Is semicolon used after sql? Justify why or why not.
55. What is candidate key?
56. What is the difference between JOIN and UNION?
57. What is the difference between order and group
by?
58. Write an SQL query to fetch employee names
having a salary greater than or equal to 20000 and
less than or equal to 10000.
59. What is SQL injection? When does SQL injection
occur?
60. What is ENUM?
61. What is the difference between the ATAN and ATAN2
function?
62. What is the difference between the CEIL, FLOOR and
ROUND functions?
63. What is the RAND() function?
64. What is the difference between LOCALTIMESTAMP
and CURRENT_TIMESTAMP?
65. Name three functions that specify current date and
time.
66. Which function returns the difference between two
periods? What would the format of the output be?
67. How can we fetch common records from two
tables?
68. How can we fetch alternate records from a table?
69. How can we select unique records from a table?
70. What is the command used to fetch the first 5
characters of the string?
71. How to use LIKE in SQL?
72. How can we copy a table in SQL?
73. If we drop a table, does it also drop related objects?
74. What is Live Lock?
75. Can you join a table by itself?
76. Explain Equi join with an example.
77. Explain non-Equi join with an example.
78. State the difference between NVL and NVL2
functions.
79. What does this query achieve? GRANT
privilege_name ON object_name TO
{user_name|PUBLIC|role_name} [WITH GRANT
OPTION]; ?
80. Where is MyISAM table stored?
81. What does myisamchk do?
82. How can we store videos inside SQL server table?
83. Write an SQL query to show the second highest
salary from a table.
84. How would you select all the users whose phone
number is NULL?
85. Write an SQL query to fetch three max salaries from
a table.
86. Write an SQL query to create a new table with data
and structure copied from another table.
87. What are the differences between the HAVING
clause, and the WHERE clause?
88. What does a BCP command do?
89. Can a view be active if the base table is dropped?
90. When should we use NoSQL and SQL?
91. What is SYSTEM privilege?
92. What are object privileges?
93. Does the data stored in the stored procedure
increase access time or execution time? Explain.
94. What is CTE?
95. Does view contain data?
96. Define a temp table.
97. What is the difference between the RANK() and
DENSE_RANK() function?
98. What is referential integrity?
99. What does query optimization imply?
100. What are nested triggers?
101. What is schema in SQL server?
102. Write a query to fetch 50% records from an
EmployeeInfo table.
103. Write a query to add email validation to your
database.
104. What is CTE in SQL server?
105. Suppose you have a sample table of workers,
bonus and title.
106. Write a query to fetch the top N records.

Bonus:
1. Social Media Company Interview Qs (e.g. Facebook)
2. Audio Streaming Service Company Interview Qs(e.g.
Spotify)
3. e-Commerce Company Interview Qs (e.g. Amazon)
4. Entertainment Streaming Company Interview Qs
(e.g. Netflix)
5. Financial Institution Interview Qs (e.g. HSBC)
6. Online Marketplace Interview Qs(e.g. Airbnb)
7. Software Company Interview Qs (e.g. Microsoft)
SQL | COMPREHENSIVE GUIDE TO INTERVIEWS FOR DATA SCIENCE

1. What is a Relational Database Management System (RDBMS)?
An RDBMS stores data in a collection of tables that are related
to one another through common fields (columns). It also provides
relational operators to manipulate the data stored in the tables.
Example: SQL Server.

2. What is Structured Query Language?
SQL stands for Structured Query Language, and it is used to
communicate with a database. It is a standard language used to
perform tasks such as retrieval, updates, insertion and deletion
of data from a database.
Standard SQL commands include SELECT, INSERT, UPDATE and DELETE.

3. What is a Database?
A database is an organized collection of data that allows easy
access, storage, retrieval and management of data. It is also
known as a structured form of data that can be accessed in many
ways.
Example: School Management Database, Bank Management Database.

4. What is primary key?
A primary key is a field, or a combination of fields, that
uniquely identifies a row. It is a special kind of unique key
with an implicit NOT NULL constraint, which means that primary
key values cannot be NULL.

zepanalytics.com
5. What is a unique key?
A unique key constraint ensures that every record in a table has
a distinct value in the column or set of columns it covers. A
primary key constraint automatically has a unique constraint
defined on it. There can be many unique constraints defined per
table, but only one primary key constraint.

6. What is a foreign key?
A foreign key is a field (or set of fields) in one table that
refers to the primary key of another table. A relationship
between two tables is created by making the foreign key of one
table reference the primary key of the other.
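The behaviour of primary, unique and foreign keys can be checked with a small sketch run through Python's built-in sqlite3 module. The department and employee tables and the is_rejected helper are hypothetical names chosen for the illustration, and SQLite enforces foreign keys only after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical schema: each employee must point at an existing department.
conn.execute(
    """
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    )
    """
)
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)
    )
    """
)
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (100, 'David', 1)")


def is_rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_pk = is_rejected("INSERT INTO department VALUES (1, 'HR')")
duplicate_unique = is_rejected("INSERT INTO department VALUES (2, 'Sales')")
missing_parent = is_rejected("INSERT INTO employee VALUES (101, 'Maria', 99)")
```

All three inserts are rejected: the first repeats a primary key value, the second repeats a value in a unique column, and the third references a department that does not exist.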

7. Explain the difference between spreadsheets and databases.
Spreadsheet:
A file that consists of cells in rows and columns and can help
arrange, calculate and sort data. It can hold numeric values,
text, formulas and functions. Its rows and columns keep the
inserted information legible and simple to understand. It is an
electronic graph sheet.
Database:
An organized collection of data arranged for ease and speed of
search and retrieval. It contains multiple tables. A database
engine can sort, change or serve the information in the
database. Basically, it is a set of information which is held in
a computer.

8. What are table and fields?
A table is a set of data organized in columns and rows. Columns
are vertical and rows are horizontal. A table has a specified
number of columns, called fields, but can have any number of
rows, which are called records.
Example:
Table: Employee.
Field: Emp ID, Emp Name, Date of Birth.
Data: 201456, David, 11/15/1960.

9. Explain the various SQL languages.
There are five types of SQL commands: DDL, DML, DCL, TCL, and
DQL.

Data Definition Language (DDL)
DDL changes the structure of a table: creating a table, deleting
a table, altering a table, etc. In many database systems (for
example Oracle and MySQL), DDL commands are auto-committed, which
means that their changes are saved permanently in the database.
Some commands that come under DDL:
CREATE; ALTER; DROP; TRUNCATE

Data Manipulation Language (DML)
DML commands are used to modify the data in the database. DML
commands are not auto-committed, which means that their changes
are not permanent until they are committed and can still be
rolled back.
Some commands that come under DML:
INSERT; UPDATE; DELETE

Data Control Language (DCL)
DCL commands are used to grant authority to, and take it back
from, any database user.
Some commands that come under DCL:
GRANT; REVOKE

Transaction Control Language (TCL)
TCL commands can only be used with DML commands like INSERT,
DELETE and UPDATE. DDL operations such as creating or dropping
tables are automatically committed in the database, which is why
TCL commands cannot be used with them.
Some commands that come under TCL:
COMMIT; ROLLBACK; SAVEPOINT

Data Query Language (DQL)
DQL is used to fetch data from the database.
It uses only one command:
SELECT

10. What is normalization?
Normalization is the process of minimizing redundancy and
dependency by organizing the fields and tables of a database.
Its main aim is that a field can be added, deleted or modified
in a single table.

11. What is denormalization?
Denormalization is a technique that moves a database from a
higher to a lower normal form, usually to make data access
faster. It introduces redundancy into a table by incorporating
data from related tables.

12. Explain the different types of normalization.
Some types are:
First Normal Form (1NF): Remove all duplicate columns from the
table, create tables for the related data and identify unique
columns.
Second Normal Form (2NF): Meet all requirements of the first
normal form, place subsets of data in separate tables and create
relationships between the tables using primary keys.
Third Normal Form (3NF): Meet all requirements of 2NF and remove
the columns which are not dependent on the primary key.
Fourth Normal Form (4NF): Meet all requirements of the third
normal form and have no multi-valued dependencies.

13. What are views in SQL?
A view is a virtual table which consists of a subset of the data
contained in a table. Views are not physically present, so they
take less space to store. A view can combine data from one or
more tables, depending on the relationship between them.

14. What is join? Explain the different types.
JOIN is a keyword used to query data from two or more tables
based on the relationship between the fields of the tables. Keys
play a major role when JOINs are used.

There are various types of joins which can be used to retrieve
data; which one to use depends on the relationship between the
tables.
Left Outer Join: If we want all the records from the left table
and only the matching records from the right table, we use a
left outer join (left join).
Right Outer Join: If we want all the records from the right
table and only the matching records from the left table, we use
a right outer join (right join).
Full Outer Join: If we want all the records from both tables, we
use a full outer join.
Inner Join: If we want only the matching records from both
tables, we use an inner join (simple join).
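The difference between the join types shows up most clearly on rows that have no match. The sketch below runs the SQL through Python's built-in sqlite3 module; the employee and department tables and their rows are hypothetical. Older SQLite versions lack RIGHT JOIN and FULL JOIN, so the right join is written here as a LEFT JOIN with the tables swapped, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Maria's department is missing, Chen has none,
# and the Finance department has no employees.
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER, name TEXT, dept_id INTEGER);
    CREATE TABLE department (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employee VALUES (1, 'David', 10), (2, 'Maria', 20), (3, 'Chen', NULL);
    INSERT INTO department VALUES (10, 'Sales'), (30, 'Finance');
    """
)

# Inner join: only rows that match on both sides.
inner_rows = conn.execute(
    """
    SELECT e.name, d.dept_name
    FROM employee e
    INNER JOIN department d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
    """
).fetchall()

# Left join: every employee, with NULL where no department matches.
left_rows = conn.execute(
    """
    SELECT e.name, d.dept_name
    FROM employee e
    LEFT JOIN department d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
    """
).fetchall()

# Right join of employee with department, written as a swapped LEFT JOIN:
# every department, with NULL where no employee matches.
right_rows = conn.execute(
    """
    SELECT d.dept_name, e.name
    FROM department d
    LEFT JOIN employee e ON e.dept_id = d.dept_id
    ORDER BY d.dept_id
    """
).fetchall()
```

A full outer join would return the union of the last two results: all employees and all departments, matched where possible.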

15. What are the different types of indexes?
An index is a performance tuning method that allows faster
retrieval of records from a table. An index creates an entry for
each value, which makes it faster to retrieve data.
There are three types of indexes:
Unique Index: Does not allow the indexed field to have duplicate
values. A unique index can be applied automatically when a
primary key is defined.
Clustered Index: Reorders the physical order of the table and
searches based on the key values. Each table can have only one
clustered index.
Non-Clustered Index: Does not alter the physical order of the
table and maintains a logical order of the data. Each table can
have up to 999 non-clustered indexes (the limit in SQL Server).
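A minimal sketch of creating indexes follows, run through Python's built-in sqlite3 module. The table and index names are hypothetical. SQLite has no CLUSTERED keyword, so only a unique index and an ordinary index are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, email TEXT, dept TEXT)")

# Hypothetical index names. The unique index forbids duplicate emails;
# the plain index only speeds up lookups by department.
conn.execute("CREATE UNIQUE INDEX idx_employee_email ON employee (email)")
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

conn.execute("INSERT INTO employee VALUES (1, 'david@example.com', 'Sales')")
try:
    conn.execute("INSERT INTO employee VALUES (2, 'david@example.com', 'HR')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The catalog lists the indexes that now exist.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    )
]
```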

16. What is a cursor in SQL?
A database cursor is a control structure that enables traversal
over the rows or records in a table. It can be viewed as a
pointer to one row in a set of rows. Cursors are very useful for
traversal tasks such as retrieval, addition and removal of
database records.

17. What is query?
A database query is code written to get information back from
the database. Queries can be designed so that the result set
matches our expectations.

18. What is a subquery?
A subquery is a query within another query. The outer query is
called the main query, and the inner query is called the
subquery.

There are two types of subquery: correlated and non-correlated.
A non-correlated subquery can be considered an independent
query: it is executed first, and its output is substituted into
the main query. A correlated subquery cannot be considered an
independent query, because it refers to columns of the main
query and is evaluated for the rows the main query processes.
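The two kinds of subquery can be compared side by side in a short sketch, run through Python's built-in sqlite3 module. The employee table and its salaries are hypothetical values chosen so that the two queries return different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data, for illustration only.
conn.executescript(
    """
    CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        ('David', 'Sales', 100), ('Maria', 'Sales', 120),
        ('Chen', 'HR', 300), ('Aisha', 'HR', 200);
    """
)

# Non-correlated: the inner query stands alone (overall average = 180).
above_overall_avg = [
    row[0]
    for row in conn.execute(
        """
        SELECT name FROM employee
        WHERE salary > (SELECT AVG(salary) FROM employee)
        ORDER BY name
        """
    )
]

# Correlated: the inner query refers to e.dept from the outer query,
# so it is evaluated against each employee's own department.
above_dept_avg = [
    row[0]
    for row in conn.execute(
        """
        SELECT e.name FROM employee e
        WHERE e.salary > (
            SELECT AVG(i.salary) FROM employee i WHERE i.dept = e.dept
        )
        ORDER BY e.name
        """
    )
]
```

Aisha earns more than the overall average but less than the HR average, while Maria is the opposite case, which is why the two lists differ.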

19. What is a trigger?
A database trigger is code that executes automatically in
response to some event on a table or view in a database.
Triggers mainly help to maintain the integrity of the database.
Example: When a new student is added to the student database,
new records should be created in the related tables such as the
Exam, Score and Attendance tables.
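The student example can be sketched as a trigger, run here through Python's built-in sqlite3 module. The table layout, the trigger name and the single related table (attendance) are assumptions made for the illustration; trigger syntax differs somewhat between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: adding a student should create an attendance record.
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (student_id INTEGER, days_present INTEGER);

    CREATE TRIGGER student_after_insert
    AFTER INSERT ON student
    BEGIN
        INSERT INTO attendance (student_id, days_present)
        VALUES (NEW.student_id, 0);
    END;
    """
)

# Only the student table is written to directly; the trigger does the rest.
conn.execute("INSERT INTO student VALUES (1, 'David')")
attendance_rows = conn.execute(
    "SELECT student_id, days_present FROM attendance"
).fetchall()
```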

20. Differentiate between the DELETE and TRUNCATE commands.
DELETE is used to remove rows from a table, and a WHERE clause
can be used to restrict which rows are removed. Commit and
rollback can be performed after a DELETE statement.
TRUNCATE removes all rows from the table. In many database
systems (for example Oracle and MySQL) a TRUNCATE operation
cannot be rolled back.

21. What are local and global variables?
Local variables are variables that exist and can be used only
inside a function. They are not known to other functions, which
cannot refer to or use them. They are created whenever that
function is called.
Global variables are variables that exist and can be used
throughout the program. The same variable declared globally
cannot be used in functions. Global variables are not created
anew whenever a function is called.

22. What are constraints?
Constraints specify limits on the data that can be stored in the
columns of a table. They can be specified in the statement that
creates or alters the table.

23. What is data integrity?
Data integrity defines the accuracy and consistency of data
stored in a database. Integrity constraints can also be defined
to enforce business rules on the data when it is entered into
the application or database.

24. What is auto increment?
The auto increment keyword generates a unique number
automatically when a new record is inserted into the table. The
AUTO_INCREMENT keyword can be used in MySQL and the IDENTITY
keyword can be used in SQL Server. Oracle uses sequences or
identity columns instead.

25. What is a data warehouse?
A data warehouse is a central repository of data from multiple
sources of information. This data is consolidated, transformed
and made available for mining and online processing. Warehouse
data has subsets called data marts.

26. What is the difference between DROP and TRUNCATE statements?
TRUNCATE removes all the rows from the table but keeps the table
itself. DROP removes the table from the database. In many
database systems (for example Oracle and MySQL) neither
operation can be rolled back.
27. What are aggregate and scalar functions?
Functions are methods used to perform data operations. SQL has
many built-in functions used to perform string concatenations,
mathematical calculations, etc.
SQL functions are categorized into the following two categories:
Aggregate Functions and Scalar Functions.
Aggregate SQL Functions
The aggregate functions in SQL perform calculations on a group
of values and then return a single value. Commonly used
aggregate functions include AVG(), COUNT(), MAX(), MIN() and
SUM().

Scalar SQL Functions
The scalar functions in SQL return a single value for each given
input value. Commonly used scalar functions include UPPER(),
LOWER(), LENGTH() and ROUND().
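The contrast between the two categories can be seen in a small sketch, run through Python's built-in sqlite3 module. The employee table and its values are hypothetical, and the functions used are a selection that SQLite supports rather than a complete list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data, for illustration only.
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("David", 1000), ("Maria", 2000), ("Chen", 3000)],
)

# Aggregate functions: many input rows collapse into one output row.
aggregate_row = conn.execute(
    "SELECT COUNT(*), SUM(salary), AVG(salary), MIN(salary), MAX(salary) "
    "FROM employee"
).fetchone()

# Scalar functions: one output value per input row.
scalar_rows = conn.execute(
    "SELECT UPPER(name), LENGTH(name) FROM employee ORDER BY name"
).fetchall()
```

The aggregate query returns a single row for the whole table, while the scalar query returns one row per employee.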

28. What is alias in SQL?
SQL aliases are used to give a table, or a column in
a table, a temporary name. Aliases are often used
to make column names more readable. An alias
only exists for the duration of that query. An alias is
created with the AS keyword.

29. What is the difference between OLTP and OLAP?
OLAP
Online Analytical Processing (OLAP) is a category of software
tools which provide analysis of data for business decisions.
OLAP systems allow users to analyze database information from
multiple database systems at one time.
The primary objective is data analysis and not data processing.
OLTP
Online Transaction Processing (OLTP) supports
transaction-oriented applications in a 3-tier architecture. OLTP
administers the day-to-day transactions of an organization.
The primary objective is data processing and not data analysis.
Unlike OLAP systems, the goal of OLTP systems is serving
real-time transactions.

30. What is collation? What are the various types of collation
sensitivity?
Collation is defined as a set of rules that determine how
character data is sorted and compared. ASCII values can be used
to compare character data.
Case sensitivity: A and a are treated differently.
Accent sensitivity: a and á are treated differently.
Kana sensitivity: Japanese kana characters Hiragana and Katakana
are treated differently.
Width sensitivity: The same character represented in single-byte
(half-width) and double-byte (full-width) form is treated
differently.

31. How can we create tables in SQL?
The command to create a table in SQL is simple: start with the
keywords CREATE TABLE, followed by the name of the table. After
that, in parentheses, list all the columns along with their data
types.

For example, if we want to create a simple employee table:
CREATE TABLE employee (
    name varchar(25),
    age int,
    gender varchar(25),
    ....
);

32. How can we insert data in SQL?
It is possible to write the INSERT INTO statement in two ways:

1. Specify both the column names and the values to be inserted:
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);

2. If you are adding values for all the columns of the table,
you do not need to specify the column names in the SQL query.
However, make sure the order of the values is the same as the
order of the columns in the table. Here, the INSERT INTO syntax
would be as follows:
INSERT INTO table_name
VALUES (value1, value2, value3, ...);

33. How can we change a table name in SQL?
We start with the keywords ALTER TABLE, followed by the original
name of the table, then the keywords RENAME TO and finally the
new table name.

For example, if we want to change the "employee" table to
"employee_information", this will be the command:
ALTER TABLE employee
RENAME TO employee_information;
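The create, insert and rename statements from questions 31 to 33 can be run end to end with a short sketch that uses Python's built-in sqlite3 module. The inserted names and values are hypothetical, and the column list is limited to the three columns spelled out in the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Question 31: create the table (only the three columns named in the text).
conn.execute(
    "CREATE TABLE employee (name VARCHAR(25), age INT, gender VARCHAR(25))"
)

# Question 32, form 1: name the columns; unlisted columns become NULL.
conn.execute("INSERT INTO employee (name, age) VALUES (?, ?)", ("David", 45))
# Question 32, form 2: no column names, so one value per column, in table order.
conn.execute("INSERT INTO employee VALUES (?, ?, ?)", ("Maria", 38, "F"))

# Question 33: rename the table.
conn.execute("ALTER TABLE employee RENAME TO employee_information")

rows = conn.execute(
    "SELECT name, age, gender FROM employee_information ORDER BY name"
).fetchall()
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

The values are passed as ? parameters rather than pasted into the SQL string, which is the usual way to insert data from application code.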

34. What is SQL server?
SQL Server has been one of the most popular database management
products ever since its first release in 1989 by Microsoft
Corporation. The product is used across industries to store and
process large volumes of data. It was primarily built to store
and process data organized according to the relational model.

SQL Server is widely used for data analysis and for scaling up
data workloads. It can be used in conjunction with Big Data
tools such as Hadoop.

SQL Server can be used to process data from various data sources
such as Excel, tables, .NET Framework applications, etc.

35. What is ETL in SQL?
ETL stands for Extract, Transform and Load. It is a three-step
process. We start by extracting the data from sources; once the
data from the different sources is collated, we have our raw
data. In the second phase, this raw data is transformed into a
tidy format. Finally, the tidy data is loaded into tools which
help us find insights.

36. What are nested queries?

A nested query (also called a subquery) is a query written
inside another query. The result of the inner query is used
by the outer query, typically in its WHERE, FROM, or SELECT
clause.

37. What is the difference between CHAR and VARCHAR2
data types in SQL server?
VARCHAR2 is an Oracle data type (SQL Server's equivalent is
VARCHAR). It is variable-length: when stored in a database,
it uses only the space the value actually needs. E.g. if you
have a varchar2(1999) and put 50 bytes in the table, it will
use about 52 bytes.
CHAR is fixed-length: when stored in a database, it always
uses the maximum length and is blank-padded. E.g. if you have
char(1999) and put 50 bytes in the table, it will consume
about 2000 bytes.

38. What is difference between SQL and PL/SQL?

SQL is a Structured Query Language to create and access
databases whereas PL/SQL comes with procedural
concepts of programming languages.

39. What is the difference between SQL and MySQL?

SQL is a Structured Query Language that is used for
manipulating and accessing the relational database. On
the other hand, MySQL itself is a relational database that
uses SQL as the standard database language.

40. What is cross join?

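A cross join produces the Cartesian product of two tables:
every row of the first table is paired with every row of the
second, so the number of rows in the result is the number of
rows in the first table multiplied by the number of rows in
the second table.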
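The row-count rule for a cross join can be checked with a short sketch. It runs the SQL through Python's built-in sqlite3 module; the `sizes` and `colors` tables are hypothetical examples, not tables from the guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sizes (size TEXT);
    CREATE TABLE colors (color TEXT);
    INSERT INTO sizes VALUES ('S'), ('M'), ('L');
    INSERT INTO colors VALUES ('red'), ('blue');
""")

# Every size is paired with every color: 3 rows x 2 rows = 6 rows.
pairs = conn.execute("SELECT size, color FROM sizes CROSS JOIN colors").fetchall()
print(len(pairs))
```

Because the result grows multiplicatively, a cross join between two large tables can become very expensive, which is why it is normally used only with small lookup tables or followed by a filtering condition.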

41. What are user-defined functions?

User-defined functions are functions written by the user to
encapsulate a piece of logic so that it can be reused
whenever required. It is not necessary to write the same
logic several times. Instead, the function can be called or
executed whenever needed.

42. What is a CLAUSE?
SQL clause is defined to limit the result set by
providing condition to the query. This usually filters
some rows from the whole set of records.
Example – Query that has WHERE condition.

43. What is recursive stored procedure?

This is a stored procedure that calls itself until it
reaches some boundary condition. A recursive function or
procedure lets programmers reuse the same block of code
any number of times.

44. Explain UNION, MINUS and INTERSECT commands?

The UNION operator combines the results of two queries
and eliminates duplicate rows from the combined result.
The MINUS operator returns the rows from the first query
that are not returned by the second query. Rows that
appear in both queries are excluded from the result set.
(MINUS is Oracle syntax; the standard SQL keyword is
EXCEPT.)
The INTERSECT operator returns only the rows returned by
both queries.
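The three set operators can be compared side by side with a small sketch. It runs through Python's built-in sqlite3 module, and since SQLite spells MINUS as EXCEPT, the sketch uses EXCEPT; the tables `first_q` and `second_q` and the helper `combine` are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_q (x INT);
    CREATE TABLE second_q (x INT);
    INSERT INTO first_q VALUES (1), (2), (2), (3);
    INSERT INTO second_q VALUES (2), (3), (4);
""")

def combine(operator):
    """Run 'first_q <operator> second_q' and return the values in sorted order."""
    sql = f"SELECT x FROM first_q {operator} SELECT x FROM second_q ORDER BY x"
    return [row[0] for row in conn.execute(sql)]

union_rows = combine("UNION")          # all distinct values from both queries
minus_rows = combine("EXCEPT")         # values only in the first query
intersect_rows = combine("INTERSECT")  # values present in both queries
print(union_rows, minus_rows, intersect_rows)
```

Note that the duplicate value 2 in `first_q` appears only once in the UNION result; UNION ALL would be the variant that keeps duplicates.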

45. What TCP/IP port does SQL Server run on?

By default, SQL Server runs on port 1433.

46. Which operator is used in query for pattern
matching?
LIKE operator is used for pattern matching, and it can be
used with:
% - Matches zero or more characters.
_(Underscore) – Matching exactly one character.
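A brief sketch of the two wildcards in action, run through Python's built-in sqlite3 module. The `employees` table, its sample names, and the helper `matching` are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?)",
    [("Steven",), ("Stephen",), ("Stan",), ("Eve",)],
)

def matching(pattern):
    """Return the first names that match a LIKE pattern, in alphabetical order."""
    sql = "SELECT first_name FROM employees WHERE first_name LIKE ? ORDER BY first_name"
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_with_ste = matching("Ste%")  # % matches zero or more characters
st_any_n = matching("St_n")         # _ matches exactly one character
any_ve = matching("_ve")            # one character followed by "ve"
print(starts_with_ste, st_any_n, any_ve)
```

Case sensitivity of LIKE differs between database systems, so patterns that rely on letter case should be checked against the specific system in use.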

47. How can we select unique records from a Table?

Select unique records from a table by using DISTINCT
keyword.

48. List and explain each of the ACID properties that
collectively guarantee that database transactions are
processed reliably.
ACID Properties are used for maintaining the integrity of
database during transaction processing. ACID in DBMS
stands for Atomicity, Consistency, Isolation, and Durability.
Atomicity: A transaction is a single unit of operation.
You either execute it entirely or do not execute it at all.
There cannot be partial execution.
Consistency: Once the transaction is executed, it
should move from one consistent state to another.
Isolation: Transaction should be executed in isolation
from other transactions. During concurrent transaction
execution, intermediate transaction results from
simultaneously executed transactions should not be
made available to each other.
Durability: After successful completion of a
transaction, the changes in the database should
persist, even in the case of system failures.
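Atomicity in particular is easy to observe in practice. The sketch below uses Python's built-in sqlite3 module; the `accounts` table, its CHECK constraint, and the transfer amounts are hypothetical and chosen so that the second statement of the transaction fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
conn.commit()

try:
    # 'with conn' commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'B'")
        # This debit would make A's balance negative, violating the CHECK constraint.
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'A'")
except sqlite3.IntegrityError:
    pass  # the whole transfer is undone, including the credit to B

balances = dict(conn.execute("SELECT name, balance FROM accounts").fetchall())
print(balances)
```

Although the credit to B succeeded on its own, it does not survive, because the transaction as a whole failed; that is the "all or nothing" behavior atomicity describes.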

49. What is the main difference in the BETWEEN and
IN condition operators?
The BETWEEN operator selects rows whose value falls
within a range (both endpoints included), whereas the
IN operator checks whether a value is contained in a
specific set of values.
Example of BETWEEN:
SELECT * FROM students WHERE ROLL_NO BETWEEN 10 AND 50;
Example of IN:
SELECT * FROM students WHERE ROLL_NO IN (8, 15, 25);

50. What are SQL functions used for?

SQL functions are used for the following purposes:
To perform some calculations on the data
To modify individual data items
To manipulate the output
To format dates and numbers
To convert the data types

51. What is the need for MERGE statement?

This statement allows conditional update or
insertion of data into a table. It performs an UPDATE
if a row exists, or an INSERT if the row does not exist.

52. List the ways in which dynamic SQL can be executed.
Write a query with parameters.
Using EXEC.
Using sp_executesql.

53. List some case manipulation functions in SQL.
There are three case manipulation functions in SQL,
namely:
LOWER: This function takes a string as an argument
and returns it converted to lowercase. Syntax:
LOWER('string')
UPPER: This function takes a string as an argument
and returns it converted to uppercase. Syntax:
UPPER('string')
INITCAP: This function returns the string with the
first letter of each word in uppercase and the rest
of the letters in lowercase. Syntax: INITCAP('string')

54. Is a semicolon used after SQL statements? Justify why
or why not.
Some database systems require a semicolon at the
end of each SQL statement. Semicolon is the
standard way to separate each SQL statement in
database systems that allow more than one SQL
statement to be executed in the same call to the
server.

55. What is candidate key?

A candidate key is a minimal super key: a super key
that contains no redundant attribute, so that removing
any attribute from it would no longer identify rows
uniquely. Candidate keys are therefore selected from
the set of super keys by keeping only the minimal ones.

56. What is the difference between JOIN and UNION?
JOIN
JOIN in SQL is used to combine data from many
tables based on a matched condition between
them. The data combined using JOIN statement
results into new columns.
UNION
UNION in SQL is used to combine the result-set of
two or more SELECT statements. The data combined
using UNION statement results into new distinct
rows.

57. What is the difference between ORDER BY and GROUP BY?
ORDER BY
The ORDER BY clause is used in SQL queries to sort
the data returned by a query in ascending or
descending order. If we omit the sorting order, it
sorts the result in ascending order by default. The
ORDER BY clause, like the GROUP BY clause, is used in
conjunction with the SELECT statement. ASC denotes
ascending order, while DESC denotes descending order.

The following is the syntax to use the ORDER BY
clause in a SQL statement:
SELECT expressions
FROM tables
[WHERE conditions]
ORDER BY expression [ ASC | DESC ];

GROUP BY
The GROUP BY clause is used in SQL queries to group
rows that have the same attribute values. We use it
with the SELECT statement. It is important to remember
that the GROUP BY clause is placed after the WHERE
clause and before the ORDER BY clause.

We often use this clause together with aggregate
functions like SUM, AVG, MIN, MAX, and COUNT to produce
summary reports from the database. It's important to
remember that every column in the SELECT clause that
is not inside an aggregate function must also appear
in the GROUP BY clause; otherwise, the query is
incorrect in most database systems. A query with a
GROUP BY clause is called a grouped query, and it
returns a single row for each group.

The following is the syntax to use the GROUP BY clause
in a SQL statement:
SELECT column_name, function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name;
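To make the clause order concrete (WHERE, then GROUP BY, then ORDER BY), here is a minimal sketch run through Python's built-in sqlite3 module. The `employees` table and its salary figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary INT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("HR", 100), ("HR", 200), ("IT", 300), ("IT", 500), ("IT", 400)],
)

# WHERE filters rows first, GROUP BY forms one group per department,
# the aggregates summarize each group, and ORDER BY sorts the summary rows.
summary = conn.execute("""
    SELECT department, COUNT(*), AVG(salary)
    FROM employees
    WHERE salary >= 200
    GROUP BY department
    ORDER BY department
""").fetchall()
print(summary)
```

The HR row with salary 100 is removed by WHERE before grouping, so the HR group contains a single row; each department then contributes exactly one row to the result.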

58. Write an SQL query to fetch employee names having
a salary greater than or equal to 10000 and less than or
equal to 20000.
By using BETWEEN in the WHERE clause of a subquery, we can
retrieve the employee IDs of employees with salary >= 10000
and <= 20000, and then look up their names.
e.g.
SELECT FullName
FROM EmployeeDetails
WHERE EmpId
IN (SELECT EmpId FROM EmployeeSalary WHERE Salary
BETWEEN 10000 AND 20000);

59. What is SQL injection? When does SQL injection occur?
SQL injection is a type of attack in which malicious SQL
statements are inserted into an input field of an
application, so that once the input is executed as part of
a query, the database is exposed to the attacker. It occurs
when user input is placed directly into a SQL statement
without validation or parameterization.
This technique is usually used for attacking data-driven
applications to gain access to sensitive data and
perform administrative tasks on databases.
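A small sketch of how an injection arises and how a parameterized query avoids it, using Python's built-in sqlite3 module. The `users` table, its contents, and the malicious input string are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

# Input typed by an attacker into a "user name" field.
user_input = "nobody' OR '1'='1"

# Unsafe: the input is concatenated into the statement, so the OR clause
# becomes part of the SQL and the condition is true for every row.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the ? placeholder passes the input as a plain value, never as SQL.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

With the placeholder, the database looks for a user literally named `nobody' OR '1'='1`, finds none, and returns nothing, which is the intended behavior.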

60. What is ENUM?

An ENUM is a string object with a value chosen from a list
of permitted values that are enumerated explicitly in the
column specification at table creation time.

61. What is the difference between the ATAN and ATAN2
function?
ATAN() Function
The ATAN() function in MySQL returns the arc tangent of a
number x, that is, the inverse tangent of x, where x is
real (x ∈ ℝ).
ATAN2() Function
The ATAN2() function in MySQL returns the arc tangent of
two numbers, y and x. ATAN2(y, x) returns the angle between
the positive x-axis and the line from the origin to the
point (x, y).

62. What is the difference between the CEIL, FLOOR and
ROUND functions?
ROUND - Rounds a positive or negative value to a
specified number of decimal places.
CEILING - Evaluates the value on the right side of the
decimal and returns the smallest integer greater than,
or equal to, the specified numeric expression.
FLOOR - Evaluates the value on the right side of the
decimal and returns the largest integer less than or
equal to the specified numeric expression.

63. What is the RAND() function?

The RAND() function will return a value between 0
(inclusive) and 1 (exclusive). The RAND() function will
return a completely random number if no seed is
provided, and a repeatable sequence of random
numbers if a seed value is used.

64. What is the difference between LOCALTIMESTAMP and
CURRENT_TIMESTAMP?
LOCALTIMESTAMP returns only the timestamp value, whereas
the function CURRENT_TIMESTAMP returns the timestamp
together with the time zone value.

65. Name three functions that specify current date and time.
SQL Server provides several different functions that return
the current date time including: GETDATE(),
SYSDATETIME(), and CURRENT_TIMESTAMP.

66. Which function returns the difference between two
periods? What would the format of the output be?
DATEDIFF() is a basic SQL Server function that can be
used to do date math. Specifically, it returns the
difference between two dates, counted in the specified
date unit (such as years, months, days, minutes, or
seconds), as an int (integer) value.

67. How can we fetch common records from two tables?

The intersection A ∩ B of two sets A and B is the set that
contains all the elements of A that also belong to B (or
equivalently, all elements of B that also belong to A), and
no other elements.
Let A = {orange, pineapple, banana} and let
B = {spoon, orange, pineapple, mango}.
A ∩ B = {orange, pineapple}
In SQL, the INTERSECT operator returns the records common
to two queries. Given the two queries
SELECT * FROM student;
SELECT * FROM student1;
the common records are fetched with:
(SELECT * FROM student) INTERSECT (SELECT * FROM student1);

68. How can we fetch alternate records from a
table?
Records can be fetched for both odd and even row
numbers (the queries below assume Oracle's ROWNUM
pseudocolumn).
To display even-numbered rows:
SELECT employeeId FROM (SELECT ROWNUM AS rowno, employeeId
FROM employee) WHERE MOD(rowno, 2) = 0;
To display odd-numbered rows:
SELECT employeeId FROM (SELECT ROWNUM AS rowno, employeeId
FROM employee) WHERE MOD(rowno, 2) = 1;

69. How can we select unique records from a table?

Select unique records from a table by using the
DISTINCT keyword.

70. What is the command used to fetch the first 5
characters of the string?
SELECT SUBSTRING('SQL Tutorial', 1, 5) AS
ExtractString;

71. How to use LIKE in SQL?

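The LIKE operator checks if an attribute value
matches a given string pattern. Here is an example
of the LIKE operator:

SELECT * FROM employees WHERE first_name LIKE 'Steven';

With this command, we will be able to extract all the
records where the first name matches the pattern
'Steven'. Because this pattern contains no wildcards,
only the name Steven itself matches; a pattern such as
'Ste%' would match every first name beginning with "Ste".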

72. How can we copy a table in SQL?
We can use the SELECT INTO statement to copy data
from one table to another. Either we can copy all
the data or only some specific columns.

This is how we can copy all the columns into a new table:
SELECT *
INTO newtable
FROM oldtable
WHERE condition;

If we want to copy only some specific columns, we can do
it this way:
SELECT column1, column2, column3, ...
INTO newtable
FROM oldtable
WHERE condition;

73. If we drop a table, does it also drop related
objects such as constraints, indexes, columns,
defaults, views and stored procedures?
Yes, SQL Server drops all related objects that exist
inside a table, like constraints, indexes, columns,
defaults etc. However, dropping a table will not drop
views and stored procedures, as they exist outside
the table.

74. What is Live Lock?

A live lock is one wherein a request for an exclusive
lock is repeatedly denied because a series of
overlapping shared locks keep interfering.

75. Can you join a table to itself?

A table can be joined to itself using self join, when
you want to create a result set that joins records in
a table with other records in the same table.
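A self join is easiest to see with an employee/manager relationship. The sketch below runs through Python's built-in sqlite3 module; the `employee` table with its `manager_id` column is a hypothetical example and not a table from the guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INT, name TEXT, manager_id INT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Bob", 1), (3, "Cy", 1)],
)

# The same table appears twice under two aliases:
# e plays the role of the employee, m the role of the manager.
reports = conn.execute("""
    SELECT e.name, m.name
    FROM employee e
    JOIN employee m ON e.manager_id = m.id
    ORDER BY e.name
""").fetchall()
print(reports)
```

The aliases are what make the self join possible; without them the two references to the same table could not be told apart. Ann has no manager, so the inner join leaves her out of the result.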

76. Explain Equi join with an example.

When two or more tables are joined using the equal to
(=) operator, the join is called an equi join.
Example:
SELECT a.Employee_name, b.Department_name
FROM Employee a, Department b
WHERE a.Department_ID = b.Department_ID;

77. Explain non-Equi join with an example.

When two or more tables are joined using a condition
other than equality, the join is known as a non-equi
join. Any comparison operator other than = can be used
here, such as <>, !=, <, >, or BETWEEN.
Example:
SELECT b.Department_ID, b.Department_name
FROM Employee a, Department b
WHERE a.Department_id <> b.Department_ID;

78. State the difference between NVL and NVL2 functions.
Both the NVL(exp1, exp2) and NVL2(exp1, exp2,
exp3) functions check the value exp1 to see if it is
null. With the NVL(exp1, exp2) function, if exp1 is not
null, then the value of exp1 is returned; otherwise,
the value of exp2 is returned. With the NVL2(exp1,
exp2, exp3) function, if exp1 is not null, then exp2 is
returned; otherwise, the value of exp3 is returned.

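79. What does this query achieve?
GRANT privilege_name ON object_name TO
{user_name | PUBLIC | role_name} [WITH GRANT OPTION];
This statement grants the named privilege on the given
object to a user, a role, or PUBLIC (all users). The
optional WITH GRANT OPTION clause indicates that the
grantee can grant the same access to another user too.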

80. Where is MyISAM table stored?

Each MyISAM table is stored on disk in three files.
The “.frm” file stores the table definition.
The data file has a ‘.MYD’ (MYData) extension.
The index file has a ‘.MYI’ (MYIndex) extension.

81. What does myisamchk do?

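The myisamchk utility checks, repairs, and optimizes
MyISAM tables, and reports information about them.
(Compressing MyISAM tables to reduce their disk usage
is done by a separate utility, myisampack.)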

82. How can we store videos inside SQL server table?
By using FILESTREAM storage, which was introduced in SQL
Server 2008. FILESTREAM is enabled as an attribute of a
varbinary(max) column.

83. Write an SQL query to show the second highest
salary from a table.
Below is the syntax to find 2nd highest salary in SQL:
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary)
FROM employees);
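The "maximum below the maximum" idea can be checked on a tiny data set, including a tie for first place. The sketch uses Python's built-in sqlite3 module, and the salary values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("a", 100), ("b", 200), ("c", 300), ("d", 300)],
)

# The subquery finds the highest salary (300); the outer query then takes
# the highest salary strictly below it, so the tie at 300 is skipped.
second_highest = conn.execute("""
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]
print(second_highest)
```

Only the aggregate is selected here: a non-aggregated column such as `name` placed next to MAX(salary) without a GROUP BY is rejected by many database systems and gives an arbitrary value in others.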

84. How would you select all the users whose phone
number is NULL?
SELECT user_name FROM users WHERE
user_phonenumber IS NULL;

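85. Write an SQL query to fetch three max salaries from a
table.
The inner query below returns the three highest salaries;
the outer query then picks the lowest of them, i.e. the
third highest salary:
SELECT TOP 1 salary FROM ( SELECT TOP 3 salary FROM
employee_table ORDER BY salary DESC ) AS emp ORDER
BY salary ASC;
To return all three highest salaries, run the inner query
on its own.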

86. Write an SQL query to create a new table with data
and structure copied from another table.
Using the SELECT INTO command:
SELECT * INTO newTable FROM EmployeeDetails;

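87. What are the differences between the HAVING clause
and the WHERE clause?
The WHERE clause filters individual rows before any
grouping takes place and cannot contain aggregate
functions. The HAVING clause filters groups after GROUP BY
has been applied and can contain aggregate functions such
as SUM or COUNT.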
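One way to see how WHERE and HAVING differ is to apply each to the same data. In the sketch below, run through Python's built-in sqlite3 module, WHERE removes rows before they are grouped while HAVING removes whole groups after aggregation; the `orders` table and its amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10), ("a", 200), ("b", 50), ("b", 60), ("c", 500)],
)

# WHERE: rows with amount <= 40 are dropped first, then the rest are summed.
where_rows = conn.execute("""
    SELECT customer, SUM(amount) FROM orders
    WHERE amount > 40
    GROUP BY customer ORDER BY customer
""").fetchall()

# HAVING: all rows are summed per customer, then groups totalling <= 150 are dropped.
having_rows = conn.execute("""
    SELECT customer, SUM(amount) FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 150
    ORDER BY customer
""").fetchall()

print(where_rows, having_rows)
```

Customer a keeps the small order of 10 in the HAVING query (total 210) but loses it in the WHERE query (total 200), while customer b survives the WHERE filter but is removed as a group by HAVING.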

88. What does a BCP command do?
The Bulk Copy is a utility or a tool that
exports/imports data from a table into a file and
vice versa.

89. Can a view be active if the base table is dropped?
No, the view cannot be active if the parent table is
dropped.

90. When should we use NoSQL and SQL?

SQL stands for structured query language and is
majorly used to query data from relational
databases. When we talk about a SQL database, it
will be a relational database.
But when it comes to NoSQL database, we will be
working with non-relational databases.

91. What is SYSTEM privilege?

A system privilege is a right given to a user, usually by
the DBA, to perform a particular action on the database or
on schema objects of a given type, such as creating
tablespaces.

The following are examples of system privileges that
can be granted to users:
CREATE TABLE allows a grantee to create tables
in the grantee's schema.
CREATE USER allows a grantee to create users in
the database.
CREATE SESSION allows a grantee to connect to
an Oracle database to create a user session.

92. What are object privileges?
An object-level privilege is a permission granted to a
database user account or role to perform some action on
a database object. These object privileges include
SELECT, INSERT, UPDATE, DELETE, ALTER, INDEX on tables, and
so on.

The following are examples of object privileges that can
be granted to users:
GRANT SELECT ON hr.employees TO myuser;
GRANT INSERT ON hr.employees TO myuser;

93. Does the data stored in the stored procedure
increase access time or execution time? Explain.
A stored procedure does not store data; it stores SQL
code that is compiled ahead of time. Because the procedure
is precompiled and its execution plan can be reused,
calling it is typically faster than sending the same
statements as ad hoc queries, which must be parsed and
compiled each time. This reduces the time gap between
submitting a query and executing it.

94. What is CTE?

A CTE or common table expression is a named temporary
result set that is defined within a SQL statement and can
be referenced in that statement.

95. Does view contain data?

No, Views are virtual structures.

96. Define a temp table.

A temp table is a temporary storage structure to store
the data temporarily.

97. What is the difference between the
RANK() and DENSE_RANK() function?
The only difference between the RANK() and
DENSE_RANK() functions is in cases where there is a “tie”;
i.e., in cases where multiple values in a set have the same
ranking. In such cases, RANK() will assign non-
consecutive “ranks” to the values in the set (resulting in
gaps between the integer ranking values when there is a
tie), whereas DENSE_RANK() will assign consecutive
ranks to the values in the set (so there will be no gaps
between the integer ranking values in the case of a tie).

For example, consider the set {25, 25, 50, 75, 75, 100}. For
such a set, RANK() will return {1, 1, 3, 4, 4, 6} (note that
the values 2 and 5 are skipped), whereas DENSE_RANK()
will return {1,1,2,3,3,4}.
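The example set above can be ranked directly to confirm both sequences. The sketch uses Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, since window functions were added in that release; the table name `scores` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (v INT)")
conn.executemany("INSERT INTO scores VALUES (?)", [(25,), (25,), (50,), (75,), (75,), (100,)])

rows = conn.execute("""
    SELECT v,
           RANK()       OVER (ORDER BY v) AS rnk,        -- leaves gaps after ties
           DENSE_RANK() OVER (ORDER BY v) AS dense_rnk   -- no gaps after ties
    FROM scores
    ORDER BY v
""").fetchall()

ranks = [r[1] for r in rows]
dense_ranks = [r[2] for r in rows]
print(ranks, dense_ranks)
```

RANK() is the natural choice when the position should reflect how many rows come before (as in a competition), while DENSE_RANK() suits "nth distinct value" questions such as the nth highest salary.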

98. What is referential integrity?

Set of rules that restrict the values of one or more
columns of the tables based on the values of the primary
key or unique key of the referenced table.

99. What does query optimization imply?

Query optimization is a process in which a database
system compares different execution strategies for a
query and selects the one with the least estimated cost.

100. What are nested triggers?

Triggers may implement data modification logic by using
INSERT, UPDATE, and DELETE statements. Triggers that
contain data modification logic and thereby fire other
triggers are called nested triggers.

104. What is CTE in SQL server?
CTEs are Common Table Expressions that are used
to create temporary result tables from which data
can be retrieved/ used. The standard syntax for a
CTE with a SELECT statement is:
WITH RESULT AS
(SELECT COL1, COL2, COL3
FROM EMPLOYEE)
SELECT COL1, COL2 FROM RESULT

CTEs can be used with INSERT, UPDATE or DELETE
statements as well.

A few examples of CTEs are given below:

Query to find the 10th highest salary:
WITH result AS
(SELECT DISTINCT salary, DENSE_RANK() OVER (ORDER BY
salary DESC) AS salaryrank FROM employees)
SELECT result.salary FROM result
WHERE result.salaryrank = 10
Query to find the 2nd highest salary:
WITH result AS
(SELECT DISTINCT salary, DENSE_RANK() OVER (ORDER BY
salary DESC) AS salaryrank FROM employees)
SELECT result.salary FROM result
WHERE result.salaryrank = 2

In this way, CTEs can be used to find the nth highest
salary within an organisation.

101. What is schema in SQL server?
Our database comprises a lot of different entities
such as tables, stored procedures, functions,
database owners and so on. To make sense of how
all these different entities interact, we need
the help of a schema. So, you can consider a schema to
be the logical relationship between all the different
entities which are present in the database.

Once we have a clear understanding of the schema,
this helps in a lot of ways:
We can decide which user has access to which
tables in the database.
We can modify or add new relationships between
different entities in the database.

Overall, you can consider a schema to be a
blueprint for the database, which will give you the
complete picture of how different objects interact
with each other and which users have access to
different entities.

102. Write a query to fetch 50% records from an
EmployeeInfo table.
SELECT TOP 50 PERCENT * FROM EmployeeInfo;

103. Write a query to add email validation to your
database.
SELECT * FROM student
WHERE s_email LIKE '%@gmail.com';

105. Suppose you have a sample table of Workers with
columns worker_id, first_name, last_name, salary,
join_date, department. We have another table bonus with
columns worker_ref_id, bonus_date, bonus_amt. We also
have another table called title with columns
worker_ref_id, worker_title, affected_from.

Write an SQL query to print the FIRST_NAME and
LAST_NAME from Worker table into a single column
COMPLETE_NAME. A space char should separate them.
Select CONCAT(FIRST_NAME, ' ', LAST_NAME) AS
'COMPLETE_NAME' from Worker;

Write an SQL query to fetch duplicate records having
matching data in some fields of a table.
SELECT WORKER_TITLE, AFFECTED_FROM, COUNT(*)
FROM Title
GROUP BY WORKER_TITLE, AFFECTED_FROM
HAVING COUNT(*) > 1;

Write an SQL query to print the name of employees
having the highest salary in each department.
SELECT t.DEPARTMENT,t.FIRST_NAME,t.Salary
from(SELECT max(Salary) as TotalSalary,DEPARTMENT
from Worker group by DEPARTMENT) as TempNew
Inner Join Worker t on
TempNew.DEPARTMENT=t.DEPARTMENT and
TempNew.TotalSalary=t.Salary;

Write an SQL query that fetches the unique values of
DEPARTMENT from Worker table and prints its length.
Select distinct length(DEPARTMENT) from Worker;

106. Write a query to fetch the top N records.
The SELECT TOP clause allows you to limit the
number of rows or percentage of rows returned in a
query result set.

Because the order of rows stored in a table is
unspecified, the SELECT TOP statement should be
used in conjunction with the ORDER BY clause.
The result set is then limited to the first N
ordered rows.

The following shows the syntax of the TOP clause
with the SELECT statement:
SELECT TOP (expression) [PERCENT] [WITH TIES] select_list
FROM table_name
ORDER BY column_name;

In this syntax, the SELECT statement can have other
clauses such as WHERE, JOIN, HAVING, and GROUP BY.

Social Media Company Interview Qs
(e.g. Facebook)

1. Find the new users which are defined as users that have
started using the services for the first time.
We can find this by finding the minimum date from the
'time_id' column for each user, which gives the date they
started using services.
SELECT user_id,
min(time_id) as new_user_start_date
FROM fact_events
GROUP BY user_id

2. Calculate the count of new users by month by
extracting the month from the date and counting unique
users.
SELECT date_part('month', new_user_start_date) AS
month,
count(DISTINCT user_id) as new_users
FROM (SELECT user_id, min(time_id) as
new_user_start_date FROM fact_events
GROUP BY user_id) sq
GROUP BY month

3. Calculate all users (existing and new) for each month.
This will give us existing users once we subtract out the
new users.
SELECT date_part('month', time_id) AS month,
count(DISTINCT user_id) as all_users
FROM fact_events
GROUP BY month

4. Join the two tables together by month.

with all_users as (
SELECT date_part('month', time_id) AS month,
count(DISTINCT user_id) as all_users
FROM fact_events
GROUP BY month),
new_users as (
SELECT date_part('month', new_user_start_date) AS
month,
count(DISTINCT user_id) as new_users
FROM
(SELECT user_id,
min(time_id) as new_user_start_date
FROM fact_events
GROUP BY user_id) sq
GROUP BY month
)
SELECT
*
FROM all_users au
JOIN new_users nu ON nu.month = au.month

5. Calculate user shares.
with all_users as (
SELECT date_part('month', time_id) AS month,
count(DISTINCT user_id) as all_users
FROM fact_events
GROUP BY month),
new_users as (
SELECT date_part('month', new_user_start_date) AS
month,
count(DISTINCT user_id) as new_users
FROM
(SELECT user_id,
min(time_id) as new_user_start_date
FROM fact_events
GROUP BY user_id) sq
GROUP BY month
)
SELECT
au.month,
new_users / all_users::decimal as share_new_users,
1- (new_users / all_users::decimal) as
share_existing_users
FROM all_users au
JOIN new_users nu ON nu.month = au.month
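The five steps above can be replayed on a toy data set to check the arithmetic. The sketch below is an adaptation, not the query above verbatim: it runs on SQLite through Python's built-in sqlite3 module, so `strftime('%m', ...)` stands in for `date_part('month', ...)` and `CAST(... AS REAL)` for `::decimal`. The event rows and the column names `n_all` and `n_new` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_events (user_id TEXT, time_id TEXT)")
conn.executemany(
    "INSERT INTO fact_events VALUES (?, ?)",
    [
        ("u1", "2020-01-05"), ("u1", "2020-02-10"),
        ("u2", "2020-01-20"), ("u2", "2020-02-03"),
        ("u3", "2020-02-20"), ("u4", "2020-02-25"),
    ],
)

shares = conn.execute("""
    WITH all_users AS (
        SELECT strftime('%m', time_id) AS month,
               COUNT(DISTINCT user_id) AS n_all
        FROM fact_events
        GROUP BY strftime('%m', time_id)
    ),
    new_users AS (
        SELECT strftime('%m', first_seen) AS month,
               COUNT(*) AS n_new
        FROM (SELECT user_id, MIN(time_id) AS first_seen
              FROM fact_events
              GROUP BY user_id) AS sq
        GROUP BY strftime('%m', first_seen)
    )
    SELECT au.month,
           CAST(nu.n_new AS REAL) / au.n_all     AS share_new_users,
           1 - CAST(nu.n_new AS REAL) / au.n_all AS share_existing_users
    FROM all_users au
    JOIN new_users nu ON nu.month = au.month
    ORDER BY au.month
""").fetchall()
print(shares)
```

In January both active users are new, so the new-user share is 1.0; in February two of the four active users are new, giving 0.5 each for new and existing users. The cast to a non-integer type matters in both dialects, because integer division would otherwise truncate the share to 0.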

Audio Streaming Service Company Interview Qs
(e.g. Spotify)

1. Write a query to return top 5 songs in the UK yesterday.

SELECT
S.song_id,
S.name
FROM song_plays P
INNER JOIN song S
ON P.song_id = S.id
WHERE P.country = 'UK'
AND P.date = CURRENT_DATE - 1
ORDER BY daily_plays DESC
LIMIT 5;

2. Write a query to return the top 5 artists in the US and
UK yesterday.
WITH artist_ranking AS (
SELECT
A.artist_id,
MAX(A.artist_name) AS artist_name,
P.country,
ROW_NUMBER() OVER(PARTITION BY P.country ORDER
BY SUM(plays) DESC) AS ranking
FROM daily_plays P
INNER JOIN song S
ON P.song_id = S.id
INNER JOIN artist A ON
A.artist_id = S.artist_id
WHERE P.country IN ('UK', 'US')
AND P.date = CURRENT_DATE - 1
GROUP BY A.artist_id, P.country
)
SELECT artist_id, artist_name, country, ranking
FROM artist_ranking
WHERE ranking <= 5;

e-Commerce Company Interview Qs
(e.g. Amazon)

1. Assume you are given the below table on purchases
from users. Write a query to get the number of people
that purchased at least one product on multiple days.

SELECT *
FROM
(SELECT p.user_id,
COUNT (DISTINCT purchase_id) as purchase_frequency
FROM purchase p
GROUP BY p.user_id)
PIVOT
(COUNT (user_id)
for purchase_frequency in ('1' one, '2' two, '3' three)
);

2. Assume you are given the table alongside for the
session activity of user. Write a query to assign ranks
to users by the total session duration for the different
session types they have had between a start date
(2020-01-01) and an end date (2020-02-01).

SELECT ss.*,
rank() over (partition by ss.user_id order by
ss.total_duration desc) as rank_order
FROM (select s.user_id,
s.session_type,
sum(s.duration) as total_duration
FROM sessions s
WHERE s.start_time between '01-jan-20' and '01-feb-20'
GROUP BY s.user_id,
s.session_type)ss

3. How many customers placed an order and what is the
average order amount?

SELECT count(DISTINCT customer_id),
avg(amount)
FROM orders

Entertainment Streaming Company
Interview Qs (e.g. Netflix)

1. Return the share of monthly active users in United
States (U.S). Active users are the ones with the "open"
status in the table.

SELECT active_users /total_users::float AS
active_users_share
FROM
(SELECT count(user_id) total_users,
count(CASE
WHEN status = 'open' THEN 1
ELSE NULL
END) AS active_users
FROM fb_active_users
WHERE country = 'USA') subq

2. Using the table given, list the top 10 users who
accumulated the most sessions where they had more
streaming sessions than viewing. Return the user_id,
number of streaming sessions, and the number of viewing
sessions.

SELECT user_id,
count(CASE
WHEN session_type='streamer' THEN 1
ELSE NULL
END) AS streaming,
count(CASE
WHEN session_type='viewer' THEN 1
ELSE NULL
END) AS VIEW
FROM twitch_sessions
GROUP BY user_id
HAVING count(CASE
WHEN session_type='streamer' THEN 1
ELSE NULL
END) > count(CASE
WHEN session_type='viewer' THEN 1
ELSE NULL
END)
ORDER BY count(*) DESC
LIMIT 10

Financial Institution Interview Qs (e.g. HSBC)

1. Write a query that returns the rate_type, loan_id and
balance of each loan, and a column that shows what
percentage of the total balance for its rate type each
loan constitutes.

SELECT s1.loan_id,
s1.rate_type,
sum(s1.balance) AS balance,
sum(s1.balance)::decimal/total_balance AS
balance_share
FROM submissions s1
LEFT JOIN
(SELECT rate_type,
sum(balance) AS total_balance
FROM submissions
GROUP BY rate_type) s2 ON s1.rate_type =
s2.rate_type
GROUP BY s1.loan_id,
s1.rate_type,
s2.total_balance
ORDER BY s1.rate_type, s1.loan_id

Online Marketplace Interview Qs (e.g. AirBnB)

1. Find the average number of bathrooms and bedrooms
for each city and property type.

Select city,
       property_type,
       avg(bathrooms) as average_bathrooms,
       avg(bedrooms) as average_bedrooms
from airbnb_search_details
group by city,
         property_type;

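2. Find the min, avg and max log price per review qualification.
The review qualification is categorized by the number of reviews, as
defined below:
0 reviews : NO
1 to 5 reviews : FEW
5 to 15 reviews : SOME
15 to 40 reviews : MANY
More than 40 reviews : ALOT
The ranges share their boundary values (5 and 15). A CASE expression
returns the first matching branch, so in the query below 5 reviews is
categorized as FEW and 15 reviews as SOME.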

Select b.qualification_category,
       min(b.price) as min_price,
       avg(b.price) as avg_price,
       max(b.price) as max_price
from
  (select a.*,
          case when a.number_of_reviews = 0 then 'NO'
               when a.number_of_reviews between 1 and 5 then 'FEW'
               when a.number_of_reviews between 5 and 15 then 'SOME'
               when a.number_of_reviews between 15 and 40 then 'MANY'
               when a.number_of_reviews > 40 then 'ALOT'
               else 'NA' end as qualification_category
   from airbnb_search_details a) b
group by b.qualification_category;
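
How the CASE expression treats the boundary values can be verified directly. The sketch below evaluates the same branches for a few review counts in SQLite; the helper name categorize is hypothetical and not part of the guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same branches as the query above, applied to a bound parameter :n
case_query = """
SELECT CASE WHEN :n = 0 THEN 'NO'
            WHEN :n BETWEEN 1 AND 5 THEN 'FEW'
            WHEN :n BETWEEN 5 AND 15 THEN 'SOME'
            WHEN :n BETWEEN 15 AND 40 THEN 'MANY'
            WHEN :n > 40 THEN 'ALOT'
            ELSE 'NA' END
"""


def categorize(number_of_reviews):
    """Return the review qualification for one review count (hypothetical helper)."""
    return conn.execute(case_query, {"n": number_of_reviews}).fetchone()[0]


labels = {n: categorize(n) for n in (0, 5, 6, 15, 16, 40, 41)}
print(labels)
# {0: 'NO', 5: 'FEW', 6: 'SOME', 15: 'SOME', 16: 'MANY', 40: 'MANY', 41: 'ALOT'}
```

Because BETWEEN is inclusive and the first matching branch wins, each shared boundary value lands in the lower category.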

Software Company Interview Qs (e.g. Microsoft)

1. Write a query to show the top n (say 10) records of a table in 3
different ways.
Using the LIMIT clause (MySQL, PostgreSQL, SQLite)
SELECT * FROM Worker ORDER BY Salary DESC LIMIT 10;
Using the TOP clause (SQL Server)
SELECT TOP 10 * FROM Worker ORDER BY Salary DESC;
Using ROWNUM (Oracle)
SELECT * FROM (SELECT * FROM Worker ORDER BY Salary DESC)
WHERE ROWNUM <= 10;

2. Write an SQL query to print the names of the employees having the
highest salary in each department.
SELECT t.Department, t.first_name, t.Salary
FROM (SELECT MAX(Salary) AS TotalSalary, Department
      FROM Worker
      GROUP BY Department) AS TempNew
INNER JOIN Worker t ON TempNew.Department = t.Department
  AND TempNew.TotalSalary = t.Salary;
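
The join against the per-department maximum can be tried on a few rows. This is a minimal sketch with an invented Worker table in SQLite; the column set is reduced to the three columns the query uses, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Worker (first_name TEXT, Department TEXT, Salary INTEGER)")
# Invented sample rows; the IT department has two workers tied for the top salary.
conn.executemany(
    "INSERT INTO Worker VALUES (?, ?, ?)",
    [
        ("Ann", "HR", 50000),
        ("Bob", "HR", 60000),
        ("Cid", "IT", 80000),
        ("Dee", "IT", 80000),
        ("Eve", "IT", 70000),
    ],
)

query = """
SELECT t.Department, t.first_name, t.Salary
FROM (SELECT MAX(Salary) AS TotalSalary, Department
      FROM Worker
      GROUP BY Department) AS TempNew
INNER JOIN Worker t
        ON TempNew.Department = t.Department
       AND TempNew.TotalSalary = t.Salary
ORDER BY t.Department, t.first_name
"""
top_paid = conn.execute(query).fetchall()
print(top_paid)
# [('HR', 'Bob', 60000), ('IT', 'Cid', 80000), ('IT', 'Dee', 80000)]
```

Note that the join returns every worker tied for the top salary in a department, not just one row per department.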
