SQL__1721960421
Compilation - Arockia Liborious
15.003 Software Tools — Data Science Afshine Amidi & Shervine Amidi
-- Conditions........................optional
WHERE some_condition(s)
-- Aggregating.......................optional
GROUP BY column_group_list
-- Restricting aggregated values.....optional
HAVING some_condition(s)
-- Sorting values....................optional
ORDER BY column_order_list
-- Limiting number of rows...........optional
LIMIT some_value
Remark: the SELECT DISTINCT command can be used to avoid returning duplicate rows.

Condition – A condition is of the following format:
SQL
some_col some_operator some_col_or_value
where some_operator can be among the following common operations:

Join types (diagram): LEFT JOIN, RIGHT JOIN, FULL JOIN
Remark: joining every row of table 1 with every row of table 2 can be done with the CROSS JOIN command, and is commonly known as the cartesian product.

Aggregations
Grouping data – Aggregate metrics are computed on grouped data in the following way:

WHERE                                          HAVING
- Filter condition applies to individual rows  - Filter condition applies to aggregates
- Statement placed right after FROM            - Statement placed right after GROUP BY
Remark: if WHERE and HAVING are both in the same query, WHERE will be executed first.
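The WHERE / HAVING distinction above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical orders table; the table, column names and values are illustrative only and not part of the original sheet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('ann', 5), ('ann', 50), ('ann', 60),
        ('bob', 40), ('bob', 45),
        ('cid', 200);
""")

# WHERE drops individual rows (amount < 10) before grouping;
# HAVING then keeps only the groups whose aggregate passes the test.
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 10
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY customer
""").fetchall()
print(rows)
```

Here ann's 5-unit order is filtered out by WHERE before the sum is taken, and bob's group is filtered out by HAVING afterwards.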
Grouping sets – The GROUPING SETS command is useful when there is a need to compute aggregations across different dimensions at a time. Below is an example of how all aggregations across two dimensions are computed:
SQL
SELECT
    col_1,
    col_2,
    agg_function(col_3)
FROM table
GROUP BY GROUPING SETS (
    (col_1),
    (col_2),
    (col_1, col_2)
)

Aggregation functions – The table below summarizes the main aggregate functions that can be used in an aggregation query:

Window functions – The SQL command is as follows:
SQL
some_window_function() OVER(PARTITION BY some_col ORDER BY another_col)
Remark: window functions are only allowed in the SELECT and ORDER BY clauses.

Row numbering – The table below summarizes the main commands that rank each row across specified groups, ordered by a specific column:
Command        Description                                       Example
ROW_NUMBER()   Ties are given different ranks                    1, 2, 3, 4
RANK()         Ties are given same rank and skip numbers         1, 2, 2, 4
DENSE_RANK()   Ties are given same rank and don't skip numbers   1, 2, 2, 3
SQL tips – In order to keep the query in a clear and concise format, the following tricks are often done:
Operation              Command                                   Description
Renaming columns       SELECT operation_on_column AS col_name    New column names shown in query results
Abbreviating tables    FROM table_1 t1                           Abbreviation used within query for simplicity in notations
Simplifying group by   GROUP BY col_number_list                  Specify column position in SELECT clause instead of whole column names
Limiting results       LIMIT n                                   Display only n rows

Sorting values – The query results can be sorted along a given set of columns using the following command:

Category   Operation                                                       Command
General    Take first non-NULL value                                       COALESCE(col_1, col_2, ..., col_n)
General    Create a new column combining existing ones                     CONCAT(col_1, ..., col_n)
Value      Round value to n decimals                                       ROUND(col, n)
String     Converts string column to lower / upper case                    LOWER(col) / UPPER(col)
String     Replace occurrences of old in col to new                        REPLACE(col, old, new)
String     Take the substring of col, with a given start and length        SUBSTR(col, start, length)
String     Remove spaces from the left / right / both sides                LTRIM(col) / RTRIM(col) / TRIM(col)
String     Length of the string                                            LENGTH(col)
Date       Truncate at a given granularity (year, month, week)             DATE_TRUNC(time_dimension, col_date)
...
cte_n AS (
    SELECT ...
)
SELECT ...
FROM ...

Table manipulation
Table creation – The creation of a table is done as follows:
SQL
CREATE [table_type] TABLE [creation_type] table_name(
  col_1 data_type_1,
  ...,
  col_n data_type_n
)
[options];

Data insertion – New data can either append or overwrite already existing data in a given table as follows:
SQL
WITH ..............................-- optional
INSERT [insert_type] table_name....-- mandatory
SELECT ...;........................-- mandatory
Command     Description
OVERWRITE   Overwrites existing data
INTO        Appends to existing data

Dropping table – Tables are dropped in the following way:
SQL
DROP TABLE table_name;

View – Instead of using a complicated query, the latter can be saved as a view which can then be used to get the data. A view is created with the following command:
SQL
CREATE VIEW view_name AS complicated_query;
Remark: a view does not create any physical table and is instead seen as a shortcut.
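The remark that a view is a shortcut rather than a physical table can be illustrated with a short sketch. It assumes Python's built-in sqlite3 module; the sales table and the region_totals view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 10), ('north', 20), ('south', 5);
    -- The view stores the query text, not a copy of the data.
    CREATE VIEW region_totals AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

before = conn.execute("SELECT * FROM region_totals ORDER BY region").fetchall()

# A later insert into the base table is visible through the view,
# because the view's query is re-evaluated on every read.
conn.execute("INSERT INTO sales VALUES ('south', 7)")
after = conn.execute("SELECT * FROM region_totals ORDER BY region").fetchall()
print(before, after)
```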
Bonus:
1. Social Media Company Interview Qs (e.g. Facebook)
2. Audio Streaming Service Company Interview Qs (e.g. Spotify)
3. e-Commerce Company Interview Qs (e.g. Amazon)
4. Entertainment Streaming Company Interview Qs (e.g. Netflix)
5. Financial Institution Interview Qs (e.g. HSBC)
6. Online Marketplace Interview Qs (e.g. Airbnb)
7. Software Company Interview Qs (e.g. Microsoft)
SQL | COMPREHENSIVE GUIDE TO INTERVIEWS FOR DATA SCIENCE
1. What is a Relational Database Management System
(RDBMS)?
An RDBMS stores data in a collection of tables that are
related to one another through common fields (columns). It
also provides relational operators to manipulate the data
stored in the tables.
Example: SQL Server.
3. What is a Database?
A database is an organized collection of data that makes
the data easy to access, store, retrieve and manage. It is
also described as a structured form of data that can be
accessed in many ways.
Example: School Management Database, Bank Management
Database.
zepanalytics.com
5. What is a unique key?
A unique key constraint uniquely identifies each record in a
table by enforcing uniqueness for a column or set of
columns. A primary key constraint automatically has a
unique constraint defined on it. There can be many unique
constraints defined per table, but only one primary key
constraint per table.
8. What are table and fields?
A table is a set of data organized in columns and rows.
Columns run vertically and rows run horizontally. A table
has a specified number of columns, called fields, but can
have any number of rows, which are called records.
Example:
Table: Employee.
Field: Emp ID, Emp Name, Date of Birth.
Data: 201456, David, 11/15/1960.
Data Control Language
DCL commands are used to grant and take back authority
from any database user.
Some commands that come under DCL:
Grant; Revoke
Transaction Control Language
TCL commands can only be used with DML commands like
INSERT, DELETE and UPDATE. In many database systems,
DDL statements such as CREATE TABLE and DROP TABLE are
committed automatically, which is why TCL commands cannot
be used while creating tables or dropping them.
Some commands that come under TCL:
COMMIT; ROLLBACK; SAVEPOINT
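The three TCL commands can be seen working together in the sketch below. It assumes Python's built-in sqlite3 module with explicit transaction control (isolation_level=None); the accounts table and the savepoint name are hypothetical.

```python
import sqlite3

# isolation_level=None turns off the module's implicit transaction handling,
# so BEGIN / COMMIT / ROLLBACK are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO accounts VALUES ('ann', 100)")
conn.execute("SAVEPOINT before_bob")
conn.execute("INSERT INTO accounts VALUES ('bob', 999)")
conn.execute("ROLLBACK TO SAVEPOINT before_bob")  # undoes only bob's row
conn.execute("COMMIT")                            # makes ann's row permanent

conn.execute("BEGIN")
conn.execute("DELETE FROM accounts")
conn.execute("ROLLBACK")                          # the delete is discarded

names = [r[0] for r in conn.execute("SELECT name FROM accounts")]
print(names)
```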
Data Query Language
DQL is used to fetch the data from the database.
It uses only one command:
SELECT
12. Explain the different types of normalization.
Some types are:
First Normal Form (1NF): Every column holds atomic
(indivisible) values and there are no repeating
groups of columns. Related data goes into its own
table, and each row is identified by a unique key.
Second Normal Form (2NF): Meets all requirements
of the first normal form. Subsets of data that depend
on only part of a composite key are placed in
separate tables, and relationships between the
tables are created using keys.
Third Normal Form (3NF): Meets all requirements
of 2NF. Columns that do not depend directly on the
primary key are removed.
Fourth Normal Form (4NF): Meets all the
requirements of third normal form and has no
multi-valued dependencies.
14. What is join? Explain the different types.
JOIN is a keyword used to query data from two or more
tables based on the relationship between the fields of
the tables. Keys play a major role when JOINs are used.
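Two of the join types can be compared side by side in a minimal sketch. It assumes Python's built-in sqlite3 module; the employee and department tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    INSERT INTO employee VALUES (1, 'David', 10), (2, 'Maria', 20), (3, 'Chen', NULL);
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Finance'), (30, 'Legal');
""")

# INNER JOIN keeps only employees whose dept_id matches a department.
inner = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee e
    INNER JOIN department d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

# LEFT JOIN keeps every employee; unmatched ones get NULL (None in Python).
left = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee e
    LEFT JOIN department d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(inner, left)
```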
15. What are the different types of indexes?
An index is a performance-tuning structure that allows
faster retrieval of records from a table. An index creates
an entry for each value, which makes it faster to retrieve
data.
There are three types of indexes:
Unique Index: Does not allow the indexed field to
have duplicate values. A unique index can be applied
automatically when a primary key is defined.
Clustered Index: Reorders the physical order of the
table's rows and searches based on the key values.
Each table can have only one clustered index.
Non-Clustered Index: Does not alter the physical
order of the table and maintains a logical order of
the data. In SQL Server, each table can have up to
999 non-clustered indexes.
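The behaviour of a unique index can be checked with the sketch below. It assumes Python's built-in sqlite3 module; the users table and the index name are hypothetical, and SQLite does not expose the clustered / non-clustered distinction, so only the unique case is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, email TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_users_email ON users (email)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")

# A second row with the same email violates the unique index.
try:
    conn.execute("INSERT INTO users VALUES (2, 'a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_rejected, row_count)
```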
17. What is query?
A DB query is code written to get information back
from the database. A query can be designed so that
its result set matches what we expect.
19. What is a trigger?
A DB trigger is code or a program that executes
automatically in response to some event on a table
or view in a database. Triggers mainly help to
maintain the integrity of the database.
Example: When a new student is added to the
student database, new records should be created in
the related tables such as the Exam, Score and
Attendance tables.
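The student example can be sketched as a trigger as follows. This assumes Python's built-in sqlite3 module and SQLite trigger syntax (other systems differ); the table layout, column names and trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (student_id INTEGER, days_present INTEGER);

    -- Fires once for every row inserted into student.
    CREATE TRIGGER student_after_insert
    AFTER INSERT ON student
    BEGIN
        INSERT INTO attendance (student_id, days_present)
        VALUES (NEW.student_id, 0);
    END;
""")

conn.execute("INSERT INTO student VALUES (1, 'David')")
attendance_rows = conn.execute("SELECT * FROM attendance").fetchall()
print(attendance_rows)
```

The application only inserts into student; the attendance row appears without any further statement.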
22. What are constraints?
Constraints specify rules that limit the data that
can go into a table. They can be specified in the
statement that creates or alters the table.
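A minimal sketch of constraints declared at table creation, assuming Python's built-in sqlite3 module. The employee table and the three rejected rows are hypothetical; each bad row violates a different constraint (NOT NULL, CHECK, PRIMARY KEY).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        salary   INTEGER CHECK (salary >= 0)
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'David', 5000)")

rejected = []
for bad_row in [(2, None, 100), (3, "Maria", -1), (1, "Chen", 10)]:
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError:
        rejected.append(bad_row[0])  # record the emp_id of the refused row

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, row_count)
```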
27. What are aggregate and scalar functions?
Functions are methods used to perform data operations.
SQL has many in-built functions used to perform string
concatenations, mathematical calculations etc.
SQL functions are categorized into the following two
categories: Aggregate Functions and Scalar Functions.
Aggregate SQL Functions
The Aggregate Functions in SQL perform calculations on a
group of values and then return a single value. Following
are a few of the most commonly used Aggregate
Functions:
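The difference between the two categories can be sketched as follows, assuming Python's built-in sqlite3 module and a hypothetical employee table. The aggregate query collapses all rows into one result, while the scalar query returns one result per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, salary INTEGER);
    INSERT INTO employee VALUES ('david', 100), ('maria', 300), ('chen', 200);
""")

# Aggregate functions: many input rows, a single output value each.
aggregate = conn.execute(
    "SELECT COUNT(*), SUM(salary), MIN(salary), MAX(salary), AVG(salary) "
    "FROM employee"
).fetchone()

# Scalar functions: applied to each row independently.
scalar = conn.execute(
    "SELECT UPPER(name), LENGTH(name) FROM employee ORDER BY name"
).fetchall()
print(aggregate, scalar)
```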
28. What is alias in SQL?
SQL aliases are used to give a table, or a column in
a table, a temporary name. Aliases are often used
to make column names more readable. An alias
only exists for the duration of that query. An alias is
created with the AS keyword.
30. What is collation? What are the various types of
collation sensitivity?
Collation is a set of rules that determines how
character data is sorted and compared. Characters can,
for example, be compared by their ASCII values. The
types of collation sensitivity are:
Case sensitivity: A and a are treated differently.
Accent sensitivity: a and á are treated differently.
Kana sensitivity: The Japanese kana characters Hiragana
and Katakana are treated differently.
Width sensitivity: The same character represented in
single-byte (half-width) and double-byte (full-width)
form is treated differently.
32. How can we insert data in SQL?
It is possible to write the INSERT INTO statement in
two ways:
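The two forms are not spelled out in this copy. A common reading, assumed here, is (1) naming the target columns and (2) omitting the column list and supplying a value for every column. The sketch uses Python's built-in sqlite3 module and a hypothetical students table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (roll_no INTEGER, name TEXT, city TEXT)")

# Form 1: name the target columns; unlisted columns are left NULL.
conn.execute("INSERT INTO students (roll_no, name) VALUES (1, 'David')")

# Form 2: omit the column list and give a value for every column, in table order.
conn.execute("INSERT INTO students VALUES (2, 'Maria', 'Lisbon')")

rows = conn.execute("SELECT * FROM students ORDER BY roll_no").fetchall()
print(rows)
```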
34. What is SQL server?
SQL Server has been one of the most popular database
management products ever since its first release in
1989 by Microsoft Corporation. The product is used
across industries to store and process large volumes
of data. It was primarily built to store and process
data organized according to the relational model.
37. What is the difference between the CHAR and VARCHAR2
data types?
VARCHAR2 is Oracle's variable-length string type (the
SQL Server counterpart is VARCHAR). When stored in a
database, varchar2 uses only the space the value needs.
E.g. if you have a varchar2(1999) and put 50 bytes in the
table, it will use about 52 bytes (the data plus a small
length prefix).
But when stored in a database, char always uses the
maximum length and is blank-padded. E.g. if you have
char(1999) and put 50 bytes in the table, it will consume
the full declared length of 1999 bytes.
42. What is a CLAUSE?
An SQL clause limits the result set by supplying a
condition to the query. It usually filters some rows
out of the whole set of records.
Example: a query that has a WHERE condition.
46. Which operator is used in query for pattern
matching?
The LIKE operator is used for pattern matching, and it
can be used with:
% - Matches zero or more characters.
_ (underscore) - Matches exactly one character.
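Both wildcards can be tried in a short sketch, assuming Python's built-in sqlite3 module and a hypothetical students table. The helper names_like is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (name TEXT);
    INSERT INTO students VALUES ('Dan'), ('David'), ('Dana'), ('Adam');
""")

def names_like(pattern):
    """Return the names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM students WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [r[0] for r in rows]

starts_with_da = names_like("Da%")    # 'Da' followed by zero or more characters
four_letters_da = names_like("Da__")  # 'Da' followed by exactly two characters
print(starts_with_da, four_letters_da)
```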
49. What is the main difference in the BETWEEN and
IN condition operators?
The BETWEEN operator selects rows whose value falls
within a range (both endpoints included), whereas the
IN operator checks whether a value matches any value
in a specific set.
Example of BETWEEN: SELECT * FROM Students
WHERE ROLL_NO BETWEEN 10 AND 50;
Example of IN: SELECT * FROM Students WHERE
ROLL_NO IN (8,15,25);
53. List some case manipulation functions in SQL.
There are three common case manipulation functions,
namely:
LOWER: Takes a string as an argument and returns
it converted to lowercase. Syntax:
LOWER('string')
UPPER: Takes a string as an argument and returns
it converted to uppercase. Syntax:
UPPER('string')
INITCAP: Returns the string with the first letter of
each word in uppercase and the rest of the letters in
lowercase. It is available in Oracle and PostgreSQL,
among others. Syntax: INITCAP('string')
56. What is the difference between JOIN and UNION?
JOIN
JOIN in SQL is used to combine data from two or more
tables based on a matching condition between
them. The data combined using a JOIN statement
results in new columns.
UNION
UNION in SQL is used to combine the result sets of
two or more SELECT statements. The data combined
using a UNION statement results in new distinct
rows.
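The "distinct rows" behaviour of UNION can be checked against UNION ALL, which keeps duplicates. This is a minimal sketch assuming Python's built-in sqlite3 module and two hypothetical single-column tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_students (name TEXT);
    CREATE TABLE alumni (name TEXT);
    INSERT INTO current_students VALUES ('David'), ('Maria');
    INSERT INTO alumni VALUES ('Maria'), ('Chen');
""")

# UNION stacks the two result sets and removes the duplicate 'Maria'.
union_rows = conn.execute("""
    SELECT name FROM current_students
    UNION
    SELECT name FROM alumni
    ORDER BY name
""").fetchall()

# UNION ALL stacks them without removing duplicates.
union_all_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT name FROM current_students
        UNION ALL
        SELECT name FROM alumni
    )
""").fetchone()[0]
print(union_rows, union_all_count)
```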
GROUP BY
The GROUP BY clause is used in SQL queries to
organize data that have the same attribute values.
Usually, we use it with the SELECT statement. It is
important to remember that we have to place the
GROUP BY clause after the WHERE clause.
Additionally, it is placed before the ORDER BY clause.
58. Write an SQL query to fetch employee names having
a salary greater than or equal to 10000 and less than or
equal to 20000.
By using BETWEEN in the WHERE clause, we can retrieve the
names of employees with salary >= 10000 and
<= 20000.
e.g.
SELECT FullName
FROM EmployeeDetails
WHERE EmpId
IN (SELECT EmpId FROM EmployeeSalary WHERE Salary
BETWEEN 10000 AND 20000);
61. What is the difference between the ATAN and ATAN2
function?
ATAN() Function
The ATAN() function in MySQL returns the arc tangent
of a number x. The arc tangent of x is the inverse
tangent function of x, defined for every real x (x ∈ ℝ).
ATAN2() Function
The ATAN2() function in MySQL takes two numbers, y and
x in that order, and returns the arc tangent of y/x,
using the signs of both arguments to pick the quadrant.
The result is the angle between the positive x-axis and
the line from the origin to the point (x, y).
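The practical difference shows up for points outside the first quadrant. The sketch below uses Python's math.atan and math.atan2 as stand-ins, on the assumption that they follow the same conventions as the MySQL functions (two-argument form takes y first, then x); it does not run MySQL itself.

```python
import math

# atan(y / x) only sees the ratio, so (1, 1) and (-1, -1) look identical.
ratio_angle = math.degrees(math.atan(-1 / -1))

# atan2(y, x) uses the signs of both arguments to pick the quadrant.
first_quadrant = math.degrees(math.atan2(1, 1))
third_quadrant = math.degrees(math.atan2(-1, -1))

print(ratio_angle, first_quadrant, third_quadrant)
```

The one-argument form reports roughly 45 degrees for both points, while the two-argument form places (-1, -1) at roughly -135 degrees.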
64. What is the difference between LOCALTIMESTAMP and
CURRENT_TIMESTAMP?
LOCALTIMESTAMP returns only the timestamp value, whereas
the function CURRENT_TIMESTAMP returns the timestamp
with the time zone value.
68. How can we fetch alternate records from a
table?
Records can be fetched for both odd and even row
numbers (Oracle syntax, using the ROWNUM pseudocolumn).
To display even-numbered rows:
SELECT employeeId FROM (SELECT ROWNUM AS rowno, employeeId
FROM employee) WHERE MOD(rowno, 2) = 0;
To display odd-numbered rows:
SELECT employeeId FROM (SELECT ROWNUM AS rowno, employeeId
FROM employee) WHERE MOD(rowno, 2) = 1;
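In systems without a built-in row number column, the same idea can be sketched with the ROW_NUMBER() window function. The example assumes Python's built-in sqlite3 module linked against SQLite 3.25 or later (needed for window functions) and a hypothetical employee table; alternate_rows is a helper introduced only here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employeeId INTEGER);
    INSERT INTO employee VALUES (101), (102), (103), (104), (105);
""")

def alternate_rows(remainder):
    """Return employeeIds whose row number modulo 2 equals remainder."""
    query = """
        SELECT employeeId
        FROM (
            SELECT employeeId,
                   ROW_NUMBER() OVER (ORDER BY employeeId) AS row_no
            FROM employee
        )
        WHERE row_no % 2 = ?
        ORDER BY row_no
    """
    return [r[0] for r in conn.execute(query, (remainder,))]

even_rows = alternate_rows(0)
odd_rows = alternate_rows(1)
print(even_rows, odd_rows)
```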
72. How can we copy a table in SQL?
We can use the SELECT INTO statement to copy data
from one table to another. Either we can copy all
the data or only some specific columns.
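79. What does this query achieve?
GRANT privilege_name ON object_name TO
{user_name|PUBLIC|role_name} [WITH GRANT OPTION];
The statement grants the named privilege on the
object to a user, a role or PUBLIC. The optional
WITH GRANT OPTION clause indicates that the user
can grant that access to another user too.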
84. How would you select all the users whose phone
number is NULL?
SELECT user_name FROM users WHERE
user_phonenumber IS NULL;
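A common pitfall here is comparing with = NULL, which never matches. The sketch below contrasts it with the standard IS NULL predicate, assuming Python's built-in sqlite3 module and a hypothetical users table with the column names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_name TEXT, user_phonenumber TEXT);
    INSERT INTO users VALUES ('ann', '555-0100'), ('bob', NULL), ('cid', NULL);
""")

is_null = conn.execute(
    "SELECT user_name FROM users "
    "WHERE user_phonenumber IS NULL ORDER BY user_name"
).fetchall()

# Comparing with = NULL evaluates to NULL (unknown) for every row,
# so the filter keeps nothing.
equals_null = conn.execute(
    "SELECT user_name FROM users WHERE user_phonenumber = NULL"
).fetchall()
print(is_null, equals_null)
```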
88. What does a BCP command do?
The Bulk Copy Program (bcp) is a SQL Server utility
that exports data from a table into a file and
imports data from a file into a table.
92. What are object privileges?
An object-level privilege is a permission granted to a
database user account or role to perform some action on
a database object. These object privileges include
SELECT, INSERT, UPDATE, DELETE, ALTER, INDEX on tables, and
so on.
97. What is the difference between the
RANK() and DENSE_RANK() function?
The only difference between the RANK() and
DENSE_RANK() functions is in cases where there is a “tie”;
i.e., in cases where multiple values in a set have the same
ranking. In such cases, RANK() will assign non-
consecutive “ranks” to the values in the set (resulting in
gaps between the integer ranking values when there is a
tie), whereas DENSE_RANK() will assign consecutive
ranks to the values in the set (so there will be no gaps
between the integer ranking values in the case of a tie).
For example, consider the set {25, 25, 50, 75, 75, 100}. For
such a set, RANK() will return {1, 1, 3, 4, 4, 6} (note that
the values 2 and 5 are skipped), whereas DENSE_RANK()
will return {1,1,2,3,3,4}.
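The example set can be run directly. This is a minimal sketch assuming Python's built-in sqlite3 module linked against SQLite 3.25 or later (needed for window functions); the scores table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?)",
    [(25,), (25,), (50,), (75,), (75,), (100,)],
)

rows = conn.execute("""
    SELECT score,
           RANK()       OVER (ORDER BY score) AS rnk,
           DENSE_RANK() OVER (ORDER BY score) AS dense_rnk
    FROM scores
    ORDER BY score
""").fetchall()

ranks = [r[1] for r in rows]        # gaps after each tie
dense_ranks = [r[2] for r in rows]  # no gaps
print(ranks, dense_ranks)
```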
104. What is CTE in SQL server?
CTEs are Common Table Expressions that are used
to create temporary result tables from which data
can be retrieved/ used. The standard syntax for a
CTE with a SELECT statement is:
WITH RESULT AS
(SELECT COL1, COL2, COL3
FROM EMPLOYEE)
SELECT COL1, COL2 FROM RESULT
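The CTE above can be run as written against a small table. The sketch assumes Python's built-in sqlite3 module; the column types and the two sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (col1 INTEGER, col2 TEXT, col3 INTEGER);
    INSERT INTO employee VALUES (1, 'David', 5000), (2, 'Maria', 7000);
""")

# The CTE named result exists only for the duration of this statement.
rows = conn.execute("""
    WITH result AS (
        SELECT col1, col2, col3
        FROM employee
    )
    SELECT col1, col2 FROM result ORDER BY col1
""").fetchall()
print(rows)
```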
101. What is schema in SQL server?
A database comprises a lot of different entities
such as tables, stored procedures, functions,
database owners and so on. To make sense of how
all these different entities interact, we need
the help of a schema. So, you can consider a schema
to be the logical relationship between all the
different entities present in the database.
105. Suppose you have a sample table of Workers with
columns worker_id, first_name, last_name, salary,
join_date, department. We have another table bonus with
columns worker_ref_id, bonus_date, bonus_amt. We also
have another table called title with columns
worker_ref_id, worker_title, affected_from.
106. Write a query to fetch the top N records.
The SELECT TOP clause allows you to limit the
number of rows or percentage of rows returned in a
query result set.
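SELECT TOP is SQL Server syntax; several other systems express the same thing with ORDER BY plus LIMIT. The sketch below uses that form, assuming Python's built-in sqlite3 module and a hypothetical worker table; top_n is a helper introduced only here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE worker (first_name TEXT, salary INTEGER);
    INSERT INTO worker VALUES
        ('Ann', 300), ('Bob', 500), ('Cid', 100), ('Dee', 400);
""")

def top_n(n):
    """Return the n highest-paid workers."""
    # Without ORDER BY the "top" rows would be arbitrary: sort first, then limit.
    rows = conn.execute(
        "SELECT first_name, salary FROM worker ORDER BY salary DESC LIMIT ?",
        (n,),
    )
    return rows.fetchall()

top_two = top_n(2)
print(top_two)
```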
Social Media Company Interview Qs
(e.g. Facebook)
1. Find the new users which are defined as users that have
started using the services for the first time.
We can find this by finding the minimum date from the
'time_id' column for each user, which gives the date they
started using services.
SELECT user_id,
min(time_id) as new_user_start_date
FROM fact_events
GROUP BY user_id
3. Calculate all users (existing and new) for each month.
This will give us existing users once we subtract out the
new users.
SELECT date_part('month', time_id) AS month,
count(DISTINCT user_id) as all_users
FROM fact_events
GROUP BY month
5. Calculate user shares.
with all_users as (
SELECT date_part('month', time_id) AS month,
count(DISTINCT user_id) as all_users
FROM fact_events
GROUP BY month),
new_users as (
SELECT date_part('month', new_user_start_date) AS
month,
count(DISTINCT user_id) as new_users
FROM
(SELECT user_id,
min(time_id) as new_user_start_date
FROM fact_events
GROUP BY user_id) sq
GROUP BY month
)
SELECT
au.month,
new_users / all_users::decimal as share_new_users,
1- (new_users / all_users::decimal) as
share_existing_users
FROM all_users au
JOIN new_users nu ON nu.month = au.month
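The query above uses PostgreSQL features (date_part, ::decimal). A runnable approximation is sketched below, assuming Python's built-in sqlite3 module, with strftime('%m', ...) standing in for date_part and multiplication by 1.0 standing in for the decimal cast. The six fact_events rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_events (user_id TEXT, time_id TEXT);
    INSERT INTO fact_events VALUES
        ('u1', '2020-01-05'), ('u2', '2020-01-20'),
        ('u1', '2020-02-03'), ('u2', '2020-02-10'),
        ('u3', '2020-02-11'), ('u4', '2020-02-25');
""")

rows = conn.execute("""
    WITH all_users AS (
        SELECT strftime('%m', time_id) AS month,
               COUNT(DISTINCT user_id) AS all_users
        FROM fact_events
        GROUP BY month
    ),
    new_users AS (
        SELECT strftime('%m', new_user_start_date) AS month,
               COUNT(DISTINCT user_id) AS new_users
        FROM (SELECT user_id, MIN(time_id) AS new_user_start_date
              FROM fact_events
              GROUP BY user_id) sq
        GROUP BY month
    )
    SELECT au.month,
           new_users * 1.0 / all_users     AS share_new_users,
           1 - new_users * 1.0 / all_users AS share_existing_users
    FROM all_users au
    JOIN new_users nu ON nu.month = au.month
    ORDER BY au.month
""").fetchall()
print(rows)
```

In this sample every January user is new, while in February two of the four active users (u3 and u4) are new.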
Audio Streaming Service Company Interview Qs
(e.g. Spotify)
2. Write a query to return the top 5 artists in the US and
UK yesterday.
WITH artist_ranking AS (
SELECT
A.artist_id,
MAX(A.artist_name) AS artist_name,
P.country AS country,
-- rank artists within each country by total plays
ROW_NUMBER() OVER(PARTITION BY P.country ORDER
BY SUM(plays) DESC) AS ranking
FROM daily_plays P
INNER JOIN song S
ON P.song_id = S.id
INNER JOIN artist A ON
A.artist_id = S.artist_id
WHERE P.country IN ('UK', 'US')
AND P.date = CURRENT_DATE - 1
GROUP BY A.artist_id, P.country
)
SELECT artist_id, artist_name, country, ranking
FROM artist_ranking
WHERE ranking <= 5;
e-Commerce Company Interview Qs
(e.g. Amazon)
SELECT *
FROM
(SELECT p.user_id,
COUNT (DISTINCT purchase_id) as purchase_frequency
FROM purchase p
GROUP BY p.user_id)
PIVOT
(COUNT (user_id)
for purchase_frequency in ('1' one, '2' two, '3' three)
);
2. Assume you are given the table alongside for the
session activity of users. Write a query to assign ranks
to users by the total session duration for the different
session types they have had between a start date
(2020-01-01) and an end date (2020-02-01).
SELECT ss.*,
rank() over (partition by ss.user_id order by
ss.total_duration desc) as rank_order
FROM (select s.user_id,
s.session_type,
sum(s.duration) as total_duration
FROM sessions s
WHERE s.start_time between '01-jan-20' and '01-feb-20'
GROUP BY s.user_id,
s.session_type) ss
Entertainment Streaming Company
Interview Qs (e.g. Netflix)
2. Using the table given, list the top 10 users who
accumulated the most sessions where they had more
streaming sessions than viewing. Return the user_id,
number of streaming sessions, and the number of
viewing sessions.
SELECT user_id,
count(CASE
WHEN session_type='streamer' THEN 1
ELSE NULL
END) AS streaming,
count(CASE
WHEN session_type='viewer' THEN 1
ELSE NULL
END) AS viewing
FROM twitch_sessions
GROUP BY user_id
HAVING count(CASE
WHEN session_type='streamer' THEN 1
ELSE NULL
END) > count(CASE
WHEN session_type='viewer' THEN 1
ELSE NULL
END)
ORDER BY count(*) DESC
LIMIT 10
Financial Institution Interview Qs (e.g. HSBC)
SELECT s1.loan_id,
s1.rate_type,
sum(s1.balance) AS balance,
sum(s1.balance)::decimal/total_balance AS
balance_share
FROM submissions s1
LEFT JOIN
(SELECT rate_type,
sum(balance) AS total_balance
FROM submissions
GROUP BY rate_type) s2 ON s1.rate_type =
s2.rate_type
GROUP BY s1.loan_id,
s1.rate_type,
s2.total_balance
ORDER BY s1.rate_type, s1.loan_id
Online Marketplace Interview Qs (e.g. AirBnB)
Select city,
property_type,
avg(bathrooms) as average_bathrooms,
avg(bedrooms) as average_bedrooms
from airbnb_search_details
group by city,
property_type;
2. Find the min, avg and max log price per review
qualification.
The review qualification is categorized by the number of
reviews as defined below, along with the associated price
0 reviews : NO
1 to 5 reviews : FEW
5 to 15 reviews : SOME
15 to 40 reviews : MANY
More than 40 reviews : ALOT
Select b.qualification_category,
min(b.price),
avg(b.price),
max(b.price)
from
(select a.*,
case when a.number_of_reviews = 0 then 'NO'
when a.number_of_reviews between 1 and 5 then 'FEW'
when a.number_of_reviews between 5 and 15
then 'SOME'
when a.number_of_reviews between 15 and 40
then 'MANY'
when a.number_of_reviews > 40 then 'ALOT'
else 'NA' end as qualification_category
from airbnb_search_details a) b
group by qualification_category;
Software Company Interview Qs (e.g. Microsoft)