Data Analytics Interview Handbook
Given how intellectually stimulating and lucrative data science is, it should
not be surprising that competition for these top data jobs is fierce. Between
"entry-level" positions in data science weirdly expecting multiple years of
experience and entry-level jobs being relatively rare in this field saturated
with PhD holders, early-career data scientists face hurdles in even landing
interviews at many firms.
Worse, job seekers at all experience levels face obstacles with online
applications, likely never hearing back from most jobs they apply to.
Sometimes, this is due to undiagnosed weakness in a data scientist's resume,
causing recruiters to pass on talented candidates. But often, it's because
candidates cannot stand out from the sea of candidates an online job
application attracts. Forget about acing the data science interview: given the
number of job-hunting challenges a data scientist faces, just getting an
interview at a top firm can be considered an achievement.
Who Is a Data Analyst?
Data analysis is the practice of extracting logical inferences and arguments
from the vast amounts of data and information continuously captured in
business, and of turning them into coherent, intelligible inputs for company
decisions and plans.
A data analyst is a translator: someone who takes a mass of incomplete and
seemingly illogical information and extracts from it an understanding of the
current situation, goals and strategies for the future, and remedies for
existing problems. On this basis, analysts can identify the business's
competitive environment, as well as its internal and external interests, and
propose improvements.
To further understand what a data analyst does, let's use an example.
Suppose you work for a big business that needs to study audience behaviour
to sell its goods and decide which methods to focus on going forward. The
marketing department of this organisation might approach you, as a data
analyst, to find out which goods the audience purchased more frequently
during the quarantine period and how they did so, or why items like A, which
are barely distinguishable from B, have gotten significantly more attention.
Or a greenhouse might want to learn more about how a plant reacts to
temperature and other conditions. In each case, you are expected to discover
solutions to these problems with your toolset.
[Figure: Venn diagram placing Data Science at the intersection of Computer
Science/IT, Math and Statistics, and Domains/Business Knowledge, with
Machine Learning, Software Development and Traditional Research at the
pairwise overlaps.]
Source: https://towardsdatascience.com/introduction-to-statistics-e9d72d818745
Roles and Responsibilities of a
Data Analyst
Data confidentiality: Data and information are essential resources for any
firm in 2020. Therefore, one of the crucial duties of data analysts today is to
protect data and information security.
Report creation: Data analysts create reports that contain essential data.
These reports include graphs and charts to illustrate business-related factors
by analysing variables like profitability, market analysis, internal activities,
etc. They assist in determining the path of corporate growth.
How to Build a Perfect Resume for a
Data Analyst
The Sole Purpose of Your Resume Is to Land an Interview
01 No resume results in an immediate job offer; that isn't its role. Your
resume must convince its recipient to take a closer look at you. During
the interview, your data science and people skills will carry you toward
an offer. Your resume merely opens the door to the interview process.
In practice, it means keeping your resume short! One page if you have
under a decade of experience and two pages if more. Save whatever
else you want to say for your in-person interview when you'll be given
ample time to get into the weeds and impress the interviewer with
your breadth of knowledge and experience.
Only write down your GPA if it helps. A 3.2 at IIT might be okay to list,
but a 3.2 at a lesser-known college might not be worth listing if you
apply to, say, Google, especially if Google doesn't usually recruit from
your college. Why? Because Google, being Google, might be
expecting you to be at the top of your class with a 4.0 GPA from a
non-target university. As a result, a 3.2 might look bad.
Competencies Required to Become
a Data Analyst
Technical Skills - Data Analysis and Programming
[Figure: Bar chart of the average percentage of data analyst job listings that
mention each tool. SQL, Excel, Tableau, Python, R and SAS lead, followed by
SQL Server, Oracle, Hive, C++, Redshift, NoSQL, Alteryx, Linux, Matlab, Scala,
C#, PowerPoint, Power BI, Hadoop, Java, SPSS, C, AWS, MySQL, Teradata,
Azure, Stata, JavaScript and Spark.]
1. SQL
Upon hearing the term "data scientist," buzzwords such as predictive
analytics, big data and deep learning may leap to mind. So, let's not beat
around the bush: data wrangling isn't the most fun or sexy part of being a
data scientist. However, as a data scientist, you will likely spend a great deal of
your time working on writing SQL queries to retrieve and analyse data. As
such, almost every company you interview with will test your ability to write
SQL queries. These questions are practically guaranteed if you are
interviewing for a data scientist role on a product or analytics team or if you're
after a data science-adjacent role like data analyst or business intelligence
analyst. Sometimes, data science interviews may go beyond just writing SQL
queries and cover the basic principles of database design and other big data
systems. This focus on data architecture is particularly true in early-stage
startups, where data scientists often take an active role in data engineering
and data infrastructure development.
How Are SQL Interview Questions Asked?
Because most analytics workflows require quick slicing and dicing of data in
SQL, interviewers will often present you with hypothetical database tables
and a business problem and then ask you to write SQL on the spot to get to
an answer. This is an especially common early interview question
conducted via a shared coding environment or an automated remote
assessment tool.
Because the industry uses many different flavours of SQL, these questions
aren't usually testing your knowledge of database-specific syntax or
obscure commands. Instead, interviews are designed to test your ability to
translate reporting requirements into SQL.
Finally, some companies might ask you about the performance of your SQL
query. While these interview questions are rare, and companies don't
expect you to be a query optimisation expert, knowing how to structure a
database for performance and avoiding slow-running queries can be
helpful. This knowledge can also come in handy when you are asked more
conceptual questions about database design and SQL.
First off, don't jump into SQL questions without fully understanding the
problem. Before you start whiteboarding or typing out a solution, it's crucial
to repeat the problem so you can be sure you've understood it correctly.
Next, try to work backwards, especially if the answer needs multiple joins,
subqueries and common table expressions (CTEs). Don't overwhelm
yourself trying to figure out the multiple parts of the final query at the same
time.
Instead, imagine you had all the information you needed in a single table so
that your query was just a single SELECT statement. Working backwards
slowly from this ideal table, one SQL statement at a time, try to end up with
the tables you originally started with.
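To make this concrete, here is a minimal sketch of the working-backwards
approach, run on SQLite through Python's built-in sqlite3 module. The orders
table, its columns and the reporting requirement ("total January spend per
user") are all hypothetical, invented purely for illustration:
Code:
import sqlite3

# Hypothetical table: orders(order_id, user_id, amount, order_date)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL, order_date TEXT);
INSERT INTO orders VALUES
  (1, 101, 25.0, '2022-01-05'),
  (2, 101, 40.0, '2022-01-20'),
  (3, 102, 15.0, '2022-01-07');
""")

# Working backwards: if an ideal table with one row per user and their
# January spend existed, the final query would be a single SELECT.
# The CTE below builds exactly that table first.
query = """
WITH user_spend AS (
    SELECT user_id, SUM(amount) AS total_spend
    FROM orders
    WHERE order_date BETWEEN '2022-01-01' AND '2022-01-31'
    GROUP BY user_id
)
SELECT user_id, total_spend
FROM user_spend
ORDER BY total_spend DESC;
"""
for row in conn.execute(query):
    print(row)  # (101, 65.0), then (102, 15.0)

The CTE plays the role of the "ideal table": once user_spend exists, the final
step collapses into a single, easy-to-read SELECT.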
2. Coding
Every Superman has his kryptonite, but as a Data Scientist, coding can't be
yours. Between data munging, pulling in data from APIs and setting up data
processing pipelines, writing code is a near-universal part of a Data Scientist's
job. This is especially true at smaller companies, where data scientists tend to
wear multiple hats and are responsible for productionising their analyses and
models. Even if you are the rare data scientist who never has to write
production code, consider the collaborative nature of the field. Having strong
computer science fundamentals will give you a leg up when working with
software and data engineers.
Data Science interviews often take you on a stroll down memory lane back to
your Data Structures and Algorithms class (you did take one, right?) to test
your programming foundation. These coding questions test your ability to
manipulate data structures like lists, trees and graphs, along with your ability
to implement algorithmic concepts such as recursion and dynamic
programming. You're also expected to assess your solution's runtime and
space efficiency using Big O notation.
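As a quick illustration of the kind of trade-off these questions probe, here is
a short Python sketch of the classic Fibonacci warm-up: the naive recursion
runs in O(2^n) time, while memoising the overlapping subproblems (the core
idea of dynamic programming) brings it down to O(n) time at the cost of O(n)
space:
Code:
from functools import lru_cache

# Memoisation caches each fib(n) the first time it is computed, so every
# subproblem is solved exactly once: O(n) time, O(n) space.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025 -- instant; the naive O(2^n) version would not be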
Here are some recommended online resources that you can use to practice
your coding skills before your interview:
Python Exercises, Practice Questions and Solutions - GeeksforGeeks
Python Exercises (w3schools.com)
Approaching Coding Questions
Demonstrate good code organisation principles. Write well-styled code; for
example, follow PEP 8 guidelines when coding in Python. While you are
allowed to cut some corners, such as assuming a helper method exists, be
explicit about it and offer to fill it in later.
Business/Product Intuition -
Metrics and Identifying Opportunities for Impact
Designing A/B tests:
How would you set up an A/B test to measure the success of a new
feature? What are some likely pitfalls you might run into while performing
A/B tests, and how would you deal with them?
The tips below work for approaching product questions as well as for the
occasional business question:
Be conversational:
Don't talk at the interviewer; speak with them. Engage them in
conversation from time to time as a means of checking in. For instance,
"I think a good metric for engagement on YouTube is the average time
spent watching videos, so I'll focus on that. How does that sound?"
If you think these are too many tips to keep in mind, simply remember this
one thing when solving product problems. Pretend you've already been hired
at the company as a data scientist. You're just having a meeting about the
problem with another co-worker. When you adopt this mindset that you're
already working for the company, behaviours like talking out loud with your
"co-worker" or keeping the company mission in mind should and will come
naturally to you.
What is Product Sense? Don't let the term faze you. This isn't an innate gift
you're born with but rather a skill that can be developed over time. Because
data scientists help product managers (PMs) quantitatively understand the
business and look for opportunities for product improvement within the
data, they play a crucial role during the product roadmap creation process. As
such, questions asking you to brainstorm new products and features are very
common during product-focused data science interviews. The best way to
improve your performance on this type of problem is to improve your general
product sense.
By following the tips in this section to enhance your overall product
sensibilities, you won't freeze up like a deer in the headlights in your next
interview with Google when you're asked to brainstorm features to help
students better use Google Hangouts. Instead, you'll tackle the problem with
the confidence of Sundar Pichai after yet another Alphabet quarterly
earnings beat.
An easy way to develop your product sense is by analysing the products you
naturally encounter in your daily life. When using a product, think about:
What are the product’s end-user benefits (this is bigger than simply what
problem it solves!)?
How do the visual design and marketing copy help convey the product’s
purpose and benefits?
How does the product tie in with the company’s mission and vision?
Statistics
Hypothesis Testing
The process of testing whether a sample of data supports a particular
hypothesis is called hypothesis testing. Generally, hypotheses concern
properties of interest for a given population, such as its parameters, like μ (for
example, the mean conversion rate among a set of users). The steps in testing
a hypothesis are as follows:
02 Use a particular test statistic under the null hypothesis to calculate the
corresponding p-value.
Since the null hypothesis typically represents a baseline (e.g., the marketing
campaign did not increase conversion rates), the goal is usually to reject the
null hypothesis with statistical significance, showing that a meaningful effect
exists.
For example, you are working for Uber Eats, which wants to determine
whether email campaigns will increase its product's conversion rates. To
conduct an appropriate hypothesis test, you would need two roughly equal
groups (equal with respect to dimensions like age, gender, location, etc.). One
group would receive the email campaigns, and the other group would not be
exposed. The null hypothesis, in this case, would be that the two groups
exhibit equal conversion rates, and the hope is that the null hypothesis would
be rejected.
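As a rough sketch of how that test might actually be computed, here is a
two-proportion z-test in Python. The conversion counts are entirely made up,
and scipy is assumed to be available:
Code:
import math
from scipy.stats import norm

# Hypothetical results: 200/2000 control conversions vs 250/2000 treatment
x_c, n_c = 200, 2000   # control (no email campaign)
x_t, n_t = 250, 2000   # treatment (received email campaign)

p_c, p_t = x_c / n_c, x_t / n_t
p_pool = (x_c + x_t) / (n_c + n_t)   # pooled rate under H0: equal conversion rates
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
z = (p_t - p_c) / se
p_value = 2 * norm.sf(abs(z))        # two-sided p-value

print(f"z = {z:.3f}, p-value = {p_value:.4f}")  # z ~ 2.50, p ~ 0.012
# Reject H0 at the 0.05 significance level, since p_value < 0.05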
Test Statistics
A test statistic is a quantity computed from sampled data whose distribution
under the null hypothesis is known, at least approximately. For example, the
number of heads in a series of coin flips follows a binomial distribution. But
with a large enough sample size, the sampling distribution should be
approximately normal. Hence, the sampling distribution of the total number
of heads in a large series of coin flips would be approximately normally
distributed.
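A quick simulation in Python (numpy assumed; the sample sizes are
arbitrary) shows this normal approximation in action:
Code:
import numpy as np

rng = np.random.default_rng(42)

# Number of heads in 1,000 fair coin flips, repeated 10,000 times
heads = rng.binomial(n=1000, p=0.5, size=10_000)

# The sampling distribution is close to Normal(n*p, n*p*(1-p)):
print(heads.mean())  # ~500   (theory: n*p = 500)
print(heads.std())   # ~15.8  (theory: sqrt(n*p*(1-p)) = 15.81)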
Both p-values and confidence intervals are commonly covered topics during
interviews. A p-value is the probability, computed under the null hypothesis,
of observing a test statistic at least as extreme as the one actually calculated.
Usually, the p-value is assessed relative to some predetermined level of
significance (0.05 is often chosen).
There are two errors that are frequently assessed: the type I error, also known
as a "false positive," and the type II error, also known as a "false negative."
Specifically, a type I error occurs when one rejects the null hypothesis when it
is actually true, and a type II error occurs when one fails to reject the null
hypothesis when it is actually false.
When many hypothesis tests are run at once, the chance of a false positive
compounds: run enough tests, and by chance alone you will see a statistically
significant outcome at least once. So, for example, if you set α = 0.05 and run
100 hypothesis tests, then by pure chance you would expect 5 of the tests to
be statistically significant. However, a more desirable outcome is to have the
overall α of the 100 tests be 0.05. This can be done by setting the new
significance threshold to α/n, where n is the number of hypothesis tests (in
this case, α/n = 0.05/100 = 0.0005). This is known as the Bonferroni correction,
and using it helps ensure that the overall rate of false positives is controlled
within a multiple-testing framework.
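A small Python sketch (numpy assumed) makes the arithmetic concrete:
under a true null hypothesis, p-values are uniformly distributed, so roughly 5
of 100 tests come out "significant" at α = 0.05 by chance alone, while the
Bonferroni threshold of α/n = 0.0005 keeps the family-wise false positive rate
controlled:
Code:
import numpy as np

rng = np.random.default_rng(0)
alpha, n_tests = 0.05, 100

# Simulate 100 tests in which the null hypothesis is TRUE in every case;
# under H0, p-values are uniformly distributed on [0, 1]
p_values = rng.uniform(0, 1, size=n_tests)

print((p_values < alpha).sum())            # ~5 false positives, uncorrected
print((p_values < alpha / n_tests).sum())  # usually 0 with Bonferroni (0.0005)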
Communication Skills
Communication skills are a subjective topic and are highly dependent on the
perception of the interviewer, the language the interview is conducted in and
many other factors. So here is a list of don'ts in a data science/analyst
interview in terms of communication. These are also generic to any job
interview.
Poor Communication: You are more likely to succeed in landing the job if you
can speak with the interviewer openly and have a fruitful conversation. The
key to success in the field of data science is communication. You will need to
deliver your data analysis to your team; therefore, you'll need to be able to
communicate effectively. The interview will be directed by you; therefore, it
goes without saying that you will speak more than the interviewer.
Not asking questions: As the proverb goes, "The one who asks a question
might seem a fool for five minutes, but the one who does not ask any
questions remains a fool forever." A skilled interviewer will always give you
the chance to ask questions at the end of a data science interview. Do not
think twice! Now is the moment to inquire about whether you are a good fit
for this new position. This enables the interviewer to gauge your
commitment to the job.
Not explaining your answers: Don't respond with a simple yes or no to
questions that call for an explanation. The interviewer will think more highly
of you for the position if you can clearly explain your response. Try to justify
your answer; be specific and tie in hints (the interviewer might help you with
this). Bring one of your favourite projects to the data science interview and
be sure to know every little nuance of it.
Using incorrect phrases: The following five phrases should never be used
during a data science interview:
◆ I don't really have any weaknesses.
◆ I have no queries currently.
◆ I believe
◆ I think I am incapable of handling this task
◆ I shall try
Tech Stack and Tools Used by
Data Analysts
One of the fastest-rising fields with numerous career prospects is data
science, and SQL is one of the most crucial skills to master to become a great
data scientist. Data about businesses is kept in databases, and with the aid of
a database management system, this data is processed and managed.
Current big data platforms, such as Hadoop and Spark, employ SQL as a
primary API for managing relational database systems and handling
structured data.
1. Python
Python is beginner-friendly
Although not always programmers, data scientists should be tech-savvy. In
the middle of their careers, people from finance, marketing, HR and
academics frequently transition to data science and pick up new skills. In
data science, tools that are simpler to master are more likely to succeed.
For those without any prior IT experience, Python is the ideal choice due to its
simplicity and ease of use. For experts with varied backgrounds, it is quite
accessible. It might just take a few weeks to learn how to use Python to
process data and create straightforward models.
Python excels at data visualisation
Data visualisation produces a lot of data insights. Python's default library for
data visualisation is matplotlib, which offers a wide variety of plot types and a
great deal of flexibility. However, using this library to create anything
sophisticated can take some time. Fortunately, more user-friendly
visualisation tools exist: seaborn is built on top of matplotlib, while Plotly and
Bokeh offer interactive alternatives. Check out these libraries if you want to
construct complex plots in Python.
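As a small taste of how little code seaborn needs (the library is assumed to
be installed; "tips" is one of the demo datasets it ships with):
Code:
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# One call produces a styled scatter plot with a fitted regression line --
# noticeably less code than building the same figure in raw matplotlib
sns.lmplot(data=tips, x="total_bill", y="tip")
plt.show()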
2. R
Data science has become vital to studying data and drawing conclusions
from it. Industries transform raw data into finished data products, and doing
so requires processing that raw data with a number of crucial technologies.
R is one such programming language: it offers a powerful environment for
analysing, processing, transforming and visualising information. R, a GNU
project, is a language and environment for statistical computing and
graphics.
For many statisticians who wish to get involved in creating statistical models
to address challenging problems, it is their first choice. There are several
packages in R that are useful for all types of sciences, including astronomy,
biology, etc. R was initially used for academic purposes, but it is currently also
used in industries.
R Shiny enables users to create web applications, embed visualisations in
web pages and offer a high degree of interactivity to users.
Additionally, R gives you access to a number of advanced data analytics
capabilities, including the creation of prediction models and machine
learning algorithms, and it provides several packages for image processing.
3. Tableau
Tableau is a leading data visualisation tool for data analysis and business
intelligence.
Behavioural Interview
You may not agree that behavioural questions are important. You might
think that the behavioural interview is all fluff and that you can simply wing
it. But companies take this very seriously. Amazon, for example, interviews
every single candidate on the company leadership principles like "customer
obsession" and "invent and simplify" in their bar raiser interviews. Uber also
includes a bar-raiser round that focuses on behavioural questions about
culture fit, previous work experience and past projects. If you want to work
there, you must take behavioural interviews seriously. So here are some
pointers to keep in mind for your behavioural interview.
At the end of the on-site interview, they know you can do the work, but are
you someone they want to personally work with? You'll meet with your
future boss, and maybe even their boss, where they'll both try to sell you on
the company but also see if you'd be a good culture fit. The reality is you are
constantly being assessed! That's why, based on frequency alone, preparing
for and practising answers to these questions is well worth the effort.
Popular Interview Questions for
a Data Analyst’s Position
Here are some popular interview questions asked in some well-known
companies.
*Possible answers have been provided for the first few questions for the reader’s benefit
On SQL
1. What is a database view?
Possible Answer:
A database view is the result of a particular query over a set of tables. Unlike
a normal table, a view does not have a physical schema; instead, it is
computed dynamically whenever it is requested. If the view's underlying
tables are changed, the view itself will change accordingly. Views have
several advantages over tables:
1. Since views can represent only a subset of the data, they limit exposure of
the tables' underlying data and hence increase data security.
2. Since views do not store actual data, they carry significantly less memory
overhead.
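Here is a minimal sketch of a view in action, using SQLite through Python's
sqlite3 module. The employees table and its columns are hypothetical:
Code:
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT, salary REAL);
INSERT INTO employees VALUES (1, 'Asha', 'Data', 90000), (2, 'Ravi', 'HR', 60000);

-- The view stores no data of its own; it is recomputed from employees
-- whenever it is queried, and it hides the sensitive salary column.
CREATE VIEW employee_directory AS
SELECT id, name, dept FROM employees;
""")

print(conn.execute("SELECT * FROM employee_directory").fetchall())
# [(1, 'Asha', 'Data'), (2, 'Ravi', 'HR')] -- salary is never exposed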
2. Expedia: Say you have a database system where most of the queries
made were UPDATES/INSERTS/DELETES. How would this affect your
decision to create indices? What if the queries made were mostly
SELECTS and JOINS instead?
Possible Answer:
SQL statements that modify the database, like UPDATE, INSERT and DELETE,
need to change not only the rows of the table but also the underlying
indexes. Therefore, the performance of those statements depends on the
number of indexes that need to be updated: the larger the number of
indexes, the longer those statements take to execute. On the flip side,
indexes can dramatically speed up row retrieval, because queries that only
read data, like SELECTs and JOINs, never have to modify an index and can
use one to avoid a full table scan. So if the workload is mostly
UPDATEs/INSERTs/DELETEs, you would create indexes sparingly; if it is mostly
SELECTs and JOINs, indexes on frequently filtered and joined columns are
well worth their maintenance cost.
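A small sketch using SQLite (again via Python's sqlite3 module, with a
hypothetical orders table) makes the trade-off visible. SQLite's EXPLAIN
QUERY PLAN reports a full table scan before the index exists and an index
search afterwards (the exact wording of the plan varies by SQLite version):
Code:
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)")

# Without an index, filtering on user_id requires a full table scan
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = 42").fetchall())
# plan detail reads something like: 'SCAN orders'

conn.execute("CREATE INDEX idx_orders_user ON orders(user_id)")

# With the index, SQLite can seek directly to the matching rows -- but every
# INSERT/UPDATE/DELETE on orders now has to maintain idx_orders_user as well
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = 42").fetchall())
# plan detail reads something like:
# 'SEARCH orders USING INDEX idx_orders_user (user_id=?)'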
6. Amazon: Name one major similarity and difference between a WHERE
clause and a HAVING clause in SQL.
7. KPMG: Describe what a foreign key is and how it relates to a primary key.
On R
9. Explain data import in the R language.
Answer:
10. How are missing and impossible values represented in the R language?
Answer:
11. Two vectors X and Y are defined as follows: X <- c(3, 2, 4) and
Y <- c(1, 2). What will be the output of vector Z that is defined as Z <- X*Y?
Answer: c(3, 4, 4). R recycles the shorter vector, computing c(3*1, 2*2, 4*1),
and issues a warning because the longer object's length is not a multiple of
the shorter object's length.
13. What will be the output of log(-5.8) when executed on the R console?
14. What is the difference between a data frame and a matrix in R?
On Python
15. What is the use of the split function in Python?
Answer:
Whenever there is a need to break a bigger string or a line into several
smaller strings, you use the split() function in Python. If no separator is
specified, split() defaults to splitting on whitespace.
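For example:
Code:
text = "data analyst interview handbook"

print(text.split())           # default whitespace split -> ['data', 'analyst', 'interview', 'handbook']
print("a,b,c".split(","))     # explicit separator -> ['a', 'b', 'c']
print("a,b,c".split(",", 1))  # maxsplit=1 -> ['a', 'b,c']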
Code:
import numpy as np

# Convert a regular Python list into a NumPy array
elon_list = [11, 21, 19, 18, 29]
elon_array = np.array(elon_list)

print(elon_array)        # [11 21 19 18 29]
print(type(elon_array))  # <class 'numpy.ndarray'>
18. What are the advantages of NumPy arrays over Python lists?
20. How do you select both rows and columns from the data frame?
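A possible answer sketch for question 20, using pandas (assumed to be
available) and a small made-up data frame:
Code:
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Mei"],
    "age": [31, 28, 35],
    "city": ["Pune", "Delhi", "Mumbai"],
})

# .loc selects by label: rows 0-1 (inclusive) and only the name/age columns
print(df.loc[0:1, ["name", "age"]])

# .iloc selects by integer position: first two rows, first two columns
print(df.iloc[:2, :2])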
On Product Sense
21. Facebook: Imagine the social graphs for both Facebook and Twitter.
How do they differ? What metrics would you use to measure how these
social graphs differ?
Answer:
A bad answer glosses over the nuances of the social graph and jumps
straight into defining metrics. For clarity, not just for you, dear reader, but
also for the hypothetical interviewer asking this question, we'll structure our
answer into three steps.
Step 1: Understanding How Users Interact with Each Platform
Before explaining how the social graphs of Facebook and Twitter differ, it's
crucial to first consider how each platform's average user interacts with their
respective platform. Facebook is mostly about friendships, and so two users
on the same social graph are mutual friends. Twitter, on the other hand, is
more focused on followership, where one user typically follows another (who
is usually an influential figure) without getting a follow back. Thus, Twitter
likely has a small number of people with a very large followership, whereas on
Facebook, that pattern appears less often.
Step 2: Describing the Social Graph and Its Differences Between Facebook
and Twitter
Modelled as a graph, each user is represented as a node, and an edge linking
two nodes denotes a relationship (typically friendship on Facebook and
followership on Twitter) between the two users whose nodes the edge
connects. Most nodes on Twitter would have low degrees, but a small
number of nodes (those of influential people) would have very high degrees,
resulting in a "hub-and-spoke" social graph for that platform.
Step 3: Defining Metrics to Quantify the Difference
One way to quantify the difference between the two platforms' social graphs
is by looking at the distributions of friendships/followership represented by
the social graphs of each platform's typical users. Because a typical node's
degrees—that is, the number of connections it has to other nodes—should
capture the difference in these platforms' social graphs, one concrete metric
would thus be the average degree among all nodes on each platform.
Alternatively, to obtain an even more detailed understanding of the
differences between the two social networks, we could construct
box-and-whisker plots of the platforms' degrees among all their respective
nodes.
Yet another way of looking at the two graphs would be to check the
distribution of degrees across all nodes in each platform's network. Likely, the
Twitter distribution would show a greater amount of right skewness than
that of Facebook. Metrics that quantify a distribution's skewness or kurtosis
could thus be used to describe the difference between the two platforms'
degree distributions and, hence, social graphs.
22. Uber: Why does surge pricing exist? What metrics would you track to
ensure that surge pricing was working effectively?
Answer:
For any metrics definition question, it's important to first explain the
business goal of the product or feature and then explain the related
stakeholders before ultimately landing on good metrics to measure the
success of that feature.
You don't have to be an econ major to realise that surge pricing is about
fixing imbalances between supply and demand. In the case of Uber, such an
imbalance could result from either a lack of drivers or an excess number of
potential riders. Therefore, surge pricing's goal would be to increase supply
by enticing more drivers to use the app through increased pay and reduce
demand by raising prices for riders.
Surge-specific metrics are the duration of the surge, the surge pricing
multiplier, and the number of riders and drivers in the affected area. We
should also track the following metrics during surge periods: the number of
rides taken, number of rides cancelled, total revenue made for both Uber and
Uber's drivers, total profit made by both Uber and Uber's drivers and the
average ride wait time. These are all standard metrics but critical to monitor
to ensure the business is healthy during surge periods.
In addition, topline metrics like user's lifetime value (LTV), driver retention,
rides taken, active daily riders, and drivers should also be tracked so that we
can be sure surge pricing isn't having adverse impacts on the business
overall.
Finally, surge pricing can annoy users, for whom frequently fluctuating
sky-high prices can be a source of frustration. And then there's the potential
for mistakes, or for users in a less-than-sober state accidentally making a
purchase. It can even be a PR risk: like clockwork, every New Year's Day,
there's a news story about someone getting drunk and taking an $800 Uber
by accident. Between bad PR, frustrated users and potentially increased
support tickets, tracking a metric like NPS to ensure this remains a quality
programme is key.
23. Airbnb: What factors might make A/B testing metrics on the Airbnb
platform difficult?
24. Google: We currently pay the Mozilla foundation nine figures per year
for Google to be the default search engine on Firefox. The deal is being
renegotiated, and Mozilla is now asking for twice the money. Should we
take the deal? How would you estimate the upper bound on what Google
should be willing to pay?
25. LinkedIn: Assume you were working on LinkedIn's Feed. What metrics
would you use to track engagement? What product ideas do you have to
improve these engagement metrics?
26. Lyft: Your team is trying to figure out whether a new rider app with
extra UI features would increase the number of rides taken. For an A/B
test, how would you split users and ensure that your tests have balanced
groups?
27. Amazon: If you were to plot the average revenue per seller on the
Amazon marketplace, what would the shape of the distribution look like?
Reference Links: W3Schools Online Web Tutorials, Kaggle: Your Machine Learning
and Data Science Community, Tableau Practice Test - Practice Test Geeks
Dos and Don’ts
During an Interview
DOs
Practice Business Questions:
DON'Ts
Pretend to Know the Answer:
The fear of being rejected is one possible explanation for pretending to know
the solution. However, the contrary is preferable: either attempt the answer
based on what you know or state that you have no idea. Although it may
come as a shock to some, it is better to admit ignorance than to lie, fake or
evade the question. It's okay to not know everything, and some interviewers
might only be checking your knowledge to see what you would need to be
taught in the future rather than testing you. It also feels fantastic to know
that you were completely honest and direct in your interview, so that when
the time comes, you really do land the role.
Go on Long Tangents:
The interviewer will also evaluate your level of expertise, understanding, and
correctness. How do they achieve this? They pose a question to you, and you
respond in a way that demonstrates your understanding. Answering the
question directly will demonstrate that you have done thorough research
and that you are informed about the subject you are discussing. However, if
you start going into detail about things that really don't need to be explained,
it will appear that you don't know much about the subject, and you'll also
annoy the interviewer, which is not a good sign. So, always try to provide a
direct response. Crisp and simple explanations are preferred, and detailed
responses should only be given in response to specific inquiries.
In conclusion:
Copyright © 2022 Emeritus | All rights reserved.
No part of this book may be reproduced in any form or by any electronic
or mechanical means, including information storage and retrieval systems,
without permission in writing from the author.