Data Analytics Interview Handbook (ISB)

Contents

Introduction - Market Scenario of Data Analysis or Data Science
Who Is a Data Analyst?
Roles and Responsibilities of a Data Analyst
How to Build a Perfect Resume for a Data Analyst
Competencies Required to Become a Data Analyst
Tech Stack or Tools Used for Data Analysis
Behavioural Interview
Popular Interview Questions for a Data Analyst's Position
Dos and Don'ts During an Interview

Introduction - Market Scenario of
Data Analysis or Data Science
Data scientists are not only privileged to solve some of the most intellectually
stimulating and impactful problems in the world but they are also getting
paid very well for it. At Google, the median total compensation reported for a
Senior Data Scientist is a whopping ₹25 lakh per year.
Ref: Google Salaries in India | Glassdoor

Hierarchy for data scientist growth at Google (top to bottom):

1. Principal-level data scientists
2. Senior-level data scientists
3. Mid-level data scientists (master executioners)
4. Junior-level data scientists (entry-level executioners)
5. Interns (the company gauges your fit for the role)

Given how intellectually stimulating and lucrative data science is, it should
not be surprising that competition for these top data jobs is fierce. Between
"entry-level" positions in data science weirdly expecting multiple years of
experience and entry-level jobs being relatively rare in this field saturated
with PhD holders, early-career data scientists face hurdles in even landing
interviews at many firms.

Worse, job seekers at all experience levels face obstacles with online applications, likely never hearing back from most jobs they apply to. Sometimes, this is due to undiagnosed weaknesses in a data scientist's resume, causing recruiters to pass on talented candidates. But often, it's because candidates cannot stand out from the sea of applicants an online job posting attracts. Forget about acing the data science interview; given the number of job-hunting challenges a data scientist faces, just getting an interview at a top firm can be considered an achievement.

Then there's the question of passing the rigorous technical interviews. To minimise false positives (aka "bad hires"), top companies run everyone from interns to industry veterans through tough technical challenges to filter out weak candidates. These interviews cover a lot of topics because the data science role is itself so nebulous and varied: what one company calls a data scientist, another company might call a data analyst, data engineer or machine learning engineer. Only after passing these onerous technical interviews, often three or four on the same day, can you land your dream job in data science.

Who Is a Data Analyst?
Data analysis is a field that extracts logical inferences and arguments from
the vast amounts of data and information continuously captured in business
and creates coherent and intelligible data for company choices and plans.

A data analyst is a translator who takes a mass of incomplete and disordered information and extracts an understanding of the situation, goals and strategies for the future, and remedies for the existing situation. On this basis, analysts
can identify the business's competitive environment, as well as its internal
and external interests, and provide improvements.

To further understand what a data analyst does, let's use an example. You
might work for a big business that needs to study audience behaviour to sell
its goods and figure out what methods to focus on going forward. When the
marketing department of this organisation approaches you as a data analyst,
they can explain, for instance, that they need to know which goods the
audience purchased more frequently during the quarantine period and how
they did so, or why items like A, which are barely distinguishable from B, have gotten significantly more attention. Or a greenhouse might want to learn more about how a plant reacts to temperature and other factors. You ought to discover solutions to these problems with your toolset.

A data analyst must be skilled in three aspects:

Math: Statistics and probability. It forms the base of all the models you're going to make.

Business: Finding a business problem and breaking it into smaller chunks.

Tech: Knowing your tools well, like SQL/R/Python/Tableau.

[Figure: a Venn diagram placing Data Science at the intersection of Computer Science/IT, Math and Statistics, and Domains/Business Knowledge; the pairwise overlaps are Machine Learning (CS + Math/Stats), Software Development (CS + Domain Knowledge) and Traditional Research (Math/Stats + Domain Knowledge).]

Source: https://towardsdatascience.com/introduction-to-statistics-e9d72d818745

Roles and Responsibilities of a
Data Analyst

Data mining

Data analysts gather information from a variety of primary or secondary sources. They then arrange the data in a suitable format that is simple to understand. Data analysts assist in the design and upkeep of database systems. This way, a database can be created, updated, read and deleted.

Quality control

Most businesses rely on their data to carry out their daily operations. Therefore, obtaining high-quality data is essential for increasing an organisation's efficiency. Data analysts ensure that the information gathered from various sources is pertinent to the company's operations.

Data preparation

Data gathered from numerous sources will inevitably have errors, duplications, missing values and many other issues; as a result, the data is unprocessed. After the data has been extracted, data analysts must transform the unstructured data into structured data by fixing mistakes, deleting unnecessary data and discovering potential data. To prepare the data for modification and display by Data Science, they use a variety of data cleaning procedures.

Collaboration

Data analysts work with other software development teams to prepare data for data scientists, ML engineers and other groups. They use the information to create automated software that is ML-based. Data analysts work with development teams to deliver pertinent data information.

Data confidentiality

Data and information are essential resources for any firm in 2020. Therefore, one of the crucial duties of data analysts today is to protect data and information security.

Report creation

Data analysts create reports that contain essential data. These reports include graphs and charts to illustrate business-related factors by analysing variables like profitability, market analysis, internal activities, etc. They assist in determining the path of corporate growth.

Data analysts assist in troubleshooting information, reports and database problems.

How to Build a Perfect Resume for a
Data Analyst
The Sole Purpose of Your Resume Is to Land an Interview
01 No resume results in an immediate job offer; that isn't its role. Your
resume must convince its recipient to take a closer look at you. During
the interview, your data science and people skills will carry you toward
an offer. Your resume merely opens the door to the interview process.
In practice, it means keeping your resume short! One page if you have
under a decade of experience and two pages if more. Save whatever
else you want to say for your in-person interview when you'll be given
ample time to get into the weeds and impress the interviewer with
your breadth of knowledge and experience.

Build Your Resume to Impress the Recruiter


02 The person you most need to impress with your resume is a
nontechnical recruiter. The senior data scientist or hiring manager at
the company you want to work for is NOT your target audience. They
will have a chance to review your experience in-depth during your
on-site interview. As such, spell out technical acronyms if they aren't
obvious. Give a little background; don't just assume the recruiter will
understand what you did and why it's impressive. For example, a
research project called "Continuous Deep Q-Learning with
Model-Based Acceleration" doesn't make sense to most people. But a
"Bird Bot Developed Using Machine Learning" is more memorable
and intriguing to the average nontechnical recruiter.

Only Include Things That Make You Look Good


03 Your resume should make you shine, so don't be humble or play it cool.
If you deserve credit for something, put it on your resume. But never
lie or overstate the truth. It's easy for a recruiter or other company
employee to chat with you about projects and quickly determine if you
did it all or if it was a three-person group project you're passing off as
your own. And technical interviewers love to ask probing questions
about projects, so don't overstate your contributions.
Another way to look good is to not volunteer information that may not
work in your favour. Sounds obvious enough, but most people think
they can get away with it. For example, you don't have to list your GPA!

Only write down your GPA if it helps. A 3.2 at IIT might be okay to list,
but a 3.2 at a lesser-known college might not be worth listing if you
apply to, say, Google, especially if Google doesn't usually recruit from
your college. Why? Because Google, being Google, might be
expecting you to be at the top of your class with a 4.0 GPA from a
non-target university. As a result, a 3.2 might look bad.

You Probably Don't Need a Skills or Technologies Section


04 Another section of the resume packed with neutral details is the skills
and technologies section. Traditional resume advice promotes
packing in loads of keywords. You can eliminate this section entirely or
shorten it to two lines max (if you do choose to include it). There are
several reasons why we advocate shortening or removing the skills
and technologies section.

First off, we need to address why traditional resume advice advocates for including this section. The reason is the applicant tracking system (ATS): its algorithm scans resumes for keywords and flags applications that look relevant to the job description for the recruiter. Also, anything you list on your resume is fair game for the interviewer to question you about. Filling this section with tools you aren't familiar with to please the ATS algorithm can easily backfire during the interview.

Mistakes College Students and New Grads Often Make


05 Listing too many volunteer experiences and club involvements from
high school and college is a frequent mistake we see college students
and new grads make on their resumes. Building a resume isn’t like
college applications, where listing your involvement in university
soccer or a chess club membership means something. It's great that
you are a civically engaged, respectable human involved with your
community, but competitive tech companies and Wall Street firms are
selecting you for your ability to turn data into insight, and not much else.
Attending ACM meetings or going to the data science club is
practically worthless as far as resume material goes. Unless your
involvement was significant, don't list it.

Competencies Required to Become
a Data Analyst
Technical Skills - Data Analysis and Programming

[Bar chart: average percentage of data analyst job listings that mention each tool. SQL, Excel, Tableau, Python, R and SAS lead by a wide margin, followed by SQL Server, Oracle, Hive, C++, Redshift, NoSQL, Alteryx, Linux, Matlab, Scala, C#, PowerPoint, Power BI, Hadoop, Java, SPSS, C, AWS, MySQL, Teradata, Azure, Stata, JavaScript and Spark.]

1. SQL
Upon hearing the term "data scientist," buzzwords such as predictive
analytics, big data and deep learning may leap to mind. So, let's not beat
around the bush: data wrangling isn't the most fun or sexy part of being a
data scientist. However, you will likely spend a great deal of your time writing SQL queries to retrieve and analyse data. As
such, almost every company you interview with will test your ability to write
SQL queries. These questions are practically guaranteed if you are
interviewing for a data scientist role on a product or analytics team or if you're
after a data science-adjacent role like data analyst or business intelligence
analyst. Sometimes, data science interviews may go beyond just writing SQL
queries and cover the basic principles of database design and other big data
systems. This focus on data architecture is particularly true in early-stage
startups, where data scientists often take an active role in data engineering
and data infrastructure development.

How SQL Interview Questions Are Asked

Because most analytics workflows require quick slicing and dicing of data in
SQL, interviewers will often present you with hypothetical database tables
and a business problem and then ask you to write SQL on the spot to get to
an answer. This is an especially common early interview question
conducted via a shared coding environment or an automated remote
assessment tool.

Because the industry uses many different flavours of SQL, these questions
aren't usually testing your knowledge of database-specific syntax or
obscure commands. Instead, interviews are designed to test your ability to
translate reporting requirements into SQL.

For example, at a company like Facebook, you might be given a table on user analytics and asked to calculate the month-to-month retention. Here,
it is relatively straightforward what the query should be, and you're
expected to write it. Some companies might make their SQL interview
problems more open-ended. For example, Amazon might give you tables
about products and purchases and then ask you to list the most popular
products in each category. Robinhood may give you a table and ask why
users are churning. Here, the tricky part might not be just writing the SQL
query but also figuring out collaboratively with the interviewer what
"popular products" or "user churn" means in the first place.

Finally, some companies might ask you about the performance of your SQL
query. While these interview questions are rare, and companies don't
expect you to be a query optimisation expert, knowing how to structure a
database for performance and avoiding slow-running queries can be
helpful. This knowledge can also come in handy when you are asked more
conceptual questions about database design and SQL.

Tips for Solving SQL Interview Questions:

First off, don't jump into SQL questions without fully understanding the
problem. Before you start whiteboarding or typing out a solution, it's crucial
to repeat the problem so you can be sure you've understood it correctly.

Next, try to work backwards, especially if the answer needs multiple joins,
subqueries and common table expressions (CTEs). Don't overwhelm
yourself trying to figure out the multiple parts of the final query at the same
time.

Instead, imagine you had all the information you needed in a single table so
that your query was just a single SELECT statement. Working backwards
slowly from this ideal table, one SQL statement at a time, try to end up with
the tables you originally started with.
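
To make this concrete, here is a minimal sketch of the Facebook-style retention question from earlier, runnable against an in-memory SQLite database. The table name (user_logins), its columns and the retention definition (a user active in month M who is also active in month M+1) are assumptions made purely for illustration, not a real interview's schema.

import sqlite3

# Hypothetical schema: one row per login event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logins (user_id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO user_logins VALUES (?, ?)",
    [(1, "2023-01-05"), (1, "2023-02-11"), (2, "2023-01-20"),
     (2, "2023-03-02"), (3, "2023-02-14"), (3, "2023-03-01")],
)

# Work backwards: the ideal final table is (month, retained_users).
# One step back from that: each user's set of active months, which we
# can then self-join to check for activity in the following month.
query = """
WITH monthly_activity AS (
    SELECT DISTINCT user_id, strftime('%Y-%m', login_date) AS month
    FROM user_logins
)
SELECT cur.month,
       COUNT(nxt.user_id) AS retained_users
FROM monthly_activity AS cur
LEFT JOIN monthly_activity AS nxt
       ON nxt.user_id = cur.user_id
      AND nxt.month = strftime('%Y-%m', date(cur.month || '-01', '+1 month'))
GROUP BY cur.month
ORDER BY cur.month
"""
for row in conn.execute(query):
    print(row)  # ('2023-01', 1), ('2023-02', 1), ('2023-03', 0)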

Here are some sample questions asked during interviews at some well-known companies; see the Popular Interview Questions section later in this handbook.

2. Coding

Every Superman has his kryptonite, but as a Data Scientist, coding can't be
yours. Between data munging, pulling in data from APIs and setting up data
processing pipelines, writing code is a near-universal part of a Data Scientist's
job. This is especially true at smaller companies, where data scientists tend to
wear multiple hats and are responsible for productionising their analyses and
models. Even if you are the rare data scientist who never has to write
production code, consider the collaborative nature of the field. Having strong
computer science fundamentals will give you a leg up when working with
software and data engineers.

Data Science interviews often take you on a stroll down memory lane back to
your Data Structures and Algorithms class (you did take one, right?) to test
your programming foundation. These coding questions test your ability to
manipulate data structures like lists, trees and graphs, along with your ability
to implement algorithmic concepts such as recursion and dynamic
programming. You're also expected to assess your solution's runtime and
space efficiency using Big O notation.
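
As a refresher on what these questions look like, here is a classic staircase-climbing example (our own illustration, not from any specific company's interview): the naive recursion runs in exponential time, while the dynamic programming version is linear.

def climb_naive(n: int) -> int:
    # Naive recursion: O(2^n) time, O(n) stack space.
    if n <= 1:
        return 1
    return climb_naive(n - 1) + climb_naive(n - 2)

def climb_dp(n: int) -> int:
    # Bottom-up dynamic programming: O(n) time, O(1) space.
    prev, curr = 1, 1  # ways to climb 0 steps, 1 step
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr

assert climb_naive(10) == climb_dp(10) == 89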

Here are some recommended online resources that you can use to practice
your coding skills before your interview:
Python Exercises, Practice Questions and Solutions - GeeksforGeeks
Python Exercises (w3schools.com)

Approaching Coding Questions

Coding interviews typically last 30 to 45 minutes and come in a variety of formats. Early in the interview process, coding interviews are often conducted via remote coding assessment tools like HackerRank, Codility or CoderPad. In onsite interviews, it's typical to write code on a whiteboard. Regardless of the format, the approach outlined below to solving coding interview problems applies.

After receiving the problem:


Don't jump right into coding. It's crucial first to make sure you are solving
the correct problem. Due to language barriers, misplaced assumptions
and subtle nuances that are easy to miss, misunderstanding the problem
is a frequent occurrence. To prevent this, make sure to repeat the question
back to the interviewer so that the two of you are on the same page. Clarify
any assumptions made, like the input format and range, and be sure to ask
if the input can be assumed to be non-null or well formed. As a final test to
see if you've understood the problem, work through an example input and
see if you get the expected output. Only after you've done these steps are
you ready to begin solving the problem.

When brainstorming a solution:


First, explain at a high level how you could tackle the question. This usually
means discussing the brute-force solution. Then, try to gain an intuition for
why this brute-force solution might be inefficient and how you could
improve upon it. If you're able to land on a more optimal approach,
articulate how and why this new solution is better than the first brute force
solution provided. Only after you've settled on a solution is it time to begin
coding.

When coding the solution:


Explain what you are coding. Don't just sit there typing away, leaving your interviewer in the dark. Because coding interviews often let you pick the language you write code in, you're expected to be proficient in the programming language you choose. As such, avoid pseudocode in favour of proper, compilable code. While there is time pressure, don't take too many shortcuts when coding. Use clear variable names and follow good code organisation principles. Write well-styled code; for example, follow PEP 8 guidelines when coding in Python. If you do cut a corner, such as assuming a helper method exists, be explicit about it and offer to fill it in later.

After you're done coding:


Make sure there are no mistakes or edge cases you didn't handle. Then
write and execute test cases to prove you solved the problem. At this point,
the interviewer should dictate which direction the interview heads. They
may ask about the time and space complexity of your code. Sometimes
they may ask you to refactor and clean the code, especially if you cut some
corners while coding the solution. They may also extend the problem,
often with a new constraint. For example, they may ask you not to use recursion and instead solve the problem iteratively. Or they
might ask you not to use surplus memory and instead solve the problem
in place. Sometimes, they may pose a tougher variant of the problem as a
follow-up, which might require starting the problem-solving process all
over again.

Here are some sample questions asked during interviews at some well-known companies; see the Popular Interview Questions section later in this handbook.
Business/Product Intuition -
Metrics and Identifying Opportunities for Impact

Overview: A Magikarp, a one-legged man in a kicking contest and an ejector seat in a helicopter. These three are examples of things more valuable than a
data scientist with a weak product sense and business acumen. Because
data scientists often work cross-functionally with product managers (PMs)
and business stakeholders to help create product roadmaps and understand
the root cause of various business problems, they are expected to have a
strong product and business intuition. It's not just data scientists who can
expect product-sense interview questions. These topics are also frequently
covered during product analyst, data analyst, and business intelligence
analyst interviews. Between questions on the art of selecting product
metrics, troubleshooting A/B test results and weighing business trade-offs,
the scope of product interview questions is massive.

Before we dive into specific product management topics, it is important for you to first get a glimpse of the four most common types of product-focused
data science interview questions:

Defining a product metric:


What metrics would you define to measure the success of a new product
launch? If a product manager (PM) thought it was a good idea to change
an existing feature, what metrics would you analyse to validate their
hypothesis?

Diagnosing a metric change:


How would you investigate the root cause behind a metric going up or
down? What if other counter metrics changed at the same time? How
would you handle the metric trade-offs?

Brainstorming product features:


At a high level, should a company launch a particular new product? Why or
why not? For an existing product, what feature ideas do you have to
improve a certain metric?

Designing A/B tests:
How would you set up an A/B test to measure the success of a new
feature? What are some likely pitfalls you might run into while performing
A/B tests, and how would you deal with them?

By keeping these frequently asked question types top of mind, we hope you will concretely apply the following high-level advice to ace product questions.

Framework for Approaching Product Interview Questions

The tips below work for approaching product questions as well as for the
occasional business question:

Ask clarifying questions:


Make sure you understand the user flow for a product, who the end users
are for the product, who the other stakeholders are that are involved with
this problem and what product and business goals we aim to achieve by
solving the problem. Even if you have researched the company and
product and know many details, frame your knowledge as a question so
you don't inadvertently head down the wrong path. For example: "I know
Robinhood's mission is to democratise finance for all. It seems this crypto
wallet feature is meant to democratise access to cryptocurrencies, which
can also help us better compete with Coinbase. Am I on the right track?"

Establish problem boundaries:


These are big problems. Scope them down. Establish with your interviewer
what you're purposely choosing to ignore to solve the problem within the
time frame of the interview.

Talk Out Loud:


You may have heard of this one many times. These interviews are held to
evaluate your thought process. And until Elon Musk invents mind reading
at Neuralink, you need to voice your thoughts!

Be conversational:
Don't talk at the interviewer; speak with them. Engage them in conversation from time to time as a means of checking in. For instance, "I think a good metric for engagement on YouTube is the average time spent watching videos, so I'll focus on that. How does that sound?"

Keep goals forefront:


It's easy to get lost in technical details. Never forget your answer stems
from the company's mission and vision, which you hopefully articulated at
the start of your conversation!

Bring in outside experience tactfully:


Because these problems are rooted in the real world, it is okay to flex your
past domain experience. Just don't go overboard and come across as
arrogant or cargo cult-y by saying, "This is the only way a problem should
be solved, because that's how we solved it at Google."

If you think these are too many tips to keep in mind, simply remember this
one thing when solving product problems. Pretend you've already been hired
at the company as a data scientist. You're just having a meeting about the
problem with another co-worker. When you adopt this mindset that you're
already working for the company, behaviours like talking out loud with your
"co-worker" or keeping the company mission in mind should and will come
naturally to you.

How to Develop Your Product Sense

What is Product Sense? Don't let the term faze you. This isn't an innate gift
you're born with but rather a skill that can be developed over time. Because
data scientists help product managers (PMs) quantitatively understand the
business and look for opportunities for product improvement within the
data, they play a crucial role during the product roadmap creation process. As
such, questions asking you to brainstorm new products and features are very
common during product-focused data science interviews. The best way to
improve your performance on this type of problem is to improve your general
product sense.

By following the tips in this section to enhance your overall product
sensibilities, you won't freeze up like a deer in the headlights in your next
interview with Google when you're asked to brainstorm features to help
students better use Google Hangouts. Instead, you'll tackle the problem with
the confidence of Sundar Pichai after yet another Alphabet quarterly
earnings beat.

The Daily Habit You Need to Build Your Product Sense

An easy way to develop your product sense is by analysing the products you
naturally encounter in your daily life. When using a product, think about:

Who was this product created for?

What's the main problem it was designed to solve?

What are the product’s end-user benefits (this is bigger than simply what
problem it solves!)?

How do the visual design and marketing copy help convey the product’s
purpose and benefits?

How does the product tie in with the company’s mission and vision?

A great deal of good product sense is having empathy for a product's or service's users. That's why you must try to put yourself in a user's shoes when answering the above questions while analysing a product or service. Take
Snapchat, for example. Sure, you can post photos to your story or send
messages to people on Snapchat. But so can iMessage, WhatsApp,
Instagram, and Messenger. At a deeper level, Snapchat is about being able to
stay in touch with your closest friends in a casual, authentic way. That's why
opening the Snapchat app puts you directly on the camera, to make it
frictionless to express yourself and live in the moment - two core elements of
Snapchat’s company mission. It's also why photos and messages disappear
by default - this lowers the barrier to expressing yourself and pushes you to
share whatever you captured rather than spending time editing a photo you
know will soon be gone.

Statistics

Law of large numbers (LLN)


The Law of Large Numbers (LLN) states that if you sample a random variable
independently a large number of times, the measured average value should
converge to the random variable's true expectation. This is important in
studying the longer-term behaviour of random variables over time. As an
example, a coin might land on heads five times in a row, but over a large number of flips n, we would expect the proportion of heads to be approximately half. Similarly, a casino might experience a loss
on any individual game but should see a predictable profit over the long run.
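
A quick way to internalise the LLN is to simulate it. This short sketch (ours, purely illustrative) flips a fair coin with NumPy and prints the running proportion of heads, which drifts towards 0.5 as n grows.

import numpy as np

rng = np.random.default_rng(seed=42)
flips = rng.integers(0, 2, size=100_000)  # 1 = heads, 0 = tails

# Running proportion of heads after each flip.
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)
for n in (10, 100, 1_000, 100_000):
    print(f"after {n:>7,} flips: proportion of heads = {running_mean[n - 1]:.4f}")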

Central Limit Theorem (CLT)


The Central Limit Theorem (CLT) states that if you repeatedly sample a
random variable many times, the distribution of the sample mean will
approach a normal distribution regardless of the initial distribution of the
random variable. The CLT provides the basis for much of the hypothesis
testing, which is discussed shortly. At a very basic level, you can consider the
implications of this theorem on coin flipping: the probability of getting some
number of heads flipped over a large n should be approximately that of a
normal distribution. Whenever you're asked to reason about any distribution
over a large sample size, you should remember to think of the CLT, regardless
of whether it is binomial, Poisson or any other distribution.
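
The CLT is just as easy to check by simulation. In this sketch (again our own illustration), the underlying distribution is heavily skewed (exponential), yet the means of many samples of size 50 cluster around the true mean with the spread the CLT predicts.

import numpy as np

rng = np.random.default_rng(seed=0)

# 10,000 experiments, each taking the mean of n = 50 exponential draws.
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

# CLT prediction: mean close to 1.0, std close to 1.0 / sqrt(50) = 0.141.
print(f"mean of sample means: {sample_means.mean():.3f}")
print(f"std of sample means:  {sample_means.std():.3f}")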

Hypothesis Testing
The process of testing whether a sample of data supports a particular
hypothesis is called hypothesis testing. Generally, hypotheses concern
properties of interest for a given population, such as its parameters, like μ (for
example, the mean conversion rate among a set of users). The steps in testing
a hypothesis are as follows:

01 State a null hypothesis and an alternative hypothesis. Either the null hypothesis will be rejected (in favour of the alternative hypothesis), or it will fail to be rejected (failing to reject the null hypothesis does not necessarily mean it is true, but rather that there is not sufficient evidence to reject it).

02 Use a particular test statistic of the null hypothesis to calculate the corresponding p-value.

03 Compare the p-value to a predetermined significance level α.

Since the null hypothesis typically represents a baseline (e.g., the marketing campaign did not increase conversion rates), the goal is usually to reject the null hypothesis and conclude that there is a statistically significant outcome.

Understanding hypothesis testing is the basis of A/B testing, a topic commonly covered in tech companies' interviews. In A/B testing, various versions of a feature are shown to samples of different users, and each variant is tested to determine if there was an uplift in the core engagement metrics.

For example, you are working for Uber Eats, which wants to determine
whether email campaigns will increase its product's conversion rates. To
conduct an appropriate hypothesis test, you would need two roughly equal
groups (equal with respect to dimensions like age, gender, location, etc.). One
group would receive the email campaigns, and the other group would not be
exposed. The null hypothesis, in this case, would be that the two groups
exhibit equal conversion rates, and the hope is that the null hypothesis would
be rejected.
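
In code, the Uber Eats example could be evaluated with a two-proportion z-test. The conversion counts below are invented for illustration; statsmodels is one common choice, though the same statistic can be computed by hand from the pooled proportion.

from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: the treatment group received the email campaign.
conversions = [620, 530]        # conversions in treatment, control
group_sizes = [10_000, 10_000]  # users in each group

z_stat, p_value = proportions_ztest(conversions, group_sizes)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

# Reject the null hypothesis of equal conversion rates at alpha = 0.05?
print("reject null" if p_value < 0.05 else "fail to reject null")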

Test Statistics

A test statistic is a numerical summary designed to help decide whether the null hypothesis should be rejected in favour of the alternative hypothesis. More specifically, it assumes that the parameter of interest follows a particular sampling distribution under the null hypothesis.

For example, the number of heads in a series of coin flips follows a binomial distribution. But with a large enough sample size, the sampling distribution should be approximately normally distributed. Hence, the sampling distribution for the total number of heads in a large series of coin flips would be approximately normal.

p-values and Confidence Intervals

Both p-values and confidence intervals are commonly covered topics during interviews. A p-value is the probability of observing a test statistic at least as extreme as the one calculated, assuming the null hypothesis is true. Usually, the p-value is assessed relative to some predetermined level of significance (0.05 is often chosen).

In conducting a hypothesis test, an α, or measure of the acceptable probability of rejecting a true null hypothesis, is typically chosen prior to conducting the test. Then, a confidence interval can also be calculated to assess the test statistic. This is a range of values constructed so that, under repeated sampling, it would contain the parameter value of interest 100(1-α)% of the time. For instance, a 95% confidence interval would contain the true value 95% of the time. If the null-hypothesis value (for example, 0 for a difference in means) is included in the confidence interval, then we cannot reject the null hypothesis (and vice versa).

Type 1 and 2 errors:

There are two errors that are frequently assessed: type I error, which is also
known as a "false positive" and type II error, which is also known as a "false
negative." Specifically, a type I error is when one rejects the null hypothesis
when it is correct, and a type II error is when the null hypothesis is not
rejected when it is incorrect.

Usually, 1-α is referred to as the confidence level, whereas 1-β is referred to as the power. If you plot sample size versus power, generally, you should see a larger sample size corresponding to a larger power. It can be useful to look at power to gauge the sample size needed for detecting a significant effect. Generally, tests are set up in such a way as to have both 1-α and 1-β relatively high (say, at 0.95 and 0.8, respectively).
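
To see the power-sample size relationship concretely, here is a back-of-the-envelope calculation using the standard normal-approximation formula for a two-sample test; the effect sizes chosen are arbitrary examples.

from scipy.stats import norm

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> float:
    # Approximate n per group for a two-sample z-test with standardised
    # effect size d: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return 2 * ((z_alpha + z_power) / effect_size) ** 2

# Smaller effects need far larger samples at the same power.
for d in (0.5, 0.2, 0.1):
    print(f"d = {d}: about {n_per_group(d):.0f} per group")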

In testing multiple hypotheses, it is possible that if you ran many experiments - even if a particular outcome for one experiment is very unlikely - you would see a statistically significant outcome at least once. So, for example, if you set α = 0.05 and run 100 hypothesis tests, then by pure chance you would expect 5 of the tests to be statistically significant. However, a more desirable outcome is to have the overall α of the 100 tests be 0.05. This can be done by setting the new α to α/n, where n is the number of hypothesis tests (in this case, α/n = 0.05/100 = 0.0005). This is known as the Bonferroni correction, and using it helps make sure that the overall rate of false positives is controlled within a multiple testing framework.
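
The correction itself is a one-line division; the simulation below (our illustration) shows the family-wise error rate it controls, by generating p-values for 100 true null hypotheses many times over.

import numpy as np

alpha, n_tests = 0.05, 100
alpha_corrected = alpha / n_tests  # Bonferroni: 0.0005

# Under a true null hypothesis, p-values are uniform on [0, 1].
rng = np.random.default_rng(seed=1)
p_values = rng.uniform(size=(10_000, n_tests))

# How often does at least one of the 100 tests look "significant" by chance?
print("family-wise error, uncorrected:", (p_values < alpha).any(axis=1).mean())
print("family-wise error, Bonferroni: ", (p_values < alpha_corrected).any(axis=1).mean())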

Communication Skills

Communication skills are a subjective topic and are highly dependent on the
perception of the interviewer, the language the interview is conducted in and
many other factors. So here is a list of don'ts in a data science/analyst
interview in terms of communication. These are also generic to any job
interview.

Poor Communication: You are more likely to succeed in landing the job if you
can speak with the interviewer openly and have a fruitful conversation. The
key to success in the field of data science is communication. You will need to
deliver your data analysis to your team; therefore, you'll need to be able to
communicate effectively. The interview will be directed by you; therefore, it
goes without saying that you will speak more than the interviewer.

Not asking Questions: As the proverb goes, "The one who asks a question might seem a fool for five minutes; the one who does not ask any questions remains a fool forever." A skilled interviewer will always give you the chance to ask questions at the end of a data science interview. Do not think twice! Now is the moment to inquire as to whether you are a good fit for this new position. This enables the interviewer to gauge your commitment to the job.

Sounding Under or Overconfident: Talking about topics you don't fully


understand is just too awkward. During one interview, a candidate babbled
nonstop for 25 minutes about artificial neural networks (ANN). After that, he
realised that he always had focused on logistic regression and was not
well-versed in ANN. Even worse, he initiated the talk about it. He failed the
interview because he was just too confident. Talking only about topics with
which you are at ease is effective.

Not explaining your answers: Don't respond with a simple yes or no to inquiries that call for an explanation. The interviewer will think
more highly of you for the position if you can clearly explain your response. Try
to justify your response; be specific and tie in hints (the interviewer might
help you with this). Bring one of your favourite projects to the data science
interview and be sure to know every little nuance of it.

Using incorrect phrases: The following five phrases should never be used
during a data science interview:
◆ I don't really have any weaknesses.
◆ I have no queries currently.
◆ I believe
◆ I think I am incapable of handling this task
◆ I shall try

Tech Stack or Tools Used for
Data Analysis

Structured Query Language (SQL)

Data science is one of the fastest-rising fields, with plenty of career prospects. SQL is one of the most crucial skills to master to become a great data scientist.

SQL, also known as Structured Query Language, is one of the simplest computer languages used in data science. Data scientists commonly use relational databases like MySQL, SQL Server and Oracle, all of which support SQL. According to PayScale, a data scientist with SQL expertise makes around INR 7,22,421 annually.

Data about businesses are kept in databases, and with the aid of a
database management system, these data are processed and managed.

SQL is the programming language that is most frequently used for


processing and interacting with databases.

SQL is supported by numerous database systems, including MySQL, SQL Server and Oracle. However, different database systems implement SQL standards in different ways.

SQL is used for several operations on data that is kept in databases,


including record creation, deletion and updating.

Current big data platforms are built around relational data and employ SQL as a primary API.

SQL is used by big data platforms like Hadoop and Spark to manage
relational database systems and handle structured data.

1. Python

Python is a popular choice among data scientists. This programming language is widely used in data science and is a requirement for practically all job postings involving data modelling and analytics. Here's why Python has become the data science industry standard.

Python is beginner-friendly
Although not always programmers, data scientists should be tech-savvy. In
the middle of their careers, people from finance, marketing, HR and
academics frequently transition to data science and pick up new skills. In
data science, tools that are simpler to master are more likely to succeed.

For those without any prior IT experience, Python is the ideal choice due to its
simplicity and ease of use. For experts with varied backgrounds, it is quite
accessible. It might just take a few weeks to learn how to use Python to
process data and create straightforward models.

Python offers a variety of tools for handling statistics and math


The built-in mathematical operators, addition (+), subtraction (-), division (/)
and multiplication (*) can be used to execute the fundamental arithmetic
operations. You can use the math module for more complex mathematical
operations, including exponential, logarithmic, trigonometric and power
functions. With just a few lines of code, this module enables the execution of
intricate mathematical procedures. For instance, the Python math module
makes it simple to apply trigonometric and hyperbolic functions, compute
combinations and permutations using factorials and simulate periodic
functions.

Numerous libraries in the Python programming language, including statistics, NumPy, SciPy and Pandas, offer direct access to a wide range of statistical techniques. Descriptive statistics such as mean, median, mode, weighted mean, variance, correlation, outliers, etc., are all readily available.
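
For example, most of the descriptive statistics mentioned above are one-liners; the sample numbers here are arbitrary.

import statistics
import numpy as np

data = [12, 15, 11, 19, 15, 22, 15, 18]

print(statistics.mean(data))    # 15.875
print(statistics.median(data))  # 15.0
print(statistics.mode(data))    # 15
print(statistics.stdev(data))   # sample standard deviation

# NumPy equivalents, plus a correlation between two variables.
x = np.array(data)
y = 2 * x + np.array([1, -1, 0, 2, 1, -2, 0, 1])  # y is nearly linear in x
print(np.corrcoef(x, y)[0, 1])  # close to 1.0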

Python excels at data visualisation
Data visualisation produces a lot of data insights. Python's default library for data visualisation is called matplotlib, and it offers a wide variety of plot types with flexible customisation. However, using this library
to create anything sophisticated can take some time. Fortunately, additional,
more user-friendly data visualisation tools are based on matplotlib. Check out
the seaborn, Plotly and Bokeh libraries if you want to construct complex plots
using Python.
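
A minimal matplotlib example (with invented data) shows how little code a basic plot takes; seaborn, Plotly and Bokeh build on or complement this with higher-level interfaces.

import numpy as np
import matplotlib.pyplot as plt

# Toy data: 30 days of site visitors with an upward trend plus noise.
days = np.arange(1, 31)
visitors = 100 + 5 * days + np.random.default_rng(0).normal(0, 10, size=30)

plt.plot(days, visitors, marker="o", linewidth=1)
plt.xlabel("Day of month")
plt.ylabel("Visitors")
plt.title("A basic matplotlib line plot")
plt.tight_layout()
plt.show()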

2. R
Data science has become vital to studying data and drawing conclusions from it. Industries transform raw data into finished data products. The raw data must be processed using a number of crucial technologies in
order to do this. One of the programming languages that offers a powerful
environment for information analysis, processing, transformation and
visualisation is R. R, a GNU project, is a language and environment for
statistical computing and graphics.

For many statisticians who wish to get involved in creating statistical models
to address challenging problems, it is their first choice. There are several
packages in R that are useful for all types of sciences, including astronomy,
biology, etc. R was initially used for academic purposes, but it is currently also
used in industries.

R is a sophisticated language used to carry out complex statistical modelling. Additionally, R supports operations on vectors, matrices and arrays. R is renowned for its graphical libraries, which enable users to construct beautiful graphs and render them in a form users can readily understand.

R Shiny, which is used to embed visualisations in web pages and offers a high
degree of interaction to the users, also enables users to create web
applications.

Furthermore, a key component of data science is data extraction. R offers the ability to integrate your R code with database management systems in order to achieve this. Additionally, R gives you access to a number of complex data
analytics capabilities, including the creation of prediction models and
machine learning algorithms. R also provides several packages for image
processing.

3. Tableau

Tableau is a leading data visualisation tool for data analysis and business
intelligence.

It makes exploratory data analysis simple.


Many practitioners overlook the use of EDA in data science, yet it is one of the key elements that determines whether your model succeeds or fails. The ability to visualise data immediately, without writing any code, is really useful before creating the model.

Design of a captivating presentation


You can create visually appealing charts and visuals with code written in Python and other languages. However, Tableau enables a data scientist to make flexible and lovely graphics without writing any code.

Seamlessly integrates with SQL


You can quickly copy and paste SQL queries to accomplish anything that is possible in SQL. As a result, Tableau training is in demand nowadays.

Behavioural Interview
You may not agree that behavioural questions are important. You might
think that the behavioural interview is all fluff and that you can simply wing
it. But companies take this very seriously. Amazon, for example, interviews
every single candidate on the company leadership principles like "customer
obsession" and "invent and simplify" in their bar raiser interviews. Uber also
includes a bar-raiser round that focuses on behavioural questions about culture fit, previous work experience and past projects. If you want to work
there, you must take behavioural interviews seriously. So here are some
pointers to keep in mind for your behavioural interview.

The behavioural interview is an integral part of an interview and consists of


questions that help the interviewer assess candidates based on actual
experiences in prior jobs or situations. Even the "friendly chat" time at the
start of the interview can essentially be a behavioural interview. So, while
you might not have an explicit calendar invite for a "Behavioural Interview",
you can be very well sure that you are being constantly evaluated. That
casual, icebreaker of a question, "So, tell me about yourself..." that's an
interview question! For every job, and at practically every round, there will
be a behavioural component, whether it is explicit or not.

Behavioural interview questions can happen with a recruiter before you get to technical rounds, in which case you might not even reach the technical interviews if you don't do well here. You are also being
evaluated during your technical interviews, where the first 5-10 minutes
are usually carved out for a casual chat about your past projects and your
interest in the company. It could also occur during a casual lunch to
understand how you behave outside of the interview setting.

At the end of the on-site interview, they know you can do the work, but are
you someone they want to personally work with? You'll meet with your
future boss, and maybe even their boss, where they'll both try to sell you on
the company but also see if you'd be a good culture fit. The reality is you are
constantly being assessed! That's why, based on frequency alone, preparing
for and practising answers to these questions is well worth the effort.

Popular Interview Questions for
a Data Analyst’s Position
Here are some popular interview questions asked in some well-known
companies.
*Possible answers have been provided for the first few questions for the reader’s benefit

On SQL

1. Facebook: What is a database view? What are some advantages that views have over tables?
Possible Answer:

A database view is the result of a particular query within a set of tables. Unlike a normal table, a view does not have a physical schema. Instead, a view is computed dynamically whenever it is requested. If the underlying tables a view references are changed, the view itself will change accordingly. Views have several advantages over tables:

1. Views can simplify workflows by aggregating multiple tables, thus abstracting the complexity of underlying data or operations.

2. Since views can represent only a subset of the data, they provide limited
exposure of the table's underlying data and hence increase data security.

3. Since views do not store actual data, there is significantly less memory
overhead.
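
A small SQLite sketch of the "computed dynamically" point; the orders table and its contents are invented for the demonstration.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "north", 120.0), (2, "south", 80.0), (3, "north", 50.0)])

# A view stores the query, not the data.
conn.execute("""CREATE VIEW north_orders AS
                SELECT id, amount FROM orders WHERE region = 'north'""")
print(conn.execute("SELECT * FROM north_orders").fetchall())  # [(1, 120.0), (3, 50.0)]

# Change the underlying table; the view reflects it immediately.
conn.execute("INSERT INTO orders VALUES (4, 'north', 200.0)")
print(conn.execute("SELECT * FROM north_orders").fetchall())  # row 4 now appears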

2. Expedia: Say you have a database system where most of the queries
made were UPDATES/INSERTS/DELETES. How would this affect your
decision to create indices? What if the queries made were mostly
SELECTS and JOINS instead?
Possible Answer:

SQL statements that modify the database, like UPDATE, INSERT and DELETE, need to change not only the rows of the table but also the underlying indexes. Therefore, the performance of those statements depends on the number of indexes that need to be updated: the larger the number of indexes, the longer those statements take to execute. On the flip side, indexing can dramatically speed up row retrieval, since lookups can use an index instead of a full table scan and no indexes need to be modified. This matters most for read-heavy statements like SELECTs and JOINs.

Therefore, for databases used in online transaction processing (OLTP) workloads, where database updates and inserts are common, indexes generally lead to slower performance. In situations where databases are used for online analytical processing (OLAP), where database modifications are infrequent but searching and joining the data is common, indexes generally lead to faster performance.
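
You can watch an index change a query plan in SQLite: EXPLAIN QUERY PLAN reports a full table scan before the index exists and an index search afterwards. The events table here is a made-up example.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i % 1000, "click") for i in range(10_000)])

query = "SELECT COUNT(*) FROM events WHERE user_id = 42"
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())  # SCAN events

conn.execute("CREATE INDEX idx_events_user ON events(user_id)")
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())  # SEARCH ... USING INDEX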

3. Microsoft: What is a primary key? What characteristics does a good primary key have?
Possible Answer:

A primary key uniquely identifies an entity. It can consist of multiple columns (known as a composite key) and cannot be NULL.

Stability: a primary key should not change over time.

Uniqueness: having duplicate (non-unique) values for the primary key defeats the purpose of uniquely identifying each row.

Irreducibility: no subset of columns in a primary key is itself a primary key. Said another way, removing any column from a good primary key means that the key's uniqueness property would be violated.
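
A short SQLite sketch of a composite primary key; the enrolments table is hypothetical. The pair of columns enforces a uniqueness that neither column could enforce alone.

import sqlite3

conn = sqlite3.connect(":memory:")
# Neither column alone identifies a row; together they do (irreducibility).
conn.execute("""CREATE TABLE enrolments (
    student_id INTEGER,
    course_id  INTEGER,
    grade      TEXT,
    PRIMARY KEY (student_id, course_id)
)""")
conn.execute("INSERT INTO enrolments VALUES (1, 101, 'A')")
try:
    conn.execute("INSERT INTO enrolments VALUES (1, 101, 'B')")  # duplicate key
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # UNIQUE constraint failed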

4. Amazon: Describe some advantages and disadvantages of relational databases vs NoSQL databases.

5. Capital One: Say you want to set up a MapReduce job to implement a shuffle operator whose input is a dataset and whose output is a randomly ordered version of that same dataset. At a high level, describe the steps in the shuffle operator's algorithm.

6. Amazon: Name one major similarity and difference between a WHERE
clause and a HAVING clause in SQL.

7. KPMG: Describe what a foreign key is and how it relates to a primary key.

8. Microsoft: Describe what a clustered index and a non-clustered index are. Compare and contrast the two.

On R
9. Explain data import in the R language
Answer:

R Commander is used to import data in the R language. To start the R Commander GUI, the user loads the package in the console, for example with library(Rcmdr).

10. How are missing and impossible values represented in the R language?
Answer:

NaN (Not a Number) is used to represent impossible values, whereas NA (Not Available) is used to represent missing values. The best way to answer this question is to mention that deleting missing values is not a good idea, because the probable cause could be a problem with data collection, programming or the query. It is better to find the root cause of the missing values and then take the necessary steps to handle them.

11. Two vectors X and Y are defined as follows – X <- c(3, 2, 4) and Y <- c(1, 2). What will be the output of vector Z that is defined as Z <- X*Y?
Answer: c(3, 4, 4). R recycles the shorter vector, so Y is treated as c(1, 2, 1), and it issues a warning because the longer object's length is not a multiple of the shorter object's length.

12. The dplyr package is used to speed up data frame management code. Which package can be integrated with dplyr for large, fast tables?

13. What will be the output of log (-5.8) when executed on R console?

14. What is the difference between data frame and a matrix in R?

On Python
15. What is the use of the split function in Python?
Answer:

Whenever there is a need to break a bigger string or a line into several smaller strings, you use the split() function in Python. If no separator is specified, split() still works, treating runs of whitespace as the separator.

The syntax of split() in Python is as follows:

string.split(separator, maxsplit)

where separator is the delimiter on which the given string or line is split, and maxsplit is the maximum number of splits. The default value of maxsplit is -1, meaning no limit: the string is split at every occurrence of the separator.
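
For example:

line = "python  is   easy to learn"

print(line.split())         # default whitespace: ['python', 'is', 'easy', 'to', 'learn']
print(line.split(" ", 2))   # at most 2 splits: ['python', '', 'is   easy to learn']
print("a,b,c".split(","))   # explicit separator: ['a', 'b', 'c']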

16. What is the difference between a list and a tuple?


Answer:
In short, the main difference between tuples vs lists in Python is mutability.
◆ A list is mutable.
◆ A tuple is immutable.
◆ In other words, a Python list can be changed, but a tuple cannot.
◆ This means a tuple performs slightly better than a list.
◆ But at the same time, a tuple has fewer use cases because it cannot be modified.
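
A quick demonstration:

nums_list = [1, 2, 3]
nums_tuple = (1, 2, 3)

nums_list[0] = 99        # fine: lists are mutable
try:
    nums_tuple[0] = 99   # raises TypeError: tuples are immutable
except TypeError as e:
    print(e)

# Immutability also makes tuples hashable, so they can serve as dict keys.
distances = {("Delhi", "Mumbai"): 1400}
print(distances[("Delhi", "Mumbai")])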

17. How would you convert a list to an array?
Answer:
To convert a list to an array in Python, use the np.array() method. np.array() is a NumPy library function that takes a list as an argument and returns an array containing all the list elements.

Code:
import numpy as np

# Convert a Python list to a NumPy array.
elon_list = [11, 21, 19, 18, 29]
elon_array = np.array(elon_list)

print(elon_array)        # [11 21 19 18 29]
print(type(elon_array))  # <class 'numpy.ndarray'>

18. What are the advantages of NumPy arrays over Python lists?

19. How do you reverse a string in Python?

20. How do you select both rows and columns from the data frame?

On Product Sense
21. Facebook: Imagine the social graphs for both Facebook and Twitter.
How do they differ? What metrics would you use to measure how these
social graphs differ?
Answer:

A bad answer glosses over the nuances of the social graph and jumps straight
into defining metrics. For clarity, not just for you, dear reader, but also for the
hypothetical interviewer asking this question, we'll structure our answer into
three steps.

Step 1: Explaining User Behavior on Each Platform

Before explaining how the social graphs of Facebook and Twitter differ, it's
crucial to first consider how each platform's average user interacts with their
respective platform. Facebook is mostly about friendships, and so two users
on the same social graph are mutual friends. Twitter, on the other hand, is
more focused on followership, where one user typically follows another (who
is usually an influential figure) without getting a follow back. Thus, Twitter
likely has a small number of people with a very large followership, whereas on
Facebook, that pattern appears less often.

Step 2: Describing the Social Graph and Its Differences Between Facebook
and Twitter

Modelled as a graph, let's say each user is represented as a node, and the
edge linking two nodes denotes a relationship (typically, friendship on
Facebook and followership on Twitter) between the two users whose nodes the
edge connects. Most nodes on Twitter would have low degrees, but a small
number of nodes (those of influential people) would have very high degrees,
resulting in a "hub-and-spoke" social graph for that platform.

Step 3: Defining Metrics to Measure the Social Graph

One way to quantify the difference between the two platforms' social graphs
is by looking at the distributions of friendships/followership represented by
the social graphs of each platform's typical users. Because a typical node's
degrees—that is, the number of connections it has to other nodes—should
capture the difference in these platforms' social graphs, one concrete metric
would thus be the average degree among all nodes on each platform.
Alternatively, to obtain an even more detailed understanding of the
differences between the two social networks, we could construct
box-and-whisker plots of the platforms' degrees among all their respective
nodes.

Yet another way of looking at the two graphs would be to check the
distribution of degrees across all nodes in each platform's network. Likely, the
Twitter distribution would show a greater amount of right skewness than
that of Facebook. Metrics that quantify a distribution's skewness or kurtosis
could thus be used to describe the difference between the two platforms'
degree distributions and, hence, social graphs.

22. Uber: Why does surge pricing exist? What metrics would you track to
ensure that surge pricing was working effectively?
Answer:

For any metrics definition question, it's important to first explain the business goal of the product or feature, then identify the related stakeholders, before ultimately landing on good metrics to measure the success of that feature.

Step 1: Explain Uber's Motivation for Surge Pricing

You don't have to be an econ major to realise that surge pricing is about
fixing imbalances between supply and demand. In the case of Uber, such an
imbalance could result from either a lack of drivers or an excess number of
potential riders. Therefore, surge pricing's goal would be to increase supply
by enticing more drivers to use the app through increased pay and reduce
demand by raising prices for riders.

Step 2: Consider Stakeholders Related to Surge Pricing

A nuanced answer would consider the various stakeholders involved in surge pricing beyond just the immediately obvious drivers and riders. For example,
a good candidate would mention associated business functions within Uber
that could be affected by the surge pricing algorithm not working effectively.

Step 3: Define Metrics & Counter Metrics for Surge Pricing

Surge-specific metrics are the duration of the surge, the surge pricing
multiplier, and the number of riders and drivers in the affected area. We
should also track the following metrics during surge periods: the number of
rides taken, number of rides canceled, total revenue made for both Uber and
Uber's drivers, total profit made by both Uber and Uber's drivers and the
average ride wait time. These are all standard metrics but critical to monitor
to ensure the business is healthy during surge periods.

In addition, topline metrics like user's lifetime value (LTV), driver retention,
rides taken, active daily riders, and drivers should also be tracked so that we
can be sure surge pricing isn't having adverse impacts on the business
overall.

As with any good metrics definition question, a discussion on counter metrics is important. Even if surge pricing is bringing in extra money, one counter metric to implement would be the net promoter score (NPS). Surge pricing can annoy users, for whom frequently fluctuating sky-high prices can be a source of frustration. And then there's the potential for mistakes, or for users in a less-than-sober state accidentally making a purchase. It can even be a PR risk: like clockwork, every New Year's Day, there's a news story about someone getting drunk and taking an $800 Uber by accident. Between bad PR, frustrated users and potentially increased support tickets, a metric like NPS that ensures this remains a quality program is key.

23. Airbnb: What factors might make A/B testing metrics on the Airbnb
platform difficult?

24. Google: We currently pay the Mozilla foundation nine figures per year
for Google to be the default search engine on Firefox. The deal is being
renegotiated, and Mozilla is now asking for twice the money. Should we
take the deal? How would you estimate the upper bound on what Google
should be willing to pay?

25. LinkedIn: Assume you were working on LinkedIn's Feed. What metrics
would you use to track engagement? What product ideas do you have to
improve these engagement metrics?

26. Lyft: Your team is trying to figure out whether a new rider app with
extra UI features would increase the number of rides taken. For an A/B
test, how would you split users and ensure that your tests have balanced
groups?

27. Amazon: If you were to plot the average revenue per seller on the
Amazon marketplace, what would the shape of the distribution look like?

28. Facebook: Besides posts that Facebook is legally obligated to remove, what other types of posts should Facebook take down? What features would you use to identify these posts? What are the trade-offs that need to be considered when removing these posts?

Reference Links: W3Schools Online Web Tutorials, Kaggle: Your Machine Learning
and Data Science Community, Tableau Practice Test - Practice Test Geeks

Dos and Don’ts
During an Interview
DOs
Practice Business Questions:

A business case question will typically be asked during an interview; it is one of the most common interview questions, so make sure you put a lot of work into that section. To answer a business case question, an individual must possess data knowledge and critical thinking skills. It gives the recruiting manager information about your suitability for the position. You will benefit greatly from practising these questions with someone with experience in data science interviews. It boosts your self-assurance during an actual interview, giving you the upper hand over competing applicants.

Know Every Project Inside-out:

Never pass up the chance to discuss papers or projects pertinent to the position you're applying for. The employer wants to know how you handled the assignment, whether you finished it, how it turned out and what you learned from it. Say you mention working on an AI application, for instance; the interviewer might ask you to expand on that. This is the ideal chance to demonstrate your abilities and experience, dazzle the interviewer and dominate the data science interview.

Constantly Practice ML, Modeling and Statistics Questions:

You will need to demonstrate a solid background in machine learning, modelling and statistics for a significant portion of all data science problems. To expand and deepen your knowledge and expertise in each of these topics, you must put in the extra effort and time. Working through tutorials or quiz problems is an excellent way to start. This gives you a solid understanding of these areas and an overarching perspective on how they connect to data science. More importantly, review your mathematical concepts and formulas. This increases your advantage over candidates who are less familiar with this material.

DONTs

Pretend to Know the Answer:

The fear of being rejected is one possible explanation for pretending to know the answer. However, the contrary is preferable: either attempt the answer based on what you know or state that you have no idea. Although
it may come as a shock to some, it is better to admit ignorance than lie, fake,
or evade the question. It's okay to not know everything, and some
interviewers might only be checking your knowledge to see what you need to
be taught in the future rather than testing you. It is also fantastic to know that
you were completely honest and direct in your interview so that when the
time comes, you really do land the role.

Go on Long Tangents:

The interviewer will also evaluate your level of expertise, understanding, and
correctness. How do they achieve this? They pose a question to you, and you
respond in a way that demonstrates your understanding. Answering the
question directly will demonstrate that you have done thorough research
and that you are informed about the subject you are discussing. However, if
you start going into detail about things that really don't need to be explained,
it will appear that you don't know much about the subject, and you'll also
annoy the interviewer, which is not a good sign. So, always try to provide a
direct response. Crisp and simple explanations are preferred, and detailed
responses should only be given in response to specific inquiries.

In conclusion:

◆ Landing a job takes time and requires patience.
◆ A crisp but well-defined resume is a must.
◆ The tech stack for data science is mainly SQL, R and Python.
◆ Business acumen accumulates over time.
◆ Keep practising coding rounds; they are the most critical.
◆ Get into the core of statistics rather than just knowing the code to implement it.
◆ The behavioural round can be a game changer, even alongside good tech skills.

Copyright © 2022 Emeritus | All rights reserved.
No part of this book may be reproduced in any form or by any electronic or mechanical means, including information storage and retrieval systems, without permission in writing from the author.

emeritus.org/in