Hypothesis Testing - Interview Questions in Business Analytics

© Bhasker Gupta 2016

Bhasker Gupta, Interview Questions in Business Analytics, 10.1007/978-1-4842-0599-0_4

4. Hypothesis Testing
Bhasker Gupta (1)

(1) Bangalore, Karnataka, India

Hypothesis testing forms the next building block in learning statistical
techniques. Now that you are familiar with probability distributions, the
next step is to check whether a data point or a sample is consistent with
one of these distributions. Building a hypothesis is the first step in
conducting an experiment or designing a survey. Don't treat hypothesis
testing as just another statistical technique; try to understand the core
principles behind it.

Q: What is a hypothesis?
It is a supposition or assertion made about the world. A hypothesis is the
starting point of an experiment, in which an assertion is made about some
available data, and further investigation will be conducted to test if that
assertion is correct or not.

Q: What is hypothesis testing?
It is the process in which statistical tests are applied to data to decide
whether the data are consistent with a hypothesis. Based on hypothesis
testing, we choose to retain or reject a hypothesis.

An example: Research is being conducted on the effect of TV viewing on
obesity in children. A hypothesis for this would be that children who watch
more than a certain number of hours of television are more likely to be
obese. Data is then collected, and hypothesis testing is done to determine
whether the data support the hypothesis.

Q: Why is hypothesis testing necessary?


When an event occurs, it can be the result of a trend, or it can occur by
chance. To check whether the event reflects a real effect or merely chance,
hypothesis testing must be applied. In the preceding example of TV viewing
and obesity, the hypothesis may be incorrect, and the data may show that
any apparent link between watching television and obesity in some children
is merely due to chance.

Q: What are the criteria to consider when developing a good hypothesis?
A hypothesis is the initial part of a research study. If the hypothesis is
poorly formed, the research study is also likely to be flawed; therefore, it
should be carefully thought through and should meet the following criteria:

The hypothesis should be logically consistent and make sense with regard to
literature and language.

The hypothesis should be testable. If a hypothesis cannot be tested, it has
no use.

It should be simple and clear, to avoid possible confusion.

Q: How is hypothesis testing performed?


There are several statistical tests available for hypothesis testing. The first
step is to formulate a probability model based on the hypothesis. The
probability model is also decided on the basis of the data available and the
informed judgment of the researcher. Then, depending on the answers
required, the appropriate statistical tests are selected.

Q: What are the various steps of hypothesis testing?
Hypothesis testing is conducted in four steps.

1. Identify the hypothesis to be tested, for example, a claim about
obesity in teenagers.

2. Select the criterion upon which the decision as to whether the
hypothesis is correct or not is to be taken. For example, in the
preceding problem, the criterion could be the body mass index (BMI)
of the teenagers.

3. Determine from a random sample the statistic we are interested in.
For example, a random sample of 1,000 teenagers is selected from a
population, and their mean BMI is calculated.

4. Compare the result with the expected result, to check the validity.
The discrepancy between the expected and observed result helps to decide
whether the claim should be rejected or retained.
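
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BMI values, the hypothesized mean of 22.0, and the function name z_test_mean are all hypothetical and do not come from the text. A large-sample z-test on the mean stands in for "the appropriate statistical test."

```python
import math
from statistics import NormalDist, mean, stdev


def z_test_mean(sample, expected_mean, alpha=0.05):
    """Two-sided large-sample z-test of the sample mean against expected_mean."""
    n = len(sample)
    sample_mean = mean(sample)                        # step 3: the statistic
    standard_error = stdev(sample) / math.sqrt(n)
    z = (sample_mean - expected_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # step 4: compare
    return sample_mean, z, p_value, p_value < alpha


# Step 1: hypothesis (hypothetical): the mean BMI of teenagers is 22.0.
# Step 2: criterion: BMI, judged at a 5% level of significance.
# Made-up data: 8 illustrative BMI values repeated to mimic 1,000 teenagers.
bmi_sample = [21.0, 23.5, 22.0, 24.5, 20.5, 25.0, 23.0, 22.5] * 125

sample_mean, z, p_value, reject_null = z_test_mean(bmi_sample, expected_mean=22.0)
print(f"mean BMI = {sample_mean:.2f}, z = {z:.2f}, reject H0: {reject_null}")
```

With real data, the sample would be drawn at random from the population rather than constructed by hand.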

Q: What is the role of sample size in analytics?


Sample size for a statistical test is very important. Standard error is
inversely proportional to the square root of the sample size, i.e., the
larger the sample size, the smaller the standard error and the greater the
reliability. However, a larger sample size means that a very small
difference can become statistically significant without being clinically or
medically significant. The two main aspects of any study are
generalizability (external validity) and validity (internal validity).
Large samples help with generalizability but do not by themselves ensure
internal validity.
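
A small numeric sketch may make the point about large samples concrete. The difference of 0.1 units, the standard deviation of 5.0, and the function name below are assumptions chosen purely for illustration; the same tiny difference is tested at two sample sizes with a two-sided z-test.

```python
import math
from statistics import NormalDist


def p_value_for_difference(difference, sd, n):
    """Two-sided z-test p-value for an observed mean difference."""
    standard_error = sd / math.sqrt(n)
    z = difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same tiny (hypothetical) difference, very different sample sizes.
p_small_sample = p_value_for_difference(difference=0.1, sd=5.0, n=100)
p_large_sample = p_value_for_difference(difference=0.1, sd=5.0, n=1_000_000)

print(f"n = 100:       p = {p_small_sample:.3f}")
print(f"n = 1,000,000: p = {p_large_sample:.3g}")
```

The difference is not significant at n = 100 but is overwhelmingly significant at n = 1,000,000, even though its practical size has not changed.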


Q: What is standard error?


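The standard error is the standard deviation of the sampling distribution
of a statistic (for the sample mean, it is denoted σ_x̄ and equals σ/√n,
where σ is the population standard deviation and n is the sample size). It
reflects the variation caused by sampling. It is inversely proportional to
the square root of the sample size.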
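
As a quick check of the relationship between standard error and sample size, the following sketch computes the standard error of the mean for two sample sizes. The standard deviation of 10 and the helper name are illustrative assumptions.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


se_n100 = standard_error_of_mean(sd=10.0, n=100)
se_n400 = standard_error_of_mean(sd=10.0, n=400)
print(se_n100, se_n400)
```

Quadrupling the sample size halves the standard error, which is what the square-root relationship predicts.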

Q: What are null and alternate hypotheses?


A null hypothesis is a statement about a population parameter that is
assumed to be true. It is the starting point of any research study. Based on
statistical tests, a decision is taken as to whether to reject that
assumption.

An alternative hypothesis is the statement that contradicts the null
hypothesis; it states what we conclude if the null hypothesis is rejected.

We test the validity of a null hypothesis and not of an alternative
hypothesis. An alternative hypothesis is accepted when the null hypothesis
is rejected.

An example would be a study conducted to determine the mean height of a
class of students. The researcher believes that the mean height is 170
centimeters (cm). In this case,

H0: μ_height = 170 cm

HA: μ_height ≠ 170 cm

Q: Why are null and alternate hypotheses necessary?
Following are the reasons null and alternate hypotheses are necessary:

The two hypotheses provide a rough explanation of the occurrences.

They provide a statement to the researcher that acts as the base in a
research study and is directly tested.

They provide the outline for reporting the interpretations of the study.

They behave as a working instrument of the theory.

They help keep the test objective, detached from the investigator's
individual standards and choices.

Q: How are the results of null/alternate hypotheses interpreted?
Statistical tests are conducted to check the validity of null hypotheses.
When a null hypothesis is rejected, the alternate hypothesis is accepted.
Consider, for example, a courtroom scenario. When a defendant is brought to
trial, the null hypothesis is that he or she is innocent. The jury considers
the evidence to decide whether or not the defendant is guilty.

In the preceding courtroom example, if there is insufficient evidence, the
jury will free the defendant rather than convicting him or her. Similarly, in
statistics, a null hypothesis is retained if the research fails to show
otherwise; this does not prove the null hypothesis, it only means the
evidence was insufficient to endorse the alternative hypothesis.

Q: What is meant by “level of significance”?


Level of significance is the criterion by which a decision is reached. In the
courtroom example, the level of significance can be stated as the minimum
level of evidence required by the jury to reach a verdict regarding the guilt
or innocence of the defendant. Similarly, in statistics, it is the criterion by
which a null hypothesis is rejected. Level of significance is denoted by α.

Where the rejection region lies (in one tail or in both) is determined by the
alternative hypothesis. If the null hypothesis is true, the sample mean is
equal to the population mean on average. If α = 5%, it means that 95% of all
sample means lie within about 1.96 standard errors of μ. Let us consider an
example in which the null hypothesis is that in the United States children
watch three hours of TV. The level of significance is set at 5% (that is, a
95% confidence level). The other 5% of sample means lie outside that range.
The alternative hypothesis states that the children do not watch three hours
of TV (either more or less).

If a sample has a mean of four hours, we evaluate the outcome by determining
its likelihood under the null hypothesis. We can see how many standard errors
this result lies from the hypothesized mean. If the significance level is 5%
and the sample mean lies more than about 1.96 standard errors from the
hypothesized mean, the null hypothesis is rejected.
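
The TV-viewing example can be worked through numerically. The text gives only the hypothesized mean (three hours) and the sample mean (four hours); the standard deviation of 1.5 hours and the sample size of 36 below are assumptions added purely so that the arithmetic can be carried out.

```python
import math
from statistics import NormalDist

null_mean = 3.0      # H0: children watch three hours of TV (from the text)
sample_mean = 4.0    # observed sample mean (from the text)
assumed_sd = 1.5     # hypothetical standard deviation, in hours
assumed_n = 36       # hypothetical sample size
alpha = 0.05

standard_error = assumed_sd / math.sqrt(assumed_n)
z = (sample_mean - null_mean) / standard_error

# Two-sided test: reject when |z| exceeds the critical value (about 1.96).
critical_z = NormalDist().inv_cdf(1 - alpha / 2)
reject_null = abs(z) > critical_z

print(f"z = {z:.2f}, critical value = {critical_z:.2f}, reject H0: {reject_null}")
```

Under these assumed values the sample mean lies four standard errors from the hypothesized mean, well beyond the critical value, so the null hypothesis would be rejected.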

Q: What is a test statistic?
A test statistic is a value computed from the sample by a mathematical
formula. It is used to determine the likelihood of finding the sample
outcome if the null hypothesis is true, in order to make a decision
regarding the null hypothesis. That likelihood is the p-value.

If the level of significance is set at 5% and the p-value obtained from the
test statistic is less than 0.05, the null hypothesis should be rejected.
Therefore, the researcher can take either of the following two decisions:

Reject the null hypothesis, when the p-value is less than α.

Retain the null hypothesis, when the p-value is greater than α.
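
The decision rule can be written as a tiny function. This is a sketch that assumes a two-sided z-test; the function names and the example z values are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability of a z statistic at least this extreme if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Reject H0 when the p-value falls below the level of significance."""
    return "reject H0" if p_value < alpha else "retain H0"


decision_far = decide(two_sided_p_value(2.5))   # far from the null mean
decision_near = decide(two_sided_p_value(1.0))  # close to the null mean
print(decision_far, "|", decision_near)
```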

Q: What are the different types of errors in hypothesis testing?
When we perform hypothesis testing, there can be errors such as falsely
accepting or rejecting the null hypothesis. Two types of errors are
distinguished: type I errors and type II errors. Refer to Figure 4-1
for more details.

Figure 4-1. Type I & Type II errors

A type I error occurs when a null hypothesis is incorrectly rejected and an
alternate hypothesis is accepted. The type I error rate or significance level
is denoted by α. It is generally set at 5%. In the courtroom example, if the
judge convicts an innocent defendant, he/she is committing a type I error.

A type II error, or error of the second kind, occurs when a null hypothesis
is incorrectly accepted when the alternate hypothesis is true. If a type I
error is a case of a false positive, a type II error is a case of a false
negative. It is denoted by β and is related to the power of a test (power =
1 − β). In the example of the courtroom trial, if a judge lets a guilty
defendant go free, he or she is committing a type II error.
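
Both error rates can be estimated by simulation. The sketch below is an illustration under assumed values (a known standard deviation of 1.0, samples of 25, a true mean of 3.5 for the alternative case, and a fixed random seed); none of these numbers come from the text.

```python
import math
import random
from statistics import NormalDist, mean


def rejects_null(sample, null_mean, sigma, alpha=0.05):
    """Two-sided z-test with known sigma; True when H0 is rejected."""
    standard_error = sigma / math.sqrt(len(sample))
    z = (mean(sample) - null_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(true_mean, null_mean, sigma, n, trials, rng):
    """Fraction of simulated samples for which H0 is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if rejects_null(sample, null_mean, sigma):
            rejections += 1
    return rejections / trials


rng = random.Random(0)  # fixed seed so the run is repeatable

# H0 is true: every rejection is a type I error (rate should be near alpha).
type_1_rate = rejection_rate(3.0, 3.0, sigma=1.0, n=25, trials=2000, rng=rng)

# H0 is false (true mean 3.5): every non-rejection is a type II error.
power = rejection_rate(3.5, 3.0, sigma=1.0, n=25, trials=2000, rng=rng)
type_2_rate = 1 - power

print(f"type I rate ~ {type_1_rate:.3f}, type II rate ~ {type_2_rate:.3f}")
```

The type I rate stays close to the chosen α, while the type II rate depends on how far the true mean is from the hypothesized one and on the sample size.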

Q: What is meant by the statement “A result was said to be statistically significant at the 5% level.”?
If the null hypothesis were true, a result at least this extreme would be
expected less than 5% of the time. In other words, we reject the null
hypothesis at the 5% level of significance.

Q: What are parametric and non-parametric tests?


In a parametric statistical test, assumptions such as that a population is
normally distributed or has an equal-interval scale are made about the
parameters (defining properties) of the population distribution. A non-
parametric test is one that makes no such assumptions.

Q: What differentiates a paired vs. an unpaired test?
When we are comparing two groups, we have to decide whether to perform
a paired test. When three or more groups are compared, the corresponding
test is called a repeated-measures test.

When the individual values are not paired, matched, or otherwise related to
one another between groups, we use an unpaired test. When before-and-after
effects in a study are compared, a paired or repeated-measures test is used.
Paired or repeated-measures tests are also used for measurements on
matched/paired subjects, and for repeated lab experiments run at different
times, each with its own control.

Paired tests are selected for closely correlated groups. The pairing can't be
based on the data being analyzed; it must be established before the data are
collected, when the subjects are matched or paired.

Q: What is a chi-square test?


A chi-square (χ²) test is used to examine if two distributions of categorical
variables are significantly different from each other. Categorical variables
are the variables in which the value is in a category and not continuous,
such as yes and no or high, low, and medium or red, green, yellow, and
blue. Variables such as age and grade-point average (GPA) are numerical,
meaning they can be continuous or discrete. The hypotheses for a χ² test
follow:

H0: There is no association between the variables.

HA: There is association between them.

The type of association is not specified by the alternative hypothesis,
though, so interpretation of the test requires closer attention to the data.
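
The χ² statistic itself is simple to compute by hand. The following sketch uses a made-up 2×2 table (rows: high vs. low TV viewing; columns: obese vs. not obese); the counts and the function name are assumptions for illustration. The critical value 3.841 is the standard table value for 1 degree of freedom at the 5% level.

```python
from statistics import NormalDist


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were not associated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts:   obese  not obese
observed_counts = [[30, 20],   # high TV viewing
                   [20, 30]]   # low TV viewing

chi2 = chi_square_statistic(observed_counts)
critical_value = 3.841  # chi-square table value, df = 1, alpha = 0.05
reject_null = chi2 > critical_value

# For df = 1 only, the p-value follows from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))

print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

A rejection here says only that some association exists; the table still has to be inspected to see which direction it runs.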

Q: What is a t-test?
A t-test is a popular statistical test used to draw inferences about a single
mean or about the difference between two means, for example, to check if
two groups' means are statistically different from each other. It is
typically used where n < 30 and the population standard deviation is
unknown.

Figure 4-2 shows idealized distributions. As represented in the figure, the
means of the control and treatment groups will most likely be located at
different positions. The t-test checks if the means are statistically
different for the two groups.

Figure 4-2. Idealized distributions

The t-test judges the difference between the means relative to the spread or
variability of the scores of the two groups.

Q: What is a one-sample t-test?


A one-sample t-test compares the mean of a sample to a given value,
usually the population mean or a standard value. Basically, it compares the
observed average (sample average) with the expected average (population
average or standard value), adjusting for the number of cases and
the standard deviation.
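
A one-sample t statistic can be computed with the standard library alone. The ten heights below are invented for illustration, tested against the hypothesized mean of 170 cm; the critical value 2.262 is the standard t-table value for 9 degrees of freedom at the two-sided 5% level.

```python
import math
from statistics import mean, stdev

# Hypothetical sample of 10 student heights, in cm.
heights = [168, 172, 171, 169, 174, 170, 173, 167, 175, 171]
hypothesized_mean = 170.0

n = len(heights)
standard_error = stdev(heights) / math.sqrt(n)   # sample SD / sqrt(n)
t_statistic = (mean(heights) - hypothesized_mean) / standard_error

critical_t = 2.262  # t-table value, df = n - 1 = 9, two-sided alpha = 0.05
reject_null = abs(t_statistic) > critical_t

print(f"t = {t_statistic:.3f}, reject H0: {reject_null}")
```

Here the sample mean of 171 cm is not far enough from 170 cm, relative to the spread of the data, to reject the null hypothesis.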

Q: What is a two-sample t-test?


The purpose of the two-sample t-test is to determine if two population
means are significantly different. The test is also known as the independent
samples t-test, since the two samples are not related to each other and can
therefore be used to implement a between-subjects design. In addition to
the assumption of independence, both distributions must be normal, and
the population variances must be equal (i.e., homogeneous).
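
Under the equal-variance assumption just described, the two-sample t statistic uses a pooled variance. The scores for the two groups below are made up for illustration; 2.306 is the standard t-table value for 8 degrees of freedom at the two-sided 5% level.

```python
import math
from statistics import mean, variance

# Hypothetical scores for two independent groups.
control = [10, 12, 11, 13, 14]
treatment = [14, 15, 13, 16, 17]

n1, n2 = len(control), len(treatment)

# Pooled variance: weighted average of the two sample variances.
pooled_variance = (
    (n1 - 1) * variance(control) + (n2 - 1) * variance(treatment)
) / (n1 + n2 - 2)

standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
t_statistic = (mean(treatment) - mean(control)) / standard_error

critical_t = 2.306  # t-table value, df = n1 + n2 - 2 = 8, two-sided alpha = 0.05
reject_null = abs(t_statistic) > critical_t

print(f"t = {t_statistic:.3f}, reject H0: {reject_null}")
```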

Q: What is a paired-sample t-test?


The purpose of the repeated-measures t-test (or paired-sample t-test) is to
test the same experimental units under different treatment conditions—
usually experimental and control—to determine the treatment effect,
allowing units to act as their own controls. This is also known as the
dependent samples t-test, because the two samples are related to each
other, thus implementing a within-subjects design. The other requirement
is that sample sizes be equal, which is not the case for a two-sample t-test.
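
A paired-sample t-test reduces to a one-sample t-test on the within-unit differences. The before and after measurements below are invented for illustration; 2.776 is the standard t-table value for 4 degrees of freedom at the two-sided 5% level.

```python
import math
from statistics import mean, stdev

# Hypothetical measurements on the same five units, before and after treatment.
before = [82, 75, 90, 68, 77]
after = [80, 72, 85, 67, 73]

# Each unit acts as its own control: analyze the per-unit differences.
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
standard_error = stdev(differences) / math.sqrt(n)
t_statistic = mean(differences) / standard_error   # H0: mean difference = 0

critical_t = 2.776  # t-table value, df = n - 1 = 4, two-sided alpha = 0.05
reject_null = abs(t_statistic) > critical_t

print(f"t = {t_statistic:.3f}, reject H0: {reject_null}")
```

Because the pairing uses zip, the two lists must line up unit by unit, which is why equal sample sizes are required.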

Q: Briefly, what are some issues related to t-tests?


The biggest issue with t-tests results from confusion about when to apply
them as opposed to the z-test. Both statistical tests serve almost the same
purpose; the difference lies in when to use which test. When a sample is
large (n ≥ 30), a z-test is used, whether or not the population standard
deviation is known. For a small sample (n < 30), when the standard deviation
of the population is unknown, a t-test is chosen.
