Statistical Methods and Testing of Hypothesis
COMPUTER SCIENCE
SEMESTER - II (CBCS)
PAPER VI
STATISTICAL METHODS AND
TESTING OF HYPOTHESIS
Published by
Director
Institute of Distance and Open Learning,
University of Mumbai, Vidyanagari, Mumbai - 400 098.
Unit I
1. Standard Distributions
Unit II
2. Hypothesis Testing
Unit III
3. Non-Parametric Tests
*****
Syllabus
2. Kulkarni, M.B., Ghatpande, S.B. and Gore, S.D. (1999): Common statistical tests.
Satyajeet Prakashan, Pune
4. Gupta, S.C. and Kapoor, V.K. (4th Edition) : Applied Statistics, S. Chand and
Son’s, New Delhi
*****
UNIT I
1
STANDARD DISTRIBUTIONS
CONTENTS OF MODULE
Unit Structure
1.0 Objective
1.1 Introduction
1.2 Study Guidance
1.3 Standard Distributions
1.3.1 Random, Discrete and continuous variable
1.3.2 Probability Mass Function
1.3.3 Probability Density Function
1.3.4 Expectation
1.3.5 Variance
1.3.6 Cumulative Distribution Function
1.3.7 Reliability
1.4 Introduction and properties of following distributions
1.5 Binomial Distribution
1.6 Normal Distribution
1.7 Chi-square test
1.8 T-test
1.9 F-test
1.10 Summary
1.11 Unit End Questions
1.12 References
1.13 Further Readings
1.0 OBJECTIVES
Students will be able to:
Identify the types of random variables.
Understand the concept of Probability distribution.
Understand various types of distributions.
1.1 INTRODUCTION
The science of statistics deals with assessing the uncertainty of inferences
drawn from random samples of data. This chapter focuses on random
variables, their types, and their probability distributions. To assess the outcome
of an experiment it is desirable to associate a real number X with the
possible outcome of an event. The concept of “randomness” is
fundamental to the field of statistics. Probability is not only used for
calculating the outcome of one event but also can summarize the
likelihood of all possible outcomes. The relationship between each
possible outcome for a random variable and its probabilities is called a
probability distribution. Probability distributions are an important
foundational concept in probability and the names and shapes of common
probability distributions will be familiar. The structure and type of the
probability distribution vary based on the properties of the random
variable, such as continuous or discrete, and this, in turn, impacts how the
distribution might be summarized or how to calculate the most likely
outcome and its probability.
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
(2) The sum of the probabilities for each value of the random variable
must be equal to one.
i.e., $\sum_{i=1}^{n} p(x_i) = 1$
X 0 1 2
P[X=x] 1/4 2/4 1/4
The area between the density curve and horizontal X-axis is equal to
1,
i.e. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Note: Please note that the probability mass function is different from the
probability density function. f(x) does not give any value of probability
directly hence the rules of probability do not apply to it.
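Eg.: Let X be a continuous random variable with the PDF given by
$$f(x) = \begin{cases} x, & 0 \le x < 1 \\ 2 - x, & 1 \le x < 2 \end{cases}$$
Find $P[0.2 < X < 1.2]$.
Solution:
$$P(0.2 < X < 1.2) = \int_{0.2}^{1.2} f(x)\,dx = \int_{0.2}^{1} x\,dx + \int_{1}^{1.2} (2 - x)\,dx$$
$$= \left[\frac{x^2}{2}\right]_{0.2}^{1} + \left[2x - \frac{x^2}{2}\right]_{1}^{1.2}$$
$$= \left(\frac{1}{2} - 0.02\right) + \left(2.4 - 0.72 - 2 + \frac{1}{2}\right)$$
$$= \left(\frac{1}{2} - 0.02\right) + \left(1.68 - 2 + \frac{1}{2}\right)$$
$$= 0.66$$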
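The integral above can be checked numerically. The following is a minimal sketch, assuming a simple midpoint rule is accurate enough here; the helper names `pdf` and `integrate` are illustrative and not part of the text.

```python
def pdf(x):
    """Density from the example: x on [0, 1), 2 - x on [1, 2], 0 elsewhere."""
    if 0 <= x < 1:
        return x
    if 1 <= x <= 2:
        return 2 - x
    return 0.0


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


prob = integrate(pdf, 0.2, 1.2)
total_area = integrate(pdf, 0.0, 2.0)
print(round(prob, 4))        # approximately 0.66
print(round(total_area, 4))  # approximately 1.0, as required of a density
```

The second print also confirms that the total area under the density is one.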
E.g. Find the expected value of the following probability distribution from
the given probability distribution table
x -1 -2 -3 0 1 2
Solution:
Expected value,
$$E(X) = \sum_{i=1}^{n} x_i P(x_i) = 0.42$$
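For a discrete random variable:
$$V(X) = E(X^2) - [E(X)]^2$$
where E(X) is the expected value and
$$E(X^2) = \sum x^2\, p(x)$$
For a continuous random variable:
$$V(X) = E(X^2) - [E(X)]^2$$
where E(X) is the expected value and
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$$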
X 1 2 3 4 5 6
P(X) 0.2 0.15 0.1 0.2 0.15 0.2
$$E(X) = \sum_{i=1}^{n} x_i\, p(x_i) = 3.55$$
$$E(X^2) = \sum x^2 P(x) = 15.85$$
$$V(X) = E(X^2) - [E(X)]^2 = 15.85 - (3.55)^2 = 15.85 - 12.6025 = 3.2475$$
Mean = 3.55
Variance = 3.2475
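The same mean and variance can be reproduced in a few lines of code. This is a small sketch using the table above; the variable names are illustrative.

```python
# Values and probabilities from the table in the example.
xs = [1, 2, 3, 4, 5, 6]
ps = [0.2, 0.15, 0.1, 0.2, 0.15, 0.2]

# A valid probability mass function must sum to one.
assert abs(sum(ps) - 1.0) < 1e-12

mean = sum(x * p for x, p in zip(xs, ps))               # E(X)
second_moment = sum(x * x * p for x, p in zip(xs, ps))  # E(X^2)
variance = second_moment - mean ** 2                    # V(X) = E(X^2) - [E(X)]^2

print(round(mean, 4), round(second_moment, 4), round(variance, 4))
```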
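Eg: The p.d.f of random variable X is $f(x) = 6(x - x^2),\ 0 \le x \le 1$. Find the
mean and variance.
$$E(X) = \int_0^1 x f(x)\,dx = \int_0^1 6x(x - x^2)\,dx = 6\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1 = 6\left(\frac{1}{3} - \frac{1}{4}\right) = 6 \cdot \frac{1}{12} = \frac{1}{2}$$
$$E(X^2) = \int_0^1 x^2 f(x)\,dx = \int_0^1 x^2 \cdot 6(x - x^2)\,dx = 6\int_0^1 (x^3 - x^4)\,dx = 6\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_0^1 = 6 \cdot \frac{1}{20} = \frac{3}{10}$$
$$V(X) = E(X^2) - [E(X)]^2 = \frac{3}{10} - \frac{1}{4} = \frac{1}{20}$$
Mean = 1/2
Variance = 1/20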
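As a cross-check of the continuous case, the moments can be approximated by numerical integration. The sketch below assumes composite Simpson's rule; the function name `simpson` is a hypothetical helper, not something defined in the text.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pdf(x):
    """Density from the example: f(x) = 6(x - x^2) on [0, 1]."""
    return 6 * (x - x * x)


area = simpson(pdf, 0, 1)                                 # should be 1
mean = simpson(lambda x: x * pdf(x), 0, 1)                # E(X)
second_moment = simpson(lambda x: x * x * pdf(x), 0, 1)   # E(X^2)
variance = second_moment - mean ** 2

print(round(area, 6), round(mean, 6), round(variance, 6))
```

The results agree with the hand calculation: mean 1/2 and variance 1/20 = 0.05.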
1.3.6 Cumulative Distribution Function:
When we are dealing with inequalities, for instance, X < a, the resulting
set of outcomes contains all the elements less than a,
that is, from −∞ to a.
1.3.7 Reliability:
Reliability is dependent on probability for measuring and describing its
characteristics. Let X be the lifetime, or the time to failure, of a
component. The probability that the component survives until some
time t is called the reliability R(t) of the component, i.e. R(t) = P(X > t).
Bernoulli’s Trial:
Bernoulli's trials are events or experiments which result in two mutually
exclusive and exhaustive outcomes; one of them is termed success and the other
failure. For example, when an unbiased coin is tossed we can define
success as getting a tail, and hence getting a head is a failure.
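The equation of the normal curve is
$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$$
where $\mu$ = arithmetic mean and $\sigma$ = standard deviation.
We can transform the variable x to $z = \frac{x - \mu}{\sigma}$; here z is called the standard normal
variate.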
2. The mean, median, and mode of a normal distribution are identical.
3. The total area under the normal curve is unity.
4. Normal distributions are denser in the center and less dense in the
tails.
8. The position and shape of the normal curve depend upon $\mu$, $\sigma$ and N.
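The transformation from x to z and the "total area is unity" property can be illustrated numerically. This is a sketch only: the values mu = 50, sigma = 10 and x = 65 are hypothetical, and the cumulative probability is computed from the error function in Python's standard library.

```python
import math


def z_score(x, mu, sigma):
    """Standardize x: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variate, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical example values (not from the text).
mu, sigma, x = 50.0, 10.0, 65.0
z = z_score(x, mu, sigma)
prob_below = standard_normal_cdf(z)

# Area within one standard deviation of the mean (about 68%).
within_one_sd = standard_normal_cdf(1.0) - standard_normal_cdf(-1.0)

print(z, round(prob_below, 4), round(within_one_sd, 4))
```

Because the curve is symmetric about the mean, `standard_normal_cdf(0)` is exactly one half.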
1.8 T - DISTRIBUTION
The t-distribution, also known as Student's t-distribution, is the probability
distribution used to estimate population parameters when the sample size
is small and the population standard deviation is unknown.
It resembles the normal distribution, and as the sample size increases the
t-distribution approaches the standard normal distribution, with mean and
standard deviation of 0 and 1 respectively.
Properties of t-Distribution:
1. The graph of the t distribution is also bell-shaped and symmetrical
with a mean zero.
2. The t-distribution is most useful for small sample sizes, when the
population standard deviation is not known, or both.
3. The Student's t-distribution ranges from −∞ to +∞.
4. The shape of the t-distribution changes with the change in the degrees
of freedom.
5. The variance, ν/(ν − 2), is always greater than one and is defined only when
the degrees of freedom ν ≥ 3.
6. It is less peaked at the center and heavier in the tails than the standard
normal distribution.
7. The t-distribution has a greater dispersion than the standard normal
distribution. As the sample size 'n' increases, it approaches the
normal distribution. Here the sample size is said to be large when
n ≥ 30.
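For the F-distribution with $(\nu_1, \nu_2)$ degrees of freedom:
$$\text{Mean} = \frac{\nu_2}{\nu_2 - 2}, \quad \text{for } \nu_2 > 2$$
$$\text{Variance} = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}, \quad \text{for } \nu_2 > 4$$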
1.10 SUMMARY
We discussed random variables and their different types. There are two
types of probability distribution, discrete and continuous. A random variable
that assumes only a finite or countably infinite number of values is called a
discrete random variable. A continuous random variable can assume
an uncountable number of values. A discrete random variable is
associated with a probability mass function, and a continuous one
with a probability density function. The expected value and variance of
discrete and continuous distributions were defined. We learnt some
standard distributions and their properties; these distributions will be
applied in testing of hypothesis. Applications of probability methods
can be seen in modeling of text and Web data, network traffic modeling,
probabilistic analysis of algorithms and graphs, reliability modeling,
simulation algorithms, data mining, and speech recognition.
(i) $f(x) = \frac{x}{15}$, where x = 0, 1, 2, 3, 4, 5
2 x
(ii) f x where x = 3, 4, 5
3
(iii) $f(x) = \frac{1}{2}$, where x = 1, 2
4. Consider tossing of a fair coin 3 times Define X = number of times tails
occurred
Value 0 1 2 3
x 2 4 6 8 10
P(x) 0.3 0.2 0.2 0.2 0.1
6. A random variable x has following probability distribution
x 0 1 2 3 4 5 6
P(x) k 2k 3k 5k 4k 2k k
7. A bag contains 4 red and 6 white balls. A person draws two balls at
random and gets Rs. 10 for each red ball and Rs. 5 for each white ball.
Find his mathematical expectation.
8. A continuous distribution of a variable X in the range (-3, 3) is
defined by
$$f(x) = \begin{cases} \frac{1}{16}(3 + x)^2, & -3 \le x \le -1 \\ \frac{1}{16}(6 - 2x^2), & -1 \le x \le 1 \\ \frac{1}{16}(3 - x)^2, & 1 \le x \le 3 \end{cases}$$
(i) Verify that the area under the curve is unity.
(ii) Find the mean and variance of the above distribution.
1.12 REFERENCES
1. Probability and Statistics with Reliability, Queuing and Computer
Science Applications, Kishor S. Trivedi, 2016 by John Wiley &Sons,
Inc., 1946.
2. Fundamentals of Mathematical Statistics by S.C. Gupta , 10th Edition,
2002.
*****
UNIT II
2
HYPOTHESIS TESTING
Unit Structure
2.0 Objective
2.1 Introduction
2.2 Hypothesis Testing
2.3 Null Hypothesis (𝐻o)
2.4 Alternate Hypothesis (𝐻1)
2.5 Critical Region
2.6 P-Value
2.7 Tests based on T
2.8 Normal and F Distribution
2.9 Analysis of Variance
2.10 One Way analysis of variance
2.11 Two-way analysis of variance
2.12 Summary
2.13 Unit End Questions
2.14 References for Future Reading
2.0 OBJECTIVE
Statistics is referred to as a process of collecting, organizing and
analyzing data and drawing conclusions.
The statistical analysis gives significance to insignificant data or
numbers.
Statistics is “a branch of mathematics that deals with the collection,
analysis, interpretation, and presentation of masses of numerical data.”
2.1 INTRODUCTION
Statistics is the science of collecting, organizing, analyzing and interpreting data
in order to make decisions.
Statistics is used to describe the data set and to draw conclusion about
the population from the data set.
Inferential Method: This method uses confidence intervals and
significance tests, which are part of applied statistics.
Type II Error:
A Type II error occurs when we accept a hypothesis that should be rejected.
▸ If the sample being tested falls into either of the critical areas, the
alternative hypothesis is accepted instead of the null hypothesis.
▸ One tail test: A one-tailed test is a statistical test in which the critical
area of a distribution is one-sided so that it is either greater than or less
than a certain value, but not both.
▸ If the sample being tested falls into the one-sided critical area, the
alternative hypothesis will be accepted instead of the null hypothesis.
One-tailed tests are applied to answer the questions: Is our finding
significantly greater than our assumed value? Or: Is our finding
significantly less than our assumed value?
Two-tailed tests are applied to answer the questions: Are the findings
different from the assumed mean?
Level of Significance:
2.6 P-VALUE
Z Score:
▸ Z = (x̄ - μ)/(σ/√N)
▹ If Z > Zc, reject Ho
Question:
Solution:
Step 1- Write given values
Population
Parameter
N = 50
▸ LOS = α = 0.01 = 1 %
Step 2- Propose H0
α=0.05 (5 %) α=0.01 (1 %)
Two-tailed Test Zc=1.96 Zc= 2.58
One-tailed Test Zc=1.645 Zc= 2.33
μ 1800
σ 100
N 50
x̄ 1850
Step 6 – Inference
α=0.05 (5 %) α=0.01 (1 %)
Z=2.4748
μ 74.5
σ 8
N 200
x̄ 75.9
Step 6 – Inference
Z =2.4748, Zc=1.645
As Z > Zc, reject Ho.
Therefore, we can support the claim at 0.05 LOS, i.e., the performance of
the school is better than that of the population.
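The Z-score calculation for a sample mean can be sketched in code. The values below are the ones tabulated in this example (μ = 74.5, σ = 8, N = 200, sample mean 75.9); the function name `z_statistic` is illustrative.

```python
import math


def z_statistic(sample_mean, mu, sigma, n):
    """Z = (sample mean - mu) / (sigma / sqrt(N))."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


z = z_statistic(sample_mean=75.9, mu=74.5, sigma=8, n=200)

# Critical values at the 0.05 level, taken from the table in the text.
z_critical_one_tailed = 1.645
z_critical_two_tailed = 1.96

reject_one_tailed = z > z_critical_one_tailed
reject_two_tailed = abs(z) > z_critical_two_tailed

print(round(z, 4), reject_one_tailed, reject_two_tailed)  # z is approximately 2.4749
```

Both the one-tailed and the two-tailed comparison lead to rejecting Ho at the 0.05 level for these values.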
Zc= 1.96
Step 5:
Z =2.4748
Step 6 : Inference
As Z > Zc, reject Ho.
Therefore, we can support the claim at 0.05 LOS, i.e., the performance of
the school is different from that of the population.
α=0.05 (5 %) α=0.01 (1 %)
Z Score
Mean
Z= (x̄ -μ)/(σ/√N)
Proportion
Z= (P -p)/√(pq/N)
3. Identify test-
6. Inference-
Question
N = 200
Step 2- Propose H0
Zc= 1.645
Z= (P -p)/√(pq/N)
Z=- 4.714
α=0.05 (5 %) α=0.01 (1 %)
p 0.9
q 0.1
P 0.8
N 200
Step 6 – Inference
Z = -4.714, Zc = -1.645
As Z < Zc, reject Ho.
Therefore, we cannot support the claim at 0.05 LOS, i.e., the medicine is
not 90% effective.
n(E) = 6
n(S) = 36
q = 1-p = 0.833
N = 100
P=23/100=0.23
LOS = α = 0.05= 5 %
Step 4: Get table value of Zc for LOS α=0.05 (5 %)
Zc= 1.96
α=0.05 (5 %) α=0.01 (1 %)
Z= (P -p)/√(pq/N)
Z=1.689
p 0.167
q 0.833
P 0.23
N 100
Step 6: Inference
Z =1.689, Zc=1.96
As Z < Zc, Accept Ho.
Therefore, we can support the claim at 0.05 LOS. i.e., the dice are fair.
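The proportion formula Z = (P - p)/√(pq/N) can be evaluated directly for the two worked examples above. This is a minimal sketch; the function name `proportion_z` is illustrative, and the dice example uses the rounded value p = 0.167 from the text.

```python
import math


def proportion_z(sample_proportion, p, n):
    """Z = (P - p) / sqrt(p * q / N), with q = 1 - p."""
    q = 1 - p
    return (sample_proportion - p) / math.sqrt(p * q / n)


# Medicine example: claimed p = 0.9, observed P = 0.8, N = 200.
z_medicine = proportion_z(0.8, 0.9, 200)

# Dice example: p = 6/36 rounded to 0.167, observed P = 0.23, N = 100.
z_dice = proportion_z(0.23, 0.167, 100)

print(round(z_medicine, 3))  # approximately -4.714
print(round(z_dice, 3))      # approximately 1.689

# Two-tailed decision for the dice at the 0.05 level (Zc = 1.96).
accept_h0_dice = abs(z_dice) < 1.96
```

For the dice data the statistic stays below the critical value 1.96, so Ho is accepted.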
z score, or z statistic is replaced by a suitable t score, or t statistic.
Q. 10 individuals are chosen at random from a population and their height
(in inches) is found to be – 63, 63, 64, 65, 66, 69, 69, 70, 70, 71. Find
Student's t by considering population mean to be 65.
Solution:
Formula-
Given-
N = 10
μ = 65
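One way to compute Student's t for this sample is sketched below. It assumes the common form t = (x̄ − μ)/(s/√N), where s is the sample standard deviation with divisor N − 1; this is algebraically the same as dividing the N-divisor standard deviation by √(N − 1).

```python
import math
import statistics

heights = [63, 63, 64, 65, 66, 69, 69, 70, 70, 71]
mu = 65  # hypothesized population mean

n = len(heights)
sample_mean = statistics.mean(heights)  # 67
sample_sd = statistics.stdev(heights)   # uses the N - 1 divisor

t = (sample_mean - mu) / (sample_sd / math.sqrt(n))
degrees_of_freedom = n - 1

print(sample_mean, round(sample_sd, 4), round(t, 4))  # t is approximately 2.02
```

The statistic has N − 1 = 9 degrees of freedom and would be compared with the tabulated critical value of t for the chosen level of significance.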
Given:
μ = 0.050 in
N = 10
x̄ = 0.053 in
s = 0.003 in
Propose Hypothesis:
t = (x̄ - μ)/(s/√(N - 1)) = (0.053 - 0.050)/(0.003/√9) = 3
At 5% LOS
tc= 2.26
t=3
As t > tc → Reject Ho at 5% LOS
At 1% LOS
tc=3.25
t=3
As t < tc → Accept Ho at 1% LOS
Where,
N1= Sample 1 size
N2= Sample 2 size
σ1 = Population 1 SD
σ2= Population 2 SD
S1= Sample 1 SD
S2= Sample 2 SD
Q. Two samples of sizes 9 and 12 are drawn from two normally
distributed populations having variances 16 and 25 respectively. If the
sample variances are 20 and 8, determine whether the first sample has a
significantly larger variance than the second sample at significance levels
of (a)0.05 (b) 0.01
(F0.95=2.95, F0.99=4.74)
Solution:
Given:
N1 = 9
N2 = 12
σ1^2 = Population 1 variance =16
σ2^2 = Population 2 variance = 25
S1^2 = Sample 1 variance = 20
S2^2 = Sample 2 variance = 8
At 5% LOS
Fc= 2.95
F = 4.03
As F > Fc → We can conclude that the variance of sample 1 is significantly
larger than that for sample 2.
At 1% LOS
Fc =4.74
F = 4.03
As F < Fc → The variance of sample 1 is not significantly larger than that for sample 2.
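The value F = 4.03 can be reproduced in code. The sketch below assumes the statistic is the ratio of N·S²/((N − 1)·σ²) for the two samples, where S² is the sample variance with divisor N; this form is an assumption on our part, chosen because it reproduces the quoted value.

```python
def f_statistic(n1, s1_sq, sigma1_sq, n2, s2_sq, sigma2_sq):
    """Ratio of the two scaled variance estimates N*S^2 / ((N-1)*sigma^2)."""
    estimate_1 = n1 * s1_sq / ((n1 - 1) * sigma1_sq)
    estimate_2 = n2 * s2_sq / ((n2 - 1) * sigma2_sq)
    return estimate_1 / estimate_2


# Values given in the question.
f = f_statistic(n1=9, s1_sq=20, sigma1_sq=16, n2=12, s2_sq=8, sigma2_sq=25)

# Critical values quoted in the question.
f_critical_5_percent = 2.95
f_critical_1_percent = 4.74

significant_at_5 = f > f_critical_5_percent
significant_at_1 = f > f_critical_1_percent

print(round(f, 2), significant_at_5, significant_at_1)  # F is approximately 4.03
```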
2.9 ANALYSIS OF VARIANCE (ANOVA)
where µ = group mean and k = number of groups. If, however, the one-
way ANOVA returns a statistically significant result, we accept the
alternative hypothesis (HA), which is that there are at least two group
means that are statistically significantly different from each other.
2.11 TWO-WAY ANALYSIS OF VARIANCE
A two-way ANOVA is used to estimate how the mean of a quantitative
variable changes according to the levels of two categorical variables. Use
a two-way ANOVA when you want to know how two independent
variables, in combination, affect a dependent variable.
Example: You are researching which type of fertilizer and planting
density produces the greatest crop yield in a field experiment. You assign
different plots in a field to a combination of fertilizer type (1, 2, or 3) and
planting density (1=low density, 2=high density), and measure the final
crop yield in bushels per acre at harvest time.
You can use a two-way ANOVA to find out if fertilizer type and planting
density influence average crop yield.
A two-way ANOVA with interaction tests three null hypotheses at the
same time:
There is no difference in group means at any level of the first
independent variable.
There is no difference in group means at any level of the second
independent variable.
The effect of one independent variable does not depend on the effect
of the other independent variable (a.k.a. no interaction effect).
A two-way ANOVA without interaction (a.k.a. an additive two-way
ANOVA) only tests the first two of these hypotheses.
The following columns provide all of the information needed to
interpret the model:
Sum sq is the sum of squares (a.k.a. the variation between the group
means created by the levels of the independent variable and the overall
mean).
Mean sq shows the mean sum of squares (the sum of squares divided
by the degrees of freedom).
F value is the test statistic from the F-test (the mean square of the
variable divided by the mean square of the residuals).
Pr(>F) is the p-value of the F statistic, and shows how likely it is that
the F-value calculated from the F-test would have occurred if the null
hypothesis of no difference was true.
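To make the Sum sq, Mean sq and F value columns concrete, the sketch below builds them for the simpler one-way case (a single independent variable), not the two-way model described above. The three groups of observations are hypothetical data chosen only for illustration.

```python
def one_way_anova(groups):
    """Return the sums of squares, mean squares and F value for a one-way layout."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation between the group means and the overall mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Residual variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "sum_sq": (ss_between, ss_within),
        "df": (df_between, df_within),
        "mean_sq": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


# Hypothetical yields for three treatment groups.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)
```

The p-value in the Pr(>F) column would then come from the F-distribution with the two degrees of freedom shown in `df`.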
2.12 SUMMARY
At the end of this chapter one can draw conclusions based on the data
available. Data can be processed and summarized, and the results can be
generated and displayed in graphs.
https://www.scribbr.com/statistics/two-way-anova/
*****
UNIT III
3
NON-PARAMETRIC TESTS
Unit Structure
3.0 Objective
3.1 Introduction
3.2 Non-Parametric Test Definition
3.3 Need of Non-Parametric Test Definition
3.4 Sign Test
3.5 Wilcoxon's Signed Rank Test
3.6 Run Test
3.7 Kruskal-Wallis Test
3.8 Post-hoc analysis of one-way analysis of variance:
3.9 Duncan's test Chi-square test of association
3.10 Summary
3.11 Unit End Questions
3.12 References for Future Reading
3.0 OBJECTIVE
This type of statistics can be used without the mean, sample size, standard
deviation, or the estimation of any other related parameters when none of
that information is available. Since nonparametric statistics makes fewer
assumptions about the sample data, its application is wider in scope than
parametric statistics.
3.1 INTRODUCTION
A non-parametric test (sometimes called a distribution free test) does not
assume anything about the underlying distribution (for example, that the
data comes from a normal distribution). That's compared to a parametric
test, which makes assumptions about a population's parameters (for
example, the mean or standard deviation). When the word "non
parametric" is used in stats, it doesn't mean that you know nothing about
the population. It usually means that you know the population data does
not have a normal distribution.
Sign Test:
The sign test compares the sizes of two groups. It is a non-parametric or
"distribution free" test, which means the test doesn't assume the data
comes from a particular distribution, like the normal distribution. The
sign test is an alternative to a one sample t test or a paired t test. It can
also be used for ordered (ranked) categorical data. The null hypothesis for
the sign test is that the difference between medians is zero.
Step1: Subtract set 2 from set 1 and put the result in the third
column.
4 positives.
12 negatives.
Step 3: Add up the number of items in the sample and subtract any for which
the difference is zero (in column 3). The sample size in this question was
17, with one zero, so n = 16.
Step 4: Find the p-value using a binomial distribution table or use a
binomial calculator. Use:
.5 for the probability. The null hypothesis is that there are an equal
number of signs (i.e., 50/50). Therefore, the test is a simple binomial
experiment with a .5 chance of the sign being negative and .5 of it
being positive (assuming the null hypothesis is true).
16 for the number of trials.
4 for the number of successes. "Successes" here is the smaller of either
the positive or negative signs from Step 2.
The p-value is 0.038, which is smaller than the alpha level of 0.05. We
can reject the null hypothesis and conclude that there is a significant difference.
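The binomial lookup in Step 4 can be replaced by a direct calculation. The sketch below computes the one-tailed probability of observing 4 or fewer successes in 16 trials with success probability 0.5; the function name `binomial_cdf` is illustrative.

```python
from math import comb


def binomial_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


n_trials = 16   # non-zero differences
successes = 4   # the smaller of the positive and negative counts

p_value = binomial_cdf(successes, n_trials)
reject_h0 = p_value < 0.05

print(round(p_value, 3), reject_h0)  # p-value is approximately 0.038
```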
Step 1: State the null and alternative hypotheses.
H0: The median difference between the two groups is zero.
HA: The median difference is negative. (e.g., the players make fewer free
throws before participating in the training program)
Step 2: Find the difference and absolute difference for each pair.
Step 3: Rank the pairs by absolute difference, ignoring pairs whose difference is 0.
Step 4: Find the sum of the positive ranks and the negative ranks.
Step 5: Reject or fail to reject the null hypothesis.
The test statistic, W, is the smaller of the absolute values of the sums of the positive
ranks and negative ranks. In this case, the smaller value is 29.5. Thus, our
test statistic is W = 29.5.
To determine if we should reject or fail to reject the null hypothesis, we
can reference the critical value found in the Wilcoxon Signed Rank Test
Critical Values Table that corresponds with n and our chosen alpha level.
If our test statistic, W, is less than or equal to the critical value in the
table, we can reject the null hypothesis. Otherwise, we fail to reject the
null hypothesis.
The critical value that corresponds to an alpha level of 0.05 and n = 13
(the total number of pairs minus the two we didn't calculate ranks for
since they had an observed difference of 0) is 17.
Since our test statistic (W = 29.5) is not less than or equal to 17, we fail to
reject the null hypothesis.
Source: This question and solution are taken from "How to
Perform the Wilcoxon Signed Rank Test", Statology.
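The ranking and summing steps of the signed rank test can be sketched as follows. The list of differences is hypothetical (the original data table is not reproduced here), tied absolute differences receive the mean of their ranks, and zero differences are dropped.

```python
def wilcoxon_w(differences):
    """Return (W, n): the smaller rank sum and the number of non-zero differences."""
    nonzero = [d for d in differences if d != 0]
    ordered = sorted(nonzero, key=abs)

    # Assign mid-ranks to runs of equal absolute differences.
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        mid_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1

    positive_sum = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative_sum = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(positive_sum, negative_sum), len(nonzero)


# Hypothetical paired differences.
w, n = wilcoxon_w([2, -1, 0, 3, -4, 1])
print(w, n)
```

The returned W is then compared with the tabulated critical value for n and the chosen alpha level, rejecting the null hypothesis when W is less than or equal to it.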
the occurrence of similar events that are separated by events that are
different.
The runs test is also known as the Wald-Wolfowitz runs test, which was
developed by mathematicians Abraham Wald and Jacob Wolfowitz.
A runs test is a statistical analysis that helps determine the
randomness of data by revealing any variables that might affect data
patterns.
Technical traders can use a runs test to analyze statistical trends and
help spot profitable trading opportunities.
For example, an investor interested in analyzing the price movement
of a particular stock might conduct a runs test to gain insight into
possible future price action of that stock.
A nonparametric test for randomness is provided by the theory of
runs. To understand what a run is, consider a sequence made up of
two symbols, a and b, such as
aa bbb a bb aaaaa bbb aaaa
The problem discussed is from Schaum's Outline series by Murray
Spiegel, fourth edition.
In tossing a coin, for example, a could represent "heads" and b
could represent "tails." Or in sampling the bolts produced by a
machine, a could represent "defective" and b could represent
"nondefective."
A run is defined as a set of identical (or related) symbols contained
between two different symbols or no symbol (such as at the
beginning or end of the sequence).
Proceeding from left to right in the sequence above, the first run
consists of two a's; similarly, the second run
consists of three b's, the third run consists of one a, etc. There are
seven runs in all.
It seems clear that some relationship exists between randomness and
the number of runs. Thus, for the sequence
a b a b a b a b a b a b
there is a cyclic pattern, in which we go from a to b, back to a again,
etc., which we could hardly believe to be random. In such case we
have too many runs (in fact, we have the maximum number possible
for the given number of a's and b's). On the other hand, for the
sequence
There seems to be a trend pattern, in which the a's and b's are
grouped (or clustered) together. In such case there are too few runs,
and we would not consider the sequence to be random. Thus, a
sequence would be considered nonrandom if there are either too
many or too few runs, and random otherwise.
To quantify this idea, suppose that we form all possible sequences
consisting of N1 a's and N2 b's, for a total of N symbols in all (N1 +
N2 = N). The collection of all these sequences provides us with a
sampling distribution: Each sequence has an associated number of
runs, denoted by V. In this way we are led to the sampling
distribution of the statistic V. It can be shown that this sampling
distribution has a mean and variance given, respectively, by the
formulas
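For reference, the standard results for the Wald-Wolfowitz runs test are

$$\mu_V = \frac{2 N_1 N_2}{N_1 + N_2} + 1, \qquad \sigma_V^2 = \frac{2 N_1 N_2 (2 N_1 N_2 - N_1 - N_2)}{(N_1 + N_2)^2 (N_1 + N_2 - 1)}$$

A minimal sketch of the test follows. It counts the runs in the sequence aa bbb a bb aaaaa bbb aaaa discussed earlier and forms z = (V − μ_V)/σ_V; the use of a normal approximation for z is an assumption that is usually reserved for larger N1 and N2, and the function name `runs_test` is illustrative.

```python
import math


def runs_test(sequence):
    """Return (V, mean, variance, z) for a sequence made of exactly two symbols."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])

    # A new run starts at every position where the symbol changes.
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)

    mean = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / math.sqrt(variance)
    return runs, mean, variance, z


v, mean_v, var_v, z = runs_test("aabbbabbaaaaabbbaaaa")
print(v, round(mean_v, 2), round(var_v, 4), round(z, 3))
```

For this sequence V = 7 with N1 = 12 and N2 = 8. Since |z| is below 1.96, the number of runs is not unusual at the 0.05 level under the normal approximation.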
Rank the data from 1 for the smallest value of the dependent variable,
2 for the next smallest, and so on up to N for the largest (if any values tie,
use the mid-point of the tied ranks).
Compute the test statistic
Determine critical value from Chi-Square distribution table
Finally, formulate decision and conclusion
The test statistic for the Kruskal Wallis test denoted as H is given as
follows:
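The usual form of the statistic is

$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$

where $R_j$ is the sum of the ranks in group j, $n_j$ is the size of group j, and N is the total number of observations. The sketch below implements this form without a correction for ties, which is a simplifying assumption; the three groups of scores are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H using mid-ranks, without the tie correction factor."""
    pooled = sorted(v for g in groups for v in g)

    # Map each distinct value to its mid-rank in the pooled sample.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1

    n = len(pooled)
    weighted = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical scores for three groups.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
degrees_of_freedom = 3 - 1
print(round(h, 4), degrees_of_freedom)
```

H is compared with the chi-square critical value for k − 1 degrees of freedom (5.99 at the 5% level for 2 degrees of freedom).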
Null Hypothesis H0: The distribution of operator scores is same
Right tailed chi-square test with 95% confidence level, and df =3,
critical χ2 value is 7.815
                  No Smokers   Moderate Smokers   Heavy smokers
Hypertension          21              36                30
No hypertension       48              26                19
Solution:
Ho: Presence or absence of hypertension is independent of smoking.
H1: Presence or absence of hypertension is dependent on smoking.
Observed frequencies (o):
                  No Smokers   Moderate Smokers   Heavy smokers   Total
Hypertension          21              36                30        RT1 = 87
No hypertension       48              26                19        RT2 = 93
Total              CT1 = 69        CT2 = 62          CT3 = 49     Total = 180
RT=Row Total and CT=Column Total
Expected frequencies (e):
                  No Smokers          Moderate Smokers     Heavy smokers
Hypertension      (RT1 x CT1)/Total   (RT1 x CT2)/Total    (RT1 x CT3)/Total
No hypertension   (RT2 x CT1)/Total   (RT2 x CT2)/Total    (RT2 x CT3)/Total

                  No Smokers   Moderate Smokers   Heavy smokers
Hypertension      87*69/180    87*62/180          87*49/180
No hypertension   93*69/180    93*62/180          93*49/180
                  No Smokers   Moderate Smokers   Heavy smokers   Total
Hypertension          21              36                30          87
No hypertension       48              26                19          93
Total                 69              62                49         180

o      e         (o-e)^2/e
21     33.35     4.5734
36     29.967    1.2147
30     23.683    1.6849
48     35.65     4.2780
26     32.033    1.1363
19     25.317    1.5761
Total            14.46
χ^2=14.46
χ_tab^2=5.99
As χ^2> χtab^2, Reject H0 at 5% LOS.
Therefore, we can conclude that the presence or absence of hypertension is
dependent on smoking.
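The whole calculation, from expected frequencies to the test statistic, can be sketched in a few lines. The observed counts are those of the hypertension example; the variable names are illustrative.

```python
# Observed counts: rows = hypertension / no hypertension,
# columns = non-smokers / moderate smokers / heavy smokers.
observed = [
    [21, 36, 30],
    [48, 26, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
        chi_square += (o - e) ** 2 / e

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
chi_square_table = 5.99  # 5% critical value for 2 degrees of freedom, from the text
reject_h0 = chi_square > chi_square_table

print(round(chi_square, 2), degrees_of_freedom, reject_h0)  # approximately 14.46
```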
The Chi-square test of independence determines whether there is a
statistically significant relationship between categorical variables. It is a
hypothesis test that answers the question—do the values of one
categorical variable depend on the value of other categorical variables?
This test is also known as the chi-square test of association.
Null hypothesis: There are no relationships between the categorical
variables. If one variable is known, it does not help you predict the
value of another variable.
Alternative hypothesis: There are relationships between the
categorical variables. Knowing the value of one variable does help
you predict the value of another variable.
The Chi-square test of association works by comparing the distribution
that you observe to the distribution that you expect if there is no
relationship between the categorical variables.
For a Chi-square test, a p-value that is less than or equal to your
significance level indicates there is sufficient evidence to conclude that
the observed distribution is not the same as the expected distribution. You
can conclude that a relationship exists between the categorical variables.
We use a Chi-square test of independence to determine whether there is a
statistically significant association between shirt color and deaths. We
need to use this test because these variables are both categorical variables.
Shirt color can be only blue, gold, or red. Fatalities can be only dead or
alive.
The problem discussed is from https://statisticsbyjim.com/hypothesis-testing/chi-square-test-independence-example/
Eg - The color of the uniform represents each crewmember's work area.
We will statistically assess whether there is a connection between uniform
color and the fatality rate.
Both p-values are less than 0.05. We reject the null hypothesis and conclude
that there is a relationship between shirt color and deaths.
3.10 SUMMARY
In statistics, nonparametric tests are methods of statistical analysis that do
not require a distribution to meet the required assumptions to be analyzed
(especially if the data is not normally distributed).
They are also referred to as distribution-free tests. Nonparametric tests serve as
an alternative to parametric tests such as T-test or ANOVA that can be
employed only if the underlying data satisfies certain criteria and
assumptions.
Given- χ_tab^2=9.49
Q2. The PQR Company claims that the lifetime of a type of battery that it
manufactures is more than 250 hours (h). A consumer advocate wishing
to determine whether the claim is justified measures the lifetimes of 24 of
the company's batteries; the results are listed below. Assuming the
sample to be random, determine whether the company's claim is justified
at the 0.05 significance level. Work the problem first by hand, supplying
all the details for the sign test
Q5. In 30 tosses of a coin the following sequence of heads (H) and tails
(T) is obtained:
HTTHTHHHTHHTTHT
HTHHTHTTHTHHTHT
(a) Determine the number of runs, V.
(b) Test at the 0.05 significance level whether the sequence is random.
Work the problem first by hand, supplying all the details of the runs test
for randomness.
*****