
22st202 - P&s Notes - 2024 - Dr. K. Kalyani

The document discusses non-parametric tests, particularly focusing on one-sample tests, their advantages, disadvantages, and types. It details the sign test as a simple non-parametric method for testing population medians, including its assumptions and procedures. Examples illustrate the application of the sign test in real-world scenarios, emphasizing its use when data does not meet the normality assumption required for parametric tests.

Uploaded by

nandinivemula229

... simple calculations compared to parametric tests, because they generally use ranks rather than the actual values of the sample observations. Because they use ranks, they are also applicable even if the data are available only on an ordinal measurement scale.

In non-parametric tests the effective sample size is reduced when sample observations are tied, as we will see during the calculations.

Now, you can answer the following exercise.


E1) Write any three advantages and three disadvantages of non-parametric tests
compared to parametric tests.

Types of Non-parametric Tests


After discussing the advantages and disadvantages of non-parametric tests,
you will be interested to know the types of non-parametric tests. Broadly,
non-parametric tests are classified into three categories:
1. One-sample Tests
2. Two-sample Tests
3. k-sample Tests.

In this unit, we will discuss one-sample tests; the other types of tests will be
discussed in subsequent units of this block.
Types of One-sample Tests
Some of the commonly used one-sample tests are listed below:
1. Sign Test
2. Wilcoxon Signed-Rank Test
3. Run Test
4. Kolmogorov-Smirnov Goodness of Fit Test
Let us discuss these tests one by one in subsequent sections.

13.3 SIGN TEST


The sign test is the simplest of the non-parametric tests. It is called the sign test because it is based on plus and minus signs, as you will see when we proceed through an example. The sign test is used as an alternative to the t-test for testing the population median, instead of the population mean, when the parent population is not normal. Further, it also works if the data are available on an ordinal scale, whereas the t-test needs data on at least an interval scale.

Note: A hypothesis concerning the median cannot be considered a non-parametric testing problem in the strict sense (the median is itself a parameter), but it is taken as a valid non-parametric testing problem since the distribution of the test statistic does not involve the parent population distribution.

Assumptions
This test works under the following assumptions:
(i) The sample is selected from a population with unknown median.
(ii) The variable under study is continuous.
(iii) The variable under study is measured on at least an ordinal scale.

Let us discuss the general procedure of this test:
Let $X_1, X_2, \ldots, X_n$ be a set of n random observations, arranged in the order in which they occur, taken from a parent population having unknown median $\tilde{\theta}$. Suppose we wish to test a hypothesis about a specified value $\tilde{\theta}_0$ of the population median $\tilde{\theta}$. We can take the null and alternative hypotheses as

$H_0: \tilde{\theta} = \tilde{\theta}_0$ and $H_1: \tilde{\theta} \neq \tilde{\theta}_0$ (for a two-tailed test)

$H_0: \tilde{\theta} \le \tilde{\theta}_0$ and $H_1: \tilde{\theta} > \tilde{\theta}_0$
or
$H_0: \tilde{\theta} \ge \tilde{\theta}_0$ and $H_1: \tilde{\theta} < \tilde{\theta}_0$ (for a one-tailed test)

After setting the null and alternative hypotheses, the sign test involves the following steps:
Step 1: First of all, we convert the given observations into a sequence of plus and minus signs. For this, we subtract the postulated value of the median ($\tilde{\theta}_0$) from each observation, that is, we obtain the differences $X_i - \tilde{\theta}_0$ for all observations and check their signs. Put another way, we compare each observation $X_i$ with $\tilde{\theta}_0$. If $X_i$ is greater than $\tilde{\theta}_0$, that is, $X_i > \tilde{\theta}_0$, then we replace the observation $X_i$ by a plus sign, and if $X_i$ is less than $\tilde{\theta}_0$, that is, $X_i < \tilde{\theta}_0$, then we replace $X_i$ by a minus sign. An observation $X_i$ that is equal to $\tilde{\theta}_0$ gives no information in terms of plus or minus signs, so we exclude all such observations from the analysis. Such observations reduce the sample size; let the reduced sample size be denoted by n.
Step 2: After that, we count the number of plus signs and the number of minus signs. Suppose they are denoted by S+ and S− respectively.
Step 3: When the null hypothesis is true and the population is dichotomised on the basis of the postulated value of the median $\tilde{\theta}_0$, we expect the number of plus signs (successes) and the number of minus signs (failures) to be approximately equal. The number of plus signs (or of minus signs) follows a binomial distribution (n, p = 0.5). For convenience, we consider the smaller of the two counts. If the number of plus signs (S+) is less than the number of minus signs (S−), we take a plus sign as a success and a minus sign as a failure. Similarly, if the number of minus signs (S−) is less than the number of plus signs (S+), we take a minus sign as a success and a plus sign as a failure.
Step 4: To take the decision about the null hypothesis, we use the concept of p-value, which we discussed in Unit 9 of this course. (The critical values of this test are not generally available in tabular form and are slightly difficult to obtain, whereas the p-value is easy to obtain with the help of a cumulative binomial table, so we use the p-value to take the decision about the null hypothesis.) For the p-value we determine the probability that the test statistic is less than or equal to its calculated value, i.e. the actually observed number of plus or minus signs. Since the number of plus or minus signs follows a binomial distribution (n, p = 0.5), this probability can be obtained with the help of Table I given in the Appendix at the end of this block, which provides cumulative binomial probabilities. We then compare this probability with the level of significance (α). Here, the test statistic depends upon the alternative hypothesis, so the following cases arise:
For one-tailed test:

Case I: When $H_0: \tilde{\theta} \le \tilde{\theta}_0$ and $H_1: \tilde{\theta} > \tilde{\theta}_0$ (right-tailed test)
In this case, if the alternative hypothesis is true we expect the number of minus signs (S−) to be smaller than the number of plus signs (S+); therefore, the test statistic (S) is the number of minus signs (S−). The p-value is determined as
p-value = $P[S \le S^-]$
If the p-value is less than or equal to α, that is, $P[S \le S^-] \le \alpha$, then we reject the null hypothesis at α level of significance, and if the p-value is greater than α then we do not reject the null hypothesis.

Case II: When $H_0: \tilde{\theta} \ge \tilde{\theta}_0$ and $H_1: \tilde{\theta} < \tilde{\theta}_0$ (left-tailed test)
In this case, if the alternative hypothesis is true we expect the number of plus signs (S+) to be smaller than the number of minus signs (S−); therefore, the test statistic (S) is the number of plus signs (S+). The p-value is determined as
p-value = $P[S \le S^+]$
If the p-value is less than or equal to α, that is, $P[S \le S^+] \le \alpha$, then we reject the null hypothesis at α level of significance, and if the p-value is greater than α then we do not reject the null hypothesis.

Note: Here, we take the p-value as the probability of a value less than or equal to the observed value of the test statistic in each case, because in this test we consider the smaller count of signs, so the critical region lies in the lower tail.

For two-tailed test: When $H_0: \tilde{\theta} = \tilde{\theta}_0$ and $H_1: \tilde{\theta} \neq \tilde{\theta}_0$
For a two-tailed test, the test statistic (S) is the smaller of the number of plus signs (S+) and the number of minus signs (S−), that is,
$S = \min\{S^+, S^-\}$
and the approximate p-value is determined as
p-value = $2P[S \le S^+]$, if S+ is the smaller count
p-value = $2P[S \le S^-]$, if S− is the smaller count
If the p-value is less than or equal to α, then we reject the null hypothesis at α level of significance, and if the p-value is greater than α then we do not reject the null hypothesis.
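
The small-sample procedure (Steps 1 to 4 and the three cases) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sign_test_p_value`, the `alternative` labels and the sample data are assumptions made for this example, and the cumulative binomial probability is computed directly rather than read from Table I.

```python
from math import comb

def sign_test_p_value(observations, median0, alternative="two-sided"):
    """Exact small-sample sign test; returns (s_plus, s_minus, n, p_value)."""
    # Steps 1-2: count plus and minus signs; values equal to median0 are dropped.
    s_plus = sum(1 for x in observations if x > median0)
    s_minus = sum(1 for x in observations if x < median0)
    n = s_plus + s_minus  # reduced sample size

    # The test statistic depends on the alternative hypothesis.
    if alternative == "greater":        # H1: median > median0 (right-tailed)
        s = s_minus
    elif alternative == "less":         # H1: median < median0 (left-tailed)
        s = s_plus
    elif alternative == "two-sided":    # H1: median != median0
        s = min(s_plus, s_minus)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

    # Step 4: cumulative binomial(n, 0.5) probability P[S <= s].
    lower_tail = sum(comb(n, r) for r in range(s + 1)) / 2 ** n
    if alternative == "two-sided":
        p_value = min(1.0, 2 * lower_tail)  # doubled, capped at 1
    else:
        p_value = lower_tail
    return s_plus, s_minus, n, p_value

# Hypothetical data (not taken from the text), testing H1: median > 10.
sample = [12, 15, 9, 14, 10, 17, 13, 16, 10, 18]
print(sign_test_p_value(sample, 10, alternative="greater"))
```

The returned p-value is then compared with the chosen level of significance α exactly as described in the cases above.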
For large sample (n > 20):
For a large sample size n greater than 20, we use the normal approximation to the binomial distribution with mean
$E(S) = np = n \times \frac{1}{2} = \frac{n}{2}$ … (1)
and variance
$\mathrm{Var}(S) = npq = n \times \frac{1}{2} \times \frac{1}{2} = \frac{n}{4}$ … (2)
Therefore in this case, we use the normal test (i.e. the Z-test, which is described in Unit 10 of Block 3 of this course). The test statistic of the Z-test is given by
$Z = \frac{S - E(S)}{SE(S)} = \frac{S - \frac{n}{2}}{\sqrt{\frac{n}{4}}} \sim N(0, 1)$ [Using equations (1) and (2)] … (3)
After that, we calculate the value of the test statistic Z and compare it with the critical value given in Table 10.1 of Unit 10 of this course at the prefixed level of significance α. We take the decision about the null hypothesis as described in Section 10.2 of Unit 10 of this course.
Note 1: You may be surprised that we use n > 20 rather than n > 30 as the large-sample threshold. This is because the binomial distribution with p = 0.5 is symmetric, so it is well approximated by the normal distribution even for sample sizes smaller than 30.
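
As a rough sketch of the large-sample calculation, the Z statistic of equation (3) can be computed as follows. The function name `sign_test_z` and the counts used are hypothetical, chosen only to illustrate the formula; the critical value 1.96 is the usual two-tailed value at the 5% level.

```python
from math import sqrt

def sign_test_z(s, n):
    """Z statistic of equation (3): (S - n/2) / sqrt(n/4)."""
    mean = n / 2        # equation (1)
    variance = n / 4    # equation (2)
    return (s - mean) / sqrt(variance)

# Hypothetical counts (not from the text): S = 8 among n = 25 non-tied observations.
z = sign_test_z(8, 25)
critical = 1.96  # two-tailed critical value at the 5% level of significance
print(z, abs(z) > critical)  # reject H0 only if |z| exceeds the critical value
```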
Let us do some examples to become more familiar with this test.
Example 1: The breaking strength (in pounds) of a random sample of 10 ropes made by a manufacturer is given by
163 165 165 160 171 158 151 162 169 172
Use the sign test to test the manufacturer's claim that the average breaking strength of a rope is greater than 160 pounds at 5% level of significance.
Solution: Here, the distribution of the population of breaking strengths of the ropes is not given, so the assumption of normality for the t-test is not fulfilled. Also, the sample size is small, so we cannot use the Z-test. So we go for the sign test.
Here, we want to test the manufacturer's claim that the average (median) breaking strength ($\tilde{\theta}$) of a rope is greater than 160 pounds. So the claim is $\tilde{\theta} > 160$ and its complement is $\tilde{\theta} \le 160$. Since the complement contains the equality sign, we take the complement as the null hypothesis and the claim as the alternative hypothesis. Thus,
$H_0: \tilde{\theta} \le \tilde{\theta}_0 = 160$ and $H_1: \tilde{\theta} > 160$
Since the alternative hypothesis is right-tailed, the test is a right-tailed test.
For applying the sign test, we compare each observation with $\tilde{\theta}_0$ (= 160). Replacing each observation greater than 160 with a plus sign and each observation less than 160 with a minus sign, and discarding the one observation which equals 160, we get
+ + + + − − + + +
By counting, we have
S+ = number of plus signs = 7
S− = number of minus signs = 2
n = total number of plus and minus signs = 9
Since the alternative hypothesis is right-tailed, the test statistic (S) is the number of minus signs (S−), that is,
S = number of minus signs (S−) = 2
Here, n = 9 (< 20), so it is a case of small sample. Thus, to take the decision about the null hypothesis, we determine the p-value with the help of Table I given in the Appendix at the end of this block. Here, n = 9, p = 0.5 and r = 2. Thus, we have
p-value = $P[S \le 2]$ = 0.0899
Since p-value = 0.0899 > 0.05 (= α), we do not reject the null hypothesis at 5% level of significance.
Thus, we conclude that the sample does not provide sufficient evidence to support the manufacturer's claim that the breaking strength of a rope is greater than 160 pounds.
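
As a check on the tabulated value used in Example 1, the cumulative probability can be worked out directly from the binomial (n = 9, p = 0.5) distribution. The small difference from 0.0899 presumably comes from rounding the individual entries of the table:

```latex
P[S \le 2] = \sum_{r=0}^{2} \binom{9}{r} \left(\frac{1}{2}\right)^{9}
           = \frac{1 + 9 + 36}{512}
           = \frac{46}{512} \approx 0.0898
```

Either value is greater than α = 0.05, so the decision not to reject the null hypothesis is unaffected.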
Example 2: An economist believes that the median starting salary for a computer programmer in a certain city is Rs 12,000. To verify this claim, 12 computer programmers with similar backgrounds who recently got jobs were randomly selected. Their starting salaries (in Rs) were 18000, 15000, 12000, 10000, 13000, 12000, 10000, 16000, 11000, 9000, 10000 and 9000. Using the sign test, give your conclusion at 1% level of significance.
Solution: Here, the distribution of the salaries of the computer programmers is not given, so the assumption of normality for the t-test is not fulfilled. Also, the sample size is small, so we cannot use the Z-test. So we go for the sign test.
Here, we want to test that the median starting salary ($\tilde{\theta}$) for a computer programmer in a certain city is Rs 12000. So our claim is $\tilde{\theta} = 12000$ and its complement is $\tilde{\theta} \neq 12000$. Since the claim contains the equality sign, we take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
$H_0: \tilde{\theta} = \tilde{\theta}_0 = 12000$ and $H_1: \tilde{\theta} \neq 12000$
Since the alternative hypothesis is two-tailed, the test is a two-tailed test.
For applying the sign test, we replace each value greater than $\tilde{\theta}_0$ (= 12000) with a plus sign and each value less than 12000 with a minus sign, and discard the two values which are equal to 12000. We get
+ + − + − + − − − −
By counting, we have
S+ = number of plus signs = 4
S− = number of minus signs = 6
n = total number of plus and minus signs = 10
Since the alternative hypothesis is two-tailed, the test statistic (S) is the minimum of the number of plus signs (S+) and the number of minus signs (S−), that is,
$S = \min\{S^+, S^-\} = \min\{4, 6\} = 4$
Here, n = 10 (< 20), so it is a case of small sample. Thus, to take the decision about the null hypothesis, we determine the p-value with the help of Table I given in the Appendix at the end of this block. Here, n = 10, p = 0.5 and r = 4, so $P[S \le 4]$ = 0.3770. Since the test is two-tailed, we have
p-value = $2P[S \le 4]$ = 2 × 0.3770 = 0.7540
Since p-value = 0.7540 > 0.01 (= α), we do not reject the null hypothesis, i.e. we support the claim at 1% level of significance.
Thus, we conclude that the sample fails to provide sufficient evidence against the claim, so we may assume that the median salary of the newly appointed computer programmers is Rs 12000.
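
The counts in Example 2 can be reproduced with a short script. This is a sketch only: the variable names are illustrative, the binomial probability is computed directly instead of being read from Table I, and the lower-tail probability is doubled as the two-tailed rule of this section prescribes.

```python
from math import comb

salaries = [18000, 15000, 12000, 10000, 13000, 12000,
            10000, 16000, 11000, 9000, 10000, 9000]
median0 = 12000  # postulated median starting salary (Rs)

# Count the signs; the two salaries equal to 12000 are discarded.
s_plus = sum(x > median0 for x in salaries)
s_minus = sum(x < median0 for x in salaries)
n = s_plus + s_minus

# Two-tailed test: the statistic is the smaller count.
s = min(s_plus, s_minus)

# Lower-tail binomial(n, 0.5) probability, then doubled for two tails.
lower_tail = sum(comb(n, r) for r in range(s + 1)) / 2 ** n
p_value = min(1.0, 2 * lower_tail)

print(s_plus, s_minus, n)
print(round(lower_tail, 4), round(p_value, 4))
```

The doubled probability is far above α = 0.01, which agrees with the decision not to reject the null hypothesis.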
Example 3: The following data give the milk production (in thousand kg) in full cream by 40 different dairies:
17 15 20 29 19 19 22 25 27 9
24 20 17 6 24 14 15 23 24 26
19 23 28 19 16 22 24 17 20 13
19 10 23 18 31 13 20 17 24 14
Use the sign test to test that the median ($\tilde{\theta}$) production of milk in dairies is 21.5 thousand kg at 1% level of significance.
Solution: Here, the distribution of the milk production of the different dairies is not given, so the assumption of normality for the t-test is not satisfied. So we go for the sign test.
Here, we want to test that the median production ($\tilde{\theta}$) of milk in dairies is 21.5 thousand kg. So our claim is $\tilde{\theta} = 21.5$ and its complement is $\tilde{\theta} \neq 21.5$. Since the claim contains the equality sign, we take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
$H_0: \tilde{\theta} = \tilde{\theta}_0 = 21.5$ and $H_1: \tilde{\theta} \neq 21.5$
Since the alternative hypothesis is two-tailed, the test is a two-tailed test.
For applying the sign test, we replace each value greater than 21.5 (the postulated median) with a plus sign and each value less than 21.5 with a minus sign. We get
− − − + − − + + + −
+ − − − + − − + + +
− + + − − + + − − −
− − + − + − − − + −
By counting, we have
S+ = number of plus signs = 16
S− = number of minus signs = 24
n = total number of plus and minus signs = 40
Since the alternative hypothesis is two-tailed, the test statistic (S) is given by
$S = \min\{S^+, S^-\} = \min\{16, 24\} = 16$
Here, n = 40 (> 20), so it is a case of large sample. In this case, we use the Z-test. The test statistic of the Z-test is given by
$Z = \frac{S - \frac{n}{2}}{\sqrt{\frac{n}{4}}} \sim N(0, 1)$
$= \frac{16 - \frac{40}{2}}{\sqrt{\frac{40}{4}}} = \frac{16 - 20}{\sqrt{10}} = -1.27$
The critical (tabulated) values for a two-tailed test at 1% level of significance are $\pm z_{\alpha/2} = \pm z_{0.005} = \pm 2.58$.
Since the calculated value of Z (= −1.27) is greater than the critical value (= −2.58) and less than the critical value (= 2.58), it lies in the non-rejection region, so we do not reject the null hypothesis, i.e. we support the claim at 1% level of significance.
Decision according to p-value:
Since the test is two-tailed,
p-value = $2P[Z \ge |z|] = 2P[Z \ge 1.27]$
$= 2[0.5 - P(0 \le Z \le 1.27)]$
$= 2(0.5 - 0.3980) = 0.204$
Since the p-value (= 0.204) is greater than α (= 0.01), we do not reject the null hypothesis, i.e. we support the claim at 1% level of significance.
Thus, we conclude that the sample fails to provide sufficient evidence against the claim, so we may assume that the median production of milk in dairies is 21.5 thousand kg.
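
The sign counts and the large-sample statistic of Example 3 can be verified with the following sketch. The helper `normal_cdf` is an assumption of this example (it evaluates the standard normal distribution function through `math.erf` instead of using the printed normal table), so the p-value is computed from the unrounded Z (about −1.265) and comes out close to 0.206 rather than the table-based 0.204.

```python
from math import erf, sqrt

production = [17, 15, 20, 29, 19, 19, 22, 25, 27, 9,
              24, 20, 17, 6, 24, 14, 15, 23, 24, 26,
              19, 23, 28, 19, 16, 22, 24, 17, 20, 13,
              19, 10, 23, 18, 31, 13, 20, 17, 24, 14]
median0 = 21.5  # postulated median (thousand kg)

# Count plus and minus signs; no value equals 21.5, so nothing is discarded.
s_plus = sum(x > median0 for x in production)
s_minus = sum(x < median0 for x in production)
n = s_plus + s_minus
s = min(s_plus, s_minus)

# Large-sample Z statistic: (S - n/2) / sqrt(n/4).
z = (s - n / 2) / sqrt(n / 4)

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Two-tailed p-value: 2 * P[Z >= |z|].
p_value = 2 * (1.0 - normal_cdf(abs(z)))

print(s_plus, s_minus, n)
print(z, p_value)
```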
Now, you can try the following exercises.

E2) Give one difference between the t-test and the sign test for one sample.
E3) A Metropolitan Area Road Transport Authority (MARTA) claims that the average
(median) waiting time on a well-travelled route is 20 minutes. A random
sample of 12 passengers showed waiting times of 22, 13, 17, 14, 25,
26, 19, 20, 22, 30, 10 and 15 minutes. Test MARTA's claim that the
average (median) waiting time on a well-travelled route is 20 minutes
using the sign test at 1% level of significance.
E4) The following data shows the weight (in kg) of a random sample of 30
cadets of a centre:
49 46 57 40 45 57 50 65 34 50
58 46 47 42 53 67 40 52 49 66
53 48 68 40 53 49 61 54 48 38

Use sign test to examine whether the median weight of all the cadets of
the centre is 50 kg at 5% level of significance.

13.4 WILCOXON SIGNED-RANK TEST
In the previous section, we discussed the sign test. Recall that the sign test
uses only the information whether the difference between each observation and
the postulated value of the median is positive or negative, that is, it considers
only the signs of the differences. This is fine if the sample information is
available on an ordinal scale only. But if the measurements are available on an
interval or ratio scale, the sign test is not recommended, because it does not
take into account the information available in the magnitudes of the differences.
The Wilcoxon signed-rank test overcomes this drawback: it takes into account the
signs as well as the magnitudes of the differences. Since the Wilcoxon signed-rank
test uses more information than the sign test, it is generally more powerful than
the sign test. The Wilcoxon signed-rank test is also used as an alternative to
the t-test.
Assumptions
Wilcoxon signed-rank test requires following assumptions to work:
(i) The sample is selected from the population with unknown median.
(ii) The sampled population is symmetric about its median.
(iii) The variable under study is continuous.
(iv) The variable under study is measured on at least interval scale.
Let us discuss the general procedure of this test:
Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a continuous symmetric population. Let $\tilde{\theta}$ be the median of the population. Here, we wish to test a hypothesis about a specified value $\tilde{\theta}_0$ of the population median $\tilde{\theta}$. We can take the null and alternative hypotheses as
$H_0: \tilde{\theta} = \tilde{\theta}_0$ and $H_1: \tilde{\theta} \neq \tilde{\theta}_0$ (for a two-tailed test)
$H_0: \tilde{\theta} \le \tilde{\theta}_0$ and $H_1: \tilde{\theta} > \tilde{\theta}_0$
or
$H_0: \tilde{\theta} \ge \tilde{\theta}_0$ and $H_1: \tilde{\theta} < \tilde{\theta}_0$ (for a one-tailed test)
After setting the null and alternative hypotheses, the Wilcoxon signed-rank test involves the following steps:
Step 1: We subtract $\tilde{\theta}_0$ from each observation and obtain the differences $d_i$, with their plus and minus signs, as
$d_i = X_i - \tilde{\theta}_0$ for all observations
An observation $X_i$ equal to $\tilde{\theta}_0$ gives no information in terms of sign or magnitude, so we exclude all such observations from the analysis. Such observations reduce the sample size; let the reduced sample size be denoted by n.
Step 2: After that, we find the absolute values of the $d_i$'s obtained in Step 1, as $|d_1|, |d_2|, \ldots, |d_n|$.
Step 3: In this step, we rank the $|d_i|$'s (obtained in Step 2) with respect to their magnitudes from smallest to largest, that is, rank 1 is given to the smallest $|d_i|$, rank 2 to the second smallest, and so on up to the largest $|d_i|$. If several values are the same (tied), we assign each the average of the ranks they would have received if there were no repetition. (For more detail about repeated (tied) ranks, please go through Section 7.4 of Unit 2 of MST-002.)
Step 4: Now assign to the ranks the signs of the original differences.
Step 5: Finally, we calculate the sum of the positive ranks (T+) and the sum of the negative ranks (T−) separately.
Under H0, we expect approximately equal numbers of positive and negative ranks. And under the assumption that the population under study is symmetric about its median, we expect the sum of the positive ranks (T+) and the sum of the negative ranks (T−) to be approximately equal.
Step 6: Decision Rule:
To take the decision about the null hypothesis, the test statistic is the smaller of T+ and T−. The test statistic is compared with the critical (tabulated) value for a given level of significance (α) under the condition that the null hypothesis is true. Table II given in the Appendix at the end of this block provides the critical values of the test statistic at α level of significance for both one-tailed and two-tailed tests. Here, the test statistic depends upon the alternative hypothesis, so the following cases arise:
For one-tailed test:
Case I: When $H_0: \tilde{\theta} \le \tilde{\theta}_0$ and $H_1: \tilde{\theta} > \tilde{\theta}_0$ (right-tailed test)
In this case, if the alternative hypothesis is true we expect the sum of negative ranks (T−) to be smaller than the sum of positive ranks (T+); therefore, the test statistic (T) is the sum of negative ranks (T−).
If the computed value of the test statistic (T) is less than or equal to the critical value $T_\alpha$ at α level of significance, that is, $T \le T_\alpha$, then we reject the null hypothesis at α level of significance; otherwise we do not reject the null hypothesis.
Case II: When $H_0: \tilde{\theta} \ge \tilde{\theta}_0$ and $H_1: \tilde{\theta} < \tilde{\theta}_0$ (left-tailed test)
In this case, if the alternative hypothesis is true we expect the sum of positive ranks (T+) to be smaller than the sum of negative ranks (T−); therefore, the test statistic (T) is the sum of positive ranks (T+).
If the computed value of the test statistic (T) is less than or equal to the critical value $T_\alpha$ at α level of significance, that is, $T \le T_\alpha$, then we reject the null hypothesis at α level of significance; otherwise we do not reject the null hypothesis.
Note: Here, we reject the null hypothesis in each case if the test statistic is less than or equal to the critical value, because in this test we consider the smaller sum, so the critical region lies in the lower tail.
For two-tailed test:
When $H_0: \tilde{\theta} = \tilde{\theta}_0$ and $H_1: \tilde{\theta} \neq \tilde{\theta}_0$
In this case, the test statistic (T) is the smaller of the sum of positive ranks (T+) and the sum of negative ranks (T−), that is,
$T = \min\{T^+, T^-\}$
If the computed value of the test statistic (T) is less than or equal to the critical value $T_{\alpha/2}$ at α level of significance, that is, $T \le T_{\alpha/2}$, then we reject the null hypothesis at α level of significance; otherwise we do not reject the null hypothesis.
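
Steps 1 to 5 (differences, average ranks for ties, signed ranks and the two rank sums) can be sketched as below. The function name `signed_rank_sums` and the sample data are hypothetical, introduced only for this illustration. Ties among absolute differences are detected by exact equality, which suits integer-valued data; the critical value still has to be looked up in Table II.

```python
def signed_rank_sums(observations, median0):
    """Return (t_plus, t_minus, n) for the Wilcoxon signed-rank test."""
    # Step 1: differences, dropping observations equal to median0.
    diffs = [x - median0 for x in observations if x != median0]
    n = len(diffs)

    # Steps 2-3: rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + 1 + j + 1) / 2  # mean of ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Steps 4-5: attach the signs and sum positive and negative ranks.
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus, n

# Hypothetical data (not from the text), postulated median 10.
sample = [12, 9, 15, 10, 7, 14, 10, 16]
t_plus, t_minus, n = signed_rank_sums(sample, 10)
print(t_plus, t_minus, n, min(t_plus, t_minus))
```

For a two-tailed test, the last printed value is the statistic T that is compared with the tabulated critical value.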
For large sample (n > 25):
For a large sample size n greater than 25, the distribution of the test statistic (T) is approximated by a normal distribution with mean
$E(T) = \frac{n(n+1)}{4}$ … (4)
and variance
$\mathrm{Var}(T) = \frac{n(n+1)(2n+1)}{24}$ … (5)
The proof of the mean and variance of T is beyond the scope of this course.
Therefore in this case, we use the normal test (Z-test). The test statistic of the Z-test is given by
$Z = \frac{T - E(T)}{SE(T)} = \frac{T - E(T)}{\sqrt{\mathrm{Var}(T)}} = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} \sim N(0, 1)$ [Using equations (4) and (5)] … (6)
After that, we calculate the value of the test statistic Z and compare it with the critical value given in Table 10.1 at the prefixed level of significance α. We take the decision about the null hypothesis as described in Section 10.2 of Unit 10 of this course.
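
Equations (4) to (6) translate directly into code. The sketch below uses a hypothetical function name `wilcoxon_z` and hypothetical inputs, purely to illustrate the formula.

```python
from math import sqrt

def wilcoxon_z(t, n):
    """Z statistic of equation (6) for the Wilcoxon signed-rank test."""
    mean = n * (n + 1) / 4                      # equation (4)
    variance = n * (n + 1) * (2 * n + 1) / 24   # equation (5)
    return (t - mean) / sqrt(variance)

# Hypothetical values (not from the text): T = 120 with n = 30.
z = wilcoxon_z(120, 30)
print(z)  # compare with the critical value, e.g. -1.96 and 1.96 for a two-tailed 5% test
```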
Now, it is time to do some examples based on above test.
Example 4: A random sample of 15 children of one month or older shows the
following pulse rates (beats per minute):
119, 120, 125, 122, 118, 117, 126, 114, 115, 126, 121, 120, 124, 127,
126
Assuming that the distribution of pulse rate is symmetric about its median and
continuous, is there evidence to suggest that the median pulse rate of one
month or older children is 120 beats per minute at 5% level of significance?
Solution: Here, the distribution of pulse rate is not given, so the assumption of normality for the t-test is not fulfilled, although all the assumptions of the Wilcoxon signed-rank test hold. So we go for this test.
Here, we want to test that the median pulse rate ($\tilde{\theta}$) of children of one month or older is 120 beats per minute. So our claim is $\tilde{\theta} = 120$ and its complement is $\tilde{\theta} \neq 120$. Since the claim contains the equality sign, we take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
$H_0: \tilde{\theta} = \tilde{\theta}_0 = 120$ (median pulse rate is equal to 120)
$H_1: \tilde{\theta} \neq 120$ (median pulse rate is not equal to 120)
Since the alternative hypothesis is two-tailed, the test is two-tailed and the test statistic is the smaller of the sum of positive ranks (T+) and the sum of negative ranks (T−), that is,
$T = \min\{T^+, T^-\}$

Calculation for T:
S. No.   Beats per     Difference     Absolute value      Rank of   Signed
         minute (X)    d = X − 120    of difference |d|   |d|       rank
1        119           −1             1                   1.5       −1.5
2        120           Tie            ---                 ---       ---
3        125           5              5                   7.5       7.5
4        122           2              2                   3.5       3.5
5        118           −2             2                   3.5       −3.5
6        117           −3             3                   5         −5
7        126           6              6                   10.5      10.5
8        114           −6             6                   10.5      −10.5
9        115           −5             5                   7.5       −7.5
10       126           6              6                   10.5      10.5
11       121           1              1                   1.5       1.5
12       120           Tie            ---                 ---       ---
13       124           4              4                   6         6
14       127           7              7                   13        13
15       126           6              6                   10.5      10.5

From the above calculations, we have
T+ = 7.5 + 3.5 + 10.5 + 10.5 + 1.5 + 6 + 13 + 10.5 = 63.0
T− = 1.5 + 3.5 + 5 + 10.5 + 7.5 = 28.0
n = number of non-zero $d_i$'s = 13
Putting the values in the test statistic, we have
$T = \min\{T^+, T^-\} = \min\{63.0, 28.0\} = 28.0$
The critical (tabulated) value for a two-tailed test corresponding to n = 13 at 5% level of significance is 18.
Since the calculated value of the test statistic T (= 28.0) is greater than the critical value (= 18), we do not reject the null hypothesis, i.e. we support the claim at 5% level of significance.
Thus, we conclude that the sample fails to provide sufficient evidence against the claim, so we may assume that the median pulse rate of children of one month or older is 120 beats per minute.
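
A quick arithmetic check on the rank sums of Example 4: since the ranks 1, 2, ..., n are shared out between the positive and negative differences (average ranks for ties preserve the total), the two sums must add up to n(n + 1)/2. With n = 13 this gives:

```latex
T^{+} + T^{-} = 63.0 + 28.0 = 91 = \frac{13 \times 14}{2} = \frac{n(n+1)}{2}
```

The same identity can be used to catch ranking mistakes in any signed-rank calculation.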
Example 5: The following data show the weight (in kg) of a random sample
of 30 cadets of a college:
49 46 57 37 45 57 50 65 34 50 58
46 47 42 53 67 40 52 49 66 53 48
68 40 53 49 61 54 48 38
Assuming that the distribution of the weight of the cadets is symmetric about its
median, examine whether the median weight of all the cadets of the
college is 50 kg by using the Wilcoxon signed-rank test at 5% level of significance.
Solution: Here, the distribution of the weight of the cadets is not given, so the assumption of normality for the t-test is not fulfilled. We are given that the distribution of the weight of the cadets is symmetric about its median, and the assumption of continuity holds because weight is continuous in nature. So we go for the Wilcoxon signed-rank test.
Here, we want to test that the median weight ($\tilde{\theta}$) of all the cadets of a college is 50 kg. So our claim is $\tilde{\theta} = 50$ and its complement is $\tilde{\theta} \neq 50$. Since the claim contains the equality sign, we take the claim as the null hypothesis and the complement as the alternative hypothesis. Thus,
$H_0: \tilde{\theta} = \tilde{\theta}_0 = 50$ (median weight is equal to 50 kg)
$H_1: \tilde{\theta} \neq 50$ (median weight is not equal to 50 kg)
Since the alternative hypothesis is two-tailed, the test is a two-tailed test and the test statistic is the smaller of the sum of positive ranks (T+) and the sum of negative ranks (T−), that is,
$T = \min\{T^+, T^-\}$
Calculation for T:
S. No.   Weight        Difference     Absolute value      Rank of   Signed
         in kg (X)     d = X − 50     of difference |d|   |d|       rank
1        49            −1             1                   2         −2
2        46            −4             4                   12        −12
3        57            7              7                   15.5      15.5
4        37            −13            13                  23        −23
5        45            −5             5                   14        −14
6        57            7              7                   15.5      15.5
7        50            Tie            ---                 ---       ---
8        65            15             15                  24        24
9        34            −16            16                  25.5      −25.5
10       50            Tie            ---                 ---       ---
11       58            8              8                   17.5      17.5
12       46            −4             4                   12        −12
13       47            −3             3                   8.5       −8.5
14       42            −8             8                   17.5      −17.5
15       53            3              3                   8.5       8.5
16       67            17             17                  27        27
17       40            −10            10                  19.5      −19.5
18       52            2              2                   5         5
19       49            −1             1                   2         −2
20       66            16             16                  25.5      25.5
21       53            3              3                   8.5       8.5
22       48            −2             2                   5         −5
23       68            18             18                  28        28
24       40            −10            10                  19.5      −19.5
25       53            3              3                   8.5       8.5
26       49            −1             1                   2         −2
27       61            11             11                  21        21
28       54            4              4                   12        12
29       48            −2             2                   5         −5
30       38            −12            12                  22        −22

From the above calculations, we have
T+ = 15.5 + 15.5 + 24 + 17.5 + 8.5 + 27 + 5 + 25.5 + 8.5 + 28 + 8.5 + 21 + 12 = 216.5
T− = 2 + 12 + 23 + 14 + 25.5 + 12 + 8.5 + 17.5 + 19.5 + 2 + 5 + 19.5 + 2 + 5 + 22 = 189.5
n = number of non-zero $d_i$'s = 28
Putting the values in the test statistic, we have
$T = \min\{T^+, T^-\} = \min\{216.5, 189.5\} = 189.5$
Also n = 28 (> 25); therefore, it is the case of large sample. So in this case, we use the Z-test. The test statistic of the Z-test is given by
$Z = \frac{T - E(T)}{SE(T)} \sim N(0, 1)$
where $E(T) = \frac{n(n+1)}{4} = \frac{28(28+1)}{4} = 203$ and
$SE(T) = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{28(28+1)(2 \times 28+1)}{24}} = 43.91$
Putting the values in the test statistic Z, we have
$Z = \frac{189.5 - 203}{43.91} = -0.31$
The critical (tabulated) values for a two-tailed test at 5% level of significance are $\pm z_{\alpha/2} = \pm z_{0.025} = \pm 1.96$.
Since the calculated value of Z (= −0.31) is greater than the critical value (= −1.96) and less than the critical value (= 1.96), it lies in the non-rejection region, so we do not reject the null hypothesis, i.e. we support the claim.
Thus, we conclude that the sample fails to provide sufficient evidence against the claim, so we may assume that the median weight of all the cadets of the college is 50 kg.
Now, you can try the following exercises in the same manner.
E5) Give one main difference between sign test and Wilcoxon signed-rank
test.
E6) The observations of a random sample of size 10 from a distribution
which is continuous and symmetric about its median are given below:
20.2, 24.1, 21.3, 17.2, 19.8, 16.5, 21.8, 18.7, 17.1, 19.9
Use Wilcoxon test to test the hypothesis that the sample is taken from a
population having median greater than 18 at 5% level of significance.
E7) The breaking strength (in pounds) of a random sample of 29 rods made
by a manufacturer is given as follows:
19 30 28 35 23 37 36 32 40 24
31 33 21 36 30 32 26 40 38 28
30 41 22 43 25 28 24 37 17
On the basis of this sample, test the manufacturer's claim that the
average (median) breaking strength of the rods is 30 pounds by the
Wilcoxon signed-rank test at 1% level of significance. Assume that the
breaking strength of the rods is symmetric about its median.
* Cone bation
s eleads coth
d coveotien
data (Smvslwng ot ; ).elamerds
>Ahe eloton ship Gls too
’Here effeels -the-omatho
Uaouable t tndame
tas. a Correlaton
H
Vboiabte
deiatien (Pestive
ypes et omelabon:
postive. Conredion - Sne ire ctien A A()
Coaten. y pasitve may
be
Sides
r hond
toclose Vemy eeey
i oe 2o¢h plels th,e ’f
vit -the toucoom
d's falling
3 |2
2
9 iogpa) Satev *
Speoomon'3
Ronk (S)
Correlbbon
(SR) octhcients kal ()
ouieient peansen's
k lterei of Sstte )
heobsevng byQiaqa
oriebh
Method *
of
'y
latben Come Corels'
to on. Sinea Cureo
anelatog) imeat No- 4.
(inson data -the motrhing btim Come 3.irea
(otadtionOott Nàynive 3-
Kanl peoasen's (aefeient d qormvelaton:
o
(ovl*.9)

No ot acseg
bued praluet

2
4464
69 42 289
58 8).25
26
1222 2-25
84 5) 4284 Syp25
62 15 930 306. 25

45 15 G5 306.2s

214

y=32-5
+153.25

(1 844)

JI48425
I 12 182

(2r96)
* 268-25

l6.848

= 153-5
(9 634)
153-85
199-516
tles blw bey

-there is a 4ve
Cemelaom blo
Veos X latinip.
Cntibution y
Cortibet on|

1 S00 So0

00
o-089
5

300
54289
2 900
50 |21-8099
45 2-4889
525
420
3405
6-53 y= 184-16 05 A-332 16l o20-8

Cov a9 Lay-(9)

-(sv05)- (59)(ey 16)


56 -5- 98|-5+
ov (ay) =-914-25

=y268368
-l63-8I9
286
(7-26(I63-21)

Hene ies b 120.


. s 1
ngetve Comleien blw dreethor
thot meom s devates n oppes te

pen aiptta debla 343,225.

1424
1898 |I6 6066

445
1842
661 1314 B7653744904
-8916 441lo67,630
6, 545

*= 6545
Cov .9)g- 0)
(12 47on)- (130) (t68 32)
2249421-8 - 2,2033082

338

-yIGqct-624y
(ro6t65)
-n8326
-J213526

(u63-o8(p8.3)
YG13
5916)-02
Caimes
X
29 992
382-8
16 96
64 15 960
21 210
|4G6.12
25 61 246-204l
1525
35 20

- yo3 2285-132l95

Y=25.4l ý=21

53-285-589.9 ]
Cov 9) 33-375

T-85
V ) 9e95)
=V826.y9 V313 -54
18.067

38-345 33 345
(1s06)f.4o) 319-7859 tis 4ve
|0.04
Regression

Simple regression involves two variables: a dependent variable and an independent variable.

Regression coefficients:
$$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2}, \qquad b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2}.$$

Regression line of y on x:
$$y - \bar{y} = b_{yx}(x - \bar{x}).$$

Regression line of x on y:
$$x - \bar{x} = b_{xy}(y - \bar{y}).$$

[Handwritten worked example on regression: the data table and arithmetic are not legible.]
Further Mathematics AS Unit 2: Further Statistics A

The Exponential Distribution


The exponential distribution with parameter 𝜆 can be used to model the time between two
successive events in a Poisson process in which events occur at a mean rate of 𝜆 per unit time.

Specification Content
• Statistical distributions: exponential distribution
• Find and use the mean and variance of an exponential distribution
– knowledge and use of: If $Y \sim \mathrm{Exp}(\lambda)$, then $E(Y) = \frac{1}{\lambda}$ and $\mathrm{Var}(Y) = \frac{1}{\lambda^2}$.
• Use the exponential distribution as a model for intervals between events
– learners will be expected to know that $\frac{d}{dx}\left(e^{kx}\right) = k e^{kx}$.

Probability Density Function


The probability density function for the exponential distribution is:
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

Here, 𝜆 is a parameter for the distribution, which is constant. If a random variable 𝑋 has this
distribution, we write 𝑋~Exp(𝜆).

Mean and Variance


The mean of the exponential distribution is $\frac{1}{\lambda}$. The standard deviation is also $\frac{1}{\lambda}$. Therefore, the
variance is $\frac{1}{\lambda^2}$. Let $X \sim \mathrm{Exp}(\lambda)$, then:
$$E(X) = \frac{1}{\lambda}, \qquad \mathrm{Var}(X) = \frac{1}{\lambda^2}, \qquad \mathrm{SD}(X) = \frac{1}{\lambda}.$$

These results can be derived using integration, but the derivations are not required for this
unit, because they need knowledge of integration from Mathematics A2 Unit 3 and of improper
integrals from Further Mathematics A2 Unit 4.
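
As a rough numerical check of the mean and variance results, the defining integrals can be approximated in Python. This is only a sketch, not a derivation: the function name `exp_moments_numeric`, the midpoint rule and the decision to truncate the integral at 50/λ are all assumptions made for illustration.

```python
import math


def exp_moments_numeric(lam, steps=200_000):
    """Approximate the mean and variance of Exp(lam) by midpoint integration.

    Assumption: the upper limit 50/lam stands in for infinity; the tail
    beyond it has probability exp(-50), which is negligible.
    """
    upper = 50.0 / lam
    h = upper / steps
    mean = 0.0
    second_moment = 0.0
    for i in range(steps):
        x = (i + 0.5) * h                      # midpoint of the i-th strip
        density = lam * math.exp(-lam * x)     # f(x) for x >= 0
        mean += x * density * h                # contributes to E(X)
        second_moment += x * x * density * h   # contributes to E(X^2)
    variance = second_moment - mean ** 2       # Var(X) = E(X^2) - E(X)^2
    return mean, variance


mean, variance = exp_moments_numeric(2.0)
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

For λ = 2 the printed values should agree with 1/λ = 0.5 and 1/λ² = 0.25 to the precision shown.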

Derivative and integral of 𝑒 𝑘𝑥


For this module, it is important to know the following derivative:
$$\frac{d}{dx}\left(e^{kx}\right) = k e^{kx}.$$

Conversely, we have that:

$$\int k e^{kx}\,dx = e^{kx} + C \quad \text{and} \quad \int e^{kx}\,dx = \frac{e^{kx}}{k} + C.$$

Cumulative Distribution Function


We derive the cumulative distribution function as follows:
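$$\begin{aligned}
F(x) &= \int_0^x \lambda e^{-\lambda t}\,dt \\
&= \left[\frac{\lambda e^{-\lambda t}}{-\lambda}\right]_0^x \\
&= \left[-e^{-\lambda t}\right]_0^x \\
&= -e^{-\lambda x} + e^{0} \\
&= 1 - e^{-\lambda x}.
\end{aligned}$$

Therefore, the cumulative distribution function is given by:

$$F(x) = P(X \le x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$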


Calculating Probabilities for Continuous Random Variables


A random variable that follows the exponential distribution is an example of a continuous
random variable. The rules for calculating probabilities using continuous random variables are
different to those for discrete random variables. Let 𝑋 be a continuous random variable and let 𝑎, 𝑏, 𝑐 and 𝑑 be constants.
Then:
• 𝑃(𝑋 = 𝑎) = 0
• 𝑃(𝑋 ≥ 𝑏) = 𝑃(𝑋 > 𝑏) = 1 − 𝑃(𝑋 ≤ 𝑏)
• 𝑃(𝑐 ≤ 𝑋 ≤ 𝑑) = 𝑃(𝑐 < 𝑋 ≤ 𝑑) = 𝑃(𝑐 ≤ 𝑋 < 𝑑) = 𝑃(𝑐 < 𝑋 < 𝑑) = 𝑃(𝑋 ≤ 𝑑) − 𝑃(𝑋 ≤ 𝑐).

Worked Example 1
The interval, 𝑋 seconds, between cars passing a point on a motorway follows an exponential
distribution with probability density function
$$f(x) = \begin{cases} 2e^{-2x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$
(i) State the mean and variance of 𝑋.
(ii) State the cumulative distribution function.
(iii) Calculate the probability that (give all answers to 3 significant figures):
a. The interval until the next car passes is between 1 and 2 seconds.
b. The interval until the next car passes is longer than 3 seconds.
c. The interval until the next car passes is less than 1.5 seconds.
(iv) State a distribution that could be used to model the number of cars passing the
point each second, giving the values of any parameters.

Solution:
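(i) This is an exponential distribution with parameter $\lambda = 2$. Therefore, the mean and variance are
$$E(X) = \frac{1}{\lambda} = \frac{1}{2} \quad \text{and} \quad \mathrm{Var}(X) = \frac{1}{\lambda^2} = \frac{1}{4}.$$

(ii) The cumulative distribution function is
$$F(x) = \begin{cases} 1 - e^{-2x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$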

(iii) Each probability can be calculated using the cumulative distribution function.
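a. We calculate $P(1 \le X \le 2)$ as follows:
$$\begin{aligned}
P(1 \le X \le 2) &= P(X \le 2) - P(X \le 1) \\
&= F(2) - F(1) \\
&= (1 - e^{-4}) - (1 - e^{-2}) \\
&= e^{-2} - e^{-4} \\
&= 0.117.
\end{aligned}$$
b. We calculate $P(X > 3)$ as follows:
$$\begin{aligned}
P(X > 3) &= 1 - P(X \le 3) \\
&= 1 - F(3) \\
&= 1 - (1 - e^{-6}) \\
&= e^{-6} \\
&= 0.00248.
\end{aligned}$$
c. We calculate $P(X < 1.5)$ as follows:
$$\begin{aligned}
P(X < 1.5) &= F(1.5) \\
&= 1 - e^{-3} \\
&= 0.950.
\end{aligned}$$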

(iv) The Poisson distribution with mean 2 could be used to model the number of cars
passing the point each second.
Note: the exponential distribution says that the mean length of an interval is 0.5
seconds. This could correspond to an average of 2 cars passing the point every
second, as specified in the corresponding Poisson distribution.
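
The three probabilities in part (iii) can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `exp_cdf` and `round_sig` are illustrative choices, not part of the original notes.

```python
import math


def exp_cdf(x, lam):
    """F(x) = P(X <= x) for X ~ Exp(lam); zero for negative x."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def round_sig(value, figures=3):
    """Round to a given number of significant figures."""
    return float(f"{value:.{figures}g}")


lam = 2.0  # parameter from Worked Example 1

p_between = exp_cdf(2, lam) - exp_cdf(1, lam)   # P(1 <= X <= 2)
p_longer = 1.0 - exp_cdf(3, lam)                # P(X > 3)
p_shorter = exp_cdf(1.5, lam)                   # P(X < 1.5)

print(round_sig(p_between), round_sig(p_longer), round_sig(p_shorter))
```

Because 𝑋 is continuous, the same code covers strict and non-strict inequalities: 𝑃(𝑋 < 1.5) and 𝑃(𝑋 ≤ 1.5) are both 𝐹(1.5).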

Worked Example 2
The number of potholes on a random 1 km section of rural road has a Poisson distribution
with mean 1.6. Let 𝑋 km be the distance between successive potholes on the road.
(i) Show that, for 𝑥 ≥ 0,
𝑃(𝑋 > 𝑥) = 𝑒 −1.6𝑥 .
(ii) Derive the cumulative distribution function (CDF) and probability density function
(PDF) for 𝑋.
(iii) Calculate the mean and median distance between successive potholes. State which
of the mean or median is larger, and what this suggests about the distribution of the
distance between successive potholes.

Solution:
(i) The number of potholes on a randomly chosen 1 km stretch of rural road can be
modelled by Po(1.6). We know that 𝑋 is the distance between two successive potholes. If
𝑋 > 𝑥, this means there are no potholes in the first 𝑥 km. The number of potholes in the first 𝑥
km can be modelled using 𝑌~Po(1.6𝑥).

Hence:
$$P(X > x) = P(Y = 0) = \frac{e^{-1.6x} \times (1.6x)^0}{0!} = e^{-1.6x}.$$

(ii) The CDF can be obtained using $F(x) = P(X \le x) = 1 - P(X > x) = 1 - e^{-1.6x}$.

Therefore:
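$$F(x) = \begin{cases} 1 - e^{-1.6x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

The PDF can then be derived by differentiating, i.e.

$$f(x) = F'(x) = 0 - \left(-1.6e^{-1.6x}\right) = 1.6e^{-1.6x}.$$

Hence:
$$f(x) = \begin{cases} 1.6e^{-1.6x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

(iii) The mean distance is $\frac{1}{\lambda} = \frac{1}{1.6} = 0.625$ km = 625 m.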

The median, 𝑚, satisfies 𝐹(𝑚) = 0.5. Hence:


$$\begin{aligned}
1 - e^{-1.6m} &= 0.5 \\
e^{-1.6m} &= 0.5 \\
-1.6m &= \ln 0.5 \\
m &= \frac{\ln 0.5}{-1.6} \\
m &= 0.433 \text{ km}.
\end{aligned}$$
Therefore, the median distance is 433 m.

The median distance of 433 m is smaller than the mean distance of 625 m. This suggests
that the distribution of the distance between successive potholes is skewed to the right
(positively skewed).
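
The link between the Poisson count and the exponential distance, and the comparison of mean and median, can be checked numerically. The sketch below uses only the standard library; the helper names `poisson_pmf` and `survival` and the test distance of 0.75 km are illustrative assumptions, not values from the original notes.

```python
import math

rate = 1.6  # potholes per km, from Worked Example 2


def poisson_pmf(k, mean):
    """P(Y = k) for Y ~ Po(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def survival(x, lam):
    """P(X > x) for X ~ Exp(lam), x >= 0."""
    return math.exp(-lam * x)


# Part (i): "no potholes in x km" is the same event as X > x.
x = 0.75  # illustrative distance in km (an arbitrary choice)
assert math.isclose(poisson_pmf(0, rate * x), survival(x, rate))

# Part (iii): the mean is 1/lambda; the median solves 1 - exp(-lambda*m) = 0.5.
mean_km = 1.0 / rate
median_km = math.log(2) / rate   # same value as ln(0.5) / (-lambda)

print(f"mean = {mean_km:.3f} km, median = {median_km:.3f} km")
```

Writing the median as ln 2 / 𝜆 shows that it is always about 69% of the mean 1/𝜆, so the median is below the mean for every exponential distribution, not only for this one.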


Worked Example 3
A monitor issues a warning signal when an action is needed as part of a production process.
The interval, 𝑋 hours, between successive signals follows an exponential distribution with
parameter 0.08.
(i) Find the probability that the interval between the next two signals is:
a. Between 10 and 20 hours;
b. Less than two hours;
c. Longer than 50 hours.
(ii) State the mean and standard deviation of the intervals between successive
signals.
(iii) Following a warning signal, what is the longest time the production process could
be left unsupervised whilst ensuring the probability of missing the next signal is
less than 0.01?

Solution:
(i) We have an exponential distribution with parameter $\lambda = 0.08$. The cumulative
distribution function is given by
$$F(x) = \begin{cases} 1 - e^{-0.08x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

a. $P(10 \le X \le 20) = F(20) - F(10) = (1 - e^{-1.6}) - (1 - e^{-0.8}) = 0.247$.
b. $P(X < 2) = F(2) = 1 - e^{-0.16} = 0.148$.
c. $P(X > 50) = 1 - P(X \le 50) = 1 - (1 - e^{-4}) = e^{-4} = 0.0183$.

(ii) $E(X) = \frac{1}{\lambda} = \frac{1}{0.08} = 12.5$ hours and $\mathrm{SD}(X) = \frac{1}{\lambda} = 12.5$ hours.

(iii) We wish to find the time, 𝑡, for which:


$$\begin{aligned}
P(X < t) &= 0.01 \\
1 - e^{-0.08t} &= 0.01 \\
e^{-0.08t} &= 0.99 \\
-0.08t &= \ln 0.99 \\
t &= \frac{\ln 0.99}{-0.08} \\
t &= 0.1256 \text{ hours} \\
t &= 7.54 \text{ minutes}.
\end{aligned}$$
Therefore, the production process should be left for no longer than 7.54 minutes
to ensure the probability of missing a signal is less than 0.01.
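
Part (iii) amounts to inverting the cumulative distribution function. A minimal Python sketch of that inversion is given below; the function name `exp_quantile` is an illustrative choice and does not come from the original notes.

```python
import math


def exp_quantile(p, lam):
    """Return t such that P(X <= t) = p for X ~ Exp(lam), 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p) / lam


lam = 0.08  # signals per hour, from Worked Example 3
t_hours = exp_quantile(0.01, lam)
t_minutes = 60.0 * t_hours

print(f"t = {t_hours:.4f} hours = {t_minutes:.2f} minutes")
```

The same function gives the median when called with p = 0.5, since the median is the time by which half of the probability has accumulated.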

Summary of Key Points for the Exponential Distribution


If 𝑋~Exp(𝜆), then:
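1. The probability density function 𝑓(𝑥) is:
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

2. The mean, variance and standard deviation are:
$$E(X) = \frac{1}{\lambda}, \qquad \mathrm{Var}(X) = \frac{1}{\lambda^2}, \qquad \mathrm{SD}(X) = \frac{1}{\lambda}.$$

3. The cumulative distribution function 𝐹(𝑥) is:
$$F(x) = P(X \le x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

Useful Results:
$$\frac{d}{dx}\left(e^{kx}\right) = k e^{kx}, \qquad \int k e^{kx}\,dx = e^{kx} + C, \qquad \int e^{kx}\,dx = \frac{e^{kx}}{k} + C.$$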
Weibull distribution

Remark: the Weibull distribution reduces to the exponential distribution when its shape parameter equals 1.

The failure rate is decreasing when the shape parameter is less than 1, constant when it equals 1, and increasing when it is greater than 1.

[Handwritten notes on the Weibull PDF, CDF and mean, and a worked example on the lifetime (in hours) of a spring: the formulas and arithmetic are not legible.]
