Stat Module I

Uploaded by

Biniyam Gizaw
Copyright
© © All Rights Reserved

CHAPTER ONE

SAMPLING AND SAMPLING DISTRIBUTION

Introduction

Dear Students! Welcome to Module I's first chapter.

A sample refers to a smaller, manageable version of a larger group. It is a subset
containing the characteristics of a larger population. Sampling is a process in statistical
analysis where researchers take a predetermined number of observations from a larger
population. The method of sampling depends on the type of analysis being performed.
Samples are used in statistical testing when population sizes are too large for the test to
include all possible members or observations. Sampling has lower costs and faster data
collection than measuring the entire population and can provide insights in cases where it
is infeasible to measure an entire population.

In this regard, this chapter covers two main topics: sampling theory and sampling
distributions. It defines sampling and gives the rationale for selecting a sample survey
over a census survey. It also discusses the classification of sampling techniques and
their subclassifications, and introduces the basic statistical characteristics of sampling
distributions, sample means, and sample proportions.

Objectives of the Chapter

After studying this Chapter, students should be able to:

• understand the meaning of sampling and census survey

• explain the difference and the nature of sampling techniques

• obtain knowledge about the sampling distribution of the sample mean and sample
proportion

Statistics For Management II Page 1


• know when to use the t distribution and when to use the Z distribution.

1.1. Sampling Theory

1.1.1. Basic Definitions

Dear student! Data can be obtained from existing sources or from surveys and
experimental studies designed to collect new data through census (total enumeration of
the population) or sampling. In statistics, population refers to all items that have been
chosen for study, while sample refers to a portion or subset of the population selected.
Some basic concepts of sampling are discussed below.

Population: The total of items about which information is desired. In other words,
population or universe means the entire mass of observations, which is the parent group
from which a sample is to be formed.

The term population or universe conveys a different meaning than the traditional one. In
a census survey, the count of individuals (men, women and children) is known as the
population. But in research methodology, population means a specific group defined by
its characteristics. One example is the secondary school teachers of East Hararge Zone,
who have some specific features (teaching experience, gender, academic qualification,
teaching attitudes, teaching aptitude, etc.). Another example is the high school students
of Harar town, who have some specific characteristics (age group, gender, personality,
scholastic aptitude, academic motivation, etc.).

Population can be classified into two categories: finite and infinite. The population is said
to be finite if it consists of a fixed number of elements so that it is possible to enumerate
it in its totality. Examples of finite populations are the population of a city, the number of
workers in a factory, etc. An infinite population is one in which it is theoretically
impossible to observe all the elements, because the number of items is infinite. An
example of an infinite population is the number of stars in the sky. From practical
considerations, we also use the term infinite population for a population that cannot
be enumerated in a reasonable period of time.

Sample: It is part of the population that represents the characteristics of the population. A
sample refers to a smaller, manageable version of a larger group. It is a subset containing
the characteristics of a larger population. Samples are used in statistical testing when
population sizes are too large for the test to include all possible members or observations.
A sample should represent the population as a whole and not reflect any bias toward a
specific attribute. Figure 1.1 below shows population and sample as well as their
relationship graphically.

Figure 1.1. Population and Sample

Sampling: It is the process of selecting the sample for estimating the population
characteristics. In other words, it is the process of obtaining information about an entire
population by examining only a part of it. Sampling means selecting a sample (a part of
the items) from a population. Mathematically, we can describe samples
and populations by using measures such as the mean, median, mode, and standard
deviation. When these terms describe the characteristics of a sample, they are called
statistics. When they describe the characteristics of a population, they are referred to as
parameters.

Sampling Unit: Elementary units or group of such units which besides being clearly
defined, identifiable and observable, are convenient for purpose of sampling are called
sampling units. For instance, in a family budget enquiry, usually a family is considered as
the sampling unit since it is found to be convenient for sampling and for ascertaining the
required information. In a crop survey, a farm or a group of farms owned or operated by a
household may be considered as the sampling unit.

Sampling Frame: A list containing all sampling units is known as sampling frame.
Sampling frame consists of a list of items from which the sample is to be drawn.

Sample Survey: An investigation in which elaborate information is collected on a sample
basis is known as a sample survey.

Parameter: A characteristic of the population. For example, population mean, proportion,
variance, standard deviation, etc.

Statistic: A characteristic of the sample. For example, sample mean, proportion, etc.

If we are convinced that the sample statistics are accurate estimates of the population
characteristics, we can use sample statistics to estimate the population parameters
without measuring the entirety of the items under study. In order to be consistent,
statisticians use lower case Roman letters to denote sample statistics, and Greek or capital
letters to denote population parameters. Table 1.1 below summarizes the
definitions and the symbols.

Table 1.1 Summary of the difference between populations and samples

                   Population                                  Sample
Definition         Collection of all items being dealt with    Part (subset) of the population
Characteristics    Parameter                                   Statistic
Symbols            Population size = N                         Sample size = n
                   Population mean = μ                         Sample mean = x̄
                   Population standard deviation = σ           Sample standard deviation = s

Target Population: A target population is the entire group about which information is
desired and conclusions are made.

Sampled Population: The population which we actually sample is the sampled
population. It is also called the survey population. Figure 1.2 below shows the relationship
between the target population and the sampled population graphically.

Figure 1.2. Target Population and Sample Population

Sampling With and Without Replacement: Sampling schemes may be without
replacement ('WOR' - no element can be selected more than once in the same sample) or
with replacement ('WR' - an element may appear multiple times in the same sample). For
example, if we catch fish, measure them, and immediately return them to the water before
continuing with the sample, this is a WR design, because we might end up catching and
measuring the same fish more than once. However, if we do not return the fish to the
water (e.g. if we eat the fish), this becomes a WOR design.
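
The difference between the two schemes can be sketched in a few lines of Python. This is only an illustrative sketch: the population of ten labelled fish and the fixed seed are assumptions made for the example, not data from this module.

```python
import random

# Hypothetical population of ten tagged fish (labels are illustrative only).
fish = [f"fish_{i}" for i in range(1, 11)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Without replacement (WOR): no fish can appear twice in the same sample.
wor_sample = rng.sample(fish, k=5)

# With replacement (WR): every draw is made from the full population,
# so the same fish may be caught and measured more than once.
wr_sample = rng.choices(fish, k=5)

print("WOR:", wor_sample)
print("WR: ", wr_sample)
```

A WOR sample of size 5 always contains five different fish, whereas a WR sample of the same size may contain repeats.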

Sample Design: Sample design refers to the plans and methods to be followed in
selecting a sample from the target population, together with the estimation technique
(formula) for computing the sample statistics. These statistics are the estimates used to
infer the population parameters.

Figure 1.3. Sampling Breakdown.

1.1.2. The Need for Sampling

Dear learner! In the previous section of this chapter we discussed the basic concepts of
population and sample. Although there are many advantages to the census method, the
cost, effort and time required to conduct a census survey are very large unless the
population is very small, and in many cases they are so prohibitive that one rarely uses
this method in surveys. Sampling involves an examination of a small portion of the
elementary units in a population. Samples are used in a variety of settings where research
is conducted. Scientists, marketers, government agencies, economists, and research
groups are among those who use samples for their studies and measurements. Using
whole populations for research comes with challenges. Researchers may have problems
gaining ready access to entire populations. And, because of the nature of some studies,
researchers may have difficulties getting the results they need in a timely fashion. This is
why samples are used. Using a smaller number of people who represent the entire
population can still produce valid results while reducing time and resources.

Although a census gives more reliable data, the sampling method is preferred
when:
• the population is very large or infinite, so that it would be impossible to conduct
a census survey;
• quick results are required, in which case a sample survey is more appropriate
than a census survey;
• the study involves destruction of the elementary units under study, so that only
sample testing is appropriate. Items such as light bulbs and ammunition
often must be destroyed as a part of the testing process;
• the cost of conducting a census would be prohibitive, so that it is advisable
to carry out a sample survey; and
• accuracy may be lost because of the large size of the population.
Sampling involves a small portion of the population and therefore requires
very few people for conducting surveys and for data collection and compilation.
This is not so in the census method, where the chances of committing errors
increase.

Activities
- How do you describe the term sample and sampling?

- What are the reasons for studying and implementing sampling?

1.1.3. Stages of Sampling Process

Dear learners! In order to answer the research questions, it is unlikely that the
researcher will be able to collect data from all cases. Thus, there is a need to select a
sample. The entire set of cases from which a researcher's sample is drawn is called the
population. Since researchers have neither the time nor the resources to analyze the entire
population, they apply sampling techniques to reduce the number of cases. The sampling
process comprises several stages:

1. Define the population.


2. Specifying the sampling frame.
3. Specifying the sampling unit.
4. Selection of the sampling method.
5. Determination of sample size.
6. Specifying the sampling plan.
7. Selecting the sample.

Define the Population: Population must be defined in terms of elements, sampling units,
extent and time. Because there is very rarely enough time or money to gather information
from everyone or everything in a population, the goal becomes finding a representative
sample (or subset) of that population.

Sampling Frame: Because the whole population can rarely be reached directly, we seek a
sampling frame which has the property that we can identify every single element and
include any of them in our sample. The most straightforward type of frame is a list of
elements of the population (preferably the entire population) with appropriate contact
information. A sampling frame may be a telephone book, a city directory, an employee
roster, a listing of all students attending a university, or a list of all possible phone
numbers.

Sampling Unit: A sampling unit is a basic unit that contains a single element or a group
of elements of the population to be sampled. The sampling unit selected is often
dependent upon the sampling frame. If a relatively complete and accurate listing of
elements is available (e.g. register of purchasing agents) one may well want to sample
them directly. If no such register is available, one may need to sample companies as the
basic sampling unit.

Sampling Method: The sampling method outlines the way in which the sample units are
to be selected. The choice of the sampling method is influenced by the objectives of the
research, availability of financial resources, time constraints, and the nature of the
problem to be investigated. All sampling methods can be grouped under two distinct
heads, that is, probability and non-probability sampling.

Sample Size: The sample size calculation depends primarily on the type of sampling
design used. However, for all sampling designs, the estimates of the expected sample
characteristics (e.g. mean, proportion or total), the desired level of certainty, and the level
of precision must be clearly specified in advance. The precision desired may be stated
by giving the amount of error that we are willing to tolerate in the
resulting estimates. Common levels of precision are 5% and 10%.
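
As a rough illustration of how certainty and precision drive the sample size, the sketch below uses a common textbook approximation for estimating a proportion from a large population, n = z²p(1 - p)/e². This formula is not given in this module and is brought in here only as an assumption; the function name, the 95% z-value of 1.96, and the conservative choice p = 0.5 are likewise illustrative.

```python
import math

def sample_size_for_proportion(margin_of_error, z=1.96, p=0.5):
    """Approximate sample size for estimating a proportion.

    Assumed formula (not from this module): n = z^2 * p * (1 - p) / e^2,
    for a large population and simple random sampling.
    """
    return math.ceil((z ** 2) * p * (1 - p) / margin_of_error ** 2)

n_5 = sample_size_for_proportion(0.05)    # 5% precision
n_10 = sample_size_for_proportion(0.10)   # 10% precision
print(n_5, n_10)
```

Under these assumptions, tightening the tolerated error from 10% to 5% roughly quadruples the required sample size.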

Sampling Plan: In this step, the specifications and decisions regarding the
implementation of the research process are outlined. As the interviewers and their
co-workers will be on field duty for most of the time, a proper specification of the sampling
plan would make their work easier, and they would not have to revert to their supervisors
with operational problems.

Select the Sample: The final step in the sampling process is the actual selection of the
sample elements. This requires a substantial amount of office and fieldwork, particularly
if personal interviews are involved.

1.1.4. Sampling and Non-sampling Errors

Dear student! Two major types of error can arise when a sample of observations is taken
from a population: sampling error and non-sampling error. Anyone reviewing the results
of sample surveys and studies, as well as statistics practitioners conducting surveys and
applying statistical techniques, should understand the sources of these errors.

Sampling Error

Sampling error refers to differences between the sample and the population that exist
only because of the observations that happened to be selected for the sample. Sampling
error is an error that we expect to occur when we make a statement about a population
that is based only on the observations contained in a sample taken from the population.

To illustrate, suppose that we wish to determine the mean annual income of North
American blue-collar workers. To determine this parameter we would have to ask each
North American blue-collar worker what his or her income is and then calculate the mean
of all the responses. Because the size of this population is several million, the task is both
expensive and impractical. We can use statistical inference to estimate the mean income
of the population if we are willing to accept less than 100% accuracy. We record the
incomes of a sample of the workers and find the mean of this sample of incomes. This
sample mean is an estimate of the desired population mean. But the value of the sample
mean will deviate from the population mean simply by chance, because the value of the
sample mean depends on which incomes just happened to be selected for the sample. The
difference between the true (unknown) value of the population mean and its estimate, the
sample mean, is the sampling error. The size of this deviation may be large simply
because of bad luck: a particularly unrepresentative sample happened to be
selected. The only way we can reduce the expected size of this error is to take a larger
sample.
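
The effect of sample size on sampling error can be seen in a small simulation. The sketch below is only illustrative: the population of 100,000 incomes is generated artificially (mean 40,000 and standard deviation 8,000 are made-up values), and the function name `average_abs_error` is hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 annual incomes (made-up values).
population = [rng.gauss(40_000, 8_000) for _ in range(100_000)]
mu = statistics.fmean(population)  # the population mean (a parameter)

def average_abs_error(n, repetitions=200):
    """Average |sample mean - population mean| over repeated samples of size n."""
    errors = []
    for _ in range(repetitions):
        sample = rng.sample(population, n)  # simple random sample, WOR
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

small = average_abs_error(25)
large = average_abs_error(2_500)
print(round(small), round(large))
```

In this simulated setting the average sampling error for samples of 2,500 is much smaller than for samples of 25, which is the sense in which a larger sample reduces the expected size of the error.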

Given a fixed sample size, the best we can do is to state the probability that the sampling
error is less than a certain amount. It is common today for such a statement to accompany
the results of an opinion poll. If an opinion poll states that, based on sample results, the
incumbent candidate for mayor has the support of 54% of eligible voters in an upcoming
election, the statement may be accompanied by the following explanatory note: “This
percentage is correct to within three percentage points, 19 times out of 20.” This
statement means that we estimate that the actual level of support for the candidate is
between 51% and 57%, and that in the long run this type of procedure is correct 95% of
the time.
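
A statement of this kind can be reproduced approximately with the usual large-sample margin of error for a proportion, z·sqrt(p(1 - p)/n). The sketch below is illustrative only: the poll's sample size is not given in the text, so n = 1,000 is an assumption, as is the use of z = 1.96 for "19 times out of 20".

```python
import math

p_hat = 0.54   # sample proportion reported by the poll
n = 1_000      # assumed sample size (not stated in the text)
z = 1.96       # approximate z-value for 95% confidence

# Large-sample margin of error for a sample proportion.
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin
print(f"margin = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

With the assumed sample size, the margin comes out at roughly three percentage points, which is consistent with the interval of 51% to 57% quoted above.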

Non-sampling Error

Non-sampling error is more serious than sampling error because taking a larger sample
won’t diminish the size, or the possibility of occurrence, of this error. Even a census can
(and probably will) contain non-sampling errors. Non-sampling errors result from
mistakes made in the acquisition of data or from the sample observations being selected
improperly.

Errors in data acquisition - This type of error arises from the recording of incorrect
responses. Incorrect responses may be the result of incorrect measurements being taken
because of faulty equipment, mistakes made during transcription from primary sources,
inaccurate recording of data because terms were misinterpreted, or inaccurate responses
being given to questions concerning sensitive issues such as sexual activity or possible tax
evasion.

Non-response error - Non-response error refers to error (or bias) introduced when
responses are not obtained from some members of the sample. When this happens, the
sample observations that are collected may not be representative of the target population,
resulting in biased results.

Non-response can occur for a number of reasons. An interviewer may be unable to
contact a person listed in the sample, or the sampled person may refuse to respond for
some reason. In either case, responses are not obtained from a sampled person, and bias is
introduced. The problem of non-response is even greater when self-administered
questionnaires are used rather than an interviewer, who can attempt to reduce the
non-response rate by means of callbacks.

Selection bias - Selection bias occurs when the sampling plan is such that some members
of the target population cannot possibly be selected for inclusion in the sample.

Activities
- What are the main stages of sampling process?
- What is sampling error?
- What is non-sampling error?

1.1.5. Types of Sampling

Dear learner! In statistics, there are two methods of selecting samples from populations:
Random or probability sampling, and Non-random, non-probability or judgment
sampling.

(I) Probability (Random) Sampling: - is sampling in which every item (i.e., each
element) in the population has a chance of being chosen in the sample and the
probability of each element of the population being included in the sample is known.
There are several probability sampling techniques, which are discussed later.

(II) Non-probability (Non-random) Sampling: - is a sampling methodology
in which personal knowledge and opinion play a major role in identifying which
elements of the population are to be included in the sample, and the probability
of an element from the population being included in the sample is not known.
Just like probability sampling, non-probability sampling has various
techniques of selecting a sample, which are discussed later.

Probability Sampling

There are a number of techniques for taking a probability sample. Only four
important techniques are discussed here:
1. Simple random sampling.
2. Systematic sampling.
3. Stratified sampling.
4. Cluster sampling.

1. Simple Random Sampling

Simple Random Sampling: - is selecting samples so that each possible sample has an
equal chance of being picked, and each element in the population has the same
probability of being included in the sample, independently of whether some other
element is chosen. Example: Suppose that a restaurant has four branches (N, S, E and W)
and that it wants to select samples of two branches at a time in order to evaluate the
operation of the branches. Using simple random sampling, there are six different samples
of size 2 that can be drawn from the population (i.e., the four branches). These six samples
are (NS); (NE); (NW); (SE); (SW); and (EW). Each sample has probability 1/6 of being
selected, and each branch has probability 1/2 of being included in the sample.
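
The counts in this example can be checked by listing every possible sample. The following is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

branches = ["N", "S", "E", "W"]

# Every possible sample of two branches (order does not matter).
samples = list(combinations(branches, 2))

# Under simple random sampling each sample is equally likely.
prob_each_sample = Fraction(1, len(samples))

# Probability that a given branch is included: add up the probabilities
# of the samples that contain it.
prob_branch = {
    b: sum(prob_each_sample for s in samples if b in s) for b in branches
}

print(samples)
print(prob_each_sample, prob_branch["N"])
```

Each branch appears in three of the six samples, which is where the inclusion probability of 1/2 comes from.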

Put another way, a simple random sample is one in which each element of the
population has an equal and independent chance of being included in the sample. A
sample selected by a randomization method is known as a simple random sample, and the
technique is known as simple random sampling. Randomization can be done using
a number of techniques, such as tossing a coin, throwing a die, the lottery method, the
blindfold method, and random number tables such as Tippett's table.

Advantages
(a) It requires a minimum knowledge of population.
(b) It is free from subjectivity and free from personal error.
(c) It provides appropriate data for our purpose.
(d) The observations of the sample can be used for inferential purpose.

Disadvantages
(a) The representativeness of a sample cannot be ensured by this method.
(b) This method does not use the knowledge about the population.
(c) The inferential accuracy of the finding depends upon the size of the sample.

2. Systematic Sampling

Systematic sampling is an improvement over simple random sampling. This method
requires complete information about the population. There should be a list of all the
individuals of the population, arranged in some systematic way. We then
decide the size of the sample.

Let sample size = n and population size = N. We select every (N/n)th individual from
the list and thus obtain a sample of the desired size, which is known as a systematic sample.
Thus, for this technique of sampling, the population should be arranged in some systematic
way.

Illustration: - Suppose that there are 1000 households in one Kebelle with
different income levels. Suppose also that the statistician/researcher has a randomly
ordered list of all households and wants to study the income disparity in that Kebelle by
taking a sample of 50 households. Since there are 1000 households, the sampling can be
accomplished by taking every 20th household on the list (1000/50 = 20). To determine
which of the first 20 elements to begin with, the statistician/researcher can randomly
choose a number from 1 to 20. Once this number is chosen (let's say 3), the statistician
selects the 3rd, 23rd, 43rd, 63rd, ... households from the list, up to the 983rd.
This kind of sampling is systematic sampling.
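
The selection rule in this illustration can be sketched as follows. The function name `systematic_positions` is hypothetical, and the sketch assumes that N is an exact multiple of n, as in the example.

```python
import random

def systematic_positions(N, n, start):
    """1-based list positions start, start + k, start + 2k, ... with k = N // n."""
    k = N // n  # sampling interval (20 in the illustration)
    return list(range(start, N + 1, k))[:n]

N, n = 1000, 50
rng = random.Random(7)        # fixed seed so the sketch is repeatable
start = rng.randint(1, N // n)  # random start between 1 and 20 inclusive

positions = systematic_positions(N, n, start)
print(start, positions[:4], len(positions))

# The case used in the text: a random start of 3.
print(systematic_positions(1000, 50, 3)[:4])
```

Only the starting point is random; once it is fixed, every other selected position follows from the interval.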

Often systematic sampling is regarded as equivalent to simple random sampling. This is
true only if the elements of the population are in random order on the list, that is, the
elements on the list do not follow any periodicity or other type of pattern.

Advantages
(a) This is a simple method of selecting a sample.
(b) It reduces the field cost.
(c) Inferential statistics may be used.
(d) Sample may be comprehensive and representative of population.
(e) Observations of the sample may be used for drawing conclusions and
generalizations.

Disadvantages

(a) This is not free from error, since there is subjectivity due to the different ways in
which different individuals may arrange the list. Knowledge of the population is essential.
(b) Information of each individual is essential.
(c) This method can’t ensure the representativeness.
(d) There is a risk in drawing conclusions from the observations of the sample.

3. Stratified Sampling

Stratified Sampling is a sampling technique in which the population is divided into strata
and a random sample is taken from the elements in each stratum. The basic idea in
forming strata is to subdivide the population so that the elements within each stratum are
relatively homogeneous, while relatively greater variation or heterogeneity exists between
strata (or subdivisions) with regard to the characteristics measured.

It is an improvement over the earlier methods. When employing this technique, the
researcher divides the population into strata on the basis of some characteristic and, from
each of these smaller homogeneous groups (strata), draws at random a predetermined
number of units. The researcher should choose the characteristic or criterion which seems
most relevant to the research work.

Illustration: Suppose a researcher wants to study the income inequality situation in Adama
city. The researcher can divide the households into different groups, as follows:
o Civil Servant
o Merchant
o Petty Traders & local drink sellers

Stratified sampling may be of three types:
• Disproportionate stratified sampling.
• Proportionate stratified sampling.
• Optimum allocation stratified sampling.

Disproportionate sampling means that the size of the sample in each unit is not
proportionate to the size of the unit but depends upon considerations involving personal
judgment and convenience. This method of sampling is more effective for comparing
strata which have different error possibilities. It is less efficient for determining
population characteristics.

Proportionate sampling refers to the selection from each sampling unit of a sample that
is proportionate to the size of the unit. Advantages of this procedure include
representativeness with respect to variables used as the basis of classifying categories and
increased chances of being able to make comparisons between strata. Lack of information
on proportion of the population in each category and faulty classification may be listed as
disadvantages of this method.
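
Proportionate allocation can be sketched as a simple calculation. The stratum sizes below are hypothetical figures for the Adama household groups used earlier, and the function name `proportionate_allocation` is illustrative.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate a total sample of n to strata in proportion to their sizes.

    Note: simple rounding is used, so in general the allocated sizes may
    not add up exactly to n.
    """
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

# Hypothetical numbers of households in each stratum (not from the module).
strata = {
    "Civil servants": 5000,
    "Merchants": 3000,
    "Petty traders and local drink sellers": 2000,
}

allocation = proportionate_allocation(strata, n=200)
print(allocation)
```

With these assumed sizes, a stratum that holds half of the population receives half of the sample, and so on for the other strata.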

Optimum allocation stratified sampling is more representative and comprehensive
than other stratified samples. The number of units selected from each stratum should be
in proportion to the size of the corresponding stratum in the population; many textbooks
additionally take the variability within each stratum into account. A sample obtained in
this way is known as an optimum allocation stratified sample.

Advantages of Stratified Sampling

(a) It is a good representative of the population (more precisely, the third type is).
(b) It is an improvement over the earlier methods.
(c) It is an objective method of sampling.
(d) Observations can be used for inferential purposes.

Disadvantages of Stratified Sampling

(a) A serious disadvantage of this method is that it is difficult for the researcher to decide
the relevant criterion for stratification.
(b) Only one criterion can be used for stratification, although more than one criterion
generally seems relevant.
(c) It is a costly and time-consuming method.
(d) The selected sample may be representative with reference to the criterion used but not
with reference to others.
(e) There is a risk in generalization.

4. Cluster Sampling

Cluster Sampling: - is sampling in which one divides the elements in the population
into a number of clusters or groups. One then begins by choosing at random a sample of
these clusters, after which a simple random sample of the elements in each chosen cluster
is selected. This is sometimes referred to as two-stage cluster sampling. Selecting the
intact group as a whole is known as cluster sampling. In cluster sampling the sampling
units are groups of elements (clusters) instead of individual members or items in the
population.

Illustration: Consider again the study of income disparity in Adama. In this
case, Adama city is classified by locality (i.e., into the northern part, the southern part,
etc.). Once the city is classified into various clusters, some of the clusters (i.e., localities
in our case) are chosen at random, and the researcher then randomly selects
elements from each chosen cluster.
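
The two stages described above can be sketched as follows. The locality names, the household identifiers, and the numbers of clusters and households selected are all hypothetical choices made for this example.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical localities (clusters) of the city, each with household IDs.
clusters = {
    "North": [f"N{i}" for i in range(1, 41)],
    "South": [f"S{i}" for i in range(1, 31)],
    "East": [f"E{i}" for i in range(1, 51)],
    "West": [f"W{i}" for i in range(1, 21)],
}

# Stage 1: choose some of the clusters at random.
chosen = rng.sample(sorted(clusters), k=2)

# Stage 2: take a simple random sample of households within each chosen cluster.
sample = {name: rng.sample(clusters[name], k=5) for name in chosen}

print(chosen)
print(sample)
```

Households in localities that are not chosen at the first stage cannot enter the sample, which is why only a list of clusters, rather than a full list of households, is needed to begin.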

Advantages
(a) It may be a good representative of the population.
(b) It is an easy method.
(c) It is an economical method.
(d) It is practicable and highly applicable in education.
(e) Observations can be used for inferential purpose.

Disadvantages
(a) Cluster sampling is not free from error.
(b) It is not comprehensive.

Non-probability Sampling Techniques

1. Incidental or Accidental Assignment

The term incidental or accidental is applied to samples that are taken because they are
most readily available, i.e., it refers to groups which are used as samples of a
population because they are readily available or because the researcher is unable to
employ more acceptable sampling methods.

Advantages
(a) It is very easy method of sampling.
(b) It is frequently used in behavioral sciences.
(c) It reduces the time, money and energy i.e., it is an economical method.

Disadvantages
(a) It is not a representative of the population.
(b) It is not free from error.
(c) Parametric statistics cannot be used.

2. Judgment Sampling

This involves the selection of a group from the population on the basis of available
information, thought to be representative of the total population, or the selection of a
group by intuition on the basis of a criterion deemed to be self-evident. Since the sample
depends on the investigator's own judgment, this sampling is highly risky.

Advantages
(a) Knowledge of the investigator can be best used in this technique of sampling.
(b) This technique of sampling is also economical.

Disadvantages
(a) This technique is subjective.
(b) It is not free from error.
(c) It includes uncontrolled variation.
(d) Inferential statistics cannot be used for the observations of this sampling, so
generalization is not possible.

3. Purposive Sampling

A purposive sample is selected by some arbitrary method because it is known to be
representative of the total population, or because it is known that it will produce
well-matched groups. The idea is to pick out the sample in relation to some criteria which
are considered important for the particular study. This method is appropriate when the
study places special emphasis upon the control of certain specific variables.

Advantages
(a) Use of the best available knowledge concerning the sample subjects.
(b) Better control of significant variables.
(c) Sample groups data can be easily matched.
(d) Homogeneity of subjects used in the sample.

Disadvantages
(a) Reliability of the criterion is questionable.
(b) Knowledge of population is essential.
(c) Errors in classifying sampling subjects.
(d) Inability to utilize the inferential parametric statistics.
(e) Inability to make generalization concerning total population.

4. Quota Sampling

This combines both judgment sampling and probability sampling. The population is
classified into several categories; on the basis of judgment, assumption or previous
knowledge, the proportion of the population falling into each category is decided.
Thereafter a quota of cases to be drawn is fixed and the observer is allowed to sample as
he likes. Quota sampling is very arbitrary and is likely to figure in municipal surveys.

Advantages
(a) It is an improvement over the judgment sampling.
(b) It is an easy sampling technique.
(c) It is most frequently used in social surveys.

Disadvantages
(a) It is not a representative sample.
(b) It is not free from error.
(c) It has the influence of regional, geographical and social factors.

A research design is a plan by which research samples may be selected from a population and under which experimental treatments are administered and controlled so that their effect upon the sample may be measured. Therefore, a second step in the establishment of an experimental design is to select the treatments that will be used to control sources of learning change in the sample subjects.

Activities
- What are probability and non-probability sampling?

- Explain the types of probability and non-probability sampling.

- List the advantages and disadvantages of each type of probability and non-probability sampling.

1.2. Sampling Distribution

Dear student! So far, we have examined how samples can be taken from a population. If we take several samples from a population using one of the sampling techniques already discussed, the statistics we compute for each sample need not be the same and will most likely vary from sample to sample. In this sub-topic we discuss the sampling distribution. A sampling distribution is the probability distribution of all possible values of a sample statistic. There are sampling distributions of the mean, the proportion, and so on. A sampling distribution is created, as the name suggests, by sampling. There are two ways to create one. The first is to actually draw samples of the same size from a population, calculate the statistic of interest for each sample, and then use descriptive techniques to learn more about the sampling distribution. The second relies on the rules of probability and the laws of expected value and variance to derive the sampling distribution.

1.2.1. Sampling Distribution of the Mean

Sampling distribution of the mean: the probability distribution of the sample mean. To illustrate, we take all possible samples of a given size from a population and compute the mean of each sample. The distribution of these sample means is the sampling distribution of the mean, as shown below.

Illustration 1

Suppose that a population has five elements (N = 5): 3, 6, 9, 12 and 15, and that we draw samples of size 3 (n = 3).

Required:

- Formulate the sampling distribution of $\bar{x}$
- Estimate the population mean
- Estimate the mean of the distribution
- Compute the standard deviation of the distribution

Solution

First find the number of possible samples:

$$\binom{N}{n} = \binom{5}{3} = 10$$

The ten possible samples are:

(3, 6, 9)     (3, 6, 12)    (3, 6, 15)    (3, 9, 12)    (3, 9, 15)
(3, 12, 15)   (6, 9, 12)    (6, 9, 15)    (6, 12, 15)   (9, 12, 15)

For each sample we can compute the mean value (i.e., the sample statistic). The following table shows the mean value of each sample.

Sample        Mean ($\bar{x}$)
3, 6, 9       6
3, 6, 12      7
3, 6, 15      8
3, 9, 12      8
3, 9, 15      9
3, 12, 15     10
6, 9, 12      9
6, 9, 15      10
6, 12, 15     11
9, 12, 15     12
Total         $\sum \bar{x} = 90$
We know that the population mean is given as:

$$\mu = \frac{\sum x}{N} = \frac{3 + 6 + 9 + 12 + 15}{5} = \frac{45}{5} = 9$$

This population mean ($\mu$) differs from some of the sample means. This leads us to the concept of the sampling distribution. The sampling distribution of the mean for this example is:

Sample mean ($\bar{x}$)   Frequency   Probability $P(\bar{x})$
6                         1           0.1
7                         1           0.1
8                         2           0.2
9                         2           0.2
10                        2           0.2
11                        1           0.1
12                        1           0.1
Total                     10          1.00

Note: probability equals frequency divided by the total number of samples.

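- Mean of the sampling distribution ($\mu_{\bar{x}}$)

$$\mu_{\bar{x}} = \frac{\sum \bar{x}}{\text{number of samples}} = \frac{6 + 7 + 8 + 8 + 9 + 10 + 9 + 10 + 11 + 12}{10} = \frac{90}{10} = 9$$

- Standard deviation of the distribution ($\sigma_{\bar{x}}$)

The standard deviation of the sampling distribution is obtained from the population standard deviation $\sigma$. First compute the population variance:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{(3-9)^2 + (6-9)^2 + (9-9)^2 + (12-9)^2 + (15-9)^2}{5} = \frac{90}{5} = 18$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{18} = 4.243$$

Because the sample is a large fraction of this small population (n/N = 3/5 > 0.05), the finite population form of the standard error is used:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{4.243}{\sqrt{3}} \sqrt{\frac{5-3}{5-1}} = 1.732$$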
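
The first way of building a sampling distribution (actually drawing every sample) can be checked by brute force for the five-element population of Illustration 1. The following is only a minimal sketch, assuming Python 3.8+ and samples drawn without replacement; the variable names are illustrative and are not part of the module. It compares the standard deviation of the ten sample means with the finite population form of the standard error.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

population = [3, 6, 9, 12, 15]
N = len(population)
n = 3

# Every possible sample of size n drawn without replacement: C(5, 3) = 10 samples
sample_means = [mean(s) for s in combinations(population, n)]

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation (divides by N)

mean_of_means = mean(sample_means)   # mean of the sampling distribution
sd_of_means = pstdev(sample_means)   # standard deviation of the sampling distribution

# Standard error from the formula with the finite population correction factor
se_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print("number of samples:", len(sample_means))
print("population mean:", mu, "| mean of sample means:", mean_of_means)
print("sd of sample means:", round(sd_of_means, 3), "| formula:", round(se_formula, 3))
```

The enumeration and the formula should agree, which illustrates why the correction factor matters when the sample is a large share of the population.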

Characteristics/properties of the sampling distribution of the mean

1. The expected value of the sample mean $E(\bar{x})$ (or the mean of the sample means) is equal to the population mean. Algebraically, $\mu_{\bar{x}} = E(\bar{x}) = \mu$.

2. Given the population mean ($\mu$), the population standard deviation ($\sigma$), the sample size (n) and the population size (N), the standard deviation of the sample mean is given as:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$ for an infinite population.

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$ for a finite population.

A population is said to be infinite when it is not possible to list or count all the elements included in the population (i.e., when the elements are unlimited). When the elements in the population are limited, the population may still be treated as infinite if the sample size is small relative to it. As a rule of thumb, statisticians treat the population as infinite when n ≤ 5% of N and as finite when n > 0.05 N. The value $\sqrt{\frac{N-n}{N-1}}$ is referred to as the finite population correction factor.

3. The sampling distribution of the mean is approximately normal when the sample size is large, regardless of the shape of the population from which the sample is drawn (the central limit theorem). If the population itself is normally distributed, the sampling distribution of the mean is normal for any sample size.

Illustration 2

The average lifetime of a light bulb is 3000 hours with a standard deviation of 696 hours.
A simple random sample of 36 bulbs is taken.

(a) What is the expected value, standard deviation, and shape of the sampling
distribution of x ?

(b) What is the probability that the average lifetime in the sample will be between 2670.56 and 2809.76 hours?

(c) What is the probability that the average lifetime in the sample will be equal to or greater than 3219.24 hours?

(d) What is the probability that the average lifetime in the sample will be equal to or less than 3180.96 hours?

(e) How large a sample needs to be taken to provide a 0.01 probability that the average lifetime in the sample will be equal to or greater than 3219.24 hours?

Solution:

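a) $E(\bar{x}) = \mu = 3000$ hours

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{696}{\sqrt{36}} = 116$$

Because the sample size is large (n = 36), the sampling distribution of $\bar{x}$ is approximately normal.

b) $P(2670.56 \le \bar{x} \le 2809.76)$

$$= P\left(\frac{2670.56 - 3000}{116} \le \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \le \frac{2809.76 - 3000}{116}\right)$$

$$= P(-2.84 \le Z \le -1.64) = 0.0482$$

c) $P(\bar{x} \ge 3219.24)$

$$= P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \ge \frac{3219.24 - 3000}{116}\right)$$

$$= P(Z \ge 1.89) = 0.0294$$

d) $P(\bar{x} \le 3180.96)$

$$= P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \le \frac{3180.96 - 3000}{116}\right)$$

$$= P(Z \le 1.56) = 0.9406$$

e) $0.01 = P(\bar{x} \ge 3219.24)$

$$= P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \ge \frac{3219.24 - 3000}{\sigma_{\bar{x}}}\right) = P\left(Z \ge \frac{219.24}{\sigma_{\bar{x}}}\right)$$

$$Z_{0.01} = 2.33 = \frac{219.24}{696 / \sqrt{n}}$$

$$\sqrt{n} = \frac{2.33 \times 696}{219.24} = 7.397$$

$$n = 54.71 \approx 55$$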
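
The light bulb calculations can be reproduced without a Z table. This is a minimal sketch, assuming Python 3.8+ (for `statistics.NormalDist`) and assuming the normal approximation for the sample mean; the variable names are illustrative. Small differences from the table-based answers come from rounding Z to two decimals.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, sigma, n = 3000, 696, 36

se = sigma / sqrt(n)        # standard error of the mean: 696 / 6 = 116
xbar = NormalDist(mu, se)   # approximate sampling distribution of the sample mean

p_b = xbar.cdf(2809.76) - xbar.cdf(2670.56)   # part (b)
p_c = 1 - xbar.cdf(3219.24)                   # part (c)
p_d = xbar.cdf(3180.96)                       # part (d)

# Part (e): sample size for which P(sample mean >= 3219.24) = 0.01.
# inv_cdf gives the unrounded Z value (about 2.326; the table value is 2.33).
z = NormalDist().inv_cdf(0.99)
n_needed = ceil((z * sigma / (3219.24 - mu)) ** 2)

print(f"(b) {p_b:.4f}  (c) {p_c:.4f}  (d) {p_d:.4f}  (e) n = {n_needed}")
```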

1.2.2. Sampling Distribution of the Difference Between Two Means

Dear student, in the previous section of this chapter we discussed the sampling distribution of the sample mean. Another sampling distribution that you will soon encounter is that of the difference between two sample means. The sampling plan calls for independent random samples drawn from each of two normal populations.

Suppose two populations of size $N_1$ and $N_2$ are given. For each sample of size $n_1$ from the first population, compute the sample mean $\bar{x}_1$ and the standard deviation $\sigma_{\bar{x}_1}$. Similarly, for each sample of size $n_2$ from the second population, compute the sample mean $\bar{x}_2$ and the standard deviation $\sigma_{\bar{x}_2}$.

For all combinations of these samples from these populations, we can obtain the sampling distribution of the difference of two sample means ($\bar{x}_1 - \bar{x}_2$). The mean of this distribution is given by:

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$$

Since the standard error of a sampling distribution is the standard deviation of the sampling distribution, the standard error of the difference between means is:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Just to review the notation, the symbol on the left contains a sigma ($\sigma$), which means it is a standard deviation. The subscript $\bar{x}_1 - \bar{x}_2$ indicates that it is the standard deviation of the sampling distribution of $\bar{x}_1 - \bar{x}_2$.

To convert to the standard normal distribution, we use the formula:

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

We find the Z score by assuming that there is no difference between the population means, that is, $\mu_1 - \mu_2 = 0$.

Illustration

In a study of annual family expenditures for general health care, two populations were
surveyed with the following results:

Population 1: $n_1 = 40$, $\bar{x}_1 = 346$ dollars

Population 2: $n_2 = 35$, $\bar{x}_2 = 320$ dollars

If the variances of the populations are $\sigma_1^2 = 2800$ and $\sigma_2^2 = 3250$, what is the probability of obtaining a sample difference ($\bar{x}_1 - \bar{x}_2$) as large as the one shown if there is no difference in the means of the two populations?

Solution
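$$P(\bar{x}_1 - \bar{x}_2 \ge 26) = P\left(Z \ge \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right)$$

$$= P\left(Z \ge \frac{(346 - 320) - 0}{\sqrt{\frac{2800}{40} + \frac{3250}{35}}}\right) = P(Z \ge 2.04)$$

$$P(Z \ge 2.04) = 0.5000 - 0.4793 = 0.0207$$

The probability that $\bar{x}_1 - \bar{x}_2$ is as large as the observed difference is 0.0207.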
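
The health care expenditure illustration can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ and assuming a zero difference between the population means, as the problem states; the variable names are illustrative. The unrounded Z value gives a probability slightly different from the table-based 0.0207.

```python
from math import sqrt
from statistics import NormalDist

n1, xbar1, var1 = 40, 346, 2800   # sample size, sample mean, population variance
n2, xbar2, var2 = 35, 320, 3250

# Standard error of the difference between two sample means
se_diff = sqrt(var1 / n1 + var2 / n2)

# Z score under the assumption mu1 - mu2 = 0
z = ((xbar1 - xbar2) - 0) / se_diff

# Upper-tail probability of a difference at least this large
p = 1 - NormalDist().cdf(z)

print(f"standard error = {se_diff:.3f}, Z = {z:.2f}, P = {p:.4f}")
```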


1.2.3. Sampling distribution of Proportion

Dear learner, in this part of the chapter we discuss the sampling distribution of the sample proportion. The sample proportion ($\bar{p}$) is the point estimator of the population proportion $P$. The formula for computing the sample proportion is

$$\bar{p} = \frac{x}{n}$$

Where:
x = the number of elements in the sample that possess the characteristic of interest
n = sample size

The sample proportion ($\bar{p}$) is a random variable, and its probability distribution is called the sampling distribution of $\bar{p}$. The sampling distribution of $\bar{p}$ is the probability distribution of all possible values of the sample proportion $\bar{p}$.

Illustration 1

Consider a population of N = 5 numbers: 3, 6, 9, 12 and 15. Let the characteristic of interest be that a number is even. Samples of size 3 (n = 3) are drawn from the population; the samples and their sample proportions are given in the table below.

Required:

- Compute the population proportion
- Formulate the sampling distribution of $\bar{p}$

Solution

- Compute the population proportion

Two of the five numbers (6 and 12) are even, so the proportion of even numbers is $P = \frac{2}{5} = 0.4$.

- Formulate the sampling distribution of $\bar{p}$

First find the number of possible samples:

$$\binom{N}{n} = \binom{5}{3} = 10$$

The ten possible samples are:

(3, 6, 9)     (3, 6, 12)    (3, 6, 15)    (3, 9, 12)    (3, 9, 15)
(3, 12, 15)   (6, 9, 12)    (6, 9, 15)    (6, 12, 15)   (9, 12, 15)

Compute the sample proportion of even numbers for each sample:

Sample        Sample proportion ($\bar{p}$)
3, 6, 9       1/3
3, 6, 12      2/3
3, 6, 15      1/3
3, 9, 12      1/3
3, 9, 15      0/3
3, 12, 15     1/3
6, 9, 12      2/3
6, 9, 15      1/3
6, 12, 15     2/3
9, 12, 15     1/3

From the above table we can construct the probability distribution of the sample proportions, as shown in the table below.

Probability distribution of the sample proportion ($\bar{p}$)

Sample proportion ($\bar{p}$)   Frequency   Probability $P(\bar{p})$
0/3                             1           0.1
1/3                             6           0.6
2/3                             3           0.3
Total                           10          1.0

Note: probability equals frequency divided by the total number of samples.

The probability distribution of the sample proportion ($\bar{p}$) in the above table is the sampling distribution of the proportion, that is, the probability distribution of all possible values of the sample proportion ($\bar{p}$).
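
The sampling distribution of the proportion of even numbers can also be built by enumerating all ten samples. This is a minimal sketch, assuming Python 3 and sampling without replacement; the helper `proportion_even` is a hypothetical name introduced only for this illustration. Exact fractions are used so that the expected value can be compared with the population proportion of 0.4 without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 6, 9, 12, 15]
n = 3

def proportion_even(values):
    """Proportion of even numbers in a collection (hypothetical helper)."""
    return Fraction(sum(1 for v in values if v % 2 == 0), len(values))

samples = list(combinations(population, n))   # all C(5, 3) = 10 samples
counts = Counter(proportion_even(s) for s in samples)

# Sampling distribution: each sample proportion with its probability
distribution = {p: Fraction(c, len(samples)) for p, c in sorted(counts.items())}

expected_value = sum(p * prob for p, prob in distribution.items())
population_proportion = proportion_even(population)

for p, prob in distribution.items():
    print(f"p-bar = {p}  probability = {float(prob):.1f}")
print("E(p-bar) =", expected_value, "| population proportion =", population_proportion)
```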

Properties of the sampling distribution of the proportion ($\bar{p}$)

1. The expected value of the sample proportion ($\bar{p}$) is equal to the population proportion.

Symbolically: $E(\bar{p}) = P$

Where, $E(\bar{p})$ is the expected value of the random variable $\bar{p}$.

$P$ is the population proportion.

2. Just as with the standard deviation of the sample mean ($\sigma_{\bar{x}}$), the standard deviation of the sample proportion ($\sigma_{\bar{p}}$) also depends on whether the population is finite or infinite. It follows that the standard deviation of the sample proportion is:

$$\sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{P(1-P)}{n}}$$ for a finite population (i.e., n > 0.05 N)

$$\sigma_{\bar{p}} = \sqrt{\frac{P(1-P)}{n}}$$ for an infinite population (i.e., n ≤ 0.05 N)

Where, $\sigma_{\bar{p}}$ is the standard deviation of $\bar{p}$.

$P$ is any given population proportion.

Illustration 2

A new soft drink is being market tested. It is estimated that 60% of consumers will like
the new drink. A sample of 96 taste-tested the new drink.

Required:

(a) Determine the standard error of the proportion

(b) What is the probability that equal to or more than 70.4% of consumers will
indicate they like the drink?

(c) What is the probability that equal to or more than 30% of consumers will indicate
they do not like the drink?

Solution

(a) Standard error of the proportion

$$\sigma_{\bar{p}} = \sqrt{\frac{P(1-P)}{n}} = \sqrt{\frac{0.6 \times 0.4}{96}} = 0.05$$

(b) The probability that 70.4% or more of consumers will indicate they like the drink

$$P(\bar{p} \ge 0.704) = P\left(\frac{\bar{p} - P}{\sigma_{\bar{p}}} \ge \frac{0.704 - 0.6}{0.05}\right) = P(Z \ge 2.08) = 0.0188$$

(c) The probability that 30% or more of consumers will indicate they do not like the drink. This is the same as the probability that 70% or less of consumers will indicate they like the drink.

$$P(\bar{p} \le 0.70) = P\left(\frac{\bar{p} - P}{\sigma_{\bar{p}}} \le \frac{0.70 - 0.6}{0.05}\right) = P(Z \le 2.00) = 0.9772$$
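
The soft drink probabilities can be verified as follows. This is a minimal sketch, assuming Python 3.8+ and assuming the normal approximation to the sampling distribution of the sample proportion; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.60, 96   # population proportion and sample size

se = sqrt(P * (1 - P) / n)   # standard error of the proportion
pbar = NormalDist(P, se)     # approximate sampling distribution of the sample proportion

p_b = 1 - pbar.cdf(0.704)    # part (b): at least 70.4% like the drink
p_c = pbar.cdf(0.70)         # part (c): at most 70% like it (at least 30% do not)

print(f"(a) {se:.2f}  (b) {p_b:.4f}  (c) {p_c:.4f}")
```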

1.2.4. Sampling Distribution of the Difference Between Two Proportions

Dear student, do you now understand what the sampling distribution of the sample proportion is? To clarify the sampling distribution of the difference between two proportions, suppose two populations of size $N_1$ and $N_2$ are given. For each sample of size $n_1$ from the first population, compute the sample proportion $\bar{p}_1$ and the standard deviation $\sigma_{\bar{p}_1}$. Similarly, for each sample of size $n_2$ from the second population, compute the sample proportion $\bar{p}_2$ and the standard deviation $\sigma_{\bar{p}_2}$.

For all combinations of these samples from these populations, we can obtain the sampling distribution of the difference of two sample proportions ($\bar{p}_1 - \bar{p}_2$). The mean and the standard deviation of this distribution are given by:

$$\mu_{\bar{p}_1 - \bar{p}_2} = \mu_{\bar{p}_1} - \mu_{\bar{p}_2} = P_1 - P_2$$

$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$

$$Z = \frac{(\bar{p}_1 - \bar{p}_2) - (P_1 - P_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}}$$

If the sample sizes $n_1$ and $n_2$ are large, that is, $n_1 \ge 30$ and $n_2 \ge 30$, the sampling distribution of the difference of two sample proportions is closely approximated by a normal distribution.

Illustration

10% of machines produced by Company A are defective and 5% of those produced by Company B are defective. A random sample of 250 machines is taken from Company A and a random sample of 300 machines is taken from Company B. What is the probability that the difference in sample proportions is less than or equal to 0.02?

Solution

We are given the following information:

$\mu_{\bar{p}_1 - \bar{p}_2} = P_1 - P_2 = 0.10 - 0.05 = 0.05$; $n_1 = 250$ and $n_2 = 300$

The probability that the difference in sample proportions is less than or equal to 0.02 is:

$$P(\bar{p}_1 - \bar{p}_2 \le 0.02) = P\left(Z \le \frac{0.02 - (0.10 - 0.05)}{\sqrt{\frac{0.10 \times 0.90}{250} + \frac{0.05 \times 0.95}{300}}}\right)$$

$$= P(Z \le -1.32) = 0.0934$$

Hence, the desired probability that the difference in sample proportions satisfies $\bar{p}_1 - \bar{p}_2 \le 0.02$ is 0.0934.
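
The defective machines illustration can be checked in the same way. This is a minimal sketch, assuming Python 3.8+ and the normal approximation for the difference of two sample proportions; the variable names are illustrative. The unrounded Z value gives a probability close to, but not exactly, the table-based 0.0934.

```python
from math import sqrt
from statistics import NormalDist

P1, n1 = 0.10, 250   # Company A: defect rate and sample size
P2, n2 = 0.05, 300   # Company B: defect rate and sample size

# Standard error of the difference between two sample proportions
se_diff = sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)

# Z score for an observed difference of 0.02
z = (0.02 - (P1 - P2)) / se_diff

# Lower-tail probability P(p1-bar - p2-bar <= 0.02)
p = NormalDist().cdf(z)

print(f"standard error = {se_diff:.4f}, Z = {z:.2f}, P = {p:.4f}")
```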

Summary

In this chapter we presented the concepts of sampling and sampling distributions. We demonstrated how a simple random sample can be selected from a finite population and how a random sample can be collected from an infinite population. The data collected from such samples can be used to develop point estimates of population parameters. Because different samples provide different values for the point estimators, point estimators such as $\bar{x}$ and $\bar{p}$ are random variables. The probability distribution of such a random variable is called a sampling distribution. In particular, we described the sampling distributions of the sample mean and the sample proportion.

In considering the characteristics of the sampling distributions of $\bar{x}$ and $\bar{p}$, we stated that $E(\bar{x}) = \mu$ and $E(\bar{p}) = P$. After developing the standard deviation or standard error formulas for these estimators, we described the conditions necessary for the sampling distributions of $\bar{x}$ and $\bar{p}$ to follow a normal distribution. Other sampling methods including stratified random sampling, cluster sampling, systematic sampling, convenience sampling, and judgment sampling were discussed.
Glossary

Sampled population – the population from which the sample is taken.

Parameter – a numerical characteristic of a population, such as a population mean μ, a population standard deviation σ, a population proportion P, and so on.

Simple random sample – a simple random sample of size n from a finite population of
size N is a sample selected such that each possible sample of size n has the same
probability of being selected.

Random sample - a random sample from an infinite population is a sample selected such
that the following conditions are satisfied: (1) Each element selected comes from the
same population; (2) each element is selected independently.

Sample statistic – is a sample characteristic such as a sample mean, a sample standard deviation (s), a sample proportion, and so on. The value of the sample statistic is used to estimate the value of the corresponding population parameter.

Target population – the population for which statistical inferences such as point estimates are made. It is important for the target population to correspond as closely as possible to the sampled population.

Sampling distribution – is a probability distribution consisting of all possible values of a sample statistic.

Finite population correction factor – the term $\sqrt{\frac{N-n}{N-1}}$ that is used in the formulas for $\sigma_{\bar{x}}$ and $\sigma_{\bar{p}}$ whenever a finite population, rather than an infinite population, is being sampled. The generally accepted rule of thumb is to ignore the finite population correction factor whenever n/N ≤ 0.05.

Standard error – is the standard deviation of a point estimator.

Central limit theorem – is a theorem that enables one to use the normal probability distribution to approximate the sampling distribution of $\bar{x}$ whenever the sample size is large.

Stratified random sampling – is a probability sampling method in which the population is first divided into strata and a simple random sample is then taken from each stratum.

Cluster sampling – is a probability sampling method in which the population is first divided into clusters and then a simple random sample of the clusters is taken.

Systematic sampling – is a probability sampling method in which we randomly select one of the first K elements and then select every Kth element thereafter.

Judgment sampling – is a non-probability method of sampling whereby elements are selected for the sample based on the judgment of the person doing the study.


Self-Test Questions

1) What is meant by sampling theory?

------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------
2) Which sampling technique is more favorable? Justify your answer.
------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
-----------------------------------------------------

3) What is quota sampling?


------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------


4) The foreman of a bottling plant has observed that the amount of soda in each 32-
ounce bottle is actually a normally distributed random variable, with a mean of 32.2
ounces and a standard deviation of 0.3 ounce.
i. If a customer buys one bottle, what is the probability that the bottle will
contain more than 32 ounces? Answer: 0.7486

--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------

ii. If a customer buys a carton of four bottles, what is the probability that the
mean amount of the four bottles will be greater than 32 ounces?
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

5) In a specific election, a state representative received 52% of the votes cast. One year
after the election, the representative organized a survey that asked a random sample
of 300 people whether they would vote for him in the next election. If we assume that
his popularity has not changed, what is the probability that more than half of the
sample would vote for him?

-------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------
-------------------------------------------------

6) Assume there are two species of green beings on Mars. The mean height of Species 1
is 32 while the mean height of Species 2 is 22. The variances of the two species are
60 and 70, respectively and the heights of both species are normally distributed. You
randomly sample 10 members of Species 1 and 14 members of Species 2. What is the
probability that the mean of the 10 members of Species 1 will exceed the mean of the
14 members of Species 2 by 5 or more?

------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------

CHAPTER TWO

STATISTICAL ESTIMATION

Introduction

Dear Students! Welcome to the second chapter of this module. In Chapter 1 we discussed sampling and sampling distributions. The sampling distribution of the mean shows how far sample means could be from a known population mean. Similarly, the sampling distribution of the proportion shows how far sample proportions could be from a known population proportion.

In this chapter we discuss statistical estimation or, in short, estimation. In estimation, our aim is to determine how far an unknown population mean could be from the mean of a simple random sample selected from that population, or how far an unknown population proportion could be from a sample proportion. Those are the concerns of statistical inference, in which a statement about an unknown population parameter is derived from information contained in a random sample selected from the population.

Objectives of the Chapter

After studying this Chapter, students should be able to:


 understand the meaning of point estimation and interval estimation
 obtain knowledge about point estimation of population mean and population
proportion
 obtain knowledge about interval estimation of population mean and population
proportion
 obtain knowledge about interval estimation of the difference between two
population means

2.1. Basic Concepts

Dear students! Do you know the meaning of statistical inference? Statistical inference is the act of generalizing from a sample to a population with a calculated degree of certainty. The two forms of statistical inference are estimation and hypothesis testing. This chapter introduces estimation. The next chapter introduces hypothesis testing.

A statistical population represents the set of all possible values for a variable. In practice, we do not study the entire population. Instead, we use data in a sample to shed light on the wider population. The process of generalizing from the sample to the population is statistical inference.
- Estimation: is the process of using statistics as estimates of parameters. It is any procedure where sample information is used to estimate or predict the numerical value of some population measure (called a parameter).
- Estimator: refers to any sample statistic that is used to estimate a population parameter, e.g., $\bar{x}$ for $\mu$, $\bar{p}$ for $P$, etc.
- Estimate: is a specific numerical value of our estimator, e.g., $\bar{x}$ is 9, 2, 5.

Types of Estimates:

Dear student! We can categorize estimates about a population into two types: a point estimate and an interval estimate.

A point estimate is a single number that is used to estimate an unknown population parameter. It is a single value that is measured from a sample and used as an estimate of the corresponding population parameter.

The most important point estimates (given that they are single values) are:
o Sample mean ($\bar{x}$) for population mean ($\mu$);
o Sample proportion ($\bar{p}$) for population proportion ($P$);
o Sample variance ($s^2$) for population variance ($\sigma^2$); and
o Sample standard deviation ($s$) for population standard deviation ($\sigma$).

An interval estimate is a range of values used to estimate a population parameter. It describes the range of values within which a parameter might lie. Stated differently, an interval estimate is a range of values within which the analyst can declare with some confidence that the population parameter will fall.

Activities

- What is estimation?

- How do you perceive point estimation?

- How do you define interval estimation?

2.2. Point Estimators of the Mean and Proportion

Dear learner! Here we discuss the first part of estimation, which is point estimation. A point estimate of a parameter θ is a single number that can be regarded as a sensible value for θ. A point estimate is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called the point estimator of θ.

Point estimators of population parameters, while useful, do not convey as much information as interval estimators. Point estimation produces a single value as an estimate of the unknown population parameter. The estimate may or may not be close to the parameter value; in other words, the estimate may be incorrect. An interval estimate, on the other hand, is a range of values that conveys the fact that estimation is an uncertain process. The standard error of the point estimator is used in creating a range of values; thus, a measure of variability is incorporated into interval estimation. Further, a measure of confidence in the interval estimator is provided; consequently, interval estimates are also called confidence intervals. For these reasons, interval estimators are considered more desirable than point estimators.

Illustration One

Suppose that a business statistics professor wants to estimate the mean summer income of
his second-year business students. Selecting 25 students at random, he calculates the
sample mean weekly income to be Br. 400. The point estimate is the sample mean. In
other words, he estimates the mean weekly summer income of all second-year business
students to be Br. 400. Thus, Br. 400 is point estimate value of the actual mean (average)
weekly income of students.

Illustration Two

Suppose, for example, that the parameter of interest is $\mu$, the true average lifetime of batteries of a certain type. A random sample of n = 3 batteries might yield observed lifetimes (hours) $x_1 = 5.0$, $x_2 = 6.4$ and $x_3 = 5.9$.

The computed value of the sample mean lifetime is $\bar{x} = 5.77$. It is reasonable to regard 5.77 as a very plausible value of $\mu$, "our best guess" for the value of $\mu$ based on the available sample information.

Illustration Three

Suppose we have the sample 10, 20, 30, 40 and 50 selected randomly from a population whose mean ($\mu$) is unknown.

The sample mean, $\bar{x}$, is

$$\bar{x} = \frac{\sum x_i}{n} = \frac{10 + 20 + 30 + 40 + 50}{5} = 30$$

Thus, 30 is a point estimate of $\mu$.

On the other hand, if we state that the mean, $\mu$, lies within $\bar{x} \pm 10$, the range of values from 20 (30 − 10) to 40 (30 + 10) is an interval estimate.

2.3. Interval Estimators of the Mean and Proportion

2.3.1. Interval Estimation for Population Means (μ)

Dear student! In the previous part of this chapter, we discussed point estimation of the population mean and population proportion. In this part of the chapter, we discuss how to estimate the value of the population mean by interval estimation.

Interval estimation consists of two numerical values defining an interval within which the unknown parameter we want to estimate lies with a specified degree of confidence (CL). The values depend on the confidence level, which is equal to 1 − α (α is the probability of error). The interval estimate may be expressed as:

Estimator ± Reliability coefficient × standard error

The reliability coefficient is the value of $Z_{\alpha/2}$ corresponding to the confidence level.

Confidence level   α-value   Z-value
90%                10%       1.645
95%                5%        1.96
99%                1%        2.58
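
The Z-values in the table can be recovered from the confidence level rather than read from a table. This is a minimal sketch, assuming Python 3.8+; the function name `reliability_coefficient` is a hypothetical name used only for this illustration.

```python
from statistics import NormalDist

def reliability_coefficient(confidence_level):
    """Z value that leaves alpha/2 in each tail of the standard normal curve."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%} confidence: Z = {reliability_coefficient(cl):.3f}")
```

Computing the coefficient this way also covers confidence levels that a printed table may not list.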

As a result of the Central Limit Theorem (discussed in Chapter 1), the following Z formula for sample means can be used when sample sizes are large, regardless of the shape of the population distribution, or for smaller sample sizes if the population is normally distributed.

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Rearranging the formula:

$$\mu = \bar{x} - Z \frac{\sigma}{\sqrt{n}}$$

Because the sample mean can be greater than or less than the population mean, Z can be positive or negative. Thus, the preceding expression takes the form:

$$\mu = \bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$$

The value of the population mean ($\mu$) lies somewhere within this range. Rewriting this expression yields the confidence interval for the population mean:

$$\bar{x} - Z \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z \frac{\sigma}{\sqrt{n}}$$


The confidence interval for population mean is affected by:
1. The population distribution, i.e., whether the population is normally
distributed or not.
2. The standard deviation, i.e., whether σ is known or not.
3. The sample size, i.e., whether the sample size, n, is large or not.

Confidence interval estimate of μ - Normal population, σ known

A confidence interval estimate for $\mu$ is an interval estimate together with a statement of how confident we are that the interval estimate is correct.

When the population distribution is normal and at the same time σ is known, we can estimate $\mu$ (regardless of the sample size) using the following formula.

$$\mu = \bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$$

Where,
$\bar{x}$ = sample mean
Z = value from the standard normal table reflecting the confidence level
σ = population standard deviation
n = sample size
α = the proportion of incorrect statements (α = 1 – Confidence level)
μ = unknown population mean

From the above formula we can learn that an interval estimate is constructed by adding
and subtracting the error term to and from the point estimate. That is, the point estimate is
found at the center of the confidence interval.

To find the interval estimate of the population mean, $\mu$, we have the following steps.
1. Compute the standard error of the mean ($\sigma_{\bar{x}}$)
2. Compute $\frac{\alpha}{2}$ from the confidence coefficient
3. Find the Z value for $\frac{\alpha}{2}$ from the table
4. Construct the confidence interval
5. Interpret the results

Illustration 1

The vice president of operations for Ethio Telecom is in the process of developing a
strategic management plan. He believes that the ability to estimate the length of the
average phone call on the system is important. He takes a random sample of 60 calls
from the company records and finds that the mean sample length for a call is 4.26
minutes. Past history for these types of calls has shown that the population standard
deviation for call length is about 1.1 minutes. Assuming that the population is normally
distributed and he wants to have a 95% confidence, help him in estimating the population
mean.

Solution:

Given: n = 60 calls

$\bar{x}$ = 4.26 minutes

σ = 1.1 minutes

CL = 0.95

Step 1: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.1}{\sqrt{60}} = 0.142$

Step 2: α = 1 – CL = 1 – 0.95 = 0.05

α/2 = 0.05/2 = 0.025

Step 3: $Z_{\alpha/2} = Z_{0.025} = 1.96$

Step 4: $\mu = \bar{x} \pm Z_{\alpha/2} \times \sigma_{\bar{x}}$

= 4.26 ± 1.96 × 0.142

= 4.26 ± 0.28

3.98 ≤ μ ≤ 4.54

Step 5: Conclusion: the vice president of Ethio Telecom can be 95%
confident that the average length of a call for the population is between 3.98 and
4.54 minutes.
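The five steps above can also be sketched in code. The following Python snippet is only an illustration and is not part of the original module: the function name `z_interval` is hypothetical, and it assumes Python 3.8 or later, where the standard library's `statistics.NormalDist` supplies the Z value instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known (hypothetical helper)."""
    alpha = 1 - confidence                    # Step 2: alpha = 1 - CL
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Step 3: Z for alpha/2 (about 1.96 at 95%)
    std_error = sigma / sqrt(n)               # Step 1: standard error of the mean
    margin = z * std_error                    # Step 4: margin of error
    return xbar - margin, xbar + margin


# Data from the Ethio Telecom illustration
lower, upper = z_interval(xbar=4.26, sigma=1.1, n=60, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```

Because the code uses the unrounded Z value, results can differ from hand calculations in the last decimal place.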

Illustration 2

A survey conducted by “Addis Zemen Gazetta” found that the sample mean age of men
was 44 years and the sample mean age of women was 47 years. Altogether, 454 people
from Addis were included in the reader poll – 340 women and 114 men. Assume that the
population standard deviation of age for both men and women is 8 years.
a. Develop a 95% confidence interval estimate for the mean age of the population of
men who read the gazetta.
b. Develop a 95% confidence interval estimate for the mean age of the population of
women who read the gazetta.
c. Compare the widths of the two interval estimates from parts (a) and (b). Which one
has better precision? Why?

Solution:

a. Confidence interval estimate of the mean for men

Given: n = 114 men

$\bar{x}$ = 44 years

σ = 8 years

CL = 0.95

Step 1: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{114}} = 0.75$

Step 2: α = 1 – CL = 1 – 0.95 = 0.05

α/2 = 0.05/2 = 0.025

Step 3: $Z_{\alpha/2} = Z_{0.025} = 1.96$

Step 4: $\mu = \bar{x} \pm Z_{\alpha/2} \times \sigma_{\bar{x}}$

= 44 ± 1.96 × 0.75

= 44 ± 1.47

42.53 ≤ μ ≤ 45.47

Step 5: Conclusion: the 95% confidence interval estimate for the mean age of the
population of men who read the gazetta is between 42.53 and 45.47 years.

b. Confidence interval estimate of the mean for women

Given: n = 340 women

$\bar{x}$ = 47 years

σ = 8 years

CL = 0.95

Step 1: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{340}} = 0.434$

Step 2: α = 1 – CL = 1 – 0.95 = 0.05

α/2 = 0.05/2 = 0.025

Step 3: $Z_{\alpha/2} = Z_{0.025} = 1.96$

Step 4: $\mu = \bar{x} \pm Z_{\alpha/2} \times \sigma_{\bar{x}}$

= 47 ± 1.96 × 0.434

= 47 ± 0.85

46.15 ≤ μ ≤ 47.85

Step 5: Conclusion: the 95% confidence interval estimate for the mean age of the
population of women who read the gazetta is between 46.15 and 47.85 years.

c. Part (b) has better precision: its interval is narrower (margin of error of 0.85 years
versus 1.47 years) because the larger sample size gives a smaller standard error.

Illustration Three

Time magazine reports information on the time required for caffeine from products such
as coffee and soft drinks to leave the body after consumption. Assume that the 99%
confidence interval estimate of the population mean time for adults is 5.6 hrs. to 6.4 hrs.

i. What is the point estimate of the mean time for caffeine to leave the body after
consumption?

ii. If the population standard deviation is 2 hrs., how large a sample was used to
provide the interval estimate?

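Solution:

Given: CL = 0.99; confidence interval: 5.6 ≤ μ ≤ 6.4

i. Point estimate = $\frac{5.6 + 6.4}{2} = 6$ hrs.

Or, add the two equations:

$5.6 = \bar{x} - Z\frac{\sigma}{\sqrt{n}}$

$6.4 = \bar{x} + Z\frac{\sigma}{\sqrt{n}}$

$12 = 2\bar{x}$

$\bar{x} = 6$ hours

ii. Given: CL = 0.99; σ = 2 hours; confidence interval: 5.6 ≤ μ ≤ 6.4; n = ?

α = 1 – CL = 1 – 0.99 = 0.01

α/2 = 0.005

$Z_{\alpha/2} = Z_{0.005} = 2.58$

$6.4 = \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

$6.4 = 6 + 2.58 \times \frac{2}{\sqrt{n}}$

$0.4 = \frac{5.16}{\sqrt{n}}$

$\sqrt{n} = \frac{5.16}{0.4} = 12.9$

n = 166.41, which is rounded up to n = 167
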
Confidence interval estimate of μ - Normal population, σ unknown, n large

If we know that the population is normal and we know the population standard deviation
(σ), the confidence interval for μ should be constructed in the manner already shown, i.e.,
$\mu = \bar{x} \pm Z\frac{\sigma}{\sqrt{n}}$. If the population standard deviation is unknown, it has to be estimated from
the sample; i.e., when σ is unknown, we use the sample standard deviation:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Then, the standard error of the mean ($\sigma_{\bar{x}}$) is estimated by the sample standard error of the
mean: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$.

Therefore, the confidence interval to estimate μ when the population standard deviation (σ)
is unknown, the population is normal, and n is large is:

$$\mu = \bar{x} \pm Z\frac{s}{\sqrt{n}}$$

Illustration 1

Suppose that a car rental firm in Addis wants to estimate the average number of miles
traveled per day by each of its rented cars. A random sample of 110 rented cars reveals that the
sample mean travel distance per day is 85.5 miles, with a sample standard deviation of
19.3 miles. Compute a 99% confidence interval to estimate μ.

Solution:

Given: n = 110 rented cars

$\bar{x}$ = 85.5 miles

s = 19.3 miles

CL = 0.99

Step 1: $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{19.3}{\sqrt{110}} = 1.84$

Step 2: α = 1 – CL = 1 – 0.99 = 0.01

α/2 = 0.01/2 = 0.005

Step 3: $Z_{\alpha/2} = Z_{0.005} = 2.58$

Step 4: $\mu = \bar{x} \pm Z_{\alpha/2} \times s_{\bar{x}}$

= 85.5 ± 2.58 × 1.84

= 85.5 ± 4.747

80.753 ≤ μ ≤ 90.247

Step 5: Conclusion: we state with 99% confidence that the average distance
traveled per day by rented cars lies between 80.753 and 90.247 miles.

Illustration 2

A study is being conducted in a company that has 800 engineers. A random sample of 50
of these engineers reveals that the average sample age is 34.3 years, and the sample
standard deviation is 8 years. Assuming normality, construct a 98% confidence interval to
estimate the average age of all engineers in this company.

Solution:

Given: n = 50 engineers

N = 800 engineers

$\bar{x}$ = 34.3 years

s = 8 years

CL = 0.98

Because the sample is drawn from a finite population (N = 800), the standard error
includes the finite population correction factor.

Step 1: $s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = \frac{8}{\sqrt{50}}\sqrt{\frac{800 - 50}{800 - 1}} = 1.10$

Step 2: α = 1 – CL = 1 – 0.98 = 0.02

α/2 = 0.02/2 = 0.01

Step 3: $Z_{\alpha/2} = Z_{0.01} = 2.33$

Step 4: $\mu = \bar{x} \pm Z_{\alpha/2} \times s_{\bar{x}}$

= 34.3 ± 2.33 × 1.10

= 34.3 ± 2.56

31.74 ≤ μ ≤ 36.86

Step 5: Conclusion: We state with 98% confidence that the mean age of engineers
lies between 31.74 and 36.86 years.
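The finite population correction used in Step 1 can be checked with a short sketch. The helper name `fpc_standard_error` is hypothetical, and the Z value is taken from Python's `statistics.NormalDist` (Python 3.8 or later) rather than from the table, so the margin differs slightly from the hand-rounded 2.56.

```python
from math import sqrt
from statistics import NormalDist


def fpc_standard_error(s, n, N):
    """Standard error of the mean with the finite population correction."""
    return (s / sqrt(n)) * sqrt((N - n) / (N - 1))


# Engineers illustration: s = 8, n = 50, N = 800, CL = 0.98
se = fpc_standard_error(s=8, n=50, N=800)
z = NormalDist().inv_cdf(1 - 0.02 / 2)   # alpha/2 = 0.01, about 2.33
margin = z * se
lower, upper = 34.3 - margin, 34.3 + margin
print(round(se, 3), round(margin, 2))
```

Without the correction factor the standard error would be 8/√50 ≈ 1.13, so the correction narrows the interval slightly.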

2.3.2. Interval Estimation of the Population Proportion

Dear student! Here we will discuss how to estimate the value of a population proportion
using interval estimation, through illustrations.

We know that a sample proportion ($\bar{p}$) is an unbiased estimator of a population
proportion p, and if the sample size is large, then the sampling distribution of $\bar{p}$ is
approximately normal with:

$$Z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{\bar{p} - p}{\sqrt{\frac{pq}{n}}}$$

However, here p is unknown and we want to estimate p by $\bar{p}$, and hence Z becomes:

$$Z = \frac{\bar{p} - p}{\sqrt{\frac{\bar{p}\bar{q}}{n}}}$$

That is, $\sigma_{\bar{p}}$ is substituted by $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$.

Solving for p results in:

$$p = \bar{p} \pm Z\sqrt{\frac{\bar{p}\bar{q}}{n}}$$

Since Z represents the confidence level, we write it as:

$$p = \bar{p} \pm Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$$

$$p = \bar{p} \pm Z_{\alpha/2}\, s_{\bar{p}}$$

Where:

$\bar{p}$ = sample proportion

$\bar{q} = 1 - \bar{p}$

α = 1 – CL

n = sample size

p = unknown population proportion

Illustration 1

Recently, a study of 87 randomly selected companies with telemarketing operations was
completed. The study revealed that 39% of the sampled companies had used
telemarketing to assist them in order processing. Using this information, estimate the
population proportion of telemarketing companies that use their telemarketing operation
to assist them in order processing, taking a 95% confidence level.

Solution:

Given:

n = 87 companies

$\bar{p}$ = 0.39

$\bar{q}$ = 0.61

CL = 0.95

Step 1: $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{0.39 \times 0.61}{87}} = 0.0523$

Step 2: α = 1 – CL = 1 – 0.95 = 0.05

α/2 = 0.05/2 = 0.025

Step 3: $Z_{\alpha/2} = Z_{0.025} = 1.96$

Step 4: $p = \bar{p} \pm Z_{\alpha/2} \times s_{\bar{p}}$

= 0.39 ± 1.96 × 0.0523

= 0.39 ± 0.1025

0.2875 ≤ p ≤ 0.4925

Step 5: Conclusion: we state with 95% confidence that the proportion of
companies which use telemarketing to assist order processing lies between 0.2875
and 0.4925.
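The same calculation can be written as a small Python sketch. The function name `proportion_interval` is hypothetical, and the Z value comes from `statistics.NormalDist` (Python 3.8 or later), so the limits agree with the hand calculation only up to rounding.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_bar, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z for alpha/2
    std_error = sqrt(p_bar * (1 - p_bar) / n)            # s_p = sqrt(p*q/n)
    margin = z * std_error
    return p_bar - margin, p_bar + margin


# Telemarketing illustration: sample proportion 0.39, n = 87, CL = 0.95
lower, upper = proportion_interval(p_bar=0.39, n=87, confidence=0.95)
print(round(lower, 4), round(upper, 4))
```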

Illustration 2

A fast-food restaurant took a random sample of 400 customers to determine the
proportion of customers who are female. A confidence interval of 0.73 to 0.87 was reported.

a. Find the number of females and the sample proportion.

b. Find the level of confidence of this interval.

Solution:

Given:

n = 400

0.73 ≤ p ≤ 0.87

a) $\bar{p}$ = ? Number of females = ?

Point estimate = $\frac{0.73 + 0.87}{2} = 0.80$

Or, by adding the two equations:

$0.73 = \bar{p} - Z_{\alpha/2} \times s_{\bar{p}}$

$0.87 = \bar{p} + Z_{\alpha/2} \times s_{\bar{p}}$

$1.60 = 2\bar{p}$

$\bar{p} = 0.80$

Number of females:

x = n × $\bar{p}$ = 400 × 0.80 = 320

b) Level of confidence (CL):

Upper limit = $\bar{p} + Z_{\alpha/2} \times s_{\bar{p}}$

$0.87 = 0.80 + Z_{\alpha/2} \times s_{\bar{p}}$

$0.07 = Z_{\alpha/2}\sqrt{\frac{0.80 \times 0.20}{400}}$

$0.07 = Z_{\alpha/2} \times 0.02$

$Z_{\alpha/2} = 3.50$

P(0 ≤ Z ≤ 3.50) = 0.49977

CL = 0.49977 × 2

CL = 99.954%

Illustration 3

A random sample of 400 faculty members at AAU contained 120 people who believed
that the University should improve its library service. On the basis of this sample
information, an analyst calculated the confidence interval (0.25, 0.35) for the population
proportion of faculty members favoring improvement. What is the level of confidence of
this interval?

Solution:

Given:

n = 400

x = 120

$\bar{p} = \frac{120}{400} = 0.30$

Interval estimate: 0.25 ≤ p ≤ 0.35

Level of confidence (CL):

Lower limit = $\bar{p} - Z_{\alpha/2} \times s_{\bar{p}}$

$0.25 = 0.30 - Z_{\alpha/2} \times s_{\bar{p}}$

$0.05 = Z_{\alpha/2}\sqrt{\frac{0.70 \times 0.30}{400}}$

$0.05 = Z_{\alpha/2} \times 0.023$

$Z_{\alpha/2} = 2.17$

P(0 ≤ Z ≤ 2.17) = 0.485

CL = 0.485 × 2

CL = 97%
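Working backwards from a reported interval to its confidence level, as in the last two illustrations, can be sketched as follows. The function name `implied_confidence` is hypothetical; the normal table lookup is replaced by `statistics.NormalDist().cdf` (Python 3.8 or later), so the result is close to, but not exactly, the table-based figure.

```python
from math import sqrt
from statistics import NormalDist


def implied_confidence(lower, upper, n):
    """Confidence level implied by a reported interval for a proportion."""
    p_bar = (lower + upper) / 2             # point estimate is the midpoint
    margin = (upper - lower) / 2            # margin of error is the half-width
    std_error = sqrt(p_bar * (1 - p_bar) / n)
    z = margin / std_error                  # solve margin = Z * s_p for Z
    return 2 * NormalDist().cdf(z) - 1      # area between -Z and +Z


cl_faculty = implied_confidence(0.25, 0.35, 400)     # AAU illustration
cl_restaurant = implied_confidence(0.73, 0.87, 400)  # fast-food illustration
print(round(cl_faculty, 3), round(cl_restaurant, 5))
```

The small gap between this output and 97% arises because the hand calculation rounds the standard error to 0.023 before dividing.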

2.4. Interval Estimation of the Difference Between Two Independent


Means

Dear learner! Have you understood the methods for computing an interval estimate of a
population mean? Here we will discuss certain concepts and methods for computing an
interval estimate of the difference between two population means.

Letting $\mu_1$ denote the mean of population 1 and $\mu_2$ denote the mean of population 2, we
will focus on inferences about the difference between the means: $\mu_1 - \mu_2$. To make an
inference about this difference, we select a simple random sample of $n_1$ units from
population 1 and a second simple random sample of $n_2$ units from population 2. The two
samples, taken separately and independently, are referred to as independent simple
random samples. In this section, we assume that information is available such that the
two population standard deviations, $\sigma_1$ and $\sigma_2$, can be assumed known prior to collecting
the samples. We refer to this situation as the $\sigma_1$ and $\sigma_2$ known case. In the following
example we show how to compute a margin of error and develop an interval estimate of
the difference between the two population means when $\sigma_1$ and $\sigma_2$ are known.



Greystone Department Stores, Inc., operates two stores in Buffalo, New York: One is in
the inner city and the other is in a suburban shopping center. The regional manager
noticed that products that sell well in one store do not always sell well in the other. The
manager believes this situation may be attributable to differences in customer
demographics at the two locations. Customers may differ in age, education, income, and
so on. Suppose the manager asks us to investigate the difference between the mean ages
of the customers who shop at the two stores.

Let us define population 1 as all customers who shop at the inner-city store and
population 2 as all customers who shop at the suburban store.

μ1 - mean of population 1 (i.e., the mean age of all customers who shop at the
inner-city store)

μ2 - mean of population 2 (i.e., the mean age of all customers who shop at the
suburban store)

The difference between the two population means is μ1−μ2 .

To estimate $\mu_1 - \mu_2$, we will select a simple random sample of $n_1$ customers from
population 1 and a simple random sample of $n_2$ customers from population 2. We then
compute the two sample means.

$\bar{x}_1$ - sample mean age for the simple random sample of $n_1$ inner-city customers

$\bar{x}_2$ - sample mean age for the simple random sample of $n_2$ suburban customers

The point estimator of the difference between the two population means is the difference
between the two sample means (i.e., $\bar{x}_1 - \bar{x}_2$). As with other point estimators, the point
estimator $\bar{x}_1 - \bar{x}_2$ has a standard error that describes the variation in the sampling
distribution of the estimator. With two independent simple random samples, the standard
error of $\bar{x}_1 - \bar{x}_2$ is as follows:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Therefore, the interval estimate of the difference between two population means when
$\sigma_1$ and $\sigma_2$ are known is:

$$\bar{x}_1 - \bar{x}_2 \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Illustration 1

Let us return to the Greystone example. Based on data from previous customer
demographic studies, the two population standard deviations are known with $\sigma_1 = 9$ years
and $\sigma_2 = 10$ years. The data collected from the two independent simple random samples of
Greystone customers provided the following results.

               Inner City Store          Suburban Store

Sample Size    $n_1 = 36$                $n_2 = 49$

Sample Mean    $\bar{x}_1 = 40$ years    $\bar{x}_2 = 35$ years

Solution

Using the above expression, we find that the point estimate of the difference between the
mean ages of the two populations is:

$\bar{x}_1 - \bar{x}_2 = 40 - 35 = 5$ years

Thus, we estimate that the customers at the inner-city store have a mean age five years
greater than the mean age of the suburban store customers.

Using 95% confidence and $Z_{\alpha/2} = Z_{0.025} = 1.96$, we have an interval estimate of:

$$\bar{x}_1 - \bar{x}_2 \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$$40 - 35 \pm 1.96\sqrt{\frac{9^2}{36} + \frac{10^2}{49}}$$

5 ± 4.06

Thus, the margin of error is 4.06 years and the 95% confidence interval estimate of the
difference between the two population means is:

$5 - 4.06 \le \mu_1 - \mu_2 \le 5 + 4.06$

$0.94 \le \mu_1 - \mu_2 \le 9.06$ years
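The Greystone calculation can be reproduced with the sketch below. The function name `diff_means_interval` is hypothetical, and the Z value is obtained from `statistics.NormalDist` (Python 3.8 or later) instead of the table.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_interval(x1, x2, sigma1, sigma2, n1, n2, confidence):
    """Interval for mu1 - mu2 when both population standard deviations are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error of x1 - x2
    margin = z * std_error
    point_estimate = x1 - x2
    return point_estimate - margin, point_estimate + margin


# Greystone data: inner-city store (1) versus suburban store (2)
lower, upper = diff_means_interval(40, 35, 9, 10, 36, 49, 0.95)
print(round(lower, 2), round(upper, 2))
```

Since the whole interval lies above zero, the data suggest that the inner-city customers are older on average, at the 95% confidence level.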

Illustration 2

A research team is interested in the difference between serum uric acid levels in patients
with and without Down's syndrome. In a large hospital for the treatment of the mentally
retarded, a sample of 12 individuals with Down's syndrome yielded a mean of $\bar{x}_1 = 4.5$
mg/100 ml. In a general hospital a sample of 15 normal individuals of the same age and
sex were found to have a mean value of $\bar{x}_2 = 3.4$ mg/100 ml. If it is reasonable to assume
that the two populations of values are normally distributed with variances equal to 1 and
1.5 respectively, find the 95 percent confidence interval for $\mu_1 - \mu_2$.

Given:

$n_1 = 12$, $n_2 = 15$

$\bar{x}_1 = 4.5$, $\bar{x}_2 = 3.4$

$\sigma_1^2 = 1$, $\sigma_2^2 = 1.5$

The point estimate for $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2$:

$\bar{x}_1 - \bar{x}_2 = 4.5 - 3.4 = 1.1$

The standard error is:

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{1}{12} + \frac{1.5}{15}} = 0.4282$$

The 95% confidence interval is:

$$\bar{x}_1 - \bar{x}_2 \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

1.1 ± 1.96 × 0.4282

1.1 ± 0.84

$0.26 \le \mu_1 - \mu_2 \le 1.94$

Discussion: As this is a 95% z-interval, the correct value of z to use is 1.96. We
interpret this interval as follows: the point estimate of the difference between the two
population means is 1.1, and we are 95% confident that the true difference lies between
0.26 and 1.94.

2.5. Student’s t-distribution

Dear learner! The previous examples of interval estimation are based on the standard
normal distribution (Z distribution). The Z distribution is appropriate when the
population standard deviation (σ) is known, or when σ is unknown but the sample size is
large (n ≥ 30). If the sample standard deviation (s) is used as an estimator of the
population standard deviation (σ), the sample size is small (n < 30), and the population
has a normal distribution, interval estimation of the population mean can be based on a
probability distribution known as the t-distribution.

Characteristics of t-distribution

1. The t-distribution is symmetric about its mean (0) and ranges from - ∞ to ∞.



2. The t-distribution is bell-shaped (uni-modal) and has approximately the same
appearance as the standard normal distribution (Z- distribution).
3. The t-distribution depends on a parameter ν (Greek Nu), called the degrees of
freedom of the distribution. v = n -1, where n is sample size. The degree of freedom,
ν, refers to the number of values we can choose freely.
4. The variance of the t-distribution is ν/ (ν-2) for ν>2.
5. The variance of the t-distribution always exceeds 1.
6. As ν increases, the variance of the t-distribution approaches 1 and the shape
approaches that of the standard normal distribution.
7. Because the variance of the t-distribution exceeds 1.0 while the variance of the Z-
distribution equals 1, the t-distribution is slightly flatter in the middle than the Z-
distribution and has thicker tails.
8. The t-distribution is a family of distributions with a different density function
corresponding to each different value of the parameter ν. That is, there is a separate t-
distribution for each sample size. In proper statistical language, we would say,
“There is a different t-distribution for each of the possible degrees of freedom”.
9. The t formula for a sample when σ is unknown, the sample size is small, and the
population is normally distributed is:

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

This formula is essentially the same as the z-formula, but the distribution table values are
not. The confidence interval to estimate μ becomes:

$$\mu = \bar{x} \pm t_{\alpha/2,\,\nu}\frac{s}{\sqrt{n}}$$

Where: $\bar{x}$ = sample mean
α = 1 – CL
ν = n – 1 (degrees of freedom)
s = sample standard deviation
n = sample size
μ = unknown population mean



Steps:
i. Calculate the degrees of freedom (ν = n – 1) and the sample standard error of the mean.
ii. Compute α/2.
iii. Look up $t_{\alpha/2,\,\nu}$ in the t table.
iv. Construct the confidence interval.
v. Interpret the results.

Illustration 1

If a random sample of 27 items produces $\bar{x}$ = 128.4 and s = 20.6, what is the 98%
confidence interval for μ? Assume that x is normally distributed in the population. What
is the point estimate?

Solution:

The point estimate of the population mean is the sample mean; in this case, 128.4 is the
point estimate.

Given:
n = 27
$\bar{x}$ = 128.4
s = 20.6
CL = 0.98
ν = n – 1 = 27 – 1 = 26

i. $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{20.6}{\sqrt{27}} = 3.96$

ii. α = 1 – CL = 1 – 0.98 = 0.02

α/2 = 0.02/2 = 0.01

iii. $t_{\alpha/2,\,\nu} = t_{0.01,\,26} = 2.479$

iv. $\mu = \bar{x} \pm t_{\alpha/2,\,\nu}\frac{s}{\sqrt{n}}$

= 128.4 ± 2.479(3.96)

= 128.4 ± 9.82

118.58 ≤ μ ≤ 138.22

v. We state with 98% confidence that the population mean lies between 118.58 and 138.22.

Illustration 2

A sample of 20 cab fares in Bahir Dar city shows a sample mean of Br 2.50 and a sample
standard deviation of Br. 0.50. Develop a 90% confidence interval estimate of the mean
cab fares in Bahir Dar city. Assume the population of cab fares has a normal distribution.

Given:
n = 20
$\bar{x}$ = 2.50
s = 0.50
CL = 0.90
ν = n – 1 = 20 – 1 = 19

i. $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{0.50}{\sqrt{20}} = 0.112$

ii. α = 1 – CL = 1 – 0.90 = 0.10

α/2 = 0.10/2 = 0.05

iii. $t_{\alpha/2,\,\nu} = t_{0.05,\,19} = 1.729$

iv. $\mu = \bar{x} \pm t_{\alpha/2,\,\nu}\frac{s}{\sqrt{n}}$

= 2.50 ± 1.729(0.112)

= 2.50 ± 0.194

2.31 ≤ μ ≤ 2.69

v. We state with 90% confidence that the mean of cab fares in Bahir Dar city lies
between Birr 2.31 and 2.69.
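The t-interval steps can be sketched in Python as shown below. Python's standard library has no t-distribution table, so this sketch assumes the critical value is looked up by hand and passed in as `t_crit`; the function name `t_interval` is hypothetical.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Small-sample interval for the mean; t_crit is read from a t table
    for alpha/2 and n - 1 degrees of freedom."""
    std_error = s / sqrt(n)        # sample standard error of the mean
    margin = t_crit * std_error
    return xbar - margin, xbar + margin


# Cab fare illustration: n = 20, so 19 degrees of freedom and t(0.05, 19) = 1.729
lower, upper = t_interval(xbar=2.50, s=0.50, n=20, t_crit=1.729)
print(round(lower, 2), round(upper, 2))
```

Keeping the table lookup outside the function mirrors the manual procedure: only the critical value changes between the Z and t versions of the interval.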

Illustration 3

Sales personnel for X Company are required to submit weekly reports listing customer
contacts made during the week. A sample size of 61 weekly contact reports showed a
mean of 22.4 customer contacts per week for the sales personnel. The sample standard
deviation was 5 contacts.

a. Develop a 95% confidence interval estimate for the mean number of weekly
customer contacts for the population of sales personnel.

b. Assume that the population of weekly contact data has a normal distribution. Use
the t distribution to develop a 95% confidence interval for the mean number of
weekly customer contacts.

c. Compare your answer for parts (a) and (b). What do you conclude from your
results?

Solutions:

a) Given:
n = 61 weekly contact reports
$\bar{x}$ = 22.4 contacts
s = 5 contacts
CL = 0.95

i. $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{5}{\sqrt{61}} = 0.64$

ii. α = 1 – CL = 1 – 0.95 = 0.05

α/2 = 0.05/2 = 0.025

iii. $Z_{\alpha/2} = Z_{0.025} = 1.96$

iv. $\mu = \bar{x} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}$

= 22.4 ± 1.96(0.64)

= 22.4 ± 1.25

21.15 ≤ μ ≤ 23.65

v. We can state with 95% confidence that the mean number of weekly contacts lies between
21.15 and 23.65 contacts.

b) Given:
n = 61 weekly contact reports
$\bar{x}$ = 22.4 contacts
s = 5 contacts
CL = 0.95
ν = n – 1 = 61 – 1 = 60

i. $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{5}{\sqrt{61}} = 0.64$

ii. α = 1 – CL = 1 – 0.95 = 0.05

α/2 = 0.05/2 = 0.025

iii. $t_{\alpha/2,\,\nu} = t_{0.025,\,60} = 2.00$

iv. $\mu = \bar{x} \pm t_{\alpha/2,\,\nu}\frac{s}{\sqrt{n}}$

= 22.4 ± 2.00(0.64)

= 22.4 ± 1.28

21.12 ≤ μ ≤ 23.68

v. We can state with 95% confidence that the mean number of weekly contacts lies between
21.12 and 23.68 contacts.

c) The two intervals are nearly identical (21.15 to 23.65 versus 21.12 to 23.68). As the
sample size increases, the t-distribution approaches the Z (standard normal)
distribution, so for large samples the two methods give approximately the same result.

2.6. Determining the Sample Size

Dear student! The reason for taking a sample from a population is that it would be too
costly to gather data for the whole population. But collecting sample data also costs
money; and the larger the sample, the higher the cost. To hold cost down, we want to use
as small a sample as possible. On the other hand, we want a sample to be large enough to
provide “good” approximation/estimates of population parameters. Consequently, the
question is “How large should the sample be?”

The answer depends on three factors:


1) How precise (narrow) do we want a confidence interval to be?
2) How confident do we want to be that the interval estimate is correct?
3) How variable is the population being sampled?

2.6.1. Sample Size for Estimating Population Mean ( μ)

Dear student! From the previous discussion, the confidence interval for μ is
$\mu = \bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. In this expression, $Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is called the error of estimation (e), that is, the
difference between $\bar{x}$ and μ that results from the sampling process. So, $e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.

Squaring both sides results in $e^2 = Z_{\alpha/2}^2\,\frac{\sigma^2}{n}$. Solving for n results in $n = \frac{Z_{\alpha/2}^2\,\sigma^2}{e^2}$, that is:

$$n = \left(\frac{Z_{\alpha/2} \times \sigma}{e}\right)^2$$

Illustration 1
A gasoline service station shows a standard deviation of Birr 6.25 for the charges made
by the credit card customers. Assume that the station’s management would like to
estimate the population mean gasoline bill for its credit card customers to be within ±
Birr 1.00. For a 95% confidence level, how large a sample would be necessary?

Solution:

Given:
e = Birr 1.00
σ = Birr 6.25
CL = 0.95
$Z_{\alpha/2} = Z_{0.025} = 1.96$

$$n = \left(\frac{Z_{\alpha/2} \times \sigma}{e}\right)^2$$

$$n = \left(\frac{1.96 \times 6.25}{1}\right)^2$$

n = 150.06, which is rounded up to n = 151

Illustration 2

The National Travel and Tour Organization (NTO) would like to estimate the mean
amount of money spent by a tourist to be within Birr 100 with 95% confidence. If the
amount of money spent by tourists is considered to be normally distributed with a standard
deviation of Birr 200, what sample size would be necessary for the NTO to meet their
objective in estimating this mean amount?

Solution:

e = Birr 100

σ = Birr 200

CL = 0.95

$Z_{\alpha/2} = Z_{0.025} = 1.96$

$$n = \left(\frac{Z_{\alpha/2} \times \sigma}{e}\right)^2$$

$$n = \left(\frac{1.96 \times 200}{100}\right)^2$$

n = 15.37, which is rounded up to n = 16
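The sample size formula for a mean, including the rule of always rounding up, can be sketched as follows. The function name `sample_size_mean` is hypothetical, and the Z value is taken from `statistics.NormalDist` (Python 3.8 or later) rather than the table.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, e, confidence):
    """Smallest n for which the margin of error for the mean is at most e."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z for alpha/2
    # Always round up: rounding down would give a margin slightly larger than e.
    return ceil((z * sigma / e) ** 2)


n_gasoline = sample_size_mean(sigma=6.25, e=1.00, confidence=0.95)
n_tourist = sample_size_mean(sigma=200, e=100, confidence=0.95)
print(n_gasoline, n_tourist)
```

Note how halving the allowed error e multiplies the required sample size by four, because e is squared in the denominator.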

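If the population standard deviation (σ) is unknown, we have to make an educated guess or
take a pilot sample and estimate it. A rough approximation is one quarter of the range:

$$\sigma \approx \frac{H - L}{4}$$

where H and L are the highest and lowest values. This rough approximation works because
about 95.4% of a normally distributed population falls within ±2σ of the mean, so the
range spans roughly 4σ.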

2.6.2. Sample Size for Estimating Population Proportion (p)

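The confidence interval for p is:

$$p = \bar{p} \pm Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$$

The expression $Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$ is called the error term (e). That is,

$e = Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$; squaring both sides,

$e^2 = Z_{\alpha/2}^2\,\frac{\bar{p}\bar{q}}{n}$; solving for n,

$$n = \frac{Z_{\alpha/2}^2\,\bar{p}\bar{q}}{e^2}$$

Since we are trying to determine n before the sample is taken, we cannot have $\bar{p}$ and
$\bar{q}$. Instead, we use planning values p and q, so the formula becomes:

$$n = \left(\frac{Z_{\alpha/2}}{e}\right)^2 pq$$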

Illustration 1

Suppose that a production facility purchases a particular component part in large lots
from a supplier. The production manager wants to estimate the proportion of defective
parts received from this supplier. She believes that the proportion of defects is no more
than 0.2 and wants to be within 0.02 of the true proportion of defects with a 90% level of
confidence. How large a sample should she take?

Solution:

Given:

e = 0.02

p = 0.2

q = 0.8

CL = 0.90

$Z_{\alpha/2} = Z_{0.05} = 1.64$

$$n = \left(\frac{Z_{\alpha/2}}{e}\right)^2 pq$$

$$n = \left(\frac{1.64}{0.02}\right)^2 \times 0.2 \times 0.8$$

n = 1075.84 ≈ 1076

Illustration 2

What is the largest sample size that would be needed in estimating a population
proportion to be within ± 0.02, with a confidence coefficient of 0.95?

Solution:

Given:

e = 0.02

CL = 0.95

$Z_{\alpha/2} = Z_{0.025} = 1.96$

The largest sample size would be obtained when p = 0.5. So,

$$n = \left(\frac{Z_{\alpha/2}}{e}\right)^2 pq$$

$$n = \left(\frac{1.96}{0.02}\right)^2 \times 0.5 \times 0.5$$

n = 2401

If p is unknown and there is no possibility of estimating it, use 0.5 as the value of p
because it will generate the greatest possible sample size as compared with other values.
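Both proportion illustrations can be reproduced with the sketch below. The function name `sample_size_proportion` is hypothetical. To match the hand calculations, the table value of Z is passed in directly (1.64 and 1.96, as used above) rather than computed, and p defaults to 0.5, the most conservative planning value.

```python
from math import ceil


def sample_size_proportion(z, e, p=0.5):
    """Smallest n for which the margin of error for a proportion is at most e.

    z is the table value for alpha/2; p is the planning value for the
    proportion (p = 0.5 maximises p * (1 - p) and hence n).
    """
    n_exact = (z / e) ** 2 * p * (1 - p)
    # round() removes floating-point noise (e.g. 2401.0000000000005) before rounding up
    return ceil(round(n_exact, 8))


n_defects = sample_size_proportion(z=1.64, e=0.02, p=0.2)   # Illustration 1
n_largest = sample_size_proportion(z=1.96, e=0.02)          # Illustration 2
print(n_defects, n_largest)
```

Trying other planning values, such as p = 0.3 or p = 0.7, gives smaller results than p = 0.5, which is why 0.5 is the safe choice when nothing is known about p.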



Summary
In this chapter we presented methods for developing interval estimates of a population
mean and a population proportion. A point estimator may or may not provide a good
estimate of a population parameter. The use of an interval estimate provides a measure of
the precision of an estimate. Both the interval estimate of the population mean and the
population proportion are of the form: point estimate ± margin of error.

We presented interval estimates for a population mean for three cases. In the σ known
case, historical data or other information is used to develop an estimate of σ prior to
taking a sample. Analysis of new sample data then proceeds based on the assumption that
σ is known. In the σ unknown case and the sample size is large, the sample data are used
to estimate both the population mean and the population standard deviation. In the σ
unknown and the sample size is small case, the sample data are used to estimate both the
population mean and the population standard deviation through t distribution.

In the σ known case, the interval estimation procedure is based on the assumed value of σ
and the use of the standard normal distribution. In the σ unknown and the sample size is
large case; the interval estimation procedure uses the sample standard deviation s and the
Z distribution. In the σ unknown and the sample size is small case; the interval estimation
procedure uses the sample standard deviation s and the t distribution. In all cases the
quality of the interval estimates obtained depends on the distribution of the population
and the sample size. If the population is normally distributed the interval estimates will
be exact in both cases, even for small sample sizes. If the population is not normally
distributed, the interval estimates obtained will be approximate. Larger sample sizes will
provide better approximations, but the more highly skewed the population is, the larger
the sample size needs to be to obtain a good approximation.

The general form of the interval estimate for a population proportion is $\bar{p}$ ± margin of error.
In practice the sample sizes used for interval estimates of a population proportion are
generally large. Thus, the interval estimation procedure is based on the standard normal
distribution.



Glossary

Point estimator - The sample statistic, such as $\bar{x}$, s, or $\bar{p}$, that provides the point estimate of
the population parameter.
Point estimate - The value of a point estimator used in a particular instance as an
estimate of a population parameter.
Interval estimate - an estimate of a population parameter that provides an interval
believed to contain the value of the parameter. For the interval estimates in this chapter, it
has the form: point estimate ± margin of error.
Margin of error - The ± value added to and subtracted from a point estimate in order to
develop an interval estimate of a population parameter.
σ known - The case when historical
data or other information provides a good value for the population standard deviation
prior to taking a sample. The interval estimation procedure uses this known value of σ in
computing the margin of error.
Confidence level - The confidence associated with an interval estimate. For example, if
an interval estimation procedure provides intervals such that 95% of the intervals formed
using the procedure will include the population parameter, the interval estimate is said to
be constructed at the 95% confidence level.
Confidence interval – is another name for an interval estimate.
σ unknown - The more common case when no good basis exists for estimating the
population standard deviation prior to taking the sample. The interval estimation
procedure uses the sample standard deviation s in computing the margin of error.
t distribution - A family of probability distributions that can be used to develop an
interval estimate of a population mean whenever the population standard deviation σ is
unknown and is estimated by the sample standard deviation s.
Degrees of freedom – is a parameter of the t distribution. When the t distribution is used
in the computation of an interval estimate of a population mean, the appropriate t
distribution has n - 1 degrees of freedom, where n is the size of the simple random
sample.



Self-Test Questions
Using the following data answer question # 1 – 3.

A sample survey of 54 discount brokers showed that the mean price charged for a trade of
100 shares at $50 per share was $33.77. The survey is conducted annually. With the
historical data available, assume a known population standard deviation of $15.

1) What is the value of population mean on the basis of point estimation?

A. $50 C. $15
B. $33.77 D. $100
2) Using the sample data, what is the margin of error associated with a 95% confidence
interval?

A. 1 C. 3
B. 2 D. 4
3) Develop a 95% confidence interval for the mean price charged by discount brokers
for a trade of 100 shares at $50 per share.

A. 33.77 to 30.77 C. 29.77 to 37.77


B. 30.77 to 33.77 D. 30.11 to 40.11
4) Find the t value(s) for each of the following cases.
a) Upper tail area of .025 with 12 degrees of freedom
b) Lower tail area of .05 with 50 degrees of freedom
c) Upper tail area of .01 with 30 degrees of freedom
d) Where 90% of the area falls between these two t values with 25 degrees of
freedom
e) Where 95% of the area falls between these two t values with 45 degrees of
freedom



CHAPTER THREE

HYPOTHESIS TESTING

Introduction

Dear Students! In Chapter Two we discussed the first type of statistical inference, which is
estimation. In this chapter we will discuss the second type of statistical inference, which is
hypothesis testing. The chapter comprises concepts about tests of hypotheses for a
single population and for two independent populations. It shows how
hypotheses can be tested for a single mean, a single proportion, and differences of means and
proportions.

Objectives of the Chapter

After studying this Chapter, students should be able to:

• discuss the types of errors in hypothesis testing
• explain what the level of significance is
• know the meaning of the types of hypotheses
• understand the procedures of hypothesis testing
• differentiate between one-tail and two-tail tests
• comprehend how to hypothesize and test information about a single population and
two populations

3.1. Basic Concepts

Dear learner! At times we wish to examine statistical evidence and determine whether it
supports or contradicts a claim that has been made (or that we might wish to make)
concerning the entire population. This is done in a somewhat asymmetric fashion,
analogous to the approach taken in the Ethiopian system of criminal justice (adopted
throughout most of the modern world): We take a statement, presume it to be “innocent,”
i.e., true, and ask how strongly the evidence contradicts our initial presumption.

The evidence is viewed as being the result of some statistical procedure. We calculate the
probability that the same procedure – if carried out in a world where the statement really
is true – would, purely due to sampling error, provide evidence at least as contradictory
to the statement on trial as is the evidence we have in fact seen. This probability, called
the observed significance level (or p-value) of the sample data with respect to the
statement, is then interpreted.
If it is large, we conclude that the evidence against the statement is weak, since we must
acknowledge that, in a presumed world in which the statement is true, our studies would
frequently provide such evidence purely due to our exposure to sampling error. However,
if this probability is small, we conclude that the evidence at hand is quite different from
that which we would expect to see if the statement were true, i.e., we conclude that the
evidence strongly argues against the statement’s truth, and we lean towards finding the
statement “guilty.”

Just as in a criminal trial, we never conclude that the statement is “innocent” – at most,
we find it “not guilty.” In other words, our analysis leaves us in one of two camps: We
have strong evidence that the original statement is false, or we do not have such evidence.
Therefore, if we wish to make an affirmative case for a claim, we are forced to take the
opposite of that claim as the statement we put on trial. Only in this way might we
conclude, at the end, that the data – if strong evidence against the claim on trial – serves
to support the original claim.

Dear learner! What do you think of when someone says Hypothesis Testing? In our
day-to-day life we constantly make assumptions about the world around us, and such
assumptions are termed hypotheses.

A statistical hypothesis is an assumption about a population parameter. This assumption
may or may not be true.

Hypothesis testing refers to the formal procedures used by statisticians to accept or
reject statistical hypotheses.

Statistical Hypotheses
The best way to determine whether a statistical hypothesis is true would be to examine
the entire population. Since that is often impractical, we examine a random sample from
the population instead.

There are two types of statistical hypotheses.

• Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis
that sample observations result purely from chance.

• Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the
hypothesis that sample observations are influenced by some non-random cause.

For example, suppose we wanted to determine whether a coin was fair and
balanced. A null hypothesis might be that half the flips would result in Heads and
half, in Tails. The alternative hypothesis might be that the number of Heads and
Tails would be very different. Symbolically, these hypotheses would be expressed
as:

H0: p = 0.5

H1: p ≠ 0.5

Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given
this result, we would be inclined to reject the null hypothesis. We would
conclude, based on the evidence, that the coin was probably not fair and balanced.
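
The coin example can be checked numerically. The following is a minimal sketch, assuming an exact binomial calculation and a 5% significance level (the text does not state a level for this example); the function name `binomial_two_sided_p` is hypothetical and is used only for illustration.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int, p0: float = 0.5) -> float:
    """Exact two-sided p-value for H0: p = p0 (intended for the fair-coin case p0 = 0.5)."""
    expected = flips * p0
    distance = abs(heads - expected)
    total = 0.0
    # Add up the probability of every outcome at least as far from the
    # expected number of heads as the outcome actually observed.
    for k in range(flips + 1):
        if abs(k - expected) >= distance:
            total += comb(flips, k) * p0**k * (1 - p0) ** (flips - k)
    return total


p_value = binomial_two_sided_p(40, 50)
alpha = 0.05  # assumed significance level
print(f"p-value = {p_value:.2e}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

A result of 40 Heads in 50 flips is very unlikely for a fair coin, so the p-value is far below 0.05 and the sketch reports that H0 is rejected.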

Can We Accept the Null Hypothesis?

Dear learner! Some researchers say that a hypothesis test can have one of two outcomes:
you accept the null hypothesis or you reject the null hypothesis. Many statisticians,
however, take issue with the notion of "accepting the null hypothesis." Instead, they say:
you reject the null hypothesis or you fail to reject the null hypothesis.

Why the distinction between "acceptance" and "failure to reject?" Acceptance implies
that the null hypothesis is true. Failure to reject implies that the data are not sufficiently
persuasive for us to prefer the alternative hypothesis over the null hypothesis.

3.2. Steps in Hypothesis Testing

Dear learner! Statisticians follow a formal process to determine whether to reject a null
hypothesis, based on sample data. This process, called hypothesis testing, consists of
four steps.

1. State the hypotheses. This involves stating the null and alternative hypotheses.
The hypotheses are stated in such a way that they are mutually exclusive. That is,
if one is true, the other must be false.

2. Formulate an analysis plan. The analysis plan describes how to use sample data
to evaluate the null hypothesis. The evaluation often focuses on a single test
statistic.

3. Analyze sample data. Find the value of the test statistic (mean score, proportion,
t-score, z-score, etc.) described in the analysis plan.

4. Interpret results. Apply the decision rule described in the analysis plan. If the
value of the test statistic is unlikely, based on the null hypothesis, reject the null
hypothesis.

3.3. Decision Errors and Rules in Hypothesis Testing

3.3.1 Errors in Hypothesis Testing

Dear learner! In hypothesis testing we test a claim about a population parameter using
sample information. As you know from the previous units, sample information is not a
perfect estimator of a population parameter. This fact gives rise to errors in hypothesis
testing: the conclusion we reach and the true state of the population may differ.

Dear learner! The following are two types of errors in a hypothesis test.

A. Type I error. A Type I error occurs when the researcher rejects a null
hypothesis when it is true. The probability of committing a Type I error is
called the significance level. This probability is also called alpha, and is
often denoted by α.
B. Type II error. A Type II error occurs when the researcher fails to reject a
null hypothesis that is false. The probability of committing a Type II error
is called Beta, and is often denoted by β. The probability of not
committing a Type II error is called the Power of the test.

If we reject a hypothesis when it should be accepted, we say that a Type I error has
been made. If, on the other hand, we accept a hypothesis when it should be rejected,
we say that a Type II error has been made. In either case, a wrong decision or error in
judgment has occurred. In order for decision rules (or tests of hypotheses) to be good,
they must be designed so as to minimize errors of decision. This is not a simple
matter, because for any given sample size, an attempt to decrease one type of error is
generally accompanied by an increase in the other type of error. In practice, one type
of error may be more serious than the other, and so a compromise should be reached
in favor of limiting the more serious error. The only way to reduce both types of error
is to increase the sample size, which may or may not be possible.
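
The trade-off between the two types of error can be illustrated numerically. The sketch below assumes a right-tailed Z-test with a known population standard deviation; the values μ0 = 5.0, true mean 5.05, and σ = 0.21 are hypothetical numbers chosen only for illustration, and `type_ii_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha: float, mu0: float, mu1: float, sigma: float, n: int) -> float:
    """Beta for a right-tailed Z-test of H0: mu = mu0 when the true mean is mu1 > mu0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)    # critical value for the chosen alpha
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # true mean measured in standard errors
    return std_normal.cdf(z_crit - shift)     # P(fail to reject H0 | true mean is mu1)


for n in (25, 100):
    for alpha in (0.05, 0.01):
        beta = type_ii_error(alpha, mu0=5.0, mu1=5.05, sigma=0.21, n=n)
        print(f"n={n:3d}  alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

For a fixed sample size, lowering α raises β; increasing n lowers β at the same α, which matches the statement that only a larger sample reduces both types of error.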

Dear learner! In testing a given hypothesis, the maximum probability with which we
would be willing to risk a Type I error is called the level of significance, or
significance level, of the test. This probability, often denoted by α , is generally
specified before any samples are drawn so that the results obtained will not influence
our choice. In practice, a significance level of 0.05 or 0.01 is customary, although
other values are used. If, for example, the 0.05 (or 5%) significance level is chosen in
designing a decision rule, then there are about 5 chances in 100 that we would reject
the hypothesis when it should be accepted; that is, we are about 95% confident that
we have made the right decision. In such a case we say that the hypothesis has been
rejected at the 0.05 significance level, which means that we could be wrong with
probability 0.05.

Activity

Consider the following hypotheses that relate to the medical example mentioned earlier.

H0: A person is free of the disease

Ha: A person has the disease

Suppose a person takes a medical test that attempts to detect the disease. Discuss the
consequences of a Type I error and a Type II error.

3.3.2 Decision Rules in Hypothesis

Dear learner! The analysis plan includes decision rules for rejecting the null hypothesis.
In practice, statisticians describe these decision rules in two ways - with reference to a P-
value or with reference to a region of acceptance.

• P-value: The strength of evidence in support of a null hypothesis is measured by
the P-value. Suppose the test statistic is equal to S. The P-value is the probability
of observing a test statistic at least as extreme as S, assuming the null hypothesis is
true. If the P-value is less than the significance level, we reject the null hypothesis.

• Region of acceptance: The region of acceptance is a range of values. If the test
statistic falls within the region of acceptance, the null hypothesis is not rejected.
The region of acceptance is defined so that the chance of making a Type I error is
equal to the significance level.

The set of values outside the region of acceptance is called the region of
rejection. If the test statistic falls within the region of rejection, the null
hypothesis is rejected. In such cases, we say that the hypothesis has been rejected
at α level of significance.
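
The two ways of stating a decision rule lead to the same decision. The following sketch assumes a right-tailed Z-test; the function name `right_tailed_decision` and the two test-statistic values 1.2 and 2.1 are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist


def right_tailed_decision(z_cal: float, alpha: float):
    """Apply both decision rules to a right-tailed Z-test."""
    std_normal = NormalDist()
    p_value = 1 - std_normal.cdf(z_cal)      # P(Z >= z_cal) when H0 is true
    z_tab = std_normal.inv_cdf(1 - alpha)    # upper boundary of the region of acceptance
    reject_by_p_value = p_value < alpha      # P-value rule
    reject_by_region = z_cal > z_tab         # region-of-rejection rule
    return p_value, z_tab, reject_by_p_value, reject_by_region


for z in (1.2, 2.1):
    p, z_tab, by_p, by_region = right_tailed_decision(z, alpha=0.05)
    print(f"Z={z}: p-value={p:.4f}, Z_tab={z_tab:.3f}, "
          f"reject by p-value={by_p}, reject by region={by_region}")
```

In both cases the P-value rule and the region rule agree, because the region of acceptance is constructed from the same significance level.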

3.4. One-Tailed and Two-Tailed Hypothesis Tests

Dear learner! A test of a statistical hypothesis, where the region of rejection is on only
one side of the sampling distribution, is called a one-tailed test. A one-tailed test
can be further classified as a right one tail test or a left one tail test. The type of
test is decided mainly by the comparison sign used in the alternative hypothesis. When
the region of rejection lies in both tails of the sampling distribution, the test is
called a two-tailed test.

For example, suppose the null hypothesis states that the mean is less than or equal to 10.
The alternative hypothesis would be that the mean is greater than 10. The region of
rejection would consist of a range of numbers located on the right side of sampling
distribution; that is, a set of numbers greater than 10.

Table 3.1: Mathematical symbols and type of test

Mathematical symbol in Ha      Type of test
< or ≤                         Left one tail test
> or ≥                         Right one tail test
≠                              Two tail test

Example: Identify the type of tail test for each of the following pairs of hypotheses:
A) Ho: P < 0.4 and Ha: P ≥ 0.45
B) Ho: P ≥ 0.12 and Ha: P < 0.12
C) Ho: μ = 24 and Ha: μ ≠ 24

Solution:

A) Right one tail test because the alternative hypothesis has ≥
B) Left one tail test because the alternative hypothesis has <
C) Two tail test because the alternative hypothesis has ≠
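
The classification rule used in the solution can be written as a small lookup. This is only an illustrative sketch; the function name `tail_type` is hypothetical.

```python
def tail_type(ha_symbol: str) -> str:
    """Return the type of test implied by the comparison sign in the alternative hypothesis."""
    if ha_symbol in ("<", "≤"):
        return "Left one tail test"
    if ha_symbol in (">", "≥"):
        return "Right one tail test"
    if ha_symbol == "≠":
        return "Two tail test"
    raise ValueError(f"Unrecognized comparison sign: {ha_symbol!r}")


# The three pairs of hypotheses from the example, identified by the sign in Ha
for label, symbol in (("A", "≥"), ("B", "<"), ("C", "≠")):
    print(f"{label}) {tail_type(symbol)}")
```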

Activities
1. A trade group predicts that back-to-school spending will average $606.40 per
family this year. A different economic model is needed if the prediction is
wrong. Specify the null and the alternative hypotheses to determine if a
different economic model may be needed.
2. An advertisement for a popular weight-loss clinic suggests that participants in
its new diet program lose, on average, more than 10 pounds. A consumer
activist wants to determine if the advertisement’s claim is valid. Specify the
null and the alternative hypotheses to validate the advertisement’s claim.

3.5. Hypothesis Testing of Population Mean and Proportion

3.5.1. Hypothesis Testing for Single Population Mean (Large Samples)

Dear learner! Here the sample is taken from a population whose characteristics are
unknown or difficult to know. A claim about the population mean is then tested, and we
either reject it or fail to reject it. The sample taken from the population is
considered large when n > 30.

If the standard deviation of the population σ is known, then, based on the central limit
theorem, the sampling distribution of the sample mean x̄ is approximately normal for a
large sample size, so the standardized sample mean follows the standard normal
distribution. The Z-statistic is given by:

$$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

In this formula the numerator (x̄ − μ) measures how far the observed sample mean x̄ is
from the hypothesized mean μ. The denominator σx̄ is the standard error of the mean, so
the Z test statistic represents how many standard errors x̄ is from μ.

If the population standard deviation is unknown, then the sample standard deviation s is
used to estimate σ. The value of the test statistic will be:

$$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Illustration

A packaging device is set to fill detergent powder packets with a mean weight of 5kg.
These are known to drift upwards over a period of time due to machine fault, which is not
tolerable. A random sample of 100 packets is taken and weighed. This sample has a mean
weight of 5.03kg and a standard deviation of 0.21kg. Can we conclude that the mean
weight produced by the machine has increased? Use a 5 percent level of significance.

Solution:

Ho: The mean weight has not increased, or μ = 5

Ha: The mean weight has increased, or μ > 5

Sample size (n) = 100, x̄ = 5.03 kg, s = 0.21 kg and α = 5%

Here the appropriate test statistics is Z because though the population standard deviation
is unknown, the sample size is large at 100.

Decision rule: Accept the null hypothesis if the Z cal is less than Z tab

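[Figure: standard normal curve showing the region of acceptance of Ho to the left of Ztab = 1.645 and the region of rejection to the right.]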

$$Z_{cal} = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{5.03 - 5}{0.21 / \sqrt{100}} = 1.428$$

Z tab=1.645

Decision: Since Zcal = 1.428 is less than Ztab = 1.645, accept Ho, i.e., there is not enough evidence that the mean weight has increased.
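
The calculation for the packaging illustration can be reproduced in a few lines. This is a minimal sketch using the figures given in the illustration; the variable names are hypothetical, and the critical value is computed with Python's standard library rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the illustration
n = 100          # sample size
x_bar = 5.03     # sample mean weight (kg)
s = 0.21         # sample standard deviation (kg)
mu0 = 5.0        # hypothesized mean weight (kg)
alpha = 0.05     # level of significance

standard_error = s / sqrt(n)
z_cal = (x_bar - mu0) / standard_error
z_tab = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value

reject_h0 = z_cal > z_tab
print(f"Z_cal = {z_cal:.3f}, Z_tab = {z_tab:.3f}")
print("Reject Ho" if reject_h0 else "Do not reject Ho")
```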

3.5.2. Hypothesis Testing for Single Population Mean (Small Samples)

Dear students, in the previous discussion we used the standard normal distribution
(Z-test) because the sample size was large. If the sample size is not large (n < 30) and
the population standard deviation is unknown, it is preferable to use Student's
t-distribution. Thus, the test statistic for determining the difference between the
sample mean x̄ and the population mean μ is given by:

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where s is an unbiased estimate of the unknown population standard deviation σ. This test
statistic has a t-distribution with n − 1 degrees of freedom.

Illustration

Suppose the average breaking strength of steel rods is specified to be 18.5 thousand lbs.
To check this, a sample of 14 rods was tested. The mean and standard deviation obtained
were 17.85 and 1.955, respectively. Test the significance of the deviation at the 5%
level of significance.

Solution: Let us take the null hypothesis that there is no significant deviation in the
breaking strength of the rods, that is,

Ho: μ = 18.5 and H1: μ ≠ 18.5

Type of test: Two tail test

n = 14, x̄ = 17.85, s = 1.955

Degrees of freedom (df) = n − 1 = 14 − 1 = 13

α = 0.05

Since the test is a two tail test, the given α has to be divided into two equal parts:
α/2 = 0.025

The sample size is small and the population standard deviation is unknown
(estimated using the sample standard deviation). Hence the appropriate test statistic
is the t-test.

Decision rule: if the value of t cal is between -2.16 and 2.16, accept the hypothesis else
reject it.

[Figure: t-distribution curve showing the region of acceptance of Ho between ttab = −2.16 and ttab = 2.16, with regions of rejection in both tails.]

$$t_{cal} = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{17.85 - 18.5}{1.955 / \sqrt{14}} = -1.24$$

$$t_{tab} = \pm t_{\alpha/2,\,13} = \pm 2.16$$

Decision: Since tcal = −1.24 lies between −2.16 and 2.16, accept Ho, i.e., there is no
significant deviation of the sample mean from the specified population mean.
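
The steel rod illustration can be checked with a short calculation. This is a minimal sketch; the variable names are hypothetical, and the critical value 2.160 is taken from Appendix II (13 degrees of freedom, two-tailed p = 0.05) because Python's standard library has no t-distribution.

```python
from math import sqrt

# Figures from the illustration
n = 14           # sample size
x_bar = 17.85    # sample mean breaking strength
s = 1.955        # sample standard deviation
mu0 = 18.5       # specified mean breaking strength

df = n - 1
t_tab = 2.160    # Appendix II: df = 13, two-tailed p = 0.05

t_cal = (x_bar - mu0) / (s / sqrt(n))

# Two tail test: reject Ho only if t_cal falls outside (-t_tab, +t_tab)
reject_h0 = abs(t_cal) > t_tab
print(f"df = {df}, t_cal = {t_cal:.3f}, acceptance region = ({-t_tab}, {t_tab})")
print("Reject Ho" if reject_h0 else "Do not reject Ho")
```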

3.5.3. Hypothesis Testing for Single Population Proportion

Dear learner! We have seen how to conduct hypothesis tests for a mean. We now turn to
proportions. The process is completely analogous, although we will need to use the
standard deviation formula for a proportion.

Sometimes, instead of testing a hypothesis pertaining to a population mean, a population
proportion (p) of values in a particular category is considered. For this, a random
sample of size n is selected to compute the proportion of successes in the sample as
follows:

$$\bar{p} = \frac{\text{Number of successes in the sample}}{\text{Sample size}} = \frac{x}{n}$$

To conduct a test of hypothesis, it is assumed that the sampling distribution of a
proportion is approximately normal. Then, using the value of the sample
proportion p̄ and its standard deviation σp̄, we compute the value of the Z-statistic as
follows:

$$Z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{\bar{p} - p}{\sqrt{pq / n}}$$

where p is the hypothesized population proportion and q = 1 − p.

Illustration

Suppose a manufacturer claims that at least 95% of the equipment which he supplied to a
factory conformed to the specification. An examination of the sample of 200 pieces of
equipment revealed that 18 were faulty. Test the claim of the manufacturer.

Solution:

Ho: p ≥ 0.95 and Ha: p < 0.95

Proportion of pieces conforming to the specification: $\bar{p} = 1 - \frac{18}{200} = 0.91$

n = 200 and level of significance (α) = 5%

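Decision rule: Since this is a left one tail test, accept Ho when Zcal is greater than or equal to Ztab; otherwise reject Ho.

[Figure: standard normal curve showing the region of rejection of Ho in the left tail and the region of acceptance to its right.]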

The appropriate test statistic is:

$$Z_{cal} = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{\bar{p} - p}{\sqrt{pq / n}} = \frac{0.91 - 0.95}{\sqrt{(0.95)(0.05) / 200}} = -2.60$$

$$Z_{tab} = -1.645$$

Decision: Reject Ho because Zcal is less than Ztab, i.e., it falls within the region of
rejection. Hence, we conclude that the proportion of equipment conforming to
specifications is less than 95 percent.
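
The manufacturer illustration can be verified with the following minimal sketch. It assumes the 5% level of significance used in the solution; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the illustration
n = 200          # pieces of equipment examined
faulty = 18      # pieces found faulty
p0 = 0.95        # claimed proportion conforming to specification
alpha = 0.05     # level of significance used in the solution

p_bar = 1 - faulty / n                     # sample proportion conforming
standard_error = sqrt(p0 * (1 - p0) / n)   # computed under Ho
z_cal = (p_bar - p0) / standard_error
z_tab = NormalDist().inv_cdf(alpha)        # left-tailed critical value (negative)

reject_h0 = z_cal < z_tab
print(f"p_bar = {p_bar:.2f}, Z_cal = {z_cal:.2f}, Z_tab = {z_tab:.3f}")
print("Reject Ho" if reject_h0 else "Do not reject Ho")
```

With the standard error left unrounded, the statistic comes out near −2.60; rounding the standard error to 0.015 before dividing gives about −2.67. Either way the statistic falls in the region of rejection.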

3.5.4. Hypothesis Testing of the Difference Between Two Means

Dear learner! Testing the difference means checking whether two population parameters
differ, and in which direction, based on sample information taken from two different
target populations. This test checks the presence and direction of a difference between
two independent population means based on the difference between the sample means.

Let x̄1 and x̄2 be the sample means obtained in large samples of sizes n1 and n2 drawn
from respective populations having means μ1 and μ2 and standard deviations σ1 and σ2.
Consider the null hypothesis that there is no difference between the population means
(i.e., μ1 = μ2), which is to say that the samples are drawn from two populations having
the same mean.

The sampling distribution of differences in means is approximately normally distributed,
with its mean and standard deviation given by:

$$\mu_{\bar{x}_1 - \bar{x}_2} = 0 \quad \text{and} \quad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

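The test statistic is computed as follows.

When the population standard deviations σ1 and σ2 are known:

$$Z_{cal} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_{\bar{x}_1 - \bar{x}_2}}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

When σ1 and σ2 are unknown and the samples are large:

$$Z_{cal} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_{\bar{x}_1 - \bar{x}_2}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

When σ1 and σ2 are unknown and the t-distribution is used:

$$t_{cal} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_{\bar{x}_1 - \bar{x}_2}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$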

Illustration

Do employees perform better at work with music playing? The music was turned on
during the working hours of a business with 45 employees. Their productivity level
averaged 5.2 with a standard deviation of 2.4. On a different day the music was turned off
and there were 40 workers. The workers' productivity level averaged 4.8 with a standard
deviation of 1.2. What can we conclude at the 0.05 level?

Solution

We first develop the hypotheses

H0: μ1 − μ2 ≤ 0
H1: μ1 − μ2 > 0

Next, we need to find the standard deviation of the difference. Recall from the formulas
above that, under the null hypothesis, the mean of the difference is:

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 0$$

Note: We can use the sample means and sample standard deviations as point
estimates of the population means and standard deviations. Hence,

$$\bar{x}_1 - \bar{x}_2 = 5.2 - 4.8 = 0.4$$

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{2.4^2}{45} + \frac{1.2^2}{40}} = 0.405$$

Decision rule: Accept H 0 when t cal is less than t tab

[Figure: t-distribution curve showing the region of acceptance of Ho to the left of ttab = 1.690 and the region of rejection to the right.]

Now we can calculate the t-score. We have

$$t_{cal} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_{\bar{x}_1 - \bar{x}_2}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{0.4 - 0}{0.405} = 0.988$$

To decide whether to accept or reject the null hypothesis, we must determine
both tcal and ttab and compare them. ttab is the value of the t score obtained from the
table for the given degrees of freedom and level of significance α.

To calculate the degrees of freedom, we can take the smaller of the two numbers n1 − 1
and n2 − 1. So, in this example we use 39 degrees of freedom. The table gives a value of
1.690 for t0.05. Since the t-score of 0.988 is smaller than 1.690, we fail to reject the
null hypothesis and state that there is insufficient evidence to conclude that employees
perform better at work with music playing.
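
The music illustration can be reproduced as follows. This is a minimal sketch; the variable names are hypothetical, and the critical value 1.690 is the one quoted in the text rather than a value computed by the code.

```python
from math import sqrt

# Figures from the illustration
n1, x_bar1, s1 = 45, 5.2, 2.4   # music on
n2, x_bar2, s2 = 40, 4.8, 1.2   # music off

difference = x_bar1 - x_bar2
standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
t_cal = (difference - 0) / standard_error   # hypothesized difference is 0 under Ho

df = min(n1 - 1, n2 - 1)   # the smaller of n1 - 1 and n2 - 1, as described in the text
t_tab = 1.690              # critical value quoted in the text for a right one tail test

reject_h0 = t_cal > t_tab
print(f"standard error = {standard_error:.3f}, t_cal = {t_cal:.3f}, df = {df}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```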

3.5.5. Hypothesis Test for the Difference between Two Population Proportions

Dear students, in the previous sub-section we have discussed hypothesis testing of the
difference between two population means. Now we will discuss hypothesis testing of the
difference between two population proportions. Let two independent populations each
having proportion and standard deviation of an attribute be as follows:

Population     Proportion     Standard deviation
1              p1             σp1
2              p2             σp2

The sampling distribution of the difference in sample proportions (p̄1 − p̄2) is based on
the assumption that this difference is approximately normally distributed. The standard
deviation of the sampling distribution of p̄1 − p̄2 is given by:

$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

The Z statistic for the difference between two population proportions is stated as:

$$Z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}}$$

Invariably, the standard error of the difference between sample proportions is not
known. Thus, when a null hypothesis states that there is no difference between the
population proportions, we combine the two sample proportions (p̄1 and p̄2) to get one
unbiased estimate of the population proportion as follows:

Pooled estimate:

$$\bar{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}$$

The Z test statistic is then restated as:

$$Z = \frac{\bar{p}_1 - \bar{p}_2}{s_{\bar{p}_1 - \bar{p}_2}}$$

Illustration

Suppose that a company is considering two different television advertisements for the
promotion of a new product. Management believes that advertisement A is more effective
than advertisement B. Two test market areas with virtually identical consumer
characteristics are selected: advertisement A is used in one area and advertisement B in
another area. In a random sample of 60 customers who saw advertisement A, 18 had tried
the product. In a random sample of 100 customers who saw advertisement B, 22 had tried
the product. Does this indicate that advertisement A is more effective than advertisement
B, if a 5 percent level of significance is used?

Solution: Ho: p1 = p2 and Ha: p1 > p2

n1 = 60, p̄1 = 18/60 = 0.30; n2 = 100, p̄2 = 22/100 = 0.22

Level of significance α = 5 percent

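Test statistic:

$$Z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{s_{\bar{p}_1 - \bar{p}_2}}$$

where the pooled estimate is

$$\bar{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2} = \frac{60 \times 0.30 + 100 \times 0.22}{60 + 100} = 0.25$$

and the standard error is

$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \quad \bar{q} = 1 - \bar{p}$$

$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{0.25 \times 0.75 \left(\frac{1}{60} + \frac{1}{100}\right)} = 0.0707$$
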
Decision rule: Accept Ho if Z cal is less than Z tab

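[Figure: standard normal curve showing the region of acceptance of Ho to the left of Ztab and the region of rejection in the right tail.]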

Substituting values in the Z statistic, we have

$$Z_{cal} = \frac{0.30 - 0.22}{0.0707} = 1.131$$

Ztab at α = 0.05 is 1.645

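Decision: Since Zcal = 1.131 is less than Ztab, accept Ho. There is no significant
difference in the effectiveness of the two advertisements, so the data do not show that
advertisement A is more effective than advertisement B.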
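
The advertisement illustration can be checked with the following minimal sketch. The variable names are hypothetical, and the critical value is computed with Python's standard library rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the illustration
n1, tried1 = 60, 18     # advertisement A
n2, tried2 = 100, 22    # advertisement B
alpha = 0.05

p_bar1 = tried1 / n1
p_bar2 = tried2 / n2

# Pooled estimate of the common proportion under Ho: p1 = p2
p_pooled = (tried1 + tried2) / (n1 + n2)
q_pooled = 1 - p_pooled

standard_error = sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
z_cal = (p_bar1 - p_bar2) / standard_error
z_tab = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value

reject_h0 = z_cal > z_tab
print(f"pooled p = {p_pooled:.2f}, standard error = {standard_error:.4f}")
print(f"Z_cal = {z_cal:.3f}, Z_tab = {z_tab:.3f}")
print("Reject Ho" if reject_h0 else "Do not reject Ho")
```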

Summary

Hypothesis testing is a statistical procedure that uses sample data to determine whether a
statement about the value of a population parameter should or should not be rejected. The
hypotheses are two competing statements about a population parameter. One statement is
called the null hypothesis (H0), and the other statement is called the alternative
hypothesis (Ha).

Whenever historical data or other information provides a basis for assuming that the
population standard deviation is known, the hypothesis testing procedure for the
population mean is based on the standard normal distribution. Whenever σ is unknown,
the sample standard deviation s is used to estimate σ and the hypothesis testing procedure
is based on the t distribution. In both cases, the quality of results depends on both the
form of the population distribution and the sample size. If the population has a normal
distribution, both hypothesis testing procedures are applicable, even with small sample
sizes. If the population is not normally distributed, larger sample sizes are needed. In the
case of hypothesis tests about a population proportion, the hypothesis testing procedure
uses a test statistic based on the standard normal distribution.

In all cases, the calculated value of the test statistic (Zcal or tcal) is compared with
the tabulated critical value (Ztab or ttab) to determine whether the null hypothesis
should be rejected. Equivalently, the test statistic can be used to compute a p-value;
if the p-value is less than or equal to the level of significance α, the null
hypothesis can be rejected.

Glossary

Null hypothesis - The hypothesis tentatively assumed true in the hypothesis testing
procedure.

Alternative hypothesis - The hypothesis concluded to be true if the null hypothesis is
rejected.

Type I error - The error of rejecting H0 when it is true.

Type II error - The error of accepting H0 when it is false.

Level of significance – is the probability of making a Type I error when the null
hypothesis is true as equality.

One-tailed test - A hypothesis test in which rejection of the null hypothesis occurs for
values of the test statistic in one tail of its sampling distribution.

Test statistic - A statistic whose value helps determine whether a null hypothesis should
be rejected.

p-value - A probability that provides a measure of the evidence
against the null hypothesis given by the sample.

Two-tailed test - A hypothesis test in which rejection of the null hypothesis occurs for
values of the test statistic in either tail of its sampling distribution.

Self-Test Questions

1. 1500 randomly selected pine trees were tested for traces of the Bark Beetle
infestation. It was found that 153 of the trees showed such traces. Test the
hypothesis that more than 10% of the Tahoe trees have been infested. (Use a 5%
level of significance.)
2. A manufacturer claimed that at least 95% of the equipment that she supplied to a
factory conformed to specifications. An examination of a sample of 200 pieces of
equipment revealed that 18 were faulty. Test her claim at significance levels of (a)
0.01 and (b) 0.05.
3. A random sample of 12 families in one city showed an average monthly food
expenditure of Birr 1380 with a standard deviation of Birr 100, and a random
sample of 15 families in another city showed an average monthly food
expenditure of Birr 1320 with a standard deviation of Birr 120. Test whether the
difference between the two means is significant at the 0.01 level.

4. A television research analyst wishes to test a claim that more than 50% of the
households will tune in for a TV episode. Specify the null and the alternative
hypotheses to test the claim.
True or False

1. A tentative assumption about a population parameter is called the null hypothesis

2. Type I error is the probability of accepting the null hypothesis when it is true

3. Type I error is more harmful than type II error

4. The probability of making a type I error is referred as the level of significance

Multiple choices

1. Critical region is a region of
A. Rejection
B. Indecision
C. Acceptance
D. None of the above
2. The test statistics to test μ1−μ2 for normal population is
A. F-Test
B. t-test
C. Z-test
D. None of the above
3. For the test of hypothesis Ho: μ1 ≤ μ2 and Ha: μ1 > μ2, the critical region at
α = 0.10 and n > 30 is
A. Z ≤ 1.96
B. Z ≤−1.645
C. Z ≥ 1.96
D. Z ≥ 1.645

Answer Key

Chapter One
4. i, 0.7486
ii. 0.9082
5. 0.7549
6. 0.9345

Chapter Two
1. B
2. D
3. C
4. (a) 2.179
(b) -1.676
(c) 2.457
(d) -1.708 and 1.708
(e) -2.014 and 2.014

Chapter Three
True/false
1. True
2. False
3. False
4. True
Multiple choices
1. A
2. C
3. A

Appendices

Appendix I: Standard Normal Distribution (Z-test) Table

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Appendix II: Student’s t-distribution Table

One-tailed p 0.1 0.05 0.025 0.01 0.005
Two-tailed p 0.2 0.1 0.05 0.02 0.01
df
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.92 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.44 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.86 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.25
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.35 1.771 2.16 2.65 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.06 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.31 1.697 2.042 2.457 2.75
31 1.309 1.696 2.04 2.453 2.744
32 1.309 1.694 2.037 2.449 2.738
33 1.308 1.692 2.035 2.445 2.733
34 1.307 1.691 2.032 2.441 2.728
