Lecture 2-Data Collection & Sample Design

The document discusses sampling methods and sample design. It defines key terms like population, sample, parameter and statistic. It describes advantages of sampling over collecting data from the entire population. Random sampling methods like simple random sampling are also explained. The role of sampling in statistical analysis and inference is covered.

Uploaded by

lsejeso15

DATA COLLECTION AND SAMPLE DESIGN


Lecture 2-Class Discussion Notes
BM & EBL Year 1
Kelebogile Kenalemang
SAMPLING
What is Sampling?
• Sampling is a tool that is used to indicate how much data to collect
and how often it should be collected.
• It is a statistical technique or process that allows for the selection of a
subset of elements from a statistical population in order to estimate
characteristics of the whole population.
• The tool defines the samples to take in order to quantify a system, a
process, a business issue, or a research problem.
Sample and Population
• The term population refers to the entire group of people or items to
which the statistical investigation relates.
• The term sample refers to a small group selected from that
population.
• In the same way, we use the term parameter to refer to a population
measure, and the term statistic to refer to the corresponding sample
measure.
• Statistical representations
• For example, if we consider our population to be the current
membership of the Institute of Directors, the mean salary of the full
membership is a population parameter.
• However, if we take a sample of 100 members, the mean salary of this
group is referred to as a sample statistic.
• In a survey of student opinion about the catering services provided by a
college, what is the target population?
• If the survey was concerned with catering services generally in colleges,
what would be the target population?
• Since it is unlikely that in either case, the survey team could collect data
from the whole population, a small group of students would be selected to
represent the population.
• The survey team would then draw appropriate inferences about the
population from the evidence produced by the sample.
• It is very important to define the population at the beginning of an
investigation in order to ensure that any inferences made are meaningful.
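
The distinction between a population parameter and a sample statistic in the Institute of Directors example can be illustrated with a small sketch. The salary figures below are synthetic values generated purely for illustration (the population size of 5,000, the mean of 60,000 and the spread of 15,000 are all assumptions, not real membership data).

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: synthetic salaries for 5,000 members (not real data).
population = [rng.gauss(60_000, 15_000) for _ in range(5_000)]

# Population parameter: the mean taken over every member.
population_mean = statistics.fmean(population)

# Sample statistic: the mean taken over a random sample of 100 members.
sample = rng.sample(population, 100)
sample_mean = statistics.fmean(sample)

print(f"population mean (parameter): {population_mean:,.0f}")
print(f"sample mean (statistic):     {sample_mean:,.0f}")
```

The two means will usually be close but not identical; that gap is what statistical inference has to account for.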
Finite and infinite Populations
• Populations, such as those referred to above, are limited in size, and
are known as finite populations.
• Populations which are not limited in size are referred to as infinite
populations.
• In practice, if a population is sufficiently large that the removal of one
member does not appreciably alter the probability of selection of the
next member, then the population is treated statistically as if it were
infinite.
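
As a rough illustration of why a large finite population can be treated as infinite, the sketch below computes how much each remaining member's chance of being picked next changes after one member is removed. The function name is hypothetical and the population sizes are arbitrary examples.

```python
from fractions import Fraction

def selection_probability_shift(population_size: int) -> Fraction:
    """Relative change in each remaining member's chance of being picked
    next, after one member has been removed without replacement."""
    before = Fraction(1, population_size)       # chance before removal
    after = Fraction(1, population_size - 1)    # chance after removal
    return (after - before) / before

for n in (10, 1_000, 1_000_000):
    print(n, float(selection_probability_shift(n)))
```

For a population of 10 the chance shifts by about 11%, while for a population of one million the shift is about one part in a million, which is negligible in practice.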
WHY SAMPLE?
Why Sample?
• Sampling enables you to collect and analyze data for a smaller portion
of the population (a sample), which must be representative of the
entire population, and then apply the results to the whole population.
• Sampling permits you to draw conclusions about very complex
situations.

• Statistical Inference
The advantages of using a Sample rather
than the Population

• Practicality-The population could be very large, possibly infinite. Hence, it
would be physically impossible to collect data from the whole population.

• Time-If the data are needed quickly, then there may not be enough time
to cover the whole population. For example, if we are concerned with
checking the quality of goods produced by a mass production process,
the delay in delivery whilst every item is checked, will be unacceptable.
• Cost-The cost of collecting data from the whole population may be
prohibitively high. In the example above, the cost of checking every item
could make the mass produced item excessively expensive.
The advantages of using a Sample rather
than the Population
• Errors-If data were collected from a large population, then the actual
task of collecting, handling and processing the data would involve a
large number of people and the risk of error increases rapidly. Hence,
the use of a sample, with its smaller data set will often result in fewer
errors.
• Tests-The test itself may destroy the item, in which case it is
obviously undesirable to test the entire population. For example, a
manufacturer wishes to make a claim about the durability of a
particular type of battery. He runs tests on some batteries until they
fail to determine what would be a reasonable claim about all the
batteries.
When would we use a population rather
than a sample?
• Small populations-If the population is small, so that any sample taken would
be large relative to the size of the population, then the time, cost and
accuracy involved in using the population rather than the sample will not be
significantly different.
• Accuracy-If it is essential that the information gained from the data is
accurate, then statistical inference from sample data may not be sufficiently
reliable. For example, a shop needs to know exactly how much
money has been taken over the counter in the course of a year. It is not
sufficient for the owner to record takings on a sample of days out of the year.
• The problem of errors is still relevant here, but any errors in the data will be
ones of arithmetic rather than unreliability of statistical estimates.
The role of Sampling in Statistical Analysis
• The use of data from a sample instead of a population has important
implications for the statistical investigation since it leads us into the
realms of statistical inference; it becomes necessary to know what
may be inferred about the population from the sample.
• What does the sample statistic tell us about the population
parameter, or what does the evidence of the sample allow us to
conclude about our belief with respect to the population?
The role of Sampling in Statistical Analysis
• How does the mean age of a sample of professional accountants
relate to the mean age of the population of professional accountants?
• If, in a sample of adults, the joggers are fitter, can we claim for the
population as a whole that jogging keeps you fit?
• With an appropriately chosen sample, it is possible to estimate
population parameters from sample statistics, and, to use sample
evidence to test beliefs held about the population.
Statistical Inference
• Statistical inference is a large and important aspect of statistics.
Information is gathered from a sample and this information is used to
make deductions about some aspect of the population.
• For example, an auditor may check a sample of a company's
transactions and, if the sample is satisfactory, he will assume, or infer,
that all the company's transactions are satisfactory.
• He uses a sample because it is cheaper, quicker and more practical
than checking all of the transactions carried out in the company.
Sample Selection
• It is extremely important that the members of a sample are selected
so that the sample is as representative of the population as possible,
given the constraints of availability, time and money.
• A biased sample will give a misleading impression about the
population.
Methods of Selecting a Sample
1. Probability/Random Sample Designs

2. Non-Probability/Non-Random Sample Designs


Probability/Random Sample Designs

• Random sampling means that every member of the population has an
equal chance of being selected for the sample.
• If the population consists of groups of members with different
characteristics, which are important in the investigation, then random
sampling should result in a sample which contains members from
each of these groups.
Sampling Frame
• The first step in selecting a random sample from a finite population is to
establish a sampling frame.
• This is a list of all members of the population. It does not matter what
form the list takes as long as the individual members can be identified.
• Each member of the population is given a number, then some random
method is used to select numbers and the sample members are thus
identified.
• The representativeness of the sample depends on the quality of the
sampling frame. It is important that the sampling frame possesses the
following properties:
Completeness - all the population members should be included in the
sampling frame. Incompleteness can lead to defects in the sample,
especially if the members which are excluded belong to the same
group within the population.
Accuracy - the information for each member should be accurate and
there should be no duplication of members.
Probability/Random Sample Designs
Simple Random Sample Design
• A simple random sample is most suitable when the members of the
population are similar for the purposes of the investigation.
• For example, a simple random sample would be suitable for selecting
a sample of 20 employees of a company to take part in a survey if we
are interested only in the fact that the individuals are employees of
this company.
How to decide on the sample size
• You need to decide how large your sample size will be. Although larger samples
provide more statistical certainty, they also cost more and require far more
work.

• There are several potential ways to decide upon the size of your sample, but one
of the simplest involves using a formula with your desired margin of error (often
loosely called the confidence interval) and confidence level, the estimated size of
the population you are working with, and the standard deviation of whatever you
want to measure in your population.

• The most commonly used margin of error and confidence level are 0.05 and 0.95,
respectively. Since you may not know the standard deviation of the population
you are studying, you should choose a number high enough to account for a
variety of possibilities (such as 0.5).
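
The notes do not name a specific formula, so the sketch below assumes one common choice: n0 = (z × s / e)², followed by a finite population correction when the population size is known. Here e is the margin of error (what the notes call the confidence interval), s is the assumed standard deviation, and z is the normal critical value for the chosen confidence level. The function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(margin_of_error, confidence_level, std_dev,
                         population_size=None):
    """Sample size from an assumed formula: n0 = (z * s / e) ** 2,
    with an optional finite population correction."""
    # Two-sided critical value, e.g. about 1.96 for a 0.95 confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n0 = (z * std_dev / margin_of_error) ** 2
    if population_size is not None:
        # Finite population correction shrinks n0 for smaller populations.
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)  # round up so the target precision is still met

print(required_sample_size(0.05, 0.95, 0.5))           # 385
print(required_sample_size(0.05, 0.95, 0.5, 10_000))   # 370
```

With the values quoted above (0.05, 0.95 and 0.5), a very large population needs about 385 respondents, and a population of 10,000 needs about 370.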
Randomly selecting a sample
• This can be done in one of two ways: the lottery or random number method.

• In the lottery method, you choose the sample at random by “drawing from a
hat” or by using a computer program that will simulate the same action.

• In the random number method, you assign every individual a number. By
using a random number generator or random number tables, you then
randomly pick a subset of the population. You can also use the random
number function (RAND) in Microsoft Excel to generate random numbers.
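
The random number method can be sketched in Python as an alternative to Excel's RAND. The sampling frame of 200 employee labels is a made-up example, and the function name is hypothetical; `random.sample` draws without replacement, so no member is picked twice.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Draw a simple random sample, without replacement, from a sampling frame."""
    rng = random.Random(seed)  # pass a seed only when a repeatable draw is wanted
    return rng.sample(frame, sample_size)

# Hypothetical sampling frame: every member of the population, individually labelled.
frame = [f"employee_{i:03d}" for i in range(1, 201)]

chosen = simple_random_sample(frame, 20, seed=1)
print(chosen)
```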
Factors that influence the sample size
• The "right" sample size for a particular application depends on many
factors, including the following:

• Cost considerations (e.g., maximum budget, desire to minimize cost).


• Administrative concerns (e.g., complexity of the design, research
deadlines).
• Minimum acceptable level of precision.
• Confidence level.
• Variability within the population or subpopulation (e.g., stratum, cluster)
of interest.
Stratified Random Sampling

• If the population's members have different characteristics, which are
of interest, then the simple random sampling method may not give
the most representative sample for a given sample size.
• For example, in the survey of employees of a company, it may be
important to distinguish between male and female employees. A
simple random sample design could result in too many members of
one gender, unless a large sample is used.
• Stratification variable: a variable by which a study population is
divided into strata.
• The Stratified Random Sampling method enables us to produce a sample which more
accurately reflects the composition of the population.
• The procedure requires the sampling frame to be subdivided into the groups of interest.
• These groups are referred to as strata.
• In our example, the males and females should be identified. We need to know how many
members of the population fall into each category.
• We then use a simple random sampling method to select sample members from each stratum
separately in proportion to their number in the population.
• If the population contains 60% males and 40% females, and we are selecting a sample of 100
members, then we select 60 men from the male stratum and 40 women from the female
stratum.
• The main advantage of stratified random sampling is that a smaller sample can give results
as reliable as those from a larger simple random sample.
Steps for creating a Stratified Random
Sample
• (a) defining the population
• (b) choosing the relevant stratification
• (c) listing the population
• (d) listing the population according to the chosen stratification
• (e) choosing your sample size
• (f) calculating a proportionate stratification and
• (g) using a simple random or systematic sample to select your
sample.
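
The steps above can be sketched as follows, using the 60% male / 40% female example with a sample of 100. The population of 1,000 labelled members and the function name are assumptions made for illustration. Note that simple rounding of the quotas may not always add up exactly to the requested sample size when the proportions are not whole numbers.

```python
import random

def proportionate_stratified_sample(strata, sample_size, seed=None):
    """Select from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Stratum quota = sample size x stratum's share of the population.
        quota = round(sample_size * len(members) / total)
        # Simple random sample within the stratum.
        sample[name] = rng.sample(members, quota)
    return sample

# Hypothetical sampling frame already listed by stratum: 600 males, 400 females.
strata = {
    "male": [f"M{i}" for i in range(600)],
    "female": [f"F{i}" for i in range(400)],
}

picked = proportionate_stratified_sample(strata, 100, seed=7)
print({name: len(members) for name, members in picked.items()})
# {'male': 60, 'female': 40}
```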
Multi-Stage Sample Design

• As the name implies, the selection of the sample by this method requires several stages.
• The method is most used when the population is distributed over a wide geographical area.
• For example, the population might be the world-wide membership of the SHU Alumni Society.
• The first stage is to divide the population into a few clearly defined areas.
• In our example, these could be individual countries. The proportion of the sample allocated to these
areas is determined by the proportion of the population in each area.
• Hence, if 80% of the Alumni membership was in England and Wales, and we were looking for a
sample of 1000 members, then we allocate 800 sample members to England and Wales, as with
stratified sampling.
• The next stage is to define some smaller areas; these might be local government districts and
then companies within these districts.
• A sample is taken of the local government districts. Within the chosen districts, a sample of the
companies is selected. Finally, individual members are sampled from the chosen companies in the
selected districts.
• With this method the actual sampling is quicker and more convenient.
Cluster sampling design

• In previous designs we have selected items one at a time.
• In cluster sampling, items are grouped into clusters, each of which is assumed to be
reasonably representative of the whole population.
• Clusters are then randomly selected and all of the items in the cluster are
included in the sample.
• For example, suppose a large firm stores its invoices in batches of 50. If, in a year,
there are 10,000 invoices generated, then there will be 200 batches. These
batches can be used as clusters. Suppose the firm wants a sample of 300 invoices.
This could be achieved by selecting 300 individual invoices randomly from the
10,000. Alternatively, the cluster sample design allows 6 clusters to be randomly
selected from the 200 batches. This is a much easier and quicker method, but we
must be sure that there is no bias within the batches.
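
The invoice example can be sketched as below. Invoice numbers 1 to 10,000 stand in for the real invoices, and the function name is hypothetical; the point is that whole batches are drawn at random and every invoice in a chosen batch enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every item inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(clusters, n_clusters)
    return [item for cluster in chosen for item in cluster]

# Hypothetical data: 10,000 invoice numbers stored in batches of 50.
invoices = list(range(1, 10_001))
batches = [invoices[i:i + 50] for i in range(0, len(invoices), 50)]

sample = cluster_sample(batches, 6, seed=3)
print(len(batches), len(sample))  # 200 300
```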
Systematic Sampling
• Define
• Explain the procedure for a systematic sample design
Non-Probability/Non-Random Sample Designs

• For many surveys, especially in the area of Market Research, sampling
frames do not exist. For example, if we wish to investigate
housewives' views of a new product, it would be difficult to draw up a
sampling frame for housewives.
Quota Sampling
• Quota sampling is a commonly used sampling method in this
situation.
• Initially the important characteristics of the target population are
identified as for stratified sampling, for example, male/female, age
groups, social class, etc.
• The sample is divided proportionately into these groups as far as it is
possible, but from then on, it is left to the individual field workers to
decide how to obtain the sample members. There is no question of
identifying individuals first or choosing them randomly.
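
A minimal sketch of how quotas might be filled in the field is shown below. It assumes respondents are taken in whatever order the field worker happens to meet them, which is exactly why the result is not a random sample. The respondent labels, groups and function name are all hypothetical.

```python
def fill_quotas(respondents, quotas):
    """Accept respondents in the order encountered until every quota is full."""
    remaining = dict(quotas)  # copy so the caller's quotas are left unchanged
    accepted = []
    for person, group in respondents:
        if remaining.get(group, 0) > 0:
            accepted.append((person, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas filled, stop interviewing
    return accepted

# Hypothetical stream of people met by a field worker, with their group.
stream = [("r1", "female"), ("r2", "male"), ("r3", "female"),
          ("r4", "female"), ("r5", "male"), ("r6", "male")]

sample = fill_quotas(stream, {"male": 2, "female": 2})
print([person for person, _ in sample])  # ['r1', 'r2', 'r3', 'r5']
```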
Judgement Sampling
• Judgement sampling can be used, where the researcher uses a
mixture of hunch, prior knowledge and judgement to select the
sample. There is no attempt at stratification or randomness.
• Snowball Sampling
• Convenience Sampling
Statistical Investigations and Surveys

• If a statistical investigation of the type described above is to
be carried out, then the following stages are normally required:
1. Define the objectives of the survey - what information is to be
collected and for whom or what.
2. Define the target population
3. Decide on the sampling method
4. Choose an appropriate method of collecting the data - ensure that
the method will yield the required information.
5. Carry out a pilot survey - this is a 'dress rehearsal' for the full survey
and gives a guide to the suitability of the data collection method;
particularly the adequacy of any questionnaire used. Amend the
procedure as necessary.
6. Carry out the main survey
7. Analyze and present the results
