
Understanding Inferential Statistics

The document outlines a course on inferential statistics, focusing on drawing conclusions about populations based on sample data. Key objectives include understanding sampling methods, estimating parameters, assessing risks, and implementing hypothesis testing. The course covers topics such as sampling theory, estimation, and hypothesis testing, emphasizing the importance of representative samples and the management of sampling and non-sampling errors.


Inferential statistics
Prof. MOHAMMADI Ahlam

Academic year: 2024/2025


Course objectives
Statistical inference is a branch of statistics that allows conclusions to be drawn about an entire
population based solely on data obtained from a sample. Since the analysis concerns only a
part (the sample) of a larger whole (the population), it is essential to know how to assess the
risks associated with this generalization.

The main objective is to understand to what extent the results observed on a well-chosen
sample can be used to deduce the characteristics of the original population. This involves
measuring the reliability of the conclusions drawn, taking into account uncertainties and possible
errors. These uncertainties are managed using tools such as point estimates, confidence
intervals and hypothesis testing, which determine the level of confidence that can be placed in
the results.

In short, statistical inference aims to generalize observations made on a sample to make
decisions concerning the entire population, while assessing the risks of error associated with this
generalization.
Targeted competencies
● Understand sampling methods and rules.
● Estimate parameters from a sample and give their confidence intervals.
● Assess risks as a function of sample size.
● Implement a hypothesis test.


Presentation of the course outline
Sampling methods
Qualitative representativeness (sampling methods)
Quantitative representativeness (sample size)

Estimation
Point estimate
Confidence interval estimation

Population (size N), with parameters m, σ, σ², p → Sample (size n), with characteristics x̄, s, s², f
The sample must be representative of the population.
Course outline
Chapter 1: Sampling Theory

Sampling Methods
● Empirical Methods
● Probabilistic Methods
Determining Sample Size
● Factors influencing sample size.
● Calculation methods for optimal sample size.
● Practical considerations in the study context.
Use of the Bienaymé-Chebyshev Inequality
● Introduction to the inequality and its importance in sampling.
● Application to estimate the variability of estimates.
Using the Normal Distribution
● Application of the normal distribution in the context of sampling.
● Role of the normal distribution in the construction of confidence intervals.
Sampling Distribution
● Concept of sampling distribution and its importance.
● Sampling distribution of means: characteristics and implications.
Frequency Sampling Distribution
● Understanding frequency sampling distributions.
● Applications in categorical data analysis.

Chapter 2: Estimation Theory

Point Estimation
● Definition and importance of point estimation.
● Examples of common point estimates (mean, proportion).
Confidence Interval Estimation
● Concept and interpretation of confidence intervals.
● Construction of confidence intervals for means and proportions.
● Impact of sample size on interval precision.
Chapter 3: Hypothesis Testing

Introduction to Hypothesis Testing
● Basic concepts: null hypothesis and alternative hypothesis.
● Importance of hypothesis testing in decision making.
Types of Tests
● Parametric tests (Student's t-test, ANOVA): conditions of application and
interpretation.
● Non-parametric tests (Wilcoxon test, chi-square test): when and how to use them.
Type I and II Errors
● Definition and consequences of type I and type II errors.
● Methods to minimize these errors in testing.
Power of the Test
Chapter 1: Sampling Theory
The main purpose of sampling
A manufacturer wants to check the quality of the lamps produced by a new
production line. To do so, it needs to evaluate the average operating time of
the lamps.
How do you estimate the average life of these lamps?
We can't test all the lamps!

The leader of a political party wants to estimate the proportion of party members in favor
of Mr. X's candidacy in the next presidential election.
How do you measure a candidate's popularity within a population?
It's too expensive to interview all the party members!
Importance of sampling
When it comes to collecting information on a population, there are two
possibilities:

● The first involves observing or questioning all elements of the population;
this is known as a complete or exhaustive survey, or census.
● The second involves observing or questioning only part of the population;
this is known as a partial survey or poll. The elements of the population that are actually
observed constitute the sample, and the operation of selecting these
elements is called sampling.

In practice, the second solution, i.e. the partial survey,
is the most common.
Population & Sample
● Population:
In statistics, a population includes every possible element that you are interested
in measuring, or the entire dataset that you want to draw conclusions about. A
statistical population can consist of any type of unit: people,
organizations, objects, events, and more.
● The sample:
A sample is a finite subset of the population, that is, the portion
of the group that is actually observed. The sample size n is the number of
items selected to form the sample.
● Sampling fraction (or sampling rate):
Proportion of population units included in the sample. It is the ratio of sample
size n to population size N: f = n/N.
Advantages of Sampling
● Reduced Costs: Collecting data from an entire population can be expensive
and time-consuming. Sampling allows researchers to obtain insights at a
fraction of the cost.
● Faster Data Collection: Sampling facilitates quicker data collection and
analysis, enabling timely decision-making.
● Practical Approach: In many cases, it's impractical or impossible to study an
entire population (e.g., large-scale surveys or rare events). Sampling provides
a feasible alternative.
● Simplified Analysis: Working with a smaller sample makes data management
and analysis more straightforward, allowing for more detailed examination of
the data.
Back to the examples
● For the lamp manufacturer:
He takes a sample of 130 lamps.
For each lamp, he measures the operating time.
The sample mean is 36,000 hours.
The estimated mean operating time for all lamps is therefore 36,000 hours.

● For the party leader:
He forms a sample of 400 people.
Of the people selected, 250 are in favor of the proposed candidate.
An estimate of the proportion of the population in favor of Mr. X is:
250/400 = 0.625 = 62.5%

What is the quality of these two estimates?


Sampling errors
Sampling errors are statistical errors that arise because a sample never
represents the whole population perfectly. They are the differences between the true values
of the population and the values estimated from a sample of that population.

Sampling errors occur whenever numerical parameters of a population are
estimated from a sample. Since the whole population is not
included in the sample, the values obtained from the sample differ from
those of the actual population.

They may distort the results and lead users to draw incorrect
conclusions. When analysts select samples that do not represent the entire
population, the sampling errors are large.
Sampling errors
Causes:
• Sample Size: Smaller samples are more susceptible to sampling errors, as they
may not capture the full variability of the population.
• Sampling Method: The method used to select the sample (e.g., random
sampling, stratified sampling) can influence the extent of sampling error.

Examples:
• If a survey estimates the average income of a community based on a sample of
50 individuals, the sample mean might differ from the true population mean.
• In a study of consumer preferences, if only young adults are surveyed, the
results may not accurately reflect the views of older adults, leading to a
sampling error.
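
The link between sample size and sampling error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population of 20,000 "incomes" is synthetic (normal distribution with arbitrary parameters), and the function name `mean_abs_error` is introduced here for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: 20,000 synthetic incomes (arbitrary parameters).
population = [random.gauss(3000, 800) for _ in range(20_000)]
true_mean = statistics.fmean(population)


def mean_abs_error(sample_size, repetitions=200):
    """Average gap between the sample mean and the population mean."""
    gaps = []
    for _ in range(repetitions):
        # Simple random sample drawn without replacement.
        sample = random.sample(population, sample_size)
        gaps.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(gaps)


errors = {n: mean_abs_error(n) for n in (25, 400, 2500)}
for n, err in errors.items():
    print(f"n = {n:>4}: average sampling error is about {err:.1f}")
```

In such a run the average gap shrinks as n grows, which is the behaviour described under "Sample Size" above. Non-sampling errors would not shrink in the same way.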
Non-Sampling errors

Non-sampling errors refer to inaccuracies that arise from factors other than the
sampling process itself. These errors can occur at any stage of data collection and
analysis, affecting the validity of the results.

Impact: Non-sampling errors can lead to biased results that are not easily
quantifiable. Unlike sampling errors, they do not decrease with larger sample sizes
and can significantly affect the reliability and validity of research findings.
Non-Sampling errors
Causes:

• Measurement Error: Errors that occur when collecting data, such as using
poorly worded survey questions or faulty measuring instruments.
• Response Bias: When respondents do not provide truthful or accurate
answers, often due to social desirability or misunderstanding questions.
• Non-Response Error: When individuals selected for the sample do not
respond, leading to a potential bias if non-respondents differ from
respondents.
• Processing Error: Mistakes made during data entry, coding, or analysis can
introduce inaccuracies.
Difference between sampling and non-sampling errors

Criterion | Sampling error | Non-sampling error
What? | Occurs because the selected sample does not perfectly represent the population of interest. | Arises from sources other than sampling.
Why? | Deviation between the sample mean and the population mean. | Scarcity of data, miscalculation, or misinterpretation of the analysis.
Where? | Only during sample selection. | At any stage, from the beginning to the end of the study.
With sample size? | Decreases as the sample size increases. | No relation with sample size.

Conclusion
Both sampling errors and non-sampling errors are important to consider when
conducting research. Sampling errors are inherent to the process of taking
samples and can often be managed statistically, while non-sampling errors can
stem from various factors throughout the research process and may require
careful design and implementation strategies to minimize their impact.
Understanding both types of errors helps researchers ensure the accuracy and
reliability of their findings.
Sampling Methods
If the results of a sample survey are to be extrapolated to the entire population
under study, it is essential that the survey is conducted according to well-defined
rules, and that the calculations leading to these extrapolations are consistent with
the sampling procedure used.

The sample chosen must be as representative as possible of the population
studied; in other words, the degree of correspondence between the information
gathered and what we would learn from a comparable census of the population
depends largely on how the sample has been chosen.

Modern sampling theory draws a fundamental distinction between
probability-based samples and non-probability-based samples.

The choice between probabilistic and non-probabilistic methods often depends
on the availability of a sampling frame.

A sampling frame is a list or set containing all the units (individuals, companies,
households, etc.) in the population from which a sample will be selected for a
survey or study. This frame serves as a reference for identifying and selecting the
members of the future sample, ensuring that each unit has a known, non-zero
chance of being selected.

Key features of a sampling frame:

• Representative of the population: The sampling frame must include all
elements of the target population, to ensure that the sample drawn is
representative.
• Identifiable and exhaustive: Each unit in the frame must be clearly identified,
and the frame must cover the entire population without omission.
• Used for sampling methods: This list is the starting point for applying
sampling techniques such as simple random sampling, stratified sampling, etc.
Probability-based samples
In probability-based sampling, every member of the population has a known,
non-zero chance of being selected. Because the selection probabilities are known, the
sampling error can be quantified, which allows researchers to make valid inferences
about the entire population from the sample.

Key Characteristics:
• Known probability of selection: Each member of the population has a
measurable chance of being chosen.
• Random selection: The process of choosing individuals for the sample is
random, which helps to minimize bias.
• Representative sample: Probability-based sampling aims to create a sample
that accurately reflects the characteristics of the broader population.
Types of Probability-Based Sampling
(1) Simple Random Sampling
Simple Random Sampling is a fundamental probability-based sampling method
where every individual in a population has an equal and independent chance of
being selected for the sample. It is one of the simplest and most widely used
sampling techniques.

Key Features:
• Equal Probability: Each member of the population has an equal chance of
being included in the sample.
• Independence: The selection of one individual does not affect the selection of
another; each selection is independent.
• Randomness: The selection process is entirely random, ensuring no bias in the
sample choice.

How It Works:

• Population Definition: First, define the population from which the sample will
be drawn.
• Assign Numbers: Each individual in the population is assigned a unique
number or identifier.
• Random Selection: Using a random number generator, lottery system, or
another randomization method, select the required number of individuals for
the sample.
Example 1:

Imagine a company has 1,000 employees, and the HR department wants to
survey 100 of them. If they assign a number to each employee and then use a
random number generator to pick 100 employees, this would be an example of
simple random sampling.
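
The draw in Example 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming employees are simply numbered 1 to 1,000; the variable names and the seed are choices made here, not part of the course.

```python
import random

# Employee identifiers 1..1000, as in Example 1.
employee_ids = list(range(1, 1001))

# A seeded generator is used only so that the example is reproducible.
rng = random.Random(2024)

# sample() draws without replacement: no employee is picked twice and
# every subset of 100 employees is equally likely.
survey_sample = rng.sample(employee_ids, k=100)

print(sorted(survey_sample)[:10])  # a few of the selected identifiers
```

Drawing without replacement is the usual choice for a survey, since interviewing the same employee twice adds no information.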
Example 2:

A simple random sample of 5 companies is desired from a population of 22
companies. The sampling frame of these companies is available.
We take an extract from a random number table and randomly select a number in it;
let's assume it is 06121. As N = 22 has two digits,
we retain the first group of 2 digits, which gives number 06; the following numbers are 12, 19 and 17. The
numbers 82, 77 and 92 are unusable because they exceed N = 22. The fifth company is the one with
number 10.

Question:

A high school has a boarding school with 90 boarders. Each week, 5 of these
boarders are drawn at random to clear the tables in the dining hall after each
meal. Each week, the random number generator below is used to draw these 5
students.
92200 99401 54473 34336 82786
What is the list of numbers resulting from this draw?
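
One way to carry out such a draw programmatically is sketched below. It rests on two assumptions that the question does not state: the boarders are numbered 01 to 90, and the digits are read as consecutive two-digit groups, skipping unusable or repeated numbers. A different reading convention would give a different list. The function name `draw_from_digits` is introduced here for the example.

```python
def draw_from_digits(digit_groups, population_size, sample_size):
    """Read consecutive two-digit numbers and keep the usable ones."""
    digits = "".join(digit_groups)
    selected = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        # Usable numbers lie in 1..population_size and are not yet drawn.
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
        if len(selected) == sample_size:
            break
    return selected


generated = ["92200", "99401", "54473", "34336", "82786"]
boarders = draw_from_digits(generated, population_size=90, sample_size=5)
print(boarders)
```

Under these assumptions, groups such as 92 and 94 are rejected because they exceed 90, and reading continues until five usable numbers have been found.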
Types of Probability-Based Sampling
(2) Stratified sampling
Stratified sampling is a technique that involves subdividing a heterogeneous
population of size $N$ into $p$ more homogeneous subpopulations, or "strata", of sizes
$N_i$ ($i = 1, \dots, p$) such that:
$$N = N_1 + N_2 + \cdots + N_p$$
A sample of size $n_i$ is then taken independently from each stratum, using a
sampling plan of the user's choice. In most cases, simple random sampling is
used within each stratum.

Stratification can lead to appreciable gains in precision. It also facilitates data
collection operations and provides information for different parts of the
population.

There are two possible approaches to distributing the total sample size n among
the different strata:

• The first solution, known as proportional allocation, consists in keeping the same
sampling fraction in each stratum.
• The second solution, known as optimal allocation, takes into account the variability
within each stratum and the survey budget.

i. Proportional Allocation:

In proportional allocation, the sampling fraction (the proportion of the population in each
stratum that is selected) remains the same across all strata. This means that the size of the
sample from each stratum is proportional to the size of the stratum in the population.
Formula
If the total population size is $N$, the total sample size is $n$, and the size of stratum $i$ is $N_i$, then the sample size for
stratum $i$, $n_i$, is given by:
$$\frac{n_i}{n} = \frac{N_i}{N} \iff n_i = n \times \frac{N_i}{N}$$
Proportional Allocation: Example
In a population of 10,000 companies, divided into 5,000 small companies, 3,000 medium-sized
companies and 2,000 large companies, we want to have a sample of 500 companies.
With the proportional allocation principle, the sampling fraction is constant:
$$f = \frac{500}{10000} = 0.05 = 5\%$$
Stratum | Stratum size | Sample size
Small | 5000 | 5000 × 0.05 = 250
Medium | 3000 | 3000 × 0.05 = 150
Large | 2000 | 2000 × 0.05 = 100
Total | N = 10000 | n = 500
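
The same calculation can be written as a short function. This is a minimal sketch of the proportional allocation formula; the function name `proportional_allocation` is introduced here, and the rounding step is an assumption (in other data sets, rounded stratum sizes may not add up exactly to n and would need a small adjustment).

```python
def proportional_allocation(stratum_sizes, total_sample_size):
    """Return n_i = n * N_i / N for each stratum, rounded to whole units."""
    population_size = sum(stratum_sizes.values())
    return {
        name: round(total_sample_size * size / population_size)
        for name, size in stratum_sizes.items()
    }


# Strata from the example above.
strata = {"Small": 5000, "Medium": 3000, "Large": 2000}
allocation = proportional_allocation(strata, total_sample_size=500)
print(allocation)  # {'Small': 250, 'Medium': 150, 'Large': 100}
```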
Advantages of Proportional Allocation:
• Simplicity: This method is straightforward and easy to implement.
• Representativeness: Since each stratum’s sample size is proportional to its
representation in the population, the sample reflects the population’s overall
structure.

Disadvantages of Proportional Allocation:


• Not Optimal for Variability: It does not take into account the variability within
each stratum, which may lead to inefficient estimates if some strata are more
variable than others.

ii. Optimal Allocation:

In optimal allocation, the sample size for each stratum is determined based on
both the size of the stratum and the variability within the stratum. This method
seeks to minimize the sampling variance or maximize the precision of the
estimates while considering the survey budget.
Formula
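The sample size for stratum $i$, $n_i$, in optimal allocation is calculated as:
$$n_i = n \times \frac{N_i \sigma_i}{\sum_{j=1}^{p} N_j \sigma_j}$$
where $\sigma_i$ is the standard deviation of the variable of interest within stratum $i$.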
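
As an illustration, the formula can be sketched in Python. The stratum sizes are those of the proportional allocation example; the standard deviations are hypothetical values chosen only to make the arithmetic easy to follow, and the function name `optimal_allocation` is introduced here.

```python
def optimal_allocation(stratum_sizes, stratum_std_devs, total_sample_size):
    """Return n_i = n * N_i * sigma_i / sum_j(N_j * sigma_j) per stratum."""
    weights = {
        name: stratum_sizes[name] * stratum_std_devs[name]
        for name in stratum_sizes
    }
    total_weight = sum(weights.values())
    return {
        name: total_sample_size * weight / total_weight
        for name, weight in weights.items()
    }


sizes = {"Small": 5000, "Medium": 3000, "Large": 2000}
# Hypothetical within-stratum standard deviations (not from the course).
std_devs = {"Small": 1.0, "Medium": 2.0, "Large": 4.5}

allocation = optimal_allocation(sizes, std_devs, total_sample_size=500)
print(allocation)  # {'Small': 125.0, 'Medium': 150.0, 'Large': 225.0}
```

With these assumed values, the most variable stratum (Large) receives 225 units instead of the 100 it would get under proportional allocation, while the least variable stratum (Small) drops from 250 to 125.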
Advantages of Optimal Allocation:
• Precision: Optimal allocation provides more precise estimates by assigning
larger sample sizes to strata that have greater variability.
• Efficient Resource Use: It takes into account the survey budget, allocating
resources where they will have the greatest impact on reducing uncertainty.
Disadvantages of Optimal Allocation:
• Complexity: This method requires knowledge of the variability in each stratum,
which may not always be available or easy to estimate.
• Higher Costs: It may lead to higher costs if larger samples are needed in strata
with more variability or smaller populations.
Proportional Allocation vs. Optimal Allocation

Conclusion

Proportional allocation is simpler and ensures that each stratum is represented
according to its size in the population, but it may not be efficient if some strata
have higher variability.
Optimal allocation is more efficient in terms of reducing sampling variance and
improving precision but requires more information and is more complex to
implement. It also takes into account the survey budget, focusing resources on
strata that contribute the most to the precision of estimates.
Exercise 1:
Proportional allocation in stratified sampling
A company wishes to conduct a customer satisfaction survey in three distinct regions:
North, South and East. The aim is to sample a total of 600 customers.
The number of customers in each region is as follows:
● North region: 3,000 customers
● South region: 5,000 customers
● East region: 2,000 customers
The company decides to use stratified sampling with proportional allocation. The
sample size for each region must therefore be proportional to the number of customers in
that region.
1. Calculate the sample size for each region according to the proportional allocation
principle.
2. Once you've determined the number of customers to select from each region,
explain how you would conduct the random selection within each region.
Exercise 2:
Optimal allocation in stratified sampling
A company wants to conduct a satisfaction survey among its employees in three
departments: Production, Marketing and Human Resources. The number of
employees in each department and the variability of satisfaction (measured by the
standard deviation of responses) are as follows:
● Production: 600 employees, standard deviation of 3.
● Marketing: 200 employees, standard deviation of 4.
● Human Resources: 100 employees, standard deviation of 5.
The company decides to select a total sample of 150 employees using optimal
allocation.
1. Calculate the sample size for each department using the optimal allocation
formula.
2. Explain why optimal allocation is more advantageous than proportional allocation
in this case.
Types of Probability-Based Sampling
(3) Multistage sampling
Multistage Sampling is a sampling method that involves selecting samples in
multiple steps, or "stages," progressively breaking down the target population
into smaller and more manageable subgroups. This technique is especially
useful when the population is large or geographically dispersed, and when direct
sampling is too costly or logistically challenging.

At each stage, a sample is drawn, starting from larger, more general groups and
moving toward more specific, smaller units. For instance, if you're surveying a
country's population, the first stage might involve selecting regions, the second
stage might focus on cities within those regions, and subsequent stages could
narrow the focus to specific neighborhoods and then households.
Process steps of multistage sampling:

• First stage: The population is divided into large groups or subsets
(called primary units or first-stage units), and a sample of these groups is
selected. These groups may be geographic
regions, schools, companies, or other subgroups depending on the survey.
• Second stage: A sample of sub-units is selected within each of the selected primary
units. For example, after selecting regions, a number of cities or
neighborhoods within these regions could be selected.
• Third stage (if necessary): If the secondary units are still too large or
heterogeneous, we continue to subdivide into smaller and smaller units,
selecting a sample at each stage. This can continue over several stages until
units of a manageable size, such as households or individuals, are reached.

Example of multistage sampling:

Suppose an organization wants to conduct a household survey in a country:

• First Stage: Sampling begins by selecting regions or provinces.
• Second Stage: Within the selected regions, a certain number of cities are
chosen.
• Third Stage: In each selected city, specific neighborhoods are chosen.
• Fourth Stage: Finally, households in the selected neighborhoods are sampled
for the survey.
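
The staged selection can be sketched as nested random draws. The example below is a simplified illustration with three stages instead of four (the neighborhood stage is skipped for brevity); the sampling frame, the identifiers and the numbers of units drawn at each stage are all hypothetical.

```python
import random

rng = random.Random(7)  # seeded only so that the example is reproducible

# Hypothetical sampling frame: region -> city -> list of household ids.
frame = {
    f"region_{r}": {
        f"city_{r}_{c}": [f"hh_{r}_{c}_{h}" for h in range(50)]
        for c in range(6)
    }
    for r in range(8)
}

# Stage 1: select 3 regions among 8.
regions = rng.sample(sorted(frame), k=3)

households = []
for region in regions:
    # Stage 2: select 2 cities within each selected region.
    cities = rng.sample(sorted(frame[region]), k=2)
    for city in cities:
        # Final stage: select 10 households within each selected city.
        households.extend(rng.sample(frame[region][city], k=10))

print(len(households))  # 3 regions x 2 cities x 10 households = 60
```

Only the selected regions and cities need a detailed list of their sub-units, which is where the cost savings of the method come from.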
Advantages of Multistage Sampling:
• Cost and logistical efficiency: This method allows resources to be concentrated
on specific subgroups, reducing travel and data collection costs.
• Flexibility: It is adaptable based on the structure of the population or study. This
makes it effective for handling large or geographically dispersed populations.
• Controlled heterogeneity: Each stage of sampling helps to control diversity within
subgroups, which limits the loss of precision when the groups are well defined.
Disadvantages of Multistage Sampling:
• Increased complexity: The more stages involved, the more complex the design
and management of the sampling process.
• Potential loss of precision: Each stage of sampling introduces additional
potential for error, especially if the groups are not well-defined.
Types of Probability-Based Sampling
(4) Systematic sampling
Systematic sampling is a method of sample selection in which the elements of
the population are selected according to a predefined regular interval, called the
selection interval. Once this interval has been determined, elements are selected
at fixed intervals throughout the population.

This method is particularly popular for its simplicity and efficiency. It is often used
when the population is organized in the form of a list or file. For example, if you
have an ordered list of 1,000 people and you wish to select 100, you would choose
a random starting point, then select every k-th person until you reach the desired
sample. The step k is determined by dividing the total population size by the
desired sample size.
If the elements of the population are listed in random order (with no particular
pattern), systematic sampling produces a sample equivalent to that obtained by
simple random sampling: since the list has no order, selecting at fixed
intervals is as good as selecting at random.

However, if the list is ordered so that the variable of interest follows a trend
(for example, units sorted by size),
systematic sampling can offer greater precision than simple random sampling. By
selecting elements at fixed intervals, the method spreads the sample evenly over
the whole list, thus improving the
representativeness of the sample. Conversely, if the list has a periodic pattern
whose period matches the selection interval, the sample can be seriously biased.

Steps of the Systematic Sampling Process:

• Define the sample size: Start by determining the required sample size (n) and
the total population size (N).
• Calculate the sampling interval: The sampling interval k is calculated as
follows: k = N/n. This interval is the distance, in list positions, between
two consecutive selections.
• Select a random starting point: Choose a random starting point within the
first k elements of the population.
• Select the elements: From the chosen starting point, select every k-th
element until the required sample size is obtained.

Systematic sampling: Example

Imagine a school has a list of 1,000 students, and the administration wants to
survey 100 of them about their satisfaction with the cafeteria services.
Desired sample size (n): 100 students; total population size (N): 1,000 students.
k = N/n = 1000/100 = 10
This means every 10th student will be selected.
Let's say the administration randomly selects the 4th student from the list as the
starting point.
Starting from the 4th student, the administration will select every 10th student:
4th, 14th, 24th, 34th, 44th, ..., up to 994th.
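
The selection above can be reproduced with a short function. This is a minimal sketch that assumes N is a multiple of n, as in the example; the function name `systematic_sample` and its parameters are introduced here for illustration.

```python
import random


def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return the 1-based list positions selected by systematic sampling."""
    step = population_size // sample_size  # sampling interval k = N / n
    if start is None:
        start = rng.randint(1, step)  # random start among the first k units
    return [start + i * step for i in range(sample_size)]


positions = systematic_sample(1000, 100, start=4)
print(positions[:5], "...", positions[-1])  # [4, 14, 24, 34, 44] ... 994
```

Leaving `start` unset draws the starting point at random among the first k positions, which is the only random step in the whole procedure.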
Advantages of Systematic Sampling:
• Simplicity: Systematic sampling is easy to implement. Once the sampling
interval is determined and a random starting point is selected, the process of
selecting the sample is straightforward.
• Efficiency: It requires less time and effort than simple random sampling,
as only one random draw is needed and the selection then follows a regular pattern. This makes it faster to gather
data.
Limitations:
• Requires a Complete List: This method necessitates having a complete and
ordered list of the population, which may not always be available.
• Less Randomness: While systematic sampling is efficient, it is less random than
simple random sampling, because only the starting point is random. If the list contains a hidden
regular pattern, the sample may be unrepresentative,
potentially affecting the results.
Non Probability-Based Sampling

Non-probability-based sampling methods are approaches for selecting
samples from a population in which not every individual has a known
chance of being included in the sample. This contrasts with probability
sampling, where each member of the population has a known, non-zero chance of being
selected, which allows valid inferences about the entire population.

Since the selection probabilities are unknown, samples
obtained by these methods may be biased. This can lead to over- or
under-representation of certain groups within the sample, which can undermine the
validity of the conclusions drawn from the study.
Non Probability-Based Sampling
(1) Convenience sampling

Convenience sampling, also known as accidental or incidental sampling, is a method of sample
selection in which participants are chosen on the basis of ease of access or
availability. This method relies on the selection of individuals who are most easily
reached by the researcher, without regard to the representativeness of the
sample in relation to the total population.

Selection process: In convenience sampling, the researcher selects participants
based on their availability at the time of data collection. For example, this might
include people met in a public place, colleagues, or friends. This approach is fast
and inexpensive, as it does not require complex planning or considerable
resources to reach specific participants.
Example of Accidental Sampling:
Let's imagine a researcher who wants to study coffee-drinking habits among
university students. To do this, he decides to use accidental sampling by visiting a
cafeteria on the university campus during lunchtime.

Sampling process
• Choice of location: The researcher goes to the cafeteria, a place frequented by
many students, to maximize the number of potential participants.
• Participant selection: The researcher begins by approaching nearby students,
asking them questions about their coffee consumption. He makes no effort to
ensure that participants represent different age groups, genders or fields of
study. He simply focuses on those who are present and available at the time.
• Advantages of convenience sampling:
Convenience sampling has several advantages. Firstly, it is simple to implement,
enabling data to be collected quickly, especially in situations where time is
limited. Secondly, it is cost-effective, as it reduces the costs associated with
research, such as travel or remuneration of participants. In addition, it can be
useful for exploratory studies where initial hypotheses are being tested.

• Limitations of convenience sampling:
However, convenience sampling has significant limitations. Since participants are
not selected at random, the samples obtained may be biased and fail to reflect
the diversity of the population. This can compromise the validity of results and
make it difficult to generalize conclusions to a wider audience. It is therefore
important to interpret the results of studies based on this method with caution.
Non Probability-Based Sampling
(2) Priori sampling
A priori sampling is a method of sample selection that relies on predetermined
criteria established before data collection begins. Unlike other sampling
methods where participants are chosen randomly or based on their availability, a
priori sampling utilizes information or assumptions about the population to
determine who will be included in the sample. This approach is often used in
studies where obtaining data on specific groups or particular characteristics is
necessary.

Participants are selected according to predefined criteria based on
demographic, behavioral, or other relevant factors. For example, a researcher
might decide to include only individuals aged 18 to 30 who regularly use a certain
product.

Example of A Priori Sampling:

A researcher conducts a study on the use of fitness apps among young adults.
Before starting data collection, the researcher defines specific inclusion criteria:
• Participants must be between the ages of 18 and 30.
• Participants must be residents of a specific city to control for geographical
factors.
The researcher then recruits through social media platforms, inviting members
who match these inclusion criteria to participate in the study.
The resulting sample is made up of individuals who meet the a priori criteria,
enabling the researcher to gather specific information.
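The selection step in this example can be sketched as a simple filter. This is only an illustration: the candidate pool, the participant labels and the placeholder city name "City X" are invented here and do not come from the study described above.

```python
# Hypothetical candidate pool; labels, ages and cities are invented for illustration.
candidates = [
    {"name": "P1", "age": 22, "city": "City X"},
    {"name": "P2", "age": 35, "city": "City X"},
    {"name": "P3", "age": 19, "city": "City Y"},
    {"name": "P4", "age": 30, "city": "City X"},
]

TARGET_CITY = "City X"  # placeholder for the specific city chosen by the researcher


def meets_criteria(person):
    """A priori inclusion criteria: aged 18 to 30 and resident of the target city."""
    return 18 <= person["age"] <= 30 and person["city"] == TARGET_CITY


# Only candidates who satisfy every predefined criterion enter the sample.
sample = [p["name"] for p in candidates if meets_criteria(p)]
print(sample)
```

Because the criteria are fixed before data collection, anyone outside them (P2 by age, P3 by city) is excluded regardless of availability.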

Advantages of A Priori Sampling:
• Precise Targeting: It allows for the targeting of specific groups relevant to the
study, thereby increasing the relevance of the collected data.
• Time and Resource Efficiency: It avoids spending time selecting participants
who do not meet the inclusion criteria, making the process more efficient.
Disadvantages of A Priori Sampling:
• Selection Bias: Since participants are chosen based on predefined criteria,
there is a risk of bias that may limit the generalizability of the results to the
entire population.
• Limited Flexibility: The a priori approach may restrict the ability to explore
other perspectives or characteristics that were not initially considered.
(3) Snowball sampling

Snowball sampling is a sample selection method used mainly with hard-to-reach
or hidden populations. The technique is based on the principle that initial
participants, called “nodes”, recommend other potential participants, thus
creating a chain or “snowball” of recruits.

This approach is particularly useful for studying minority groups or specific
populations where it is difficult to obtain a representative sample using
traditional methods.
Features of Snowball Sampling:

• Initial recruitment: The process begins by identifying a few participants who
meet the study's inclusion criteria. These initial participants may be selected
randomly, or their selection may be made through social networks or expert
recommendations.
• Recommendations: Once the initial participants have been identified, they are
invited to recommend others who might also meet the study criteria. Each
new participant can in turn recommend other individuals, gradually
expanding the sample.
• Sample enlargement: This method continues until the researcher has
reached a sufficient number of participants, or until new recommendations
become rare.
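The three steps above can be sketched as a wave-by-wave traversal of a referral network. This is a minimal sketch, not a prescribed procedure: the network `referrals`, the participant labels and the function name `snowball_sample` are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical referral network: each person lists the people they would recommend.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
    "F": [],
}


def snowball_sample(seeds, referrals, target_size):
    """Recruit from the initial participants until target_size is reached
    or no new recommendations remain."""
    sample = []
    seen = set()
    queue = deque(seeds)  # initial recruitment
    while queue and len(sample) < target_size:
        person = queue.popleft()
        if person in seen:
            continue  # already recruited through another recommendation
        seen.add(person)
        sample.append(person)
        # Each new participant recommends further candidates.
        queue.extend(p for p in referrals.get(person, []) if p not in seen)
    return sample


print(snowball_sample(["A"], referrals, target_size=4))
```

Everyone in the resulting sample is reachable from the initial participant, which illustrates why samples built this way tend to come from similar social networks.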
Advantages of Snowball Sampling:
• Access to Hidden Populations: Snowball sampling is particularly effective for
reaching hard-to-identify or hard-to-recruit populations, such as homeless
people, drug users, or members of minorities.
• Reduced costs and time: By relying on participants' recommendations,
researchers can save time and resources by avoiding extensive searches to
identify subjects.
Disadvantages of Snowball Sampling:
• Potential bias: This type of sampling can lead to bias, as participants are often
recruited from similar social networks. This can limit the diversity and
representativeness of the sample.
• Difficulty in estimating population size: It can be difficult to estimate the true
size of the population of interest, which complicates analysis of the results and
their generalization to the whole population.
(4) Quota sampling
Quota sampling is a sampling method that ensures certain specific
characteristics of the population are represented in the final sample. This
approach involves defining quotas based on predetermined criteria (such as age,
gender, socio-economic status, etc.) and recruiting participants until these
quotas are met. Quota sampling is often used in market research and social
studies to ensure that the sample reflects the composition of the target
population.

Quota sampling involves studying the structure of the population according to
empirically selected criteria (quotas). The sample is then constructed as a
miniature reproduction of the population based on these criteria.

Once quotas have been set, individuals are selected at the interviewer's
convenience.

The criteria used to define quotas should not be too numerous. Beyond 3 criteria,
the process becomes complex.

Quotas must be based on reliable data (available statistics) indicating the
distribution of the population according to the chosen criteria. The criteria most
commonly used in market research are economic and socio-demographic (in
particular age, gender, socio-professional category, etc.).
Characteristics of Quota Sampling:

• Definition of Quotas: Before data collection begins, the researcher
determines the characteristics of the population to be studied and establishes
quotas for each characteristic. For example, a researcher might decide to
include 50% women and 50% men in their sample or ensure that participants
come from different age groups.
• Recruitment of Participants: Participants are recruited until each quota is
filled. This process can be done using various methods, such as random
sampling, convenience sampling, or other approaches.
• Flexibility: Unlike probabilistic sampling methods, quota sampling
allows for some flexibility in recruiting participants, which facilitates meeting
the established quotas.
Example of Quota Sampling:

A researcher would like to construct a sample of 1000 individuals.
The population structure is described below using one, two, and then three
control variables.

i. A single control variable: Age

Age      Population structure   Sample distribution
20-29    40%                    400
30-49    35%                    350
50-59    25%                    250
Total    100%                   1000
ii. Two control variables: Age & Gender

Population structure

Age      Male   Female   Total
20-29    48%    52%      100%
30-49    49%    51%      100%
50-59    45%    55%      100%

Sample distribution

Age      Male   Female   Total
20-29    192    208      400
30-49    172    178      350
50-59    113    137      250
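The quotas in this example can be recomputed with a short script. The slides do not state how fractional counts (for instance 49% of 350 = 171.5) are rounded; this sketch assumes the male count is rounded half up and women receive the remainder, which reproduces the figures in the example. The function name `quota_table` is introduced here for illustration.

```python
# Age structure of the population, in whole percent (from the example).
age_pct = {"20-29": 40, "30-49": 35, "50-59": 25}
# Share of men within each age group, in whole percent (from the example).
male_pct = {"20-29": 48, "30-49": 49, "50-59": 45}


def quota_table(n, age_pct, male_pct):
    """Return {age group: (men, women, total)} for a sample of size n."""
    table = {}
    for age, pct in age_pct.items():
        # Floor division; exact when n * pct is a multiple of 100, as with n = 1000.
        total = n * pct // 100
        # Round half up (assumed convention, not stated in the slides).
        men = (total * male_pct[age] + 50) // 100
        # Women take the remainder so each row still sums to its age quota.
        women = total - men
        table[age] = (men, women, total)
    return table


quotas = quota_table(1000, age_pct, male_pct)
for age, (men, women, total) in quotas.items():
    print(age, men, women, total)
```

Assigning the remainder to one category keeps every row total equal to its age quota, so the overall sample size stays at 1000 whatever rounding is applied within a row.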


iii. Three control variables: Age, Gender and Socio-professional category

[Tables: population structure and sample distribution by age, gender and socio-professional category]
Exercise: Quota sampling method
A company wants to conduct a survey among its clients to evaluate their
satisfaction. The total population of clients is divided according to two main
criteria: gender and age group, but the age group proportions differ for each
gender.

The proportions are as follows:


● Gender: 60% of the clients are men; 40% of the clients are women.
● Age group distribution by gender:
- Men: 25% are between 18 and 30 years old; 55% are between 31 and 50 years
old; 20% are 51 years old and above.
- Women: 35% are between 18 and 30 years old; 45% are between 31 and 50
years old; 20% are 51 years old and above.
The company wants to form a representative sample of 500 clients using the
quota sampling method.

Questions:

● How many men and how many women should be included in the sample?
● Calculate the distribution of men and women by age group.
● How many clients in total will be in each age group, regardless of gender?
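One way to check the answers is to apply the proportions step by step, as in the sketch below. The variable names are chosen here for illustration; the percentages and the sample size of 500 come from the exercise statement.

```python
sample_size = 500

# Proportions from the exercise, in whole percent.
gender_pct = {"Men": 60, "Women": 40}
age_pct_by_gender = {
    "Men": {"18-30": 25, "31-50": 55, "51+": 20},
    "Women": {"18-30": 35, "31-50": 45, "51+": 20},
}

# Question 1: quota per gender.
gender_quota = {g: sample_size * p // 100 for g, p in gender_pct.items()}

# Question 2: quota per age group within each gender.
cell_quota = {
    g: {age: gender_quota[g] * p // 100 for age, p in ages.items()}
    for g, ages in age_pct_by_gender.items()
}

# Question 3: total per age group, regardless of gender.
age_total = {}
for ages in cell_quota.values():
    for age, count in ages.items():
        age_total[age] = age_total.get(age, 0) + count

print(gender_quota)
print(cell_quota)
print(age_total)
```

All products here are multiples of 100, so the integer divisions are exact and no rounding convention is needed for this exercise.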
