Chapter 1 – Introduction and basic concepts

Chapter 1 INTRODUCTION AND BASIC CONCEPTS


Statistics is a field of applied mathematics that is divided into two branches: descriptive statistics (the
subject of this course), which deals with collecting, organising, summarising and presenting data, and
inferential statistics, which uses a sample to draw inferences (usually through estimation or hypothesis
testing) about the population from which the sample was drawn. Inferential statistics relies not only on
descriptive statistics but also on probability theory.
Statistics has so many applications that it is impossible to mention them all. Examples include medicine
(disease screening, vaccine efficacy,…), environmental sciences (climate modelling, water quality
monitoring,…), physics (thermodynamics, astrophysics and cosmology,…), economics (econometrics,
economic forecasting,…), demography (fertility, mortality,…), business (market research, risk analysis,…),
agriculture (crop yields, soil health,…), finance (company earnings, financial forecasts,…), computer science
(machine learning, artificial intelligence,…), and the list goes on.

1.1 STAGES IN A STATISTICAL STUDY


A statistical study consists of five stages:
• Designing the study. It is essential to define the purpose of the study, its scope, its type and the sources
of data. The purpose of the study is what we are trying to answer or examine. Its scope describes the extent
to which the research topic will be explored: it helps decide what data we need to gather and what data
collection tools we need to design. Its type specifies whether the study is quantitative, qualitative, or
mixed-method (i.e., combining the two). Data can be gathered from primary sources (collected first-hand by
the researcher) or secondary sources (collected earlier by someone else). After narrowing down the research
topic, the researcher formulates a hypothesis about the outcome of his or her research.
• Data collection and verification. When designing a study, much attention must be given to the process
by which data are gathered. After data are collected, they should be scrutinised for errors, outliers and
missing values.
• Data presentation. Once data are error-free, they must be presented in a suitable form, using a variety of
data displays and numerical summaries (descriptive statistics tools).
• Analysis of data. Data analysis involves manipulating data in order to discern trends and patterns with
the aid of statistical methods. The researcher may then test his or her hypothesis and make inferences about
the population.
• Interpretation of data. Data interpretation is the process of giving meaning to the information that
has been gathered and drawing conclusions relevant to the purpose of the study.
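The verification step can be illustrated with a short Python sketch. It assumes that missing values are recorded as `None` and uses Tukey's fences (values more than 1.5 interquartile ranges below the first quartile or above the third quartile) as the outlier rule; this is one common convention, not a rule prescribed in this course. The data values are invented.

```python
import statistics

# Invented raw measurements; None marks a missing value (an assumption of this sketch).
raw_scores = [12.1, 11.8, None, 12.4, 35.0, 11.9, 12.2, None, 12.0]

def find_missing(values):
    """Return the positions of missing values (recorded as None)."""
    return [i for i, v in enumerate(values) if v is None]

def find_outliers(values, k=1.5):
    """Return positions of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartiles of the non-missing values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v is not None and not low <= v <= high]

missing = find_missing(raw_scores)
outliers = find_outliers(raw_scores)
print("missing at positions:", missing)            # missing at positions: [2, 7]
print("possible outliers at positions:", outliers)  # possible outliers at positions: [4]
```

Flagged values should be checked against the original records rather than deleted automatically: an outlier may be a recording error or a genuine extreme observation.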

Ali Bouchetob Descriptive Statistics 1

1.2 STATISTICAL VOCABULARY


1.2.1 Population
A population is the set of all elements of interest in a particular study. A population can include people,
animals, inanimate objects, etc.
1.2.2 Sample
A sample is a subset drawn directly from a population using a pre-defined selection method. Samples are
used when the population is too large for information to be collected about every member, as doing so
would be costly and time-consuming. In order to draw valid conclusions about the population, the sample
must be representative of the whole population.
1.2.3 Census
A census is a method for collecting data from the entire population.

1.2.4 Survey
A survey is a method that involves collecting data from a sample.
1.2.5 Parameter
A parameter is a value describing some characteristic of the population.
1.2.6 Statistic
A statistic is a value describing some characteristic of the sample.
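The difference between a parameter and a statistic can be seen in a small Python sketch. The population of 1500 student ages is invented for illustration, and the mean is used as the characteristic of interest.

```python
import random
import statistics

# Invented population: ages of 1500 students (illustrative values only).
rng = random.Random(1)  # fixed seed so the run is reproducible
population = [rng.randint(15, 19) for _ in range(1500)]

# Parameter: computed from every member of the population.
mu = statistics.mean(population)

# Statistic: computed from a sample of 125 members drawn from the population.
sample = rng.sample(population, 125)
x_bar = statistics.mean(sample)

print(f"parameter (population mean) = {mu:.3f}")
print(f"statistic (sample mean)     = {x_bar:.3f}")
```

Drawing another sample would generally give a different statistic, whereas the parameter is fixed for a given population.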
1.2.7 Variables
A variable is a characteristic of interest for each member of the population. There are two types of variables
in statistics.
a) Qualitative variables
Qualitative variables (also referred to as categorical variables) describe a characteristic or attribute. Their
values fall into categories. Examples of qualitative variables include sex, eye colour, etc.
Qualitative variables are divided into two types: nominal and ordinal. A qualitative variable is nominal when
there is no intrinsic ordering of its categories. For example, eye colour is a nominal variable because there
is no order among brown, blue, grey and green eyes. A qualitative variable is ordinal if its categories have a
natural order or rank. For instance, education level, which takes values such as Bachelor’s, Master’s and PhD,
is an ordinal variable because its categories have a clear order.
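In code, the distinction between nominal and ordinal variables shows up in which operations make sense. In the Python sketch below, the rank numbers attached to the education levels are a hypothetical encoding that only records their order; they are not quantities.

```python
# Nominal: eye colour has no intrinsic order, so only equality checks and counts make sense.
eye_colours = ["brown", "blue", "grey", "green", "blue"]
blue_count = eye_colours.count("blue")

# Ordinal: the natural order of education levels is encoded explicitly
# (hypothetical ranks that record order only, not amounts).
EDUCATION_RANK = {"Bachelor's": 1, "Master's": 2, "PhD": 3}

levels = ["PhD", "Bachelor's", "Master's", "Bachelor's"]
ordered = sorted(levels, key=EDUCATION_RANK.get)

# Order-based questions such as "at least a Master's" are meaningful only for ordinal variables.
at_least_masters = [lvl for lvl in levels if EDUCATION_RANK[lvl] >= EDUCATION_RANK["Master's"]]

print(blue_count)        # 2
print(ordered)           # ["Bachelor's", "Bachelor's", "Master's", 'PhD']
print(at_least_masters)  # ['PhD', "Master's"]
```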
b) Quantitative variables
Quantitative variables (also referred to as numerical variables) measure a numerical quantity or amount.
Quantitative variables may be classified into two types: discrete and continuous. A discrete variable can take
only distinct, separate values (typically whole numbers); discrete variables arise from a counting process.
Examples of discrete variables include the number of siblings in a family, the number of seats in a bus, etc.
Note: Not all digit strings are numbers. A phone number is a categorical variable because it does not represent a
quantity; in other words, it makes no sense to perform arithmetic operations on it.
A continuous variable can take on any value within a certain range; continuous variables arise from a
measuring process. Examples of continuous variables include mass, speed, etc.
Sometimes it is necessary to transform statistical variables. Suppose we are looking at the age of a group of
people. Age is intrinsically a continuous quantitative variable.
• We may only be interested in the number of completed years since birth. In this case, age is treated as a
discrete quantitative variable, since it is rounded down to a whole number of years.
• We may be interested in age groups such as children, adolescents, adults and older adults. In this case, age
is treated as a qualitative ordinal variable.
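Both transformations of age can be sketched in Python. Counting completed years since birth amounts to rounding the exact age down, and the cut-offs used below for the age groups (13, 18 and 65) are hypothetical, chosen only for illustration.

```python
import math

def completed_years(exact_age):
    """Discrete version of age: whole years since birth (rounded down)."""
    return math.floor(exact_age)

def age_group(exact_age):
    """Ordinal version of age. The cut-offs are hypothetical, for illustration only."""
    if exact_age < 13:
        return "child"
    if exact_age < 18:
        return "adolescent"
    if exact_age < 65:
        return "adult"
    return "older adult"

for age in [7.9, 16.5, 42.25, 70.0]:
    print(age, completed_years(age), age_group(age))
```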
1.2.8 Data
Data are measurements or observations. They are the actual values of a variable. A datum (or a score) is a
single measurement or observation. A data set is a collection of measurements or observations.

1.3 SAMPLING TECHNIQUES


Sampling is a process that enables researchers to choose a sample from the population to be studied.
Sampling methods can be divided into two categories: probability sampling and non-probability sampling.
1.3.1 Probability sampling methods
Probability sampling gives every member of the population a known, non-zero chance of being chosen (in the
simplest designs, an equal chance). It is mainly used in quantitative research.
a) Simple random sampling
In simple random sampling, every member of the population has the same chance of being selected, and
participants are chosen purely at random. After defining the population, a list containing all its members
must be created, and each member is assigned a number. The sample size is chosen carefully. Tools such as a
random number generator may then be used to choose participants from the population to make up the sample.
For instance, in a high school of 1500 students, a researcher wants to select a simple random sample of size
125. He assigns every student a number from 1 to 1500 and then uses a random number generator to select
125 participants.
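The high-school example can be reproduced with Python's standard `random` module. `random.Random.sample` draws without replacement, so no student can be selected twice; the seed is arbitrary and fixed only so the run is repeatable.

```python
import random

POPULATION_SIZE = 1500
SAMPLE_SIZE = 125

# Sampling frame: every student is numbered from 1 to 1500.
frame = range(1, POPULATION_SIZE + 1)

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
srs = sorted(rng.sample(frame, SAMPLE_SIZE))  # draws 125 distinct numbers

print(srs[:10])  # the first few selected student numbers
```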
b) Systematic sampling
In systematic sampling, every member of the population is given a number, as in simple random sampling.
Let n be the number obtained by dividing the population size by the sample size (the sampling interval). A
starting point is chosen at random from within the first n members of the population, and participants are
then picked at regular intervals: the sample consists of the starting member and every nth member after it.
In the high school from the previous example, a researcher assigns every student a number from 1 to 1500.
Since n = 1500/125 = 12, he randomly selects a starting point within the first 12 numbers (say 7). So, the
researcher selects the name associated with the number 7 as the first student, and then every 12th name
(7, 19, 31, etc.) until he gets a sample of 125 students.
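A minimal Python sketch of systematic sampling follows. The helper `systematic_sample` is hypothetical; when the population size is not a multiple of the sample size, it assumes the interval is rounded down, which keeps every selected number within the population.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return 1-based member numbers chosen by systematic sampling (hypothetical helper)."""
    n = population_size // sample_size  # sampling interval, rounded down (assumption)
    if start is None:
        start = rng.randint(1, n)       # random start within the first n members
    if not 1 <= start <= n:
        raise ValueError("start must lie within the first n members")
    return [start + i * n for i in range(sample_size)]

# The example from the text: interval 1500 // 125 = 12, starting point 7.
example = systematic_sample(1500, 125, start=7)
print(example[:3], example[-1])  # [7, 19, 31] 1495

# A randomly chosen starting point, as the method prescribes.
random_run = systematic_sample(1500, 125, rng=random.Random(0))
```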
c) Stratified sampling
In stratified sampling, the population is partitioned into groups called strata based on a characteristic.
Generally, the number of participants drawn from each stratum is proportional to the stratum’s size in the
population. The researcher then draws a random sample from each stratum and combines them to form a
representative sample.
In the high school from the previous example, there are 900 girls and 600 boys. If the relevant characteristic
is sex, the researcher will divide the whole population into two strata. Using random sampling techniques,
the researcher will choose 75 girls (since they represent 60% of the whole population) and 50 boys (40% of
the population), resulting in a representative sample of 125 students.
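Proportional stratified sampling can be sketched in Python as follows. The student identifiers are invented; the allocation step reproduces the 75 girls and 50 boys of the example, and a simple random sample is then drawn within each stratum. In general, rounding each stratum's share may leave the total one or two participants away from the target size, which would then need a small adjustment.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Number of participants to draw from each stratum, proportional to its size."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total) for name, size in strata_sizes.items()}

def stratified_sample(strata, sample_size, rng=random):
    """Draw a simple random sample within each stratum and combine the results."""
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, allocation[name]))
    return allocation, sample

# Invented student IDs: 900 girls and 600 boys, as in the example.
strata = {
    "girls": [f"G{i}" for i in range(1, 901)],
    "boys": [f"B{i}" for i in range(1, 601)],
}
allocation, sample = stratified_sample(strata, 125, random.Random(3))
print(allocation)   # {'girls': 75, 'boys': 50}
print(len(sample))  # 125
```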
d) Cluster sampling
In cluster sampling, the population is divided into groups or clusters. Each cluster should have similar
characteristics to the whole population. Some of the clusters are randomly selected. The sample consists of
all the members of the selected clusters. Generally, a researcher uses this method when the population
under study is large and geographically dispersed.
Suppose that there are 60 high schools in a particular county (wilaya). Rather than travelling all over the county
to collect data, a researcher randomly selects 20 high schools (these are the clusters) and then gathers data
from every student in each selected high school to conduct his or her study.
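The two stages of cluster sampling in the county example can be sketched in Python. School names and enrolment sizes are invented for illustration; note that randomness enters only when choosing the schools, not when choosing students within them.

```python
import random

rng = random.Random(11)  # arbitrary seed, for reproducibility only

# Invented county: 60 schools, each with a list of student IDs of varying size.
schools = {
    f"school_{k:02d}": [f"s{k:02d}_{j}" for j in range(rng.randint(300, 900))]
    for k in range(1, 61)
}

# Stage 1: randomly select 20 whole clusters (schools).
selected = rng.sample(sorted(schools), 20)

# Stage 2: every student in each selected school enters the sample.
sample = [student for name in selected for student in schools[name]]

print(len(selected), "schools,", len(sample), "students")
```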
1.3.2 Non-probability sampling methods
In non-probability sampling, members are not selected at random, so the chance of each member being selected is
unknown and some may have no chance at all. These methods carry a high risk of sampling bias. They are often used in exploratory research.
a) Convenience sampling
In convenience sampling, the researcher selects participants who are easy to reach or individuals willing to
participate in the research (i.e., volunteers). This type of sampling may not provide a representative sample
and is unlikely to produce generalisable results.
In the high school from the earlier example, a researcher decides to collect data from a sample of 125
students entering or coming out of the library. By doing so, he leaves out every student who wasn’t at the
library entrance during the collection of data.
b) Judgement sampling
In judgement sampling, the researcher selects participants using his or her own judgement, based on the
purpose of the study. This type of sampling allows the researcher to target specific individuals who have
unique knowledge or experience. Judgement samples are prone to sampling bias.
Let’s assume a researcher wants to know more about the reasons that make people choose a medical career.
He or she may select only people in the medical profession, such as physicians, pharmacists, resident doctors
and nurses, to make up the sample.
c) Quota sampling
In quota sampling, the population is partitioned into non-overlapping groups called strata based on a
characteristic or trait, just as in stratified sampling but without random selection within each stratum. In
quota sampling, the researcher will use his or her own judgment to select participants from each stratum
based on a specific number or proportion (called a quota) to produce the final sample.
For instance, a researcher wants to study the factors influencing customers’ smartphone brand preference.
Depending on the research goals, the population may be divided into different strata like employment
status, sex, educational attainment or age. Suppose that the researcher wants to focus on the individuals’
ages. He divides the population into three age groups: under 25, between 25 and 50, and over 50. Suppose also
that a sample size of 180 people has been decided upon.
The researcher may want the proportion of individuals in each age group to match that of the population.
This is known as proportional quota sampling. If 45% of the population are under the age of 25, 35% are
between 25 and 50, and 20% are over 50, then he will need a quota of 81 for the first group, 63 for the
second and 36 for the third. The sampling procedure proceeds until the predetermined quotas are reached.
Alternatively, the researcher may choose non-proportional quota sampling and set a quota of 60 people for each
age group to facilitate comparisons between these groups. Participants are selected until the desired quota
for each age group is met.
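The quota arithmetic and the "select until the quota is met" rule can be sketched in Python. The stream of people the researcher reaches is simulated and hypothetical; in real quota sampling it reflects the researcher's judgement and who is accessible, not a random draw.

```python
import random

def proportional_quotas(shares, sample_size):
    """Quota for each group = its population share times the sample size."""
    return {group: round(share * sample_size) for group, share in shares.items()}

def fill_quotas(arrivals, quota):
    """Accept people in the order they are reached until every group's quota is met."""
    counts = {group: 0 for group in quota}
    sample = []
    for person_id, group in arrivals:
        if counts[group] < quota[group]:  # this group still needs participants
            sample.append(person_id)
            counts[group] += 1
        if counts == quota:               # all quotas reached: stop recruiting
            break
    return sample, counts

shares = {"under 25": 0.45, "25 to 50": 0.35, "over 50": 0.20}
proportional = proportional_quotas(shares, 180)
non_proportional = {group: 60 for group in shares}
print(proportional)  # {'under 25': 81, '25 to 50': 63, 'over 50': 36}

# Hypothetical stream of people reached, each tagged with an age group.
# The seeded generator only simulates who turns up; quota sampling itself has no random draw.
rng = random.Random(5)
arrivals = [(i, rng.choice(list(shares))) for i in range(1, 2001)]

sample, counts = fill_quotas(arrivals, proportional)
print(len(sample), counts)
```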
d) Snowball sampling
Snowball sampling (or chain-referral sampling) is a method used when the population under study is hidden
or difficult to find for various reasons. A researcher begins by collecting data from a subject who can help
recruit other potential participants for the study. These potential participants in turn are contacted to
produce additional contacts. In this way, members of the sample are recruited via chain referral. The
sampling procedure proceeds until the researcher has gathered a sufficient amount of data to analyse.
Snowball samples are subject to sampling bias.
For example, a researcher wants to conduct a survey about undocumented workers. Contacting such people is
difficult due to issues of confidentiality. Once the researcher has the contact details of one subject, the
latter can help recruit other individuals from among his or her circle of friends and acquaintances.
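Chain referral can be modelled as a walk through a network of contacts. The Python sketch below assumes a hypothetical referral network and contacts people in the order they are referred (breadth-first); in practice the order depends on who responds, and recruitment also stops when no new referrals come in.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Recruit by chain referral until target_size participants or no new contacts remain."""
    sample = []
    contacted = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)                      # collect data from this participant
        for contact in referrals.get(person, []):  # ask for further contacts
            if contact not in contacted:           # never approach the same person twice
                contacted.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: who can put the researcher in touch with whom.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": ["A"],
}
print(snowball_sample(referrals, ["A"], target_size=5))  # ['A', 'B', 'C', 'D', 'E']
```

The members reached depend heavily on the initial subject and on who knows whom, which is one source of the sampling bias that snowball samples are subject to.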