Applied Statistics for Social Sciences
Greek alphabet
Name      Uppercase  Lowercase      Name      Uppercase  Lowercase
Alpha     Α          α              Nu        Ν          ν
Beta      Β          β              Xi        Ξ          ξ
Gamma     Γ          γ              Omicron   Ο          ο
Delta     Δ          δ              Pi        Π          π
Epsilon   Ε          ε              Rho       Ρ          ρ
Zeta      Ζ          ζ              Sigma     Σ          σ
Eta       Η          η              Tau       Τ          τ
Theta     Θ          θ              Upsilon   Υ          υ
Iota      Ι          ι              Phi       Φ          φ
Kappa     Κ          κ              Chi       Χ          χ
Lambda    Λ          λ              Psi       Ψ          ψ
Mu        Μ          μ              Omega     Ω          ω
Applied Statistics Course for Social Sciences Prof. MONDJELI
GENERAL INTRODUCTION
Statistics can be defined as a mathematical method for the quantitative analysis
of sets containing many elements. Its main supports are numerical analysis
and graphical analysis. It is a tool of knowledge that makes it possible to interpret
phenomena, to outline them, to measure their dimensions and to highlight their
most important aspects. It is a quantitative method, meaning that it uses
numbers as its means of expression. It sets the language of numbers against the language of words,
which gives it an obvious character of objectivity, or better, of neutrality. There is no room for
value judgments regarding the observed phenomena. In this sense, it separates
observation from appraisal. In short, to be effective, statistics must simplify,
summarize, synthesize or, better still, decompose. However, what it gains in
effectiveness it loses in fidelity.
The objective of this course is to provide the learner with the basic analytical tools
needed to describe a statistical population. At the end of this course, the student will be able
to identify the parameters of central tendency and to provide a rigorous interpretation of them,
to analyze the dispersion and concentration of a population, to study the direction and degree
of interdependence between phenomena (adjustment, correlation), to identify the importance of
time in the evolution of phenomena, ... In short, this course prepares the future decision-maker to detect
problems and to implement the beginnings of a solution in his or her socio-economic
environment.
1. STATISTICAL APPREHENSIONS
1.1. The various stages of a statistical study
This section reviews the various steps of a statistical study.
The primary concern of the statistician is to define unambiguously the reference set on which
the observations will be made.
The population is the reference set, that is, the set of observed statistical units.
Each element of this set is called an individual or statistical unit. The population is also called the
statistical universe.
As soon as the number of individuals becomes too large, the statistician focuses on the most
significant aspects. The character can therefore be defined as a classification criterion for statistical
units. The character consequently places the individuals it considers equivalent
in the same class or under the same category. E.g.: the staff of UYII can be classified
by age, by sex, by qualification.
Example:
Properties
P1: The modalities are incompatible (mutually exclusive), meaning that an individual cannot
belong to two modalities at once.
P2: The modalities are exhaustive, meaning that they cover all situations.
P3: The modalities of a character are hierarchized according to the fineness of the available
information.
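Two individuals are in the same class if and only if they have the same modality. The equivalence
classes then form a partition of the population, that is to say a subdivision of the
population into subsets $E_1, E_2, E_3, \dots, E_p$ which satisfy:
$E_j \neq \emptyset$ for every $j$; $E_j \cap E_l = \emptyset$ for $j \neq l$; $E_1 \cup E_2 \cup \dots \cup E_p$ is the whole population.
In other words, each individual in the population belongs to one and only one subset.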
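As an illustration, the partition property can be checked mechanically: the classes must be non-empty, pairwise disjoint, and together cover the population. The sketch below assumes a small, purely hypothetical staff list classified by sex; the names `staff`, `classes` and `is_partition` are illustrative and do not come from the course.

```python
# Hypothetical staff records (identifier, sex); illustrative data only.
staff = [("A", "F"), ("B", "M"), ("C", "F"), ("D", "M"), ("E", "F")]

# Group individuals by modality: each modality defines one class E_j.
classes = {}
for person, sex in staff:
    classes.setdefault(sex, set()).add(person)

population = {person for person, _ in staff}
subsets = list(classes.values())

# Condition 1: no class is empty.
non_empty = all(len(s) > 0 for s in subsets)

# Condition 2: the classes are pairwise disjoint.
disjoint = all(
    subsets[i].isdisjoint(subsets[j])
    for i in range(len(subsets))
    for j in range(i + 1, len(subsets))
)

# Condition 3: the union of the classes is the whole population.
covering = set().union(*subsets) == population

is_partition = non_empty and disjoint and covering
print(is_partition)
```

Grouping by a single character always yields a partition, because each individual carries exactly one modality of that character.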
We distinguish between two types of characters: qualitative characters and quantitative characters.
A character is said to be quantitative when its observation is subject to measurement, that is to say
when its modalities are identifiable and quantifiable. Quantitative characters can be
discrete or continuous:
• The discrete (discontinuous) statistical variable: in this case, the possible values are
isolated numbers that belong to the set of natural numbers ($\mathbb{N}$). For example:
the number of children per household, the age of an individual, ...;
• The continuous statistical variable: here, the possible values are infinite in number and
belong to the set of real numbers ($\mathbb{R}$). For example: the exact age of an individual,
the temperature of a body, weight, ... In this case, to simplify
the analysis, it is appropriate to divide the series into several classes of equal or unequal amplitudes.
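A minimal sketch of this grouping into classes is given below. It assumes hypothetical exact ages and hypothetical class limits, with classes closed on the left and open on the right, written [a ; b[; the helper `class_of` is an illustrative name.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical class limits: [0;20[, [20;40[, [40;60[, [60;80[
bounds = [0, 20, 40, 60, 80]

# Hypothetical exact ages (continuous variable).
ages = [3.5, 19.9, 20.0, 35.2, 41.7, 59.99, 60.0, 72.3]


def class_of(x):
    """Return the class (a, b) such that a <= x < b."""
    i = bisect_right(bounds, x) - 1
    if i < 0 or i >= len(bounds) - 1:
        raise ValueError(f"{x} lies outside the classes")
    return (bounds[i], bounds[i + 1])


# Count how many observations fall in each class.
class_counts = Counter(class_of(a) for a in ages)
for (a, b), n in sorted(class_counts.items()):
    print(f"[{a} ; {b}[  {n}")
```

The lower bound belongs to the class and the upper bound does not, so an age of exactly 20.0 is counted in [20 ; 40[.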
Statistical tables are a first synthesis of the information. They make it possible
to circumscribe the distribution and to give the general shape and a first impression of the
observed phenomenon. Their role is therefore to classify the information obtained from data
collection.
In this case, the modalities are simply recorded by a word expressing a state or a
category. Generally, the modalities are preceded by a code number (the nomenclature).
Absolute frequencies are the counts, denoted $n_i$. They represent the number of times that
modality $i$ has been observed.
Relative frequencies, denoted $f_i$, are the ratio of the counts $n_i$ to the total count $N$:
$f_i = n_i / N$.
Properties:
P1: $\sum_{i=1}^{k} n_i = N$
P2: $\sum_{i=1}^{k} f_i = \sum_{i=1}^{k} \frac{n_i}{N} = \frac{1}{N} \sum_{i=1}^{k} n_i = \frac{N}{N} = 1$
The table below shows the distribution in 2016 of L1 students classified according to their
regions of origin.
Regions      n_i    f_i
Center        5
Littoral     10
Adamaoua      8
Southwest    16
Northwest    10
North        15
Total
Exercise: after determining the total count (N), calculate the relative frequencies f_i.
Determine the modal region and interpret.
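One possible way to carry out this calculation is sketched below, using the counts from the table of regions. The variable names are illustrative; the modal region is taken to be the one with the largest count.

```python
# Counts n_i per region, as given in the table.
counts = {
    "Center": 5,
    "Littoral": 10,
    "Adamaoua": 8,
    "Southwest": 16,
    "Northwest": 10,
    "North": 15,
}

# Total count N (property P1: the counts sum to N).
N = sum(counts.values())

# Relative frequencies f_i = n_i / N.
freqs = {region: n / N for region, n in counts.items()}

# Modal region: the modality with the highest count.
modal_region = max(counts, key=counts.get)

for region, n in counts.items():
    print(f"{region:<10} {n:>3} {freqs[region]:.4f}")
print(f"{'Total':<10} {N:>3} {sum(freqs.values()):.4f}")
print("Modal region:", modal_region)
```

The relative frequencies sum to 1 (property P2), which is a quick check on the arithmetic.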
The different modalities are easy to grasp. The set of possible values retained is
generally sorted in ascending order.
x_i    0   1   2   3   4   Total
n_i   10  12  14  15   5
The calculation of the increasing cumulative frequencies (FCC) answers the question "how many students
used at most x_i formats?" or, more precisely, "what percentage of students used at most x_i
formats?" (see note 1).
The calculation of the decreasing cumulative frequencies (FCD) answers the question "how many students
used at least x_i formats?" or, more precisely, "what percentage of students used at least x_i
formats?"
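A short sketch of both cumulations, using the values x_i = 0, ..., 4 and the counts 10, 12, 14, 15, 5 from the table above. The names `cum_up` and `cum_down` are illustrative.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4]
counts = [10, 12, 14, 15, 5]
N = sum(counts)

# Increasing cumulative counts: number of students with at most x_i.
cum_up = list(accumulate(counts))

# Decreasing cumulative counts: number of students with at least x_i,
# i.e. the total minus everything strictly below x_i.
cum_down = [N - c + n for c, n in zip(cum_up, counts)]

# The same quantities expressed as proportions of N.
fcc = [c / N for c in cum_up]
fcd = [c / N for c in cum_down]

for x, up, down in zip(values, cum_up, cum_down):
    print(f"x = {x}: at most {up:>2} ({up / N:.1%}), at least {down:>2} ({down / N:.1%})")
```

For each value, the "at most" and "at least" counts overlap only on the students at exactly x_i, so their sum equals N plus the count n_i.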
Consider an interval $[a_i ; b_i[$, where $a_i$ is the lower bound and $b_i$ the upper bound:
the center of this interval is $c_i = \frac{a_i + b_i}{2}$;
the distance from the center to either bound is denoted $h_i = \frac{b_i - a_i}{2}$.
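As a quick numerical illustration, the sketch below evaluates the center and the half-width for a class such as [10 ; 20[. The function names are illustrative.

```python
def class_center(a, b):
    """Center c = (a + b) / 2 of the class [a ; b[."""
    return (a + b) / 2


def half_width(a, b):
    """Distance h = (b - a) / 2 from the center to either bound."""
    return (b - a) / 2


a, b = 10, 20
c = class_center(a, b)
h = half_width(a, b)
print(c, h)

# The bounds are recovered as c - h and c + h.
print(c - h, c + h)
```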
Note 1: To determine the exact count, work from bottom to top and subtract from the total the cumulative count that stops just
before the modality concerned.
University of Yaoundé II-Soa, Academic year 2019-2020
Graphical representations provide a visual image of the phenomenon and thus complement the information given
by the numerical table. They have the advantage of allowing two phenomena to be compared
by highlighting their essential features.
All sectors are drawn on the same circle, the angle at the center being proportional to
the count or to the frequency. Thus, if $S_i$ is the angle of sector $i$, we have: $S_i = 360° \cdot (n_i / N) = 360° \cdot f_i$.
The MBF program has 45 students distributed according to their regions of origin: Littoral 8; Center
9; West 20; North 5 and Northwest 5.
Note: The results are expressed in degrees. For semicircular sector diagrams, the
weighting factor is 180°.
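The sector angles for the MBF example can be sketched as follows. One caveat: the counts listed in the text add up to 47, not the stated 45, so this sketch assumes N is the sum of the listed counts so that the angles close the circle. The function name `sector_angles` is illustrative.

```python
# Counts per region as listed in the text.
counts = {"Littoral": 8, "Center": 9, "West": 20, "North": 5, "Northwest": 5}

# Assumption: N is taken as the sum of the listed counts (47),
# so that the sectors fill the whole circle.
N = sum(counts.values())


def sector_angles(counts, full_angle=360.0):
    """Angle of each sector: full_angle * n_i / N."""
    total = sum(counts.values())
    return {name: full_angle * n / total for name, n in counts.items()}


circle = sector_angles(counts)             # full circle, factor 360 degrees
semicircle = sector_angles(counts, 180.0)  # semicircular diagram, factor 180 degrees

for name in counts:
    print(f"{name:<10} {circle[name]:7.2f}  {semicircle[name]:7.2f}")
```

Passing the weighting factor as a parameter covers both the circular and the semicircular diagram with the same formula.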
Each pipe (bar) is characterized by a constant base and a height proportional to the count
or to the frequency.
This involves representing on the same band, which can be vertical or horizontal, the counts
or the frequencies of the different modalities while respecting the principle of proportionality.
For example, this is the distribution of mechanical parts according to the number of defective parts.
x_i    0   1   2   3   4   5   6   Total
n_i   10  12  14  15   5   3   9
They are obtained as follows: on the x-axis, each value of the variable
is marked. On the y-axis, the corresponding frequencies or counts are plotted.
The height of each bar is proportional to the corresponding frequency.
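Thus:
$$F(x) = \begin{cases} 0 & \text{for } x < x_1 \\ \sum_{x_i \le x} \frac{n_i}{N} & \text{for } x_1 \le x < x_{\sup} \\ 1 & \text{for } x \ge x_{\sup} \end{cases}$$
where $x_1$ is the smallest and $x_{\sup}$ the largest observed value.
The cumulative distribution function indicates the proportion of elements in the population for which
the value of the character is at most $x$.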
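A minimal sketch of this step function is given below, using the distribution of defective parts from the table above. It assumes the "at most" convention, in which F(x) is the proportion of units whose value is less than or equal to x; with a strict inequality the jumps would occur just after each value instead.

```python
# Distribution of defective parts: values x_i and counts n_i.
values = [0, 1, 2, 3, 4, 5, 6]
counts = [10, 12, 14, 15, 5, 3, 9]
N = sum(counts)


def F(x):
    """Proportion of units whose value is at most x (assumed convention)."""
    return sum(n for v, n in zip(values, counts) if v <= x) / N


# F is 0 below the smallest value, 1 from the largest value onward,
# and constant between two consecutive observed values.
for x in [-1, 0, 2, 2.5, 6, 10]:
    print(f"F({x}) = {F(x):.4f}")
```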
The histogram: the x-axis represents the successive classes and the y-axis
the frequencies corresponding to the classes.
•In the case of classes with equal amplitudes: each class is represented by a rectangle
whose width, on the horizontal axis, is the amplitude and whose height, on the vertical axis,
is the frequency. The area of this rectangle is equal to the amplitude multiplied
by the frequency, the total area being proportional to the total count.
The distribution of the salaries of the 1000 workers of UYII is represented in the following
table:
Salary level (×10^3)    n_i    F_i
[10 ; 20[               100    0.1
•In the case of classes with unequal amplitudes: in order to draw a correct histogram, it
is necessary first to calculate the frequency densities. The frequency density is the ratio
between the frequency and the amplitude: $d_i = f_i / e_i$.
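The density calculation can be sketched as follows. The classes and counts below are hypothetical and chosen only to illustrate unequal amplitudes; they are not taken from the course.

```python
# Hypothetical classes (lower bound a, upper bound b, count n); illustrative only.
classes = [(0, 5, 20), (5, 10, 30), (10, 20, 30), (20, 50, 20)]
N = sum(n for _, _, n in classes)

amplitudes = [b - a for a, b, _ in classes]    # e_i = b_i - a_i
frequencies = [n / N for _, _, n in classes]   # f_i = n_i / N
densities = [f / e for f, e in zip(frequencies, amplitudes)]  # d_i = f_i / e_i

for (a, b, _), e, f, d in zip(classes, amplitudes, frequencies, densities):
    print(f"[{a} ; {b}[  e = {e:>2}  f = {f:.2f}  d = {d:.4f}")

# With densities as heights, each rectangle has area d_i * e_i = f_i,
# so the total area of the histogram is 1.
total_area = sum(d * e for d, e in zip(densities, amplitudes))
print(total_area)
```

Using densities rather than raw frequencies as heights keeps the area of each rectangle equal to the frequency of its class, which is what makes classes of different widths comparable.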
The statistical series below represents the agricultural holdings of a region according to
their areas:
The cumulative distribution function in the continuous case is the cumulative frequency function, or
cumulative function $F$, defined by:
$\forall x \in \mathbb{R},\ F(x) = \int_{0}^{x} f(t)\,dt$.
NB: the intersection point of the two curves (increasing and decreasing cumulative frequencies), projected
onto the y-axis, corresponds to the cumulative frequency 0.5; projected onto the x-axis, it gives the
median of the distribution.
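The x-coordinate of this intersection can be approximated numerically. The sketch below assumes hypothetical grouped data and linear interpolation inside the class where the cumulative count reaches N/2, which corresponds to reading the point off a cumulative curve drawn with straight segments. The function name is illustrative.

```python
# Hypothetical classes (lower bound a, upper bound b, count n); illustrative only.
classes = [(0, 5, 10), (5, 10, 30), (10, 20, 40), (20, 50, 20)]


def median_by_interpolation(classes):
    """Value at which the increasing cumulative frequency reaches 0.5."""
    total = sum(n for _, _, n in classes)
    target = total / 2
    cumulative = 0
    for a, b, n in classes:
        # The median class is the first one whose cumulative count reaches N/2.
        if n > 0 and cumulative + n >= target:
            # Linear interpolation between the two bounds of the class.
            return a + (target - cumulative) / n * (b - a)
        cumulative += n
    raise ValueError("empty distribution")


median = median_by_interpolation(classes)
print(median)
```

At this value, half of the population lies below and half above, so the increasing and decreasing cumulative frequencies are both equal to 0.5.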
Let X be a statistical variable that takes the values $X_1, X_2, \dots, X_k$. The sum of these values is written
$X_1 + X_2 + \dots + X_k = \sum_{i=1}^{k} X_i$.
Note: when there is no ambiguity, it does not matter whether one writes $\sum_{i=1}^{i=k} X_i$, $\sum_{i=1}^{k} X_i$, $\sum_{i} X_i$ or $\sum X_i$.
Properties
P1: $\sum_{i=1}^{k} a = ka$
P2: $\sum_{i=1}^{k} (X_i + Y_i + Z_i) = \sum_{i=1}^{k} X_i + \sum_{i=1}^{k} Y_i + \sum_{i=1}^{k} Z_i$ (linearity)
P4: $\sum_{i=1}^{k} aX_i = a \sum_{i=1}^{k} X_i$
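These summation properties can be checked numerically on a small example. The values below are arbitrary integers chosen for illustration.

```python
# Arbitrary illustrative values.
X = [2, 5, 7]
Y = [1, 0, 4]
Z = [3, 3, 3]
a = 4
k = len(X)

# P1: summing a constant k times gives k * a.
p1 = sum(a for _ in range(k)) == k * a

# P2 (linearity): the sum of term-by-term sums equals the sum of the sums.
p2 = sum(x + y + z for x, y, z in zip(X, Y, Z)) == sum(X) + sum(Y) + sum(Z)

# P4: a constant factor can be taken out of the sum.
p4 = sum(a * x for x in X) == a * sum(X)

print(p1, p2, p4)
```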
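Let X be a statistical variable that takes the values $X_1, X_2, \dots, X_k$. The product of these values is written
$X_1 X_2 X_3 \cdots X_k = \prod_{i=1}^{k} X_i$.
Equivalent notations: $\prod_{i=1}^{i=k} X_i$ or $\prod_{i} X_i$.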
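Properties.
P1: $\prod_{i=1}^{k} a = a^k$;
P2: $\prod_{i=1}^{k} aX_i = a^k \prod_{i=1}^{k} X_i$;
P4: $\prod_{i=1}^{k} (X_i Y_i) = \prod_{i=1}^{k} X_i \prod_{i=1}^{k} Y_i$;
P5: $\frac{\prod_{i} x_i}{\prod_{i} y_i} = \prod_{i} \left( \frac{x_i}{y_i} \right)$.
NB:
$\prod_{i=1}^{k} (x_i + y_i) \neq \prod_{i=1}^{k} x_i + \prod_{i=1}^{k} y_i$ (non-linearity);
$\frac{\prod_{i} x_i}{\prod_{i} y_i} \neq \frac{x_i}{y_i}$ (we do not simplify).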
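The product properties can be checked in the same way. The sketch below uses arbitrary integers and exact fractions, so that the quotient property is verified without rounding; `math.prod` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import prod

# Arbitrary illustrative values.
x = [2, 3, 4]
y = [1, 2, 5]
a = 3
k = len(x)

# P1: the product of a constant taken k times is a ** k.
p1 = prod(a for _ in range(k)) == a ** k

# P2: a constant factor comes out raised to the power k.
p2 = prod(a * xi for xi in x) == a ** k * prod(x)

# P4: the product of term-by-term products equals the product of the products.
p4 = prod(xi * yi for xi, yi in zip(x, y)) == prod(x) * prod(y)

# P5: the quotient of products equals the product of the quotients.
p5 = Fraction(prod(x), prod(y)) == prod(Fraction(xi, yi) for xi, yi in zip(x, y))

# NB: the product is not linear with respect to addition.
not_linear = prod(xi + yi for xi, yi in zip(x, y)) != prod(x) + prod(y)

print(p1, p2, p4, p5, not_linear)
```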
To do at home: Explain and present the notions of double summation and double
product. Then make the connection with the notion of integral when the series
is continuous.