Applied Statistics for Social Sciences

The document presents a course on statistics applied to social sciences, covering the fundamental concepts of statistics, including descriptive statistics, probability calculation, and mathematical statistics. It describes the steps of a statistical study, data collection and analysis, as well as the types of statistical characteristics. The course aims to provide students with analytical tools to describe statistical populations and interpret the results.
UNIVERSITY OF YAOUNDE II-SOA

COURSE IN STATISTICS APPLIED TO THE SOCIAL SCIENCES

Prof. Mondjeli Mwa Ndjokou

Academic year 2019-2020

Greek alphabet

Name      Uppercase  Lowercase    Name      Uppercase  Lowercase
Alpha     Α          α            Nu        Ν          ν
Beta      Β          β            Xi        Ξ          ξ
Gamma     Γ          γ            Omicron   Ο          ο
Delta     Δ          δ            Pi        Π          π
Epsilon   Ε          ε            Rho       Ρ          ρ
Zeta      Ζ          ζ            Sigma     Σ          σ
Eta       Η          η            Tau       Τ          τ
Theta     Θ          θ            Upsilon   Υ          υ
Iota      Ι          ι            Phi       Φ          φ
Kappa     Κ          κ            Chi       Χ          χ
Lambda    Λ          λ            Psi       Ψ          ψ
Mu        Μ          μ            Omega     Ω          ω

Applied Statistics Course for Social Sciences Prof. MONDJELI

GENERAL INTRODUCTION
Statistics can be defined as a mathematical method for the quantitative analysis
of sets containing many elements. Its main tools are numerical analysis and
graphical analysis. It is an instrument of knowledge that makes it possible to interpret
phenomena, to delimit them, to measure their dimensions and to highlight their
most important aspects. It is a quantitative method, meaning that it uses
numbers as its means of expression. It sets the language of numbers against the language of words,
which gives it an evident character of objectivity or, better, of neutrality: there is no room for
value judgments about the observed phenomena. In this sense, it separates
observation from appreciation. In short, to be effective, statistics must simplify,
summarize, synthesize or, better still, decompose. What it gains in
effectiveness, however, it loses in fidelity.

The teaching of statistics is usually divided into three stages:

• Descriptive statistics, whose aim is to describe the observed phenomenon, to highlight
its essential features and to build summaries in numerical language. It relies on
elementary mathematics, which is most often sufficient to interpret the
available data and inform decision-making;
• The calculus of probabilities, which deals essentially with random mechanisms;
• Mathematical statistics, which deals with statistical induction (inference).

Without claiming to be exhaustive, statistics is used in many fields: demography
(censuses), economics (income, consumption, GDP, etc.), sociology, agronomy
(agricultural production), industry (manufacturing quality control), politics (opinion
polls, pre-election surveys), medicine (tests of the effectiveness of medications), etc.

The objective of this course is to provide the learner with the basic analytical tools
needed to describe a statistical population. At the end of this course, the student will be able
to identify the parameters of central tendency and to interpret them rigorously,
to analyze the dispersion and concentration of a population, to study the direction and degree
of interdependence between phenomena (adjustment, correlation), to assess the role of
time in the evolution of phenomena, and so on. In short, this course prepares future decision-makers to detect
problems and to begin working out solutions in their socio-economic
environment.

The course is divided into six chapters:

Chapter 1: Statistical series with one character

Chapter 2: Numerical description of a statistical series with one character

Chapter 3: Statistical distributions with two characters

Chapter 4: Numerical description of statistical distributions with two characters

University of Yaoundé II-Soa 2 Academic year 2019-2020



CHAPTER 1: STATISTICAL SERIES WITH ONE CHARACTER

The objective of the basic tools of descriptive statistics is to provide concise
summaries of series of values, adapted to their type (qualitative or quantitative) and observed
on a population or a sample. In the case of a single variable, the classical
concepts are the median, quantiles, mean, frequency, cumulative frequency, variance and
standard deviation. These concepts are associated with graphical representations: bar chart,
pie chart, cumulative diagram, histogram, cumulative curve, box-and-whisker
plot, etc.

1. BASIC STATISTICAL CONCEPTS
1.1. The various stages of a statistical study
This section reviews the successive stages of a statistical study.

1.1.1. The collection of basic data

Data collection proceeds in three steps:

• defining the information to be obtained, stating precisely the set under
study. Ex: the age of all first-year students;
• obtaining the information. Two procedures exist: the census and the
sample survey. The census consists of submitting a questionnaire to every element of the
population in question. The sample survey questions only a sufficiently
representative sample drawn from the population; its results are then
extrapolated to the whole population;
• classifying the data, which is done in the form of tables or
graphical representations.

1.1.2. Data analysis

Data analysis consists of:

• simplifying numerous numerical data by replacing them with a few
parameters (arithmetic mean, mode, median, ...);
• decomposing complex phenomena into their simple components. For example, one will
bring out long-term trends, seasonal variations,
cyclical variations and random fluctuations;
• studying the links between the variations of two phenomena. This is the domain of
correlation.

1.2. Population and statistical units

The statistician's first concern is to define unambiguously the reference set on which
the observations will bear.


1.2.1. The population

The population is the reference set, that is, the set of observed statistical units.
Each element of this set is called an individual or statistical unit. The population is also called the
statistical universe.

1.2.2. The character

As soon as the number of individuals becomes too large, the statistician concentrates on their most
significant aspects. A character can therefore be defined as a criterion for classifying statistical
units: it places the individuals it treats as equivalent
in the same class or category. E.g.: the staff of UYII can be classified
by age, by sex or by qualification.

1.2.3. The modality

Each class or category of a character is called a modality.

Example:

Character        Modalities
Sex              Male, Female
Marital status   Single, Married, Divorced

Properties

P1: The modalities are incompatible (mutually exclusive), meaning that an individual cannot
belong to two modalities at once.

P2: The modalities are exhaustive, meaning that they cover all possible situations.

P3: The modalities of a character are ordered hierarchically according to the fineness of the
available information.

Two individuals are in the same class if and only if they have the same modality. The equivalence
classes then form a partition of the population $E$, that is to say a subdivision of the
population into subsets $E_1, E_2, E_3, \dots, E_p$ that satisfy:

• they are all non-empty: $\forall i = 1, 2, \dots, p,\ E_i \neq \emptyset$;
• the $E_i$ are pairwise disjoint: if $i \neq j$, then $E_i \cap E_j = \emptyset$;
• the union of the $E_i$ is equal to the population $E$: $E = \bigcup_{i=1}^{p} E_i$.

In other words, each individual in the population belongs to one and only one $E_i$.
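
The three partition conditions can be checked mechanically. The following is a minimal sketch in Python, assuming the population and its classes are given as sets; the function name `is_partition` and the sample individuals are hypothetical illustrations, not part of the course.

```python
from itertools import combinations

def is_partition(population, classes):
    """Return True if `classes` is a partition of `population`."""
    # Condition 1: every class is non-empty.
    non_empty = all(len(c) > 0 for c in classes)
    # Condition 2: the classes are pairwise disjoint.
    disjoint = all(a.isdisjoint(b) for a, b in combinations(classes, 2))
    # Condition 3: the union of the classes equals the population.
    covers = set().union(*classes) == set(population)
    return non_empty and disjoint and covers

# Hypothetical population of five individuals classified by marital status.
population = {"Ada", "Ben", "Cleo", "Dan", "Eve"}
by_status = [{"Ada", "Dan"}, {"Ben", "Cleo"}, {"Eve"}]
print(is_partition(population, by_status))
```

If one individual were placed in two classes, or left out of every class, the function would return False, which mirrors properties P1 and P2 of the modalities.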

1.3. The different types of characters

We distinguish between two types of characteristics: qualitative characteristics and quantitative characteristics.


1.3.1. Qualitative characteristics

A character is said to be qualitative when its observation cannot be measured,
that is, when its different modalities cannot be quantified. For example: gender,
nationality, socio-professional category, marital status, ...

1.3.2. Quantitative characters

A character is said to be quantitative when its observation can be measured, that is to say
when its modalities are identifiable and quantifiable. Quantitative characters can be
discrete or continuous:

• the discrete (discontinuous) statistical variable: in this case, the possible values are
isolated numbers that belong to the set of natural numbers ($\mathbb{N}$). For example:
the number of children per household, the age of an individual in completed years, ...;
• the continuous statistical variable: here, the values are infinite in number and
belong to the set of real numbers ($\mathbb{R}$). For example: the exact age of an individual,
the temperature of a body, weight, ... In this case, to simplify
the analysis, the series is divided into several classes of equal or unequal amplitudes.

2. STATISTICAL TABLES AND GRAPHS


2.1. The role of statistical tables

Statistical tables are a first synthesis of the information: they delimit
the distribution and give a first, raw picture of the
observed phenomenon. Their role is therefore to classify the information obtained from data
collection.

2.1.1. The case of qualitative series

2.1.1.1. The modalities

In this case, the modalities are simply recorded as a word describing a state or a
category. Generally, the modalities are preceded by a code number (the nomenclature).

2.1.1.2. The frequencies

Absolute frequencies are the counts, denoted $n_i$. They represent the number of times that
modality $i$ has been observed.

Relative frequencies, denoted $f_i$, are the ratio of the counts $n_i$ to the total count $N$:
$f_i = n_i / N$.

Properties:

P1: $\sum_{i=1}^{k} n_i = N$

P2: $\sum_{i=1}^{k} f_i = \sum_{i=1}^{k} \frac{n_i}{N} = \frac{1}{N} \sum_{i=1}^{k} n_i = \frac{N}{N} = 1$

The table below shows the distribution in 2016 of L1 students classified according to their
regions of origin.

Region      $n_i$   $f_i$
Center        5
Littoral     10
Adamaoua      8
Southwest    16
Northwest    10
North        15
Total

Exercise: after determining the total count ($N$), calculate the relative frequencies $f_i$.
Determine the modal region and interpret.
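
As a rough illustration of the definitions $f_i = n_i / N$ and of the modal modality, the sketch below applies them to the counts of the regions table. It is only one possible way of organizing the calculation; the variable names are assumptions made for this example.

```python
# Counts n_i taken from the table of L1 students by region of origin.
counts = {
    "Center": 5, "Littoral": 10, "Adamaoua": 8,
    "Southwest": 16, "Northwest": 10, "North": 15,
}

total = sum(counts.values())                               # total count N
frequencies = {r: n / total for r, n in counts.items()}   # f_i = n_i / N
modal_region = max(counts, key=counts.get)                 # modality with the largest n_i

for region, f in frequencies.items():
    print(f"{region:<10} n_i={counts[region]:>2}  f_i={f:.4f}")
print("N =", total, "| modal region:", modal_region)
```

The relative frequencies add up to 1, as stated by property P2.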

2.1.2. The case of quantitative series

2.1.2.1. Case of discrete quantitative series

The different modalities are easy to identify. The set of possible values retained is
generally sorted in ascending order.

In a common-core Statistics course, the number of formats used by
the students is counted (to be worked through in the lecture hall):

$x_i$    0    1    2    3    4   Total
$n_i$   10   12   14   15    5

The increasing cumulative frequencies (FCC) answer the question "how many students used
at most $x_i$ formats?" or, more precisely, "what is the % of students who used at most $x_i$
formats?" (1)

The decreasing cumulative frequencies (FCD) answer the question "how many students used
at least $x_i$ formats?" or, more precisely, "what is the % of students who used at least $x_i$
formats?"
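
The two cumulations can be sketched in a few lines of Python. This is a minimal illustration on the table of formats used, assuming that FCC cumulates from the smallest value upward ("at most") and FCD from the largest value downward ("at least"); the variable names are hypothetical.

```python
from itertools import accumulate

x = [0, 1, 2, 3, 4]          # number of formats used (x_i)
n = [10, 12, 14, 15, 5]      # number of students (n_i)
N = sum(n)

# Increasing cumulative counts: students who used at most x_i formats.
cum_up = list(accumulate(n))
# Decreasing cumulative counts: students who used at least x_i formats.
cum_down = list(accumulate(reversed(n)))[::-1]

fcc = [c / N for c in cum_up]
fcd = [c / N for c in cum_down]

for xi, up, down in zip(x, fcc, fcd):
    print(f"x_i={xi}: at most {up:.1%}, at least {down:.1%}")
```

For every value, the "at most" and "at least" counts overlap exactly on the students who used $x_i$ formats, so their sum minus $n_i$ gives back $N$.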

2.1.2.2. Case of continuous quantitative series

A variable is said to be continuous if it can take any real value in
an interval.

Consider an interval $[a_i ; b_i[$, where $a_i$ is the lower bound and $b_i$ the upper bound:

• the center of this interval is $c_i = \frac{a_i + b_i}{2}$;
• the distance from the center to either bound is denoted $h_i = (b_i - a_i) / 2$;
• the extent, or amplitude, is $e_i = b_i - a_i$.

(1) To determine the exact count, we go from bottom to top and subtract from the total the cumulative totals that stop just
before the modality concerned.

The distribution of a car fleet according to mileage is as follows:

Mileage ($10^3$ km)          [0 ; 4[  [4 ; 8[  [8 ; 12[  [12 ; 16[  [16 ; 20[  [20 ; 24[  Total
Number of vehicles ($n_i$)     15       25       40        11          9          8

Exercise (at home): how many vehicles have traveled less than 12,000 km? More than 8,000
km? Which class of vehicles is observed most often according to mileage?
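
The class center $c_i$, the amplitude $e_i$ and counts cumulated over whole classes can be computed as in the sketch below, which uses the mileage table. It is an illustration under two assumptions: the bounds queried are class limits, and "more than 8,000 km" is read as "classes starting at 8,000 km", since grouped data cannot separate a vehicle at exactly 8,000 km. The helper names are hypothetical.

```python
# Mileage classes [a_i ; b_i[ in thousands of km, with counts n_i from the table.
classes = [(0, 4), (4, 8), (8, 12), (12, 16), (16, 20), (20, 24)]
counts = [15, 25, 40, 11, 9, 8]

centers = [(a + b) / 2 for a, b in classes]       # c_i = (a_i + b_i) / 2
amplitudes = [b - a for a, b in classes]          # e_i = b_i - a_i

def count_below(bound):
    """Vehicles in classes whose upper bound is <= bound (bound is a class limit)."""
    return sum(n for (a, b), n in zip(classes, counts) if b <= bound)

def count_at_least(bound):
    """Vehicles in classes whose lower bound is >= bound (bound is a class limit)."""
    return sum(n for (a, b), n in zip(classes, counts) if a >= bound)

# The largest count identifies the modal class here because all amplitudes are equal.
modal_class = classes[counts.index(max(counts))]

print(centers)
print(count_below(12), count_at_least(8), modal_class)
```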

2.2. Graphic representations

They give a visual image of the phenomenon and thus complement the information provided
by the numerical table. They have the advantage of allowing two phenomena to be compared
by highlighting their essential features.

2.2.1. The case of a qualitative character

The principle is to use the absolute or relative frequencies to draw the
graphical representations. These can take the form of circular or semicircular
sector diagrams, organ-pipe diagrams or band diagrams.

2.2.1.1. Circular and semicircular sectors

All sectors are drawn on the same circle, the central angle of each being proportional to
the count or to the frequency. Thus, if $S_i$ is the central angle of sector $i$, we have: $S_i = 360^\circ \times (n_i / N) = 360^\circ \times f_i$.
Example: the MBF program has 45 students distributed according to their regions of origin: Littoral 8; Center
9; West 20; North 5 and Northwest 5.

Thus $S_{Littoral} = (n_{Littoral} / N) \times 360^\circ$, $S_{Center} = (n_{Center} / N) \times 360^\circ$, and so on.

Note: the results are expressed in degrees. For semicircular sectors, the
weighting factor is 180°.
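
The angle rule can be sketched as a small function. The example below is a minimal illustration, assuming hypothetical counts chosen only to keep the arithmetic easy to follow; the function name `sector_angles` and its `full_angle` parameter are not part of the course.

```python
def sector_angles(counts, full_angle=360.0):
    """Map each modality to its central angle: full_angle * n_i / N."""
    total = sum(counts.values())
    return {m: full_angle * n / total for m, n in counts.items()}

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = {"A": 10, "B": 20, "C": 10}

print(sector_angles(example))                    # full circle: weighting factor 360
print(sector_angles(example, full_angle=180.0))  # semicircle: weighting factor 180
```

Whatever the counts, the angles add up to the weighting factor, because the relative frequencies add up to 1.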

2.2.1.2. Organ-pipe diagrams

Each pipe has a constant base and a height proportional to the count
or to the frequency.

The number of students by field of study is as follows:

Field of study       2015   2016
Economics             10     15
Law                   15     20
Political science     12      8

2.2.1.3. Band diagrams

The counts or frequencies of the different modalities are represented on a single band, which can be
vertical or horizontal, while respecting the principle of proportionality.

2.2.2. The case of the quantitative variable


2.2.2.1. The discrete variable

For example, consider the distribution of mechanical parts according to the number of defective parts:

$x_i$    0    1    2    3    4    5    6   Total
$n_i$   10   12   14   15    5    3    9

Two types of diagrams can be drawn.

• Stick diagrams

They are obtained as follows: each value of the variable is marked on the x-axis,
and the corresponding frequencies or counts are plotted on the y-axis.
The height of each stick is proportional to the corresponding frequency.

• The distribution function

This is the graphical representation of the cumulative frequencies:

$$F(x) = \begin{cases} 0 & \text{for } x \in \left]-\infty ; 0\right[ \\ \sum_{x_i < x} \dfrac{n_i}{N} & \text{for } 0 \le x < x_{\sup} \\ 1 & \text{for } x \ge x_{\sup} \end{cases}$$

Thus, $F(-\infty) = 0$ and $F(+\infty) = 1$.

The cumulative distribution function gives the proportion of elements of the population for which
the value of the character is less than $x_i$.
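
A distribution function for a discrete variable can be evaluated directly from the table of counts. The sketch below uses the table of defective parts and reads the text's "less than $x_i$" literally, that is, with a strict inequality; this choice is an assumption, and replacing `<` with `<=` gives the "at most" convention, under which $F$ reaches 1 at the largest observed value.

```python
x = [0, 1, 2, 3, 4, 5, 6]            # values x_i from the table
n = [10, 12, 14, 15, 5, 3, 9]        # counts n_i from the table
N = sum(n)

def F(value):
    """Proportion of units whose character is strictly below `value`."""
    return sum(ni for xi, ni in zip(x, n) if xi < value) / N

# F is a step function: it only changes when `value` passes one of the x_i.
print(F(-1), F(2), F(2.5), F(7))
```

The printed values illustrate that $F$ starts at 0, never decreases, and ends at 1.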

2.2.2.2. The continuous statistical variable

The graphical representation can take the form of a histogram or of a distribution
function.

The histogram: the x-axis carries the successive classes and the y-axis
the frequencies corresponding to the classes.

• In the case of classes with equal amplitudes: each class is represented by a rectangle
whose width, on the horizontal axis, is the amplitude and whose height, on the vertical axis,
is the frequency. The area of this rectangle is equal to the amplitude multiplied
by the frequency, the total area being proportional to the total count.

The distribution of salaries of the 1000 workers of UYII is shown in the following
table:

Salary level ($10^3$)   $n_i$   $f_i$
[10 ; 20[                100    0.1
[20 ; 30[                300    0.3
[30 ; 40[                400    0.4
[40 ; 50[                200    0.2
Total                   1000    1

Exercise: plot the frequency histogram.

• In the case of classes with unequal amplitudes: in order to draw a correct histogram,
the frequency densities must first be calculated. The frequency density is the ratio
between frequency and amplitude: $d_i = f_i / e_i$.

The statistical series below represents the farms of a region according to
their areas:

Area (ha)     $n_i$   $f_i$   $d_i \cdot 10^2$   FCC   FCD
[0 ; 1[        27
[1 ; 2[        35
[2 ; 3[        29
[3 ; 5[        54
[5 ; 10[      105
[10 ; 20[      70
[20 ; 40[      40
Total

Exercise: draw the histogram in this case. Interpret the $d_i$ of classes [0 ; 1[ and [3 ; 5[.
What is the % of farms that have less than 10 hectares? What does the shape
of the differential diagram (oblique to the left) suggest?

Observations: classes [0 ; 1[ and [3 ; 5[ have the same frequency density,
namely 7.5 (in units of $10^{-2}$). This means that, per hectare of class width, one is
as likely to encounter farms whose area lies between 0 and 1 hectare as farms whose area lies
between 3 and 5 hectares. Furthermore, this histogram is
oblique to the left, or spread out to the right, implying that the majority of farms are
small, the modal class being [1 ; 2[.
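
The density calculation $d_i = f_i / e_i$ can be checked on the farm table with a short script. This is a minimal sketch; the variable names are assumptions made for the example, and the share of farms below 10 hectares is obtained by adding whole classes.

```python
# Area classes [a_i ; b_i[ in hectares and counts n_i from the table.
classes = [(0, 1), (1, 2), (2, 3), (3, 5), (5, 10), (10, 20), (20, 40)]
counts = [27, 35, 29, 54, 105, 70, 40]
N = sum(counts)

rows = []
for (a, b), n in zip(classes, counts):
    f = n / N            # relative frequency f_i
    e = b - a            # amplitude e_i
    d = f / e            # frequency density d_i = f_i / e_i
    rows.append(((a, b), f, d))

for (a, b), f, d in rows:
    print(f"[{a} ; {b}[  f_i={f:.4f}  d_i x 10^2={100 * d:.2f}")

# With unequal amplitudes, the modal class is the one with the highest density.
modal_class = max(rows, key=lambda r: r[2])[0]
share_below_10 = sum(n for (a, b), n in zip(classes, counts) if b <= 10) / N
print(modal_class, f"{share_below_10:.1%}")
```

Comparing densities rather than raw frequencies is what makes classes of different widths comparable: [3 ; 5[ holds twice as many farms as [0 ; 1[ but is also twice as wide.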

The cumulative distribution function in the continuous case is the cumulative frequency function, or
cumulative function $F$, defined by:

$$\forall x \in \mathbb{R}, \quad F(x) = \int_0^x f(t)\,dt .$$

It is the integral of the step function $f$ and is therefore a piecewise affine function.

Thus, the ordinate $f(x)$ of the histogram at point $x$ is the derivative of the function $F(x)$;
it is called the density of the statistical variable. The graphical representation of $F$ is called the
integral diagram. It is a polygonal line made up of segments whose endpoints
have the class boundaries as abscissas and the cumulative frequencies
corresponding to these boundaries as ordinates. It is drawn by placing the class boundaries on the x-axis and
the increasing and decreasing cumulative frequencies on the y-axis.

In general, F has the following properties:

1. If $x_{i-1} < x \le x_i$, then $F(x) = F(x_i) = F_i$;
2. F is constant on the interval $[x_{i-1} ; x_i[$;
3. The curve of F is step-shaped;
4. F is continuous from the left;
5. F is increasing in the broad sense: if $x < x'$, then $F(x) \le F(x')$;
6. $\lim_{x \to +\infty} F(x) = 1$ and $\lim_{x \to -\infty} F(x) = 0$.

NB: the intersection point of the two curves (increasing and decreasing cumulative frequencies), projected onto the y-axis, corresponds to the
cumulative frequency 0.5; projected onto the x-axis, it gives the median of the
distribution.
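
Since the integral diagram is a polygonal line, the abscissa at which the increasing cumulative curve reaches half of the total count can be located by linear interpolation inside the class that contains it. The sketch below applies this idea to the UYII salary table; the function name `median_by_interpolation` is hypothetical, and the approach assumes that observations are spread evenly within each class.

```python
# Salary classes (in thousands) and counts n_i from the UYII table.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
counts = [100, 300, 400, 200]

def median_by_interpolation(classes, counts):
    """Abscissa where the polygonal cumulative curve reaches N / 2."""
    half = sum(counts) / 2
    cumulated = 0
    for (a, b), n in zip(classes, counts):
        if cumulated + n >= half:
            # Linear interpolation inside the class that contains the median.
            return a + (half - cumulated) / n * (b - a)
        cumulated += n
    raise ValueError("counts must not be empty")

print(median_by_interpolation(classes, counts))
```

Here 400 workers fall below 30 and 800 below 40, so the median lies a quarter of the way into the class [30 ; 40[.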

3. THE 'SUM' AND 'PRODUCT' OPERATORS


3.1. The "sum" operator ($\sum$)

Let $X$ be a statistical variable that takes the values $X_1, X_2, \dots, X_k$. The sum of these values is written
$$X_1 + X_2 + \dots + X_k = \sum_{i=1}^{k} X_i .$$

Note: when there is no ambiguity, one may equally write $\sum_{i=1}^{i=k} X_i$, $\sum_{i=1}^{k} X_i$, $\sum_{1}^{k} X_i$, $\sum_{i} X_i$ or $\sum X_i$.

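Properties

P1: $\sum_{i=1}^{k} a = ka$

P2: $\sum_{i=1}^{k} (X_i + Y_i + Z_i) = \sum_{i=1}^{k} X_i + \sum_{i=1}^{k} Y_i + \sum_{i=1}^{k} Z_i$ (linearity)

P3: If there exists a $b$ such that $1 < b < k$, one can write: $\sum_{i=1}^{k} X_i = \sum_{i=1}^{b} X_i + \sum_{i=b+1}^{k} X_i$

P4: $\sum a X_i = a \sum X_i$

P5: $\sum (a X_i + b) = a \sum X_i + kb$ (linearity)

NB: in general, $\sum x_i^2 \neq \left( \sum x_i \right)^2$; $\sum x_i y_i \neq \left[ \sum x_i \right] \left[ \sum y_i \right]$; and $\dfrac{\sum x_i}{\sum y_i} \neq \dfrac{x_i}{y_i}$ (we do not simplify by the operator).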
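
These properties can be verified numerically on a small example. The sketch below is only a sanity check on hypothetical values, not a proof; the lists `X`, `Y`, `Z` and the constants `a` and `b` are chosen arbitrarily.

```python
X = [2, 5, 1, 4]
Y = [3, 1, 6, 2]
Z = [7, 0, 2, 5]
k, a, b = len(X), 3, 10

# P1: summing a constant k times gives k * a.
assert sum(a for _ in range(k)) == k * a
# P2: the sum of a sum is the sum of the sums.
assert sum(x + y + z for x, y, z in zip(X, Y, Z)) == sum(X) + sum(Y) + sum(Z)
# P3: a sum can be split at an intermediate index (here after 2 terms).
assert sum(X) == sum(X[:2]) + sum(X[2:])
# P4 and P5: a constant factor comes out; an added constant contributes k * b.
assert sum(a * x for x in X) == a * sum(X)
assert sum(a * x + b for x in X) == a * sum(X) + k * b
# NB: the operator does not pass through squares or products.
assert sum(x * x for x in X) != sum(X) ** 2
assert sum(x * y for x, y in zip(X, Y)) != sum(X) * sum(Y)

print("all properties hold on this example")
```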

3.2. The "product" operator ($\prod$)

Let $X$ be a statistical variable that takes the values $X_1, X_2, \dots, X_k$. The product of these values is written
$$X_1 X_2 X_3 \dots X_k = \prod_{i=1}^{k} X_i .$$

Note: when no confusion is to be feared, one can write $\prod_{i=1}^{i=k} X_i$, $\prod_{i=1}^{k} X_i$, $\prod_{1}^{k} X_i$, $\prod_{i} X_i$ or $\prod X_i$.

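Properties

P1: $\prod_{i=1}^{k} a = a^k$;

P2: $\prod_{i=1}^{k} a X_i = a^k \prod_{i=1}^{k} X_i$;

P3: If there exists a $b$ such that $1 < b < k$, we can write: $\prod_{i=1}^{k} X_i = \prod_{i=1}^{b} X_i \cdot \prod_{i=b+1}^{k} X_i$;

P4: $\prod (X_i Y_i) = \prod X_i \prod Y_i$;

P5: $\dfrac{\prod x_i}{\prod y_i} = \prod \left( \dfrac{x_i}{y_i} \right)$.

NB: $\prod_{i=1}^{k} (x_i + y_i) \neq \prod x_i + \prod y_i$ (non-linearity); $\dfrac{\prod x_i}{\prod y_i} \neq \dfrac{x_i}{y_i}$ (we do not simplify by the operator).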
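
As with the sum operator, the product properties can be checked on a small numerical example. The sketch below is a sanity check on hypothetical values rather than a proof; it relies on `math.prod` (Python 3.8 or later) and compares floating-point quotients approximately.

```python
from math import prod, isclose

X = [2, 5, 1, 4]
Y = [3, 1, 6, 2]
k, a = len(X), 3

# P1: the product of a constant taken k times is a ** k.
assert prod(a for _ in range(k)) == a ** k
# P2: a constant factor comes out raised to the power k.
assert prod(a * x for x in X) == a ** k * prod(X)
# P3: a product can be split at an intermediate index (here after 2 factors).
assert prod(X) == prod(X[:2]) * prod(X[2:])
# P4: the product of the X_i * Y_i is the product of the two products.
assert prod(x * y for x, y in zip(X, Y)) == prod(X) * prod(Y)
# P5: the quotient of products is the product of the quotients (floats, compared approximately).
assert isclose(prod(X) / prod(Y), prod(x / y for x, y in zip(X, Y)))
# NB: the product operator is not linear.
assert prod(x + y for x, y in zip(X, Y)) != prod(X) + prod(Y)

print("all properties hold on this example")
```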
Exercise (at home): explain and present the notions of double summation and double
product. Then relate them to the notion of integral when the series
is continuous.
