v23 Probability 90
ENGINEERING MATHEMATICS
3 PROBABILITY
1. PROBABILITY
1.1. DEFINITION
A. Random Experiments-
An experiment whose result cannot be predicted with certainty, even when it is repeated under nearly identical conditions, is known as a random experiment.
Some cases of random experiments:
Case 1: If we toss a coin, whether it will land heads or tails cannot be predicted, even under very similar conditions.
Case 2: If we throw a die, the number that will turn up cannot be predicted with certainty.
B. Sample Space,S –
Every random experiment has a set of possible outcomes. The set of all possible outcomes of a random experiment is known as the sample space S, and each possible outcome is a sample point.
Case 1: If we roll a die, the set of all possible outcomes is {1, 2, 3, 4, 5, 6}. This is the sample space of the experiment, and 1, 2, 3, 4, 5 and 6 are the sample points.
Similarly, if we are interested only in getting an odd number on rolling the same die, the relevant set of outcomes is {1, 3, 5}, and for an even number it is {2, 4, 6}.
Case 2: If the experiment is to determine whether a man is married or not, then the sample space is {Married, Unmarried}.
C. Event,E
An event is a subset E of the sample space S, i.e., a set consisting of some of the possible outcomes of the experiment.
Case 1: On tossing a coin twice, the set of all possible outcomes (sample space) is {HH, HT, TH, TT}, whereas {HH}, {HH, TT}, {HT, HH} and {HH, HT, TT} are events.
If an event consists of only a single outcome, it is known as a simple event.
If an event consists of more than one outcome, it is known as a compound event.
Types of Events-
(i) Complementary Event – An event E^C is called the complementary event of an event E if it consists of all outcomes of the sample space that are not present in E.
Ex - If we roll a die, the set of all possible outcomes is {1, 2, 3, 4, 5, 6}.
The event of getting a multiple of 3 is
E (multiples of 3) = {3, 6}
Then, E^C = {1, 2, 4, 5}
(ii) Equally Likely Events – Two events of a sample space are known as equally likely events if the chances of both events occurring are equal.
Ex – A newborn baby is equally likely to be a boy or a girl, i.e., each has a 50% chance.
(iii) Mutually Exclusive Events – Two events are called mutually exclusive when both cannot occur simultaneously.
If E1 & E2 are mutually exclusive then E1 ∩ E2 = ϕ
Ex – If we toss a coin, either a head or a tail can occur; both cannot occur simultaneously.
(iv) Collectively Exhaustive Events - Two events are called collectively exhaustive when the sample points of both events together include all the possible outcomes.
If E1 & E2 are collectively exhaustive then E1 ∪ E2 = S
Ex – Suppose we toss a coin, E1 is the occurrence of a head and E2 is the occurrence of a tail. Then the two events are collectively exhaustive because together they include all possible outcomes.
(v) Independent Events – Two events are called independent when the occurrence of the 1st event does not affect the occurrence of the 2nd.
Ex – On rolling two dice simultaneously, the occurrence of 5 on the 1st die does not affect the occurrence of 4 on the 2nd die. Their occurrences are independent of each other.
D. Probability – If an experiment is conducted n times under essentially the same conditions and m of the cases are favourable to an event E, then the probability of E is denoted by P(E) and defined as
$$P(E) = \frac{\text{Number of cases favourable to } E}{\text{Total number of cases}} = \frac{m}{n}$$
$$P(E^C) = 1 - P(E)$$
$$P(E) + P(E^C) = 1$$
Example -1 A card is drawn from a deck of playing cards. What is the probability that the card is
(i) a face card
(ii) a heart
(iii) a face card and a heart
Sol.
Total number of cards in a deck, n = 52 (sample space)
Total number of suits in a deck = 4 (hearts, spades, clubs, diamonds)
Total number of face cards (King, Queen, Jack) = 12 (3 in each suit)
(i) Probability that the card is a face card
Number of favourable outcomes, m = 12
$$P(\text{Face card}) = \frac{m}{n} = \frac{12}{52} = \frac{3}{13}$$
(ii) Probability that the card is a heart
Number of hearts in a deck, m = 13
$$P(\text{Heart card}) = \frac{m}{n} = \frac{13}{52} = \frac{1}{4}$$
(iii) Probability that the card is a face card and a heart
Number of face cards in the heart suit, m = 3
$$P(\text{Face card with Heart suit}) = \frac{m}{n} = \frac{3}{52}$$
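The classical definition P(E) = m/n can be checked by enumerating the sample space directly. The following is a minimal sketch, not part of the original notes; the names `DECK`, `FACE_RANKS` and `probability` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

SUITS = ["hearts", "spades", "clubs", "diamonds"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
FACE_RANKS = {"J", "Q", "K"}

# Sample space: every (rank, suit) pair, 52 cards in total.
DECK = list(product(RANKS, SUITS))

def probability(event):
    """Classical probability m/n: favourable cases over total cases."""
    favourable = sum(1 for card in DECK if event(card))
    return Fraction(favourable, len(DECK))

p_face = probability(lambda card: card[0] in FACE_RANKS)
p_heart = probability(lambda card: card[1] == "hearts")
p_face_and_heart = probability(
    lambda card: card[0] in FACE_RANKS and card[1] == "hearts"
)

print(p_face, p_heart, p_face_and_heart)  # 3/13 1/4 3/52
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand-computed values.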
1.2. The Axioms of Probability
Consider an experiment whose sample space is S. With each event E of the sample space we associate a real number P(E). Then P is called a probability function, and P(E) the probability of the event E, if the following axioms are satisfied.
Axiom 1 For every event E,
P(E) ≥ 0
Probability of an event can never be negative.
Axiom 2 For a sure or certain event E,
P(E) = 1
The probability of an event that is 100% certain is 1.
Axiom 3 For any number of mutually exclusive events E1, E2, …,
P(E1 ∪ E2 ∪ E3 ∪ …) = P(E1) + P(E2) + P(E3) + …
In particular, for two mutually exclusive events E1, E2,
P(E1 ∪ E2) = P(E1) + P(E2)
Example – 2 A fair die is tossed once. Find the probability of a 2 or 5 turning up.
Sol.
When a fair die is rolled once, the sample space is S = {1, 2, 3, 4, 5, 6}.
Since the die is fair, we assign equal probabilities to the sample points:
$$P(1) = P(2) = \cdots = P(6) = \frac{1}{6}$$
The event that either 2 or 5 turns up is indicated by (2 ∪ 5).
Therefore,
$$P(2 \cup 5) = P(2) + P(5) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$$
1.3. Some Important Theorems on Probability
From the above axioms we can now prove various theorems on probability.
Theorem 1: For every event E,
0 ≤ P(E) ≤ 1,
i.e., a probability is between 0 and 1.
Theorem 2: P(Φ) = 0
i.e., the impossible event has probability zero.
Theorem 3: If E^C is the complement of E, i.e. the event that E will not happen, then
P(E^C) = 1 – P(E)
De Morgan's Law
1. $\left(\bigcup_{i=1}^{n} E_i\right)^C = \bigcap_{i=1}^{n} E_i^C$
2. $\left(\bigcap_{i=1}^{n} E_i\right)^C = \bigcup_{i=1}^{n} E_i^C$
Ex.
Let E1, E2 be two events; then
$(E_1 \cup E_2)^C = E_1^C \cap E_2^C$
De Morgan's law is often used to find the probability of neither E1 nor E2.
Corollary 1:
From Theorem 3,
if E^C is the complement of E, then
P(E^C) = 1 – P(E)
And from De Morgan's law,
$(E_1 \cup E_2)^C = E_1^C \cap E_2^C$
$P(E_1^C \cap E_2^C) = P\big((E_1 \cup E_2)^C\big) = 1 - P(E_1 \cup E_2)$
Similarly,
$$P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)}, \quad P(E_2) \neq 0$$
If E1 and E2 are independent, then
$$P(E_2 \mid E_1) = P(E_2)$$
Similarly,
$$P(E_1 \mid E_2) = P(E_1)$$
(ii) If he failed in Mathematics, the probability that he failed in Chemistry too is
$$P(C \mid M) = \frac{P(M \cap C)}{P(M)} = \frac{0.1}{0.25} = \frac{2}{5}$$
(iii) The probability that he failed in neither Mathematics nor Chemistry is
$$P(M^C \cap C^C) = 1 - P(M \cup C)$$
$$P(M^C \cap C^C) = 0.70$$
Example -4 A box A contains 2 white and 4 black balls. Another box B contains 5 white and 7 black balls. A ball is transferred from box A to box B. Then a ball is drawn from box B. Find the probability that it is white.
Sol.
The probability of drawing a white ball from box B depends on whether the transferred ball is black or white.
If a black ball is transferred from box A to box B, its probability is 4/6 (probability of transferring the black ball). There are now 5 white and 8 black balls in box B.
Then the probability of drawing a white ball from box B is 5/13.
Thus, the probability of drawing a white ball from box B, if the transferred ball is black, is
$$P(\text{White ball}) = \frac{4}{6} \times \frac{5}{13} = \frac{10}{39}$$
Similarly,
if a white ball is transferred from box A to box B, its probability is 2/6 (probability of transferring the white ball). There are now 6 white and 7 black balls in box B.
The probability of drawing a white ball from box B, if the transferred ball is white, is
$$P(\text{White ball}) = \frac{2}{6} \times \frac{6}{13} = \frac{2}{13}$$
$$\text{Hence required probability} = \frac{10}{39} + \frac{2}{13} = \frac{16}{39}$$
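The two-branch calculation in Example 4 is an instance of the total probability rule. A minimal sketch with exact fractions (the dictionary names are illustrative, not from the notes) gives a way to check the final sum:

```python
from fractions import Fraction

# Probability of each kind of transfer from box A (2 white, 4 black).
p_transfer = {"white": Fraction(2, 6), "black": Fraction(4, 6)}

# Probability of then drawing a white ball from box B
# (5 white, 7 black before the transfer, 13 balls after it).
p_white_given_transfer = {"white": Fraction(6, 13), "black": Fraction(5, 13)}

# Total probability: sum over both mutually exclusive transfer cases.
p_white = sum(p_transfer[c] * p_white_given_transfer[c] for c in p_transfer)

print(p_white)  # 16/39
```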
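Theorem 9: Bayes' Theorem
It is an extended form of conditional probability.
Suppose that E1, E2, E3, …, Em are mutually exclusive events whose union is the sample space S. Then, for any event E,
$$P(E_n \mid E) = \frac{P(E_n)\,P(E \mid E_n)}{\sum_{i=1}^{m} P(E_i)\,P(E \mid E_i)}$$
In general form, for two such events A and B,
$$P(A \mid E) = \frac{P(A \cap E)}{P(E)} = \frac{P(A \cap E)}{P(A \cap E) + P(B \cap E)}$$
$$P(A \mid E) = \frac{P(A)\,P(E \mid A)}{P(A)\,P(E \mid A) + P(B)\,P(E \mid B)} \quad \text{(using theorem 8 \& 9)}$$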
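As an illustration of Bayes' theorem, the sketch below reuses the numbers of Example 4 and asks a question the notes do not pose: given that the ball drawn from box B is white, how likely is it that the transferred ball was white? The function name `bayes_posterior` is a hypothetical helper.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return P(E_k | E) for every k, given P(E_k) and P(E | E_k)."""
    evidence = sum(priors[k] * likelihoods[k] for k in priors)  # P(E)
    return {k: priors[k] * likelihoods[k] / evidence for k in priors}

# Priors: colour of the ball transferred from box A (2 white, 4 black).
priors = {"white": Fraction(2, 6), "black": Fraction(4, 6)}
# Likelihoods: chance of drawing white from box B after each transfer.
likelihoods = {"white": Fraction(6, 13), "black": Fraction(5, 13)}

posterior = bayes_posterior(priors, likelihoods)
print(posterior["white"], posterior["black"])  # 3/8 5/8
```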
2. PROBABILITY DISTRIBUTION
(ii) $\sum P(x_i) = 1$
(iii) Mean of random variable, μ (or E)
$$E(x) = \mu = \sum x_i P(x_i)$$
$$\sigma^2 = \sum x_i^2 P(x_i) + \mu^2 - 2\mu^2$$
$$\sigma^2 = \sum x_i^2 P(x_i) - \mu^2$$
(v) Standard deviation, σ (SD) – it is the square root of the variance.
It is a measure of variation amongst the data.
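The mean and variance formulas for a discrete random variable can be tried on a small distribution. The sketch below assumes a fair six-sided die as the example distribution (an illustrative choice, not taken from the notes).

```python
from fractions import Fraction
from math import sqrt

# Probability distribution of a fair die: P(x_i) = 1/6 for x_i = 1..6.
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

# Property (ii): the probabilities sum to 1.
total = sum(distribution.values())

# Mean: mu = sum of x_i * P(x_i).
mean = sum(x * p for x, p in distribution.items())

# Variance: sigma^2 = sum of x_i^2 * P(x_i) minus mu^2.
variance = sum(x**2 * p for x, p in distribution.items()) - mean**2

std_dev = sqrt(variance)
print(total, mean, variance)  # 1 7/2 35/12
```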
Types of Discrete distributions are
(i) Binomial Distribution
(ii) Poisson distribution
(iii) Geometric distribution
(C) Continuous Random Variables
A non-discrete random variable X is said to be absolutely continuous, or simply continuous, if its distribution function may be represented as
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du \quad (-\infty < x < \infty)$$
It follows from the above that if X is a continuous random variable, then the probability that X takes on any one particular value is zero.
Whereas the probability that X lies in the interval between two different values, say, a and b, is given by
$$P(a < X < b) = \int_{a}^{b} f(x)\,dx$$
$$P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b) = \int_{a}^{b} f(x)\,dx$$
Some examples of continuous distributions are as follows:
(i). Normal Distribution
(ii). Exponential Distribution
(iii). Uniform Distribution
D) Properties of Expectation and Variance:
If x1 and x2 are two random variables and a and b are constants,
$E(a x_1 + b) = a\,E(x_1) + b$
$V(a x_1 + b) = a^2\,V(x_1)$
$E(a x_1 + b x_2) = a\,E(x_1) + b\,E(x_2)$
$V(a x_1 + b x_2) = a^2 V(x_1) + b^2 V(x_2) + 2ab\,\mathrm{cov}(x_1, x_2)$
where cov(x1, x2) represents the covariance between x1 and x2, defined as $\mathrm{cov}(x_1, x_2) = E[(x_1 - E(x_1))(x_2 - E(x_2))]$.
If x1 and x2 are independent, then cov(x1, x2) = 0
Hence, the above formula reduces to
$V(a x_1 + b x_2) = a^2 V(x_1) + b^2 V(x_2)$
If x1 and x2 are independent, then
$E(x_1 x_2) = E(x_1)\,E(x_2)$
Example –5 A game is played with a single fair die. A player wins Rs.20 if a 2 turns up and Rs.40 if a 4 turns up, and loses Rs.30 if a 6 turns up. The player neither wins nor loses if any other face turns up. Find the expected sum of money to be won.
Sol.
Since the die is fair, each number turns up with equal probability (1/6).
outcome 1 2 3 4 5 6
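$$E(X) = \int_{-\infty}^{0} x f(x)\,dx + \int_{0}^{2} x f(x)\,dx + \int_{2}^{\infty} x f(x)\,dx$$
$$E(X) = \int_{-\infty}^{0} x \cdot 0\,dx + \int_{0}^{2} x \cdot \tfrac{1}{2}x\,dx + \int_{2}^{\infty} x \cdot 0\,dx$$
$$E(X) = \int_{0}^{2} x \cdot \tfrac{1}{2}x\,dx$$
$$E(X) = \int_{0}^{2} \frac{x^2}{2}\,dx = \left[\frac{x^3}{6}\right]_{0}^{2}$$
$$E(X) = \frac{4}{3}$$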
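Example -7 The density function of a random variable X is given by
$$f(x) = \begin{cases} \tfrac{1}{2}x & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$
Then calculate E(3X² – 2X).
Sol.
$$E(3X^2 - 2X) = \int_{-\infty}^{\infty} (3x^2 - 2x)\,f(x)\,dx$$
$$E(3X^2 - 2X) = 3\int_{0}^{2} x^2 \cdot \tfrac{1}{2}x\,dx - 2\int_{0}^{2} x \cdot \tfrac{1}{2}x\,dx$$
$$E(3X^2 - 2X) = 3\left[\frac{x^4}{8}\right]_{0}^{2} - 2\left[\frac{x^3}{6}\right]_{0}^{2}$$
$$E(3X^2 - 2X) = \frac{10}{3}$$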
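Because the density here is a polynomial on (0, 2), the expectation can be verified exactly by integrating monomials term by term. The sketch below is one way to do that check; `integrate_monomial` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction

def integrate_monomial(coeff, power, lower, upper):
    """Exact integral of coeff * x**power over [lower, upper]."""
    n = power + 1
    return Fraction(coeff) * (Fraction(upper) ** n - Fraction(lower) ** n) / n

# Density f(x) = x/2 on (0, 2), zero elsewhere.
total_probability = integrate_monomial(Fraction(1, 2), 1, 0, 2)

# E(X): integrand x * (x/2) = (1/2) x^2.
mean = integrate_monomial(Fraction(1, 2), 2, 0, 2)

# E(3X^2 - 2X): integrand (3x^2 - 2x) * (x/2) = (3/2) x^3 - x^2.
expectation = (integrate_monomial(Fraction(3, 2), 3, 0, 2)
               + integrate_monomial(-1, 2, 0, 2))

print(total_probability, mean, expectation)  # 1 4/3 10/3
```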
2.1. Binomial Distribution –
Suppose that we have an experiment such as tossing a coin or rolling a die repeatedly or
choosing a marble from an urn repeatedly. Each toss or selection is called a trial. In any
single trial there will be a probability associated with a particular event such as head on
the coin, 4 on the die, or selection of a particular colour of marble.
In some cases this probability will not change from one trial to the next (as in tossing a coin or die). Such trials are then said to be independent and are often called Bernoulli trials.
Let p be the probability that an event will happen in any single Bernoulli trial (called the probability of success). Then q = 1 – p is the probability that the event will fail to happen in any single trial (called the probability of failure). The probability that the event will happen exactly x times in n trials (i.e., x successes and (n – x) failures will occur) is given by the probability function
$$f(x) = P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}$$
where the random variable X denotes the number of successes in n trials and x = 0, 1, …, n.
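Case – 1
When p = q,
$$P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}$$
$$P(X = 2) = {}^{6}C_{2} \left(\frac{1}{2}\right)^{2} \left(\frac{1}{2}\right)^{6-2}$$
$$P(X = 2) = \frac{6!}{2!\,4!} \left(\frac{1}{2}\right)^{2} \left(\frac{1}{2}\right)^{6-2} = \frac{15}{64}$$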
Example – 9 Find the probability that in five tosses of a fair die, ‘3’ will appear
(a) twice, (b) at most once, (c) at least two times.
Sol.
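Let the random variable X be the number of times a 3 appears in five tosses of a fair die. We have
Probability of a '3' appearing in a single toss, p = 1/6
Probability of a '3' not appearing in a single toss, q = 1 – p = 5/6
$$\text{(a) } P(3 \text{ occurs twice}) = P(X = 2) = {}^{5}C_{2} \left(\frac{1}{6}\right)^{2} \left(\frac{5}{6}\right)^{3} = \frac{625}{3888}$$
$$P(X \ge 2) = {}^{5}C_{2} \left(\frac{1}{6}\right)^{2} \left(\frac{5}{6}\right)^{3} + {}^{5}C_{3} \left(\frac{1}{6}\right)^{3} \left(\frac{5}{6}\right)^{2} + {}^{5}C_{4} \left(\frac{1}{6}\right)^{4} \left(\frac{5}{6}\right)^{1} + {}^{5}C_{5} \left(\frac{1}{6}\right)^{5} \left(\frac{5}{6}\right)^{0}$$
$$P(X \ge 2) = \frac{625}{3888} + \frac{125}{3888} + \frac{25}{7776} + \frac{1}{7776}$$
$$P(X \ge 2) = \frac{763}{3888}$$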
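All three parts of Example 9 follow from the binomial probability function. A minimal sketch using exact fractions (the function name `binomial_pmf` is illustrative):

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n - x) with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 5, Fraction(1, 6)  # five tosses, success = a 3 turns up

p_twice = binomial_pmf(2, n, p)                                   # (a)
p_at_most_once = binomial_pmf(0, n, p) + binomial_pmf(1, n, p)    # (b)
p_at_least_two = sum(binomial_pmf(x, n, p) for x in range(2, n + 1))  # (c)

print(p_twice, p_at_most_once, p_at_least_two)  # 625/3888 3125/3888 763/3888
```

Parts (b) and (c) are complementary events, so their probabilities add up to 1.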
2.1.1. Some Properties of the Binomial Distribution
Example - 10 If the probability of a defective bolt is 0.1, find (a) the mean, (b) the
standard deviation, for the number of defective bolts in a total of 400 bolts.
Sol.
Given,
Number of bolts under inspection, n = 400
probability of a bolt to be defective, p = 0.1
probability of a bolt to be non-defective, q = 0.9
(a) Mean = np = (400) (0.1) = 40.
i.e. we can expect 40 bolts to be defective.
(b) Variance, σ² = npq = (400)(0.1)(0.9) = 36, so the standard deviation is σ = √36 = 6.
$$f(x) = P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$
where λ (> 0) is a given positive constant. This distribution is called the Poisson distribution, and a random variable having this distribution is said to be Poisson distributed.
2.2.1. Some Properties of the Poisson Distribution
Variance σ² = λ
Standard deviation σ = √λ
From the table, we can see that the expected value and the variance are the same for the Poisson distribution.
Example -11 If the probability that an individual will suffer a bad reaction from injection
of a given serum is 0.001, determine the probability that out of 2000 individuals,
Let X denote the number of individuals suffering a bad reaction which is Poisson
distributed, i.e.,
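$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \quad \text{where } \lambda = np = (2000)(0.001) = 2$$
$$P(X = 3) = \frac{2^{3} e^{-2}}{3!} = 0.180$$
(b) Probability that more than 2 individuals will suffer a bad reaction.
P(X > 2) = 1 – [P(X = 0) + P(X = 1) + P(X = 2)]
$$= 1 - \left[\frac{2^{0} e^{-2}}{0!} + \frac{2^{1} e^{-2}}{1!} + \frac{2^{2} e^{-2}}{2!}\right]$$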
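The Poisson probabilities in Example 11 can be evaluated numerically. The following sketch uses only the standard library; `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 2000 * 0.001  # lambda = n * p = 2

p_exactly_3 = poisson_pmf(3, lam)
p_more_than_2 = 1 - sum(poisson_pmf(x, lam) for x in range(3))

print(round(p_exactly_3, 3), round(p_more_than_2, 3))  # 0.18 0.323
```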
$$P(X = k) = p\,q^{k-1}$$
Example -12 Find the probability that in successive tosses of a fair die, a 3 will come up for the first time on the fifth toss.
Sol.
Given,
Number of the toss on which the first success occurs, k = 5
Probability of getting a 3 on a die, p = 1/6
q = 1 – p = 1 – 1/6 = 5/6
$$P(X = 5) = \frac{1}{6}\left(\frac{5}{6}\right)^{4}$$
$$P(X = 5) = \frac{625}{7776}$$
2.4. Normal Distribution:
One of the most important examples of a continuous probability distribution is the normal distribution, sometimes called the Gaussian distribution.
The density function for this distribution is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}, \quad -\infty < x < \infty$$
where μ and σ are the mean and standard deviation, respectively.
The corresponding distribution function is given by
$$F(x) = P(X \le x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(v-\mu)^2 / 2\sigma^2}\,dv$$
If X has the distribution function given by the above equation, we say that the random variable X is normally distributed with mean μ and variance σ².
2.4.1. Standard normal distribution –
If we set μ = 0 and σ = 1, the normal distribution reduces to the standard normal distribution.
In this case the density function for Z reduces to
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
This is often referred to as the standard normal density function.
The corresponding distribution function is given by
$$F(z) = P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\,du = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_{0}^{z} e^{-u^2/2}\,du$$
In this graph we have indicated the areas within 1, 2, and 3 standard deviations of the
mean (i.e., between z = – 1 and + 1, z = –2 and +2, z = –3 and +3) as equal,
respectively, to 68.27%, 95.45% and 99.73% of the total area, which is 1.
This means,
P(– 1 ≤ Z ≤ 1) = 0.6827 = 68.27%
P(– 2 ≤ Z ≤ 2) = 0.9545 = 95.45%
P(– 3 ≤ Z ≤ 3) = 0.9973 = 99.73%
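These three areas can be reproduced with the error function, using the identity P(–k ≤ Z ≤ k) = erf(k/√2) for a standard normal variable. A short sketch (the function name is illustrative):

```python
from math import erf, sqrt

def prob_within(k):
    """P(-k <= Z <= k) for a standard normal Z, via the error function."""
    return erf(k / sqrt(2))

areas = {k: round(prob_within(k), 4) for k in (1, 2, 3)}
print(areas)  # {1: 0.6827, 2: 0.9545, 3: 0.9973}
```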
2.5. Exponential Distribution:
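A continuous random variable X is exponentially distributed if its density function is given by
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
$$\text{mean, } \mu = \frac{1}{\lambda}$$
$$\text{variance, } \sigma^2 = \frac{1}{\lambda^2}$$
$$\text{standard deviation, } \sigma = \frac{1}{\lambda}$$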
2.6. Continuous Uniform Distribution
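In general, we say that X is a uniform random variable on the interval (α, β) if its probability density function is given by:
$$f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & \text{if } \alpha < x < \beta \\ 0 & \text{otherwise} \end{cases}$$
The distribution given by the above density function is the uniform distribution.
Since f(x) is a constant, all values of x between α and β are equally likely (uniform).
$$\mu = \frac{1}{\beta - \alpha} \int_{\alpha}^{\beta} x\,dx = \frac{\alpha + \beta}{2}$$
$$E(x) = \mu = \frac{\alpha + \beta}{2}$$
$$\sigma^2 = V(X) = \frac{(\beta - \alpha)^2}{12}$$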
3. STATISTICS
(i) Introduction
Statistics deals with the methods of collection, classification and analysis of numerical data for drawing valid conclusions and making reasonable decisions. It is a branch of mathematics which gives us the tools to deal with large quantities of data.
To summarise a data set, we find a representative value for the given data. This value is called the measure of central tendency.
(i) mean (arithmetic mean)
(ii) median
(iii) mode
These are the three measures of central tendency.
A measure of central tendency indicates an average value of the given data.
However, the measures of central tendency are not sufficient to give complete information about the given data. Variability is another factor which needs to be studied in statistics.
As with 'measures of central tendency', a single number is assigned to describe the variability of the data. This single number is called a 'measure of dispersion'.
(i) Standard deviation
(ii) Variance
(iii) Coefficient of Variation
(iv) Range
'Measures of dispersion' describe the scattering of the data about a fixed point, and that fixed point is a measure of central tendency. They tell us how closely the data are packed around the central mean value.
3.1. Arithmetic Mean
3.1.1 Arithmetic Mean for Raw Data
The arithmetic mean is simply the average of the given data, that is, the sum of the observations divided by the total number of observations.
If X1, X2, X3, …, Xn are the observations,
then the arithmetic mean is given as
$$\text{mean} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$$
It is denoted by $\bar{x}$.
Thus, it can also be written as
$$\bar{x} = \frac{\sum x}{n}$$
$\bar{x}$ - arithmetic mean
x - refers to the value of an observation
n - number of observations.
Example - 13 The numbers of mobiles sold by a shop owner per day over the last week are as follows: 4, 5, 15, 2, 12, 7, 11.
Calculate the average number of mobiles sold in a day.
Sol.
Σx = 4 + 5 + 15 + 2 + 12 + 7 + 11 = 56
Number of days in a week, n = 7
$$\text{mean, } \bar{x} = \frac{\sum x}{n} = \frac{56}{7} = 8$$
So, on average the shop owner sold 8 mobiles a day last week.
3.1.2 The Arithmetic Mean for Grouped Data (Frequency Distribution)
If x1, x2, …, xn are observations with respective frequencies f1, f2, …, fn, meaning that observation x1 occurs f1 times, x2 occurs f2 times, and so on, then the mean of the data is given as
$$\text{mean, } \bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{f_1 + f_2 + f_3 + \cdots + f_n}$$
$$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$$
Example – 14 The marks obtained by 25 students of Class X of a certain school in a
Mathematics paper consisting of 100 marks are presented in table below. Find the mean
of the marks obtained by the students.
Sol.
$$\text{Median} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{value}$$
That is, if we arrange the data in ascending order, the middle term is the median of the given data.
However, if n is even, we have two middle points:
$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{value} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{value}}{2}$$
That is, if we arrange the data in ascending order, the mean of the two middle terms is the median of the given data.
3.2.2. Median for Grouped Data
1. Identify the median class, which contains the middle observation, i.e. the $\left(\frac{N+1}{2}\right)^{\text{th}}$ observation.
This can be done by finding the first class in which the cumulative frequency is equal to or more than $\frac{N+1}{2}$. Here, N = Σf = total number of observations.
2. Calculate the median as follows:
$$\text{Median} = L + \left[\frac{\frac{N+1}{2} - (f + 1)}{f_m}\right] \times h$$
Where,
L = Lower limit of the median class
N = Total number of data items = Σf
f = Cumulative frequency of the class immediately preceding the median class
fm = Frequency of the median class
h = Difference between the upper limit and lower limit of the median class
Example – 15 In a class of 10 students, the weights of the students (in kg) are 37, 41, 39, 32, 31, 57, 51, 47, 45, 40. Find the median of the given weights.
Sol.
Given weights are 37, 41, 39, 32, 31, 57, 51, 47, 45, 40
Arranging them in ascending order:
31, 32, 37, 39, 40, 41, 45, 47, 51, 57
Since the number of observations, n = 10, is even,
$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{value} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{value}}{2}$$
$$\text{Median} = \frac{\left(\frac{10}{2}\right)^{\text{th}} \text{value} + \left(\frac{10}{2} + 1\right)^{\text{th}} \text{value}}{2}$$
$$\text{Median} = \frac{5^{\text{th}} \text{value} + 6^{\text{th}} \text{value}}{2}$$
$$\text{Median} = \frac{40 + 41}{2} = 40.5$$
Example - 16 A survey regarding the heights (in cm) of 51 girls of Class X of a school
was conducted and the following data was obtained:
Here, N = 51,
$$\text{Hence, } \frac{N+1}{2} = \frac{51+1}{2} = 26$$
The 26th observation falls in the class 145 – 150 (cumulative frequency 29 > 26)
So L = 145
f = 11, fm = 18
h = 5
so
$$\text{Median} = L + \left[\frac{\frac{N+1}{2} - (f + 1)}{f_m}\right] \times h$$
$$\text{Median} = 145 + \left[\frac{\frac{51+1}{2} - (11 + 1)}{18}\right] \times 5$$
Median = 148.89
That means 50% of the students have a height less than 148.89 cm and 50% of the students have a height greater than 148.89 cm.
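The grouped-median formula used in these notes can be written as a small function. This sketch follows the formula exactly as stated in this section (other texts use slightly different conventions); the name `grouped_median` and its parameter names are illustrative.

```python
def grouped_median(lower_limit, total, cum_freq_before, class_freq, width):
    """Median = L + [((N + 1)/2 - (f + 1)) / fm] * h, as given in the notes."""
    middle = (total + 1) / 2
    return lower_limit + (middle - (cum_freq_before + 1)) / class_freq * width

# Values read off in Example 16: L = 145, N = 51, f = 11, fm = 18, h = 5.
median_height = grouped_median(145, 51, 11, 18, 5)
print(round(median_height, 2))  # 148.89
```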
3.3. Mode –
Mode is defined as the value of the variable which occurs most frequently, i.e. the value with maximum frequency.
3.3.1 Mode for Raw Data
In raw data, the most frequently occurring value is the mode of that data.
Suppose in a given set of data,
X1 occurs n1 times, X2 occurs n2 times, X3 occurs n3 times, …, Xn occurs nn times,
and n1 > n2 > n3 > … > nn.
Then X1 occurs most often, thus the mode of the given data is X1.
If more than one value has the same, highest frequency, then each of them is a mode.
Thus, we have unimodal (single mode), bimodal (two modes) and trimodal (three modes) data sets.
Example - 16
Find the mode of the data set: 45, 45, 65, 55, 45, 55, 50.
Solution:
Arrange in ascending order: 45, 45, 45, 50, 55, 55, 65
Data frequency
45 3
50 1
55 2
65 1
Here, 45 occurs the greatest number of times, i.e. 3 times.
Hence, mode of the given data = 45
3.3.2 Mode for Grouped Data
Mode is that value of x for which the frequency is maximum.
In a grouped frequency distribution, it is not possible to determine the mode by looking
at the frequencies. Here, we can only locate a class with the maximum frequency, called
the modal class.
The mode is a value inside the modal class, and is given by the formula:
$$\text{Mode} = L + \left[\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right] \times h$$
Where,
L = Lower limit of the modal class
f1 = Largest frequency (frequency of the modal class)
f0 = Frequency of the class preceding the modal class
f2 = Frequency of the class succeeding the modal class
h = Width of the modal class (interval)
Example: 17 A survey conducted on 20 households in a locality by a group of students
resulted in the following frequency table for the number of family members in a
household.
$$\text{Mode} = 3 + \left[\frac{8 - 7}{2 \times 8 - 7 - 2}\right] \times 2$$
Mode = 3.2857
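The grouped-mode formula can be sketched as a function, with the frequency of the modal class and of its two neighbours as inputs. The name `grouped_mode` and its parameter names are illustrative; the numbers are those substituted in Example 17.

```python
def grouped_mode(lower_limit, modal_freq, prev_freq, next_freq, width):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h, with f1 the modal frequency."""
    return lower_limit + (modal_freq - prev_freq) / (
        2 * modal_freq - prev_freq - next_freq
    ) * width

# Example 17: L = 3, modal frequency 8, preceding 7, succeeding 2, width 2.
mode_value = grouped_mode(3, 8, 7, 2, 2)
print(round(mode_value, 4))  # 3.2857
```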
3.4. Properties of Mean, Mode & Median -
In a symmetrical distribution, the mean, mode and median coincide, but for an asymmetrical distribution all three are different and are related by the empirical formula
Empirical mode = 3 median – 2 mean
3.5. Skewness - Skewness measures the degree of asymmetry.
Depending upon the asymmetry, a distribution curve can be of three types:
(i) Positively skewed distribution
(ii) Symmetric distribution
(iii) Negatively skewed distribution
In a positively skewed distribution, the frequency curve has a longer tail to the right, i.e. the mean is to the right of the mode.
Mode < Median < Mean
In a negatively skewed distribution, the frequency curve has a longer tail to the left, i.e. the mean is to the left of the mode.
Mean < Median < Mode
indicated graphically in Figure. For the case of two continuous distributions having the
same mean .
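$$\bar{x} = \frac{\sum x_i}{n}$$
then $x_1 - \bar{x},\ x_2 - \bar{x},\ x_3 - \bar{x},\ \ldots,\ x_n - \bar{x}$ are the deviations of the values of x from $\bar{x}$.
Then the variance of these data is given as
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
It can be shown that
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum \left(x_i^2 + \bar{x}^2 - 2 x_i \bar{x}\right)}{n}$$
$$\sigma^2 = \frac{\sum x_i^2}{n} + \frac{\sum \bar{x}^2}{n} - \frac{\sum 2\bar{x} x_i}{n}$$
Since $\bar{x}$ is a constant value,
$$\frac{\sum \bar{x}^2}{n} = \bar{x}^2\,\frac{\sum 1}{n} = \bar{x}^2\,\frac{n}{n} = \bar{x}^2$$
$$\frac{\sum 2\bar{x} x_i}{n} = 2\bar{x}\,\frac{\sum x_i}{n} = 2\bar{x} \cdot \bar{x} = 2\bar{x}^2$$
By putting these values in the above equation,
$$\sigma^2 = \frac{\sum x_i^2}{n} + \bar{x}^2 - 2\bar{x}^2$$
$$\sigma^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$$
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{1}{n}\sum x_i^2 - \bar{x}^2$$
$$\sigma^2 = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n^2}$$
The above expression represents the variance, whereas the square root of the variance gives the standard deviation.
Variance is represented by σ², whereas standard deviation is represented by σ.
$$\sigma = +\sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n^2}}$$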
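Example - 18 Calculate the mean and standard deviation for the following:
size 6 7 8 9 10 11 12
Sol.
Size of item x    x²
6                 36
7                 49
8                 64
9                 81
10                100
11                121
12                144
Σx = 63           Σx² = 595
$$\therefore \text{mean} = \bar{x} = \frac{\sum x}{n} = \frac{63}{7} = 9$$
$$\text{Standard deviation, } \sigma = +\sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n^2}}$$
$$\sigma = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} = \sqrt{\frac{595}{7} - 9^2} = \sqrt{85 - 81}$$
σ = 2
variance = σ² = 4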
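The arithmetic of Example 18 can be checked with the shortcut formula σ² = Σx²/n − x̄² and compared against the population standard deviation from Python's `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev

sizes = [6, 7, 8, 9, 10, 11, 12]
n = len(sizes)

sum_x = sum(sizes)                      # 63
sum_x2 = sum(x * x for x in sizes)      # 595

mean = sum_x / n                        # 9.0
variance = sum_x2 / n - mean**2         # 85 - 81 = 4.0
sigma = sqrt(variance)                  # 2.0

# Cross-check with the library's population standard deviation.
sigma_library = pstdev(sizes)

print(mean, variance, sigma)  # 9.0 4.0 2.0
```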
3.6.2. Standard Deviation for Grouped Data
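Standard deviation and variance for grouped data can be explained with the example below.
Example – 19
Sol.
Σf = 41, Σfx = 3232.5, Σfx² = 257806.25
$$\bar{x} = \frac{\sum f x}{\sum f} = \frac{3232.5}{41} = 78.841$$
$$\sigma = \sqrt{\frac{n \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n^2}}, \quad \text{where } n = \sum f$$
$$\sigma = \sqrt{\frac{41 \times 257806.25 - (3232.5)^2}{41^2}}$$
σ = 8.484
Variance = σ² = 71.98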
3.6.3 Standard deviation of the combination of two groups –
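If m1, σ1 are the mean and standard deviation of a sample of size n1, and m2, σ2 are the mean and standard deviation of a sample of size n2,
then the mean m and standard deviation σ of the combined sample of size n1 + n2 are given by
$$m = \frac{n_1 m_1 + n_2 m_2}{n_1 + n_2}$$
$$(n_1 + n_2)\,\sigma^2 = n_1\left(\sigma_1^2 + D_1^2\right) + n_2\left(\sigma_2^2 + D_2^2\right)$$
where, D1 = m1 − m
D2 = m2 − m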
Therefore, such comparisons are done by using a relative measure of dispersion called
coefficient of variation (CV).
$$\text{Coefficient of variation, } CV = \frac{\sigma}{\mu}$$
where σ is the standard deviation and μ is the mean of the data set.
CV is often represented as a percentage,
$$CV\% = \frac{\sigma}{\mu} \times 100$$
When comparing data sets, the data set with larger value of CV% is more variable (less
consistent) as compared to a data set with lesser value of CV%.
4. CORRELATION
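$$\bar{x} = \frac{\sum x}{n}, \quad \bar{y} = \frac{\sum y}{n}$$
Their standard deviations are given as
$$\sigma_x = +\sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n^2}}$$
$$\sigma_y = +\sqrt{\frac{\sum (y_i - \bar{y})^2}{n}} = \sqrt{\frac{\sum y_i^2}{n} - \bar{y}^2} = \sqrt{\frac{n \sum y_i^2 - \left(\sum y_i\right)^2}{n^2}}$$
Then,
the covariance of x, y is defined as
$$\mathrm{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$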
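The sign of the covariance between x and y determines the sign of the correlation coefficient. The standard deviations are always positive. If the covariance is zero, the correlation coefficient is always zero.
The coefficient of correlation is denoted by 'r' and defined as
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y}$$
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2\ \sum (y - \bar{y})^2}}$$
which can also be rewritten as
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - \left(\sum x\right)^2}\ \sqrt{n \sum y^2 - \left(\sum y\right)^2}}$$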
Student A B C D E F G H I J
$$\bar{x} = \frac{\sum x}{n} = \frac{990}{10} = 99$$
$$\bar{y} = \frac{\sum y}{n} = \frac{980}{10} = 98$$
Student   x     x − x̄   y     y − ȳ   (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
A         105   6       101   3       36         9          18
B         104   5       103   5       25         25         25
C         102   3       100   2       9          4          6
D         101   2       98    0       4          0          0
E         100   1       95    –3      1          9          –3
F         99    0       96    –2      0          4          0
G         98    –1      104   6       1          36         –6
H         96    –3      92    –6      9          36         18
I         93    –6      97    –1      36         1          6
J         92    –7      94    –4      49         16         28
Total     990   0       980   0       170        140        92
Σ(x − x̄)² = 170, Σ(y − ȳ)² = 140
Σ(x − x̄)(y − ȳ) = 92.
As we know,
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2\ \sum (y - \bar{y})^2}}$$
Substituting these values in the formula, we get
$$r = \frac{92}{\sqrt{170 \times 140}} = \frac{92}{154.3} = 0.596$$
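The correlation coefficient for this table can be recomputed from the raw columns. In the sketch below the x value for student J is taken as 92, which is the value consistent with its deviation of –7 and with Σx = 990; the function name `pearson_r` is illustrative.

```python
from math import sqrt

# Columns x and y of the table (student J's x taken as 92, see lead-in).
x = [105, 104, 102, 101, 100, 99, 98, 96, 93, 92]
y = [101, 103, 100, 98, 95, 96, 104, 92, 97, 94]

def pearson_r(xs, ys):
    """r = sum((x - xbar)(y - ybar)) / sqrt(sum((x - xbar)^2) * sum((y - ybar)^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    s_yy = sum((b - y_bar) ** 2 for b in ys)
    return s_xy / sqrt(s_xx * s_yy), (s_xy, s_xx, s_yy)

r, (s_xy, s_xx, s_yy) = pearson_r(x, y)
print(s_xy, s_xx, s_yy, round(r, 3))  # 92.0 170.0 140.0 0.596
```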
5. LINES OF REGRESSION
When comparing two different variables, two questions come to mind: “Is there a relationship
between two variables?” and “How strong is that relationship?” These questions can be
answered using regression and correlation. Regression answers whether there is a relationship
and correlation answers how strong the linear relationship is.
It frequently happens that the dots of the scatter diagram tend to cluster along a well-defined direction, which suggests a linear relationship between the variables x and y. Such a line of best fit for the given distribution of dots is called the line of regression.
There are two such lines, one giving the best possible mean values of y for each specified value of x and the other giving the best possible mean values of x for given values of y. The former is known as the line of regression of y on x and the latter as the line of regression of x on y.
Consider first the line of regression of y on x.
Let the straight line satisfying the general trend of n dots in a scatter diagram be
y = a + bx
$$\sum y = na + b \sum x$$
$$\frac{1}{n}\sum y = a + b \cdot \frac{1}{n}\sum x$$
$$\bar{y} = a + b\bar{x} \quad \ldots (1)$$
y = a + bx
xy = ax + bx²
$$\sum xy = a \sum x + b \sum x^2 \quad \ldots (2)$$
This shows that $(\bar{x}, \bar{y})$, i.e., the means of x and y, lie on (1).
$$\text{but } a \sum (x - \bar{x}) = a \sum x - a \sum \bar{x}$$
$$\bar{x} = \frac{\sum x}{n}$$
$$\sum x = n\bar{x},$$
$$\sum \bar{x} = \bar{x} \sum 1 = n\bar{x}$$
$$a \sum (x - \bar{x}) = a n \bar{x} - a n \bar{x} = 0$$
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n \sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x} \quad \left[\because r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y}\right]$$
$$\text{Thus, the line of best fit becomes } y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$
Note - The correlation coefficient r is the geometric mean between the two regression coefficients.
$$\text{For } r\,\frac{\sigma_y}{\sigma_x} \times r\,\frac{\sigma_x}{\sigma_y} = r^2.$$
Example -21 The two regression equations of the variables x and y are x = 19.13 – 0.87y and
y = 11.64 – 0.50x.
Find (i) mean of x’s, (ii) mean of y’s and (iii) the correlation coefficient between x and y.
Sol.
Since the mean of x's and the mean of y's lie on the two regression lines, we have
$$\bar{x} = 19.13 - 0.87\,\bar{y} \quad \ldots (i)$$
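Example 21 can be worked numerically from two facts stated in this section: the point of means lies on both regression lines, and r is the geometric mean of the two regression coefficients. The sketch below is an illustrative computation under those two facts; taking the sign of r from the (shared) sign of the regression coefficients is the usual convention and is an assumption not spelled out in the notes.

```python
from math import copysign, sqrt

# Regression of x on y: x = a1 + b_xy * y
a1, b_xy = 19.13, -0.87
# Regression of y on x: y = a2 + b_yx * x
a2, b_yx = 11.64, -0.50

# Both lines pass through (x_bar, y_bar); substitute the second into the first.
x_bar = (a1 + b_xy * a2) / (1 - b_xy * b_yx)
y_bar = a2 + b_yx * x_bar

# r is the geometric mean of the regression coefficients, with their sign.
r = copysign(sqrt(b_xy * b_yx), b_yx)

print(round(x_bar, 2), round(y_bar, 2), round(r, 2))  # 15.93 3.67 -0.66
```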
6. SAMPLING THEORY
A small section selected from the population is called a sample and the process of drawing
sample is called sampling.
It is essential that a sample must be a random selection so that each member of the population
has the same chance of being included in the sample. Thus, the fundamental assumption
underlying theory of sampling is Random sampling.
A special case of random sampling in which each event has the same probability, P of success
and the chance of success of different events are independent whether previous trials have
been made or not, is known as simple sampling.
6.1. Objectives of sampling –
Sampling aims at gathering the maximum information about the populations with the
minimum effort, cost and time. The logic of the sampling theory is the logic of induction
in which we pass from a particular (sample) to general (population).
6.2. Sampling distribution
Consider all possible samples of size n which can be drawn from a given population at
random. For each sample, we can compute the mean. The means of the samples will not
be identical. If we group these different means according to their frequencies, the
frequency distribution so formed is known as sampling distribution of the mean.
Similarly, we can have sampling distribution of the standard deviation etc.
While drawing each sample, we put back the previous sample so that the parent
population remains the same. This is called sampling with replacement and all the
subsequent formulae will pertain to sampling with replacement.
6.3. Standard error. The standard deviation of the sampling distribution is called the standard error (S.E.).
Similarly, the standard deviation of the sampling distribution of means is called the standard error of means.
The standard error is used to assess the difference between the expected and observed
values.
The reciprocal of the standard error is called precision.
If n ≥ 30, a sample is called large; otherwise it is called small. The sampling distribution of large samples is assumed to be normal.
6.4. Testing a hypothesis -
To reach decisions about populations on the basis of sample information, we make certain assumptions about the populations involved. Such assumptions, which may or may not be true, are called statistical hypotheses.
Testing a hypothesis means a process for deciding whether to accept or reject the hypothesis; in other words, it is the process of checking whether our assumption is correct or not.
The method consists of assuming the hypothesis to be correct and then computing the probability of getting the observed sample. If this probability is less than a certain preassigned value, the hypothesis is rejected.
6.5. Errors -
If a hypothesis is rejected while it should have been accepted, we say that a Type I error
has been committed.
On the other hand, if a hypothesis is accepted while it should have been rejected, we say
that Type II error has been made.
The statistical testing of hypotheses aims at limiting the Type I error to a preassigned value (up to 5%) and minimizing the Type II error. The only way to reduce both types of errors is to increase the sample size so that more accurate predictions can be made, but increasing the sample size is not always possible.
6.6. Null hypothesis –
The hypothesis formulated for the sake of rejecting it, under the assumption that it is true, is called the null hypothesis and is denoted by H0. To test whether one procedure is better than another, we assume that there is no difference between the procedures. Similarly, to test whether there is a relationship between two variates, we take H0 to be that there is no relationship. By accepting a null hypothesis, we mean that on the basis of the statistic calculated from the sample, we do not reject the hypothesis. It does not, however, imply that the hypothesis is proved to be true. Nor does its rejection imply that it is disproved.
6.7. Level of significance –
The probability level below which we reject the hypothesis is known as the level of
significance.
The region in which a sample value must fall for the hypothesis to be rejected is known as
the critical region.
Generally, the level of significance is taken as 5% (2.5% in each tail of the normal curve),
so that 95% of the area under the curve lies inside the acceptance region.
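The boundary of the critical region for a given level of significance can be read from the standard normal distribution. The snippet below is a small sketch using Python's standard library; the helper names are illustrative and not from the notes.

```python
from statistics import NormalDist


def critical_value(level, two_sided=True):
    """z value that cuts off the critical region of the standard normal curve."""
    tail = level / 2 if two_sided else level  # area in each rejection tail
    return NormalDist().inv_cdf(1 - tail)


def in_critical_region(z, level=0.05):
    """True if a standardised sample value z falls in the two-sided critical region."""
    return abs(z) > critical_value(level)


z_crit = critical_value(0.05)  # 5% level, 2.5% in each tail
print(f"critical z at the 5% level = {z_crit:.2f}")
print(in_critical_region(2.5), in_critical_region(1.0))
```

Values of z beyond the printed critical value on either side lie in the critical region; values between them lie in the acceptance region.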
6.8. Simple sampling of attributes –
Sampling of attributes may be regarded as the selection of a sample from a population
whose members either possess or do not possess the attribute K.
The presence of K may be called a success and its absence a failure.
Suppose we draw a simple sample of n items.
The number of successes follows a binomial distribution, which is approximately normal
for large n.
Thus, its mean will be
$\mu = np$
and its standard deviation will be
$\sigma = \sqrt{npq}$
where p and q are the probabilities of success and failure respectively and n is the sample size.
If we consider the proportion of successes, then
(i) mean proportion of successes $= \frac{np}{n} = p$
(ii) standard error of the proportion of successes $= \sqrt{n \cdot \frac{p}{n} \cdot \frac{q}{n}} = \sqrt{\frac{pq}{n}}$
(iii) precision of the proportion of successes = reciprocal of the standard error of the
proportion of successes $= \sqrt{\frac{n}{pq}}$
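The formulas for sampling of attributes can be collected into one small helper. This is a minimal sketch: the function name and the sample figures (n = 100, p = 0.2) are hypothetical and chosen only because they give round numbers.

```python
from math import sqrt


def attribute_sampling_summary(n, p):
    """Mean, standard deviation, standard error and precision for n trials
    with probability of success p."""
    q = 1 - p                              # probability of failure
    se_proportion = sqrt(p * q / n)        # S.E. of the proportion of successes
    return {
        "mean_successes": n * p,           # mu = np
        "sd_successes": sqrt(n * p * q),   # sigma = sqrt(npq)
        "mean_proportion": p,              # np / n
        "se_proportion": se_proportion,
        "precision": 1 / se_proportion,    # sqrt(n / (pq))
    }


summary = attribute_sampling_summary(n=100, p=0.2)
for name, value in summary.items():
    print(f"{name:16s} = {value:.4f}")
```

For these figures the mean number of successes is 20 with standard deviation 4, and the proportion of successes has standard error 0.04 and precision 25.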
Example – 22 A coin was tossed 400 times and the head turned up 216 times. Check
whether the coin is biased or unbiased at 5% level of significance.
Sol.
Suppose the coin is unbiased.
Number of tosses, n = 400
Number of heads observed = 216
Probability of a head in a single toss, $p = \frac{1}{2}$
Then the probability of failure, $q = \frac{1}{2}$
So, the mean or expected number of successes, $\mu = np = 400 \times \frac{1}{2} = 200$
Observed number of successes = 216
Thus, the excess of the observed value over the expected number of successes = 216 – 200 = 16
Standard deviation, $\sigma = \sqrt{npq} = \sqrt{400 \times \frac{1}{2} \times \frac{1}{2}} = 10$
For a normal distribution, $z = \frac{x - \mu}{\sigma}$
A 5% level of significance corresponds to a 95% confidence level, for which z = 1.96.
Hence, for a normal distribution,
$1.96 = \frac{x - 200}{10}$
$x = 219.6$
At the 95% confidence level, the maximum number of heads consistent with an unbiased coin
is therefore 219.6. The observed number of heads, 216, is less than this maximum
(equivalently, z = 16/10 = 1.6 < 1.96), so the hypothesis is not rejected.
Thus, the coin may be regarded as unbiased.
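The arithmetic of Example 22 can be reproduced with a short script. This is a sketch only; the function name is illustrative, and the critical value 1.96 is the one quoted in the solution.

```python
from math import sqrt


def z_statistic(n, observed, p):
    """Standardised excess of the observed successes over the expected number."""
    mu = n * p                      # expected number of successes
    sigma = sqrt(n * p * (1 - p))   # standard deviation, sqrt(npq)
    return (observed - mu) / sigma


Z_CRITICAL = 1.96                   # 5% level of significance, two-sided

z = z_statistic(n=400, observed=216, p=0.5)
biased = abs(z) > Z_CRITICAL        # reject "unbiased" only in the critical region

# Acceptance region for the number of heads: mu +/- 1.96 * sigma
lower = 200 - Z_CRITICAL * 10
upper = 200 + Z_CRITICAL * 10

print(f"z = {z:.2f}, acceptance region = ({lower:.1f}, {upper:.1f}), biased = {biased}")
```

Comparing z with the critical value is equivalent to checking whether 216 lies inside the acceptance region, and both routes lead to the same conclusion as the worked solution.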
PRACTICE QUESTIONS
1. A box contains 2 red and 3 blue marbles. Find the probability that if two marbles are drawn at
random (without replacement), (a) both are blue, (b) both are red, (c) one is red and one is
blue.
Ans. 3/10, 1/10, 3/5
2. If at least one child in a family with 2 children is a boy, what is the probability that both children
are boys?
Ans. 1/3
3. A box contains 3 blue and 2 red marbles while another box contains 2 blue and 5 red marbles.
A marble drawn at random from one of the boxes turns out to be blue. What is the probability
that it came from the first box?
Ans. 21/31
4. Three students A, B, C write an entrance examination and their chances of clearing the exam are
1/2, 1/3, 1/4 respectively. Find the probability that at least one of them passes.
Ans. 3/4
5. A speaks the truth in 75% of cases and B speaks the truth in 80% of cases. In what percentage
of cases are they likely to contradict each other in stating the same fact?
Ans. 35%
6. In a bolt factory, machines A, B and C manufacture 25%, 35% and 40% of the total bolts, of which
5%, 4% and 2% respectively are defective. A bolt is drawn at random from
the product and is found to be defective. What is the probability that it was manufactured by
(a) machine A, (b) machine B, (c) machine C?
Ans. 25/69, 28/69, 16/69
7. From 20 tickets marked from 1 to 20, one ticket is drawn at random. Find the probability that
it is marked with a multiple of 3 or 5.
Ans. 0.45
8. Ten percent of the tools produced in a certain manufacturing process turn out to be defective.
Find the probability that in a sample of 10 tools chosen at random, exactly 2 will be defective.
Ans. 0.1937
9. An urn holds 5 white and 3 black marbles. If 2 marbles are to be drawn at random without
replacement and X denotes the number of white marbles, find the probability distribution for
X.
Ans.
X       0       1       2
f(x)    3/28    15/28   5/14
10. A random variable X is defined by
$X = \begin{cases} -2 & \text{with probability } 1/3 \\ 3 & \text{with probability } 1/2 \\ 1 & \text{with probability } 1/6 \end{cases}$
Ans. 1, 7, 6
11. Let X be a random variable defined by the density function
$f(x) = \begin{cases} 3x^2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
12. Find the variance & the standard deviation of the number of points that will come up on a
single toss of a fair die.
Ans. 35/12, $\sqrt{35/12}$
Ans. 7/2, 15/4, $\sqrt{15}/2$
14. A random variable X has $E(X) = 2$, $E(X^2) = 8$.
15. The crushing strength of 8 cement concrete experimental blocks, in metric tonnes per sq. cm.,
was 4.8, 4.2, 5.1, 3.8, 4.4, 4.7, 4.1 and 4.5. Find the mean crushing strength and the standard
deviation.
16. The mean of five items of an observation is 4 and the variance is 5.2. If three of the items are
Ans. 4, 7
X: 5 6 7 8 9 10 11 12 13 14 15
f: 18 15 34 47 68 90 80 62 35 27 11
find the mean, median, variance and the standard deviation.
Ans. 10.04, 10.13, 5.54, 2.35
18. The following table shows the marks obtained by 100 candidates in an examination. Calculate
the mean, median and standard deviation:
Marks      No. of candidates
1 - 10     3
11 - 20    16
21 - 30    26
31 - 40    31
41 - 50    16
51 - 60    8
****