
§ 3 Random Variables

In Calculus a function of points, say y = ax + b, is evidently deterministic (so y is a deterministic variable). In this case b is a constant, with a being the coefficient of the independent variable x. Our interest however is in random variables, which we introduce with ’Set Functions’ because r.v.’s are functions of sets. An example of a random variable is the number of insurance claims received.

§ 3.1 Set Functions

We give the following as an example of a set function. Let A be a set in 3-D space and let Q(A) be the volume of A if A has finite volume; otherwise let Q(A) be undefined. Thus if A = {(x, y, z) : 0 ≤ x ≤ 2, 0 ≤ y ≤ 1, 0 ≤ z ≤ 3} then Q(A) = 2 · 1 · 3 = 6, and if A = {(x, y, z) : x² + y² + z² ≥ 1} then Q(A) is undefined, since that region has infinite volume.

§ 3.1.1 Probability Set Functions

If P(E) is defined for a type of subset E of the sample space S and

(a). P(E) ≥ 0,

(b). P(E_1 ∪ E_2 ∪ E_3 ∪ ···) = P(E_1) + P(E_2) + P(E_3) + ··· whenever E_i ∩ E_j = ∅, i ≠ j,

(c). P(S) = 1,

then P(E) is called a probability set function of the outcome of the random experiment.

§ 3.2 Random Variable

Given a random experiment with a sample space S, a function X which assigns to each element e ∈ S one and only one real number X(e) = x is called a random variable. The space of X is then the set of real numbers A = {x : x = X(e), e ∈ S}.
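
As a concrete sketch in Python (the language used for all illustrative snippets below), consider tossing two fair coins and letting X(e) count the heads in outcome e; X is then an ordinary, deterministic function on S:

```python
# A random variable is a function from the sample space S to the reals.
# Sketch: two fair coin tosses, X(e) = number of heads in outcome e.
S = ["HH", "HT", "TH", "TT"]      # sample space of the experiment

def X(e):
    # X assigns to each outcome e one and only one real number.
    return e.count("H")

# The space of X: A = {x : x = X(e), e in S}.
A = sorted({X(e) for e in S})
print(A)   # [0, 1, 2]
```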

§ 3.3 Distributions of Random Variables

§ 3.3.1 Discrete Probability Distribution Function


The discrete probability distribution function, usually referred to as the probability mass function (PMF), is the probability that the discrete r.v. X takes a value x, and is denoted by P_X(x) or in general f(x). Thus;

P_X(x) = P({X = x})                                        (4)
       = P(all possible outcomes in the event {X = x})     (5)

Let Ω be the sample space. Applying the axioms we learnt previously, we get that;

  { P_X(x) ≥ 0
  { P({X ∈ B}) = Σ_{x ∈ B} P_X(x), where B ⊆ Ω        (6)
  { Σ_{x ∈ Ω} P_X(x) = 1

Exercises & Example


1. Verify that the following functions are probability mass functions and
then determine the requested probabilities.

(a).
   x    | −2  | −1  | 0   | 1   | 2
   f(x) | 1/8 | 2/8 | 2/8 | 2/8 | 1/8

(i) P(X ≤ 2), (ii) P(X > −2),
(iii) P(−1 ≤ X ≤ 1), (iv) P(X ≤ 1 or X = 2)

(b). f(x) = (2x + 1)/25, x = 0, 1, 2, 3, 4
(i) P(X = 4), (ii) P(X ≤ 1),
(iii) P(2 ≤ X ≤ 4), (iv) P(X > 10)

2. Let X = the number of heads in 2 fair coin tosses. Then P(X > 0) = Σ_{x=1}^{2} P_X(x) = 3/4 = 0.75.
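
Such pmf computations are easy to verify by enumerating the support; a minimal Python check of Exercise 1(a) (and of the normalization in 1(b)), using exact fractions:

```python
from fractions import Fraction

# pmf from Exercise 1(a), with exact fractions.
f = {-2: Fraction(1, 8), -1: Fraction(2, 8), 0: Fraction(2, 8),
     1: Fraction(2, 8), 2: Fraction(1, 8)}

# The two pmf requirements: nonnegativity and total probability 1.
assert all(p >= 0 for p in f.values())
assert sum(f.values()) == 1

P = lambda event: sum(p for x, p in f.items() if event(x))
print(P(lambda x: x <= 2))             # 1
print(P(lambda x: x > -2))             # 7/8
print(P(lambda x: -1 <= x <= 1))       # 3/4
print(P(lambda x: x <= 1 or x == 2))   # 1

# Exercise 1(b): f(x) = (2x + 1)/25 on x = 0..4 also sums to 1.
assert sum(Fraction(2 * x + 1, 25) for x in range(5)) == 1
```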

§ 3.3.2 Continuous Probability Distribution Function


We recall that a r.v. X is called continuous if ∃ a function f_X(x) with f_X(x) ≥ 0, called the probability density function (pdf), aka the cont. prob. distr. function, such that P(X ∈ B) = ∫_B f_X(x) dx ∀ subsets B of the real line. Specifically, for the subset B = [a, b] we have;

P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx

which can be interpreted as the area under the graph of the pdf f_X(x) between a and b. So we get;

P(X = a) = ∫_a^a f_X(x) dx = 0

P(a ≤ X ≤ b) = P(a < X < b)          (7)
             = P(a ≤ X < b)          (8)
             = P(a < X ≤ b)          (9)
             = ∫_a^b f_X(x) dx       (10)

P(−∞ < X < ∞) = ∫_{−∞}^{∞} f_X(x) dx = 1     (Normalization!)

Examples

1.
f(x) = { 2,   0 ≤ x < 1/2
       { 0,   elsewhere                    (11)

2. f(x) = (1/√(2π)) e^{−x²/2} for all real x, that is −∞ < x < +∞ (the standard normal density we meet again in § 4.4).
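
Neither example needs to be integrated by hand to be checked; a crude midpoint-rule sketch (purely numerical, with ±8 standing in for ±∞ in Example 2) confirms that both densities integrate to 1:

```python
import math

# Midpoint-rule quadrature: a rough numerical stand-in for the integral.
def integrate(g, a, b, n=100_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f1 = lambda x: 2.0 if 0 <= x < 0.5 else 0.0                    # Example 1
f2 = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)   # Example 2

print(integrate(f1, -1.0, 2.0))   # ~1.0  (normalization)
print(integrate(f2, -8.0, 8.0))   # ~1.0  (tails beyond +-8 are negligible)
print(integrate(f1, 0.0, 0.25))   # ~0.5  = P(0 <= X <= 1/4) in Example 1
```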

§ 3.4 Cumulative Distribution Function (CDF)

For a discrete r.v. X the CDF is defined as F_X(x) = Σ_{k ≤ x} P_X(k), and for a continuous r.v. X it is F_X(x) = ∫_{−∞}^{x} f_X(t) dt.

Examples

Let the pmf of r.v. X be given by

x    | 0   | 1   | 2
f(x) | 1/4 | 1/2 | 1/4

then the CDF of r.v. X is given by

x      | 0   | 1   | 2
F_X(x) | 1/4 | 3/4 | 1

[Figure 1: The typical CDF of a continuous random variable, e.g. X ∼ Normal.]
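
The discrete CDF is nothing more than a running total of the pmf; a sketch with the pmf above:

```python
from fractions import Fraction

f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # pmf above

# F_X(x) = sum of P_X(k) over all k <= x.
def F(x):
    return sum(p for k, p in f.items() if k <= x)

print([F(x) for x in (0, 1, 2)])   # [1/4, 3/4, 1]
print(F(1.5))                      # 3/4: the CDF is a right-continuous step function
```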

§ 3.5 Expectations

§ 3.5.1 Mean

The mean or expected value of a r.v. X with pdf f ( x ) is defined as;

(1). µ_X = E[X] = Σ_x x f(x), if X is a discrete r.v.,

(2). µ_X = E[X] = ∫_{−∞}^{∞} x f(x) dx, if X is a cont. r.v.

[Figure 2: The typical CDF of a discrete random variable.]

Discrete expectation
Let the pdf (pmf) of a discrete r.v. X be given by the following table;

x | f(x) | x f(x)
0 | 1/4  | 0
1 | 1/2  | 1/2
2 | 1/4  | 1/2

Then the expected value is given by E[X] = Σ_x x f(x) = 0 · (1/4) + 1 · (1/2) + 2 · (1/4) = 1.
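
The same computation as a one-line fold over the table, with exact fractions:

```python
from fractions import Fraction

f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# E[X] = sum over x of x * f(x): exactly the x*f(x) column summed.
EX = sum(x * p for x, p in f.items())
print(EX)   # 1
```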

Continuous expectation
Let the pdf of a cont. r.v. X be given as follows;

f(x) = { x,       0 ≤ x < 1
       { 2 − x,   1 ≤ x < 2                (12)
       { 0,       elsewhere

Then the expected value of X is given by;

E[X] = ∫_{−∞}^{∞} x f(x) dx                                                          (13)
     = ∫_{−∞}^{0} x · 0 dx + ∫_0^1 x · x dx + ∫_1^2 x(2 − x) dx + ∫_2^{∞} x · 0 dx   (14)
     = [x³/3]_0^1 + [x² − x³/3]_1^2        (15)
     = 1/3 + [(4/3) − (2/3)]               (16)
     = 1                                   (17)
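
A numerical cross-check of (13)–(17) with the same midpoint rule used earlier (a sketch, not an exact integration):

```python
# Triangular density from equation (12): x on [0,1), 2-x on [1,2).
def f(x):
    if 0 <= x < 1:
        return x
    if 1 <= x < 2:
        return 2 - x
    return 0.0

def integrate(g, a, b, n=100_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(f, 0, 2))                    # ~1.0  (f is a valid pdf)
print(integrate(lambda x: x * f(x), 0, 2))   # ~1.0  (E[X] = 1, as derived)
```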

Nota Bene
In general, if X is a r.v. with pdf f(x), the mean of the r.v. g(X), which is a function of X, is

µ = E[g(X)] = Σ_x g(x) f(x), discrete case,
µ = E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx, cont. case.

§ 3.6 Variance

For a discrete r.v.,

σ² = Var(X) = E[(X − µ)²]                  (18)
            = Σ_x (x − µ)² f(x),           (19)

and for a continuous r.v.,

σ² = Var(X) = E[(X − µ)²]                  (20)
            = ∫_{−∞}^{∞} (x − µ)² f(x) dx. (21)

NB
The standard deviation, denoted σ, is the positive square root of the variance.

Theorem 1 The variance of a r.v. X is given by;

σ² = E[X²] − µ²

Proof 3

σ² = E[(X − µ)²]                                           (22)
   = E[X² − 2µX + µ²]                                      (23)
   = E[X²] − 2µE[X] + µ² = E[X²] − 2µ² + µ² = E[X²] − µ²   (24)

since E[X] = µ.

Theorem 2

Var(a + bX) = b² Var(X)                    (25)
            = b² σ_X²,                     (26)

which in particular means the variance of a constant is 0 (take b = 0).

Proof 4 First, E[a + bX] = a + bµ_X. Then

E[(a + bX)²] = E[a² + 2abX + b²X²]         (27)
             = a² + 2abµ_X + b²E[X²]       (28)
             = a² + 2abµ_X + b²(σ² + µ²),  (29)

so that

Var(a + bX) = E[(a + bX)²] − (E[a + bX])²                  (30)
            = a² + 2abµ_X + b²(σ² + µ²) − (a + bµ_X)²      (31)
            = b² σ².                                       (32)
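
Theorem 2 is easy to sanity-check on a concrete pmf; below, a and b are arbitrary constants chosen just for the check, and the pmf is the one from the expectation example (for which Var(X) = 1/2):

```python
from fractions import Fraction

f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

E = lambda g: sum(g(x) * p for x, p in f.items())
Var = lambda g: E(lambda x: g(x) ** 2) - E(g) ** 2   # Theorem 1

a, b = Fraction(7), Fraction(3)          # arbitrary constants for the check
print(Var(lambda x: x))                  # 1/2  = Var(X)
print(Var(lambda x: a + b * x))          # 9/2  = Var(a + bX)
print(b ** 2 * Var(lambda x: x))         # 9/2  = b^2 Var(X), as claimed
```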

Covariance
Let X and Y have joint pdf f(x, y). The covariance between X and Y, Cov(X, Y), is

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]          (33)
    σ_XY  = E[XY] − µ_X µ_Y,               (34)

where (34) follows from (33) by expanding the product and using linearity of expectation.

Theorem 3 If X and Y are independent then σ_XY = 0.

Proof 5 If X and Y are indep. then E[XY] = E[X]E[Y], so σ_XY = E[X]E[Y] − µ_X µ_Y = 0.

§ 3.7 Chebyshev’s Inequality

Chebyshev’s inequality, aka Chebyshev’s Theorem, says that the probability that the outcome of an experiment with r.v. X will fall more than k standard deviations beyond the mean of X, µ, is at most 1/k². Alternatively, we can say that no more than 1/k² of the distribution’s values can be more than k std deviations away from the mean. That is, for r.v. X,

P(|X − µ| ≥ kσ) ≤ 1/k²                     (35)

The above formula is derived from

P(|X − µ| ≥ k) ≤ σ²/k²                     (36)

The good/bad thing about Chebyshev’s inequality is that it can be applied to completely arbitrary distributions (unknown except for mean and variance).

Examples
1. Roll a single fair die and let X be the outcome. Then µ = E(X) = 3.5 and Var(X) = 35/12. Suppose we want to compute p = P(X ≥ 6). We already know that p = P(X ≥ 6) = P(X = 6) = 1/6. But by the Chebyshev inequality, we can obtain an upper bound on p as follows (see the sketch after these examples),

p = P(X ≥ 6) ≤ P(X ≥ 6 or X ≤ 1)           (37)
             = P(|X − 3.5| ≥ 2.5)          (38)
             ≤ (35/12)/(2.5)²              (39)
             = 7/15                        (40)

Note that 7/15 ≈ 0.47 is much larger than the exact value 1/6: the bound is valid but loose, which is typical of Chebyshev’s inequality.
2. Suppose we have sampled the weights of dogs in the local animal shelter and found that our sample has a mean of 20 kg with a standard deviation of 3 kg. By Chebyshev’s inequality with k = 2, at least 1 − 1/2² = 75% of the dogs that we sampled have weights within two standard deviations of the mean. Two times the standard deviation gives us 2 × 3 = 6; subtracting this from and adding it to the mean of 20 tells us that at least 75% of the dogs have weights between 14 kg and 26 kg.
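
The sketch promised in Example 1: an exact-fraction check that the Chebyshev bound 7/15 holds but is loose compared with the true probability 1/6:

```python
from fractions import Fraction

outcomes = range(1, 7)                           # fair die
mu = Fraction(sum(outcomes), 6)                  # 7/2 = 3.5
var = sum((x - mu) ** 2 for x in outcomes) / 6   # 35/12

p_exact = Fraction(1, 6)                         # P(X >= 6) = P(X = 6)
bound = var / Fraction(5, 2) ** 2                # (35/12)/(2.5)^2
print(mu, var)            # 7/2 35/12
print(p_exact, bound)     # 1/6 vs 7/15: the bound holds but is loose
```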
§ 4 Probability Distributions

§ 4.1 Binomial Distribution

RECALL: A r.v. X is said to be binomial if it counts the number of successes in n Bernoulli trials, e.g. getting a head when tossing a (fair) coin.

f(x) = Bi(n, p) = nCx p^x q^{n−x},   q = 1 − p,   x = 0, 1, 2, ···, n     (41)

That is,

Bi(n, p) = [n!/((n − x)! x!)] p^x q^{n−x},   q = 1 − p,   x = 0, 1, 2, ···, n     (42)

If X is a Binomial r.v. with parameters p and n, then

µ = E[X] = np
σ² = Var(X) = np(1 − p)

Nota Bene

Σ_{x=0}^{n} f(x) = Σ_{x=0}^{n} [n!/((n − x)! x!)] p^x q^{n−x} = (p + q)^n = 1     (43)

by the binomial theorem.
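
A sketch verifying (43) and the stated mean and variance, for hypothetical parameters n = 10, p = 0.3 (any values would do):

```python
import math

# Binomial pmf via math.comb; q = 1 - p.
def binom_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3                                # hypothetical parameters
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

print(sum(pmf))                               # ~1.0  (Nota Bene, eq. 43)
mean = sum(x * px for x, px in enumerate(pmf))
print(mean)                                   # ~3.0  = n p
print(sum(x * x * px for x, px in enumerate(pmf)) - mean ** 2)   # ~2.1 = n p (1 - p)
```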

§ 4.2 Discrete Uniform Distribution

The simplest discrete r.v. is one that assumes only a finite number of possible outcomes, each with equal probability. A r.v. X that assumes each of the values x_1, ···, x_n with equal probability 1/n, that is, f(x_i) = 1/n, i = 1, ···, n, is called a discrete uniform r.v. Suppose that X is a discrete uniform r.v. on the consecutive integers a, a + 1, a + 2, ···, b, for a ≤ b; then

µ = E[X] = Σ_x x f(x)                      (44)
  = Σ_{k=a}^{b} k (1/(b − a + 1))          (45)
  = (b + a)/2                              (46)

and the variance of X is

σ² = [(b − a + 1)² − 1]/12.
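
Both formulas are easy to confirm by direct enumeration; a check with hypothetical endpoints a = 2, b = 9:

```python
from fractions import Fraction

a, b = 2, 9                        # hypothetical endpoints
values = range(a, b + 1)
n = b - a + 1                      # number of equally likely values

mu = Fraction(sum(values), n)
var = sum((x - mu) ** 2 for x in values) / n
print(mu)    # 11/2 = (b + a)/2
print(var)   # 21/4 = ((b - a + 1)^2 - 1)/12
```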

§ 4.3 Poisson Distribution

Random variable X is said to be a Poisson r.v. if it represents the no. of outcomes occurring in a given time interval. For a time interval t the pdf of a Poisson r.v. is

Po(x, λt) = e^{−λt} (λt)^x / x!,   x = 0, 1, 2, 3, ···     (47)
where λ is the average no. of outcomes per unit time.

Nota Bene

Σ_{x=0}^{∞} Po(x, λt) = 1   (Always!)      (48)

Theorem 4 Let X ∼ Po(x, λt); then E[X] = Var(X) = λt.

Proof 6

µ = E[X] = Σ_x x f(x)                                (49)
  = Σ_{x=0}^{∞} x Po(x, λt)                          (50)
  = Σ_{x=0}^{∞} x e^{−λt} (λt)^x / x!                (51)
  = Σ_{x=1}^{∞} e^{−λt} (λt)^x / (x − 1)!            (52)
  = Σ_{u=0}^{∞} e^{−λt} (λt)^{u+1} / u!,   letting u = x − 1,     (53)
  = λt Σ_{u=0}^{∞} e^{−λt} (λt)^u / u!               (54)
  = λt                                               (55)

As for the variance, a similar manipulation of E[X(X − 1)] gives Var(X) = E[X²] − µ² = λt.

Example
On average a certain intersection results in 3 traffic accidents per month.

(a). What’s the prob. that exactly 5 accidents will occur in a given month?

(b). What’s the prob. that fewer than 3 accidents will occur in 3 months?

(c). What’s the prob. that at least 4 accidents will occur in 2 given months?
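
All three parts reduce to sums of the Poisson pmf (47) with λ = 3 per month, so λt = 3, 9 and 6 respectively; a sketch that computes them:

```python
import math

def poisson_pmf(x, lam_t):
    return math.exp(-lam_t) * lam_t ** x / math.factorial(x)

# (a) exactly 5 accidents in one month: lambda*t = 3.
print(poisson_pmf(5, 3))                              # ~0.1008

# (b) fewer than 3 accidents in 3 months: lambda*t = 9.
print(sum(poisson_pmf(x, 9) for x in range(3)))       # ~0.0062

# (c) at least 4 accidents in 2 months: lambda*t = 6.
print(1 - sum(poisson_pmf(x, 6) for x in range(4)))   # ~0.8488
```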

Theorem 5 Let X ∼ Bi(n, p). When n → ∞, p → 0 and µ = np = constant, then Bi(n, p) → Po(x, λt) = Po(x, µ).

§ 4.4 Normal(Gaussian) Distribution

The pdf of a normal/Gaussian r.v. X with mean µ and variance σ² is given as

N(µ, σ²) = (1/(√(2π) σ)) e^{−(1/2)((x − µ)/σ)²},   −∞ < x < ∞.     (56)

The normal distr. has a symmetric bell-shaped curve. The area bounded by the normal curve and the x-axis between any two points x_1 and x_2 is the same as P(x_1 < X < x_2).

§ 4.4.1 Standardization

Because it is usually tedious and difficult to evaluate integrals involving N(µ, σ²), one can transform the r.v. X ∼ N(µ, σ²) to the r.v. Z = (X − µ)/σ ∼ N(0, 1), because integrals involving N(0, 1) are easier to evaluate (and are tabulated).

[Figure 3: The typical symmetric bell-shaped Gaussian distribution curve.]

Example
Given a normal r.v. X with mean µ = 18 and std dev. σ = 2.5, find

(a). P(X < 15),

(b). the values of k such that P(X < k) = 0.2236 and P(X > k) = 0.1814,

(c). P(17 < X < 21).
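
All three parts can be computed by standardizing and using the identity Φ(z) = (1 + erf(z/√2))/2, which math.erf makes available; part (b) needs the inverse CDF, approximated here by bisection (a sketch, not library-grade code):

```python
import math

# Standard normal CDF via the error function.
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 18.0, 2.5

# (a) P(X < 15) = Phi((15 - 18)/2.5) = Phi(-1.2)
print(Phi((15 - mu) / sigma))                             # ~0.1151

# (c) P(17 < X < 21) = Phi(1.2) - Phi(-0.4)
print(Phi((21 - mu) / sigma) - Phi((17 - mu) / sigma))    # ~0.5403

# (b) solve Phi(z) = p for z by bisection, then unstandardize: k = mu + sigma*z.
def inv_Phi(p, lo=-10.0, hi=10.0):
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Phi(mid) < p else (lo, mid)
    return (lo + hi) / 2

print(mu + sigma * inv_Phi(0.2236))         # ~16.1    (P(X < k) = 0.2236)
print(mu + sigma * inv_Phi(1 - 0.1814))     # ~20.275  (P(X > k) = 0.1814)
```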

§ 4.4.2 Normal Distr. Approximation to the Binomial Distr.

Let X ∼ Bi(n, p). Then as n → ∞, with p not extremely close to 0 or 1,

Z = (x − np)/√(npq) ∼ N(0, 1),             (57)

that is, Z = (x − µ)/√(σ²), where µ = np and σ² = npq. This means that the (std) normal distr. can be used as an approximation to the binomial distr., and this approximation is good for n ≥ 30 and np, nq ≥ 5.

[Figure 4: The area under the curve represents the probability.]

Example
Let X ∼ Bi (100, 0.4), find the normal approx. for P ( X < 30).
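
A sketch comparing the exact binomial tail with the approximation; the continuity correction at 29.5 is a standard refinement we assume here, not something stated above:

```python
import math

Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 100, 0.4
mu, sigma = n * p, math.sqrt(n * p * (1 - p))    # 40, ~4.9

# Exact: P(X < 30) = P(X <= 29), summing the binomial pmf.
exact = sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(30))

# Normal approximation with continuity correction at 29.5.
approx = Phi((29.5 - mu) / sigma)

print(exact)    # ~0.015
print(approx)   # ~0.016
```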
