
STAT 251 - Chapter 4/6: Random Variables and Probability Distributions

We use random variables (RV) to represent the numerical features of a random experiment.

In chapter 3, we defined a random experiment as one where the possible outcomes are known
before the experiment is performed (S), but the actual outcome is not known.

A random variable is usually denoted by a capital letter (eg) X, Y, Z, and its possible realized values by the corresponding lowercase letter (eg) x, y, z.

So, X is a random quantity before the experiment is performed and x is the value of the
random quantity after the experiment has been performed.

Examples: X = # of defective items in a sample


. Y = Time till failure of a mechanical part
. Z = % yield of a chemical process
. X2 = # rolled on a die

Notation: In this course, we will write things like P(X = x) to mean the probability that
the RV X takes on the value x.

The Range of a RV is the set of all possible values it can take on


(eg) X −→ {0, 1,..., n}
Y −→ [0, ∞)
Z −→ [0, 100]
X2 −→ {1, 2, 3, 4, 5, 6}

First we will talk about discrete RVs and then continuous RVs.

A Discrete RV has a finite or countable range, and a Continuous RV is defined on a continuous range (which may be bounded).

Discrete Random Variables
A discrete RV is a RV that has a finite or countable range.
(eg) # defective items, # rolled with a die, # of sales for a store, # of heads on n coin tosses.

Example A: Let X = The number of 1’s rolled, when rolling a die 4 times....

Probability Mass Function (f(x)): The probability mass function of a Discrete RV, denoted by a lowercase f(x), is a function that gives the probability of occurrence for each possible realized value x of the RV X. For a DISC RV, f(x) assigns probability only at the discrete possible values of X.
f(x) = P(X = x), for all possible values x of X. (for a discrete RV only!)

Properties of f(x):

1. 0 ≤ f(x) ≤ 1, for all x ∈ X

2. Σ_{all x} f(x) = 1 (the probabilities sum to 100%)

Probability mass functions can be represented in a frequency table, a histogram or as a mathematical function. Examples below.....

Combinatorial and Factorial Notation:


Suppose we have n distinct objects and we would like to choose k of them. If we can only
choose each object once (no replacement) and choosing the objects (a, b) is the same as
choosing (b, a) (order they are selected in does not matter), then how many distinct sets of
k objects can we select from the n of them?

(eg) Select k=10 students from this class of n=110 or a poker hand: the deck has 52 (n)
cards, we select 5 (k) of them and we cannot get the same card twice, and K♥-A♥-A♠-A♣-
A♦ is the same as A♠-A♣-A♦-K♥-A♥

How many ways can we choose k objects from n distinct objects?

C(n, k) ("n choose k") = n!/((n−k)! k!) = [n×(n−1)×....×(n−k+1)]/k!, where k! = k × (k − 1) × (k − 2) × .... × 2 × 1

(eg) Poker hand: C(52, 5) = 52!/((52−5)! 5!) = (52×51×....×2×1)/[(47×46×....×2×1)×(5×4×3×2×1)] = (52×51×50×49×48)/(5×4×3×2×1) = 2,598,960

Examples:

1. If we have {1, 2, 3, 4} and we choose 2 of them without replacement, how many dif-
ferent combinations can we get?

2. What is the probability of winning the jackpot in Lotto 6/49?

Note: We define 0! = 1. (eg) C(n, n) = n!/((n−n)! n!) = n!/(0! n!) = 1, as it should!
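These combinatorial counts are easy to check in Python; a minimal sketch using only the standard library (math.comb requires Python 3.8+), covering the two examples above and the Lotto 6/49 question:

    import math

    print(math.comb(4, 2))        # choosing 2 of {1, 2, 3, 4}: 6 combinations
    print(math.comb(52, 5))       # 5-card poker hands: 2,598,960
    print(1 / math.comb(49, 6))   # Lotto 6/49 jackpot: 1/13,983,816, about 7.2e-08
    print(math.comb(10, 10))      # C(n, n) = 1, which relies on 0! = 1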

Some Important Discrete Random Variables:


First, we will talk about the Bernoulli process, which gives rise to the binomial, the geometric and the negative binomial random variables.

Later in the course (in Chapter 6), we will talk about the Poisson process, which gives rise to the Poisson random variable (as well as the exponential RV).

Bernoulli Process:
A Bernoulli process is a random experiment with the following features:

1) The experiment consists of n independent steps called Trials.

2) Each trial results in either a success or a failure (ie) S = {S, F}

3) The probability of a success is constant across trials


. Prob. of success = P(S) = p

You should be able to recognize when an experiment is a Bernoulli process:

There is an event that will/will not occur (success/failure). Successes/Failures are measured
over a # of trials. The probability of a success is the same for every trial.

The terms ‘success’ and ‘failure’ are used because the theory has its ‘roots’ in gambling. It can sometimes be weird to think of a success as ‘finding a defective item’, although this is the terminology we use.

Examples:

(1) Testing Randomly selected items for defects:


event: Find a defective item
trial: Inspection of each selected item
outcomes: Defective (success) or not-defective (failure)

2) The number of heads on repeated coin tosses:


event: Get a H
trial: Each coin toss
outcomes: get a H (success) or a T (failure)

As we mentioned, the Bernoulli process gives rise to:


Binomial RV: The # of successes in a fixed # of trials
Geometric RV: The # of trials completed until the first success is observed.

Examples:

(1) Testing Randomly selected items for defects:


Binomial RV −→ the # of defects out of n = 50 sampled and inspected items
Geometric RV −→ the # of items inspected until the first defective item is found

(2) The number of heads on repeated coin tosses:


Binomial RV −→ the # of H out of n = 100 coin tosses
Geometric RV −→ the # of tosses until you get the first head

Binomial Random Variables:


Experiment: Counting the # of occurrences of an event (success) for n independent trials (fixed number of trials).

Random Variable: X = # of successes

Range: {0, 1, 2,....,n}

Parameters: n = # of trials, p = probability of success

Notation: If X is a Binomial RV, we write X ∼ BIN(n, p), which indicates that


X follows a binomial distribution with n trials that each have a probability of success p.

Prob Mass Function: f(x) = P(X=x) = C(n, x) p^x (1 − p)^(n−x), for x = 0, 1,...,n

Example B: If we toss a coin 10 times, what is the probability of getting exactly 2 heads?

Solution: We have n=10, p=0.5


Let X=the # of heads in 10 tosses
Then, X ∼ BIN(10, 0.5)
and f(2) = P(X=2) = C(10, 2) (0.5)^2 (1 − 0.5)^(10−2) = 0.044, or a 4.4% chance of tossing exactly 2 heads on 10 tosses.

What is the probability of getting two or more heads on 10 tosses?

What is the probability of getting 40 or more heads on 100 tosses?......In chapter 7, we


will see how we can get an approximate answer for a question like this.
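These binomial calculations are easy to verify numerically; a minimal Python sketch (standard library only):

    import math

    n, p = 10, 0.5
    def f(x):
        # BIN(10, 0.5) probability mass function
        return math.comb(n, x) * p**x * (1 - p)**(n - x)

    print(round(f(2), 3))                # P(X = 2) = 0.044
    print(round(1 - f(0) - f(1), 3))     # P(X >= 2) = 1 - f(0) - f(1) ≈ 0.989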

Let’s intuitively think about why the probability mass function of a binomial RV makes
sense, using the last example.

Firstly, how many ways can we get 2H on 10 tosses?


There are C(10, 2) = 45, since the 2H can occur on any 2 of the 10 tosses.

(ie) A head can occur on the 1st or 2nd or.... or 10th toss. We want to choose 2 of the 10
possible locations for the Heads.

Recall: We are assuming that probability of success is the same for each trial and that the
trials are independent.

Let’s take one of the 45 sequences with 2H, say.....(H, H, T, T, T, T, T, T, T, T) or (S, S,


F, F, F, F, F, F, F, F)

and because of independence, P(S ∩ S ∩ F ∩ F ∩ F ∩ F ∩ F ∩ F ∩ F ∩ F)
= P(S)P(S)P(F)P(F)P(F)P(F)P(F)P(F)P(F)P(F) = p·p·(1−p)·(1−p)·(1−p)·(1−p)·(1−p)·(1−p)·(1−p)·(1−p) = p^2 (1 − p)^8

The above is the probability of getting 2H and then 8T.

All sequences of 2H and 8T have the same probability, so the probability of getting 2H and 8T in some order is.....
(# of possible 2H sequences) × (probability of one such sequence) = C(10, 2) p^2 (1 − p)^8

Example C: Suppose that 5% of computer chips made by a company are defective. If


you randomly select and inspect 8 chips, what is the probability that you find at least one
defective one? What assumption have you made when solving this problem?

Example D: Suppose you go to the casino and make 5 bets of $1 each, on the colour of the
number coming up in the casino game Roulette. What is the probability that you leave the
casino with more money than you came with?
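Neither example is worked out here, but the setup can be sketched in Python. For Example C we assume the 8 chips are independent; for Example D we assume an American roulette wheel, where a colour bet wins with probability 18/38 (the course may use a different wheel), and "leaving with more money" means winning at least 3 of the 5 bets.

    import math

    # Example C: X ~ BIN(8, 0.05); P(at least one defective) = 1 - P(none)
    print(round(1 - (1 - 0.05)**8, 3))     # ≈ 0.337

    # Example D: assumed win probability per $1 colour bet on an American wheel
    p = 18 / 38
    win_3_or_more = sum(math.comb(5, k) * p**k * (1 - p)**(5 - k) for k in range(3, 6))
    print(round(win_3_or_more, 3))         # ≈ 0.451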

Geometric Random Variables:


Experiment: Counting the # of trials until the first success.

Random Variable: Y = # of trials until first success

Range: {1, 2,....} (infinite but countable)

Parameters: p = P(S) = probability of success

Notation: If Y is a geometric RV, we write Y ∼ GEO(p), which indicates that Y follows a geometric distribution with probability of success p.

Prob. Mass Function: f(y) = P(Y=y) = p(1 − p)^(y−1), for y = 1, 2,...

Example E: If we toss a coin, what is the probability that we have to toss it exactly 5 times
to see the first H show up?

Solution: We have p=0.5


Let Y =the # of tosses until the first H
Then, Y ∼ GEO(0.5)
and f(5) = P(Y = 5) = 0.5 × (1 − 0.5)^(5−1) = 0.031

Let’s intuitively think about why the prob mass function of a geometric RV makes sense
using the last example.

To have the first success on the 5th trial, we must have the first 4 trials be failures.....(F, F,
F, F, S)

Recall: We assume constant probability of success and independence of trials.

So, P(first success on 5th trial) = P(F ∩ F ∩ F ∩ F ∩ S) = by independence = P(F)P(F)P(F)P(F)P(S)


= (1 − p)(1 − p)(1 − p)(1 − p)p = p(1 − p)^4

Examples:

1. If 5% of the computer chips manufactured by a company are defective, what is the


probability that we have to check exactly 8 chips before we find one defective one?

2. You go to the casino and play Roulette, betting on the colour of the number. What is
the probability that the first bet you win is on the fourth bet you make?

Sum of Independent Binomial Random Variables:

If X1 , X2 ,....,Xk are k independent binomial RVs, where Xi ∼ BIN(ni , p) −→ (have


same p but may have different n)

Then Y = X1 + X2 +....+Xk ∼ BIN(n1 + n2 +.....+nk , p)

(eg) X1 ∼ BIN(10, 0.25) and X2 ∼ BIN(25, 0.25), and X1 and X2 are independent,
then Y = X1 + X2 ∼ BIN(35, 0.25)

(ie) A BIN(n, p) is the sum of n independent BIN(1, p) trials (a BIN RV is the sum of
Bernoulli RVs)

Sum of Independent Geometric Random Variables:

Summing more than one geometric RV results in a new type of random variable: the Negative Binomial RV!

If X1 , X2 ,....,Xk are k independent geometric RVs, where Xi ∼ GEO(p) (have same p)

Then, Y = X1 + X2 +....+Xk ∼ NEGBIN(k, p)

Y = # of trials until the kth success

Negative Bin. Prob Mass Fcn: f(y) = P(Y = y) = C(y−1, k−1) p^k (1 − p)^(y−k),
where y = the # of trials and k = the # of successes.

Note: We now have the binomial, geometric and negative binomial random variables
(for discrete RVs)
binomial −→ # of successes in fixed # of trials
geometric −→ # of trials until the 1st success
negative binomial −→ # of trials until the kth success

Example F: We will toss a coin until we get 2 heads. Suppose that the coin is biased and
flips heads 40% of the time. What is the probability that the coin will be tossed exactly 3
times? What is the probability that it will be tossed at least 4 times?

Solution:

Let Y = # of tosses until you get 2 heads ∼ NEGBIN(2, 0.4)

For the first question, we have y = 3 and k = 2


P(Y = 3) = f(3) = C(3−1, 2−1) (0.4)^2 (1 − 0.4)^(3−2) = 0.192

For the second question, P(Y ≥ 4) = 1 − P(Y ≤ 3)
= 1 − P(Y = 3) − P(Y = 2) − P(Y = 1)
= 1 − C(3−1, 2−1)(0.4)^2(1 − 0.4)^(3−2) − C(2−1, 2−1)(0.4)^2(1 − 0.4)^(2−2) − 0
= 1 − 0.192 − 0.16 = 0.648
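A quick numerical check of Example F (a Python sketch built directly from the negative binomial pmf above):

    import math

    def negbin_pmf(y, k, p):
        # P(Y = y) where Y = # of trials needed to see the k-th success
        return math.comb(y - 1, k - 1) * p**k * (1 - p)**(y - k)

    k, p = 2, 0.4
    print(round(negbin_pmf(3, k, p), 3))                       # P(Y = 3) = 0.192
    p_le_3 = sum(negbin_pmf(y, k, p) for y in range(2, 4))     # P(Y = 1) = 0, so start at y = 2
    print(round(1 - p_le_3, 3))                                # P(Y >= 4) = 0.648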

So far, we have introduced 3 different types of Discrete RVs. In chapter 6, we will introduce
one more (The Poisson RV).

We will now introduce what we call a cumulative distribution function (cdf, F(x)) and
then begin talking about the mean and standard deviation for these discrete RVs. Then
we will move into doing the same things, but for a few Continuous RVs.

Cumulative Distribution Functions (CDF):
The cumulative distribution function (F(x)) is defined as F(x) = P(X ≤ x) = Σ_{k≤x} f(k)

Let’s consider rolling a die 4 times, and let X = the # of 1’s rolled (Example A).

x f (x) = P (X = x) F (x) = P (X ≤ x)
0 0.4823 0.4823
1 0.3858 0.8681
2 0.1157 0.9838
3 0.0154 0.9992
4 0.0008 1.00
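The table above comes from X ∼ BIN(4, 1/6), since each roll shows a 1 with probability 1/6. A short Python sketch reproduces it:

    import math

    n, p = 4, 1/6
    F = 0.0
    for x in range(n + 1):
        f = math.comb(n, x) * p**x * (1 - p)**(n - x)   # pmf f(x)
        F += f                                          # running total gives the cdf F(x)
        print(x, round(f, 4), round(F, 4))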

Properties of f (x) and F (x):

In principle, a Discrete RV can take its values on any countable or finite set of real #’s. In this course, we will only deal with integer-valued discrete RVs.

(1) 0 ≤ f (x) ≤ 1, 0 ≤ F (x) ≤ 1

(2) F (x) is non-decreasing and right-continuous


(3) Σ_{all x∈X} f(x) = 1, F(−∞) = 0 and F(∞) = 1

(4) f (x) = F (x) - F (x − 1), for all discrete integer valued x


. ( = P (X ≤ x) − P (X ≤ x − 1) = P (x − 1 < X ≤ x) )

(5) P (a < X ≤ b) = F (b) − F (a), for all integers a and b

(6) P (X < a) = P (X ≤ a − 1) = F (a − 1), for all integers a

Note: In the discrete case, P(X < x) ≠ P(X ≤ x).

The Geometric Distribution Function (F (x)):
Example G: Derive the cumulative distribution function (F (x)) of a geometric RV.

Fact: For a geometric series, 1 + r + r^2 + .... + r^m = (1 − r^(m+1))/(1 − r)

Recall: For Y ∼ GEO(p), f(y) = P(Y = y) = p(1 − p)^(y−1)

Therefore, F(y) = Σ_{k=1}^{y} p(1 − p)^(k−1) = p(1 − p)^0 + p(1 − p)^1 + ..... + p(1 − p)^(y−1)
= p( 1 + (1 − p)^1 + (1 − p)^2 + ..... + (1 − p)^(y−1) )
which is a geometric series where r = (1 − p) and m = (y − 1), so.....
= p × [1 − (1 − p)^((y−1)+1)]/[1 − (1 − p)] = p × [1 − (1 − p)^y]/p = 1 − (1 − p)^y

Therefore, if Y ∼ GEO(p), then F(y) = P(Y ≤ y) = 1 − (1 − p)^y
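A quick numerical sanity check of this closed form (a Python sketch comparing it to the summed pmf for one arbitrary choice of p and y):

    p, y = 0.3, 6
    closed_form = 1 - (1 - p)**y
    summed = sum(p * (1 - p)**(k - 1) for k in range(1, y + 1))
    print(round(closed_form, 6), round(summed, 6))   # both 0.882351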

The Binomial Distribution Function (F (x)):


There is no closed form expression for F (x) when X ∼ BIN(n, p). Instead, one can
(a) Use the binomial table in the course notes.....we won’t do this!
(b) Calculate F (x) = P(X ≤ x) = f (0) + f (1) +....+ f (x)
(c) In chapter 7, we will learn some approximation methods we can use, when the number
of trials (n) is large.

Example H: Let X ∼ BIN(18, 0.4). Find...


(a) P(5 < X ≤ 10)
(b) P(5 ≤ X < 10)
(c) P(X > 4), and P(X ≥ 4)
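Approach (b) is straightforward to carry out numerically. A sketch using scipy (an assumption that scipy is available; the same sums can be done by hand with the pmf):

    from scipy.stats import binom

    X = binom(18, 0.4)                 # X ~ BIN(18, 0.4)
    print(X.cdf(10) - X.cdf(5))        # (a) P(5 < X <= 10) = F(10) - F(5)
    print(X.cdf(9) - X.cdf(4))         # (b) P(5 <= X < 10) = F(9) - F(4)
    print(1 - X.cdf(4))                # (c) P(X > 4)  = 1 - F(4)
    print(1 - X.cdf(3))                #     P(X >= 4) = 1 - F(3)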

Expected Values:
In this section, we will talk about how to find the expected value for a random variable.
By expected, we mean the ‘most representative’ value. The value that would be expected
in the long run. Expected Values make sense in the long run, but often not in the short
run.... let’s discuss this...

Let X be a discrete RV with prob mass function f (x) and range R.


Let g(x) be a function of x. (eg) g(x)=x or g(x)=x^2 or g(x)=e^x

Then the Expected Value of g(X) is....
E(g(X)) = Σ_{x∈R} g(x) f(x)
Note: Often we are interested in the expected value of g(x)=x, although we leave it general, as there are instances where we want to find the expected value of things such as g(x)=x^2 and so on.....

Note: If g(x)=x, then the expected value, E(g(X)), is just the long-run average value of the random variable X.
(eg) If X = # of defects and we use g(x)=x, then E(g(X)) = E(X) = The expected # of defects.

Examples:

1. Consider the following probability mass function (a quick numerical check appears after Example 3 below).

(a) Find the expected value of X.


(b) Find the expected value of X^2.

x f (x)
0 0.10
1 0.20
2 0.50
3 0.15
4 0.05

2. Consider rolling a standard die one time. Let X be the number rolled. Find
(a) E(X) and E(X^2)

3. Suppose you will make a $1 bet on the colour of the number in roulette. What is the
expected gain/loss for the bet?
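Here is a sketch of the arithmetic for the three examples above. For Example 3 it assumes an American roulette wheel (a colour bet wins $1 with probability 18/38 and loses $1 otherwise); the course may use a different wheel.

    # Example 1: expected values from the pmf table
    f = {0: 0.10, 1: 0.20, 2: 0.50, 3: 0.15, 4: 0.05}
    print(sum(x * px for x, px in f.items()))       # (a) E(X)   = 1.85
    print(sum(x**2 * px for x, px in f.items()))    # (b) E(X^2) = 4.35

    # Example 2: one roll of a fair die
    print(sum(x * (1/6) for x in range(1, 7)))      # E(X)   = 3.5
    print(sum(x**2 * (1/6) for x in range(1, 7)))   # E(X^2) = 91/6 ≈ 15.17

    # Example 3: gain on a $1 colour bet (assumed American wheel)
    p_win = 18 / 38
    print(1 * p_win + (-1) * (1 - p_win))           # ≈ -0.053, i.e. lose about 5 cents per bet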

Note: In general, E(X^2) ≠ E(X)^2 ...... check for the previous example!
(ie) The expected value of the square ≠ the square of the expected value!

Mean of X = µx = E(X) = Σ_{x∈R} x f(x) = Σ_{all x∈R} x P(X=x)
Note: The expected value/mean of X is simply a weighted average of the possible values X
may take on, weighted by how often that value will be observed.

We use the mean as our measure of center

Variance of X = E( (X − µx)^2 ) = σx^2 = Σ_{x∈R} (x − µx)^2 f(x)

Here, we are using g(x) = (x − µx)^2......we are finding the expected squared deviation from the mean, which measures spread. YES, this is the same as the variance we saw in chapters 1/2...the only difference is that there it was for a SAMPLE of data; here it is the POPULATION/true variance.

The above definition of the variance is in a form that helps us to understand what it measures. Below is an equivalent form that is more useful when actually calculating a variance.

σx^2 = E(X^2) − E(X)^2 = E(X^2) − µx^2


Standard Deviation of X = σx = √(σx^2) = square root of the variance
(we work with this one most often)

Properties of The Mean, Variance and Standard Deviation:


If you recall, in our review of Chapter 1/2 we discussed a few rules of what happens to the
mean/variance/SD when you add or multiply a variable by a constant. These rules are
exactly the same, only stated in notational form here.

Let X be a random variable with mean µx and variance σx^2, and let a and b be any real-valued constants.

(1) If Y = aX + b, then
µy = E(Y) = E(aX + b) = E(aX) + E(b) = aE(X) + b = aµx + b

σy^2 = VAR(Y) = VAR(aX + b) = VAR(aX) + VAR(b) = a^2 VAR(X) = a^2 σx^2

. σy = |a| σx

Recall: Adding a constant changes the mean by that constant, but the SD remains the same.
Multiplying by a constant changes both the mean and SD by that constant.

Let X and Y be random variables with means µx, µy and variances σx^2, σy^2

(2) If we add/subtract the two random variables X and Y , then.....

Mean of X + Y = µx+y = E(X + Y ) = E(X) + E(Y ) = µx + µy


Mean of X − Y = µx−y = E(X − Y ) = E(X) − E(Y ) = µx − µy
In general, E(aX + bY ) = aµx + bµy

(3) If X and Y are independent, then.....

VAR(aX + bY) = a^2 σx^2 + b^2 σy^2

VAR(aX − bY) = a^2 σx^2 + b^2 σy^2

When X and Y are independent, the variance of the sum/difference is equal to the sum
of the variances.
(ie) VAR(X ± Y ) = VAR(X) + VAR(Y )

(4) If X and Y are independent, then.....

E(XY ) = E(X)E(Y )

Examples:

1. You own 4 machines and you want to have them inspected. It costs $50 for all 4
inspections, and a defective machine is repaired at a cost of $25 each. The probability of
finding 0, 1, 2, 3, 4 defective machines is summarized in the table below.

x      f(x)    x·f(x)   x^2·f(x)
0      0.10    0.00     0.00
1      0.20    0.20     0.20
2      0.50    1.00     2.00
3      0.15    0.45     1.35
4      0.05    0.20     0.80
sum    1.00    1.85     4.35

(a) What is the expected # of defective machines?


(b) What is the expected total cost?
(c) What is the standard deviation for the total cost?

2. Suppose that in the month of July in Vancouver, the temperature has µ = 25°C with σ^2 = 4. What are the mean and SD of temperatures in degrees Fahrenheit? (A worked sketch of both examples follows below.)
Hint: Fahrenheit = Celsius × 9/5 + 32
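Both examples reduce to the aX + b rules above; here is a sketch of the arithmetic (it reuses E(X) = 1.85 and E(X^2) = 4.35 from the earlier table):

    import math

    # Example 1: total cost = 50 + 25X, where X = # of defective machines
    E_X, E_X2 = 1.85, 4.35
    var_X = E_X2 - E_X**2                     # 0.9275
    print(E_X)                                # (a) expected # defective = 1.85
    print(50 + 25 * E_X)                      # (b) expected total cost = $96.25
    print(round(25 * math.sqrt(var_X), 2))    # (c) SD of total cost ≈ $24.08

    # Example 2: F = C * 9/5 + 32, so mean and SD transform by the same rules
    mu_C, var_C = 25, 4
    print(mu_C * 9/5 + 32)                    # mean = 77 °F
    print(math.sqrt(var_C) * 9/5)             # SD = 2 * 9/5 = 3.6 °F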

Now that we know how to find means and variances, let's go through and find the means and variances for some distributions we already know.

Mean and Variance of Binomial Random Variables:


If X ∼ BIN(n, p) then.....

Mean of X = E(X) = µx = np
Variance of X = VAR(X) = σx^2 = np(1 − p)
Proof:

Let X = X1 + X2 + ..... + Xn , where Xi ∼iid BIN(1, p), then X ∼ BIN(n, p)


Then E(Xi) = Σ_{x∈R} x f(x) = Σ_{x∈R} x P(X = x) = 1·p + 0·(1 − p) = p

and E(Xi^2) = Σ_{x∈R} x^2 f(x) = Σ_{x∈R} x^2 P(X = x) = 1^2·p + 0^2·(1 − p) = p

Therefore, σxi^2 = VAR(Xi) = E(Xi^2) − E(Xi)^2 = p − p^2 = p(1 − p)

So, E(X) = E(X1 + X2 + ..... + Xn) = E(X1) + E(X2) +.....+ E(Xn) = p + p + ..... + p = np

σx^2 = VAR(X) = VAR(X1 + X2 + ..... + Xn) =(because of Indep.)= VAR(X1) + VAR(X2) +.....+ VAR(Xn) = p(1 − p) + p(1 − p) + ..... + p(1 − p) = np(1 − p)

Example I: You toss a coin 75 times. If we let X be the # of heads for the 75 tosses, then
what is the expected # of heads and what is its variance?

Example J: You make 80 bets on the colour of the number in Roulette. What is the mean
and variance for the number of bets won?
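For Example I, n = 75 and p = 0.5. For Example J the win probability per bet is an assumption here (18/38, an American wheel). A short sketch:

    def bin_mean_var(n, p):
        # mean and variance of a BIN(n, p) random variable
        return n * p, n * p * (1 - p)

    print(bin_mean_var(75, 0.5))       # Example I: mean = 37.5, variance = 18.75
    print(bin_mean_var(80, 18 / 38))   # Example J: mean ≈ 37.9, variance ≈ 19.9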

Mean and Variance of Geometric Random Variables:


If Y ∼ GEO(p) then.....

Mean of Y = E(Y) = µy = 1/p

Variance of Y = VAR(Y) = σy^2 = (1 − p)/p^2

** See course notes for proof!

Example K: A telemarketer makes successive independent phone calls and gets a sale with
a probability of 5% each call. Each phone call costs 25 cents. Find
(a) What is the expected cost of making one sale?
(b) What is the corresponding standard deviation?
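One way to set this up: the number of calls until the first sale is Y ∼ GEO(0.05), the cost is 0.25·Y, and the aX + b rules apply. A sketch of the arithmetic:

    import math

    p, cost_per_call = 0.05, 0.25
    mean_calls = 1 / p                  # E(Y) = 20 calls on average
    var_calls = (1 - p) / p**2          # VAR(Y) = 380

    print(cost_per_call * mean_calls)                       # (a) expected cost = $5.00
    print(round(cost_per_call * math.sqrt(var_calls), 2))   # (b) SD ≈ $4.87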

Mean and Variance of Negative Binomial RVs:


If Y ∼ NEGBIN(k, p) then.....(recall: k=# successes, p=prob. of success)

Mean of Y = E(Y) = µy = k/p

Variance of Y = VAR(Y) = σy^2 = k(1 − p)/p^2

Proof:

Recall: If Y = Y1 + Y2 + ... + Yk where Yi ∼iid GEO(p), then Y ∼ NEGBIN(k,p)

E(Y) = E(Y1 + Y2 + .... + Yk) = E(Y1) + E(Y2) +....+ E(Yk) = 1/p + 1/p +.....+ 1/p = k/p

VAR(Y) = VAR(Y1 + Y2 + .... + Yk) =(by Indep.)= VAR(Y1) + VAR(Y2) +....+ VAR(Yk) = (1 − p)/p^2 + (1 − p)/p^2 +.....+ (1 − p)/p^2 = k(1 − p)/p^2

Example K Continued:
(c) What is the expected cost of making 3 sales?
(d) What is the SD of this?

Example L: A hockey team needs to sign two free agents before the season starts. Suppose
that each player they speak with will join the team with a 20% probability, and assume that
the player’s probabilities of joining are independent of each other.
(a) What is the expected # of players they talk to before signing 2 players?
(b) What is the SD of this?

Examples:

1. Suppose that it is known that the probability of being able to log on to a computer
from a remote terminal is 0.7 at any given time.

(a) What is the probability that out of 10 attempts, you are able to log on exactly 6
times?
(b) What is the probability that out of 8 attempts, you are able to log on at least 6
times?
(c) How many times do you expect to have to try to log on before successfully logging on?
(d) What is the probability that you must attempt to log on at least 3 times before
successfully logging on? *Note that we can solve this by defining a BIN or GEO
RV
(e) What is the probability that it takes you more than 4 attempts to successfully log on twice? *Note that we can solve this by defining a BIN or NEGBIN RV

2. Let’s play a game. You have to pay me $60 to play the game. You then roll 2 dice and
look at the sum of those 2 dice. I will give you back whatever the sum of the 2 dice is
squared.

(a) What is the expected gain/loss for playing this game?


(b) What is the variance for the gain/loss when playing this game?
(c) What is the probability that you make money on a single play of the game?
3. The probability that a wildcat well will be productive is 1/13, regardless of the location
being drilled. We will assume that the locations being drilled are far enough from each
other so that the probability of it being productive is independent for each of the wells
drilled.

(a) How many wells do they expect to have to drill before finding 1 that is productive?
(b) How many wells do they expect to have to drill before finding 3 that are produc-
tive?
(c) What is the probability that the first productive well they find is on the 13th well
they drill?
(d) If they drill 20 wells, what is the probability that at least 2 are found to be
productive?
(e) If it costs the company $1000 start-up for drilling plus $100 for each well they
drill, what is the expected cost and the variance of the cost, of finding 3 productive
wells?

Continuous Random Variables
Things in this section will look very similar to the section on Discrete RVs. Conceptually,
all the ideas are the same.

A Continuous random variable is one that has an infinite or uncountable range.

(eg) Weight of an item, time until failure of a mechanical component, length of an object,.....

(Probability) Density Function (f(x)): The density function is a function that allows us to work out the probability of occurrence over a range of x-values. It does not have the exact same definition as in the discrete case. (That is, f(x) ≠ P(X = x), for a CONT RV)

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Note: In the discrete case, P(a ≤ X ≤ b) = Σ_{x=a}^{b} f(x). Now that it is continuous we must integrate instead of sum.

Note: In the continuous case, P (X = a) = 0, for any a. (ie) the probability of X taking on
any particular value is 0.
Note: ∫_{−∞}^{∞} f(x) dx = 1. This is similar to what we saw in the discrete case, except there we summed over all values X could take on.

(Cumulative) Distribution Function (F (x)):

The c.d.f. gives the probability of being less than or equal to a particular value.
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
Note: Again, this is what we saw in the discrete case, except that now we have an integration
instead of a summation.

Properties of F (x):

1. 0 ≤ F (x) ≤ 1

2. F (−∞) = 0 and F (∞) = 1

3. F (x) is continuous and non-decreasing

4. P (a < X < b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a ≤ X ≤ b) = F (b) − F (a)


Including or excluding the endpoints does not matter for continuous RVs as the probability
of X taking on any given value is 0.

5. F′(x) = f(x). The derivative of the distribution gives the density.
f(x) −→ F(x) by integration, and F(x) −→ f(x) by differentiation.

Now we will introduce two continuous RVs; the Uniform RV and the Exponential RV.
The next chapter will be devoted to the Normal RV (bell curve), which is the most useful
and most common continuous distribution in statistics.

Uniform Random Variables:


Notation: If X is a Uniform RV, then we write X∼UNI(a, b), indicating that X is uniformly
(or evenly) distributed over the interval [a, b]

Density Function: f(x) = 1/(b − a), for a ≤ x ≤ b

Distribution Function: F(x) = ∫_a^x f(u) du = (x − a)/(b − a), for a ≤ x ≤ b

Mean of X = E(X) = µx = (a + b)/2

Variance of X = VAR(X) = σx^2 = (b − a)^2/12
Exercise: On your own, show that the above are true.....you will be able to do this after the section on expected values for continuous RVs (later in these notes)!

Example M: Let X ∼ UNI(0, 10). Find....


(a) P(X = 5)
(b) P(X ≤ 2)
(c) P(3 ≤ X ≤ 7)
(d) P(X < 2)
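All four parts can be read off the UNI(0, 10) cdf F(x) = x/10. A sketch using scipy (note that scipy parameterizes the uniform with loc = a and scale = b − a):

    from scipy.stats import uniform

    X = uniform(loc=0, scale=10)      # X ~ UNI(0, 10)
    print(0.0)                        # (a) P(X = 5) = 0 for any continuous RV
    print(X.cdf(2))                   # (b) P(X <= 2) = 0.2
    print(X.cdf(7) - X.cdf(3))        # (c) P(3 <= X <= 7) = 0.4
    print(X.cdf(2))                   # (d) P(X < 2) = 0.2, same as (b)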

Exponential Random Variables:
The Exponential RV is often used to model the time until an event occurs. As a consequence, it takes on values ≥ 0. In chapter 6 we will come back to this, but it arises from something called the Poisson process, which we will discuss more in depth then.

Notation: If X is an Exponential RV with a failure rate of λ, then we write X ∼ EXP(λ)

λ is a positive constant and is equal to the reciprocal of the mean life-time.
(ie) If the lightbulb has a mean lifetime of 5 years, then λ = 1/5

Density Function: f(x) = λe^(−λx), for x ≥ 0

Distribution Function: F(x) = 1 − e^(−λx), for x ≥ 0

Mean of X = E(X) = µx = 1/λ

Variance of X = VAR(X) = σx^2 = 1/λ^2
Example N: Suppose that a lightbulb has an expected lifetime of 10 years. Find.....
(a) The failure rate of the lightbulb
(b) The probability that a bulb lasts more than 15 years?
(c) The probability that a bulb lasts 4-6 years?
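A sketch of Example N using the formulas above, with λ = 1/10 (the same numbers can be done by hand from F(x) = 1 − e^(−λx)):

    import math

    lam = 1 / 10                             # (a) failure rate = 0.1 per year
    F = lambda x: 1 - math.exp(-lam * x)     # exponential cdf

    print(round(1 - F(15), 3))               # (b) P(X > 15) = e^(-1.5) ≈ 0.223
    print(round(F(6) - F(4), 3))             # (c) P(4 <= X <= 6) ≈ 0.122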

Expected Values:
Again, finding the expected values is very similar to the discrete case, except now we use
integrals instead of summations.

Let X be a continuous RV with density function f(x). Let g(x) be a given function of x (eg) g(x) = x or g(x) = x^2 or,....

Then the expected value is E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx
We integrate over the range of X. Here we write (−∞, ∞) to be general....what we mean
is over the range of the RV.

Note: If we use g(x) = x, then we are finding the expected value of X.


Mean of X: µx = E(X) = ∫_{−∞}^{∞} x f(x) dx

Variance of X: σx^2 = VAR(X) = E(X^2) − E(X)^2

Note: The properties of µx, σx^2 and σx are the same as in the discrete case (outlined earlier for discrete RVs).

Example O: Suppose that X ∼ UNI(0, 5). Find.....


(a) The expected value of X
(b) The expected value of X 2
(c) The SD of X

Example P: The density function for X, the lead concentration in gasoline in grams per
liter is given by f (x) = 12.5x − 1.25, for 0.1 ≤ x ≤ 0.5

(a) What is the expected concentration of lead?


(b) What is the variance of the concentration of lead?
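Both parts come down to integrating x·f(x) and x^2·f(x) over [0.1, 0.5]. A numerical sketch using scipy.integrate.quad (the integrals can also be done by hand):

    from scipy.integrate import quad

    f = lambda x: 12.5 * x - 1.25                      # density on [0.1, 0.5]

    E_X, _  = quad(lambda x: x * f(x), 0.1, 0.5)       # (a) E(X)  ≈ 0.367 g/L
    E_X2, _ = quad(lambda x: x**2 * f(x), 0.1, 0.5)
    var_X = E_X2 - E_X**2                              # (b) VAR(X) ≈ 0.0089
    print(round(E_X, 3), round(var_X, 4))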

The Median/Half-life of a Continuous RV:
The median of a continuous RV is defined as the value m with F(m) = 1/2
(ie) The probability that X is above or below m, the median, is 1/2
P(X < m) = P(X > m) = 1/2

When X is the measure of a lifetime, we often refer to the median (m) as the Half-Life

Example Q: Suppose that the lifetime of your TV (in years) follows an exponential distri-
bution with a failure rate of 0.04. Find....
(a) The expected lifetime of the TV.
(b) The SD of the lifetime of the TV.
(c) Calculate the half-life of your TV.
(d) What % of TVs like yours will exceed their expected lifetimes?
(e) What % of TVs like yours will exceed their half-lives?

Sum and Average of Independent RVs:
Note: We will use many of the ideas in this section once we come to Chapter 7.

Random experiments are often independently repeated creating a sequence X1 , X2 , ...., Xn


of n independent RVs.

Typically these n RVs, Xi where i = 1, 2, ..., n, will have a common mean (µxi) and a common variance (σxi^2). In this case, {X1, X2, ...., Xn} is called an Independent Random Sample

(eg) Roll a die repeatedly, measure the lifetime of a type of component repeatedly, crash test
a type of car repeatedly,.....

So, we have independent RVs X1, X2, ...., Xn, where µxi = µ and σxi^2 = σ^2


(common mean/variance)

Often, we are interested in the sum or mean of the sample values.

S = Σ_{i=1}^{n} Xi = X1 + X2 + .... + Xn

X̄ = (Σ_{i=1}^{n} Xi)/n = (X1 + X2 + .... + Xn)/n
Example R: Suppose we have a series of 9 independent measurements on the compressive
strength of steel fibre reinforced concrete.
Let Xi ∼ UNI(500, 1500), where Xi is the compressive strength of block i, i=1, 2,...,9.
Question: do you think the breaking strength would follow a uniform distribution?

Recall: E(Xi) = (a + b)/2 = (500 + 1500)/2 = 1000

VAR(Xi) = (b − a)^2/12 = (1500 − 500)^2/12 = 83333.33
= 83333.33

BUT, a more accurate measure of the ‘true’ compressive strength of the steel fibre reinforced
concrete would be the average/mean strength of the 9 blocks, X̄.

So, when we have independent RVs X1 , X2 , ..., Xn with common mean and variance,
then.....

Sum: µS = E(S) = nµxi
. σS^2 = VAR(S) = nσxi^2
. σS = SD(S) = √n σxi

Mean: µX̄ = E(X̄) = µxi
. σX̄^2 = VAR(X̄) = σxi^2/n
. σX̄ = SD(X̄) = σxi/√n

Square Root Rule: The SD of the sum is proportional to the square root of n.
The SD of the average is proportional to the inverse of the square root of n

Example R continued: Recall our example of measurements on the compressive strength


of 9 concrete blocks. Find....
(a) The expected value of the mean of the 9 blocks
(b) The SD of the mean of the 9 blocks (SDX̄ )

Note: We can see that the mean is the same, but the SD is 3-times smaller → square root
rule!

Note: As the number of measurements or observations grows large (n → ∞), the variability
of the mean of the measurements gets very small (SD(X̄) = σX̄ → 0)
More measurements means greater accuracy!
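A small simulation illustrates the square root rule for Example R (a sketch; it assumes the UNI(500, 1500) model and uses numpy):

    import numpy as np

    rng = np.random.default_rng(1)
    samples = rng.uniform(500, 1500, size=(100_000, 9))   # 100,000 samples of n = 9 blocks

    means = samples.mean(axis=1)
    print(round(means.mean(), 1))    # ≈ 1000: same mean as a single block
    print(round(means.std(), 1))     # ≈ 96.2 = 288.7/3: SD shrinks by sqrt(9) = 3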

Example S: 20 randomly selected UBC students are asked if they smoke and 6 say that
they do and the other 14 do not. Find.....
(a) The estimated proportion of smokers among UBC students? Is this a valid estimate?
(b) The SD of this estimate

In general, when we are working with proportions (which arise from categorical variables), we estimate SD(p) = √(p(1 − p)/n)

Max/Min of Independent RVs
As we just discussed, we often conduct a series of random experiments to get a sequence of
Independent RVs {X1 , X2 , ...., Xn }, known as a random sample.

Often, we are interested in the sum or average of all the observations, but there are also
instances where we are interested in the Maximum or Minimum value in this random
sample.
If we let W = max{X1, X2, ...., Xn}, then W can be used to model:

1) The lifetime of a system of n independent parallel components, where Xi = the lifetime


of component i.

2) The completion time of a project of n independent sub-projects, which can be done


simultaneously and Xi = completion time of project i
If we let V = min{X1, X2, ...., Xn}, then V can be used to model:

1) The lifetime of a system of n independent components in a series, where Xi = the


lifetime of component i.

2) The completion time of a project pursued by n independent competing teams and Xi


= completion time of project i

The Maximum:
Suppose that we have an independent random sample, X1, X2, ...., Xn, and we wish to find the maximum of them; W = max{X1, X2, ...., Xn}

Also, suppose that FXi (x) and fXi (x) are the distribution and density functions for the
RV Xi .

Define FW(ν) and fW(ν) to be the distribution and density functions of the maximum (W)

Then, FW(ν) = P(W ≤ ν) = P( (X1 ≤ ν) ∩ (X2 ≤ ν) ∩ ..... ∩ (Xn ≤ ν) ) =(by indep)= P(X1 ≤ ν)·P(X2 ≤ ν)·.....·P(Xn ≤ ν) = FX1(ν) · FX2(ν) · ..... · FXn(ν)

Here, we will only discuss the cases when the Xi are iid (=independent and identically
distributed), as this is most often the case.

Distribution: FW(ν) = (FXi(ν))^n

Density: fW(ν) = FW′(ν) = n (FXi(ν))^(n−1) fXi(ν)
Example T: A system consists of 3 components arranged in parallel. Components are inde-
pendent with a mean lifetime of 5 years. The lifetimes are thought to follow an exponential
distribution. Find....
(a) The median/half-life for component 1
(b) The probability that the first component fails before 5.5 years have passed
(c) The probability that the system fails before 5.5 years have passed
(d) The median/half-life of the system
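A sketch of Example T in Python, assuming exponential component lifetimes with λ = 1/5 and using the maximum formula above (a parallel system fails only when all components have failed):

    import math

    lam, n = 1 / 5, 3
    F = lambda t: 1 - math.exp(-lam * t)     # cdf of one component's lifetime

    print(round(math.log(2) / lam, 2))       # (a) component half-life ≈ 3.47 years
    print(round(F(5.5), 3))                  # (b) one component fails by 5.5 years ≈ 0.667
    print(round(F(5.5)**n, 3))               # (c) all 3 fail by 5.5 years ≈ 0.297
    # (d) system half-life: solve F(t)^3 = 0.5, i.e. F(t) = 0.5**(1/3)
    t_half = -math.log(1 - 0.5**(1/3)) / lam
    print(round(t_half, 2))                  # ≈ 7.89 years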

Note: How we can solve some of these problems using the simple rules of probability that
we learned back in chapter 3.

The Minimum:
As before, suppose that we have a random sample, X1, X2, ...., Xn, and we wish to find the minimum of them; V = min{X1, X2, ...., Xn}

Also, suppose that FXi (x) and fXi (x) are the distribution and density functions for the
RV Xi .

Define FV(υ) and fV(υ) to be the distribution and density functions of the minimum (V)

Then, FV(υ) = P(V ≤ υ) = 1 − P(V > υ) = 1 − P( (X1 > υ) ∩ (X2 > υ) ∩ ..... ∩ (Xn > υ) ) =(by indep)=
1 − P(X1 > υ)·P(X2 > υ)·.....·P(Xn > υ) = 1 − (1 − FX1(υ))(1 − FX2(υ)).....(1 − FXn(υ))

Distribution: FV(υ) = 1 − (1 − FXi(υ))^n

Density: fV(υ) = FV′(υ) = n (1 − FXi(υ))^(n−1) fXi(υ)
Example U: A system consists of 3 components in a series. Components are independent
with a mean lifetime of 5 years, and are thought to follow an exponential distribution.
Find.....
(a) The probability of component 1 failing before 5.5 years have passed
(b) The probability that the system fails before 5.5 years have passed
(c) The median/half-life for the system
