Fundamentals of Probability
[Figure: a random variable maps outcomes of an experiment to real numbers]
A random variable assigns a real number to each individual outcome of an experiment (each element of the sample space), such as the outcomes of a statistical survey. Collectively, those real numbers become the values taken on by the random variable.
For example, if we record last year's income and other important characteristics of 30,000 randomly chosen households in Vietnam, then we can define random variables such as
X: household income
Random variables are always defined to take on numerical values, even when they describe
qualitative events.
For example, consider housing ownership in the sample survey of 30,000 households above, where the two outcomes are "owns house" and "rents house".
We can define a random variable as follows: Y = 1 if the household owns its house, and Y = 0 if it rents.
In a practical sense, random variables that take on very many values are best treated as continuous.
The pdf of a continuous random variable is used only to compute the probabilities of events involving a range of values.
For example, if a and b are constants where a ≤ b, the probability that X lies between the numbers a and b, P(a ≤ X ≤ b), is the area under the pdf between the points a and b, as shown in Figure B.2; that is, it is the integral of the density f between a and b:
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx.
The entire area under the pdf of a continuous random variable must always equal one.
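To make the area interpretation concrete, here is a minimal sketch (not from the text) that integrates a density numerically; the standard normal pdf is used purely as an illustrative choice.

```python
# A minimal sketch (not from the text): computing P(a <= X <= b) as the area
# under a pdf, using the standard normal density only as an example.
import numpy as np
from scipy.integrate import quad

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal random variable (illustrative choice of pdf)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

a, b = -1.0, 1.0
prob, _ = quad(normal_pdf, a, b)              # area under the pdf between a and b
total, _ = quad(normal_pdf, -np.inf, np.inf)  # entire area under the pdf must equal one

print(f"P({a} <= X <= {b}) = {prob:.4f}")       # about 0.6827 for this pdf
print(f"Total area under the pdf = {total:.4f}")  # 1.0000
```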
Consider a basketball player shooting two free throws. Let X be the Bernoulli random variable equal to one if she or he makes the first free throw, and zero otherwise. Let Y be a Bernoulli random variable equal to one if he or she makes the second free throw. Suppose that she or he is an 80% free throw shooter,
so that P(X = 1) = P(Y = 1) = 0.8. What is the probability of the player making both free throws?
If X and Y are independent, we can easily answer this question: P(X = 1, Y = 1) = P(X = 1)·P(Y = 1) = (.8)(.8) = 0.64. Thus, there is a 64% chance of making both free throws.
Independence of random variables is a very important concept.
If X and Y are independent and we define new random variables g(X) and h(Y) for any functions g and h,
then these new random variables are also independent.
If X1, X2, …, Xn are discrete random variables, then their joint pdf is
f(x1, x2, …, xn) = P(X1 = x1, X2 = x2, …, Xn = xn).
The random variables X1, X2, …, Xn are independent random variables if, and only if, their joint pdf is the product of the individual pdfs for any (x1, x2, …, xn). This definition of independence also holds for continuous random variables.
Assuming statistical independence often provides a reasonable approximation in more complicated situations.
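As a small illustration of the product rule for independence, the sketch below (not from the text; the joint pmf is a hypothetical example) checks whether a joint pmf factors into the product of its marginals.

```python
# A minimal sketch (not from the text): checking independence of two discrete
# random variables by comparing the joint pmf with the product of the marginals.
import itertools

# Hypothetical joint pmf f(x, y) = P(X = x, Y = y) for two Bernoulli variables
joint = {(0, 0): 0.04, (0, 1): 0.16, (1, 0): 0.16, (1, 1): 0.64}

# Marginal pmfs obtained by summing the joint pmf over the other variable
fx = {x: sum(p for (x_, _), p in joint.items() if x_ == x) for x in (0, 1)}
fy = {y: sum(p for (_, y_), p in joint.items() if y_ == y) for y in (0, 1)}

# X and Y are independent iff f(x, y) = fx(x) * fy(y) for every (x, y)
independent = all(abs(joint[(x, y)] - fx[x] * fy[y]) < 1e-12
                  for x, y in itertools.product((0, 1), repeat=2))
print("Independent:", independent)  # True for this particular joint pmf
```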
If the flight has 100 available seats, the airline is interested in P(X > 100). Suppose, initially, that n = 120, so that the airline accepts 120 reservations, and the probability that each person shows up is θ = 0.85. Then, P(X > 100) = P(X = 101) + P(X = 102) + … + P(X = 120), and each of the probabilities in the sum can be found from equation (B.14) with n = 120, θ = 0.85, and the appropriate value of x (101 to 120).
If n = 120, P(X > 100) = 0.659: high risk of overbooking.
If n = 110, P(X > 100) = 0.024: manageable risk.
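These tail probabilities can be reproduced with any binomial routine; here is a minimal scipy sketch (not from the text; the figures quoted above come from the example itself).

```python
# A minimal sketch (not from the text): the overbooking probability P(X > 100)
# when the number of passengers who show up is Binomial(n, theta).
from scipy.stats import binom

theta = 0.85  # probability that each passenger with a reservation shows up

for n in (120, 110):
    # sf(100) = P(X > 100): probability that more than 100 passengers show up
    prob_overbooked = binom.sf(100, n, theta)
    print(f"n = {n}: P(X > 100) = {prob_overbooked:.3f}")

# n = 120 gives roughly 0.659 (high risk); n = 110 gives roughly 0.024 (manageable)
```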
for all values of x such that fX(x) > 0. The interpretation of (B.15) is most easily seen when X and Y are discrete. Then,
fY|X(y|x) = P(Y = y | X = x),    (B.16)
where the right-hand side is read as "the probability that Y = y given that X = x." When Y is continuous, fY|X(y|x) is not itself a probability; conditional probabilities are found by computing areas under the conditional pdf.
This means that the probability of the player making the second free throw depends on whether
the first free throw was made: if the first free throw is made, the chance of making the second
is .85; if the first free throw is missed, the chance of making the second is .70. This implies
that X and Y are not independent; they are dependent.
We can still compute P(X = 1, Y = 1) provided we know P(X = 1). Assume that the probability of making the first free throw is .8, that is, P(X = 1) = .8. Then, from (B.15), we have
P(X = 1, Y = 1) = P(Y = 1|X = 1)·P(X = 1) = (.85)(.8) = .68.
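The same multiplication-rule arithmetic in a short sketch (not from the text), with the conditional probabilities taken from the example above.

```python
# A minimal sketch (not from the text): joint probability of making both free
# throws when the second throw depends on the outcome of the first.
p_first = 0.80               # P(X = 1)
p_second_given_make = 0.85   # P(Y = 1 | X = 1)
p_second_given_miss = 0.70   # P(Y = 1 | X = 0)

# Multiplication rule: P(X = 1, Y = 1) = P(Y = 1 | X = 1) * P(X = 1)
p_both = p_second_given_make * p_first
print(f"P(X = 1, Y = 1) = {p_both:.2f}")  # 0.68

# Under (incorrect) independence the answer would be 0.8 * 0.8 = 0.64
print(f"Independence would give {p_first * p_first:.2f}")
```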
Given a random variable X and a function g(·), we can create a new random variable g(X).
For example, if X is a random variable, then X² and log(X) are also random variables.
The expected value of g(X) is, again, simply a weighted average:
E[g(X)] = Σ g(xj) fX(xj)  (summed over the k possible values x1, …, xk),    (B.19)
or, for a continuous random variable,
E[g(X)] = ∫ g(x) fX(x) dx  (integrated over all real x).    (B.20)
Example B.4
[Expected Value of X²]
For the random variable in Example B.3, let g(X) = X². Then,
E(X²) = (−1)²(1/8) + (0)²(1/2) + (2)²(3/8) = 13/8.
In Example B.3, we computed E(X) = 5/8, so that [E(X)]² = 25/64.
This shows that E(X²) ≠ [E(X)]².
In fact, for a nonlinear function g(X), E[g(X)] ≠ g[E(X)] (except in very special cases).
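A small sketch (not from the text) that reproduces Example B.4 by applying equation (B.19) to the pmf of Example B.3.

```python
# A minimal sketch (not from the text): E[g(X)] as a weighted average over the
# pmf of Example B.3 (X takes the values -1, 0, 2 with probabilities 1/8, 1/2, 3/8).
from fractions import Fraction

pmf = {-1: Fraction(1, 8), 0: Fraction(1, 2), 2: Fraction(3, 8)}

def expected_value(g, pmf):
    """E[g(X)] = sum of g(x) * f(x) over the support of X (equation B.19)."""
    return sum(g(x) * p for x, p in pmf.items())

ex = expected_value(lambda x: x, pmf)        # E(X) = 5/8
ex2 = expected_value(lambda x: x ** 2, pmf)  # E(X^2) = 13/8
print(f"E(X) = {ex}, E(X^2) = {ex2}, [E(X)]^2 = {ex * ex}")
# E(X^2) = 13/8 differs from [E(X)]^2 = 25/64, illustrating E[g(X)] != g[E(X)]
```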
If X and Y are random variables, then g(X, Y) is a random variable for any function g.
Expectation over a joint distribution: when X and Y are both discrete, taking on values {x1, x2, …, xk} and {y1, y2, …, ym}, respectively, the expected value is
E[g(X, Y)] = Σh Σj g(xh, yj) fX,Y(xh, yj),
where fX,Y is the joint pdf of (X, Y).
Thus, the random variable Z has a mean of zero and a variance (and therefore a standard deviation) equal to one.
Ex.) Suppose that E(X) = 2 and Var(X) = 9. Then, Z = (X − 2)/3 has expected value zero and variance one.
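A quick simulation sketch (not from the text; treating X as normal is only an illustrative assumption) confirming that standardization yields mean zero and variance one.

```python
# A minimal sketch (not from the text): verifying by simulation that the
# standardized variable Z = (X - mu) / sigma has mean 0 and variance 1,
# using X with E(X) = 2 and Var(X) = 9 as in the example.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)  # E(X) = 2, sd(X) = 3

z = (x - 2.0) / 3.0  # standardization
print(f"mean(Z) = {z.mean():.4f}, var(Z) = {z.var():.4f}")  # close to 0 and 1
```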
Covariance
Let μX = E(X) and μY = E(Y), and consider the random variable (X − μX)(Y − μY).
If X is above its mean and Y is above its mean, then (X − μX)(Y − μY) > 0. This is also true if X and Y are both below their means.
On the other hand, if X > μX and Y < μY, or vice versa, then (X − μX)(Y − μY) < 0.
The covariance between two random variables X and Y is defined as
Cov(X, Y) = E[(X − μX)(Y − μY)],    (B.26)
which is sometimes denoted σXY.
If σXY > 0, then, on average, when X is above its mean, Y is also above its mean.
If σXY < 0, then, on average, when X is above its mean, Y is below its mean.
Useful expressions for computing Cov(X, Y) are as follows:
Cov(X, Y) = E[(X − μX)(Y − μY)] = E[(X − μX)Y]
= E[X(Y − μY)] = E(XY) − μXμY.    (B.27)
It follows from (B.27) that if E(X) = 0 or E(Y) = 0, then Cov(X, Y) = E(XY).
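A short sketch (not from the text; the simulated data are hypothetical) comparing the two covariance formulas in (B.26) and (B.27).

```python
# A minimal sketch (not from the text): checking that
# Cov(X, Y) = E[(X - muX)(Y - muY)] = E(XY) - muX*muY on simulated data.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)  # Y is positively related to X

mu_x, mu_y = x.mean(), y.mean()
cov_def = ((x - mu_x) * (y - mu_y)).mean()   # E[(X - muX)(Y - muY)]
cov_alt = (x * y).mean() - mu_x * mu_y       # E(XY) - muX*muY

print(f"Cov via definition:        {cov_def:.4f}")
print(f"Cov via E(XY) - muX*muY:   {cov_alt:.4f}")  # the two expressions agree
```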
Note) The variance of the difference is the sum of the variances, not the difference in the
variances.
Ex.) Let X denote profits earned by a restaurant during a Friday night and let Y be profits
earned on the following Saturday night. Then, Z = X + Y is profits for the two nights.
Suppose X and Y each have an expected value of $300 and a standard deviation of $15
(so that the variance is 225).
Expected profits for the two nights are E(Z) = E(X) + E(Y) = 2(300) = 600 dollars. If X and Y are independent, and therefore uncorrelated, then the variance of total profits is the sum of the variances: Var(Z) = Var(X) + Var(Y) = 2(225) = 450. It follows that the standard deviation of total profits is √450, or about $21.21.
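A simulation sketch (not from the text; treating nightly profits as normal is an illustrative assumption) confirming that the variances add for independent X and Y.

```python
# A minimal sketch (not from the text): simulating two independent nights of
# restaurant profits with mean $300 and standard deviation $15 each, and
# checking that Var(X + Y) = Var(X) + Var(Y) = 450 (sd about $21.21).
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(300, 15, size=n)  # Friday profits (normality is an assumption)
y = rng.normal(300, 15, size=n)  # Saturday profits, drawn independently of Friday

z = x + y
print(f"E(Z)   = {z.mean():.1f}   (theory: 600)")
print(f"Var(Z) = {z.var():.1f} (theory: 450)")
print(f"sd(Z)  = {z.std():.2f}  (theory: about 21.21)")
```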
When Y is continuous, E(Y|x) is defined by integrating y·fY|X(y|x) over all possible values of y.
The conditional expectation is a weighted average of possible values of Y, but now the
weights reflect the fact that X has taken on a specific value.
E(Y|x) is just some function of x, showing how the expected value of Y varies with x.
Ex.) Let X be years of education and Y be hourly wage.
Then, E(Y|X = 12) is the average hourly wage for the working population with 12 years of education (roughly a high school education).
Tracing out the expected value for various levels of education provides important
information on how wages and education are related. See Figure B.5 for an illustration.
In econometrics, the relationship between the average wage and the amount of education is specified by a simple function; that is, we typically specify simple functional forms that capture this relationship. As an example, suppose that the expected value of WAGE given EDUC is the linear function
E(WAGE|EDUC) = 1.05 + .45 EDUC.
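A minimal sketch (not from the text) evaluating this assumed linear conditional expectation function at a few education levels.

```python
# A minimal sketch (not from the text): evaluating the assumed linear conditional
# expectation function E(WAGE | EDUC) = 1.05 + 0.45 * EDUC.
def expected_wage(educ: float) -> float:
    """Conditional mean of hourly wage given years of education."""
    return 1.05 + 0.45 * educ

for educ in (8, 12, 16):
    print(f"E(WAGE | EDUC = {educ}) = {expected_wage(educ):.2f}")
# For example, E(WAGE | EDUC = 12) = 6.45 dollars per hour
```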
Ex.) Let Y = SAVING and X = INCOME (both of these measured annually for the
population of all families). Suppose that Var(SAVING|INCOME) = 400 + .25 INCOME.
This says that, as income increases, the variance in saving levels also increases. It is
important to see that the relationship between the variance of SAVING and INCOME is
totally separate from that between the expected value of SAVING and INCOME.
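A minimal sketch (not from the text) evaluating this assumed conditional variance function, showing how the spread of SAVING grows with INCOME.

```python
# A minimal sketch (not from the text): evaluating the assumed conditional
# variance function Var(SAVING | INCOME) = 400 + 0.25 * INCOME.
import math

def saving_variance(income: float) -> float:
    """Conditional variance of SAVING given INCOME (both measured annually)."""
    return 400 + 0.25 * income

for income in (1_000, 10_000, 100_000):
    var = saving_variance(income)
    print(f"INCOME = {income}: Var = {var:.0f}, sd = {math.sqrt(var):.1f}")
```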
Property CV.1: If X and Y are independent, then Var(Y|X) = Var(Y).
This property is pretty clear, since the distribution of Y given X does not depend on X, and Var(Y|X) is just one feature of this distribution.
Other distributions, such as income distributions, do not appear to follow the normal distribution. In most countries, income is not symmetrically distributed about any value;
the distribution is skewed toward the upper tail.
In some cases, a variable can be transformed to achieve normality. A popular
transformation is the natural log, which makes sense for positive random
variables.
If X is a positive random variable, such as income, and Y = log(X) has a normal
distribution, then we say that X has a lognormal distribution. It turns out that the
lognormal distribution fits income distribution pretty well in many countries.
Other variables, such as prices of goods, appear to be well described as lognormally distributed.
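A simulation sketch (not from the text; the parameters are hypothetical) showing that taking logs of lognormal "income" removes the right skew.

```python
# A minimal sketch (not from the text): if X is lognormal then Y = log(X) is
# normal, so the log transformation removes the right skew of simulated income.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
income = rng.lognormal(mean=10.0, sigma=0.8, size=100_000)  # skewed to the right
log_income = np.log(income)                                  # approximately normal

print(f"skewness of income:      {stats.skew(income):.2f}")      # clearly positive
print(f"skewness of log(income): {stats.skew(log_income):.2f}")  # close to zero
```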
No simple formula can be used to obtain the values of Φ(z) [because Φ(z) is the integral of the function in (B.35), and this integral has no closed form].
The values of Φ(z) are easily tabulated; they are given for z between −3.1 and 3.1.
For z < −3.1, Φ(z) is less than .001, and for z > 3.1, Φ(z) is greater than .999.
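In practice, Φ(z) is obtained from software rather than from tables; a minimal scipy sketch (not from the text):

```python
# A minimal sketch (not from the text): evaluating the standard normal cdf
# Phi(z) numerically instead of looking it up in a table.
from scipy.stats import norm

for z in (-3.1, -1.96, 0.0, 1.96, 3.1):
    print(f"Phi({z:5.2f}) = {norm.cdf(z):.4f}")
# Phi(-3.1) is about .001 and Phi(3.1) is about .999, matching the tabulated limits
```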
The df in a chi-square distribution corresponds to the number of terms in the sum in (B.41).
The chi-square distribution plays an important role in statistical and econometric analyses.
From equation (B.41), a chi-square random variable is always nonnegative.
Also, the chi-square distribution is not symmetric about any point.
If X ~ χ²ₙ, then the expected value of X is n [the number of terms in (B.41)], and the variance of X is 2n.
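A quick scipy check (not from the text) of the mean-n, variance-2n property:

```python
# A minimal sketch (not from the text): a chi-square random variable with
# n degrees of freedom has mean n and variance 2n.
from scipy.stats import chi2

for n in (1, 5, 10):
    mean, var = chi2.stats(df=n, moments="mv")
    print(f"n = {n:2d}: E(X) = {float(mean):.1f}, Var(X) = {float(var):.1f}")
# Output: means 1, 5, 10 and variances 2, 10, 20
```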
Practice 2.
Calculation of Covariance and Correlation Coefficients Using SPSS
1. Choose 3 pairs of indicators that you think are highly correlated from the Socio-economic Indicators of Vietnam (monthly or quarterly).
2. Again, choose 2 pairs of indicators that you think are not correlated.
3. Calculate the covariance and correlation coefficients for each of the 5 pairs of indicators.
4. Now produce the first difference ΔXt = Xt − Xt−1 for each of the indicators you have and calculate the correlation coefficients.
5. Compare the correlation coefficients from the original data with those from the differenced data. (For those working outside SPSS, a pandas sketch of the same steps follows.)
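The practice asks for SPSS; for reference, here is a minimal pandas sketch of the same steps (the file name and column names are hypothetical placeholders).

```python
# A minimal sketch (not part of the practice instructions, which use SPSS):
# covariance, correlation, and first-difference correlation in pandas.
# "vietnam_indicators.csv", "cpi", and "exchange_rate" are hypothetical names.
import pandas as pd

df = pd.read_csv("vietnam_indicators.csv")  # hypothetical file of monthly indicators

pair = ["cpi", "exchange_rate"]             # hypothetical pair of indicators
print("Covariance:\n", df[pair].cov())
print("Correlation:\n", df[pair].corr())

# Step 4: first differences, then recompute the correlation
diff = df[pair].diff().dropna()
print("Correlation of first differences:\n", diff.corr())
```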
Key Terms
Bernoulli (or Binary) Random Variable / Binomial Distribution / Chi-Square Distribution /
Conditional Distribution / Conditional Expectation / Continuous Random Variable /
Correlation Coefficient / Covariance / Cumulative Distribution Function (cdf) /
Degrees of Freedom / Discrete Random Variable / Expected Value / Experiment /
F Distribution / Independent Random Variables / Joint Distribution / Kurtosis /
Law of Iterated Expectations / Median / Normal Distribution /
Pairwise Uncorrelated Random Variables / Probability Density Function (pdf) /
Random Variable / Skewness / Standard Deviation / Standard Normal Distribution /
Standardized Random Variable / Symmetric Distribution / t Distribution /
Uncorrelated Random Variables / Variance