DAG09102: STATISTICAL METHODS
TOPIC TWO: INTRODUCTION TO RANDOM VARIABLES AND PROBABILITY DISTRIBUTION
PROBABILITY DISTRIBUTION
RANDOM VARIABLES
1. Random Variable (R.V)
This is a variable whose numerical value is determined by the
outcome of a random experiment.
It takes on different values based on the outcome of an uncertain
event or process.
These values are determined by chance and follow a probability
distribution.
In other words, this is a variable that is subject to randomness and can
take on different values, which distinguishes it from an algebraic
variable.
2. Random Experiment
A random experiment is an experiment or process that results in one of
several possible outcomes, but the exact outcome cannot be predicted in
advance.
The outcome of a random experiment is uncertain and is determined by
chance, even though all possible outcomes are known.
Characteristics of a Random Experiment
Well-Defined Set of Outcomes: The experiment has a clearly defined set of
all possible outcomes, known as the sample space.
Reproducibility: The experiment can be repeated under similar conditions.
Randomness: The outcome of each trial is determined by chance, and
cannot be exactly predicted.
Example of Random Experiments
i. Flipping a coin
ii. Rolling a die
iii. Drawing a card
iv. Conducting a Survey:
Experiment: Ask 100 people if they prefer tea or coffee.
Possible Outcomes: Each respondent could choose either "tea" or
"coffee," and their responses vary randomly.
QUESTION: Can gender and marital status be considered random
variables?
More Examples
1. Survey on Gender: If you randomly sample 100 individuals from
a population, the gender of each individual is a random variable
because the outcome is not known in advance and depends on the
random sampling process. The random variable can take the
categories "Male" or "Female" (or others if applicable).
2. Survey on Marital Status: Similarly, if you randomly sample 50
individuals, the marital status of each individual is a random
variable. It could take values like "Single," "Married," "Divorced,"
etc., depending on who is randomly selected.
Types Of Random Variables
There are two types of random variable (R.V), these are
discrete and
continuous random variables.
A discrete random variable takes on only a finite or countable number of
values.
Examples of discrete random variables are: the number of cars passing
through a roadblock, the number of defective items in a sample, the
number of deaths from COVID-19 in the year 2020, etc.
On the other hand, a continuous random variable is a random variable
that can take on any value (always real numbers) in a given interval of
values. Examples of continuous R.V.s are height, weight, rainfall,
temperature, etc.
Probability Distribution Of Random Variable
A probability distribution is a graph, table, or function that links
each outcome of a random experiment with its probability of occurrence.
In other words, a probability distribution is a function that can be
used to derive the probability of each outcome of a random variable.
There are two types of Probability Distribution of a Random Variable
(R.V), these are
discrete and
continuous Probability Distribution.
The following chart may help you understand the branching of probability
distribution
1. Probability Mass Function (PMF)
Definition: A probability mass function (PMF) is a function that gives
the probability that a discrete random variable is exactly equal to a
specific value.
Mathematical Notation: For a discrete random variable X, the PMF is
defined as: P(X = x) = f(x)
2. Cumulative Distribution Function (CDF) for Discrete Random Variables
The cumulative distribution function (CDF) for a discrete random
variable gives the probability that the variable takes a value less
than or equal to x.
Mathematical Notation: For a discrete random variable X, the CDF F(x)
is defined as: F(x) = P(X ≤ x)
PROPERTIES OF DISTRIBUTION FUNCTIONS
1. DISCRETE CASE
Properties of PMF:
1. For every possible value x, the probability is non-negative:
f(x) = P(X = x) ≥ 0.
2. The sum of the probabilities over all possible values of the random
variable is 1: Σ f(x) = 1.
EXAMPLE: Consider a fair die roll. The random variable X represents
the outcome of the die roll. The PMF of X is:
P(X = x) = 1/6, for x = 1, 2, 3, 4, 5, 6.
This means the probability of each outcome is 1/6, and the sum of
probabilities across all outcomes equals 1:
P(X = 1) + P(X = 2) + … + P(X = 6) = 6 × (1/6) = 1.
Main Properties of CDF:
1. Non-decreasing:
The CDF is always non-decreasing, since probability accumulates as x
increases: if x1 ≤ x2, then F(x1) ≤ F(x2).
2. Range: The CDF always takes values in the range [0, 1].
3. Limits:
The CDF has the following limits:
F(x) → 0 as x → −∞, and F(x) → 1 as x → +∞.
Meaning: as x decreases without bound the CDF approaches 0, and as x
increases without bound the CDF approaches 1.
Example of CDF:
Consider the fair die roll example again. The CDF F(x) represents the
probability that the outcome X is less than or equal to a specific
value x.
The CDF increases in steps as x increases and eventually reaches 1
when x ≥ 6.
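The die-roll PMF and its stepwise CDF can be checked with a short sketch (a minimal Python example using exact fractions; the helper name `cdf` is ours, not part of the notes):

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# PMF properties: probabilities are non-negative and sum to 1.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1

# CDF: F(x) = P(X <= x), which accumulates in steps as x increases.
def cdf(x):
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

print(cdf(3))  # 1/2
print(cdf(6))  # 1 -- the CDF reaches 1 once x >= 6
```

Evaluating `cdf` at 1, 2, …, 6 reproduces the staircase shape described above.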
2. CONTINUOUS CASE
Probability Density Function (PDF): A probability density
function (PDF) is used for continuous random variables and
describes the likelihood of the random variable taking on a
specific value.
However, unlike the PMF, the value of the PDF at a particular
point x does not represent the probability directly;
instead, the probability that X falls within an interval [a, b] is
given by the area under the curve of the PDF between a and b.
Properties of PDF:
1. The PDF is always non-negative: f(x) ≥ 0 for all x.
2. The total probability of the random variable must be equal to 1.
This is expressed as:
∫ from −∞ to +∞ of f(x) dx = 1.
This ensures that the random variable takes some value within the
entire real line with probability 1.
3. Probability as Area:
The probability that a continuous random variable X lies within a
specific interval [a, b] is the area under the curve of the PDF
between a and b:
P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx.
Properties of CDF:
1. Non-decreasing:
The CDF is always non-decreasing, since probability accumulates as x
increases: if x1 ≤ x2, then F(x1) ≤ F(x2).
2. Range: The CDF always takes values in the range [0, 1].
3. Limits:
The CDF has the following limits:
F(x) → 0 as x → −∞, and F(x) → 1 as x → +∞.
RELATIONSHIP BETWEEN THE PDF AND CDF
1. The CDF is the integral of the PDF: F(x) = ∫ from −∞ to x of f(t) dt.
2. The PDF is the derivative of the CDF: f(x) = dF(x)/dx.
Example: Let c be a constant and consider the following
density function for a continuous random variable Y:
f(y) = cy for 0 ≤ y ≤ 1, and f(y) = 0 elsewhere.
a. Find the value of c.
b. Find P(0.2 ≤ Y ≤ 0.5).
c. Find the cumulative distribution function of Y.
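Working the example: requiring ∫ cy dy over [0, 1] to equal 1 gives c/2 = 1, so c = 2, F(y) = y² on [0, 1], and P(0.2 ≤ Y ≤ 0.5) = F(0.5) − F(0.2) = 0.25 − 0.04 = 0.21. This can be checked numerically (a sketch under those answers; the helper names `f` and `F` are ours):

```python
# Numerical check of the worked example: f(y) = c*y on [0, 1], 0 elsewhere.
# Requiring total area 1 gives c/2 = 1, so c = 2 and F(y) = y**2 on [0, 1].
c = 2.0

def f(y):
    return c * y if 0.0 <= y <= 1.0 else 0.0

def F(y):  # CDF, clamped to [0, 1]
    return min(max(y, 0.0), 1.0) ** 2

# Midpoint Riemann sum to confirm the density integrates to 1.
n = 100_000
total = sum(f((i + 0.5) / n) for i in range(n)) / n
assert abs(total - 1.0) < 1e-6

# b. P(0.2 <= Y <= 0.5) = F(0.5) - F(0.2) = 0.25 - 0.04 = 0.21
print(F(0.5) - F(0.2))  # ≈ 0.21
```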
Common Distributions for Random Variables
Discrete Distributions:
Binomial Distribution: Used for a fixed number of
independent trials, each with two possible outcomes (success
or failure).
Poisson Distribution: Models the number of times an event
occurs within a fixed interval of time or space when the events
occur independently.
Continuous Distributions:
Normal Distribution: A bell-shaped distribution characterized
by its mean and standard deviation. It is widely used due to the
Central Limit Theorem.
Exponential Distribution: Often used to model the time until an
event occurs, such as the time between arrivals of customers at a
service point.
1. BINOMIAL DISTRIBUTION
A binomial distribution models the number of successes in n
independent trials with two outcomes (success/failure).
The PMF gives the probability of exactly k successes in n trials:
P(X = k) = C(n, k) p^k (1 − p)^(n − k), where
C(n, k) = n! / (k!(n − k)!) is the binomial coefficient, which gives
the number of ways to choose k successes out of n trials;
p^k is the probability of getting k successes;
(1 − p)^(n − k) is the probability of getting n − k failures.
The mean is μ = np and the variance is σ² = np(1 − p).
Example: A factory produces light bulbs, and the probability that a
bulb is defective is 0.1. If 12 bulbs are randomly selected from a
production batch, what is the probability that exactly 2 bulbs are
defective?
Solution
• Number of trials: n = 12 (12 bulbs are chosen).
• Probability of success (a bulb being defective): p = 0.1.
• We are interested in exactly 2 defective bulbs, so k = 2.
P(X = 2) = C(12, 2) (0.1)² (0.9)¹⁰ = 66 × 0.01 × 0.3487 ≈ 0.2301.
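The calculation can be reproduced with Python's standard library (a minimal sketch; `binom_pmf` is our helper name):

```python
from math import comb

# Binomial PMF: P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# n = 12 bulbs, p = 0.1 chance a bulb is defective, exactly k = 2 defective.
prob = binom_pmf(2, 12, 0.1)
print(round(prob, 4))  # ≈ 0.2301

# Mean np = 1.2 and variance np(1 - p) = 1.08 for this batch.
```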
2. POISSON DISTRIBUTION
A Poisson distribution is a discrete probability distribution, meaning that it
gives the probability of a discrete (i.e., countable) outcome.
For Poisson distributions, the discrete outcome is the number of times an
event occurs, represented by k.
You can use a Poisson distribution to predict or explain the number of events
occurring within a given interval of time or space.
Examples:
i. Number of accidents in Dodoma per week.
ii. Number of micro-organisms per millilitre.
You can use a Poisson distribution if:
i. Individual events happen at random and independently. That is,
the probability of one event doesn’t affect the probability of
another event.
ii. You know the mean number of events occurring within a given
interval of time or space. This number is called λ (lambda), and
it is assumed to be constant.
iii. When events follow a Poisson distribution, λ is the only thing you
need to know to calculate the probability of an event occurring a
certain number of times.
The probability of observing exactly k events in a given interval is
given by the Poisson probability formula:
P(X = k) = (λ^k e^(−λ)) / k!, where
λ is the average number of events (the rate parameter), and k is the
number of events we are interested in.
Mean and Variance:
For a Poisson random variable X with parameter λ:
Mean (Expected Value): μ = λ
Variance: σ² = λ
Example:
A call center receives an average of 6 calls per hour. What is the
probability that the call center will receive exactly 4 calls in the
next hour?
• Average number of calls per hour: λ = 6.
• We are interested in exactly 4 calls, so k = 4.
P(X = 4) = (6⁴ e⁻⁶) / 4! = (1296 / 24) e⁻⁶ = 54e⁻⁶ ≈ 0.1339.
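The same computation in Python (a sketch; `poisson_pmf` is our helper name):

```python
from math import exp, factorial

# Poisson PMF: P(X = k) = lam**k * exp(-lam) / k!
def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

# An average of lam = 6 calls per hour; probability of exactly k = 4 calls.
prob = poisson_pmf(4, 6)
print(round(prob, 4))  # ≈ 0.1339
```

Note that for the Poisson distribution the mean and variance are both λ, so λ = 6 is the only parameter needed.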
3. NORMAL DISTRIBUTION
The Normal Distribution is one of the most important and widely
used probability distributions in statistics.
It describes how values of a variable are distributed and is often
referred to as the "bell curve" because of its shape.
Key Features of Normal Distribution:
1.Shape: The normal distribution has a symmetrical, bell-shaped
curve. It is symmetric around its mean, with most of the data
clustered around the center.
2.Symmetry: The curve is symmetric, meaning the left and right sides
are mirror images of each other. The mean, median, and mode of a
normal distribution are all equal and located at the center.
3.Parameters: The normal distribution is defined by two parameters:
Mean (μ): The average or center of the distribution. It determines where the
peak of the curve lies.
Standard Deviation (σ): The spread or width of the distribution. It controls the
spread of the curve (i.e., how far the values tend to deviate from the mean).
4. Probability Density Function (PDF): The formula for the
probability density function of a normal distribution is:
f(x) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²)), −∞ < x < ∞,
where μ is the mean and σ > 0 is the standard deviation.
The following diagram presents graphs of f(x) for several different
(μ, σ) pairs.
Normal Distribution Standard Deviation
Generally, the normal distribution has any positive standard
deviation.
We know that the mean helps to determine the line of symmetry of
a graph, whereas
the standard deviation helps to know how far the data are spread
out.
If the standard deviation is smaller, the data are somewhat close
to each other and the graph becomes narrower.
If the standard deviation is larger, the data are dispersed more,
and the graph becomes wider.
The standard deviations are used to subdivide the area under the
normal curve. Each subdivided section defines the percentage of the
data that falls within it.
Thus the Empirical Rule states that:
Approximately 68% of the data falls within one standard deviation
of the mean. (i.e., Between Mean- one Standard Deviation and
Mean + one standard deviation)
Approximately 95% of the data falls within two standard deviations
of the mean. (i.e., Between Mean- two Standard Deviation and
Mean + two standard deviations)
Approximately 99.7% of the data fall within three standard
deviations of the mean. (i.e., Between Mean- three Standard
Deviation and Mean + three standard deviations)
NOTE: This means that data falling outside of three standard
deviations ("3-sigma") would signify rare occurrences.
STANDARD NORMAL DISTRIBUTION
The standard normal distribution, also called the z-
distribution, is a special normal distribution where the mean is 0
and the standard deviation is 1.
Any normal distribution can be standardized by converting its
values into z scores.
A normal distribution with μ = 0 and σ = 1 is called a standard
normal distribution.
Standardizing a normal distribution
When you standardize a normal distribution, the mean becomes 0
and the standard deviation becomes 1.
This allows you to easily calculate the probability of certain values
occurring in your distribution, or to compare data sets with different
means and standard deviations.
While data points are referred to as x in a normal distribution, they
are called z or z scores in the z distribution.
A z score is a standard score that tells you how many standard
deviations away from the mean an individual value (x) lies:
z = (x − μ) / σ.
The standard normal distribution is a probability distribution,
so the area under the curve between two points tells you the
probability of variables taking on a range of values.
The total area under the curve is 1 or 100%.
Once you have a z score, you can look up the corresponding
probability in a z table.
There are a few different formats for the z table
How to calculate a z score
To standardize a value from a normal distribution, convert the
individual value into a z-score:
i. Subtract the mean from your individual value.
ii. Divide the difference by the standard deviation.
EXAMPLE:
Suppose you work for a company and collect sales data from
a group of employees. The sales amounts follow a normal
distribution, with the mean sales being $115,000 and a
standard deviation of $15,000. You want to calculate the
probability that a randomly selected employee's sales will
exceed $138,000
Solution
The z score tells you how many standard deviations away $138,000 is
from the mean.
Step 1: Subtract the mean from the x value: 138000 − 115000 = 23000.
Step 2: Divide the difference by the standard deviation:
z = 23000 / 15000 ≈ 1.53.
The z score for a value of $138,000 is about 1.53. That means $138,000
is 1.53 standard deviations above the mean of your distribution.
Next, we can find the probability of this score using a z table:
P(X > 138000) = P(Z > 1.53) = 1 − 0.9370 = 0.0630.
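The full calculation can be sketched in Python, using the exact z value rather than the table-rounded one (`phi` is our helper for the standard normal CDF):

```python
from math import erf, sqrt

# Standard normal CDF via the error function.
def phi(z):
    return (1 + erf(z / sqrt(2))) / 2

mean, sd, x = 115_000, 15_000, 138_000

# Steps 1 and 2: subtract the mean, divide by the standard deviation.
z = (x - mean) / sd
print(round(z, 2))  # 1.53

# Tail probability P(X > 138000) = 1 - Phi(z), about 0.063.
print(round(1 - phi(z), 3))
```

The slight difference from a z-table answer comes from rounding z to two decimals before the lookup.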
EXAMPLE 2
The heights of adult women in a city follow a normal distribution
with a mean height of 64 inches and a standard deviation of 3
inches. What is the probability that a randomly selected adult
woman from this city has a height between 61 inches and 67
inches?
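Example 2 can be solved the same way: 61 and 67 inches are exactly one standard deviation below and above the mean, so the answer is the 68% figure from the Empirical Rule. A sketch (`phi` is our helper for the standard normal CDF):

```python
from math import erf, sqrt

# Standard normal CDF via the error function.
def phi(z):
    return (1 + erf(z / sqrt(2))) / 2

mean, sd = 64, 3
z_low = (61 - mean) / sd    # -1.0
z_high = (67 - mean) / sd   # +1.0

# P(61 <= X <= 67) = Phi(1) - Phi(-1), about 68% by the Empirical Rule.
print(round(phi(z_high) - phi(z_low), 4))  # 0.6827
```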