Day 02-Random Variable and Probability - Part (I)
Continuous Random Variables: Random variables that can assume any value corresponding to any of the
points contained in one or more intervals. They are usually measurements: things like heights, weights, and
time are continuous random variables.
Ex: The time it takes to complete a race, or the length of time between arrivals at a hospital clinic.
Probability Distributions
Discrete Probability Distribution: The probability distribution of a discrete random variable is a graph, table
or formula that specifies the probability associated with each possible value the random variable can
assume.
Discrete Probability Distribution: Binomial Distribution
Example : Binomial Distribution
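As an illustration of a discrete probability distribution given by a formula, here is a minimal sketch of the binomial probabilities. The values n = 10 trials and p = 0.5 are assumptions chosen for illustration, and `binomial_pmf` is a hypothetical helper name; neither comes from the slides.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Assumed illustrative values: 10 independent trials, success probability 0.5.
n, p = 10, 0.5

# The full distribution: one probability for each possible value 0..n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)             # probabilities of all possible values sum to 1
p_exactly_3 = pmf[3]         # P(X = 3)  = 120/1024
p_at_most_3 = sum(pmf[:4])   # P(X <= 3) = 176/1024

print(f"P(X = 3)  = {p_exactly_3:.4f}")
print(f"P(X <= 3) = {p_at_most_3:.4f}")
print(f"Sum of all probabilities = {total:.4f}")
```

The list `pmf` is the "table" form of the distribution, while `binomial_pmf` is its "formula" form.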
Binomial Distribution
In-class exercise:
Probability calculation for a discrete probability distribution
Continuous Probability Distributions
Continuous Probability Distribution: We can describe the probability distribution of a continuous random
variable using a probability density function. A probability density function f(x) is a function that you can use
to find the probabilities of a continuous variable across a range of values: the probability of a range is the
area under f(x) over that range. It also tells us what the shape of the probability distribution is.
Probability is about how likely things are to happen, and frequency tells you how often values occur.
The higher the relative frequency of values in a range, the higher the probability of a value occurring in that range.
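The idea that a density function gives probabilities over a range can be sketched as follows. This is a minimal example assuming a standard normal density and the interval [-0.5, 1.5], both chosen only for illustration. It compares a numerical area under f(x) with the value obtained from the cumulative distribution function.

```python
from statistics import NormalDist

# Assumed example distribution: standard normal (mean 0, standard deviation 1).
dist = NormalDist(mu=0.0, sigma=1.0)

# Assumed example interval.
a, b = -0.5, 1.5

# Approximate the area under the density f(x) between a and b
# with the midpoint rule (many thin rectangles).
steps = 10_000
width = (b - a) / steps
area = sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# The same probability from the cumulative distribution function.
exact = dist.cdf(b) - dist.cdf(a)

print(f"Area under f(x) on [{a}, {b}]: {area:.6f}")
print(f"cdf(b) - cdf(a):             {exact:.6f}")
```

Note that for a continuous variable the probability of any single exact value is 0; only ranges carry probability.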
Normal Distribution
It is one of the most widely used statistical distributions. The normal distribution is a continuous
probability distribution, and many phenomena are modelled with it.
For example: Heights of people are approximately normally distributed, as are blood pressure levels.
The Properties of the Normal Distribution:
A continuous random variable X has a normal distribution if its values fall into a smooth curve that is bell
shaped.
• Every normal distribution has its own mean (denoted μ) and its own standard deviation (denoted σ). The
normal distribution is defined by its mean and standard deviation.
• The shape of the normal distribution is symmetric around the mean.
• The mean, median, and the mode of a normal distribution are equal.
• The area under the curve is 1.
• Normal distributions are denser in the center and less so in the tails.
• Since the normal distribution is mound shaped, it follows the empirical rule, i.e.
i) about 68% of the area of a normal distribution is within one standard deviation of the mean;
ii) about 95% of the area is within 2 standard deviations of the mean;
iii) about 99.7% of the area is within 3 standard deviations of the mean.
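The empirical rule can be checked numerically. The sketch below assumes a normal distribution with mean 60 and standard deviation 10 (illustrative values only; the percentages are the same for any mean and standard deviation) and also includes the 3 standard deviation case.

```python
from statistics import NormalDist

# Assumed illustrative parameters; any mean and standard deviation give the same coverage.
mu, sigma = 60.0, 10.0
dist = NormalDist(mu=mu, sigma=sigma)

# Area within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {
    k: dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    for k in (1, 2, 3)
}

for k, area in coverage.items():
    print(f"Within {k} standard deviation(s): {area:.4f}")
```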
Data can be "distributed" (spread out) in different ways.
But there are many cases where the data tends to be around a central value with no bias left or right,
and it gets close to a "Normal Distribution" like this:
Sample(1:10000,100)
The Normal Distribution has:
• mean = median = mode
• Symmetry about the center
• 50% of values less than the mean and 50% greater than the mean
The z-score represents the distance between a given data point and the mean, expressed in standard
deviations: z = (x − μ) / σ. Computing the z-score is also known as "standardizing" the data point.
• A large positive z-score tells us that the measurement is larger than almost all other measurements in the data set.
• Similarly, a large negative z-score tells us that the measurement is smaller than almost all other measurements.
• If a z-score is 0, then the observation lies on the mean.
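A minimal sketch of standardizing data points, assuming a mean of 60 and a standard deviation of 10 for illustration; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, std):
    """Distance of x from the mean, expressed in standard deviations."""
    return (x - mean) / std

# Assumed illustrative parameters.
mean, std = 60.0, 10.0

z_above = z_score(70.0, mean, std)    # one standard deviation above the mean
z_at_mean = z_score(60.0, mean, std)  # exactly on the mean
z_below = z_score(35.0, mean, std)    # well below the mean

print(z_above, z_at_mean, z_below)    # prints: 1.0 0.0 -2.5
```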
Probability calculation for Normal Distribution
Consider a normally distributed random variable X ~ N(μ, σ²).
To compute the probability P(X ≤ x), standardize x to a z-score: P(X ≤ x) = P(Z ≤ (x − μ)/σ), where Z ~ N(0, 1).
Calculating Probabilities: Normal distribution
Ex: Suppose a normally distributed random variable X has a mean of 60 and a standard deviation of 10.
Find the probability that X is less than 70.
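This example can be worked as a short sketch using Python's standard-library `statistics.NormalDist` (a choice made here, not necessarily the tool used in class). It computes the probability directly and again via the z-score, to show that both routes agree.

```python
from statistics import NormalDist

mu, sigma, x = 60.0, 10.0, 70.0

# Route 1: cumulative probability directly from N(60, 10^2).
p_direct = NormalDist(mu=mu, sigma=sigma).cdf(x)

# Route 2: standardize first, then use the standard normal N(0, 1).
z = (x - mu) / sigma                  # z = 1.0
p_via_z = NormalDist().cdf(z)

print(f"z = {z}")
print(f"P(X < 70) = {p_direct:.4f}")  # about 0.8413
```

Since 70 is exactly one standard deviation above the mean, the answer is the area to the left of z = 1.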
In-class exercise
Exercise
Normal Distribution Example and Application
Stock Price
To understand the normal distribution and its application, let us use daily returns of stocks traded on the BSE
(Bombay Stock Exchange). Imagine a scenario where an investor wants to understand the risks and returns
associated with various stocks before investing in them.
For this analysis, we will evaluate two stocks: BEML and GLAXO. The daily trading data (open and close
price) for each stock is taken for the period from 2010 to 2016 from the BSE site (www.bseindia.com).
Data
BEML
GLAXO
What questions can be answered?
To answer the above questions, we must find out the behavior of daily returns (we will refer to this as
gain henceforward) on these stocks. The gain can be calculated as the percentage change in close price
from the previous day's close price.
The method pct_change() in Pandas gives the change in a column value relative to the value a given number
of rows earlier, expressed as a fraction (0.02 means 2%). That number of rows is passed as the parameter periods.
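A minimal sketch of the gain calculation with `pct_change()`. The prices and the column name `Close` are hypothetical placeholders; the actual BEML and GLAXO data are not reproduced here.

```python
import pandas as pd

# Hypothetical closing prices (not the real BSE data); "Close" is an assumed column name.
prices = pd.DataFrame({"Close": [100.0, 102.0, 99.96, 101.9592]})

# gain_t = (close_t - close_{t-1}) / close_{t-1}
# periods=1 compares each row with the row immediately before it.
prices["gain"] = prices["Close"].pct_change(periods=1)

# The first day has no previous close, so its gain is NaN; drop that row.
prices = prices.dropna()

print(prices)
```

With these placeholder prices the gains are +2%, -2%, and +2%, stored as 0.02, -0.02, and 0.02.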
Mean and Variance
Glaxo :
Mean: 0.0004
Standard Deviation: 0.0134
BEML:
Mean: 0.0003
Standard Deviation: 0.0264
Gain seems to be approximately normally distributed for both stocks, with a mean around 0.00.
BEML has a higher standard deviation (and hence a higher variance) than Glaxo, so its daily gain is more volatile.
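The summary statistics above can be obtained from a gain column as in the following sketch. The gain values are hypothetical placeholders, not the BEML or Glaxo data.

```python
import pandas as pd

# Hypothetical daily gains (fractions, so 0.02 means 2%); not the real stock data.
gain = pd.Series([0.02, -0.02, 0.02, -0.01, 0.0])

mean_gain = gain.mean()   # average daily gain
std_gain = gain.std()     # sample standard deviation (divides by n - 1)

print(f"Mean: {mean_gain:.4f}")
print(f"Standard Deviation: {std_gain:.4f}")
```

A stock with a larger standard deviation of gain has daily returns that spread further from the mean, which is one common way to express risk.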
Note: This distribution has a long tail, but we will assume a normal distribution for simplicity when discussing the example.
The expected daily rate of return (gain) is around 0% for both stocks.
To calculate the probability of a gain of 2% or more, we need to find the total probability that gain takes
values above 0.02 (i.e., 2%), which is the area under the density curve to the right of 0.02. Similarly, the
probability of a loss of 2% or more is the area to the left of -0.02.
Probability of making a 2% loss or higher in Glaxo: 0.063 or 6.3%
Probability of making a 2% loss or higher in BEML: 0.2215 or 22.15%
Probability of making a 2% gain or higher in Glaxo: 0.0710 or 7.1%
Probability of making a 2% gain or higher in BEML: 0.2276 or 22.76%
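These tail probabilities can be approximated with the sketch below. It assumes a normal distribution for gain and uses the rounded means and standard deviations reported earlier, so the results are close to, but not exactly, the figures listed above.

```python
from statistics import NormalDist

# Rounded mean and standard deviation of daily gain, as reported for each stock.
stocks = {
    "Glaxo": NormalDist(mu=0.0004, sigma=0.0134),
    "BEML": NormalDist(mu=0.0003, sigma=0.0264),
}

tail_probs = {}
for name, dist in stocks.items():
    p_loss = dist.cdf(-0.02)      # P(gain <= -0.02): area to the left of -2%
    p_gain = 1 - dist.cdf(0.02)   # P(gain >= 0.02): area to the right of +2%
    tail_probs[name] = (p_loss, p_gain)
    print(f"{name}: P(2% loss or more) = {p_loss:.4f}, "
          f"P(2% gain or more) = {p_gain:.4f}")
```

Under this model BEML's larger standard deviation makes both a 2% loss and a 2% gain far more likely than for Glaxo.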
Normal Distribution codes
Student's t-Distribution codes