Day 02-Random Variable and Probability - Part (I)

Random Variable & Probability

© 2013 - 2017 ExcelR Solutions. All Rights Reserved


Discrete Random Variables: Random variables that can assume a countable number of values. If a random variable can only take a finite number of distinct values, it must be discrete.

Ex: the number of defective light bulbs in a box, or the number of children in a family.

Continuous Random Variables: Random variables that can assume any value corresponding to any of the points contained in one or more intervals. They are usually measurements. Things like heights, weights, and time are continuous random variables.

Ex: the time it takes to complete a race, or the length of time between arrivals at a hospital clinic.

Probability Distributions

•Continuous Probability Distributions


•Discrete Probability Distribution

Probability Distributions
Discrete Probability Distribution: The probability distribution of a discrete random variable is a graph, table or formula that specifies the probability associated with each possible value the random variable can assume.

Discrete probability distribution properties:

• The probabilities in a distribution sum to 1.
• Each value has a probability between 0 and 1.
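
The two properties can be checked mechanically. The following is a minimal sketch; the distribution for the number of defective bulbs in a box is hypothetical and its probabilities are made up for illustration.

```python
import math

# Hypothetical distribution: number of defective bulbs in a box -> probability.
# These values are illustrative only; they are not taken from the slides.
defective_bulbs_pmf = {0: 0.70, 1: 0.20, 2: 0.07, 3: 0.03}


def is_valid_discrete_distribution(pmf):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    each_in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    # isclose() tolerates tiny floating-point rounding in the sum.
    sums_to_one = math.isclose(sum(pmf.values()), 1.0)
    return each_in_range and sums_to_one


print(is_valid_discrete_distribution(defective_bulbs_pmf))
```

A table whose probabilities add up to more than 1, or that contains a negative entry, fails the check and cannot be a probability distribution.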

Discrete Probability Distribution:
Binomial Distribution

A binomial experiment is a statistical experiment that has the following properties:

⮚The experiment consists of n repeated trials.
⮚Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other a failure (for example, Yes or No).
⮚The probability of success, denoted by P, is the same on every trial.
⮚The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.

Example : Binomial Distribution

Consider the following statistical experiment.


You flip a coin 2 times and count the number of times the coin lands on heads.

This is a binomial experiment because:

Example : Binomial Distribution

⮚The experiment consists of n repeated trials: we flip a coin 2 times.
⮚Each trial can result in just two possible outcomes: heads or tails.
⮚The probability of success, denoted by P, is the same on every trial: it is 0.5 on every flip.
⮚The trials are independent; that is, getting heads on one trial does not affect whether we get heads on other trials.

Binomial Distribution

A binomial random variable is the number of successes x in n repeated trials of a binomial experiment.

The probability distribution of a binomial random variable is called a binomial distribution.

Suppose we flip a coin 2 times and count the number of heads (successes).

The binomial random variable is the number of heads, which can take on values of 0, 1, or 2. The binomial distribution is presented below.

Number of heads    Probability
0                  0.25
1                  0.50
2                  0.25
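
The distribution for the two-flip example can be computed directly. This is a short sketch using `scipy.stats.binom`, with n = 2 trials and success probability 0.5 taken from the coin example; the variable names are illustrative.

```python
from scipy import stats

n_flips = 2     # number of trials, from the coin example
p_heads = 0.5   # probability of success (heads) on each trial

# P(X = k) for k = 0, 1, 2 heads
head_counts = [0, 1, 2]
probabilities = stats.binom.pmf(head_counts, n_flips, p_heads)

for k, prob in zip(head_counts, probabilities):
    print(f"P(X = {k}) = {prob:.2f}")
```

The three probabilities are 0.25, 0.50 and 0.25, and they sum to 1, as any discrete probability distribution must.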

In class exercise:
Probability Calculation for discrete P.D

Continuous Probability Distributions
Continuous Probability Distribution: We can describe the probability distribution of a continuous random variable using a probability density function. A probability density function f(x) is a function that you can use to find the probabilities of a continuous variable across a range of values. It tells us what the shape of the probability distribution is.

Probability is all about how likely things are to happen, and the frequency tells you how often values occur. The higher the relative frequency, the higher the probability of values in that region occurring.
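
For a continuous variable, the probability of falling in a range is the area under f(x) over that range. The sketch below illustrates this, assuming a normal distribution with mean 0 and standard deviation 1 as the example density; the range [-1, 1] is an arbitrary choice.

```python
from scipy import integrate, stats

# Assumed example distribution: normal with mean 0 and standard deviation 1.
dist = stats.norm(loc=0, scale=1)
a, b = -1.0, 1.0

# Area under the density f(x) between a and b, by numerical integration.
area, _abs_error = integrate.quad(dist.pdf, a, b)

# The same probability from the cumulative distribution function.
prob_from_cdf = dist.cdf(b) - dist.cdf(a)

print(area, prob_from_cdf)
```

Both numbers agree, which is why the later slides compute probabilities with `cdf` rather than integrating the density by hand.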

Normal Distribution

It is one of the most famous statistical distributions in use. The normal distribution is a continuous probability distribution. Several phenomena are modelled with the normal distribution.
For example, heights of people are approximately normally distributed, as are blood pressure levels.

Normal Distribution
The Properties of the Normal Distribution:
A continuous random variable X has a normal distribution if its values fall into a smooth curve that is bell shaped.
• Every normal distribution has its own mean (denoted μ) and its own standard deviation (denoted σ). The normal distribution is defined by its mean and standard deviation.
• The shape of the normal distribution is symmetric around the mean.
• The mean, median, and mode of a normal distribution are equal.
• The area under the curve is 1.
• Normal distributions are denser in the center and less so in the tails.
• Since the normal distribution is mound shaped, it follows the empirical rule, i.e.:
i) 68% of the area of a normal distribution is within 1 standard deviation of the mean;
ii) 95% of the area is within 2 standard deviations of the mean;
iii) 99.7% of the area is within 3 standard deviations of the mean.
Normal Distribution
Data can be "distributed" (spread out) in different ways. But there are many cases where the data tends to be around a central value with no bias left or right, and it gets close to a "Normal Distribution" like this:

Sample(1:10000,100)
Normal Distribution
The Normal Distribution has:
• mean = median = mode
• symmetry about the center
• 50% of values less than the mean and 50% greater than the mean

68% of values are within 1 standard deviation of the mean
95% of values are within 2 standard deviations of the mean
99.7% of values are within 3 standard deviations of the mean
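
The 68%, 95% and 99.7% figures can be reproduced from the normal cumulative distribution function. This is a small sketch using the standard normal distribution; the name `coverage` is illustrative.

```python
from scipy import stats

# Standard normal (mean 0, standard deviation 1). The percentages are the same
# for any normal distribution because they are measured in standard deviations.
coverage = {}
for k in (1, 2, 3):
    # Area between k standard deviations below and above the mean.
    coverage[k] = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"within {k} standard deviation(s): {coverage[k]:.4f}")
```

The printed areas round to 68%, 95% and 99.7%, so the empirical rule is a rounded statement of these values.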
Z- Scores
A z-score uses the mean and the standard deviation of the data set to specify the relative location of a measurement.

It represents the distance between a given data point and the mean, expressed in standard deviations. Computing the z-score is also known as "standardizing" the data point.

• A large positive z-score tells us that the measurement is larger than almost all other measurements in the data set.
• Similarly, a large negative z-score tells us that the measurement is smaller than almost all other measurements.
• If a z-score is 0, then the observation lies on the mean.
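
In formula form, the z-score of a value x is (x - mean) / standard deviation. The sketch below applies that formula; the helper name `z_score` and the sample values (mean 60, standard deviation 10) are assumptions chosen for illustration.

```python
def z_score(x, mean, std):
    """Distance of x from the mean, expressed in standard deviations."""
    if std <= 0:
        raise ValueError("std must be positive")
    return (x - mean) / std


# Hypothetical data set with mean 60 and standard deviation 10.
print(z_score(70, 60, 10))  # 1.0: one standard deviation above the mean
print(z_score(60, 60, 10))  # 0.0: the observation lies on the mean
print(z_score(35, 60, 10))  # -2.5: well below the mean
```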

Probability calculation for Normal Distribution
Consider a normally distributed random variable X ~ N(μ, σ²).
To compute the probability P(X <= x):

from scipy import stats

# P(X <= x): cdf gives the area under the curve to the left of x (e.g. x = 32)
stats.norm.cdf(x, loc=mean, scale=std)
# P(X > x): subtract the cdf from the total area under the curve, which is 1
1 - stats.norm.cdf(x, loc=mean, scale=std)

1. x: the value of the random variable at which P(X <= x) is evaluated.
2. loc: the location parameter of the distribution. It is the mean for the normal distribution.
3. scale: the scale parameter of the distribution. It is the standard deviation for the normal distribution.

Calculating Probabilities : Normal distribution
Ex: A normally distributed random variable X has a mean of 60 and a standard deviation of 10. Find the probability that X is less than 70.

from scipy import stats

# P(X < 70) for X ~ N(60, 10^2); x = 70 is one standard deviation above the mean
stats.norm.cdf(70, loc=60, scale=10)  # approximately 0.8413

In class exercise

Exercise

Exercise

Normal Distribution Example and Application

Stock Price
To understand the normal distribution and its application, let us use daily returns of stocks traded on the BSE (Bombay Stock Exchange). Imagine a scenario where an investor wants to understand the risks and returns associated with various stocks before investing in them.

For this analysis, we will evaluate two stocks: BEML and GLAXO. The daily trading data (open and close price) for each stock is taken for the period from 2010 to 2016 from the BSE site (www.bseindia.com).

Data

BEML

The dataset contains daily Open and Close prices along with daily High and Low prices, Total Trade Quantity, and Turnover (Lacs).

Our discussion will involve only the close price.

The daily returns of a stock are calculated as the change in close price with respect to the previous day's close price.

GLAXO

What questions can be answered?

1. What is the expected daily rate of return of these stocks?
2. Which stock has higher risk or volatility as far as daily returns are concerned?
3. Which stock has a higher probability of making a daily return of 2% or more?
4. Which stock has a higher probability of making a loss (risk) of 2% or more?

To answer the above questions, we must find out the behavior of daily returns (we will refer to this as gain henceforth) on these stocks. The gain can be calculated as the percentage change in close price from the previous day's close price.

The pandas method pct_change() gives the change between a column value and the value a given number of rows earlier, as a fraction of the earlier value (0.02 means 2%). The size of the shift is passed as the periods parameter.
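
The gain calculation can be sketched as follows. The close prices, the dates, and the column names `Close` and `gain` are hypothetical; the real analysis uses the BSE data files.

```python
import pandas as pd

# Hypothetical close prices for illustration; not the actual BSE data.
prices = pd.DataFrame(
    {"Close": [100.0, 102.0, 99.96, 104.958]},
    index=pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07"]),
)

# gain = (today's close - previous close) / previous close
prices["gain"] = prices["Close"].pct_change(periods=1)

# The first row has no previous day, so its gain is NaN; drop it.
gains = prices["gain"].dropna()

print(gains)
print("mean:", gains.mean(), "std:", gains.std())
```

With these made-up prices the three gains are 2%, -2% and 5%. The mean and standard deviation of the gain column are the two quantities used in the rest of the example.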

Mean and Variance

Glaxo:
Mean: 0.0004
Standard Deviation: 0.0134

BEML:
Mean: 0.0003
Standard Deviation: 0.0264

Gain seems to be normally distributed for both stocks, with a mean around 0.00.
BEML seems to have a higher variance than Glaxo.

Note: This distribution has a long tail, but we will assume a normal distribution for simplicity and discuss the example.

The expected daily rate of return (gain) is around 0% for both stocks.

Here, the variance or standard deviation of gain indicates risk. So, BEML stock has a higher risk, as the standard deviation of BEML is 2.64% whereas the standard deviation for Glaxo is 1.34%.

To calculate the probability of a gain of 2% or more, we need the total probability that gain takes values of 0.02 (i.e., 2%) or higher. Under the normal assumption, this is the area under the curve to the right of 0.02. Likewise, the probability of a loss of 2% or more is the area to the left of -0.02.

Probability of making 2% loss or higher in Glaxo: 0.063 or 6.3%

Probability of making 2% loss or higher in BEML: 0.2215 or 22.15%

Probability of making 2% gain or higher in Glaxo: 0.0710 or 7.1%

Probability of making 2% gain or higher in BEML: 0.2276 or 22.76%
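
These tail probabilities can be sketched with `stats.norm.cdf`, assuming gain is normally distributed. The inputs below are the rounded means and standard deviations reported earlier, so the results will differ slightly from figures computed on unrounded data; the dictionary layout and key names are illustrative.

```python
from scipy import stats

# Rounded mean and standard deviation of daily gain, as reported in the slides.
stocks = {
    "Glaxo": {"mean": 0.0004, "std": 0.0134},
    "BEML": {"mean": 0.0003, "std": 0.0264},
}

results = {}
for name, params in stocks.items():
    dist = stats.norm(loc=params["mean"], scale=params["std"])
    results[name] = {
        "loss_2pct_or_more": dist.cdf(-0.02),     # P(gain <= -0.02)
        "gain_2pct_or_more": 1 - dist.cdf(0.02),  # P(gain >= 0.02)
    }

for name, probs in results.items():
    print(name, probs)
```

BEML has the larger probability in both tails, which is consistent with its higher standard deviation.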
Normal Distribution codes

Student's t-Distribution codes
