
Week 4 - Probability Descriptive Statistics Cont (Post-Class)

The document outlines the learning objectives for Week 4 of AFM 323, focusing on bivariate probability distributions, including joint, marginal, and conditional distributions, as well as covariance and correlation. It discusses the extension of probability concepts from univariate to bivariate distributions, providing examples and properties of joint and marginal distributions. Additionally, it covers the importance of exploratory data analysis and descriptive statistics in the context of financial asset returns.

Uploaded by

yukttha.s
Copyright © All Rights Reserved

Welcome to AFM 323: Quantitative Foundations for Finance
Week 4
Learning Objectives
• Bivariate discrete probability distributions
• Bivariate continuous probability distributions
• Joint, marginal and conditional distributions
• Covariance and correlation
• Univariate to multivariate framework – portfolio of assets
• Linear combinations of normally distributed random variables
• Exploratory Data Analysis for Financial Asset Returns
• Descriptive Statistics
• Univariate Descriptive Statistics continued
• Empirical CDF
• Value-at-Risk
• Additional Measures of Dispersion
• QQ Plots
• Bivariate descriptive statistics
• Covariance and correlation

Bivariate Distributions
• So far we have considered the probability distributions of univariate random variables, such as the Suncor stock return or the S&P TSX Composite index return.
• Now, we extend the definitions and concepts to two random variables, say X and Y: X = Suncor stock return and Y = S&P TSX Composite index return.
• First, we discuss how to extend the concept of a probability distribution of a single random variable X to a joint probability distribution of two random variables X and Y.
• We want to know if there is any relation between X and Y.
• In particular, we want to know the probability that X takes on a particular value x and Y takes on a particular value y.
• That is, we want to determine p(x, y) = Pr(X = x, Y = y).
• This joint probability distribution function determines the likelihood that the random variables X and Y take on values in their joint sample space.

Bivariate Distribution - Example
• Consider two discrete random variables: the monthly return on Suncor stock (in percent), labelled X, and the monthly return on Encana, labelled Y.
• For simplicity we assume that the sample spaces for X and Y are \( S_X = \{0, 1, 2, 3\} \) and \( S_Y = \{0, 1\} \), respectively, so that the random variables X and Y are discrete.
• The joint sample space for X and Y is a two-dimensional grid:
  \( S_{XY} = \{(0,0), (0,1), (1,0), (1,1), (2,0), (2,1), (3,0), (3,1)\} \)

Joint Distribution - Example
• The joint distribution for X and Y is given by the following table:

  X \ Y     Y = 0    Y = 1
  X = 0      1/8       0
  X = 1      2/8      1/8
  X = 2      1/8      2/8
  X = 3       0       1/8

• Now, we can determine the probability that X takes on a particular value x and Y takes on a particular value y, i.e., p(x, y) = Pr(X = x, Y = y), from the values in the table.
• Example: p(0, 0) = Pr(X = 0, Y = 0) = 1/8; p(1, 1) = Pr(X = 1, Y = 1) = 1/8
• This is a joint probability distribution function because it makes a statement about the probability of two events occurring together.
• The bivariate distribution can be illustrated graphically as a 3-dimensional bar chart.

Properties of a Joint pdf p(x, y)
• The joint sample space for X and Y: \( S_{XY} = S_X \times S_Y = \{(x, y) : x \in S_X,\ y \in S_Y\} \)
• The joint probability distribution function for X and Y is nonnegative for all x and y in the joint sample space: \( p(x, y) \ge 0 \) for \( (x, y) \in S_{XY} \)
  – e.g., p(0,0) = 1/8, p(1,0) = 2/8, p(2,0) = 1/8, p(3,0) = 0, p(0,1) = 0, p(1,1) = 1/8, p(2,1) = 2/8, p(3,1) = 1/8
• The joint probability distribution function for X and Y is zero for all x and y not in the joint sample space: \( p(x, y) = 0 \) for \( (x, y) \notin S_{XY} \)
• The joint probability distribution function for X and Y sums to 1 over the joint sample space: \( \sum_{x \in S_X} \sum_{y \in S_Y} p(x, y) = 1 \)

  p(0,0) + p(0,1) + p(1,0) + p(1,1) + p(2,0) + p(2,1) + p(3,0) + p(3,1) = 1/8 + 0 + 2/8 + 1/8 + 1/8 + 2/8 + 0 + 1/8 = 1
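The three properties can be checked mechanically for the example table. Below is a minimal Python sketch. It assumes the joint pdf is stored as a dictionary keyed by (x, y) pairs; the names `joint` and `is_valid_joint_pdf` are illustrative and do not come from the course files. Exact fractions avoid rounding error in the sum. Pairs outside S_XY are simply not stored, which means they have probability zero.

```python
from fractions import Fraction as F

# Joint pdf p(x, y) from the example; keys are the points of S_XY
joint = {
    (0, 0): F(1, 8), (0, 1): F(0),
    (1, 0): F(2, 8), (1, 1): F(1, 8),
    (2, 0): F(1, 8), (2, 1): F(2, 8),
    (3, 0): F(0),    (3, 1): F(1, 8),
}

def is_valid_joint_pdf(p):
    """True if every probability is nonnegative and they sum to exactly 1."""
    nonnegative = all(prob >= 0 for prob in p.values())
    sums_to_one = sum(p.values()) == 1
    return nonnegative and sums_to_one

# Points outside S_XY are not stored, so they get probability 0
print(is_valid_joint_pdf(joint), joint.get((4, 0), 0))  # True 0
```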

Marginal Distribution
• The joint probability distribution tells us the probability of X and Y occurring together.
• What if we only want to know the probability of X occurring or the probability of Y occurring?
• Suppose that we want to find Pr(X = 0) and Pr(Y = 1) from a given joint distribution.
• Consider the joint distribution table for X and Y from the example.
• What is Pr(X = 0), regardless of the value of Y?
• X = 0 can occur together with Y = 0 or with Y = 1, and since these two events are mutually exclusive we have:
  Pr(X = 0) = Pr(X = 0, Y = 0) + Pr(X = 0, Y = 1) = 1/8 + 0 = 1/8
• Notice that this probability is equal to the horizontal (row) sum of the probabilities in the table at X = 0.
• Similarly, Pr(X = 1) = 3/8, Pr(X = 2) = 3/8 and Pr(X = 3) = 1/8.

Marginal Probability
• Consider the same joint distribution table.
• What is Pr(Y = 0), regardless of the value of X?
• Y = 0 can occur together with X = 0, 1, 2 or 3. These events are mutually exclusive, so
  Pr(Y = 0) = Pr(X = 0, Y = 0) + Pr(X = 1, Y = 0) + Pr(X = 2, Y = 0) + Pr(X = 3, Y = 0) = 1/8 + 2/8 + 1/8 + 0 = 4/8
• Notice that this probability is equal to the vertical (column) sum of the probabilities in the table at Y = 0.
• Similarly, Pr(Y = 1) = 4/8.

Marginal Probability
• The probability Pr(X = x) is the marginal probability distribution function of X. In general it is given by
  \( p(x) = \Pr(X = x) = \sum_{y \in S_Y} p(x, y) \)
• Similarly, the probability Pr(Y = y) is the marginal probability distribution function of Y. In general it is given by
  \( p(y) = \Pr(Y = y) = \sum_{x \in S_X} p(x, y) \)
• It is called a marginal probability distribution function because it depends only on totals found in the margins of the table.
• The marginal probabilities of X = x appear in the last column of the table (with margins): 1/8, 3/8, 3/8, 1/8 for x = 0, 1, 2, 3.
• The marginal probabilities of Y = y appear in the last row of the table: 1/2, 1/2 for y = 0, 1.
• Notice that each set of marginal probabilities sums to 1.
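The row and column sums can be sketched in Python by reusing the example's joint pdf. The helper name `marginals` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

# Joint pdf p(x, y) from the example
joint = {
    (0, 0): F(1, 8), (0, 1): F(0),
    (1, 0): F(2, 8), (1, 1): F(1, 8),
    (2, 0): F(1, 8), (2, 1): F(2, 8),
    (3, 0): F(0),    (3, 1): F(1, 8),
}

def marginals(p):
    """Sum the joint pdf over y (row sums) for p(x) and over x (column sums) for p(y)."""
    p_x, p_y = defaultdict(F), defaultdict(F)
    for (x, y), prob in p.items():
        p_x[x] += prob
        p_y[y] += prob
    return dict(p_x), dict(p_y)

p_x, p_y = marginals(joint)
for x, prob in p_x.items():
    print(f"Pr(X = {x}) = {prob}")   # 1/8, 3/8, 3/8, 1/8
for y, prob in p_y.items():
    print(f"Pr(Y = {y}) = {prob}")   # 1/2, 1/2
```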

Conditional Probability
• Suppose that we know that Y = 0.
• How does this knowledge affect the probability that X = 0, 1, 2, or 3? Put differently, how can we use this information to update the probability that X = 0, 1, 2, or 3?
  – i.e., what are Pr(X = 0 | Y = 0), Pr(X = 1 | Y = 0), Pr(X = 2 | Y = 0) and Pr(X = 3 | Y = 0)?
• Similarly, suppose that we know Y = 1.
  – How does this knowledge affect the probability that X = 0, 1, 2, or 3?
  – i.e., what are Pr(X = 0 | Y = 1), Pr(X = 1 | Y = 1), Pr(X = 2 | Y = 1) and Pr(X = 3 | Y = 1)?
• The answer is conditional probability.
• Suppose that we know that Y = 0. Using the definition of conditional probability:
  \( \Pr(X = 0 \mid Y = 0) = \frac{\Pr(X = 0, Y = 0)}{\Pr(Y = 0)} = \frac{1/8}{4/8} = \frac{1}{4} \)
• Pr(X = 0 | Y = 0) means the probability that X = 0 given that Y = 0.
• Pr(X = 0 | Y = 0) = 1/4. Similarly, Pr(X = 1 | Y = 0) = 2/4, Pr(X = 2 | Y = 0) = 1/4 and Pr(X = 3 | Y = 0) = 0.

Conditional Probability
• Pr(X = 0 | Y = 0) = 1/4 > Pr(X = 0) = 1/8
• Hence, knowing that Y = 0 increases the likelihood that X = 0.
• Clearly, X depends on Y. Knowing that Y = 0 gives a higher probability that X = 0 (1/4) than not knowing it, in which case the probability that X = 0 is 1/8.
• In contrast, the marginal probability Pr(X = 0) ignores information about Y.
• Now suppose that we know that X = 0.
• How does this knowledge affect the probability that Y = 0? To find out, we compute
  \( \Pr(Y = 0 \mid X = 0) = \frac{\Pr(X = 0, Y = 0)}{\Pr(X = 0)} = \frac{1/8}{1/8} = 1 \)
• Notice that Pr(Y = 0 | X = 0) = 1 > Pr(Y = 0) = 1/2.
• That is, knowledge that X = 0 makes it certain that Y = 0.

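Conditional Probability
• Similarly, we can calculate the other conditional probabilities, e.g., Pr(Y = 0 | X = 1) = (2/8)/(3/8) = 2/3 and Pr(Y = 1 | X = 1) = (1/8)/(3/8) = 1/3.
• In general, the conditional probability that X = x given that Y = y (provided that Pr(Y = y) ≠ 0) is
  \( \Pr(X = x \mid Y = y) = \frac{\Pr(X = x, Y = y)}{\Pr(Y = y)} \)
• The conditional probability that Y = y given that X = x (provided that Pr(X = x) ≠ 0) is
  \( \Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)} \)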
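Both conditional formulas can be sketched in Python on the example table. The function names are hypothetical. Each function refuses to divide by a zero marginal probability, which mirrors the "provided that" conditions above.

```python
from fractions import Fraction as F

# Joint pdf p(x, y) from the example
joint = {
    (0, 0): F(1, 8), (0, 1): F(0),
    (1, 0): F(2, 8), (1, 1): F(1, 8),
    (2, 0): F(1, 8), (2, 1): F(2, 8),
    (3, 0): F(0),    (3, 1): F(1, 8),
}

def conditional_x_given_y(p, y):
    """Pr(X = x | Y = y) = p(x, y) / Pr(Y = y) for each x; requires Pr(Y = y) > 0."""
    p_y = sum(prob for (_, yy), prob in p.items() if yy == y)
    if p_y == 0:
        raise ValueError("Pr(Y = y) = 0, so the conditional pdf is undefined")
    return {x: prob / p_y for (x, yy), prob in p.items() if yy == y}

def conditional_y_given_x(p, x):
    """Pr(Y = y | X = x) = p(x, y) / Pr(X = x) for each y; requires Pr(X = x) > 0."""
    p_x = sum(prob for (xx, _), prob in p.items() if xx == x)
    if p_x == 0:
        raise ValueError("Pr(X = x) = 0, so the conditional pdf is undefined")
    return {y: prob / p_x for (xx, y), prob in p.items() if xx == x}

print(conditional_x_given_y(joint, 0))  # x = 0, 1, 2, 3 -> 1/4, 1/2, 1/4, 0
print(conditional_y_given_x(joint, 0))  # y = 0, 1 -> 1, 0
```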

Independence
• Let X and Y be two discrete random variables with:
  – pdfs: p(x), p(y)
  – sample spaces: \( S_X \), \( S_Y \)
  – joint pdf: p(x, y)
• Then X and Y are (statistically) independent random variables if and only if the joint pdf of X and Y is the product of the individual pdfs:
  \( p(x, y) = p(x)\,p(y) \) for all \( x \in S_X \) and \( y \in S_Y \)
• If X and Y are independent random variables, then the conditional pdf of X given Y (or of Y given X) is equal to its respective marginal pdf:
  \( p(x \mid y) = p(x) \) whenever \( p(y) > 0 \), and \( p(y \mid x) = p(y) \) whenever \( p(x) > 0 \)
• Intuition
  – X and Y are independent if knowledge of X does not influence probabilities associated with Y, and knowledge of Y does not influence probabilities associated with X.
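The product rule gives a direct test of independence. The minimal Python sketch below applies it to the example table. It also builds a second, hypothetical table from the product of the same marginals, which is independent by construction.

```python
from fractions import Fraction as F

# Joint pdf p(x, y) from the example
joint = {
    (0, 0): F(1, 8), (0, 1): F(0),
    (1, 0): F(2, 8), (1, 1): F(1, 8),
    (2, 0): F(1, 8), (2, 1): F(2, 8),
    (3, 0): F(0),    (3, 1): F(1, 8),
}

def are_independent(p):
    """X, Y independent iff p(x, y) = p(x) p(y) for every x in S_X and y in S_Y."""
    xs = {x for x, _ in p}
    ys = {y for _, y in p}
    p_x = {x: sum(p.get((x, y), 0) for y in ys) for x in xs}
    p_y = {y: sum(p.get((x, y), 0) for x in xs) for y in ys}
    return all(p.get((x, y), 0) == p_x[x] * p_y[y] for x in xs for y in ys)

print(are_independent(joint))  # False: p(0, 1) = 0 but p(0) p(1) = 1/8 * 1/2 = 1/16

# Hypothetical table built as the product of the same marginals: independent by construction
marg_x = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}
marg_y = {0: F(1, 2), 1: F(1, 2)}
product_table = {(x, y): marg_x[x] * marg_y[y] for x in marg_x for y in marg_y}
print(are_independent(product_table))  # True
```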

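Bivariate Distributions for Continuous RV
• The joint pdf of continuous rv's X and Y is a non-negative function f(x, y) such that
  \( \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1 \)
• The three-dimensional plot of the joint probability distribution gives a probability surface whose total volume is unity.
• Let [x1, x2] and [y1, y2] be intervals on the real line. Then
  \( \Pr(x_1 \le X \le x_2,\ y_1 \le Y \le y_2) = \int_{x_1}^{x_2} \int_{y_1}^{y_2} f(x, y)\,dy\,dx \)
• Example of a bivariate standard normal distribution:
  \( f(x, y) = \frac{1}{2\pi} e^{-\frac{1}{2}(x^2 + y^2)}, \quad -\infty < x, y < \infty \)
  – It has the shape of a symmetric bell centered at x = 0 and y = 0.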

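Bivariate Standard Normal Distribution
• To find Pr(−1 < X < 1, −1 < Y < 1), we need to solve
  \( \Pr(-1 < X < 1, -1 < Y < 1) = \int_{-1}^{1} \int_{-1}^{1} \frac{1}{2\pi} e^{-\frac{1}{2}(x^2 + y^2)}\,dx\,dy \)
  which does not have a closed-form (analytical) solution.
• Numerical approximation methods (available in Excel) are required to evaluate the above integral.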
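For intuition, here is a minimal Python sketch of one such numerical method: a two-dimensional midpoint rule. It does not use Excel. It assumes the bivariate standard normal has zero correlation, as in the pdf above. Under that assumption the integral also factorises into (Φ(1) − Φ(−1))², where Φ is the standard normal CDF. The factorised form gives an independent check on the numerical result.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bivariate_std_normal_pdf(x, y):
    """Joint pdf of the bivariate standard normal with zero correlation."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)

def box_probability_midpoint(x1, x2, y1, y2, n=400):
    """Approximate the double integral over [x1, x2] x [y1, y2] with an n x n midpoint rule."""
    hx, hy = (x2 - x1) / n, (y2 - y1) / n
    total = 0.0
    for i in range(n):
        x = x1 + (i + 0.5) * hx
        for j in range(n):
            y = y1 + (j + 0.5) * hy
            total += bivariate_std_normal_pdf(x, y)
    return total * hx * hy

numeric = box_probability_midpoint(-1, 1, -1, 1)
# With zero correlation the integral factorises into (Phi(1) - Phi(-1))^2
factorised = (std_normal_cdf(1) - std_normal_cdf(-1)) ** 2
print(round(numeric, 4), round(factorised, 4))  # both about 0.4661
```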

Covariance and Correlation
• In panel (a) we see no relationship between X and Y.
• In panel (b) we see a perfectly positive linear relationship between X and Y.
• In panel (c) we see a perfectly negative linear relationship.
• In panel (d) we see a positive, but less than perfect, linear relationship.
• Let X and Y be two random variables with \( E[X] = \mu_X \), \( \mathrm{Var}(X) = \sigma_X^2 \), \( E[Y] = \mu_Y \) and \( \mathrm{Var}(Y) = \sigma_Y^2 \).
• The covariance between X and Y measures the direction of a linear relationship between any two random variables.
• The correlation between X and Y measures both the direction and the strength of a linear relationship between any two random variables.
• Note the adjective, linear, in the above sentences.

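Covariance
• Definition: The covariance between two random variables X and Y is given by
  \( \sigma_{XY} = \mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y \)
  which, for discrete random variables, is
  \( \sigma_{XY} = \sum_{x \in S_X} \sum_{y \in S_Y} (x - \mu_X)(y - \mu_Y)\,p(x, y) \)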

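Covariance - Example
• Example: For the joint distribution of X and Y in the example:
  – Mean(X) = \( \mu_X \) = 3/2 and Mean(Y) = \( \mu_Y \) = 1/2
  – \( \sigma_{XY} = \sum_{x} \sum_{y} (x - 3/2)(y - 1/2)\,p(x, y) = 1/4 \)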
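The example covariance can be reproduced in Python. The sketch computes it both from the definition and from the shortcut E[XY] − E[X]E[Y]. The helper `expectation` is a hypothetical name. It evaluates E[g(X, Y)] by summing over the joint sample space.

```python
from fractions import Fraction as F

# Joint pdf p(x, y) from the example
joint = {
    (0, 0): F(1, 8), (0, 1): F(0),
    (1, 0): F(2, 8), (1, 1): F(1, 8),
    (2, 0): F(1, 8), (2, 1): F(2, 8),
    (3, 0): F(0),    (3, 1): F(1, 8),
}

def expectation(p, g):
    """E[g(X, Y)] computed by summing g(x, y) * p(x, y) over the joint sample space."""
    return sum(g(x, y) * prob for (x, y), prob in p.items())

mu_x = expectation(joint, lambda x, y: x)   # 3/2
mu_y = expectation(joint, lambda x, y: y)   # 1/2
# Definition: Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]
cov_xy = expectation(joint, lambda x, y: (x - mu_x) * (y - mu_y))
# Shortcut: Cov(X, Y) = E[XY] - E[X]E[Y]
cov_shortcut = expectation(joint, lambda x, y: x * y) - mu_x * mu_y
print(mu_x, mu_y, cov_xy, cov_shortcut)  # 3/2 1/2 1/4 1/4
```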

Properties of Covariances
• Let X and Y be random variables and let a and b be constants.
• Some important properties of Cov(X, Y) are:
  – Cov(X, X) = Var(X)
  – Cov(X, Y) = Cov(Y, X)
  – Cov(aX, bY) = a ∙ b ∙ Cov(X, Y)
  – If X and Y are independent, then Cov(X, Y) = 0 (i.e., no association implies no linear association).
  – However, if Cov(X, Y) = 0, then X and Y are not necessarily independent (no linear association does not necessarily imply no association; there could be a nonlinear association).
  – If X and Y are jointly normally distributed, then Cov(X, Y) = 0 implies that X and Y are independent.

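Correlation
• Correlation measures both the direction and the strength of the linear relationship between any two random variables.
• The correlation between two random variables X and Y is given by
  \( \rho_{XY} = \mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \)
  – i.e., the correlation coefficient is a scaled (normalized) covariance.
• Example: For the data in the table, Var(X) = 3/4 and Var(Y) = 1/4, so
  \( \rho_{XY} = \frac{1/4}{\sqrt{3/4}\,\sqrt{1/4}} = 0.577 \)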
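The 0.577 can be reproduced in Python from the same joint pdf. The sketch computes the variances and the covariance exactly, then converts to floating point only for the square root. The helper `expectation` is a hypothetical name.

```python
import math
from fractions import Fraction as F

# Joint pdf p(x, y) from the example
joint = {
    (0, 0): F(1, 8), (0, 1): F(0),
    (1, 0): F(2, 8), (1, 1): F(1, 8),
    (2, 0): F(1, 8), (2, 1): F(2, 8),
    (3, 0): F(0),    (3, 1): F(1, 8),
}

def expectation(p, g):
    """E[g(X, Y)] computed by summing g(x, y) * p(x, y) over the joint sample space."""
    return sum(g(x, y) * prob for (x, y), prob in p.items())

mu_x = expectation(joint, lambda x, y: x)                          # 3/2
mu_y = expectation(joint, lambda x, y: y)                          # 1/2
var_x = expectation(joint, lambda x, y: (x - mu_x) ** 2)           # 3/4
var_y = expectation(joint, lambda x, y: (y - mu_y) ** 2)           # 1/4
cov_xy = expectation(joint, lambda x, y: (x - mu_x) * (y - mu_y))  # 1/4

# rho_XY = sigma_XY / (sigma_X * sigma_Y); convert to float for the square root
rho_xy = float(cov_xy) / math.sqrt(float(var_x) * float(var_y))
print(round(rho_xy, 3))  # 0.577
```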

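Correlation
Properties of Correlations:
• \( -1 \le \rho_{XY} \le 1 \)
• If \( \rho_{XY} = 1 \), then X and Y are perfectly positively linearly related: Y = aX + b with a > 0.
• If \( \rho_{XY} = -1 \), then X and Y are perfectly negatively linearly related: Y = aX + b with a < 0.
• If \( \rho_{XY} = 0 \), then there is no linear relationship between X and Y (a nonlinear relationship is still possible).
• If X and Y are independent, then \( \rho_{XY} = 0 \).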

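Linear Combinations of Two RV (Review)
• Let X and Y be random variables.
• Define a new random variable Z that is a linear combination of X and Y: Z = aX + bY, where a and b are constants.
• Then
  \( \mu_Z = E[Z] = a\,E[X] + b\,E[Y] = a\mu_X + b\mu_Y \)
• And
  \( \sigma_Z^2 = \mathrm{Var}(Z) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\sigma_{XY} \)
• And, if X and Y are jointly normally distributed,
  \( Z \sim N(\mu_Z, \sigma_Z^2) \)
• Result: A linear combination of two jointly normally distributed random variables is itself a normally distributed random variable.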
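The mean and variance formulas can be verified on the discrete example from the earlier slides. This minimal sketch assumes the hypothetical constants a = 2 and b = −1. It compares the formula-based moments of Z = aX + bY with moments computed directly from the joint pdf. The normality result is not checked here, because the example variables are discrete rather than normal.

```python
from fractions import Fraction as F

# Joint pdf p(x, y) from the example
joint = {
    (0, 0): F(1, 8), (0, 1): F(0),
    (1, 0): F(2, 8), (1, 1): F(1, 8),
    (2, 0): F(1, 8), (2, 1): F(2, 8),
    (3, 0): F(0),    (3, 1): F(1, 8),
}

def expectation(p, g):
    """E[g(X, Y)] computed by summing g(x, y) * p(x, y) over the joint sample space."""
    return sum(g(x, y) * prob for (x, y), prob in p.items())

a, b = 2, -1  # hypothetical constants for Z = aX + bY

mu_x = expectation(joint, lambda x, y: x)
mu_y = expectation(joint, lambda x, y: y)
var_x = expectation(joint, lambda x, y: (x - mu_x) ** 2)
var_y = expectation(joint, lambda x, y: (y - mu_y) ** 2)
cov_xy = expectation(joint, lambda x, y: (x - mu_x) * (y - mu_y))

# Moments of Z from the linear-combination formulas
mu_z_formula = a * mu_x + b * mu_y
var_z_formula = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy

# Moments of Z computed directly from the joint pdf, as a check
mu_z_direct = expectation(joint, lambda x, y: a * x + b * y)
var_z_direct = expectation(joint, lambda x, y: (a * x + b * y - mu_z_direct) ** 2)

print(mu_z_formula, var_z_formula)                                 # 5/2 9/4
print(mu_z_direct == mu_z_formula, var_z_direct == var_z_formula)  # True True
```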

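Portfolio Returns (Review)
• \( R_A \) = return on asset A with \( E[R_A] = \mu_A \) and \( \mathrm{Var}(R_A) = \sigma_A^2 \)
• \( R_B \) = return on asset B with \( E[R_B] = \mu_B \) and \( \mathrm{Var}(R_B) = \sigma_B^2 \)
• \( \mathrm{Cov}(R_A, R_B) = \sigma_{AB} \)
• \( \mathrm{Cor}(R_A, R_B) = \rho_{AB} = \frac{\sigma_{AB}}{\sigma_A \sigma_B} \)
• Portfolio
  – \( x_A \) = share of wealth invested in asset A
  – \( x_B \) = share of wealth invested in asset B
  – \( x_A + x_B = 1 \)
  – The portfolio return is \( R_p = x_A R_A + x_B R_B \)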

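Portfolio Returns and Risk
• How much wealth should be invested in assets A and B?
• Portfolio expected return (this is the gain from investing):
  \( \mu_p = E[R_p] = x_A\mu_A + x_B\mu_B \)
• Portfolio variance/SD (this is the risk from investing):
  \( \sigma_p^2 = \mathrm{Var}(R_p) = x_A^2\sigma_A^2 + x_B^2\sigma_B^2 + 2x_Ax_B\sigma_{AB}, \qquad \sigma_p = \sqrt{\sigma_p^2} \)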
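The two portfolio formulas can be evaluated directly in Python. This minimal sketch uses hypothetical inputs: μ_A = 10%, μ_B = 4%, σ_A = 20%, σ_B = 10%, ρ_AB = 0.25 and x_A = 0.6. It recovers σ_AB from ρ_AB as σ_AB = ρ_AB σ_A σ_B.

```python
import math

def portfolio_stats(x_a, mu_a, mu_b, sigma_a, sigma_b, rho_ab):
    """Mean, variance and SD of R_p = x_A R_A + x_B R_B with x_B = 1 - x_A."""
    x_b = 1.0 - x_a
    sigma_ab = rho_ab * sigma_a * sigma_b  # Cov(R_A, R_B)
    mu_p = x_a * mu_a + x_b * mu_b
    var_p = x_a**2 * sigma_a**2 + x_b**2 * sigma_b**2 + 2 * x_a * x_b * sigma_ab
    return mu_p, var_p, math.sqrt(var_p)

# Hypothetical inputs for illustration only
mu_p, var_p, sd_p = portfolio_stats(0.6, 0.10, 0.04, 0.20, 0.10, 0.25)
print(round(mu_p, 4), round(var_p, 4), round(sd_p, 4))  # 0.076 0.0184 0.1356
```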

Multi-Period Continuously Compounded Return
• Let \( r_t = \ln(1 + R_t) \) be monthly continuously compounded returns.
• Assume that \( r_t \sim \text{iid } N(\mu, \sigma^2) \) for all t.
• Then the annual cc return is equal to the sum of the twelve monthly cc returns:
  \( r_t(12) = \sum_{j=0}^{11} r_{t-j} \)
• Since each monthly return is normally distributed, the annual return is also normally distributed.
• Then the expected annual return is:
  \( E[r_t(12)] = \sum_{j=0}^{11} E[r_{t-j}] = 12\mu \)
• Hence, the expected 12-month (annual) return is equal to 12 times the expected monthly return.
• The variance of the annual return is:
  \( \mathrm{Var}(r_t(12)) = \sum_{j=0}^{11} \mathrm{Var}(r_{t-j}) = 12\sigma^2 \) (the covariance terms are zero by independence)
  so that the annual variance is also equal to 12 times the monthly variance.

Multi-Period Continuously Compounded Return
• The SD of the annual return:
  \( \mathrm{SD}(r_t(12)) = \sqrt{12\sigma^2} = \sqrt{12}\,\sigma \)
• Hence, the annual standard deviation is \( \sqrt{12} \) times the monthly standard deviation (this result is famously known as the square-root-of-time rule).
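The square-root-of-time rule can be checked with a short simulation. This minimal Python sketch assumes hypothetical monthly cc return parameters μ = 1% and σ = 5%. It compares the analytic annual mean and SD with the mean and SD of simulated sums of 12 i.i.d. normal monthly returns.

```python
import math
import random

mu_m, sigma_m = 0.01, 0.05  # hypothetical monthly cc return mean and SD

# Under r_t ~ iid N(mu, sigma^2): annual cc return = sum of 12 monthly cc returns
mu_annual = 12 * mu_m
sigma_annual = math.sqrt(12) * sigma_m  # square-root-of-time rule
print(round(mu_annual, 4), round(sigma_annual, 4))  # 0.12 0.1732

# Monte Carlo check: simulate many "years" of 12 iid normal monthly returns
rng = random.Random(323)
annual = [sum(rng.gauss(mu_m, sigma_m) for _ in range(12)) for _ in range(20_000)]
sim_mean = sum(annual) / len(annual)
sim_sd = math.sqrt(sum((a - sim_mean) ** 2 for a in annual) / (len(annual) - 1))
print(round(sim_mean, 3), round(sim_sd, 3))  # close to the analytic values
```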

Data Analysis – Excel Add-in
We will be using the Data Analysis ToolPak Add-In for Excel extensively in this course!
To activate it:
• File -> Options -> Add-Ins on the left sidebar
• Highlight Analysis ToolPak and hit GO (not OK)
• Check the Analysis ToolPak option and hit OK

To find Data Analysis, go to the DATA tab. At the far right end of the ribbon you should see Data Analysis under the Analysis section.

Refer to the Excel Primer pdf on the Learn site throughout the course if you ever need to go back and remember how we use various Data Analysis features to calculate statistics and figures.

Population & Samples
• A population is defined as all members of a specified group.
  – A descriptive measure of a population characteristic (e.g., mean, variance) is a parameter.
• A sample is a subset of the population.
  – A descriptive measure of a sample characteristic is a sample statistic.
• Nominal scale
  – Weakest level of measurement
  – Categorizes data but does not rank them
  – Example: hedge fund classifications
• Ordinal scale
  – Sorts data into categories that are ranked
  – Example: Standard & Poor's bond ratings
• Interval scale
  – Provides ranking, and differences between scale values are equal, so values can be added or subtracted
  – Example: Celsius and Fahrenheit temperature scales
• Ratio scale
  – Strongest level of measurement
  – All characteristics of an interval scale plus a true zero as origin
  – The widest range of statistical tools can be applied to data on a ratio scale
  – Example: returns, earnings per share

Concept of Random Sampling
• A random sample is a sequence of (usually an infinite number of) independently and identically distributed (i.i.d.) random variables with an unknown pdf, p(x).
• An observed sample (which we call data) is a (usually finite) set of observations generated by the random sample.
• Descriptive statistics are data summaries used to:
  – describe certain features of the observed sample (or data),
  – learn about the unknown pdf, p(x), and
  – capture observed dependencies, if any, in the data.

Histogram
• A frequency distribution is a tabular display of data summarized in a relatively small number of intervals.
• A histogram is the graphical equivalent of a frequency distribution.
• A histogram is used to describe the shape of the distribution of the observed sample (or data).
• How to construct a histogram?
  – Order the data from smallest to largest; min = smallest value, max = largest value, range = max − min
  – Bin width (Scott's normal reference rule): \( h = 3.5\,s / n^{1/3} \), where s is the standard deviation and n the number of observations
  – Alternatively, number of bins = \( \sqrt{n} \)
  – Divide the range into N equally spaced bins
  – Count the number of observations in each bin
  – Create a bar chart
• Excel – Under Data Analysis, go to Histogram -> Let's Try it Out
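The binning steps can be sketched in Python without Excel. The returns are hypothetical. The sketch uses the square-root rule for the number of bins and also reports how many bins Scott's width would imply, since the two rules generally give different answers. The helper names are illustrative.

```python
import math
import statistics

def scott_bin_width(data):
    """Scott's normal reference rule: h = 3.5 * s / n ** (1/3)."""
    return 3.5 * statistics.stdev(data) / len(data) ** (1 / 3)

def sqrt_bin_count(data):
    """Square-root choice: number of bins = sqrt(n), rounded up to a whole bin."""
    return math.ceil(math.sqrt(len(data)))

def histogram_counts(data, n_bins):
    """Divide [min, max] into n_bins equal-width bins and count the observations in each."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all observations are equal; the range is zero")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in data:
        # The maximum would land one past the last bin, so clamp it into the last bin
        idx = min(int((value - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical monthly returns, for illustration only
returns = [-0.12, -0.05, -0.03, -0.01, 0.0, 0.01, 0.02, 0.02, 0.04, 0.06, 0.09]
n_bins = sqrt_bin_count(returns)                  # ceil(sqrt(11)) = 4
print(n_bins, histogram_counts(returns, n_bins))  # 4 [1, 2, 5, 3]
# Number of bins implied by Scott's width for the same data
print(math.ceil((max(returns) - min(returns)) / scott_bin_width(returns)))
```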

Monthly CC Returns - Histogram
[Figure: Suncor Monthly CC Returns Histogram; x-axis: monthly cc return bins from −45% to 35% and "More"; y-axis: Frequency]
• The histogram has a bell shape like the normal distribution and is centered around values slightly greater than zero.
• The bulk of the Suncor returns are between −5% and 15%.
• The histogram for Suncor is slightly skewed left (long left tail) because of large negative returns that exceed the large positive returns in magnitude.
• Note: When comparing two or more return distributions, try using the same bins for each histogram. This makes it easier to compare the distributions visually.
• Eliminating gaps between bars in a histogram (Excel primer pp. 4-8):
  – Right-click the column bar. Click Format. Hit Format Selection on the left side.
  – On the right side under Format Data Series, change Gap Width from 150% to 0%. Hit ENTER.
Take 5 minutes to open "Descriptive Statistics – In-Class Problems" and attempt the S&P TSX Returns Histogram tab.

Monthly Price Data Time Plot
[Figure: Suncor Adjusted Closing Price (CAD), monthly, 2001-2023]
[Figure: S&P TSX Composite Index, monthly, 2001-2023]
• What do you observe about asset prices in the plots shown?
• The prices exhibit random-walk-like behavior with no tendency to revert to a constant (time-independent) mean. They therefore appear to be non-stationary.
• Both the Suncor stock price and the S&P TSX Composite index show the run-up to the global financial crisis of 2008, then the sharp drop and the subsequent recovery after the financial crisis. In 2022, the two diverge: the index has passed its January 2020 highs, while Suncor's price dropped sharply at the start of the health crisis and has since recovered.
• There is a common trend between the two price series.

Monthly CC Returns Time Plot
[Figure: SUNCOR Monthly CC Return, 2001-2023]
[Figure: S&P TSX Composite Monthly CC Returns, 2001-2023]
• What do you observe about asset returns in the plots shown?
• In contrast to asset prices, asset returns are mean-reverting, and the monthly mean values of both series seem close to zero.
• The constant-mean assumption of stationarity looks to hold.
• However, the volatility (i.e., the fluctuation of returns about the mean) of both series appears to change over time.
  – Both series show higher volatility during the 2008 financial crisis and the 2020 health crisis.
  – This is an indication of time-varying conditional volatility (which is a form of non-stationarity in volatility).
• There does not appear to be any evidence of systematic time dependence in the returns.
  – Later on we will see that the estimated autocorrelation coefficients (a new concept to be discussed) are very close to zero.
• The returns for Suncor and the S&P TSX index tend to move together, suggesting a positive correlation.

Monthly CC Returns Time Plot (Another Perspective)
[Figure: Monthly CC Returns 2001-2023; Suncor cc return (SU_Ret) and S&P TSX cc return (S&P_TSX_Ret) plotted on the same axes]
• Suncor is more volatile than the S&P TSX index.
• In general, the lower volatility of the S&P TSX index reflects the reduced risk of a large diversified portfolio.

Empirical Quantiles
• The mean and variance describe the center and spread of the distribution of data such as continuously compounded returns.
• Often we are also interested in describing the relative location of a particular measurement within a given data set.
• One such measure is a percentile.
• When company XYZ reports that its yearly sales are in the 90th percentile of all companies in the industry, what does it mean?
  – It means that 90% of all companies in this industry have yearly sales less than XYZ's, and only 10% have yearly sales exceeding XYZ's.

Percentiles
• Empirical percentiles that partition a data set into 4 segments, each containing (approximately) 25% of the measurements, are known as quartiles.
  – The lower (or first) quartile is the 25th percentile.
  – The middle (or second) quartile is the median, or 50th percentile.
  – The upper (or third) quartile is the 75th percentile.
• The second empirical quartile is the sample median: the data point such that half of the data is less than or equal to its value.
• The distance between the upper (3rd) and lower (1st) quartiles is known as the interquartile range (IQR): \( \mathrm{IQR} = \hat{q}_{0.75} - \hat{q}_{0.25} \)
  – The IQR shows the size of the middle of the distribution of the data.
• Quartiles are useful in finding unusual observations in a data set.
• Use [Link] (representing inclusive) to calculate percentiles of a dataset.
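The quartiles and IQR can be sketched in Python with linear interpolation between order statistics. This assumes the inclusive convention, which is the one used, for example, by Excel's PERCENTILE.INC and by Python's `statistics.quantiles(..., method="inclusive")`. Other software may interpolate differently. The data values are hypothetical.

```python
import math
import statistics

def percentile_inc(data, p):
    """Inclusive empirical percentile: linear interpolation at position h = (n - 1) * p (0-based)."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(data)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical observations
q1, q2, q3 = (percentile_inc(data, p) for p in (0.25, 0.5, 0.75))
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 7.0 12.0 14.0 7.0
```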

Sample Statistics
• To calculate sample quantities for the mean, variance (or standard deviation), skewness and kurtosis of our financial data, two critical assumptions about the data must be met:
1. The data must be covariance (or weakly) stationary, so that the population quantities for the mean, variance (or standard deviation), skewness and kurtosis of the data are constants and not functions of time. This allows the sample quantities to be calculated as sample averages.
2. Over the sample of observations (t = 1, ..., T), there must be only one regime/process generating the data, so that each moment can be calculated as a single sample average.
• Under these two assumptions, we calculate the sample mean, variance (or standard deviation), skewness and kurtosis as follows:
  \( \bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t, \qquad s_x^2 = \frac{1}{T-1}\sum_{t=1}^{T} (x_t - \bar{x})^2, \qquad s_x = \sqrt{s_x^2} \)
  \( \widehat{\mathrm{skew}} = \frac{1}{T-1}\sum_{t=1}^{T} \frac{(x_t - \bar{x})^3}{s_x^3}, \qquad \widehat{\mathrm{kurt}} = \frac{1}{T-1}\sum_{t=1}^{T} \frac{(x_t - \bar{x})^4}{s_x^4} \)
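The four sample moments can be sketched in Python. This assumes the T − 1 divisor convention for every moment after the mean. Excel's SKEW and KURT apply other small-sample adjustments, and KURT reports excess kurtosis (kurtosis minus 3), so spreadsheet values will differ somewhat. The data are a toy example.

```python
import math

def sample_moments(data):
    """Sample mean, variance, SD, skewness and kurtosis, using a T - 1 divisor after the mean."""
    t = len(data)
    if t < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / t
    var = sum((x - mean) ** 2 for x in data) / (t - 1)
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in data) / ((t - 1) * sd**3)
    kurt = sum((x - mean) ** 4 for x in data) / ((t - 1) * sd**4)
    return mean, var, sd, skew, kurt

mean, var, sd, skew, kurt = sample_moments([1, 2, 3, 4, 5])
print(mean, var, round(sd, 4), skew, round(kurt, 2))  # 3.0 2.5 1.5811 0.0 1.36
```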

Outliers
• Extremely large or small values are called "outliers".
• Outliers can be thought of in two ways:
  – First, an outlier can be the result of a data entry error. In that case the outlier is not a valid observation and should be removed from the data sample.
  – Second, an outlier can be a valid data point whose behavior is seemingly unlike the other data points. In that case the outlier provides important information and should not be removed from the data sample.
• For financial market data, outliers are typically extremely large or small values that could be the result of a data entry error (e.g., a price entered as 1 instead of 10) or a valid outcome associated with some unexpected news.
• Outliers are problematic for data analysis because they can greatly influence the value of sample statistics: the sample mean, variance, standard deviation, skewness and kurtosis.
• Percentile measures are more robust to outliers, which do not greatly influence them (e.g., the median instead of the mean; the IQR instead of the SD).
• IQR (interquartile range): an outlier-robust measure of spread
  – Moderate outlier: \( x < \hat{q}_{0.25} - 1.5\,\mathrm{IQR} \) or \( x > \hat{q}_{0.75} + 1.5\,\mathrm{IQR} \)
  – Extreme outlier: \( x < \hat{q}_{0.25} - 3\,\mathrm{IQR} \) or \( x > \hat{q}_{0.75} + 3\,\mathrm{IQR} \)
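The fences above translate directly into code. This minimal sketch uses Python's `statistics.quantiles` with the inclusive method for the quartiles, which is an assumed convention. It counts an observation as moderate only if it lies outside the 1.5 × IQR fences but inside the 3 × IQR fences. The returns are hypothetical, with one moderate and one extreme outlier planted for illustration.

```python
import statistics

def classify_outliers(data):
    """Split observations beyond the IQR fences into moderate (1.5 IQR) and extreme (3 IQR) outliers."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    moderate, extreme = [], []
    for x in data:
        if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
            extreme.append(x)
        elif x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
            moderate.append(x)
    return moderate, extreme

# Hypothetical monthly returns, for illustration only
returns = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02, 0.04, 0.09, -0.25, 0.01, 0.02]
print(classify_outliers(returns))  # ([0.09], [-0.25])
```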

Outliers
• To illustrate the impact of outliers on sample statistics, simulated i.i.d. N(0,1) data are polluted by a single large negative outlier.
• The table compares the sample statistics of the unpolluted and polluted data.
• The sample statistics influenced by the outlier include the:
  – mean
  – skewness
  – kurtosis
  – standard deviation
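The effect can be reproduced with a small simulation. This minimal Python sketch assumes 250 simulated N(0,1) draws, seeded for reproducibility, with a hypothetical outlier of −10 replacing the last draw. It contrasts the outlier-sensitive mean and SD with the robust median and IQR. Skewness and kurtosis are left out to keep the example short.

```python
import random
import statistics

rng = random.Random(2024)
clean = [rng.gauss(0.0, 1.0) for _ in range(250)]  # simulated iid N(0,1) data
polluted = clean[:-1] + [-10.0]                    # last draw replaced by a hypothetical outlier

def summary(data):
    """Outlier-sensitive (mean, SD) and outlier-robust (median, IQR) summaries."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"mean": statistics.fmean(data), "sd": statistics.stdev(data),
            "median": med, "iqr": q3 - q1}

for name, data in (("clean", clean), ("polluted", polluted)):
    print(name, {k: round(v, 3) for k, v in summary(data).items()})
```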

Sample Statistics - Example
Excel – Under Data Analysis, go to Descriptive Statistics -> Let's Try it Out

Take 5 minutes to open "Descriptive Statistics – In-Class Problems" and attempt the S&P TSX Returns DStats tab.

Calculate:
– Descriptive statistics
– 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th and 99th percentiles
– Interquartile range
– Moderate outliers
– Extreme outliers

Additional Measures of Dispersion
• Relative dispersion: Coefficient of Variation (CV) = standard deviation / mean
  – Free of scale: allows comparison of dispersion across datasets by measuring how much dispersion exists relative to the mean of the distribution
  – Amount of risk per unit of return
• Sharpe Ratio: amount of excess return per unit of risk
  – (Mean portfolio return − mean risk-free return) / standard deviation of portfolio return
  – Common measure of portfolio performance
• Chebyshev's inequality, using standard deviation as a measure of dispersion
  – Let k be any constant greater than 1. The proportion of the observations within k standard deviations of the mean is at least \( 1 - 1/k^2 \).
  – The inequality holds for samples and populations, and for discrete and continuous data, regardless of the shape of the distribution.
  – k = 2: at least 75% of the observations lie within ±2 standard deviations of the mean.
  – k = 5: at least 96% of the observations lie within ±5 standard deviations of the mean.
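The three measures can be sketched in Python. The return series are hypothetical. The last function counts the share of observations within k sample SDs of the sample mean, which Chebyshev's bound says can never fall below 1 − 1/k².

```python
import statistics

def coefficient_of_variation(returns):
    """CV = sample standard deviation / sample mean: risk per unit of return."""
    return statistics.stdev(returns) / statistics.fmean(returns)

def sharpe_ratio(portfolio_returns, risk_free_returns):
    """(mean portfolio return - mean risk-free return) / SD of portfolio return."""
    excess = statistics.fmean(portfolio_returns) - statistics.fmean(risk_free_returns)
    return excess / statistics.stdev(portfolio_returns)

def chebyshev_bound(k):
    """Chebyshev lower bound 1 - 1/k^2 on the share of observations within k SDs, for k > 1."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

def share_within_k_sd(data, k):
    """Observed share of the data within k sample SDs of the sample mean."""
    m, s = statistics.fmean(data), statistics.stdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)

print(chebyshev_bound(2), chebyshev_bound(5))  # 0.75 0.96
# Hypothetical monthly portfolio and risk-free returns
port = [0.02, 0.04, 0.06]
rf = [0.01, 0.01, 0.01]
print(round(coefficient_of_variation(port), 3), round(sharpe_ratio(port, rf), 3))  # 0.5 1.5
```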

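Empirical CDF
• Recall that the CDF of a rv X is \( F_X(x) = \Pr(X \le x) \).
• Then the empirical CDF of a random sample \( \{x_1, \ldots, x_T\} \) is
  \( \hat{F}_X(x) = \frac{1}{T} \times (\text{number of } x_t \le x) \)
• How do we compute and plot the empirical CDF \( \hat{F}_X(x) \) for a sample of data \( \{x_1, \ldots, x_T\} \)?
  – Sort the data from smallest to largest values in the form of order statistics: \( x_{(1)} \le x_{(2)} \le \cdots \le x_{(T)} \)
  – Plot \( \hat{F}_X(x_{(t)}) = t/T \) against the sorted data \( x_{(t)} \)
  – \( x_{(1)}, \ldots, x_{(T)} \) are known as order statistics; in particular, \( x_{(1)} = \min_t x_t \) and \( x_{(T)} = \max_t x_t \)
• Why are we interested in computing an empirical CDF?
  – It is a simple way to assess whether a given empirical distribution of asset returns is normally distributed, or close to being normally distributed, as is often assumed.
  – Next, we compare the empirical CDF of a random variable (our asset return) to the CDF of a N(0,1) distribution.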

Calculating the Empirical CDF
Let's see what this looks like in Excel.
• Question: Does the observed data come from a normal distribution?
• To answer this question, we follow the steps given below:
  – Step 1. Standardize the data to have zero mean and variance equal to one: \( z_t = (x_t - \bar{x})/s_x \)
  – Step 2. Sort the standardized data from smallest to largest values: \( z_{(1)} \le z_{(2)} \le \cdots \le z_{(T)} \)
  – Step 3. Compute the standard normal (also known as Gaussian White Noise – GWN) CDF at each sorted value: \( \Phi(z_{(t)}) \)
  – Plot \( \hat{F}(z_{(t)}) = t/T \) and \( \Phi(z_{(t)}) \) against the sorted data \( z_{(t)} \).
[Figure: CDF_SU_Ret (EMPIRICAL) and CDF_SU_Ret (Normal) plotted against standardized Suncor returns, from −8 to 6]
• We can interpret the empirical CDF as follows:
  – If the red curve (empirical) is close to the blue curve (the normal reference distribution), then our conjecture that the empirical distribution of Suncor's cc returns is normal is appropriate.
  – The closer the two curves, the more plausible it is that the data is sampled from a normally distributed population.
  – For Suncor's returns, we notice deviations from a normal distribution, especially around the tails (positive and negative).
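The three steps can be sketched in Python; the Excel workbook itself is not reproduced. The sketch assumes the sample SD (divisor T − 1) for standardizing and uses `statistics.NormalDist` for Φ. The returns are hypothetical. The maximum gap printed at the end is only a rough summary of how far apart the two curves are. It is not a formal test such as Kolmogorov–Smirnov.

```python
import statistics

def ecdf_vs_normal(returns):
    """Rows (z_(t), t/T, Phi(z_(t))) for the sorted standardized data."""
    mean, sd = statistics.fmean(returns), statistics.stdev(returns)
    z_sorted = sorted((r - mean) / sd for r in returns)  # Steps 1 and 2
    T = len(z_sorted)
    std_normal = statistics.NormalDist()                  # N(0, 1) reference
    # Step 3: empirical CDF t/T next to the standard normal CDF at each sorted value
    return [(z, t / T, std_normal.cdf(z)) for t, z in enumerate(z_sorted, start=1)]

def max_gap(rows):
    """Largest vertical distance between the two curves at the sorted points."""
    return max(abs(emp - ref) for _, emp, ref in rows)

# Hypothetical monthly cc returns, for illustration only
returns = [0.03, -0.02, 0.01, 0.05, -0.08, 0.02, 0.00, 0.04, -0.01, 0.02]
rows = ecdf_vs_normal(returns)
for z, emp, ref in rows:
    print(f"{z:6.2f}  empirical {emp:.2f}  normal {ref:.2f}")
print("max gap:", round(max_gap(rows), 3))
```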

Value at Risk (Review)
• Let \( \{R_1, \ldots, R_T\} \) denote a sample of T simple monthly returns on an investment.
• Let \( W_0 \) be the initial value of the investment.
• For \( \alpha \in (0, 1) \), the historical \( \mathrm{VaR}_\alpha \) for simple returns is \( W_0 \times \hat{q}_\alpha^R \), where \( \hat{q}_\alpha^R \) is the empirical α-quantile of the simple returns.
• Note: For cc returns \( r_t = \ln(1 + R_t) \), we use \( W_0 \times (e^{\hat{q}_\alpha^r} - 1) \), where \( \hat{q}_\alpha^r \) is the empirical α-quantile of the cc returns.
• Consider investing $10,000 in Suncor for a month and calculating the VaR at 1%: VaR_0.01 = 10,000 × (exp(q_0.01) − 1) = −$1,854. So we say that a $10,000 monthly investment in Suncor will lose $1,854 or more with 1% probability (recall from last week!).
• If the corresponding VaR at 1% for the S&P TSX is $858, then, since this is considerably smaller than Suncor's 1% VaR, we can say that investing in Suncor is riskier than investing in the S&P TSX index.
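Both historical VaR formulas can be sketched in Python. The slides do not say how the empirical quantile is interpolated, so this sketch assumes the inclusive linear-interpolation convention. The cc returns are hypothetical and are not the Suncor data behind the $1,854 figure. A negative VaR value is a loss.

```python
import math

def empirical_quantile(data, alpha):
    """Inclusive empirical alpha-quantile (linear interpolation between order statistics)."""
    xs = sorted(data)
    h = (len(xs) - 1) * alpha
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def historical_var_simple(simple_returns, alpha, w0):
    """VaR_alpha = W0 * q_alpha for simple returns (negative value = loss)."""
    return w0 * empirical_quantile(simple_returns, alpha)

def historical_var_cc(cc_returns, alpha, w0):
    """VaR_alpha = W0 * (exp(q_alpha) - 1) for cc returns (negative value = loss)."""
    return w0 * (math.exp(empirical_quantile(cc_returns, alpha)) - 1)

# Hypothetical monthly cc returns (21 values), for illustration only
cc = [-0.20, -0.12, -0.08, -0.05, -0.04, -0.03, -0.02, -0.01, 0.0, 0.0, 0.01,
      0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.06, 0.08, 0.10]
print(round(historical_var_cc(cc, 0.05, 10_000), 2))  # -1130.8, i.e., lose about $1,131 or more with 5% probability
```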

Quantile-Quantile (Q-Q) Plot
Let's see what this looks like in Excel.
• A normal probability plot, or quantile-quantile (QQ) plot, compares the data with the quantiles of a specified reference distribution (usually a normal distribution) that we think is appropriate for the return data. We use it when we believe the distribution is normal and want to check that belief.
• The QQ plot is an XY plot with the reference distribution quantiles (normal distribution quantiles) on the x-axis and the empirical quantiles (Suncor empirical quantiles) on the y-axis.
• How to construct a QQ plot:
1. Column C is the rank, i, ranging from 1 to n (n is the number of observations in the data series)
2. Column D is the sorted Suncor returns
3. Column E is the cumulative relative frequency: i/n (for i = n this equals 1, where NORMINV returns #NUM!; using (i − 0.5)/n avoids this)
4. Column F lists the standard normal quantiles: NORMINV(E2,0,1)
5. Column F values are copied and pasted as values in column G
6. Column H is the standardized Suncor returns
7. Highlight columns F, G and H and draw a scatter XY plot.
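The QQ construction can be sketched in Python. The sketch uses plotting positions (i − 0.5)/n, a common alternative to i/n that keeps every probability strictly between 0 and 1. It pairs each normal quantile with the matching sorted standardized return. Points close to the 45-degree line y = x support normality. The returns are hypothetical.

```python
import statistics

def qq_points(returns):
    """Pairs (standard normal quantile, standardized sorted return) for a normal QQ plot."""
    n = len(returns)
    mean, sd = statistics.fmean(returns), statistics.stdev(returns)
    z_sorted = sorted((r - mean) / sd for r in returns)
    std_normal = statistics.NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), z) for i, z in enumerate(z_sorted, start=1)]

# Hypothetical monthly cc returns, for illustration only
returns = [0.03, -0.02, 0.01, 0.05, -0.08]
points = qq_points(returns)
for theoretical, sample in points:
    print(f"normal quantile {theoretical:6.3f}   standardized return {sample:6.3f}")
```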

Q-Q Plot Interpretation
[Figure: Q-Q Plot, Suncor Monthly CC returns; x-axis: standard normal quantiles; series: standard normal quantiles and standardized SU returns]
• We can interpret the QQ plot in the following way:
  – If all of the points are close to a straight line, then the reference distribution we conjecture is appropriate.
  – If the points do not fall close to a straight line, then the reference distribution we conjecture is not appropriate and we should consider a different distribution instead.
• The closer the red dots are to the blue dots, the more plausible it is that the data is sampled from a normally distributed population.
• The QQ plot for Suncor's returns shows outliers, indicating deviation from a normal distribution.
• Standard normal quantiles are plotted against themselves (blue dots, forming the 45-degree reference line).
• Standardized SU returns (y-axis) are plotted against standard normal quantiles (x-axis).

Bivariate Descriptive Statistics
[Figure: scatter plot, SUNCOR CC Ret vs. S&P TSX CC Ret]
• Sample covariance:
  \( \hat{\sigma}_{XY} = \frac{1}{T-1} \sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y}) \)
• Sample correlation:
  \( \hat{\rho}_{XY} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_X \hat{\sigma}_Y} \)
• Sample covariance and correlation between Suncor and S&P TSX cc returns (use Data/Data Analysis/Analysis Tools/Covariance)
• Suncor's returns appear to be positively correlated with the S&P TSX index returns (moderately high positive correlation): the sample correlation is 0.63.
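The two sample formulas can be sketched in Python. This assumes the T − 1 divisor, which is Excel's COVARIANCE.S convention; COVARIANCE.P divides by T instead. The correlation comes out the same under either divisor, provided the SDs use the matching divisor. The return series are hypothetical.

```python
import statistics

def sample_cov(x, y):
    """Sample covariance with divisor T - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 observations")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def sample_corr(x, y):
    """Sample correlation = sample covariance / (s_x * s_y), all with divisor T - 1."""
    return sample_cov(x, y) / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical monthly cc returns, for illustration only
su = [0.05, -0.03, 0.02, 0.08, -0.06]
tsx = [0.02, -0.01, 0.01, 0.03, -0.02]
print(round(sample_cov(su, tsx), 6), round(sample_corr(su, tsx), 3))
```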

Wrap Up & Next Class

Market/Economics Graphics Report & Presentation
Goal: To link economics & finance to a topic of your choice. Find an article or research a topic that
interests you and your group. Link that topic or article to the concept of economics or finance.

You must cover the following elements:


• Motivation for choosing the topic (why is it relevant or important)
• Description of your findings
• Intuitive economic explanations for your finding
• Main takeaways
• Suggestions for additional research/analysis that could provide additional insights to the findings

The selection of your topics is pretty open ended – you can really discuss anything as long as it has a relation to
economics and enough content to create a report, graphic and presentation.

Format:
• Max 2-page report (excluding references) + 1 pager graphic (graphic not included in the 2-page count)
• The graphic is meant to be an infographic from which the reader can pick up the key concepts of your report

Presentation:
• 10-minute presentation to the class with a 5 min Q&A session
• Q&A team will ask questions to the presenting team and if time permits, we will open up Q&A to the entire
class
• First presentations will take place on October 22; your report and slide deck are due in the dropbox by 12PM on the day of your presentation
Now What for Week 5?
Week 5 Focus: Constant Expected Return (CER) Model
• What does the CER mean?
• How can we define error terms?
• Estimating regression parameters of the CER model
• Statistical properties of estimators

Problem Sets:
• Problem Set 3 – Descriptive Statistics now available on Learn (attempt to complete it to test your
understanding)
• Review the Probability Review (Part V), Descriptive Stats (Part I) and Descriptive Stats (Part II) excel files
on Learn for the sample calculations

Assignments Due:
• Assignment 2 – Random Variables & Descriptive Statistics due 7PM on October 8
• Please review the assigned stock information (under the Admin folder on Learn) to see what stock you
have been assigned. Note that you will stick with this stock to complete all assignments in the course

Projects & Presentations:


• None
