Note 8 Informatics 2B - Learning

Gaussians

Hiroshi Shimodaira∗

January-March 2020

In this chapter we introduce the basics of how to build probabilistic models of continuous-valued data, including the most important probability distribution for continuous data: the Gaussian, or Normal, distribution. We discuss both the univariate Gaussian (the Gaussian distribution for one-dimensional data) and the multivariate Gaussian distribution (the Gaussian distribution for multi-dimensional data).

8.1 Continuous random variables

First we review the concepts of the cumulative distribution function and the probability density function of a continuous random variable.

Many events that we want to model probabilistically are described by real numbers rather than discrete symbols or integers. In this case we must use continuous random variables. Some examples of continuous random variables include:

• The sum of two numbers drawn randomly from the interval 0 < x < 1;
• The length of time for a bus to arrive at the bus-stop;
• The height of a member of a population.

8.1.1 Cumulative distribution function

We will develop the way we model continuous random variables using a simple example. Consider waiting for a bus that runs every 30 minutes. We shall make the idealistic assumption that the buses are always exactly on time, so a bus arrives every 30 minutes. If you are waiting for a bus but don't know the timetable, then the precise length of time you need to wait is unknown. Let the continuous random variable X denote the length of time you need to wait for a bus.

Given the above information we may assume that X never takes values above 30 or below 0. We can write this as:

P(X < 0) = 0
P(0 ≤ X ≤ 30) = 1
P(X > 30) = 0

The probability distribution function for a random variable assigns a probability to each value that the variable may take. It is impossible to write down a probability distribution function for a continuous random variable X, since P(X = x) = 0 for all x: X is continuous and can take infinitely many values. However, we can write down a cumulative distribution function F(x), which gives the probability of X taking a value that is less than or equal to x. For the current example:

F(x) = P(X ≤ x) =  0                   x < 0
                   (x − 0)/30 = x/30   0 ≤ x ≤ 30        (8.1)
                   1                   x > 30

In writing down this cumulative distribution, we have made the (reasonable) assumption that the probability of a bus arriving increases in proportion to the interval of time waited (in the region 0–30 minutes). This cumulative distribution function is plotted in Figure 8.1.

Figure 8.1: Cumulative distribution function of random variable X in the 'bus' example.

Cumulative distribution functions have the following properties:

(i) F(−∞) = 0;
(ii) F(∞) = 1;
(iii) if a ≤ b then F(a) ≤ F(b).

To obtain the probability of X falling in an interval we can do the following:

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a).        (8.2)

∗© 2014-2020 University of Edinburgh. All rights reserved. This note is heavily based on notes inherited from Steve Renals and Iain Murray.
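Equations (8.1) and (8.2) are easy to check numerically. Below is a minimal Python sketch (the notes use Matlab; the function names `bus_cdf` and `interval_prob` are ours, introduced only for illustration):

```python
def bus_cdf(x):
    """F(x) = P(X <= x) for the 'bus' example, Eq. (8.1)."""
    if x < 0:
        return 0.0
    if x <= 30:
        return x / 30.0
    return 1.0

def interval_prob(a, b):
    """P(a < X <= b) = F(b) - F(a), Eq. (8.2)."""
    return bus_cdf(b) - bus_cdf(a)

print(bus_cdf(15))                        # 0.5
print(round(interval_prob(15, 21), 10))   # 0.2
```

The interval probability depends only on the difference of CDF values, so the same two-line rule works for any continuous random variable once its CDF is known.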
For the 'bus' example:

P(15 < X ≤ 21) = F(21) − F(15) = 0.7 − 0.5 = 0.2

8.1.2 Probability density function

Although we cannot define a probability distribution function for a continuous random variable, we can define a closely related function, called the probability density function (pdf), p(x):

p(x) = (d/dx) F(x) = F′(x)

F(x) = ∫_{−∞}^{x} p(x) dx .

The pdf is the gradient of the cdf. Note that p(x) is not the probability that X has value x. However, the pdf is proportional to the probability that X lies in a small interval centred on x. The pdf is the usual way of describing the probabilities associated with a continuous random variable X. We usually write probabilities using upper case P and probability densities using lower case p.

We can immediately write down the pdf for the 'bus' example:

p(x) =  0      x < 0
        1/30   0 ≤ x ≤ 30
        0      x > 30

Figure 8.2 shows a graph of this pdf. X is said to be uniform on the interval (0, 30).

Figure 8.2: Probability density function of random variable X in the 'bus' example.

The probability that the random variable lies in the interval (a, b) is the area under the pdf between a and b:

P(a < X ≤ b) = F(b) − F(a)
             = ∫_{−∞}^{b} p(x) dx − ∫_{−∞}^{a} p(x) dx
             = ∫_{a}^{b} p(x) dx .

This integral is illustrated in Figure 8.3. The total area under the pdf equals 1, the probability that x takes on some value between −∞ and ∞. We can also confirm that F(∞) − F(−∞) = 1 − 0 = 1.

Figure 8.3: P(−2 < X ≤ 2) is the shaded area under the pdf.

8.2 The Gaussian distribution

The Gaussian (or Normal) distribution is the most commonly encountered (and most easily analysed) continuous distribution. It is also a reasonable model for many situations (the famous 'bell curve'). If a (scalar) variable has a Gaussian distribution, then it has a probability density function of this form:

p(x | µ, σ²) = N(x; µ, σ²) = (1/√(2πσ²)) exp( −(x − µ)² / (2σ²) ) .        (8.3)

The Gaussian is described by two parameters:

• the mean µ (location)
• the variance σ² (dispersion)

We write p(x | µ, σ²) because the pdf of x depends on the parameters. Sometimes (to slim down the notation) we simply write p(x).
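Equation (8.3) can be checked directly. A small Python sketch (the notes use Matlab; `gauss_pdf` is our name): it evaluates the density, confirms the peak height 1/√(2πσ²) at x = µ, and verifies with a crude Riemann sum that the density integrates to 1.

```python
import math

def gauss_pdf(x, mu, sigma2):
    """Univariate Gaussian density N(x; mu, sigma2), Eq. (8.3)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The peak is at x = mu, with height 1/sqrt(2*pi*sigma2)
print(gauss_pdf(0.0, 0.0, 1.0))   # about 0.3989

# The density integrates to 1 (Riemann sum over +-6 standard deviations)
dx = 0.001
total = sum(gauss_pdf(-6 + i * dx, 0.0, 1.0) * dx for i in range(12001))
print(round(total, 3))            # 1.0
```

The sum stops at ±6 standard deviations because virtually all of the probability mass lies inside that range.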
Figure 8.4: One-dimensional Gaussian (µ = 0, σ² = 1).

Figure 8.5: Four Gaussian pdfs with zero mean and different standard deviations (σ = 0.25, 0.5, 1.0, 2.0).

All Gaussians have the same shape, with the location controlled by the mean and the dispersion (horizontal scaling) controlled by the variance.¹ Figure 8.4 shows a one-dimensional Gaussian with zero mean and unit variance (µ = 0, σ² = 1).

In Equation (8.3), the mean occurs in the exponential part of the pdf, exp(−0.5(x − µ)²/σ²). This term has its maximum (exp(0) = 1) when x = µ; thus the peak of the Gaussian corresponds to the mean, and we can think of it as the location parameter.

In one dimension, the variance can be thought of as controlling the width of the Gaussian pdf. Since the area under the pdf must equal 1, wide Gaussians have lower peaks than narrow Gaussians. This explains why the variance occurs twice in the formula for a Gaussian. In the exponential part exp(−0.5(x − µ)²/σ²), the variance parameter controls the width: for larger values of σ², the value of the exponential decreases more slowly as x moves away from the mean. The term 1/√(2πσ²) is the normalisation constant, which scales the whole pdf to ensure that it integrates to 1. This term decreases with σ²: hence as σ² decreases, the pdf gets a taller (but narrower) peak. The behaviour of the pdf with respect to the variance parameter is shown in Figure 8.5.

¹ To be precise, the width of a distribution scales with its standard deviation, σ, i.e. the square root of the variance.

8.3 Parameter estimation

A Gaussian distribution has two parameters: the mean (µ) and the variance (σ²). In machine learning or pattern recognition we are not given the parameters; we have to estimate them from data. As in the case of Naive Bayes, we can choose the parameters such that they maximise the likelihood of the model generating the training data. For the Gaussian distribution, the maximum likelihood estimates (MLE) of the mean and the variance² are:

µ̂ = (1/N) Σ_{n=1}^{N} x_n ,        (8.4)

σ̂² = (1/N) Σ_{n=1}^{N} (x_n − µ̂)² ,        (8.5)

where x_n denotes the feature value of the n'th sample, and N is the total number of samples.

² The maximum likelihood estimate of the variance turns out to be a biased form of the sample variance, which is normalised by N − 1 rather than N.

8.3.1 Maximum likelihood parameter estimation for the Gaussian distribution

The two formulas, Equation (8.4) and Equation (8.5), are very popular, but it is not good practice to just remember them without understanding how they are derived in the context of the Gaussian distribution.

We here consider parameter estimation as an optimisation problem:

max_{µ, σ²} p(x₁, . . . , x_N | µ, σ²) ,        (8.6)

where we try to find the µ and σ² that maximise the likelihood. Note that this likelihood is a function of µ and σ², and not of the data, since the data are fixed: they are given and do not change.
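Equations (8.4) and (8.5) translate directly into code. A Python sketch on toy data (the data and the name `mle_gaussian` are ours, not from the notes):

```python
def mle_gaussian(xs):
    """Maximum likelihood estimates of mean and variance, Eqs. (8.4)-(8.5)."""
    n = len(xs)
    mu = sum(xs) / n                            # Eq. (8.4)
    var = sum((x - mu) ** 2 for x in xs) / n    # Eq. (8.5), normalised by N
    return mu, var

data = [2.0, 4.0, 6.0]   # toy data
mu, var = mle_gaussian(data)
print(mu)                # 4.0
print(var)               # 8/3, about 2.667 (the biased MLE; the unbiased
                         # sample variance would divide by N - 1 = 2 instead)
```

Note the division by N in the variance: this is the biased MLE mentioned in the footnote, not the N − 1 sample variance.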
To make the optimisation problem tractable, we introduce the assumption that all the training samples are independent of each other, so that the optimisation problem simplifies to

max_{µ, σ²} p(x₁ | µ, σ²) · · · p(x_N | µ, σ²) .        (8.7)

Taking the natural log of the likelihood and denoting it by LL(µ, σ²):

LL(µ, σ²) = ln( p(x₁ | µ, σ²) · · · p(x_N | µ, σ²) )        (8.8)
          = Σ_{n=1}^{N} ln p(x_n | µ, σ²)        (8.9)
          = Σ_{n=1}^{N} ln( (1/√(2πσ²)) exp( −(x_n − µ)² / (2σ²) ) )        (8.10)
          = −(N/2) ln(2π) − (N/2) ln σ² − Σ_{n=1}^{N} (x_n − µ)² / (2σ²)        (8.11)

As we studied in Section 5.5 in Note 5, we can find the optimal parameters of this unconstrained optimisation problem by solving the following system of equations:

∂LL(µ, σ²)/∂µ = 0        (8.12)
∂LL(µ, σ²)/∂σ² = 0        (8.13)

You can easily confirm that Equation (8.4) and Equation (8.5) are the solutions.

8.4 Example

A pattern recognition problem has two classes, S and T. Some observations are available for each class:

Class S: 10 8 10 10 11 11
Class T: 12 9 15 10 13 13

We assume that each class may be modelled by a Gaussian. Using the above data, estimate the parameters of the Gaussian pdf for each class, and sketch the pdf for each class.

The mean and variance of each pdf are estimated with the MLE shown in Equation (8.4) and Equation (8.5):

µ̂_S = (10 + 8 + 10 + 10 + 11 + 11) / 6 = 10
σ̂²_S = ((10−10)² + (8−10)² + (10−10)² + (10−10)² + (11−10)² + (11−10)²) / 6 = 1
µ̂_T = (12 + 9 + 15 + 10 + 13 + 13) / 6 = 12
σ̂²_T = ((12−12)² + (9−12)² + (15−12)² + (10−12)² + (13−12)² + (13−12)²) / 6 = 4

The process of estimating the parameters from the training data is sometimes referred to as fitting the distribution to the data.

Figure 8.6: Estimated Gaussian pdfs for class S (µ̂ = 10, σ̂² = 1) and class T (µ̂ = 12, σ̂² = 4).

Figure 8.6 shows the pdfs for each class. The pdf for class T is twice the width of that for class S: the width of a distribution scales with its standard deviation, not its variance.

8.5 The multivariate Gaussian distribution and covariance

The univariate (one-dimensional) Gaussian may be extended to the multivariate (multi-dimensional) case. The D-dimensional Gaussian is parameterised by a mean vector, µ = (µ₁, . . . , µ_D)ᵀ, and a covariance matrix³, Σ = (σ_ij), and has the probability density

p(x | µ, Σ) = (1 / ((2π)^{D/2} |Σ|^{1/2})) exp( −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ) .        (8.14)

The 1-dimensional Gaussian is a special case of this pdf. The covariance matrix gives the variance of each variable (dimension) along the leading diagonal, and the off-diagonal elements measure the correlations between the variables. The argument to the exponential, (1/2)(x − µ)ᵀ Σ⁻¹ (x − µ), is an example of a quadratic form. |Σ| is the determinant of the covariance matrix Σ.

The mean vector µ is the expectation of x:

µ = E[x] .

The covariance matrix Σ is the expectation of the outer product of the deviation of x from the mean:

Σ = E[(x − µ)(x − µ)ᵀ] .        (8.15)

From Equation (8.15) it follows that Σ = (σ_ij) is a D×D symmetric matrix; that is, Σ = Σᵀ:

σ_ij = E[(x_i − µ_i)(x_j − µ_j)] = E[(x_j − µ_j)(x_i − µ_i)] = σ_ji .

³ Σ is a D-by-D square matrix, and σ_ij or Σ_ij denotes its element at the i'th row and j'th column.
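The worked estimates for classes S and T, and the claim that the MLE maximises the log-likelihood of Equation (8.11), can both be verified numerically. A Python sketch (`log_likelihood` is our name for LL(µ, σ²)):

```python
import math

def log_likelihood(xs, mu, var):
    """Gaussian log-likelihood LL(mu, sigma^2) of Eq. (8.11)."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(var)
            - sum((x - mu) ** 2 for x in xs) / (2 * var))

s = [10, 8, 10, 10, 11, 11]
t = [12, 9, 15, 10, 13, 13]
mu_s = sum(s) / len(s)                             # Eq. (8.4)
var_s = sum((x - mu_s) ** 2 for x in s) / len(s)   # Eq. (8.5)
mu_t = sum(t) / len(t)
var_t = sum((x - mu_t) ** 2 for x in t) / len(t)
print(mu_s, var_s, mu_t, var_t)   # 10.0 1.0 12.0 4.0

# The MLE maximises LL: nudging mu away from mu_s can only lower it
assert log_likelihood(s, mu_s, var_s) > log_likelihood(s, mu_s + 0.5, var_s)
```

The final assertion is a spot check of the stationarity conditions (8.12)–(8.13), not a proof; Exercise 3 asks for the full derivation.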
Note that σ_ij denotes not a standard deviation but the covariance between the i'th and j'th elements of x. For example, in the 1-dimensional case, σ₁₁ = σ².

It is helpful to consider how the covariance matrix may be interpreted. The sign of the covariance, σ_ij, helps to determine the relationship between two components:

• If x_j is large when x_i is large, then (x_j − µ_j)(x_i − µ_i) will tend to be positive;⁴
• If x_j is small when x_i is large, then (x_j − µ_j)(x_i − µ_i) will tend to be negative.

If two variables are highly correlated (large covariance) then this may indicate that one does not give much extra information if the other is known. If two components of the input vector, x_i and x_j, are statistically independent then the covariance between them is zero, σ_ij = 0.

Correlation coefficient  The values of the elements of the covariance matrix depend on the unit of measurement: consider the case when x is measured in metres, compared with when x is measured in millimetres. It is useful to define a measure of dispersion that is independent of the unit of measurement. To do this we may define the correlation coefficient⁵ between features x_i and x_j, ρ(x_i, x_j):

ρ(x_i, x_j) = ρ_ij = σ_ij / √(σ_ii σ_jj) .        (8.16)

The correlation coefficient ρ_ij is obtained by normalising the covariance σ_ij by the square root of the product of the variances σ_ii and σ_jj, and satisfies −1 ≤ ρ_ij ≤ 1:

ρ(x, y) = +1 if y = ax + b, a > 0
ρ(x, y) = −1 if y = ax + b, a < 0 .

The correlation coefficient is both scale-invariant and location- (or shift-) invariant, i.e.:

ρ(x_i, x_j) = ρ(a x_i + b, c x_j + d) ,        (8.17)

where a > 0 and c > 0, and b and d are arbitrary constants.

8.6 The 2-dimensional Gaussian distribution

Let's look at a two-dimensional case, with the following inverse covariance matrix⁶:

Σ⁻¹ = [ σ₁₁  σ₁₂ ]⁻¹ = (1 / (σ₁₁σ₂₂ − σ₁₂σ₂₁)) [  σ₂₂  −σ₁₂ ] = [ a₁₁  a₁₂ ] .
      [ σ₂₁  σ₂₂ ]                              [ −σ₂₁   σ₁₁ ]   [ a₁₂  a₂₂ ]

(Remember the covariance matrix is symmetric, so a₁₂ = a₂₁.) To avoid clutter, assume that µ = (0, 0)ᵀ; then the quadratic form is:

xᵀ Σ⁻¹ x = (x₁ x₂) [ a₁₁  a₁₂ ] [ x₁ ]
                   [ a₁₂  a₂₂ ] [ x₂ ]
         = (x₁ x₂) [ a₁₁x₁ + a₁₂x₂ ]
                   [ a₁₂x₁ + a₂₂x₂ ]
         = a₁₁x₁² + 2a₁₂x₁x₂ + a₂₂x₂² .

Thus we see that the argument to the exponential expands as a quadratic in D variables (D = 2 in this case).⁷

In the case of a diagonal covariance matrix:

Σ⁻¹ = [ σ₁₁  0  ]⁻¹ = [ 1/σ₁₁  0     ] = [ a₁₁  0  ] ,
      [ 0   σ₂₂ ]     [ 0      1/σ₂₂ ]   [ 0   a₂₂ ]

and the quadratic form has no cross terms:

xᵀ Σ⁻¹ x = a₁₁x₁² + a₂₂x₂² .

In the multidimensional case, the normalisation term in front of the exponential is 1/((2π)^{D/2} |Σ|^{1/2}). Recall that the determinant of a matrix can be regarded as a measure of its size; the dependence on the dimension reflects the fact that volume increases with dimension.

Consider a two-dimensional Gaussian with the following mean vector and covariance matrix:

µ = (0, 0)ᵀ    Σ = [ 1  0 ]
                   [ 0  1 ]

We refer to this as a spherical Gaussian, since the probability distribution has spherical (circular) symmetry. The covariance matrix is diagonal (so the off-diagonal correlations are 0), and the variances are equal (1). This pdf is illustrated in Figure 8.7a.

Now consider a two-dimensional Gaussian with the following mean vector and covariance matrix⁸:

µ = (0, 0)ᵀ    Σ = [ 1  0 ]
                   [ 0  4 ]

In this case the covariance matrix is again diagonal, but the variances are not equal. Thus the resulting pdf has an elliptical shape, as illustrated in Figure 8.7b.

Now consider the following two-dimensional Gaussian:

µ = (0, 0)ᵀ    Σ = [  1  −1 ]
                   [ −1   4 ]

In this case we have a full covariance matrix (the off-diagonal terms are non-zero). The resultant pdf is shown in Figure 8.7c.

⁴ Note that x_i in this section denotes the i'th element of a vector x (which is a vector of D random variables) rather than the i'th sample in a data set {x₁, . . . , x_N}.
⁵ This is normally referred to as 'Pearson's correlation coefficient'; another version, for sampled data, was discussed in Note 2.
⁶ The inverse covariance matrix is sometimes called the precision matrix.
⁷ Any covariance matrix is positive semi-definite, meaning xᵀΣx ≥ 0 for any real-valued vector x. The inverse of the covariance matrix, if it exists, is also positive semi-definite, i.e., xᵀΣ⁻¹x ≥ 0.
⁸ A covariance matrix whose off-diagonal elements are all zero, like those shown here, is called a 'diagonal covariance matrix', as opposed to a 'full covariance matrix', which has non-zero off-diagonal elements.
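The expansion xᵀΣ⁻¹x = a₁₁x₁² + 2a₁₂x₁x₂ + a₂₂x₂² can be sanity-checked numerically, e.g. for the full covariance matrix Σ = [1 −1; −1 4] of Section 8.6. A Python sketch (`inv2` is our helper, implementing the 2×2 inverse formula from that section):

```python
def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] (assumes non-zero determinant)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

sigma = [[1.0, -1.0], [-1.0, 4.0]]   # full covariance matrix from Sec. 8.6
a = inv2(sigma)                      # precision matrix, entries a_ij

x1, x2 = 0.7, -1.3                   # arbitrary test point
# quadratic form via the expansion a11*x1^2 + 2*a12*x1*x2 + a22*x2^2
q_expanded = a[0][0] * x1**2 + 2 * a[0][1] * x1 * x2 + a[1][1] * x2**2
# same quadratic form computed as x^T A x
q_matrix = (x1 * (a[0][0] * x1 + a[0][1] * x2)
            + x2 * (a[1][0] * x1 + a[1][1] * x2))
print(abs(q_expanded - q_matrix) < 1e-12)   # True: the two forms agree
```

The quadratic form also comes out positive, as footnote 7's positive semi-definiteness requires.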
Figure 8.7: Surface and contour plots of 2-dimensional Gaussians with different covariance structures: (a) spherical Gaussian (diagonal covariance, equal variances); (b) Gaussian with diagonal covariance matrix; (c) Gaussian with full covariance matrix.

8.7 Parameter estimation

It is possible to show that the mean vector and covariance matrix that maximise the likelihood of the Gaussian generating the training data are given by:⁹

µ̂ = (1/N) Σ_{n=1}^{N} x_n        (8.18)

Σ̂ = (1/N) Σ_{n=1}^{N} (x_n − µ̂)(x_n − µ̂)ᵀ .        (8.19)

Alternatively, in a scalar representation:

µ̂_i = (1/N) Σ_{n=1}^{N} x_{ni} ,  for i = 1, . . . , D        (8.20)

σ̂_ij = (1/N) Σ_{n=1}^{N} (x_{ni} − µ̂_i)(x_{nj} − µ̂_j) ,  for i, j = 1, . . . , D .        (8.21)

As an example, consider the data points displayed in Figure 8.8a. To fit a Gaussian to these samples we compute the mean and covariance with the MLE. The resulting Gaussian is superimposed as a contour map on the training data in Figure 8.8b.

8.8 Bayes' theorem and Gaussians

Many of the rules for combining probabilities that were outlined at the start of the course are similar for probability density functions. For example, if x and y are continuous random variables, with probability density functions (pdfs) p(x) and p(y):

p(x, y) = p(x | y) p(y)        (8.22)

p(x) = ∫ p(x, y) dy ,        (8.23)

where p(x | y) is the pdf of x given y.

Indeed, we may mix probabilities of discrete variables and probability densities of continuous variables; for example, if x is continuous and z is discrete:

p(x, z) = p(x | z) P(z) .        (8.24)

Proving that this is so requires a branch of mathematics called measure theory.

We can thus write Bayes' theorem for continuous data x and discrete class k as:

P(C_k | x) = p(x | C_k) P(C_k) / p(x)
           = p(x | C_k) P(C_k) / Σ_{ℓ=1}^{K} p(x | C_ℓ) P(C_ℓ)        (8.25)

P(C_k | x) ∝ p(x | C_k) P(C_k)        (8.26)

⁹ Again, the covariance matrix estimated with the MLE is a biased estimator, rather than the unbiased estimator that is normalised by N − 1.
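Equations (8.18)–(8.21) take only a few lines to implement. A Python sketch on toy 2-dimensional data (the data and the name `fit_gaussian` are ours, not the data of Figure 8.8), which also confirms the symmetry σ̂_ij = σ̂_ji:

```python
def fit_gaussian(xs):
    """MLE mean vector and covariance matrix, Eqs. (8.18)-(8.19), for 2-D data."""
    n = len(xs)
    mu = [sum(x[i] for x in xs) / n for i in range(2)]
    sigma = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) / n
              for j in range(2)] for i in range(2)]
    return mu, sigma

data = [(0.0, 0.0), (2.0, 1.0), (4.0, 2.0), (2.0, 3.0)]   # toy data
mu, sigma = fit_gaussian(data)
print(mu)                           # [2.0, 1.5]
print(sigma[0][1] == sigma[1][0])   # True: the covariance matrix is symmetric
```

Each entry of `sigma` is the scalar form (8.21); stacking the outer products (x_n − µ̂)(x_n − µ̂)ᵀ instead gives the matrix form (8.19).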
The posterior probability of the class given the data is proportional to the probability density of the data times the prior probability of the class.

We can thus use Bayes' theorem for pattern recognition with continuous random variables.

If the pdf of a continuous random variable x given class k is represented as a Gaussian with mean µ_k and variance σ²_k, then we can write:¹⁰

P(C_k | x) ∝ p(x | C_k) P(C_k)
           ∝ N(x; µ_k, σ²_k) P(C_k)
           ∝ (1/√(2πσ²_k)) exp( −(x − µ_k)² / (2σ²_k) ) P(C_k) .        (8.27)

We call p(x | C_k) the likelihood of class k (given the observation x).

Figure 8.8: Fitting a Gaussian to a set of two-dimensional data samples. (a) Training data; (b) Estimated Gaussian.

¹⁰ The suffix k in µ_k and σ_k denotes the class number rather than the k'th element of a vector.

8.9 Appendix: Plotting Gaussians with Matlab

plotgauss1D is a function to plot a one-dimensional Gaussian with mean mu and variance sigma2:

function plotgauss1D(mu, sigma2)
% plot a 1-dimensional Gaussian with mean mu and variance sigma2
sd = sqrt(sigma2);          % std deviation
x = mu-3*sd:0.02:mu+3*sd;   % locations at which the pdf is evaluated
g = 1/(sqrt(2*pi)*sd)*exp(-0.5*(x-mu).^2/sigma2);
plot(x,g);

Recall that the standard deviation (SD) is the square root of the variance. About 0.68 of the probability mass of a Gaussian is within 1 SD (either side) of the mean, about 0.95 is within 2 SDs of the mean, and over 0.99 is within 3 SDs of the mean. Thus plotting a Gaussian for x ranging from µ − 3σ to µ + 3σ captures over 99% of the probability mass, and we take these as the ranges for the plot.

The following Matlab function plots two-dimensional Gaussians as a surface or a contour plot (and was used for the plots in the previous section). We could easily write it to take a (2-dimensional) mean vector and a 2×2 covariance matrix, but it can be convenient to write the covariance matrix in terms of the variances σ_jj and the correlation coefficient ρ_jk. Recall that:

ρ_jk = σ_jk / √(σ_jj σ_kk) .        (8.28)

Then we may write:

σ_jk = ρ_jk √σ_jj √σ_kk ,        (8.29)

where √σ_jj is the standard deviation of dimension j. The following code does the job:

function plotgauss2D(xmu, ymu, xvar, yvar, rho)
% make a contour plot and a surface plot of a 2D Gaussian
% xmu, ymu   - mean of x, y
% xvar, yvar - variance of x, y
% rho        - correlation coefficient between x and y
xsd = sqrt(xvar);   % std deviation on x axis
ysd = sqrt(yvar);   % std deviation on y axis
if (abs(rho) >= 1.0)
  disp('error: rho must lie between -1 and 1');
  return
end
covxy = rho*xsd*ysd;            % calculation of the covariance
C = [xvar covxy; covxy yvar];   % the covariance matrix
A = inv(C);                     % the inverse covariance matrix
% plot between +-2SDs along each dimension
maxsd = max(xsd,ysd);
x = xmu-2*maxsd:0.1:xmu+2*maxsd;   % locations at which x is evaluated
y = ymu-2*maxsd:0.1:ymu+2*maxsd;   % locations at which y is evaluated
[X, Y] = meshgrid(x,y);            % matrices used for plotting
% Compute value of Gaussian pdf at each point in the grid
z = 1/(2*pi*sqrt(det(C))) * ...
    exp(-0.5 * (A(1,1)*(X-xmu).^2 + 2*A(1,2)*(X-xmu).*(Y-ymu) + A(2,2)*(Y-ymu).^2));
surf(x,y,z);
figure;
contour(x,y,z);

The above code computes the vectors x and y over which the function will be plotted. meshgrid takes these vectors and forms the set of all pairs ([X, Y]) over which the pdf is to be evaluated. surf plots the function as a surface, and contour plots it as a contour map, or plan. You can use the Matlab help to find out more about plotting surfaces.

In the equation for the Gaussian pdf in plotgauss2D, because we are evaluating over points in a grid, we write out the quadratic form fully. More generally, if we want to evaluate a D-dimensional Gaussian pdf for a data point x, we can use a Matlab function like the following:

function y = evalgauss1(mu, covar, x)
% EVALGAUSS1 - evaluate a Gaussian pdf
% y = EVALGAUSS1(MU, COVAR, X) evaluates a multivariate Gaussian with
% mean MU and covariance COVAR for a data point X

[d, b] = size(covar);

% Check that the covariance matrix is square
if (d ~= b)
  error('Covariance matrix should be square');
end

% force MU and X into column vectors
mu = reshape(mu, d, 1);
x = reshape(x, d, 1);

% subtract the mean from the data point
x = x - mu;

invcovar = inv(covar);
y = 1/sqrt((2*pi)^d*det(covar)) * exp(-0.5*x'*invcovar*x);

However, for efficiency it is usually better to evaluate the Gaussian pdf for a set of data points together. The following function, from the Netlab toolbox, takes an n×d matrix x, where each row corresponds to a data point.

function y = gauss(mu, covar, x)
% Y = GAUSS(MU, COVAR, X) evaluates a multi-variate Gaussian density
% in D-dimensions at a set of points given by the rows of the matrix X.
% The Gaussian density has mean vector MU and covariance matrix COVAR.
%
% Copyright (c) Ian T Nabney (1996-2001)

[n, d] = size(x);
[j, k] = size(covar);

% Check that the covariance matrix is the correct dimension
if ((j ~= d) | (k ~= d))
  error('Dimension of the covariance matrix and data should match');
end

invcov = inv(covar);
mu = reshape(mu, 1, d);   % Ensure that mu is a row vector

x = x - ones(n, 1)*mu;    % Replicate mu and subtract from each data point
fact = sum(((x*invcov).*x), 2);

y = exp(-0.5*fact);
y = y./sqrt((2*pi)^d*det(covar));

Check that you understand how this function works. Note that sum(a,2) sums along the rows of matrix a to return a column vector of the row sums. (sum(a,1) sums down the columns to return a row vector.)
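The 68%/95%/99% figures quoted in the appendix can be confirmed with the Gaussian error function. In Python (`math.erf`; `mass_within` is our name), the probability mass within k SDs of the mean is erf(k/√2):

```python
import math

def mass_within(k):
    """Probability mass of a Gaussian within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

print(round(mass_within(1), 4))   # 0.6827
print(round(mass_within(2), 4))   # 0.9545
print(round(mass_within(3), 4))   # 0.9973
```

This is why plotting over µ ± 3σ, as plotgauss1D does, captures over 99% of the probability mass.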
Exercises

1. Draw a one-dimensional Gaussian distribution by hand as accurately as possible for µ = 3.0, σ² = 1.0. (You may use a calculator.)

2. Using a calculator, find the height (i.e. the maximum value) of a one-dimensional Gaussian distribution for σ² = 10, 1.0, 0.1, 0.01, 0.001. What will the height be as σ² → 0?

3. By solving the system of equations (8.12) and (8.13), confirm that the MLE for a Gaussian distribution is given by (8.4) and (8.5).

4. Confirm that the correlation coefficient defined in Equation (8.16) is the same as Pearson's correlation coefficient in Note 2.

5. Prove that the correlation coefficient is scale-invariant and location-invariant, as shown in Equation (8.17).

6. Consider a 2-dimensional Gaussian distribution with a mean vector µ = (µ₁, µ₂)ᵀ and a diagonal covariance matrix, i.e., Σ = [ σ₁₁ 0 ; 0 σ₂₂ ]. Show that its pdf can be simplified to the product of two pdfs, each of which corresponds to a one-dimensional Gaussian distribution:

   p(x | µ, Σ) = p(x₁ | µ₁, σ₁₁) p(x₂ | µ₂, σ₂₂)

7. For each of the Gaussian distributions shown in Figure 8.7, what type of correlation do x₁ and x₂ have: (i) a positive correlation, (ii) a negative correlation, or (iii) no correlation?
