Transformation of random variables
Stephan Schmidt
February 27, 2024
1 Transformations of random variables
The purpose of this document is to derive the probability density functions of transformed random variables, with a specific emphasis on linearly transformed random variables. More theory on this can be found in textbooks such as Papoulis and Unnikrishna Pillai (2002). This is a draft document, so please bring any mistakes to my attention.
Transformations are often used in models. For example, the linear regression model

y = w_0 + w_1 · x + ϵ    (1)

transforms a known x to y for a specific set of parameters w_0 and w_1, with ϵ ∼ p(ϵ). There are also data scaling methods such as standardisation

z = (y − µ)/σ    (2)

and normalisation

z = (y − y_min)/(y_max − y_min)    (3)

that can be written as a linear transformation z = a · y + b. For the multivariate case, we often encounter transformations of the form

y = Ax + b    (4)

For example, we can choose A and b so that y ∼ N(y; 0, I) for x ∼ N(x; µ, Σ).
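As a minimal sketch of the scalar case, standardisation in equation (2) can be written as z = a · y + b with a = 1/σ and b = −µ/σ. The NumPy usage below is an illustration and not part of the original note:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=100_000)

# Standardisation z = (y - mu) / sigma written as the linear transform z = a*y + b
mu, sigma = y.mean(), y.std()
a, b = 1.0 / sigma, -mu / sigma
z = a * y + b

print(z.mean(), z.std())  # approximately 0 and 1
```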
Reference:
• Papoulis, A. and Unnikrishna Pillai, S., 2002. Probability, Random Variables and Stochastic Processes. Fourth edition. McGraw-Hill.
1.1 Change of variables
If y = ψ(x), we can use a change of variables to obtain

p_y(y) = p_x(ψ⁻¹(y)) · |dx/dy|    (5)

if ψ(x) is monotonic and x = ψ⁻¹(y), where p_x(x) is the probability density function of the random variable x and p_y(y) is the probability density function of the random variable y.
For multivariate problems, if y = f(x) and the inverse x = f⁻¹(y) exists, then

p(y) = p(x) · |det(dx/dy)|    (6)

p(y) = p(f⁻¹(y)) · |det(dx/dy)|    (7)

where dx/dy is the Jacobian of the inverse transformation.
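Equation (5) can be checked numerically for a monotonic transform. The sketch below (NumPy only; an illustration, not part of the note) uses ψ(x) = exp(x) with x ∼ N(0, 1), so p_y(y) = p_x(ln y)/y, and compares a histogram of transformed samples against that formula:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)   # x ~ N(0, 1)
y = np.exp(x)                  # monotonic transform y = psi(x) = exp(x)

# Change of variables, equation (5): p_y(y) = p_x(ln y) * |d(ln y)/dy| = p_x(ln y) / y
def p_y(t):
    return np.exp(-0.5 * np.log(t) ** 2) / (np.sqrt(2 * np.pi) * t)

# Monte Carlo check: empirical density over (0.5, 1.5) vs the analytical values
counts, edges = np.histogram(y, bins=50, range=(0.5, 1.5))
width = edges[1] - edges[0]
dens = counts / (y.size * width)          # empirical density per bin
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(dens - p_y(centres))))  # small sampling error
```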
1.2 Linear transformations of random variables
If y = a · x + b, we can use a change of variables to obtain

p_y(y) = p_x((y − b)/a) · (1/|a|)    (8)

If y = Ax + b and A is invertible, we can write x = A⁻¹(y − b) and therefore

p_y(y) = p_x(f⁻¹(y)) · |det(dx/dy)|    (9)

can be simplified as follows:

p_y(y) = p_x(A⁻¹(y − b)) · |det(A⁻¹)|    (10)
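Equation (8) holds for any base density, not only the Gaussian. As a sketch under illustrative assumptions (NumPy, an exponential base density p_x(x) = exp(−x) for x ≥ 0), the density of y = a · x + b can be checked against a histogram of transformed samples:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 2.0, 1.0
x = rng.exponential(scale=1.0, size=200_000)  # p_x(x) = exp(-x), x >= 0
y = a * x + b

# Equation (8): p_y(y) = p_x((y - b)/a) * 1/|a|, here valid for y >= b
def p_y(t):
    return np.exp(-(t - b) / a) / abs(a)

counts, edges = np.histogram(y, bins=50, range=(1.0, 5.0))
width = edges[1] - edges[0]
dens = counts / (y.size * width)          # empirical density per bin
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(dens - p_y(centres))))  # small sampling error
```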
1.3 Linear transformations of Gaussian variables
We can show that if

y = a · x + b    (11)

where a and b are deterministic and x ∼ N(µ_x, σ_x²), then

p(y) = N(y | a · µ_x + b, a² σ_x²)    (12)
Derivation: The Gaussian probability density function over x is given by

N(x | µ_x, σ_x²) = 1/√(2πσ_x²) · exp(−(x − µ_x)²/(2σ_x²))    (13)

The distribution of y is as follows:

p_y(y) = p_x((y − b)/a) · (1/|a|)    (14)
       = 1/√(2πσ_x²) · exp(−((y − b)/a − µ_x)²/(2σ_x²)) · (1/|a|)    (15)
       = 1/√(2πa²σ_x²) · exp(−(y − (a · µ_x + b))²/(2a²σ_x²))    (16)
       = 1/√(2πσ_y²) · exp(−(y − µ_y)²/(2σ_y²))    (17)
       = N(y | µ_y, σ_y²)    (18)
       = N(y | a · µ_x + b, a² σ_x²)    (19)
Figure 1 shows an example of transforming a Gaussian variable with a linear operation y = a · x + b. If we draw samples x_sample ∼ N(µ_x, σ_x²) and substitute them into the equation y_sample = a · x_sample + b, we can obtain an estimate of the probability density function of y from the histogram of the samples y_sample without knowing the distribution of y. Since x is Gaussian and we are performing a linear operation, the analytical probability density function of y is available and given by p(y) = N(y | a · µ_x + b, a² σ_x²). In the right plot of Figure 1, the sampling approach and the analytical probability density function are compared.
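The sampling check described above can be sketched in a few lines; NumPy and the specific seed are illustrative assumptions, but the numbers match the example in Figure 1:

```python
import numpy as np

rng = np.random.default_rng(3)
mu_x, sigma_x = 3.0, 0.5
a, b = 4.0, 2.0

x = rng.normal(mu_x, sigma_x, size=100_000)  # x_sample ~ N(3, 0.5^2)
y = a * x + b                                # y_sample = 4 * x_sample + 2

# Analytical result: y ~ N(a*mu_x + b, a^2 * sigma_x^2) = N(14, 2^2)
print(y.mean(), y.std())  # close to 14 and 2
```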
Let’s consider the multivariate case, where x is a multivariate Gaussian
variable. We can show that if

y = Ax + b    (20)

with x ∼ N(µ_x, Σ_x), then

p(y) = N(y | Aµ_x + b, A Σ_x Aᵀ)    (21)

The derivation of this equation is given at the end of this document.
We can use expected values to calculate the mean and covariance of y for any distribution. Since y is Gaussian distributed,

p(y) = N(y | µ_y, Σ_y)    (22)
Figure 1: The left plot shows the probability density function of x ∼ N(x | 3, 0.5²), comparing the histogram of the samples x_sample with the analytical density. The right plot shows the probability density function of y = 4 · x + 2, comparing the histogram of the transformed samples y_sample with the analytical density N(y | 14, 2.0²).
the expected value of y can be used to calculate its mean, and the covariance of y can be used to calculate its covariance matrix, i.e.

p(y) = N(y | E_{x∼N(µ_x,Σ_x)}{y}, cov_{x∼N(µ_x,Σ_x)}{y})    (23)

where µ_y = E_{x∼N(µ_x,Σ_x)}{y} and

cov_{x∼N(µ_x,Σ_x)}{y} = E_{x∼N(µ_x,Σ_x)}{(y − µ_y)(y − µ_y)ᵀ}    (24)

Note that other distributions are not necessarily parametrised by their expected values.
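The moment formulas above can be checked by Monte Carlo: the sample mean and sample covariance of transformed draws should approach Aµ_x + b and A Σ_x Aᵀ. The particular A, b, µ_x, and Σ_x below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
mu_x = np.array([1.0, -2.0])
Sigma_x = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
b = np.array([3.0, -1.0])

x = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)
y = x @ A.T + b                 # y = A x + b applied to each sample row

mu_y = A @ mu_x + b             # analytical mean, equation (23)
Sigma_y = A @ Sigma_x @ A.T     # analytical covariance, equation (24)
print(y.mean(axis=0), mu_y)
print(np.cov(y, rowvar=False), Sigma_y)
```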
Exercise questions
• Derive the distribution of y if x is Laplacian distributed and y = a·x+b.
• Use the expected value formulas to derive Equation (21).
• How should A and b be selected so that y ∼ N (y; 0, I) for x ∼
N (x; µ, Σ) with y = Ax + b?
Derivation: The probability density function of x is Gaussian and given by

p_x(x) = (2π)^(−D/2) det(Σ_x)^(−1/2) exp(−½ (x − µ_x)ᵀ Σ_x⁻¹ (x − µ_x))    (25)

and the probability density function of y is given by

p_y(y) = p_x(A⁻¹(y − b)) · |det(A⁻¹)|    (26)

if y = Ax + b and A is invertible. The probability density function over y is given by

p_y(y) = κ · exp(−½ (A⁻¹(y − b) − µ_x)ᵀ Σ_x⁻¹ (A⁻¹(y − b) − µ_x))    (27)
       = κ · exp(−½ (y − (Aµ_x + b))ᵀ A⁻ᵀ Σ_x⁻¹ A⁻¹ (y − (Aµ_x + b)))    (28)
       = κ · exp(−½ (y − (Aµ_x + b))ᵀ (A Σ_x Aᵀ)⁻¹ (y − (Aµ_x + b)))    (29)

where κ = (2π)^(−D/2) det(Σ_x)^(−1/2) |det(A⁻¹)| and A⁻ᵀ Σ_x⁻¹ A⁻¹ = (A Σ_x Aᵀ)⁻¹. The exponential term is in the form of a Gaussian distribution with a mean of µ_y = Aµ_x + b and a covariance of Σ_y = A Σ_x Aᵀ, and therefore the normalisation constant can be determined from the form of the multivariate Gaussian probability density function:

p_y(y) = (2π)^(−D/2) det(Σ_y)^(−1/2) exp(−½ (y − µ_y)ᵀ Σ_y⁻¹ (y − µ_y))    (30)
       = exp(−½ (y − Aµ_x − b)ᵀ (A Σ_x Aᵀ)⁻¹ (y − Aµ_x − b)) / ((2π)^(D/2) det(A Σ_x Aᵀ)^(1/2))    (31)

p(y) = N(y | Aµ_x + b, A Σ_x Aᵀ)    (32)
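The identity between the change-of-variables expression (26) and the Gaussian form (32) can also be confirmed pointwise. The sketch below evaluates both sides at one arbitrary point using a small NumPy helper for the density in equation (25); all specific values are illustrative:

```python
import numpy as np

def gauss_pdf(z, mu, Sigma):
    """Multivariate Gaussian density in the form of equation (25)."""
    d = z - mu
    D = len(mu)
    norm = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)

mu_x = np.array([1.0, -2.0])
Sigma_x = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 1.0], [0.0, 2.0]])
b = np.array([3.0, -1.0])

y = np.array([0.3, 1.7])  # arbitrary evaluation point

# Left-hand side: change of variables, equation (26)
A_inv = np.linalg.inv(A)
lhs = gauss_pdf(A_inv @ (y - b), mu_x, Sigma_x) * abs(np.linalg.det(A_inv))
# Right-hand side: the claimed Gaussian, equation (32)
rhs = gauss_pdf(y, A @ mu_x + b, A @ Sigma_x @ A.T)
print(lhs, rhs)  # equal up to floating-point error
```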