
Data Science Math: Mean & Variance

This document provides an introduction to mean and variance in statistics. It defines the mean as the sum of all values divided by the total number of values. The variance is defined as the average of the squared distances of each value from the mean, and measures how spread out the data is. It then gives examples of calculating the mean and variance of different data sets to illustrate these concepts.

Uploaded by

danjohhn

Data Science Math Skills

Paul Bendich and Daniel Egger


Duke University

Sigma Notation: Mean and Variance


Video companion

1 Introduction
Important equations for this video:

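\[ X = \{x_1, \ldots, x_n\} \]
\[ \mu_x = \frac{1}{n} \sum_{i=1}^{n} x_i \]
\[ \sigma_x^2 = \frac{1}{n} \left[ \sum_{i=1}^{n} (x_i - \mu_x)^2 \right] \]

The symbol $\mu_x$ is the mean of $X$, and $\sigma_x^2$ is the variance of $X$. The standard deviation is denoted $\sigma_x$.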

2 Mean
Example:

\[ Z = \{1, 5, 12\} \]
\[ |Z| = 3 \]
\[ \mu_z = \frac{1 + 5 + 12}{3} = \frac{18}{3} = 6 \]

The mean $\mu_z$ is also denoted $\mu(z)$ or simply $\mu$.

Symbolic example:

\[ Y = \{y_1, y_2, y_3, y_4\} \]
\[ \mu_y = \frac{1}{4}(y_1 + y_2 + y_3 + y_4) = \frac{1}{4} \left( \sum_{i=1}^{4} y_i \right) \]


In general, suppose you have a set

\[ X = \{x_1, x_2, \ldots, x_n\}, \]

then the mean of $X$ is

\[ \mu_x = \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right). \]

The variable i is a counter. The variable n is a number, which tells you when to stop counting.
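The roles of the counter i and the stopping value n can be seen by writing the sum as an explicit loop. The following is a minimal Python sketch, not part of the original video; the function name `mean` is an illustrative choice.

```python
def mean(values):
    """Mean of a non-empty list, following (1/n) * sum_{i=1}^{n} x_i."""
    n = len(values)               # n tells the loop when to stop counting
    total = 0
    for i in range(1, n + 1):     # i is the counter: 1, 2, ..., n
        total += values[i - 1]    # Python lists are 0-indexed, so x_i is values[i - 1]
    return total / n


Z = [1, 5, 12]
print(mean(Z))  # 6.0
```

In everyday Python the loop would usually be written as `sum(values) / len(values)`; the explicit counter is kept here only to mirror the sigma notation.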

3 Mean centering

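\[ Z = \{1, 5, 12\} \]
\[ \mu_z = 6 \]

[Figure: number line showing the points 1, 5, and 12 of Z, with the mean 6 marked]

\[ Z' = \{1 - 6,\ 5 - 6,\ 12 - 6\} = \{-5, -1, 6\} \]
\[ \mu_{z'} = 0 \]

[Figure: number line showing the points -5, -1, and 6 of Z', with the mean 0 marked]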

Mean centering subtracts the mean from every value. This produces a new data set in which the distances between the points are unchanged, but the mean is zero.
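As a quick check of the example above, here is a small Python sketch of mean centering. The helper name `mean_center` is an assumption made for illustration and does not come from the course materials.

```python
def mean_center(values):
    """Subtract the mean of the list from every element."""
    mu = sum(values) / len(values)
    return [x - mu for x in values]


Z = [1, 5, 12]
Z_centered = mean_center(Z)

print(Z_centered)                         # [-5.0, -1.0, 6.0]
print(sum(Z_centered) / len(Z_centered))  # 0.0

# The gaps between neighbouring points are the same before and after centering.
print(Z[1] - Z[0], Z_centered[1] - Z_centered[0])  # 4 4.0
```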


4 Variance

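\[ Z = \{1, 5, 12\}, \qquad \mu_z = 6 \]
\[ W = \{5, 6, 7\}, \qquad \mu_w = 6 \]

[Figure: number line showing the points 5, 6, 7 of W above the points 1, 5, 12 of Z]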

Set Z (blue) is more spread out than set W (olive).

If $X = \{x_1, \ldots, x_n\}$, the variance of $X$ is

\[ \sigma_x^2 = \frac{1}{n} \left[ \sum_{i=1}^{n} (x_i - \mu_x)^2 \right]. \]

The standard deviation is given by

\[ \sigma_x = \sqrt{\sigma_x^2}. \]
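The two definitions translate almost directly into code. The sketch below is one possible Python rendering, with the function names `variance` and `std_dev` and the sample list `data` chosen here purely for illustration. It divides by n, as in the formula above, rather than by n - 1.

```python
import math


def variance(values):
    """Average squared distance from the mean (divides by n, as in the formula)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def std_dev(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


# Illustrative data, not taken from the video: the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))  # 4.0
print(std_dev(data))   # 2.0
```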

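Z and W have the same mean, but Z is more spread out, so $\sigma_z$ should be greater than $\sigma_w$.

\begin{align*}
\sigma_w^2 &= \frac{1}{3} \left[ \sum_{i=1}^{3} (w_i - \mu_w)^2 \right] \\
&= \frac{1}{3} \left[ (5 - 6)^2 + (6 - 6)^2 + (7 - 6)^2 \right] \\
&= \frac{1}{3} \left[ (-1)^2 + 0^2 + 1^2 \right] \\
&= \frac{2}{3} \\
\sigma_w &= \sqrt{\frac{2}{3}}
\end{align*}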

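\begin{align*}
\sigma_z^2 &= \frac{1}{3} \left[ \sum_{i=1}^{3} (z_i - \mu_z)^2 \right] \\
&= \frac{1}{3} \left[ (1 - 6)^2 + (5 - 6)^2 + (12 - 6)^2 \right] \\
&= \frac{1}{3} \left[ (-5)^2 + (-1)^2 + 6^2 \right] \\
&= \frac{62}{3} \\
\sigma_z &= \sqrt{\frac{62}{3}}
\end{align*}

$\sigma_z^2 \gg \sigma_w^2$, so Z is much more spread out than W.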
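The two results can be double-checked with Python's standard library. This is a sketch rather than part of the original material: it uses `statistics.pvariance` and `statistics.pstdev`, which divide by n like the formula in this section, and it converts the values to `Fraction` so that the variances come back as exact fractions.

```python
from fractions import Fraction
from statistics import pstdev, pvariance

Z = [1, 5, 12]
W = [5, 6, 7]

# Exact population variances, matching the hand calculations.
var_w = pvariance([Fraction(x) for x in W])
var_z = pvariance([Fraction(x) for x in Z])
print(var_w)  # 2/3
print(var_z)  # 62/3

# Standard deviations as floats; Z's is the larger of the two.
sd_w = pstdev(W)
sd_z = pstdev(Z)
print(sd_z > sd_w)  # True
```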
