Reference card
Data aggregations and descriptive statistics
Measures of central tendency
A single value that seeks to describe a set of data by identifying the typical value.
Mean
The value that identifies the centre by calculating the arithmetic average of all the data points.
Commonly used when the data do not contain outliers, since the mean is very sensitive to outliers.
=AVERAGE(value1, [value2, ...])
$$\text{Mean} = \frac{\text{Sum of all data points}}{\text{Number of data points}}$$
{ 1, 3, 5, 8, 10, 13, 16 }
$$\text{Mean} = \frac{1+3+5+8+10+13+16}{7} = 8$$

Median
The value that is exactly in the middle of a set of values after we have ordered the values from smallest to largest.
Commonly used when the data contain a lot of outliers.
=MEDIAN(value1, [value2, ...])
{ 1, 3, 5, 8, 10, 13, 16 } Median = 8

Mode
The value that appears most frequently in the dataset.
Commonly used when the data are categorical. That is, the data contain a fixed number of groups.
=MODE(value1, [value2, ...])
{ 5, 1, 5, 8, 5, 1, 5 } Mode = 5
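As a rough cross-check of the three measures outside a spreadsheet, the following minimal sketch uses Python's standard `statistics` module on the card's example datasets. The extra value 1000 is a made-up outlier, added only to illustrate why the median is preferred when outliers are present.

```python
import statistics

# Example datasets taken from the card.
values = [1, 3, 5, 8, 10, 13, 16]
mode_values = [5, 1, 5, 8, 5, 1, 5]

mean_value = statistics.mean(values)        # (1+3+5+8+10+13+16) / 7
median_value = statistics.median(values)    # middle value of the sorted data
mode_value = statistics.mode(mode_values)   # most frequent value

print(mean_value, median_value, mode_value)  # 8 8 5

# Hypothetical outlier (not part of the card) to show sensitivity.
values_with_outlier = values + [1000]
mean_with_outlier = statistics.mean(values_with_outlier)
median_with_outlier = statistics.median(values_with_outlier)

print(mean_with_outlier, median_with_outlier)  # 132 9.0
```

A single extreme value moves the mean from 8 to 132, while the median only moves from 8 to 9.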
Measures of spread
Describe how far the values of a dataset lie from each other and from the mean, median or mode.
A measure of spread gives us an idea of how well the mean, for example, describes the data.
Standard deviation
Measures the amount of variation that exists in the dataset as the square root of the average squared difference between each data point and the mean.
=STDEV(value1, [value2, ...])
$$\sigma = \sqrt{\frac{\sum (x - \text{mean})^2}{n}}$$
x = each number in the set
n = total number of items in the set
Note: this formula divides by n (the population form). In Excel and Google Sheets, =STDEV divides by n - 1 (the sample form); =STDEVP divides by n.

Interquartile range
The value that measures the spread of the middle half of the data.
Assesses the variability of the middle 50% of the data, where we assume the majority of the values lie.
=QUARTILE(data, 3) - QUARTILE(data, 1)
{ 1, 3, 5, 6, 8, 10, 13, 16, 17, 19, 22 } 17 - 5 = 12

Quartiles
The quartiles segment the data distribution into four equal quarters, with the median (Q2) being in the middle of these quartiles.
=QUARTILE(data, quartile_number)
{ 1, 3, 5, 6, 8, 10, 13, 16, 17, 19, 22 } Q1 = 5, Median (Q2) = 10, Q3 = 17
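The standard deviation formula and the quartile example can be sketched in Python with the standard `statistics` module. This is an illustration under stated assumptions, not the card's own method: the choice of `method="exclusive"` is an assumption made because it reproduces the card's values (Q1 = 5, Q3 = 17). The inclusive method interpolates differently, and a spreadsheet's =QUARTILE may do the same, so results can differ slightly from the card.

```python
import math
import statistics

# Example dataset taken from the card.
data = [1, 3, 5, 6, 8, 10, 13, 16, 17, 19, 22]

# Standard deviation written out from the formula: sqrt(SUM(x - mean)^2 / n).
n = len(data)
mean_value = sum(data) / n
manual_sd = math.sqrt(sum((x - mean_value) ** 2 for x in data) / n)

population_sd = statistics.pstdev(data)  # divides by n, same as manual_sd
sample_sd = statistics.stdev(data)       # divides by n - 1, slightly larger

# Quartiles with the exclusive method (assumed here to match the card).
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 5.0 10.0 17.0 12.0

# The inclusive method interpolates between neighbouring values instead.
q1_inc, _, q3_inc = statistics.quantiles(data, n=4, method="inclusive")
iqr_inclusive = q3_inc - q1_inc
print(q1_inc, q3_inc, iqr_inclusive)  # 5.5 16.5 11.0
```

The two quartile conventions are both in common use, so it is worth checking which one a given tool applies before comparing interquartile ranges.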
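Variance
Measures the amount of variation that exists in the dataset by calculating the average squared difference between each data point and the mean. The standard deviation is the square root of the variance.
=VAR(value1, [value2, ...])
$$\sigma^2 = \frac{\sum (x - \text{mean})^2}{n}$$
x = each number in the set
n = total number of items in the set
Note: this formula divides by n (the population form). In Excel and Google Sheets, =VAR divides by n - 1 (the sample form); =VARP divides by n.

Range
The value that measures the spread by considering the difference between the smallest and largest values.
=MAX(value1, [value2, ...]) - MIN(value1, [value2, ...])
{ 1, 3, 5, 8, 10, 13, 16 } 16 - 1 = 15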
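The variance formula and the range example can be worked through step by step in Python. This is a minimal sketch on the card's range dataset. The variance figures are computed here for illustration and do not appear on the card.

```python
import statistics

# Example dataset taken from the card's range example.
values = [1, 3, 5, 8, 10, 13, 16]

# Variance from the formula: SUM(x - mean)^2 / n.
mean_value = sum(values) / len(values)                    # 8.0
squared_deviations = [(x - mean_value) ** 2 for x in values]
population_variance = sum(squared_deviations) / len(values)  # 176 / 7

# Library equivalents: pvariance divides by n, variance by n - 1.
library_population_variance = statistics.pvariance(values)
sample_variance = statistics.variance(values)             # 176 / 6

# Range: largest value minus smallest value.
value_range = max(values) - min(values)
print(value_range)  # 15
```

Because the sample form divides by n - 1 rather than n, it is always somewhat larger than the population form on the same data.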