Interpreting Data

Jonathan Bestwick
Wolfson Institute of Population Health
Interpreting Data

• Summarising data
• Sampling from a population
• Describing associations
• Comparisons and p-values

Types of data
• Qualitative
  – Nominal (unordered)
    – Binary: e.g. patient may live or die (yes/no)
    – Categorical: e.g. colours
  – Ordinal (ordered): e.g. short, medium, tall
• Quantitative
  – Discrete: e.g. 10 graduates
  – Continuous: e.g. length, 6.49 cm
Summarising data:
Measures of location
8, 4, 2, 11, 8, 3, 7
Median = middle value when values ordered from smallest to largest
2, 3, 4, 7, 8, 8, 11

Mode = most common value


2, 3, 4, 7, 8, 8, 11

Mean = average = sum of all the values divided by the number of values
(2+3+4+7+8+8+11)/7 = 43/7 = 6.1
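The three measures can be checked directly in code. Below is a minimal sketch in Python using only the standard library; the values are the ones from the example above.

```python
# Measures of location for the example data 8, 4, 2, 11, 8, 3, 7.
import statistics

values = [8, 4, 2, 11, 8, 3, 7]

print(statistics.median(values))          # 7: middle value once sorted
print(statistics.mode(values))            # 8: most common value
print(round(statistics.mean(values), 1))  # 6.1: sum (43) divided by 7
```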
Example: BMI in 163 1st year students
[Histogram: number of students by BMI (14 to 40). Mode = 21 = median; mean = 22.]
Summarising data:
Measures of spread
8, 4, 2, 11, 8, 3, 7
• Standard deviation
  – Mean was 6.1
  – Distance from the mean, squared: (2−6.1)² = 16.81, (3−6.1)² = 9.61, etc.
  – Average squared distance from the mean:
    (16.81+9.61+4.41+0.81+3.61+3.61+24.01)/7 = 62.87/7 ≈ 9
  – Standard deviation = square root of 9 = 3
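The same calculation written out in Python, as a minimal sketch; like the slide it divides by n (the population form of the standard deviation).

```python
# Standard deviation of 8, 4, 2, 11, 8, 3, 7, step by step.
import math

values = [8, 4, 2, 11, 8, 3, 7]
mean = sum(values) / len(values)                      # about 6.1
squared_distances = [(v - mean) ** 2 for v in values]
variance = sum(squared_distances) / len(values)       # about 9 (divide by n)
sd = math.sqrt(variance)                              # about 3

print(round(mean, 1), round(variance, 1), round(sd, 1))
```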
Summarising data:
Measures of spread
8, 4, 2, 11, 8, 3, 7
• Interquartile range
  – 25th to 75th centile (the median is the 50th centile)
  – Ordered values 2, 3, 4, 7, 8, 8, 11: the 25th centile is 3 and the 75th centile is 8
Example: BMI in 163 1st year students
[Histogram: number of students by BMI (14 to 40), with the 25th and 75th centiles marked. Interquartile range = 20 to 23.]
Mean or median?
1, 2, 3, 4, 5, 6, 7
Median = 4, Mean = (1+2+3+4+5+6+7)/7 = 4
Either would do

1, 2, 3, 4, 5, 6, 100
Median = 4, Mean = (1+2+3+4+5+6+100)/7 = 17.3
Better to use the median in this case, to avoid the influence of outliers
Standard deviation or interquartile range?
1, 2, 3, 4, 5, 6, 7
Standard deviation = √[((1−4)²+(2−4)²+(3−4)²+(4−4)²+(5−4)²+(6−4)²+(7−4)²)/7] = 2
Interquartile range (IQR) = 2 to 6
Either would do

1, 2, 3, 4, 5, 6, 100
Standard deviation = 33.8
Interquartile range (IQR) = 2 to 6
Better to use the IQR in this case, to avoid the influence of outliers
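A quick numerical check of this comparison, as a sketch assuming numpy is available; note that np.percentile interpolates, so its quartiles (2.5 and 5.5) differ slightly from the 2 and 6 quoted above.

```python
# How an outlier affects the standard deviation versus the interquartile range.
import numpy as np

for data in ([1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 100]):
    sd = np.std(data)                        # divide-by-n, as on the slides
    q25, q75 = np.percentile(data, [25, 75])
    print(data, "SD =", round(float(sd), 1), "IQR =", q25, "to", q75)

# The SD jumps from 2.0 to about 33.8; the IQR is unchanged by the outlier.
```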
Which measures to use?
Example: AFP levels
[Histogram: number of women by alphafetoprotein level (0 to 3.5), with the value 1.0 marked.]
Which measure of central location?
A. Mean
B. Median
C. Mode
Which measures to use?
Example: AFP levels
[Same histogram of alphafetoprotein level, with the values 0.8, 1.0 and 1.3 marked.]
Which measure of spread?
A. Standard deviation
B. Interquartile range
Distribution of TSH
(sample of pregnant women in Britain)
[Histogram: number of women by TSH (0 to 6 mIU/L).]
How should this data be summarised?
A. Median and standard deviation
B. Mean and interquartile range
C. Mean and standard deviation
D. Median and interquartile range
Distribution of free thyroxine
(sample of pregnant women in Britain)
[Histogram: number of women by FT4 (8 to 20 pmol/L).]
How should this data be summarised?
A. Median and standard deviation
B. Mean and interquartile range
C. Mean and standard deviation
D. Median and interquartile range
Distribution of free thyroxine
(sample of pregnant women in Britain)
[Histogram: number of women by FT4 (8 to 20 pmol/L); the distribution is Gaussian.]
Carl Friedrich Gauss 1777-1855
• German mathematician and scientist
• The formula for the Gaussian distribution is
  y = (1 / (√(2π) × sd)) × e^(−(x − m)² / (2 sd²))
• The Gaussian distribution is determined only by the mean (m) and standard deviation (sd)
• Abraham de Moivre actually specified the formula 100 years before
Gaussian Distribution
[Figure: Gaussian curves of systolic blood pressure (mmHg) with means 110, 120 and 130.]

Gaussian Distribution
[Figure: Gaussian curves of systolic blood pressure (mmHg) with standard deviations 10, 15 and 20.]
Gaussian Distribution
Useful properties
• A constant proportion of values will lie within any specified number of standard deviations above or below the mean
Gaussian Distribution
Useful properties
[Figure: 68% of values lie within 1 SD of the mean (16% in each tail).]
[Figure: 90% of values lie within 1.64 SDs of the mean (5% in each tail).]
[Figure: 95% of values lie within 1.96 SDs of the mean (2.5% in each tail).]
Gaussian Distribution
Useful properties
A constant proportion of values will lie within any
specified number of Standard Deviations above or
below the mean: reference ranges
99% range (0.5th to 99.5th centile) = mean ± 2.58 SDs

95% range (2.5th to 97.5th centile) = mean ± 1.96 SDs

90% range (5th to 95th centile) = mean ± 1.64 SDs

(getting narrower)
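These multipliers come from the standard Gaussian distribution and can be checked with a short sketch, assuming scipy is available.

```python
# Reference-range multipliers: the central X% of a Gaussian lies within
# norm.ppf(1 - (1 - X)/2) standard deviations of the mean.
from scipy.stats import norm

for coverage in (0.99, 0.95, 0.90):
    multiplier = norm.ppf(1 - (1 - coverage) / 2)
    print(f"{coverage:.0%} range = mean ± {multiplier:.2f} SDs")
# Prints 2.58, 1.96 and 1.64, matching the ranges above.
```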
Distribution of free thyroxine
(sample of pregnant women in Britain)
[Histogram: number of women by FT4 (8 to 20 pmol/L).]
Mean = 14, SD = 1.7
95% reference range = 14 ± 1.96×1.7 = 10.7 to 17.3
Observed middle 95% of women in the sample = 10.9 to 17.5
Interpreting Data

• Summarising data
• Sampling from a population
• Describing associations
• Comparisons and p-values

What can a sample tell us about the population?
Population: e.g. the true mean BMI of all 1st year students in English universities
Sample: e.g. the sample mean BMI of 163 QMUL 1st year students
[Diagram: statistics are used to make inferences from the sample about the population.]
Repeated sampling from a population
[Diagram: from a population with a true mean, samples 1, 2, …, 100 are drawn, each with its own sample mean.]
• If the sample size isn’t too small then the distribution of the sample mean will be Gaussian
• The standard deviation of this distribution is called the standard error
Standard error of the mean
• The standard error is a measure of the statistical accuracy of an estimate
• The standard error of the mean is the standard deviation of the distribution of all possible sample means
• This can be estimated from a single sample as

  Standard error of the mean = standard deviation / √(sample size)
Example: BMI in 163 first year QMUL students
• n=163
• mean=22
• standard deviation=4

Standard error = standard deviation / √n = 4 / √163 = 0.3
Confidence interval for the mean
The 95% confidence interval (CI) of a sample mean is
95% CI = sample mean ± 1.96 × standard error
In our example: 95% CI = mean ± 1.96 x SE
= 22 ± 1.96 x 0.3
= 21.4 to 22.6
If we took repeated samples of the same size, we would expect 95% of the confidence intervals calculated in this way to contain the true mean BMI

In the population we are 95% sure that the mean BMI could be as low as 21.4 or as high as 22.6
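A minimal sketch of the whole calculation, using the summary figures from the BMI example (n = 163, mean = 22, SD = 4); with raw data you would compute the mean and SD first.

```python
# Standard error and 95% confidence interval for a sample mean.
import math

n, mean, sd = 163, 22, 4

se = sd / math.sqrt(n)          # about 0.3
lower = mean - 1.96 * se
upper = mean + 1.96 * se

print(f"SE = {se:.2f}, 95% CI = {lower:.1f} to {upper:.1f}")  # 21.4 to 22.6
```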
95% confidence interval for the mean weight
of a sample of 30 adult men is 75kg to 81kg

Which is the correct definition?

A. In the population we are 95% sure that the mean weight could be as low as 75kg or as high as 81kg

B. In this study 95% of men weighed between 75kg and 81kg
Confidence intervals

99% CI = sample mean ± 2.58 × standard error
95% CI = sample mean ± 1.96 × standard error
90% CI = sample mean ± 1.64 × standard error
(getting narrower)

• Use the standard deviation for ranges (for individual values)
• Use the standard error for confidence intervals (for means)
What happens as sample size increases?
Example: systolic blood pressure
Measured in samples of 25, 50 and 100; each time mean = 120, SD = 15 mmHg

Sample size   95% range        95% CI
25            90.6 to 149.4    114.1 to 125.9
50            90.6 to 149.4    115.8 to 124.2
100           90.6 to 149.4    117.1 to 122.9

Does the 95% range A. Get wider B. Get narrower C. Stay the same
Does the 95% CI A. Get wider B. Get narrower C. Stay the same
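The table can be reproduced with a short sketch: the 95% range uses the SD (and so does not change), while the 95% CI uses the standard error, which shrinks with the square root of the sample size.

```python
# 95% range versus 95% confidence interval as the sample size grows.
import math

mean, sd = 120, 15

for n in (25, 50, 100):
    lo_r, hi_r = mean - 1.96 * sd, mean + 1.96 * sd        # 95% range
    se = sd / math.sqrt(n)
    lo_ci, hi_ci = mean - 1.96 * se, mean + 1.96 * se      # 95% CI
    print(f"n={n:3d}  range {lo_r:.1f} to {hi_r:.1f}  CI {lo_ci:.1f} to {hi_ci:.1f}")
```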
Interpreting Data

• Summarising data
• Sampling from a population
• Describing associations
• Comparisons and p-values

Example data: BRadykinesia-Akinesia INcoordination (BRAIN) Test
Computerised alternate finger tapping test

Best 2 measures:
• Kinesia score (number of keystrokes)
• Akinesia time (mean dwell time on each key)
Correlation
[Scatter plot: Total Motor UPDRS (0 to 100) against Kinesia score (0 to 80).]
Correlation
[Scatter plot: Total Motor UPDRS against Kinesia score; r = −0.53.]

• Total motor UPDRS scores are negatively correlated with Kinesia scores
• The correlation coefficient, usually denoted r, takes a value between −1 and +1
Correlation (Pearson)
[Scatter plots of variable y against variable x:
  r = 0: no correlation
  r = 1: perfect positive correlation
  r = −1: perfect negative correlation]

r = cov[X, Y] / (sd_X × sd_Y), where cov[X, Y] = E[(X − μ_X)(Y − μ_Y)]
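A minimal sketch of this formula in code, assuming numpy is available; the x and y values below are made up for illustration, and the result is checked against numpy's built-in correlation.

```python
# Pearson correlation: covariance divided by the product of the SDs.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.2, 4.8, 5.1])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))   # E[(X − μ_X)(Y − μ_Y)]
r = cov_xy / (x.std() * y.std())                    # divide by sd_X × sd_Y

print(round(float(r), 3), round(float(np.corrcoef(x, y)[0, 1]), 3))  # identical
```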
Correlation (Spearman’s rank)
• If the data do not follow a Gaussian distribution or there is a non-linear monotonic relationship, Spearman’s rank correlation can be used
• Both variables are ranked and the correlation is calculated using the difference (d) in ranks:
  r = 1 − 6Σd² / (n(n² − 1))
[Scatter plot of y against x showing a non-linear monotonic relationship: Pearson r = 0.88, Spearman’s rank r = 0.96.]
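A sketch of the rank formula, assuming scipy is available and the data have no tied ranks (the formula above assumes untied ranks; scipy's spearmanr also handles ties). The data are illustrative.

```python
# Spearman's rank correlation from the difference in ranks.
import numpy as np
from scipy.stats import rankdata, spearmanr

x = np.array([10, 20, 30, 40, 50, 60])
y = np.array([1.2, 1.8, 3.5, 2.9, 20.0, 55.0])

d = rankdata(x) - rankdata(y)                 # difference in ranks
n = len(x)
r_formula = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

rho, p = spearmanr(x, y)                      # scipy's version, with a p-value
print(round(float(r_formula), 3), round(float(rho), 3))   # both about 0.943
```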
Linear regression
[Scatter plot: Total Motor UPDRS against Kinesia score, with the fitted regression line; for example, a kinesia score of 43.8 corresponds to a predicted UPDRS of 25.5.]

The regression line minimises the sum of the squared vertical distances from each point to the line
Linear regression
[Scatter plot: Total Motor UPDRS against Kinesia score, with the fitted regression line crossing the vertical axis at 55.8.]

y = b0 + b1x  (y is the dependent variable, x the independent variable)
UPDRS = b0 + b1×KS
UPDRS = 55.8 – 0.7×KS

UPDRS decreases by 0.7 for every unit increase in kinesia score
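A minimal sketch of fitting a line of this form by least squares, assuming scipy is available; the kinesia score and UPDRS values below are illustrative, not the study data.

```python
# Simple linear regression: y = b0 + b1*x fitted by least squares.
from scipy.stats import linregress

kinesia_score = [20, 30, 40, 50, 60, 70]
updrs = [50, 38, 33, 24, 18, 12]

fit = linregress(kinesia_score, updrs)
print(f"b0 (intercept) = {fit.intercept:.1f}, b1 (slope) = {fit.slope:.2f}")
# The slope is negative: predicted UPDRS falls as the kinesia score rises.
```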


Correlation (Pearson) and linear regression

Pearson’s correlation and linear regression only describe linear associations
[Scatter plot: y against x with a strong non-linear relationship for which r = 0 and the fitted regression line is flat.]
Some other regression models

• Many different regression curves can be fitted where the data pattern is not linear
• Polynomial – quadratic (y = b0 + b1x + b2x²), cubic (y = b0 + b1x + b2x² + b3x³) etc
• Exponential growth (y = bˣ; b > 1), exponential decay (y = bˣ; b < 1)
• Sigmoid
Multivariate linear regression

y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + …
(y is the dependent variable; x1, x2, … are the independent variables)

• Independent variables can be continuous, categorical or a mix
• Example: kinesia score according to age and sex
  – Kinesia score decreases by an average of 0.2 points per year of age (p=0.005)
  – Kinesia score is also on average 1.5 points higher in females (p=0.035)
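A minimal sketch of fitting such a model by least squares with numpy; the age, sex and outcome values below are illustrative, not the study data (a library such as statsmodels would also report p-values).

```python
# Multivariable linear regression: y = b0 + b1*age + b2*sex.
import numpy as np

age = np.array([35, 42, 50, 58, 63, 70, 75, 80], dtype=float)
sex = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)   # 0 = male, 1 = female
y = np.array([62, 60, 58, 57, 55, 54, 51, 50], dtype=float)

X = np.column_stack([np.ones_like(age), age, sex])       # intercept, x1, x2
(b0, b_age, b_sex), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"b0 = {b0:.1f}, change per year of age = {b_age:.2f}, "
      f"female vs male = {b_sex:.2f}")
```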
Predicting gestational age from crown-rump length
Which regression should you be doing?
A: GA = 61 + 0.4×CRL      B: CRL = -13 + 0.9×GA
[Two scatter plots: gestational age from LMP (days) against crown-rump length (mm), and crown-rump length (mm) against gestational age from LMP (days).]

What is being predicted should always be on the vertical axis
Interpreting Data

• Summarising data
• Sampling from a population
• Describing associations
• Comparisons and p-values

Statistical significance
• An observed sample difference between groups might be due to chance
• We want to know whether a result is statistically significant, i.e. unlikely to be due to chance
• To determine whether an observed difference was due to chance we look at confidence intervals and p-values
Statistical significance
Example: mean BMI in males and females
In our sample of 163 QMUL students, suppose we are interested in whether BMI differs between male and female students

          N    Mean (SD) BMI in kg/m²   Difference   95% CI for difference
Male      82   23.1 (4.2)               1.6          0.36 to 2.84
Female    81   21.5 (3.9)

What does this tell us about the difference in BMI between males and females in the population?
We look at the confidence interval:
95% CI = mean difference ± 1.96 × SE of mean difference
Statistical significance
Example: mean BMI in males and females
Difference in means (95% CI): 1.6 (0.36 to 2.84)

Interpretation:
We are 95% sure that the difference in mean BMI between male and female 1st year students in English universities is between 0.36 and 2.84 kg/m²

Is this a true difference in the population or is it likely to be a chance finding in this sample?
P-values
• If there was truly no difference between the males and females (the underlying assumption), then the p-value is the probability of observing a difference of at least 1.6 kg/m²
• In general, a p-value for a result is the probability of observing a result as or more extreme than the sample result if the underlying assumption in the population is true
Tossing a coin
• If you throw a fair coin 3 times, there are 8 possible combinations of outcomes (throw 1, throw 2, throw 3):

Outcome       HHH  HHT  HTH  HTT  THH  THT  TTH  TTT
Total heads   3    2    2    1    2    1    1    0
Total tails   0    1    1    2    1    2    2    3

• The probability of throwing 3 heads is 1/8
• The probability of throwing 2 heads and 1 tail is 3/8
• The probability of throwing at least 2 heads is 4/8 = 1/2
Biased coin?
• We suspect a coin of being biased towards heads. We throw it three times and it lands head side up 3 times. Is there enough evidence to say it is biased?
• The probability of this occurring is 1/8 or 0.125
• We would need to have thrown the coin more times for there to be sufficient evidence that it was biased
Another coin
• We suspect another coin is biased (but don’t know to which side). To
test this we threw it 22 times and it landed head side up 17 times.
• Is there enough evidence to say the coin is biased?
Coin lands heads side up      Probability of getting that number of heads*
17 0.00628
18 0.00174
19 0.000367
20 0.0000551
21 0.00000525
22 0.000000238
Total 0.008
*Assuming the coin is fair
• Assuming the coin is fair the probability of getting at least 17 heads
from 22 throws is 0.008
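These binomial probabilities can be checked with a short sketch, assuming scipy is available; the two-sided total differs slightly from 0.016 because the slide sums rounded values.

```python
# Probability of at least 17 heads (or at least 17 tails) in 22 throws of a
# fair coin.
from scipy.stats import binom

p_at_least_17_heads = binom.sf(16, 22, 0.5)   # P(X >= 17) = 1 - P(X <= 16)
p_two_sided = 2 * p_at_least_17_heads         # tails side is equally extreme

print(round(p_at_least_17_heads, 4))   # about 0.0085
print(round(p_two_sided, 4))           # about 0.0169, i.e. the p-value
```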
P-values
• A p-value for a result is the probability of observing a result as or more extreme than the sample result if the underlying assumption in the population is true

• If the coin was fair (the underlying assumption):
  – The probability of observing 17 heads is 0.006
  – The probability of observing 17 tails is also 0.006 (as extreme)
  – The probability of observing at least 17 heads is 0.008 (as or more extreme)
  – The probability of observing at least 17 tails is also 0.008 (as or more extreme)

• So the probability of observing at least 17 heads or at least 17 tails is 0.008 + 0.008 = 0.016. This is the p-value
Statistical significance
[P-value scale from 0.0001 to 1, annotated (assuming the coin is fair) from "very unlikely to be a chance effect" at small p-values, through "probably not a chance effect", to "cannot rule out a chance effect" at large p-values. The observed p-value of 0.016 lies below 0.05.]
• The p-value is less than 0.05 so we can be reasonably confident the coin is biased
When can p-values be calculated?

• When there is a comparison
  – 2 means – are they different, i.e. is their difference different from 0?
  – Association – are the observed results different from those expected?
  – Regression – is the slope different from 0?
Statistical significance
• In a study, after 1 year patients receiving propranolol had on average a heart rate 10bpm lower than patients receiving a placebo, p=0.0003
[Same p-value scale: p = 0.0003 lies well below 0.05, towards the "very unlikely to be a chance effect" end.]
• The p-value is much lower than 0.05 so we can be very confident that propranolol reduces heart rate
Are girls smarter than boys?!
• IQ measured at age 3 in a sample of 150 girls and 156 boys

             Mean   Standard error   95% CI
Girls        110    0.98             108 to 112
Boys         106    1.00             104 to 108
Difference   4      1.40             1 to 7

• On average girls in this study had an IQ 4 points higher than boys
• In the population we are 95% sure that the mean difference is between 1 and 7
• The test used to compare the means is the two-sample t-test
• P-value calculated by comparing the t-statistic (4/1.40) to Student’s t-distribution with 304 degrees of freedom
Degrees of freedom
• What is meant by degrees of freedom?
• These are the number of values that are free to vary
• Say you had to pick three numbers that had a mean of 10
• Could pick 9, 10 and 11
• Or 8, 10 and 12
• Or 5, 9 and 16
• Once you have picked the first two numbers, to get a mean of 10, the third
number has to be fixed – only 2 of the 3 numbers are free to vary so the
degrees of freedom is 2
• In the example we have 150 girls and 156 boys, so there are 149+155=304
degrees of freedom.
Are girls smarter than boys?!
• IQ measured at age 3 in a sample of 150 girls and 156 boys

             Mean   Standard error   95% CI
Girls        110    0.98             108 to 112
Boys         106    1.00             104 to 108
Difference   4      1.40             1 to 7

• P-value calculated by comparing the t-statistic (4/1.40 = 2.85) to Student’s t-distribution with 304 degrees of freedom; p=0.007
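A minimal sketch of the two-sample t-test from summary statistics, assuming scipy is available; the SDs are back-calculated from the standard errors above (SD = SE × √n), so the result is approximate.

```python
# Two-sample t-test for the difference in mean IQ between girls and boys.
import math
from scipy.stats import ttest_ind_from_stats

n_girls, mean_girls, se_girls = 150, 110, 0.98
n_boys, mean_boys, se_boys = 156, 106, 1.00

t, p = ttest_ind_from_stats(mean_girls, se_girls * math.sqrt(n_girls), n_girls,
                            mean_boys, se_boys * math.sqrt(n_boys), n_boys)
print(round(float(t), 2), round(float(p), 3))   # t about 2.85, p below 0.01
```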
P-values and confidence intervals
In our example:
95% CI: 1 to 7 (doesn’t contain 0)
P-value: 0.007 (<0.05)

They are consistent:
• If the 95% CI for a difference excludes 0 then p<0.05
• If the 95% CI for a difference contains 0 then p≥0.05
P-values and confidence intervals
In general:
• If the 99% CI for a difference excludes 0 then p<0.01; if it contains 0 then p≥0.01
• If the 95% CI for a difference excludes 0 then p<0.05; if it contains 0 then p≥0.05
• If the 90% CI for a difference excludes 0 then p<0.1; if it contains 0 then p≥0.1
P-values and confidence intervals
The p-value for the difference in birth weight of
children born to smokers compared with non-
smokers is 0.02
Which is the correct 95% confidence interval for the
difference in birth weight?
1. -0.70 to 0.06kg
2. -0.06 to 0.70kg
3. 0.06 to 0.70kg
P-values and confidence intervals
In a study, a group of patients took statins and another group
placebo. The mean difference in LDL cholesterol was 1 mmol/L:

The 95% CI was 0.2 to 1.8


The 99% CI was -0.1 to 2.1
Which is correct?
1. P-value is less than 0.01
2. P-value is less than 0.05 but greater than 0.01
3. P-value is greater than 0.05
T-tests and non-parametric tests
• T-tests can also be performed where two measurements are made on the same group of people – the paired t-test
  – Calculate the individual differences – is their mean different from 0?
• T-tests assume the data follow a Gaussian distribution
• If not, a non-parametric test can be performed
  – These are based on the ranks of the data rather than the actual values of the data
  – Instead of a 2-sample t-test, do a Mann-Whitney U-test (Wilcoxon rank sum test)
  – Instead of a paired t-test, do a Wilcoxon signed rank test
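A minimal sketch comparing the two unpaired tests on illustrative data, assuming scipy is available; the values are made up, with one extreme value in the second group to show where the rank-based test is preferred.

```python
# Two-sample t-test versus its rank-based alternative, the Mann-Whitney U-test.
from scipy.stats import ttest_ind, mannwhitneyu

group_a = [4.1, 5.0, 5.2, 6.3, 6.8, 7.1, 7.9]
group_b = [5.9, 6.4, 7.2, 8.0, 8.8, 9.5, 30.0]   # contains an extreme value

t_stat, t_p = ttest_ind(group_a, group_b)
u_stat, u_p = mannwhitneyu(group_a, group_b, alternative="two-sided")

print(round(float(t_p), 3), round(float(u_p), 3))   # the rank-based p-value is
                                                    # less affected by the outlier
```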
Are alcohol drinkers more likely to smoke?
Sample of first year medical students

Observed                       Ever smoked?
                               Yes    No     Total
Drink alcohol?        Yes      36     75     111
                      No       5      47     52
                      Total    41     122    163

Expected (e.g. 111×41/163 = 28 for the first cell)
                               Ever smoked?
                               Yes    No     Total
Drink alcohol?        Yes      28     83     111
                      No       13     39     52
                      Total    41     122    163
Are alcohol drinkers more likely to smoke?
Sample of first year medical students
• The test statistic is calculated by comparing the observed counts with the expected counts
• For the first cell this is (36−28)²/28 = 2.3
• Do the same for the other three cells and add up to get 9.8, the test statistic
• Compare to the chi-squared distribution to get the p-value; p=0.002
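A short check of this calculation, assuming scipy is available; correction=False turns off the Yates continuity correction so that the result matches the hand calculation above.

```python
# Chi-squared test for the 2x2 table of drinking and ever smoking.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[36, 75],
                     [5, 47]])

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(round(float(chi2), 1), round(float(p), 3))   # about 9.8 and 0.002
print(expected.round(1))                           # about [[27.9, 83.1], [13.1, 38.9]]
```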
Chi-squared test and Fisher’s exact test
If the data are sparse (fewer than 5 in at least one of the cells of the table) then Fisher’s exact test is more appropriate than the chi-squared test

                               Ever smoked?
                               Yes    No     Total
Drink alcohol?        Yes      a      b      a+b
                      No       c      d      c+d
                      Total    a+c    b+d    a+b+c+d (=n)

The probability of observing the data given the fixed row and column totals is

  P = [(a+b)! × (c+d)! × (a+c)! × (b+d)!] / [a! × b! × c! × d! × n!]

The same calculation is done for all possible tables that can be created given the fixed row and column totals. The p-value is then the sum of all probabilities less than or equal to the probability of the data observed
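A minimal sketch of running Fisher's exact test, assuming scipy is available; the 2×2 table below is an illustrative sparse example (one cell below 5), not the data from the slides.

```python
# Fisher's exact test on a sparse 2x2 table.
from scipy.stats import fisher_exact

table = [[2, 10],
         [9, 4]]

odds_ratio, p = fisher_exact(table)   # two-sided by default
print(round(float(odds_ratio), 2), round(float(p), 3))
```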
