Revision Questions -- SA3(Q)

The document contains revision questions related to univariate and bivariate data analysis, including tasks on histograms, boxplots, descriptive statistics, and regression analysis. It covers various topics such as salary distribution, test scores in Maths and Science, hours spent studying, and the relationship between blood pressure and age. Additionally, it includes practical exercises on calculating statistics, interpreting regression equations, and evaluating data relationships.

Uploaded by Mary
Copyright © All Rights Reserved

MUFY – MUF0092 2/2021

Revision Questions – Univariate Data, Bivariate Data & Linearization

Question 1

The histogram above shows the salaries (in thousands of dollars) of the 2700 employees of a company.
(a) Estimate, to the nearest percent, the percentage of employees whose salary is less than
$55 thousand. [57%]
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
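A quick check of the bracketed answer, as a sketch. It uses the midpoint frequencies tabulated for this question and assumes each class is 11 thousand dollars wide and centred on its midpoint, so the class with midpoint 49 ends at 54.5 and every class up to it lies below $55 thousand.

```python
# Midpoint (thousand $) -> number of employees, from the Question 1 table
freq = {5: 50, 16: 300, 27: 250, 38: 400, 49: 550,
        60: 450, 71: 250, 82: 350, 93: 100}

HALF_WIDTH = 5.5  # assumption: classes are 11 thousand dollars wide

total = sum(freq.values())
# Count employees in classes whose upper boundary lies below 55
below_55 = sum(f for m, f in freq.items() if m + HALF_WIDTH < 55)
percent = 100 * below_55 / total
print(total, below_55, round(percent))  # 2700 1550 57
```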

(b) If a boxplot were constructed from this histogram, what would it be likely to look like?
Draw an estimated boxplot without giving the exact five-number summary.

Midpoint (thousand $)    No. of Employees
5                        50
16                       300
27                       250
38                       400
49                       550
60                       450
71                       250
82                       350
93                       100
Total                    2700

(c) What is the shape of the distribution of the data?

…………………………………………………………………………………………………
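As a rough check on where the box and median should sit, the sketch below estimates the quartiles from the grouped data. It assumes each class is 11 thousand dollars wide and centred on its tabulated midpoint, and that salaries are spread evenly within a class; both are assumptions, so the results are estimates only, not an exact five-number summary.

```python
# Midpoints (thousand $) and frequencies from the Question 1 table
mids = [5, 16, 27, 38, 49, 60, 71, 82, 93]
freqs = [50, 300, 250, 400, 550, 450, 250, 350, 100]
WIDTH = 11  # assumed class width: midpoint - 5.5 up to midpoint + 5.5


def grouped_quantile(p):
    """Estimate the value below which a fraction p of the data lies,
    interpolating linearly inside the class that contains it."""
    target = p * sum(freqs)
    cum = 0  # cumulative frequency of the classes already passed
    for m, f in zip(mids, freqs):
        if cum + f >= target:
            lower = m - WIDTH / 2  # lower class boundary
            return lower + (target - cum) * WIDTH / f
        cum += f
    return mids[-1] + WIDTH / 2  # fallback: upper boundary of the last class


q1, med, q3 = (grouped_quantile(p) for p in (0.25, 0.5, 0.75))
print(round(q1, 1), round(med, 1), round(q3, 1))  # 34.6 50.5 66.6
```

The minimum and maximum whisker ends can be taken, equally roughly, as the lower boundary of the first class and the upper boundary of the last.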

RevQ – SA3 (Q) Page | 1



Question 2
Test scores (in percent) in two units, Maths and Science, were recorded for 16 students as
follows:

Student             1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Maths Score (%)    47  50  26  56  71  75  76  90  85  56  80  94  52  76  65  57
Science Score (%)  38  20  24  42  52  16  70  45  73  52  85  88  48  80  84  84

(a) Use a graphics calculator to calculate the following descriptive statistics for both units,
giving answers correct to the nearest percent.

                     Maths     Science
Mean                 66        56
Median               68        52
Mode                 56, 76    52, 84
Standard Deviation   18        25
Lower Quartile       54        40
Upper Quartile       78        82
Range = Max – Min    68        72
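A sketch that reproduces part (a) in Python as a cross-check on the calculator output. It assumes the calculator's quartile convention is the median of the lower and upper halves of the sorted data, and that the standard deviation asked for is the sample value s_x; other conventions give slightly different quartiles. Note that 52 and 84 each occur twice among the Science scores, so `multimode` returns both.

```python
import statistics as st

maths = [47, 50, 26, 56, 71, 75, 76, 90, 85, 56, 80, 94, 52, 76, 65, 57]
science = [38, 20, 24, 42, 52, 16, 70, 45, 73, 52, 85, 88, 48, 80, 84, 84]


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves
    (the overall median is excluded when n is odd)."""
    s = sorted(data)
    half = len(s) // 2
    lower, upper = s[:half], s[(len(s) + 1) // 2:]
    return st.median(lower), st.median(upper)


def summary(data):
    q1, q3 = quartiles(data)
    return {
        "mean": st.mean(data),
        "median": st.median(data),
        "mode": st.multimode(data),  # every value with the highest count
        "sd": st.stdev(data),        # sample standard deviation (s_x)
        "Q1": q1,
        "Q3": q3,
        "range": max(data) - min(data),
    }


for name, data in (("Maths", maths), ("Science", science)):
    print(name, summary(data))
```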

(b) Complete the following statements:

The middle 50% of the Maths scores were between ______ and ______.

The highest 25% of the Science scores were all greater than or equal to ______.

(c) Determine whether there are any outliers among the Science scores. Show all working to
justify your answer.

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
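One standard way to justify part (c) is the 1.5 × IQR rule: a score is an outlier if it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below applies that rule, assuming it is the criterion used in this unit and taking the Science quartiles Q1 = 40 and Q3 = 82 from part (a).

```python
science = [38, 20, 24, 42, 52, 16, 70, 45, 73, 52, 85, 88, 48, 80, 84, 84]
q1, q3 = 40, 82  # Science quartiles from part (a)

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
# Any score outside the fences is flagged as an outlier
outliers = [x for x in science if x < lower_fence or x > upper_fence]
print(iqr, lower_fence, upper_fence, outliers)  # 42 -23.0 145.0 []
```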


(d) Parallel boxplots of the Maths and Science scores are shown below.

By comparing the boxplots, comment on the quartiles, the shapes of the distributions, and the
measures of centre and dispersion.

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
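The boxplot image is not reproduced in this text. As a numerical stand-in, the sketch below compares the lengths of the four quarters of each distribution, using the minimum and maximum read from the data and the quartiles and medians from part (a); `quarter_widths` is a hypothetical helper name. Unequal quarter lengths point to skewness, and the IQR measures the spread of the middle 50% of scores.

```python
# (min, Q1, median, Q3, max): min and max from the data, the rest from part (a)
five_num = {
    "Maths": (26, 54, 68, 78, 94),
    "Science": (16, 40, 52, 82, 88),
}


def quarter_widths(mn, q1, med, q3, mx):
    """Lengths of the four quarters of the data, from lowest to highest."""
    return (q1 - mn, med - q1, q3 - med, mx - q3)


for unit, summary in five_num.items():
    iqr = summary[3] - summary[1]
    print(unit, "IQR =", iqr, "quarter lengths =", quarter_widths(*summary))
```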


Question 3
Twenty randomly selected students were asked to estimate the number of hours (n) that they
had spent studying in the past week (in and out of class). The responses are recorded below.

(a) State the following:

(i) Modal interval


(ii) Median interval

(b) Calculate the estimated mean. Show your working.

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
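The estimated mean of grouped data is the sum of (frequency × midpoint) divided by the total frequency. The responses table for this question is not reproduced in this text, so the sketch below is illustrated on the Question 1 salary midpoints instead; substituting the interval midpoints and frequencies from the study-hours table gives part (b). `estimated_mean` is a hypothetical helper name.

```python
def estimated_mean(midpoints, frequencies):
    """Estimated mean of grouped data: sum(f * x) / sum(f), x = class midpoint."""
    total_f = sum(frequencies)
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return total_fx / total_f


# Illustration only: the Question 1 salary table (thousand $)
mids = [5, 16, 27, 38, 49, 60, 71, 82, 93]
freqs = [50, 300, 250, 400, 550, 450, 250, 350, 100]
print(round(estimated_mean(mids, freqs), 2))  # 50.63
```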


Johnny, the class teacher, noticed that the number of hours spent studying in the past week, n
(in and out of class), can be used to estimate a student's average mark, M, as a measure of
academic performance. From his statistical analysis he obtained the following information:

$\bar{n} = 39.5, \quad \bar{M} = 76.8, \quad s_n = 13.2, \quad s_M = 20.6$

He also found that Pearson's correlation coefficient is 𝑟 = 0.8796.

(c) Using the given information, write the equation of the regression line of average mark (M)
on the number of hours spent studying (n).

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
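A worked sketch of part (c), using the standard formulas for the least-squares line of M on n in terms of summary statistics: the slope is b = r s_M / s_n, and the line passes through the point of means. Intermediate values are rounded as shown.

```latex
\begin{aligned}
b &= r\,\frac{s_M}{s_n} = 0.8796 \times \frac{20.6}{13.2} \approx 1.3727\\
a &= \bar{M} - b\,\bar{n} = 76.8 - 1.3727 \times 39.5 \approx 22.58\\
M &\approx 22.58 + 1.37\,n
\end{aligned}
```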

(d) What is the slope of the regression equation? Interpret its value in the context of the
question.

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………

(e) Use the regression equation from part (c) to estimate the average mark of a student
who spent 35 hours studying in the past week. Give your answer to the nearest whole number.

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
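The same calculation in Python, applied to part (e) as a sketch. It derives the coefficients from the given summary statistics (slope r × s_M / s_n, line through the point of means) and keeps them unrounded until the final step; `predict_mark` is a hypothetical helper name.

```python
# Summary statistics given in the question
n_bar, M_bar = 39.5, 76.8
s_n, s_M = 13.2, 20.6
r = 0.8796

slope = r * s_M / s_n              # b = r * s_M / s_n
intercept = M_bar - slope * n_bar  # the line passes through (n_bar, M_bar)


def predict_mark(hours):
    """Estimated average mark M for a given number of study hours n."""
    return intercept + slope * hours


print(round(slope, 4), round(intercept, 2), round(predict_mark(35)))  # 1.3727 22.58 71
```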


Question 4

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………

(b) Comment on the skewness of the distributions for these two villages.

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………


Question 5
A chemical solution was gradually heated. At five-minute intervals the time, t minutes, and
the temperature, T ℃, were noted.

Time, t (min)           0     5    10    15    20    25    30    35
Temperature, T (℃)    0.8   3.0   6.8  10.9  15.6  19.6  23.4  26.7

(a) Which is the independent variable and which is the dependent variable?
…………………………………………………………………………………………………
…………………………………………………………………………………………………
(b) Use your graphics calculator to draw a scatterplot, and comment on the relationship
between t and T by stating its strength, direction and form.
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………

(c) Evaluate Pearson's correlation coefficient and comment on whether it supports your
answer to (b).
…………………………………………………………………………………………………

…………………………………………………………………………………………………
(d) Comment on the coefficient of determination for the relationship between t and T.
…………………………………………………………………………………………………
…………………………………………………………………………………………………
(e) Calculate the equation of the regression line of T on t. [$T = 0.78t - 0.25$]
…………………………………………………………………………………………………
(f) Use your equation to estimate the temperature after 12 minutes. [9.1]
Is this estimate reliable?
…………………………………………………………………………………………………
…………………………………………………………………………………………………


Question 6
A medical officer wishes to study the relationship between blood pressure (BP) and age in
male patients. He obtained the following results from 12 patients.

BP            118  147  143  160  145  125  115  149  152  130  152  150
Age (years)    36   56   47   72   49   42   38   63   68   42   60   55

(a) Which is the explanatory variable and which is the response variable?

…………………………………………………………………………………………………
…………………………………………………………………………………………………

The equation of the line of best fit for this set of data is

𝐵𝑃 = 1.1382 × 𝐴𝑔𝑒 + 80.933

(b) Interpret the coefficient of Age and the constant in the context of this data set.
Is the constant sensible? Explain.

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………

(c) Use the given regression equation to estimate the blood pressure of a 25-year-old
patient. Is this estimate reliable? Why?

𝐵𝑃 = 1.1382 × (25) + 80.933 = 109.39 ≈ 109

This estimate is NOT reliable because it is an extrapolation: an age of 25 years lies outside the range of ages in the data (36 to 72 years).


…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
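A sketch of the reasoning in part (c): evaluate the given line at Age = 25, then compare that age with the range of ages in the sample. `predict_bp` and `is_extrapolation` are hypothetical helper names.

```python
age = [36, 56, 47, 72, 49, 42, 38, 63, 68, 42, 60, 55]


def predict_bp(a):
    """Blood pressure predicted by the given line BP = 1.1382 x Age + 80.933."""
    return 1.1382 * a + 80.933


def is_extrapolation(a, observed=age):
    """True when a lies outside the range of ages actually observed."""
    return not (min(observed) <= a <= max(observed))


print(round(predict_bp(25), 2), is_extrapolation(25))  # 109.39 True
```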


(d) The residual plot for this data is shown below.

[Residual plot: Residual (vertical axis) against Age in years (horizontal axis)]

Does this residual plot suggest that the linear equation is not appropriate? Explain.

…………………………………………………………………………………………………
…………………………………………………………………………………………………

Does this residual plot suggest a linear relationship? Explain.
…………………………………………………………………………………………………
…………………………………………………………………………………………………
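The residual values behind the plot can be recomputed from the data and the given line, as a sketch: each residual is the observed BP minus the BP predicted by the line. It assumes Age is on the horizontal axis of the plot, so the residuals are listed in order of age. A random scatter about zero supports a linear model; a clear curve or other pattern suggests the linear model is not appropriate.

```python
bp = [118, 147, 143, 160, 145, 125, 115, 149, 152, 130, 152, 150]
age = [36, 56, 47, 72, 49, 42, 38, 63, 68, 42, 60, 55]

# Residual = observed BP minus the BP predicted by the given line
residuals = [b - (1.1382 * a + 80.933) for a, b in zip(age, bp)]

# List residuals in order of age (assumed horizontal axis of the plot)
for a, res in sorted(zip(age, residuals)):
    print(a, round(res, 2))
```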

A squared transformation can be applied to the variable BP to linearize the scatterplot.


(e) Apply the squared transformation to the variable BP and fit a least-squares line to the
transformed data. Write the equation of the transformed regression line, with values correct
to 2 decimal places.
…………………………………………………………………………………………………
…………………………………………………………………………………………………

Another transformation was applied to the variable Age to linearize the scatterplot.


(f) Apply the transformation log(Age) and fit a least-squares line to the transformed
data. Write the equation of the transformed regression line, with values correct to 2 decimal places.
…………………………………………………………………………………………………
…………………………………………………………………………………………………


(g) Complete the table below, giving each r-value correct to 3 decimal places.

Model      BP vs Age    BP² vs Age    BP vs log(Age)

r-value

(h) Which model is better? Does it suggest a stronger linear relationship? Give a reason.

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………

(i) It is known that the data follow a relationship of the form $Age = a \times b^{k \cdot BP}$, where
a, b and k are constants. Use the equation from part (f) to find the values of a, b and k.

…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
…………………………………………………………………………………………………
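One way to set up part (i), sketched under the assumption that log means the base-10 logarithm. Here the part (f) line is written generically as BP = m log(Age) + c, where m and c are placeholder names for the fitted slope and intercept, not values from the source. Solving for Age and choosing b = 10 as the base:

```latex
\begin{aligned}
BP &= m\log_{10}(Age) + c\\
\log_{10}(Age) &= \frac{BP - c}{m}\\
Age &= 10^{(BP - c)/m} = 10^{-c/m} \times 10^{BP/m}
\end{aligned}
```

Comparing with $Age = a \times b^{k \cdot BP}$ then gives $a = 10^{-c/m}$, $b = 10$ and $k = 1/m$.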
