
SBST3203 Elementary Data Analysis MAY 2020: Name: Arif Soebah Id No: 830811125679001 Phone Number: 013-8880791 Email

This document contains a student's final exam for an elementary data analysis course. It includes two parts - Part A with three multiple choice questions and Part B with two longer questions. The student correctly answered Questions 1 and 3 in Part A, providing the regression equation and confidence interval for Question 1, and performing the chi-square test for independence for Question 3. In Part B, the student analyzed the regression model, finding a high R-squared value and using ANOVA to reject the null hypothesis. They also derived the prediction equation.

Uploaded by anwar soebah

SBST3203

ELEMENTARY DATA ANALYSIS

MAY 2020

FINAL EXAM

NAME : ARIF SOEBAH

ID NO : 830811125679001

PHONE NUMBER : 013-8880791

EMAIL : [email protected]
PART A (choose Question 1 and 3)

Question 1 (a)

$$\hat{y} = -4.907 + 0.4105x$$

$\beta_1$ is the slope. It indicates that $y$ is expected to increase by 0.4105 units when $x$ increases by 1 unit.

$\beta_0$ is the y-intercept of the regression line. It indicates that when $x = 0$, $y = -4.907$.

$$
\begin{aligned}
\hat{y} &= -4.907 + 0.4105x \\
\hat{y} &= -4.907 + 0.4105(120) \\
\hat{y} &= 44.353
\end{aligned}
$$

Question 1 (b)(i)

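Based on Table A:

| $x$ | $y$ | $xy$ | $x^2$ |
|---|---|---|---|
| 110 | 44 | (110)(44) = 4840 | $110^2$ = 12100 |
| 135 | 50 | 6750 | 18225 |
| 85 | 28 | 2380 | 7225 |
| 95 | 33 | 3135 | 9025 |
| 115 | 41 | 4715 | 13225 |
| 70 | 25 | 1750 | 4900 |
| Total = 610 | Total = 221 | Total = 23570 | Total = 64700 |

$$
\begin{aligned}
s_{yy} &= \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n} \\
s_{yy} &= \left(44^2 + 50^2 + 28^2 + 33^2 + 41^2 + 25^2\right) - \frac{(44 + 50 + 28 + 33 + 41 + 25)^2}{6} \\
s_{yy} &= 8615 - \frac{221^2}{6} \\
\therefore\ s_{yy} &= 474.8333
\end{aligned}
$$

$$
\begin{aligned}
s_{xy} &= \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} \\
s_{xy} &= 23570 - \frac{(610)(221)}{6} \\
\therefore\ s_{xy} &= 1101.6667
\end{aligned}
$$

$$
\begin{aligned}
s^2 &= \frac{s_{yy} - b_1 s_{xy}}{n - 2} \\
s^2 &= \frac{474.8333 - 0.4105(1101.6667)}{6 - 2} \\
s^2 &= 5.6498
\end{aligned}
$$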
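As a cross-check, the sums of squares and the residual variance above can be recomputed from the six $(x, y)$ pairs in Table A. The following is a minimal Python sketch; the variable names are illustrative and not part of the exam. It assumes the working rounds $b_1$ to 0.4105 before computing $s^2$, so it reports the variance both with that rounded slope and with the unrounded one.

```python
# Data from Table A
x = [110, 135, 85, 95, 115, 70]
y = [44, 50, 28, 33, 41, 25]
n = len(x)

sum_x, sum_y = sum(x), sum(y)

# Corrected sums of squares and cross-products
s_xx = sum(v * v for v in x) - sum_x ** 2 / n
s_yy = sum(v * v for v in y) - sum_y ** 2 / n
s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n

# Least-squares estimates of slope and intercept
b1 = s_xy / s_xx
b0 = sum_y / n - b1 * (sum_x / n)

# Residual variance: with the slope rounded to 0.4105 (as in the working)
# and with the unrounded slope
s2_rounded = (s_yy - 0.4105 * s_xy) / (n - 2)
s2_exact = (s_yy - b1 * s_xy) / (n - 2)

print(f"s_xx={s_xx:.4f}  s_yy={s_yy:.4f}  s_xy={s_xy:.4f}")
print(f"b1={b1:.4f}  b0={b0:.4f}")
print(f"s2 (b1 rounded)={s2_rounded:.4f}  s2 (unrounded)={s2_exact:.4f}")
```

The two variance values differ in the second decimal place, which shows how sensitive $s^2$ is to rounding the slope early.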

Question 1 (b)(ii)

95% confidence interval for $\beta_1$:

$$b_1 \pm t_{\alpha/2,\,n-2}\; s(b_1)$$

$$t_{0.025,\,4} = 2.776$$

$$s(b_1) = \sqrt{\frac{s^2}{s_{xx}}}$$

$$s_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 64700 - \frac{610^2}{6} = 2683.3333$$

$$s(b_1) = \sqrt{\frac{5.6498}{2683.3333}} = 0.0459$$

95% confidence interval for $\beta_1$:

$$0.4105 \pm 2.776(0.0459) = (0.2831,\ 0.5379)$$
Question 3

3(a)(i)

Let $H_0$ be the null hypothesis and $H_1$ the alternative hypothesis.

Hypotheses:

$H_0$: Attending or not attending the course has no effect on award status.

$H_1$: Attending or not attending the course has an effect on award status.

3(a)(ii)

The formula for the expected value of a cell is

$$E = \frac{(\text{row total containing that cell}) \times (\text{column total containing that cell})}{\text{grand total}}$$

The row and column totals are calculated as below:

| | Attended | Did Not Attend | Total |
|---|---|---|---|
| Received | 20 | 12 | 32 |
| Did Not Receive | 32 | 36 | 68 |
| Total | 52 | 48 | 100 |

The table below shows the calculations to obtain the expected values:

| Expected Values | Attended | Did Not Attend | Total |
|---|---|---|---|
| Received | (52)(32)/100 = 16.64 | (48)(32)/100 = 15.36 | 32 |
| Did Not Receive | (52)(68)/100 = 35.36 | (48)(68)/100 = 32.64 | 68 |
| Total | 52 | 48 | 100 |
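The expected-value formula can be applied to every cell at once. This is a minimal Python sketch using the observed counts from the table above; the list layout and variable names are assumptions made for illustration.

```python
# Observed counts: rows = received / did not receive,
# columns = attended / did not attend
observed = [[20, 12],
            [32, 36]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total)(column total) / (grand total) for each cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```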
3(b)(i)

Degrees of freedom: $df = (m - 1)(n - 1)$

where $m$ is the number of rows and $n$ is the number of columns.

$$df = (m - 1)(n - 1) = (2 - 1)(2 - 1) = 1$$

$$\alpha = 0.01$$

Critical value for this test:

$$\chi^2_{\alpha,\,df} = \chi^2_{0.01,\,1} = 6.635$$

Decision rule: We reject the null hypothesis if $\chi^2 > 6.635$.

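3(b)(ii)

$$
\begin{aligned}
\chi^2 &= \sum \frac{(O_i - E_i)^2}{E_i} \\
\chi^2 &= \frac{(20 - 16.64)^2}{16.64} + \frac{(12 - 15.36)^2}{15.36} + \frac{(32 - 35.36)^2}{35.36} + \frac{(36 - 32.64)^2}{32.64} \\
\chi^2 &= 0.678 + 0.735 + 0.319 + 0.346 \\
\chi^2 &\approx 2.079
\end{aligned}
$$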

Decision:

Using the rejection region approach: since $\chi^2_{\text{test}} = 2.079 < 6.635$, we fail to reject $H_0$.

Conclusion:

We do not have enough evidence at $\alpha = 0.01$ to show that attending or not attending the course has an effect on whether a salesman wins an industry award.
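The test statistic and the decision can be reproduced with a few lines of Python. This sketch uses the observed and expected counts from the tables above and the tabulated critical value 6.635; the names are illustrative.

```python
# Observed and expected counts, cell by cell
observed = [20, 12, 32, 36]
expected = [16.64, 15.36, 35.36, 32.64]
critical_value = 6.635  # chi-square table value for alpha = 0.01, df = 1

# Pearson chi-square statistic
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Rejection region approach: reject H0 only if the statistic exceeds the critical value
reject_h0 = chi2 > critical_value

print(f"chi2={chi2:.3f}  reject H0: {reject_h0}")
```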
PART B

QUESTION 1 (a)

$$R^2 = 0.998$$

99.8% of the total variation in $y$ is explained by $x_1$ and $x_2$.

Since the $R^2$ value is high, we conclude that the model fits the data well.

QUESTION 1 (b)

Hypotheses:

$H_0$: $\beta_1 = \beta_2 = 0$

$H_1$: $\beta_i \neq 0$ for at least one value of $i$, $i = 1, 2$

From the ANOVA table,

$$F_{\text{test statistic}} = \frac{MSR}{MSE} = \frac{442.4}{0.30} = 1474.6667$$

From the F table,

$$F_{0.99,\,2,\,7} = 9.547$$

Conclusion:

Since $F > F_{0.99,\,2,\,7}$, we reject $H_0$ at the 1% significance level.

Thus, the model contributes significantly to the prediction of $y$ at the $\alpha = 0.01$ level of significance.
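The overall F-test comparison can be written as a short Python sketch. It uses only the mean squares quoted from the ANOVA table and the tabulated critical value; the variable names are illustrative.

```python
# Mean squares quoted from the ANOVA table
msr = 442.4   # mean square for regression
mse = 0.30    # mean square for error
f_critical = 9.547  # F table value with (2, 7) degrees of freedom at the 1% level

f_stat = msr / mse
reject_h0 = f_stat > f_critical

print(f"F={f_stat:.4f}  reject H0: {reject_h0}")
```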

The prediction equation relating $\hat{y}$ and $x_1$ when $x_2 = 4$ is
^
y  28.3906  1.4631x1  3.8445 x2
^
y  28.3906  1.4631x1  3.8445  4 
^
y  13.0126  1.4631x1

 y  13.0126  1.4631x1 is the required prediction equation.

QUESTION 2 (a)

Let

$x_i$ = the number of hours studied for a statistics test

$y_i$ = the test score for a statistics test

a) The fitted simple linear regression model is

$$\hat{y} = \hat{a} + \hat{b}x$$

where $\hat{a}$ is the estimate of the y-intercept of the regression line and $\hat{b}$ is the estimate of the slope of the regression line.

We also have

$$\hat{b} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)} \quad \text{or} \quad \hat{b} = \frac{ss_{xy}}{ss_x}, \qquad \text{and} \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$

where

$$ss_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$$

$$ss_x = \sum (x_i - \bar{x})^2$$

Here,

$$n = 10, \quad \sum x_i = 100, \quad \sum y_i = 564$$

$$\bar{x} = \frac{\sum x_i}{n} = \frac{100}{10} = 10, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{564}{10} = 56.4$$

| $(x_i - \bar{x})^2$ | $(x_i - \bar{x})(y_i - \bar{y})$ |
|---|---|
| $(4 - 10)^2$ = 36 | (4 − 10)(31 − 56.4) = 152.4 |
| 1 | −1.6 |
| 0 | 0 |
| 16 | 66.4 |
| 36 | 116.4 |
| 9 | 37.2 |
| 4 | 7.2 |
| 144 | 415.2 |
| 81 | 318.6 |
| 49 | 193.2 |
| Total = 376 | Total = 1305 |

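From that, we get

$$\hat{b} = \frac{ss_{xy}}{ss_x} = \frac{1305}{376} = 3.470745$$

To find $\hat{a}$,

$$
\begin{aligned}
\hat{a} &= \bar{y} - \hat{b}\,\bar{x} \\
\hat{a} &= 56.4 - (3.470745)(10) \\
\hat{a} &= 21.692553
\end{aligned}
$$

$\therefore$ A simple linear regression model that fits the given data is

$$\hat{y} = 21.692553 + 3.470745x$$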
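The slope and intercept can be recomputed from the summary quantities alone. This is a minimal Python sketch that assumes only the totals stated above ($n$, $\sum x_i$, $\sum y_i$, $ss_{xy}$, $ss_x$); the raw data are not needed, and the variable names are illustrative.

```python
# Summary quantities from the working above
n = 10
sum_x, sum_y = 100, 564
ss_xy = 1305.0   # sum of (x_i - x_bar)(y_i - y_bar)
ss_x = 376.0     # sum of (x_i - x_bar)^2

x_bar = sum_x / n
y_bar = sum_y / n

# Least-squares estimates
b_hat = ss_xy / ss_x
a_hat = y_bar - b_hat * x_bar

print(f"b_hat={b_hat:.6f}  a_hat={a_hat:.6f}")
```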
QUESTION 2 (b)

$$
\begin{aligned}
\hat{y} &= 21.692553 + 3.470745x \\
\hat{y} &= 21.692553 + 3.470745(16) \\
\hat{y} &= 77.224468 \\
\hat{y} &\approx 77
\end{aligned}
$$

The predicted average test score of a student who spends 16 hours studying for the test is 77.
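The substitution of $x = 16$ into the fitted line can be sketched in Python as follows. The function name `predict_score` is hypothetical, and the coefficients are the rounded estimates from part (a), so the last decimal places may differ slightly from a calculation that carries unrounded values.

```python
def predict_score(hours, a_hat=21.692553, b_hat=3.470745):
    """Predicted test score from the fitted line y_hat = a_hat + b_hat * hours."""
    return a_hat + b_hat * hours

score = predict_score(16)
print(f"predicted score={score:.4f}  rounded={round(score)}")
```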
