FandI Subj101 200309 Examreport

The Examiners' Report for Subject 101 - Statistical Modelling from September 2003 provides an overview of the exam performance, noting that the paper's standard was comparable to previous years, though some questions were particularly challenging. It includes detailed solutions and comments on various statistical problems, highlighting methods and interpretations relevant to the syllabus. The report aims to assist candidates in understanding the exam expectations and their performance outcomes.


Faculty of Actuaries Institute of Actuaries

REPORT OF THE BOARD OF EXAMINERS

September 2003

Subject 101 — Statistical Modelling

EXAMINERS’ REPORT

Introduction

The attached subject report has been written by the Principal Examiner with the aim of
helping candidates. The questions and comments are based around Core Reading as the
interpretation of the syllabus to which the examiners are working. They have however
given credit for any alternative approach or interpretation which they consider to be
reasonable.

J Curtis
Chairman of the Board of Examiners

11 November 2003

© Faculty of Actuaries
© Institute of Actuaries
Subject 101 (Statistical Modelling) — September 2003 — Examiners’ Report

General comments

The Examiners are of the view that, overall, the paper was of a comparable standard to those set
in recent diets. However they do recognise that most candidates found the last two questions in
the paper rather demanding.

1 Let X be the number remaining in the scheme for at least 10 years.

X ~ binomial(50, 0.4)

So, approximately X ~ N(20, 12)

We require P(X > 25).

Using a continuity correction,

\[ P(X > 25.5) \approx P\left(Z > \frac{25.5 - 20}{\sqrt{12}}\right) = P(Z > 1.59) = 1 - 0.944 = 0.056 \]
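The arithmetic above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `normal_cdf` is an assumption of this sketch, and the exact binomial tail is included only for comparison (it is not part of the examiners' solution).

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p = 50, 0.4
mean = n * p              # 20
var = n * p * (1 - p)     # 12

# Normal approximation with continuity correction:
# P(X > 25) is approximated by P(Z > (25.5 - 20) / sqrt(12)).
z = (25.5 - mean) / math.sqrt(var)
p_approx = 1.0 - normal_cdf(z)

# Exact binomial tail P(X >= 26), for comparison only.
p_exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(26, n + 1))

print(f"z = {z:.2f}, approximate = {p_approx:.3f}, exact = {p_exact:.3f}")
```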

2 \[ P(\bar{X} > S) = P\left(\frac{3\bar{X}}{S} > 3\right) = P(t_8 > 3) \]

which is between 0.005 and 0.01.
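The tail probability can be bracketed without tables by integrating the t density numerically. This is a sketch under stated assumptions: the function names `t_pdf` and `t_upper_tail` and the use of Simpson's rule are choices made here, not part of the report.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule integral of the pdf on [0, t].

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_tail = t_upper_tail(3.0, 8)
print(f"P(t_8 > 3) is approximately {p_tail:.4f}")
```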

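3 \[ \pm 1.96\sqrt{\frac{\hat\theta(1 - \hat\theta)}{200}} \quad \text{where } \hat\theta = 0.16 \]

\[ \Rightarrow \pm 1.96\sqrt{\frac{(0.16)(0.84)}{200}} \Rightarrow \pm 1.96(0.026) \Rightarrow \pm 0.051 \]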

or, as an interval: 0.16 ± 0.051 ⇒ (0.109, 0.211)
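As a quick check of the figures, the interval can be reproduced as follows (a minimal sketch; the variable names are illustrative).

```python
import math

theta_hat, n = 0.16, 200

# Estimated standard error of the sample proportion.
se = math.sqrt(theta_hat * (1 - theta_hat) / n)

# Half-width of the approximate 95% confidence interval.
half_width = 1.96 * se
interval = (theta_hat - half_width, theta_hat + half_width)

print(f"se = {se:.3f}, half-width = {half_width:.3f}")
print(f"interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```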

4 $n = 16$, $\sum_{i=1}^{16} x_i = 51.2$, $\sum_{i=1}^{16} x_i^2 = 243.19$;

\[ \bar{x} = 3.20, \quad s^2 = \frac{243.19 - 51.2^2/16}{15} = \frac{79.35}{15} = 5.29. \]

$s = \sqrt{5.29} = 2.3$


The 95% confidence interval is given by:

\[ \bar{x} \pm t_{0.025,\,n-1}\frac{s}{\sqrt{n}} = 3.2 \pm 2.131 \times \frac{2.3}{4} = 3.2 \pm 1.23 \]

i.e. (1.97, 4.43)
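The calculation can be reproduced from the summary statistics alone. In this sketch the critical value 2.131 is taken from tables, as in the solution, rather than computed.

```python
import math

n, sum_x, sum_x2 = 16, 51.2, 243.19

mean = sum_x / n
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)   # sample variance with divisor n - 1
s = math.sqrt(s2)

t_crit = 2.131                           # t_{0.025,15}, read from tables
half_width = t_crit * s / math.sqrt(n)

print(f"mean = {mean:.2f}, s^2 = {s2:.2f}, s = {s:.1f}")
print(f"95% CI = ({mean - half_width:.2f}, {mean + half_width:.2f})")
```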

5 P(at least one test is significant | each null hypothesis is true)

= 1 − P(no test is significant | each null hypothesis is true)

= $1 - (1 - 0.05)^{10}$ as the 10 tests are independent

= 0.4

Comment: a false-positive is very likely with the 10 multiple tests.
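A one-line check of the probability, assuming (as the solution does) 10 independent tests each at the 5% level:

```python
alpha, n_tests = 0.05, 10

# P(at least one significant result) when every null hypothesis is true.
p_any_false_positive = 1 - (1 - alpha) ** n_tests

print(f"{p_any_false_positive:.4f}")
```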

6 $S_{xx} = 5 - 3^2/3 = 2$, $S_{xy} = y + 4 - 3(y + 2)/3 = 2$

So fitted slope = 2/2 = 1

7 N is Poisson(kλ) with $M_N(t) = \exp[k\lambda\{\exp(t) - 1\}]$.

S has a compound distribution with mgf $M_S(t) = M_N\{\log M_X(t)\}$

and $M_X(t) = \exp(\mu t + \sigma^2 t^2/2)$.

So the mgf of S is $M_N(\mu t + \sigma^2 t^2/2)$ and the correct suggestion is A.

OR: by using the result quoted in the Formulae and Tables book

OR: we must have $M_S(0) = 1$, so B is wrong.
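For reference, writing out the composition explicitly (this only substitutes the stated forms of the two mgfs into one another, and confirms the check at t = 0):

```latex
\[
M_S(t) = M_N\{\log M_X(t)\}
       = \exp\left[ k\lambda \left\{ \exp\left( \mu t + \tfrac{1}{2}\sigma^2 t^2 \right) - 1 \right\} \right],
\qquad
M_S(0) = \exp\left[ k\lambda \{ e^{0} - 1 \} \right] = 1 .
\]
```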

8 Let X be the number of forms with incomplete information in a batch of n forms.

Then X ~ Poisson(0.02n) approximately

With n = 516, X ~ P(10.32) and P(X ≤ 16) = 0.965 approx, by linear interpolation

With n = 515, X ~ P(10.3) and P(X ≤ 15) = 0.940 approx, by linear interpolation

So he requires 516 forms


OR: the distribution of X can be modelled approximately using a normal distribution


with mean 0.02n and variance 0.0196n; we require P(X ≤ n − 500) to be at least 0.95;
the analysis is more awkward, but solving a quadratic in n gives n ≥ 515
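The Poisson search can also be carried out directly rather than by interpolating in tables. The sketch below assumes, as the normal alternative above indicates, that 500 complete forms are required with probability at least 0.95; the function names and the search limit `n_max` are illustrative choices.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summing the pmf term by term."""
    term = math.exp(-lam)
    total = term
    for x in range(1, k + 1):
        term *= lam / x
        total += term
    return total


def smallest_batch(target: int = 500, p_incomplete: float = 0.02,
                   level: float = 0.95, n_max: int = 600) -> int:
    """Smallest n with P(X <= n - target) >= level, where X ~ Poisson(p_incomplete * n)."""
    for n in range(target, n_max + 1):
        if poisson_cdf(n - target, p_incomplete * n) >= level:
            return n
    raise ValueError("no batch size up to n_max meets the required level")


print(smallest_batch())
print(round(poisson_cdf(16, 10.32), 3), round(poisson_cdf(15, 10.3), 3))
```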

9 (i) Joint distribution of X and Y, with marginal totals:

              Y = 2   Y = 4   Y = 6   Total
    X = 1      0.2     0.0     0.2     0.4
    X = 2      0.0     0.2     0.0     0.2
    X = 3      0.2     0.0     0.2     0.4
    Total      0.4     0.2     0.4     1.0

E[X] = 0.4 × 1 + 0.2 × 2 + 0.4 × 3 = 2

E[Y] = 0.4 × 2 + 0.2 × 4 + 0.4 × 6 = 4

E[XY] = 1 × 2 × 0.2 + 1 × 4 × 0.0 + 1 × 6 × 0.2
      + 2 × 2 × 0.0 + 2 × 4 × 0.2 + 2 × 6 × 0.0
      + 3 × 2 × 0.2 + 3 × 4 × 0.0 + 3 × 6 × 0.2 = 8

E[XY] − E[X]E[Y] = 0 Therefore uncorrelated.

X and Y are not independent since

P(X = x and Y = y) ≠ P(X = x) P(Y = y)

e.g. x = 1 y = 2, 0.2 ≠ 0.4 × 0.4 = 0.16.

(ii) X and Y are independent if the joint probability is:

              Y = 2   Y = 4   Y = 6   Total
    X = 1      0.2     0.0     0.2     0.4
    X = 2      0.1     0.0     0.1     0.2
    X = 3      0.2     0.0     0.2     0.4
    Total      0.5     0.0     0.5     1.0
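Both tables can be checked mechanically. The following sketch computes the covariance and tests every cell against the product of its marginals; the function name `summarize` and the numerical tolerance are assumptions of the sketch.

```python
from itertools import product

xs, ys = [1, 2, 3], [2, 4, 6]

# Rows are X = 1, 2, 3; columns are Y = 2, 4, 6.
joint_i = [[0.2, 0.0, 0.2], [0.0, 0.2, 0.0], [0.2, 0.0, 0.2]]
joint_ii = [[0.2, 0.0, 0.2], [0.1, 0.0, 0.1], [0.2, 0.0, 0.2]]


def summarize(joint):
    """Return (covariance, independent?) for a 3 x 3 joint probability table."""
    px = [sum(row) for row in joint]
    py = [sum(joint[i][j] for i in range(3)) for j in range(3)]
    ex = sum(x * p for x, p in zip(xs, px))
    ey = sum(y * p for y, p in zip(ys, py))
    exy = sum(xs[i] * ys[j] * joint[i][j] for i, j in product(range(3), repeat=2))
    independent = all(
        abs(joint[i][j] - px[i] * py[j]) < 1e-9
        for i, j in product(range(3), repeat=2)
    )
    return exy - ex * ey, independent


cov_i, indep_i = summarize(joint_i)
cov_ii, indep_ii = summarize(joint_ii)
print(f"(i)  cov = {cov_i:.3f}, independent = {indep_i}")
print(f"(ii) cov = {cov_ii:.3f}, independent = {indep_ii}")
```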

10 The mean number of claims per day is

{(32 × 1) + (17 × 2) + (2 × 3) + (0 × 4) + (1 × 5)}/100 = 0.77.

Use 0.77 as an estimate of the mean of the Poisson distribution. Thus

\[ P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} \]

is estimated by

\[ P(X = x) = \frac{e^{-0.77}\,0.77^{x}}{x!}, \quad x = 0, 1, 2, \ldots \]

The expected frequencies are given by 100 × P(X = x).

    No. of claims (x)        0      1      2     3     4    ≥5    Total
    Obs. frequency (f_i)    48     32     17     2     0     1    100
    Exp. frequency (e_i)    46.3   35.7   13.7   3.5   0.7   0.1  100.0

Categories x = 3, 4, and ≥5 are grouped together to ensure that all ei are greater than 1.

The expected frequency for ≥ 3 is 3.5 + 0.7 + 0.1 = 4.3; the corresponding observed
frequency is 3.

\[ \chi^2 = \sum (f_i - e_i)^2 / e_i = \frac{1.7^2}{46.3} + \frac{3.7^2}{35.7} + \frac{3.3^2}{13.7} + \frac{1.3^2}{4.3} = 1.63. \]

There are 2 d.f. [4 categories x = 0,1,2, and ≥3, and 1 parameter estimated from the
data.]

The probability value = $P(\chi^2_2 > 1.63) \cong 1 - 0.557 = 0.443$ from the Yellow Tables p164

There is insufficient evidence to suggest that the number of claims does not follow a
Poisson distribution (i.e. the model provides a good fit to the data).
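The goodness-of-fit calculation above can be sketched as follows. Two points are assumptions of this sketch: the "≥5" day is counted as exactly 5 claims when estimating the mean (as in the solution), and the expected frequencies are left unrounded, so the statistic comes out marginally below the 1.63 obtained from the rounded table. With 2 degrees of freedom the chi-square upper tail is exactly exp(−x/2), so no tables are needed.

```python
import math

# Observed number of days by number of claims; key 5 stands for ">= 5".
observed = {0: 48, 1: 32, 2: 17, 3: 2, 4: 0, 5: 1}
n_days = sum(observed.values())
lam = sum(x * f for x, f in observed.items()) / n_days   # 0.77


def poisson_pmf(x: int) -> float:
    return math.exp(-lam) * lam**x / math.factorial(x)


# Cells x = 0, 1, 2 and the pooled cell x >= 3.
expected = [n_days * poisson_pmf(x) for x in range(3)]
expected.append(n_days - sum(expected))
pooled_obs = [48, 32, 17, 2 + 0 + 1]

chi2 = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, expected))
p_value = math.exp(-chi2 / 2)   # exact upper tail of chi-square with 2 d.f.

print(f"lambda = {lam:.2f}, chi2 = {chi2:.2f}, p-value = {p_value:.3f}")
```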

An alternative solution (in this over-conservative approach some information is thrown away unnecessarily, but it was awarded full marks):

Grouping categories x = 2, 3, 4 and ≥ 5 and using only 3 cells with observed frequencies 48, 32, and 20 and expected frequencies 46.3, 35.7, and 18.0 gives $\chi^2 = 0.668$ on 1 degree of freedom. The probability value is 0.414. Same conclusion.

11 (i) Total sum: 13046 + 12592 + 10965 + 12128 = 48731

Total sum of squares: 17322090 + 16116822 + 12208021 + 14929994 = 60576927

SST = $60576927 - 48731^2/40 = 1209168$

SSB = $(13046^2 + 12592^2 + 10965^2 + 12128^2)/10 - 48731^2/40$

    = $59607619 - 48731^2/40 = 239860$

SSR = SST − SSB = 1209168 − 239860 = 969308

    Source of variation    df   Sums of Squares   Mean Squares
    Between regions         3            239860          79953
    Residual               36            969308          26925
    Total                  39           1209168

F = 79953/26925 = 2.97 on 3, 36 d.f.

Therefore, since the value of $F_{3,36}(0.05)$ is 2.866, the observed F value (2.97)
exceeds it and so the null hypothesis that the population means are equal is
rejected at the 5% level of significance. However, as $F_{3,36}(0.01)$ is 4.377, the
null hypothesis is not rejected at the 1% level.
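The analysis of variance can be rebuilt from the regional totals and sums of squares quoted above. This is a sketch only: the variable names are illustrative, and the sample size of 10 per region is read off from the divisors used in the solution.

```python
totals = [13046, 12592, 10965, 12128]                  # regions A, B, C, D
sums_of_squares = [17322090, 16116822, 12208021, 14929994]
n_per_region = 10

k = len(totals)
n_total = n_per_region * k
correction = sum(totals) ** 2 / n_total                # 48731^2 / 40

sst = sum(sums_of_squares) - correction
ssb = sum(t**2 for t in totals) / n_per_region - correction
ssr = sst - ssb

df_between, df_resid = k - 1, n_total - k
ms_between, ms_resid = ssb / df_between, ssr / df_resid
f_stat = ms_between / ms_resid

print(f"SST = {sst:.0f}, SSB = {ssb:.0f}, SSR = {ssr:.0f}")
print(f"F = {f_stat:.2f} on {df_between}, {df_resid} d.f.")
```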

(ii) Means:

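A: $\bar{y}_{1.} = 1304.6$    B: $\bar{y}_{2.} = 1259.2$

C: $\bar{y}_{3.} = 1096.5$    D: $\bar{y}_{4.} = 1212.8$

Least significant difference, for each pair of regions, is (5% level):

\[ t_{0.025,36}\,\hat\sigma\,(1/10 + 1/10)^{1/2} = 2.028\sqrt{26925}\,(2/10)^{1/2} = 149 \]

Differences between pairs of means:

$\bar{y}_{1.} - \bar{y}_{2.} = 45.4$, $\bar{y}_{1.} - \bar{y}_{3.} = 208.1$, $\bar{y}_{1.} - \bar{y}_{4.} = 91.8$

$\bar{y}_{2.} - \bar{y}_{3.} = 162.7$, $\bar{y}_{2.} - \bar{y}_{4.} = 46.4$, $\bar{y}_{3.} - \bar{y}_{4.} = -116.3$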

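Regions in increasing order of sample mean: Region C ($\bar{y}_{3.}$), Region D ($\bar{y}_{4.}$), Region B ($\bar{y}_{2.}$), Region A ($\bar{y}_{1.}$).

Only the differences between A and C (208.1) and between B and C (162.7) exceed the least significant difference of 149.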

(Alternative answers which have the following conclusion are acceptable:

The population mean claim amount for region C appears to be less than the
population mean of region A and the population mean of region B. However,
the population mean for region C and the population mean for region D do not
appear to differ.)
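The pairwise comparison can be sketched as below. The residual mean square (26925) and the critical value (2.028) are taken from the solution rather than computed, and the variable names are illustrative.

```python
import math
from itertools import combinations

means = {"A": 1304.6, "B": 1259.2, "C": 1096.5, "D": 1212.8}
ms_resid = 26925       # residual mean square from the ANOVA table
n_per_region = 10
t_crit = 2.028         # t_{0.025,36}, read from tables

# Least significant difference at the 5% level for any pair of regions.
lsd = t_crit * math.sqrt(ms_resid * (1 / n_per_region + 1 / n_per_region))

# Pairs whose sample means differ by more than the LSD.
significant_pairs = [
    (a, b) for a, b in combinations(means, 2) if abs(means[a] - means[b]) > lsd
]

print(f"LSD = {lsd:.0f}")
print(significant_pairs)
```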


12 (i) $E[X] = E[E(X \mid U)] = E[U] = \alpha/\lambda$

$V[X] = E[V(X \mid U)] + V[E(X \mid U)] = E[U] + V[U] = \alpha/\lambda + \alpha/\lambda^2$

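(ii) Using the method of moments, α and λ may be estimated by solving the
equations

\[ \bar{x} = \frac{\alpha}{\lambda} \quad \text{and} \quad s^2 = \frac{\alpha}{\lambda} + \frac{\alpha}{\lambda^2} \]

which gives

\[ \hat\alpha = \frac{\bar{x}^2}{s^2 - \bar{x}} \quad \text{and} \quad \hat\lambda = \frac{\bar{x}}{s^2 - \bar{x}} . \]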

(iii) If $s^2 \le \bar{x}$, then the method of moments produces inadmissible estimates as the
parameters α and λ must be positive and finite.
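Parts (ii) and (iii) can be illustrated together. In the sketch below the function name `mom_estimates` and the sample values (mean 2.0, variance 3.0) are hypothetical, chosen only to show the formulae at work and the inadmissible case being rejected.

```python
def mom_estimates(x_bar: float, s2: float) -> tuple[float, float]:
    """Method-of-moments estimates (alpha_hat, lambda_hat) for the mixed Poisson model.

    Raises ValueError when s2 <= x_bar, since the estimates would then not be
    positive and finite.
    """
    excess = s2 - x_bar
    if excess <= 0:
        raise ValueError("s^2 <= x-bar: method-of-moments estimates are inadmissible")
    return x_bar**2 / excess, x_bar / excess


# Hypothetical sample moments, for illustration only.
alpha_hat, lambda_hat = mom_estimates(2.0, 3.0)
print(alpha_hat, lambda_hat)

# The fitted mean and variance reproduce the sample moments.
print(alpha_hat / lambda_hat, alpha_hat / lambda_hat + alpha_hat / lambda_hat**2)
```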

13 (i) (a) [Dotplot of the 20 observations; horizontal axis in ppm, labelled from 75.0 to 87.5]

Dotplot shows moderate positive skewness

(b) \[ \bar{x} = \frac{1587}{20} = 79.35, \quad s^2 = \frac{126131 - 1587^2/20}{19} = 10.66 \]

95% confidence interval is $\bar{x} \pm t_{0.025,19}\sqrt{s^2/20}$

giving $79.35 \pm 2.093\sqrt{10.66/20} \Rightarrow 79.35 \pm 1.53 \Rightarrow (77.82, 80.88)$

(c) This t confidence interval requires normality of the observations.

This may be doubtful in view of the skewness shown in part (a), but
the sample size of 20 is perhaps large enough to justify the validity due
to the robustness of the t analysis.


(ii) (a) Differences (before − after) are:

2 4 0 -1 1 3 3 0 1 0
-3 2 1 0 -2 2 1 5 2 -1

Dotplot of differences (value: number of dots):

−3: 1, −2: 1, −1: 2, 0: 4, 1: 4, 2: 4, 3: 2, 4: 1, 5: 1

Seems quite symmetrical and normal

(b) Paired t test is appropriate.

$\sum d = 20$ and $\sum d^2 = 94$

\[ \bar{d} = \frac{20}{20} = 1.0, \quad s^2 = \frac{94 - 20^2/20}{19} = 3.895 \]

\[ \text{Observed } t = \frac{1.0}{\sqrt{3.895/20}} = 2.27 \text{ on 19 d.f.} \]

For one-sided test: 5% point = 1.729, 2.5% point = 2.093 and 1% point
= 2.539

P-value is approx. 0.020

So there is some evidence that the modifications have reduced the
contaminant content.

(c) This t analysis requires normality of the differences and this seems
reasonable from part (a).
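The paired t statistic in part (ii)(b) can be recomputed from the listed differences. This is a minimal sketch; the P-value itself still comes from tables.

```python
import math

# Differences (before - after), as listed in part (ii)(a).
diffs = [2, 4, 0, -1, 1, 3, 3, 0, 1, 0,
         -3, 2, 1, 0, -2, 2, 1, 5, 2, -1]

n = len(diffs)
sum_d = sum(diffs)
sum_d2 = sum(d * d for d in diffs)

d_bar = sum_d / n
s2 = (sum_d2 - sum_d**2 / n) / (n - 1)    # sample variance of the differences
t_obs = d_bar / math.sqrt(s2 / n)         # paired t statistic on n - 1 d.f.

print(f"sum d = {sum_d}, sum d^2 = {sum_d2}")
print(f"d-bar = {d_bar:.1f}, s^2 = {s2:.3f}, t = {t_obs:.2f} on {n - 1} d.f.")
```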

14 (i) (a) The least squares estimate of β minimises

\[ q = \sum_{i=1}^{n} (y_i - \beta x_i)^2 = \sum_{i=1}^{n} y_i^2 - 2\beta\sum_{i=1}^{n} x_i y_i + \beta^2\sum_{i=1}^{n} x_i^2 . \]

Differentiating with respect to β gives

\[ \frac{dq}{d\beta} = 2\left(\beta\sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i y_i\right). \]

Equating to zero gives the least squares estimator as

\[ \hat\beta_1 = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2} \]

as required.
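(b) Mean and variance of $\hat\beta_1$:

\[ E(\hat\beta_1) = \sum_{i=1}^{n} x_i E(Y_i \mid x_i) \Big/ \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i \beta x_i \Big/ \sum_{i=1}^{n} x_i^2 = \beta \]

\[ V(\hat\beta_1) = \sigma^2 \sum_{i=1}^{n} x_i^2 \Big/ \Big(\sum_{i=1}^{n} x_i^2\Big)^2 = \sigma^2 \Big/ \sum_{i=1}^{n} x_i^2 . \]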

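(ii) (a) The alternative estimator $\hat\beta_2 = \sum_{i=1}^{n} Y_i \big/ \sum_{i=1}^{n} x_i$ has expectation and variance

\[ E(\hat\beta_2) = \sum_{i=1}^{n} E(Y_i \mid x_i) \Big/ \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \beta x_i \Big/ \sum_{i=1}^{n} x_i = \beta, \]

\[ V(\hat\beta_2) = n\sigma^2 \Big/ \Big(\sum_{i=1}^{n} x_i\Big)^2 = \sigma^2 / (n\bar{x}^2). \]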

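(b) $V(\hat\beta_2) \ge V(\hat\beta_1)$

$\Leftrightarrow \sigma^2/(n\bar{x}^2) \ge \sigma^2 \big/ \sum_{i=1}^{n} x_i^2$

$\Leftrightarrow \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \ge 0$

$\Leftrightarrow \sum_{i=1}^{n} (x_i - \bar{x})^2 \ge 0$

∴ The variance of $\hat\beta_2$ is at least as large as the variance of the least
squares estimator $\hat\beta_1$.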

(iii) (a) \[ E(\hat\beta_3) = E\left(\sum_{i=1}^{n} a_i Y_i\right) = \sum_{i=1}^{n} a_i E(Y_i \mid x_i) = \sum_{i=1}^{n} a_i \beta x_i = \beta\sum_{i=1}^{n} a_i x_i \]

∴ $E(\hat\beta_3) = \beta$, i.e. unbiased, if $\sum_{i=1}^{n} a_i x_i = 1$

\[ V(\hat\beta_3) = V\left(\sum_{i=1}^{n} a_i Y_i\right) = \sum_{i=1}^{n} a_i^2 \sigma^2 \]

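(b) \[ \hat\beta_1 = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2} = \sum_{i=1}^{n} a_i Y_i, \quad \text{where } a_i = \frac{x_i}{\sum_{i=1}^{n} x_i^2}, \; i = 1, \ldots, n. \]

\[ \therefore \sum_{i=1}^{n} a_i x_i = \sum_{i=1}^{n} \frac{x_i}{\sum_{i=1}^{n} x_i^2}\, x_i = \frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} x_i^2} = 1 \]

i.e. the condition $\sum_{i=1}^{n} a_i x_i = 1$ is satisfied.

\[ \hat\beta_2 = \frac{\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} x_i} = \sum_{i=1}^{n} a_i Y_i, \quad \text{where } a_i = \frac{1}{n\bar{x}}, \; i = 1, \ldots, n. \]

\[ \therefore \sum_{i=1}^{n} a_i x_i = \sum_{i=1}^{n} \frac{1}{n\bar{x}}\, x_i = \frac{n\bar{x}}{n\bar{x}} = 1 \]

i.e. the condition $\sum_{i=1}^{n} a_i x_i = 1$ is satisfied.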

(c) Among estimators of the form $\sum_{i=1}^{n} a_i Y_i$, the minimum variance
unbiased estimator of β is

\[ \hat\beta_3 = \sum_{i=1}^{n} a_i Y_i = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2} = \hat\beta_1 \]

i.e. the least squares estimator.
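The results of parts (ii) and (iii) can be checked numerically for a particular design. In the sketch below the x values and the value of σ² are hypothetical, chosen purely for illustration; the weights a_i are those identified in part (iii)(b).

```python
# Hypothetical design points and error variance, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma2 = 1.0

n = len(x)
sum_x2 = sum(v * v for v in x)
x_bar = sum(x) / n

# Weights a_i for the two linear estimators in part (iii)(b).
a_ls = [v / sum_x2 for v in x]        # beta-hat-1: a_i = x_i / sum(x_i^2)
a_ratio = [1 / (n * x_bar)] * n       # beta-hat-2: a_i = 1 / (n * x-bar)


def is_unbiased(a) -> bool:
    """Unbiasedness condition from part (iii)(a): sum of a_i * x_i equals 1."""
    return abs(sum(w * v for w, v in zip(a, x)) - 1.0) < 1e-12


def variance(a) -> float:
    """Variance of sum(a_i * Y_i) with independent errors: sigma^2 * sum(a_i^2)."""
    return sigma2 * sum(w * w for w in a)


var_ls, var_ratio = variance(a_ls), variance(a_ratio)
print(is_unbiased(a_ls), is_unbiased(a_ratio))
print(f"V(beta1) = {var_ls:.5f}, V(beta2) = {var_ratio:.5f}")
```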


15 (i) (a) Let Y = ratio for one week, and $Y = e^X$

P(decrease in one week) = P(Y < 1)

= $P(e^X < 1) = P(X < 0)$

\[ = P\left(Z < \frac{0 - 0.0125}{0.055} = -0.227\right) = 1 - 0.5898 = 0.41 \]

(b) P(decrease in next two weeks) = $(0.41)^2 = 0.17$

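(c) We require

\[ P\left(\frac{S(2)}{S(0)} > 1\right) = P\left(\frac{S(2)}{S(1)} \cdot \frac{S(1)}{S(0)} > 1\right) = P(Y_2 Y_1 > 1) = P(X_2 + X_1 > 0) \]

where $X_2$, $X_1$ are independent $N(\mu, \sigma^2)$

∴ $X_2 + X_1 \sim N(2\mu, 2\sigma^2)$

\[ \therefore P = P\left(Z > \frac{0 - 2(0.0125)}{\sqrt{2}\,(0.055)} = -0.321\right) = 0.63 \text{ from tables.} \]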

(d) Extending the method of part (c):

\[ \sum_{i=1}^{20} X_i \sim N(20\mu, 20\sigma^2) \]

\[ \therefore P = P\left(Z < \frac{0 - 20(0.0125)}{\sqrt{20}\,(0.055)} = -1.016\right) = 0.155 \]
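The four probabilities in part (i) follow one pattern, which the sketch below makes explicit. The values µ = 0.0125 and σ = 0.055 are those used in the solution; the helper names `normal_cdf` and `p_decrease` are assumptions of the sketch.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu, sigma = 0.0125, 0.055


def p_decrease(weeks: int) -> float:
    """P(sum of `weeks` independent N(mu, sigma^2) log-ratios is below 0)."""
    return normal_cdf((0 - weeks * mu) / (math.sqrt(weeks) * sigma))


p_a = p_decrease(1)        # (a) decrease in one week
p_b = p_a**2               # (b) decrease in each of the next two weeks
p_c = 1 - p_decrease(2)    # (c) increase over two weeks
p_d = p_decrease(20)       # (d) decrease over 20 weeks

print(f"(a) {p_a:.2f}  (b) {p_b:.2f}  (c) {p_c:.2f}  (d) {p_d:.3f}")
```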

(ii) (a) The ratios are independent and identically distributed lognormal r.v.’s.

This defines a random sample from a lognormal distribution.

(b) For the 10 observed ratios y1, . . . , y10:

$\sum y = 10.192 \Rightarrow \bar{y} = 1.0192$

$\sum y^2 = 10.441562 \Rightarrow s^2 = 0.005986 \Rightarrow s = 0.0774$

(c) For the method of moments:

solve the following equations for µ and σ²

\[ e^{\mu + \frac{1}{2}\sigma^2} = 1.0192 \quad (1) \]

\[ e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = 0.005986 \quad (2) \]

\[ (2) \div (1)^2 \Rightarrow e^{\sigma^2} - 1 = 0.0057625 \]

\[ \therefore \sigma^2 = 0.005746 \quad \therefore \sigma = 0.0758 \]

\[ (1) \Rightarrow \mu = \log(1.0192) - \tfrac{1}{2}\sigma^2 = 0.0161 \]

[Note: in MME candidates could use $\hat\sigma^2 = 0.005388$ with divisor n not
(n−1) to obtain σ = 0.0719 and µ = 0.0164 ]
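Both sets of method-of-moments estimates can be reproduced from the sums quoted in part (b). In this sketch the function name `lognormal_mom` is an assumption; it simply solves equations (1) and (2) in the order used above.

```python
import math

n, sum_y, sum_y2 = 10, 10.192, 10.441562

y_bar = sum_y / n
s2 = (sum_y2 - sum_y**2 / n) / (n - 1)    # sample variance with divisor n - 1


def lognormal_mom(mean: float, var: float) -> tuple[float, float]:
    """Solve exp(mu + sigma^2/2) = mean and exp(2mu + sigma^2)(exp(sigma^2) - 1) = var."""
    sigma2 = math.log(1 + var / mean**2)   # from (2) divided by (1) squared
    mu = math.log(mean) - sigma2 / 2       # from (1)
    return mu, sigma2


mu_hat, sigma2_hat = lognormal_mom(y_bar, s2)

# Alternative noted by the examiners: variance with divisor n instead of n - 1.
mu_alt, sigma2_alt = lognormal_mom(y_bar, s2 * (n - 1) / n)

print(f"divisor n-1: mu = {mu_hat:.4f}, sigma = {math.sqrt(sigma2_hat):.4f}")
print(f"divisor n:   mu = {mu_alt:.4f}, sigma = {math.sqrt(sigma2_alt):.4f}")
```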
