Introduction to Computer-Intensive Methods
of Data Analysis in Biology

This guide to the contemporary toolbox of methods for data analysis will
serve graduate students and researchers across the biological sciences. Modern
computational tools, such as Bootstrap, Monte Carlo and Bayesian methods,
mean that data analysis no longer depends on elaborate assumptions designed
to make analytical approaches tractable. These new ‘computer-intensive’
methods are currently not consistently available in statistical software packages
and often require more detailed instructions. The purpose of this book therefore
is to introduce some of the most common of these methods by providing a
relatively simple description of the techniques. Examples of their application are
provided throughout, using real data taken from a wide range of biological
research. A series of software instructions for the statistical software package
S-PLUS are provided along with problems and solutions for each chapter.

DEREK A. ROFF is a Professor in the Department of Biology at the University of California, Riverside.
Introduction to
Computer-
Intensive Methods
of Data Analysis
in Biology
Derek A. Roff
Department of Biology
University of California
cambridge university press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press


The Edinburgh Building, Cambridge cb2 2ru, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521846288

© Cambridge University Press 2006

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2006

isbn-13 978-0-511-21980-1 eBook (EBL)
isbn-10 0-511-21980-6 eBook (EBL)
isbn-13 978-0-521-84628-8 hardback
isbn-10 0-521-84628-5 hardback
isbn-13 978-0-521-60865-7 paperback
isbn-10 0-521-60865-1 paperback

Cambridge University Press has no responsibility for the persistence or accuracy of urls
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Preface vii

1 An Introduction to Computer-intensive Methods 1

2 Maximum Likelihood 9

3 The Jackknife 42

4 The Bootstrap 66

5 Randomization and Monte Carlo Methods 102

6 Regression Methods 157

7 Bayesian Methods 204

References 233

Appendix A – An Overview of S-PLUS Methods Used in


this Book 242

Appendix B – Brief Description of S-PLUS Subroutines Used


in this Book 249

Appendix C – S-PLUS Codes Cited in Text 253

Appendix D – Solutions to Exercises 316

Index 365

Preface

Easy access to computers has created a revolution in the analysis of biological data. Prior to this easy access, even “simple” analyses, such as one-way
analysis of variance, were very time-consuming. On the other hand, statistical
theory became increasingly sophisticated and far outstripped the typical
computational means available. The advent of computers, particularly the
personal computer, and statistical software packages, changed this and made
such approaches generally available.
Much of the development of statistical tools has been premised on a set of
assumptions, designed to make the analytical approaches tractable (e.g., the
assumption of normality, which underlies most parametric methods). We have
now entered an era where we can, in many instances, dispense with such
assumptions and use statistical approaches that are rigorous but largely freed
from the straight-jacket imposed by the relative simplicity of analytical solution.
Such techniques are generally termed “computer-intensive” methods, because
they generally require extensive numerical approaches, practical only with a
computer. At present, these methods are rather spottily available in statistical
software packages and very frequently require more than simple “point and
click” instructions. The purpose of the present book is to introduce some of the more common computer-intensive methods by providing a relatively simple mathematical description of the techniques, examples from biology of their application, and a series of software instructions for one particular statistical software package (S-PLUS). I have assumed that the reader has at least an introductory course in statistics and is familiar with techniques such as analysis of variance, linear and multiple regression, and the χ² test. To relieve one of the task of typing in the coding provided in an appendix to this book, I have also made it available on the web at http://www.biology.ucr.edu/people/faculty/Roff.html.

1

An introduction to
computer-intensive methods

What are computer-intensive data methods?

For the purposes of this book, I define computer-intensive methods as those that involve an iterative process and hence cannot readily be done except on a computer. The first case I examine is maximum likelihood estimation, which
on a computer. The first case I examine is maximum likelihood estimation, which
forms the basis of most of the parametric statistics taught in elementary
statistical courses, though the derivation of the methods via maximum
likelihood is probably not often given. Least squares estimation, for example,
can be justified by the principle of maximum likelihood. For the simple cases,
such as estimation of the mean, variance, and linear regression analysis,
analytical solutions can be obtained, but in more complex cases, such as
parameter estimation in nonlinear regression analysis, whereas maximum
likelihood can be used to define the appropriate parameters, the solution can
only be obtained by numerical methods. Most computer statistical packages now
have the option to fit models by maximum likelihood but they typically require
one to supply the model (logistic regression is a notable exception).
The other methods discussed in this book may have an equally long history as
that of maximum likelihood, but none have been so widely applied as that of
maximum likelihood, mostly because, without the aid of computers, the
methods are too time-consuming. Even with the aid of a fast computer, the
implementation of a computer-intensive method can chew up hours, or even
days, of computing time. It is, therefore, imperative that the appropriate
technique be selected. Computer-intensive methods are not panaceas: the English
adage “you can’t make a silk purse out of a sow’s ear” applies equally well to
statistical analysis. What computer-intensive methods allow one to do is to apply
a statistical analysis in situations where the more “traditional” methods fail. It is
important to remember that, in any investigation, great efforts should be put into making the experimental design amenable to traditional methods, as these both have well-understood statistical properties and are easily carried out, given the available statistical programs. There will, however, inevitably be circumstances in which the assumptions of these methods cannot be met. In the next
section, I give several examples that illustrate the utility of computer-intensive
methods discussed in this book. Table 1.1 provides an overview of the methods
and comments on their limitations.

Why computer-intensive methods?

A common technique for examining the relationship between some response (dependent) variable and one or more predictor (independent) variables is linear and multiple regression. So long as the relationship is linear (and satisfies a few other criteria to which I shall return), this approach is appropriate.
But suppose one is faced with the relationship shown in Figure 1.1, which is highly nonlinear and cannot be transformed into a linear form or fitted by a polynomial function. The fecundity function shown in Figure 1.1 is typical for many animal species and can be represented by the four-parameter (M, k, t0, b) model

F(x) = M(1 − e^(−k(x − t0))) e^(−bx)        (1.1)

Using the principle of maximum likelihood (Chapter 2), it can readily be shown
that the “best” estimates of the four parameters are those that minimize the
residual sums of squares. However, locating the appropriate set of parameter
values cannot be done analytically but can be done numerically, for which most
statistical packages supply a protocol (see caption to Figure 1.1 for S-PLUS coding).
In some cases, there may be no “simple” function that adequately describes
the data. Even in the above case, the equation does not immediately “spring to
mind” when viewing the observations. An alternative approach to curve fitting
for such circumstances is the use of local smoothing functions, described in
Chapter 6. The method adopted here is to do a piece-wise fit through the data,
keeping the fitted curve continuous and relatively smooth. Two such fits are
shown in Figure 1.2 for the Drosophila fecundity data. The loess fit is less rugged
than the cubic spline fit and tends to de-emphasize the fecundity at the early
ages. On the other hand, the cubic spline tends to “over-fit” across the middle and
later ages. Nevertheless, in the absence of a suitable function, these approaches
can prove very useful in describing the shape of a curve or surface. Further,
it is possible to use these methods in hypothesis testing, which permits one
to explore how complex a curve or a surface must be in order to adequately
describe the data.

Table 1.1 An overview of the techniques discussed in this book

Maximum likelihood (Chapter 2)
  Parameter estimation: Yes. Hypothesis testing: Yes.
  Limitations: Assumes a particular statistical model and, generally, large samples.

Jackknife (Chapter 3)
  Parameter estimation: Yes. Hypothesis testing: Yes.
  Limitations: The statistical properties cannot generally be derived from theory, and the utility of the method should be checked by simulation for each unique use.

Bootstrap (Chapter 4)
  Parameter estimation: Yes. Hypothesis testing: Possible(a).
  Limitations: The statistical properties cannot generally be derived from theory, and the utility of the method should be checked by simulation for each unique use. Very computer-intensive.

Randomization (Chapter 5)
  Parameter estimation: Possible. Hypothesis testing: Yes.
  Limitations: Assumes a difference in only a single parameter. Complex designs may not be amenable to “exact” randomization tests.

Monte Carlo methods (Chapter 5)
  Parameter estimation: Possible. Hypothesis testing: Yes.
  Limitations: Tests are usually specific to a particular problem. There may be considerable debate over the test construction.

Cross-validation (Chapter 6)
  Parameter estimation: Yes. Hypothesis testing: Yes.
  Limitations: Generally restricted to regression problems. Primarily a means of distinguishing among models.

Local smoothing functions and generalized additive models (Chapter 6)
  Parameter estimation: Yes. Hypothesis testing: Yes.
  Limitations: Does not produce easily interpretable function coefficients. Visual interpretation is difficult with more than two predictor variables.

Tree models (Chapter 6)
  Parameter estimation: Yes. Hypothesis testing: Yes.
  Limitations: Can handle many predictor variables and complex interactions but assumes binary splits.

Bayesian methods (Chapter 7)
  Parameter estimation: Yes. Hypothesis testing: Yes.
  Limitations: Assumes a prior probability distribution and is frequently specific to a particular problem.

(a) “Possible” = can be done but not ideal for this purpose.
[Figure 1.1: scatterplot of fecundity (30–70) against age (2–20 days), showing the observations and the fitted curve.]

Figure 1.1 Fecundity as a function of age in Drosophila melanogaster with a maximum likelihood fit of the equation F(x) = M(1 − e^(−k(x − t0))) e^(−bx). Data are from McMillan et al. (1970).

Age (x) 3 4 5 6 7 8 9 10 13 14 15 16 17 18

F 32.1 51.8 66 58 60.5 57.2 49.1 49.3 51.4 45.7 44.4 35.1 35.2 33.6

S-PLUS coding for fit:

# Data contained in data file D
# Initialise parameter values
Thetas <- c(M=1, k=1, t0=1, b=0.04)
# Fit model
Model <- nls(D[,2]~M*(1-exp(-k*(D[,1]-t0)))*exp(-b*D[,1]), start=Thetas)
# Print results
summary(Model)

OUTPUT
Parameters:
        Value       Std. Error   t value
M   82.9723000    7.52193000   11.03070
k    0.9960840    0.36527300    2.72696
t0   2.4179600    0.22578200   10.70930
b    0.0472321    0.00749811    6.29920
[Figure 1.2: fecundity (30–70) against age (2–20 days), showing the observations, a loess fit, and a cubic spline fit.]

Figure 1.2 Fecundity as a function of age in Drosophila melanogaster with two local
smoothing functions. Data given in Figure 1.1.
S-PLUS coding to produce fits:

# Data contained in file D. First plot observations
plot(D[,1], D[,2])                                          # Plot points
Loess.model <- loess(D[,2]~D[,1], span=1, degree=2)         # Fit loess model
# Calculate predicted curve for loess model
x.limits <- seq(min(D[,1]), max(D[,1]), length=50)          # Set range of x
P.Loess <- predict.loess(Loess.model, x.limits, se.fit=T)   # Prediction
lines(x.limits, P.Loess$fit)                                # Plot loess prediction
Cubic.spline <- smooth.spline(D[,1], D[,2])                 # Fit cubic spline model
lines(Cubic.spline)                                         # Plot cubic spline curve
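A rough Python analogue of the cubic-spline smooth can be sketched with scipy. Note that UnivariateSpline's smoothing criterion differs from that of the S-PLUS smooth.spline, so the resulting curve is illustrative rather than identical to Figure 1.2.

```python
# A sketch of a cubic-spline smooth of the Figure 1.1 fecundity data.
import numpy as np
from scipy.interpolate import UnivariateSpline

age = np.array([3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18], dtype=float)
fec = np.array([32.1, 51.8, 66.0, 58.0, 60.5, 57.2, 49.1, 49.3,
                51.4, 45.7, 44.4, 35.1, 35.2, 33.6])

spline = UnivariateSpline(age, fec, k=3)  # cubic spline, default smoothing
smoothed = spline(age)                    # fitted values at the observed ages
```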

An important parameter in evolutionary and ecological studies is the rate of increase of a population, denoted by the letter r. In an age-structured population, the value of r can be estimated from the Euler equation

1 = Σ_{x=0}^{∞} e^(−rx) lx mx        (1.2)

where x is age, lx is the probability of survival to age x, and mx is the number of female births at age x. Given vectors of survival and reproduction, the above equation can be solved numerically and hence r calculated. But having an estimate of a parameter is generally not very useful without also an estimate of
the variation about the estimate, such as the 95% confidence interval. There are
two computer-intensive solutions to this problem, the jackknife (Chapter 3) and
the bootstrap (Chapter 4). The jackknife involves the sequential deletion
of a single observation from the data set (a single animal in this case), giving n (= the number of original observations) data sets of n − 1 observations, whereas the bootstrap consists of generating many data sets by random selection (with replacement) from the original data set. For each data set, the value of r is calculated; from this set of values, each technique is able to extract both an estimate of r and an estimate of the desired confidence interval.
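The numerical solution of Eqn (1.2) can be sketched as a simple root-finding problem. In the Python fragment below, the survival (lx) and birth (mx) schedules are hypothetical values invented for illustration, not data from the book.

```python
# A sketch of solving the Euler equation (1.2) numerically for r.
import numpy as np
from scipy.optimize import brentq

x = np.arange(5)                           # ages 0..4
lx = np.array([1.0, 0.8, 0.6, 0.4, 0.2])   # survival to age x (hypothetical)
mx = np.array([0.0, 1.0, 1.5, 1.0, 0.5])   # female births at age x (hypothetical)

def euler(r):
    # 1 = sum over x of e^(-rx) lx mx; return the deviation from 1
    return float(np.sum(np.exp(-r * x) * lx * mx)) - 1.0

r_hat = brentq(euler, -2.0, 2.0)           # bracket the root and locate it
```

Jackknifing r would then amount to deleting each animal in turn, rebuilding lx and mx, and re-solving this equation (Chapter 3); the bootstrap resamples animals with replacement instead (Chapter 4).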
Perhaps one of the most important computer-intensive methods is that of
hypothesis testing using randomization, discussed in Chapter 5. This method can replace the standard tests, such as the χ² contingency test, when the assumptions of the test are not met. The basic idea of randomization testing is
to randomly assign the observations to the “treatment” groups and calculate
the test statistic: this process is repeated many (typically thousands) times
and the probability under the null hypothesis of “no difference” estimated by the
proportion of times the test statistic from the randomized data sets exceeded the
test statistic from the observed data set. To illustrate the process, I shall relate an
investigation into genetic variation among populations of shad, a commercially
important fish species.
To investigate geographic variation among populations of shad, data on
mitochondrial DNA variation were collected from 244 fish distributed over
14 rivers. This sample size represented, for the time, a very significant output
of effort. Ten mitochondrial haplotypes were identified with 62% being of
a single type. The result was that almost all cells had less than 5 data points
(of the 140 cells, 66% had expected values less than 1.0 and only 9% had expected
values greater than 5). Following Cochran’s rules for the χ² test, it was necessary to combine cells. This meant combining the genotypes into two classes, the most common one and all others. The calculated χ² for the combined data set was 22.96, which just exceeded the critical value (22.36) at the 5% level. The estimated value of χ² for the uncombined data was 236.5, which was highly significant (P < 0.001) based on the χ² distribution with 117 degrees of freedom. However, because of the
very low frequencies within many cells, this result was suspect. Rather than
combining cells and thus losing information, we (Roff and Bentzen 1989) used
randomization (Chapter 5) to test if the observed 2 value was significantly larger
than the expected value under the null hypothesis of homogeneity among the
rivers. This analysis showed that the probability of obtaining a 2 value as large
or larger than that observed for the ungrouped data was less than one in a
thousand. Thus, rather than being merely marginally significant the variation
among rivers was highly significant.
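The logic of such a randomization test can be sketched in a few lines of Python. The contingency table below (rows = rivers, columns = haplotypes) is a small hypothetical example, not the shad data; the procedure — shuffle the haplotype labels among fish, recompute χ², and count exceedances — is the one described above.

```python
# A sketch of a randomization test for heterogeneity in a contingency table.
import numpy as np

rng = np.random.default_rng(1)

def chi2_stat(table):
    # Pearson chi-square from an r x c table of counts
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(np.sum((table - expected) ** 2 / expected))

obs = np.array([[20, 3, 1],    # hypothetical counts: rows = rivers,
                [15, 8, 2],    # columns = haplotypes
                [5, 1, 9]])
observed = chi2_stat(obs)

# Expand the table into one river label and one haplotype label per fish,
# then repeatedly shuffle the haplotype labels and recompute chi-square.
rivers = np.repeat(np.arange(obs.shape[0]), obs.sum(axis=1))
haps = np.concatenate([np.repeat(np.arange(obs.shape[1]), row) for row in obs])

n_rand, exceed = 2000, 0
for _ in range(n_rand):
    table = np.zeros_like(obs)
    np.add.at(table, (rivers, rng.permutation(haps)), 1)
    exceed += chi2_stat(table) >= observed
p_value = (exceed + 1) / (n_rand + 1)   # randomization estimate of P
```

No cells need be combined, and no large-sample χ² distribution is assumed: the null distribution is generated directly from the data.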

Most of the methods described in this book follow the frequentist school in asking “What is the probability of observing the set of n data x1, x2, …, xn given the set of k parameters θ1, θ2, …, θk?” In Chapter 7 this position is reversed by the Bayesian perspective, in which the question is asked “Given the set of n data x1, x2, …, xn, what is the probability of the set of k parameters θ1, θ2, …, θk?” This
“reversal” of perspective is particularly important when management decisions
are required. For example, suppose we wish to analyze the effect of a harvesting
strategy on population growth: in this case the question we wish to ask is “Given some observed harvest, say x, what is the probability that the population rate of increase, say λ, is less than 1 (i.e., the population is declining)?” If this probability is high then it may be necessary to reduce the harvest rate. In Bayesian analysis,
the primary focus is frequently on the probability statement about the parameter
value. It can, however, also be used, as in the case of the James–Stein estimator, to
improve on estimates. Bayesian analysis generally requires a computer-intensive
approach to estimate the posterior distribution.

Why S-PLUS?

There are now numerous computer packages available for the statistical
analysis of data, making available an array of techniques hitherto not possible
except in some very particular circumstances. Many packages have some
computer-intensive methods available, but most lack flexibility and hence are
limited in use. Of the common packages, SAS and S-PLUS possess the breadth of
programming capabilities necessary to do the analyses described in this book.
I chose S-PLUS for three reasons. First, the language is structurally similar to programming languages with which the reader may already be familiar (e.g., BASIC and FORTRAN; it differs from these two in being object-oriented).
In writing the coding, I have attempted to keep a structure that could be
transported to another language: this has meant in some cases making more use
of looping than might be necessary in S-PLUS. While this increases the run time,
I believe that it makes the coding more readable, an advantage that outweighs
the minor increase in computing time. The second reason for selecting S-PLUS is
that there is a version in the public domain, known as R. To quote the web site
(http://www.r-project.org/), “R is a language and environment for statistical com-
puting and graphics. It is a GNU project which is similar to the S language and
environment which was developed at Bell Laboratories (formerly AT&T, now
Lucent Technologies) by John Chambers and colleagues. R can be considered as a
different implementation of S. There are some important differences, but much
code written for S runs unaltered under R.” The programs written in this book
will, with few exceptions, run under R. The user interface is definitely better in S-PLUS than in R. My third reason for selecting S-PLUS is that students, at present, can obtain a free version for a limited period at http://elms03.e-academy.com/splus/.

Further reading

Although S-PLUS has a fairly steep learning curve, there are several excellent textbooks available, my recommendations being:
Spector, P. (1994). An Introduction to S and S-PLUS. Belmont, California: Duxbury Press.
Krause, A. and Olson, M. (2002). The Basics of S-PLUS. New York: Springer.
Crawley, M. J. (2002). Statistical Computing: An Introduction to Data Analysis using S-PLUS.
UK: Wiley and Sons.
Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. New York:
Springer.
An overview of the language with respect to the programs used in this book is
presented in the appendices.
2

Maximum likelihood

Introduction

Suppose that we have a model with a single parameter, θ, that predicts the outcome of an event that has some numerical value y. Further, suppose we have two choices for the parameter value, say θ1 and θ2, where θ1 predicts that the numerical value of y will occur with a probability p1 and θ2 predicts that the numerical value of y will occur with a probability p2. Which of the two choices of θ is the better estimate of the true value of θ? It seems reasonable to suppose that the parameter value that gave the highest probability of actually observing what was observed would be the one that is also closer to the true value of θ. For example, if p1 equals 0.9 and p2 equals 0.1, then we would select θ1 over θ2, because the model with θ2 predicts that one is unlikely to observe y, whereas the model with θ1 predicts that one is quite likely to observe y. We can extend this idea to many values of θ by writing our predictive model as a function of the parameter values, φ(θi) = pi, where i designates particular values of θ. More generally, we can dispense with the subscript and write φ(θ) = p, thereby allowing θ to take on any value. By the principle of maximum likelihood we select the value of θ that has the highest associated probability, p.
The important element of maximum likelihood estimation (often contracted to MLE) is that there is a definable probability function that can be used to generate the likelihood of the observed event. The most frequently used probability functions are the normal distribution and the binomial distribution.
There are three areas to be considered:

(1) Point estimation. Given some statistical model with k unknown parameters θ1, θ2, …, θk, how do we use MLE to obtain estimates of these parameters, denoted as θ̂1, θ̂2, …, θ̂k?
(2) Interval estimation. Having the set of estimates θ̂1, θ̂2, …, θ̂k is only marginally useful, because we have no idea whether the estimates are
likely to be close to or far from the true values. In conjunction with point
estimation we must, therefore, also estimate a confidence region for the
estimates, typically 95%.
(3) Hypothesis testing. In many instances, we are interested in testing
hypotheses about the parameter values: for example, given two data sets
we could test the hypothesis that they have a common mean. Maximum
likelihood provides a mechanism to both compare different parameter
values and to compare different statistical models.

Point estimation

Why the mean?


The underlying distribution of much of statistical estimation is the normal distribution (Figure 2.1). Under this distribution, the probability of observing a value, say x, is given by

φ(x) = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²)        (2.1)

where φ(x) is called the probability density function of x. This function is symmetrical and characterized by two parameters, μ and σ. Anyone who has had a first course in statistics will recognize these two as the “mean” and the “standard deviation,” respectively. The mean is a measure of central tendency, and the standard deviation a measure of spread of the distribution (Figure 2.1). We typically estimate the parameter μ as the arithmetic average

μ̂ = (1/n) Σ_{i=1}^n xi        (2.2)

where n is the number of observations and xi is the ith observation. The “hat” over μ indicates that this is an estimate of the true value of μ: this is a general symbol for the estimate of a parameter, but in the case of the average, we frequently use the symbol x̄.
There are actually three measures of central tendency: the arithmetic average, the mode (the most commonly occurring value), and the median (the value that divides the sample into two equal portions). Why should we use the arithmetic average as the estimate of μ? The use of the arithmetic average as the preferred estimate of μ can be justified by the fact that it is the maximum likelihood estimate of μ. Suppose we have a sample of n observations

[Figure 2.1: the normal probability density function; the y-axis is frequency, φ(x).]

Figure 2.1 The normal distribution with μ = 0 and σ = 1.

x1, x2, x3, …, xi, …, xn: the probability of observing this sequence, assuming a normal probability density function, is

L = φ(x1)φ(x2)φ(x3) … φ(xi) … φ(xn) = Π_{i=1}^n φ(xi)        (2.3)

where L is the likelihood of observing this particular sequence. We could consider all possible arrangements of the set of observations but, as will become obvious, this does not change the final answer; so for notational convenience, we shall ignore this minor complication. Writing out the probability density function in full, we have

L = Π_{i=1}^n φ(xi) = Π_{i=1}^n (1/(σ√(2π))) e^(−(1/2)((xi − μ)/σ)²)        (2.4)

Now, according to the maximum likelihood principle, we should choose μ such that the likelihood, L, is maximized. To find this value, we could simply vary μ̂ and calculate the likelihood, selecting that value at which L is a maximum; how close we can get to the “best” value depends only upon the step size of our iteration. In many cases, and this is the reason for the computer-intensive nature of maximum likelihood estimators, this numerical approach is the only one available. However, in the present case, we can arrive at an exact solution by means of the calculus. Recall that to find the maximum or minimum of
a function, we set the derivative of the function equal to zero. It is very inconvenient to work with derivatives of multiplicative functions. It is much easier if we take the logarithm of the likelihood. This does not change the result, because the turning point of the log transform is exactly the same as that of the untransformed function; so, taking natural logs, we have

ln(L) = n ln(1/(σ√(2π))) − (1/2) Σ_{i=1}^n ((xi − μ)/σ)²        (2.5)

Differentiating,

d ln(L)/dμ = 0 + Σ_{i=1}^n (1/2)(2/σ²)(xi − μ) = (1/σ²) Σ_{i=1}^n (xi − μ)        (2.6)


Setting the derivative equal to zero

d lnðLÞ 1 X n
¼0 when ðx  Þ ¼ 0 ð2:7Þ
d  2 i¼1

After some simple algebraic rearrangement, we arrive at

μ = (1/n) Σ_{i=1}^n xi        (2.8)

which is the arithmetic average or mean (note that σ is irrelevant). At this point
you may be concerned that we have μ exactly equal to the arithmetic average,
whereas previously we asserted that it was only an estimate of μ (i.e., μ̂). The
reason for the discrepancy is that we have treated the likelihood function as if
it were exactly an algebraic relationship, whereas in any finite sample the actual
probability of the observed sequence will not invariably be maximal when μ is set
equal to the arithmetic average. Suppose we take the extreme lower limit of a
sample, that is, a sample of one; according to the above derivation the parameter
μ (the mean) is equal to the single sample value, which in general is clearly nonsense.
Consider what happens as the number of observations in the sample increases.
It is intuitively obvious that as n becomes larger and larger, the difference
between the arithmetic average and μ becomes smaller and smaller, and
in the limit, when n equals infinity, the arithmetic average is equal to μ
(i.e., μ̂ → μ as n → ∞). This resolves the problem: the derivation above implicitly
assumes that the sample is very large, and, hence, for small samples the
arithmetic average is only an estimate of μ (μ̂ or x̄, depending upon your
symbolic preference). This is a very important result, because it means that we
cannot ignore the size of the sample. We shall return to this issue in the
section "Interval estimation."
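The derivation above can be checked numerically. The sketch below is in Python with NumPy (the book's own examples use S-PLUS), with a simulated sample; it evaluates the log-likelihood of Eq. (2.5) on a grid of candidate means and confirms that the maximum sits at the arithmetic average.

```python
import numpy as np

def norm_loglik(mu, x, sigma=1.0):
    """Eq. (2.5): log-likelihood of a normal sample for candidate mean mu."""
    n = len(x)
    return (n * np.log(1.0 / (sigma * np.sqrt(2.0 * np.pi)))
            - 0.5 * np.sum(((x - mu) / sigma) ** 2))

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=50)  # simulated sample

# Evaluate the log-likelihood on a fine grid of candidate means.
grid = np.linspace(x.min(), x.max(), 10001)
ll = np.array([norm_loglik(m, x) for m in grid])
mu_hat = grid[np.argmax(ll)]

# The grid maximum coincides with the arithmetic average (to grid resolution),
# whatever value of sigma is used.
print(mu_hat, x.mean())
```

Changing `sigma` shifts every log-likelihood by the same pattern but leaves the location of the maximum untouched, which is the sense in which σ is irrelevant here.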
Point estimation 13

Nuisance parameters don’t always disappear


In the previous example, there were two parameters, μ and σ, and we
were interested only in μ. The parameter σ in this case is called a nuisance
parameter: its value is unknown and it would make the estimate uncertain
if it did not drop out of the analysis. For the estimation of the mean,
the nuisance parameter does drop out and is thus irrelevant in this instance.
However, this is frequently not the case, and we can be left with a joint estimation
problem. Such is the problem when we use maximum likelihood to derive the
best estimator for the second parameter of the normal distribution, the standard
deviation, σ, or its square, the variance (σ²).
Recall that the log-likelihood, ln(L), of the normal is

    n ln( 1/(σ√(2π)) ) − (1/2) Σ_{i=1}^{n} ((x_i − μ)/σ)²

(Eqn. (2.5)). Expanding this to make the differentiation more obvious gives

    ln(L) = −n ln(σ) − n ln(√(2π)) − Σ_{i=1}^{n} (1/2)((x_i − μ)/σ)²        (2.9)

As before, we differentiate ln(L), now with respect to σ, and set the result to zero

    d ln(L)/dσ = −n/σ + Σ_{i=1}^{n} 2(x_i − μ)²/(2σ³) = (1/σ)( −n + Σ_{i=1}^{n} (x_i − μ)²/σ² )

               = 0  when  Σ_{i=1}^{n} (x_i − μ)²/σ² = n        (2.10)
Upon rearrangement we have

    σ² = (1/n) Σ_{i=1}^{n} (x_i − μ)²        (2.11)

which is the readily recognized formula for the variance. As previously, the left-
hand side should be indicated as an estimator, σ̂², as it approaches the true value
only as n becomes large. Annoyingly, to estimate the variance we have to know
the exact value of the mean, which we do not. Thus, in this case, the nuisance
parameter, μ, inconveniently remains in the estimation formula for σ². What can
we do? One possibility is to substitute our estimate of the mean, giving
σ̂² ≈ (1/n) Σ_{i=1}^{n} (x_i − μ̂)². Because we know that, unless n is infinitely large, μ̂ is
not exactly equal to μ, I have denoted this estimate as an approximation. In fact,
it is a biased estimate, which could be problematical for small samples.
Fortunately, this bias can be readily removed by rewriting the formula as

1 X n
^ 2 ¼ ^ 2
ðxi  Þ ð2:12Þ
n  1 i¼1
14 Maximum likelihood

In many cases, the log-likelihood function cannot be resolved into such simple,
single-parameter formulae, and one must then use numerical methods to locate
the combination of point estimates that maximizes the likelihood; these are
the maximum likelihood estimators.
These two examples show that our use of the "standard" formulae for the
mean and standard deviation is appropriate, from the perspective of maximum
likelihood, when the distribution is normal. If the distribution is not normal, we
can still estimate the mean and standard deviation using these formulae, but
we have no guarantee that they correctly estimate any particular parameter in
the true probability density function.
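The bias just described, and its removal by the n − 1 divisor of Eq. (2.12), can be seen by simulation. A sketch in Python with simulated (not the text's) data:

```python
import numpy as np

rng = np.random.default_rng(2)
true_var = 4.0
n = 5  # small samples make the bias visible

# Many replicate samples; average each estimator across replicates.
reps = 200_000
samples = rng.normal(0.0, np.sqrt(true_var), size=(reps, n))
xbar = samples.mean(axis=1, keepdims=True)
ss = ((samples - xbar) ** 2).sum(axis=1)

mle_est = (ss / n).mean()             # Eq. (2.11) with the estimated mean: biased
unbiased_est = (ss / (n - 1)).mean()  # Eq. (2.12): divisor n - 1 removes the bias

# E[ss/n] = (n-1)/n * sigma^2 = 3.2 here, while E[ss/(n-1)] = 4.0
print(mle_est, unbiased_est)
```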

Why we use least squares so much


Throughout a first course in statistics one comes across the use of “least-
squares” estimation. It is, for example, the foundation of estimation in linear
regression and analysis of variance. As with the commonly used estimators of
mean and variance of a normal distribution, we can justify the use of least-
squares by reference to maximum likelihood. To illustrate this, consider the
simple linear regression equation (Figure 2.2)

    y = θ₁ + θ₂x + ε        (2.13)

The parameters θ₁ and θ₂ are the intercept and slope, respectively. They are
frequently denoted as α and β, with estimated values of a and b, respectively.

Figure 2.2  A regression line. The line is described by the equation y = θ₁ + θ₂x.
At each point along the line the data are distributed normally as N(θ₁ + θ₂x_i, σ).

To avoid the proliferation of confusing symbols (particularly as α and β are also
used in the context of type 1 and type 2 errors), I shall use the symbol θ as the
general symbol for a parameter to be estimated, noting where in specific cases the
parameter typically has another symbol. The term ε refers to the essential
assumption of linear regression, namely that the error about the line is normally
distributed with a mean of zero and an unspecified standard deviation, σ. I shall
denote the normal distribution with mean μ and standard deviation σ as N(μ, σ).
Thus, in the present example, we would say that ε is distributed as N(0, σ). Using
this notation, we can write that y is distributed as N(θ₁ + θ₂x, σ), which is to say
that y is a normally distributed variable with mean θ₁ + θ₂x (the value of the line)
and standard deviation σ (Figure 2.2). We can now assign a probability to
observing some particular value y_i:

    φ(y_i) = (1/(σ√(2π))) e^{−(1/2)((y_i − [θ₁ + θ₂x_i])/σ)²}        (2.14)

Hence the probability or likelihood of observing some sequence y₁, y₂, y₃, . . . ,
y_i, . . . , y_n is

    L = Π_{i=1}^{n} φ(y_i) = Π_{i=1}^{n} (1/(σ√(2π))) e^{−(1/2)((y_i − [θ₁ + θ₂x_i])/σ)²}        (2.15)

As before, and as in general, it is more convenient to work with the natural
logarithms

    ln(L) = −n ln(σ√(2π)) − (1/(2σ²)) Σ_{i=1}^{n} (y_i − [θ₁ + θ₂x_i])²        (2.16)

According to the principle of maximum likelihood, the best estimates of θ₁ and θ₂
are those that maximize the log-likelihood, which will be those values at which
the summation SS = Σ_{i=1}^{n} (y_i − [θ₁ + θ₂x_i])² is minimized. This, of course, is the
least squares procedure. Note that, as with the estimation of the mean of the
normal distribution, the variance about the regression line does not enter into
the estimation of the intercept (θ₁) or slope (θ₂) of the line. For the sake of
completeness, let us calculate the least squares estimation equations for both
parameters. We have to carry out two differentiations, one with respect to θ₁
and another with respect to θ₂

    dSS/dθ₁ = −2 Σ_{i=1}^{n} (y_i − [θ₁ + θ₂x_i])
    dSS/dθ₂ = −2 Σ_{i=1}^{n} (y_i − [θ₁ + θ₂x_i]) x_i        (2.17)
Setting these to zero, and noting that Σ_{i=1}^{n} θ₁ = nθ₁, we have the simultaneous
equations

    Σ_{i=1}^{n} y_i = θ₁n + θ₂ Σ_{i=1}^{n} x_i
    Σ_{i=1}^{n} x_i y_i = θ₁ Σ_{i=1}^{n} x_i + θ₂ Σ_{i=1}^{n} x_i²        (2.18)

Multiplying the first by Σ_{i=1}^{n} x_i, the second by n, and subtracting gives the
estimate for θ₂

    θ̂₂ = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²        (2.19)

Note that the estimate is a function of the arithmetic means of x and y (for
convenience I have used their usual bar notation, but could have written them as
μ̂_x and μ̂_y, respectively). We can use the two simultaneous equations to estimate
θ₁, or more simply make use of the relationship θ̂₁ = ȳ − θ̂₂x̄. These equations
point out that, because they use the same data, the estimates of the two
parameters are not independent of each other. This is a general feature of
multiple estimates from the same statistical model and usually poses no
problem. But suppose we estimate the same regression, say fecundity on body
weight, from a number of different populations and then calculate the
correlation between θ̂₁ and θ̂₂: on finding a significant correlation between the
two, we might be persuaded to interpret it as due to some biologically
meaningful cause, when it probably arises as a statistical artifact.
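Equation (2.19) and the relationship θ̂₁ = ȳ − θ̂₂x̄ can be verified directly. The following Python sketch, using hypothetical simulated data, computes the slope and intercept from the normal equations and checks them against NumPy's own least-squares fit:

```python
import numpy as np

def ls_line(x, y):
    """Intercept and slope from the normal equations (Eqs. (2.18)-(2.19))."""
    xbar, ybar = x.mean(), y.mean()
    slope = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    intercept = ybar - slope * xbar  # theta1-hat = ybar - theta2-hat * xbar
    return intercept, slope

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, size=x.size)  # hypothetical data

b1, b2 = ls_line(x, y)
check = np.polyfit(x, y, 1)  # numpy's own least-squares line, for comparison
print(b1, b2)                # agrees with check[1], check[0]
```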

More on least squares


The maximum likelihood approach to the estimation of the parameters
of the simple linear regression indicates that the least squares approach is
justified. Further, in this instance, exact estimation equations for the two
parameters can be found. Here, I present a more complex example in which there
are several least squares solutions, depending upon the assumption of the error
structure.
Many organisms, particularly fish, grow continuously throughout their lives
but the rate of growth slows with age (Figure 2.3). A growth curve that captures
this behavior is known as the von Bertalanffy growth model (though the
physiological basis von Bertalanffy actually used to derive the curve is fallacious).
Ignoring any source of error or variation, the length at some age t is given by
the relationship

    l_t = θ₁(1 − e^{−θ₂(t − θ₃)})        (2.20)

Figure 2.3  Plot of average length at each age for female Pacific hake, with the
estimated von Bertalanffy curve (data from Kimura 1980). (Axes: length (cm) vs.
age (years).)

    Age (years)   1.0   2.0   3.3   4.3   5.3   6.3   7.3   8.3   9.3  10.3  11.3  12.3  13.3
    Length (cm)  15.4  28.0  41.2  46.2  48.2  50.3  51.8  54.3  57.0  58.9  59.0  60.9  61.8

where θ₁ is the asymptotic length (generally denoted as L∞), θ₂ is the rate of
growth (generally denoted as k), and θ₃ is the hypothetical age at which the length
is zero (generally denoted as t₀). Now the above equation can also be written in
terms of the length at two consecutive ages, say t and t+1

    l_{t+1} = θ₁(1 − e^{−θ₂}) + e^{−θ₂} l_t        (2.21)

which suggests a regression method of estimating θ₁ and θ₂. The natural
logarithm of the slope of the regression of l_{t+1} on l_t is an estimate of −θ₂, and
the intercept is an estimate of θ₁(1 − e^{−θ₂}). This method is known as the
Ford–Walford method and provides excellent initial estimates of the parameter
values (except for θ₃). The statistical problem with this method is that the length
at each age successively changes from being the dependent variable (l_{t+1}) to being
the independent variable (l_t), which invalidates the basic assumption of linear
regression.

How can we use the maximum likelihood method to obtain parameter
estimates of the von Bertalanffy equation (Eq. (2.20))? To use this method we have
to decide how to incorporate ε, the normally distributed error term. The simplest
method is to tack it onto the equation in the same manner as in the linear
regression model (for alternate models see Kimura 1980)

    l_t = θ₁(1 − e^{−θ₂(t − θ₃)}) + ε        (2.22)

Now following the method discussed in the previous section, the log-likelihood
function is

    ln(L) = −n ln(σ√(2π)) − (1/(2σ²)) Σ_{t=1}^{n} (l_t − θ₁[1 − e^{−θ₂(t − θ₃)}])²        (2.23)

It should be readily apparent that the log-likelihood function will be maximized
when, as with the linear regression model, the sum of squares between the
observed (l_t) and predicted (θ₁[1 − e^{−θ₂(t − θ₃)}]) values is minimized, i.e., the least
squares solution. Assuming that the variance, σ², is constant, there are two
scenarios to be considered. First, as assumed in the linear regression model, we
can posit that the error about the individual observations, l_t, is N(0, σ). Second, we
can focus upon the mean values and posit that the error about the mean length
at a given age is N(0, σ). With either scenario, we estimate the parameters using
least squares, but in the first case we use the individual values, whereas in the
second we use the means. In the above derivation, the variance dropped out:
we obtain the MLE of σ² in the usual manner by taking the partial derivative of
the log-likelihood function (Eq. (2.23)) with respect to σ² and setting this equal to
zero. It is left to the reader to verify that

    σ̂² = Σ_{t=1}^{n} ( l_t − θ̂₁[1 − e^{−θ̂₂(t − θ̂₃)}] )² / n        (2.24)
i.e., the mean residual sum of squares of the best fitting model (the MLE).
However, as with the variance estimate of the normal distribution previously
discussed, the above estimate is biased by the use of the parameter estimates
rather than their true values. To remove this bias, we divide not by n but by
n minus the number of estimated parameters (i.e., n − 3 in the present case).
Unlike the linear regression model, there is no exact analytical solution for
the above model and hence one must resort to numerical methods. Fortunately,
virtually every statistical package has a nonlinear fitting routine using least
squares. The curve shown in Figure 2.3 was obtained using the “regression
wizard” in SigmaPlot. The routine failed to converge to a solution when I used
Eq. (2.20), but changing the equation to l_t = θ₁(1 − e^{−θ₂t + θ₃′}), which in no way
alters the structure of the equation (the parameter θ₃′ is merely the product
θ₂θ₃), produced the fit shown in Figure 2.3. A similar failure occurred with the
23), produced the fit shown in Figure 2.3. A similar failure occurred with the
nonlinear routine in SYSTAT but not in S-PLUS (the equation can be fitted using
dialog boxes or the command: nls(LENGTH~b1*(1-exp(-b2*(AGE-b3))),data=D,
start=list(b1=50,b2=.1,b3=.1))).
The parameter estimates from S-PLUS when
using the altered equation were identical to those from SigmaPlot, indicating
that when convergence is obtained both packages do get the same solution,
which is not guaranteed, although solutions should always be similar. I could not
get convergence in the SYSTAT routine unless I deleted θ₃ entirely from the
equation. Because θ₃ is a very small value (0.057), its deletion from the equation
changes little, but the message to draw from the three analyses is that different
statistical packages perform differently (and I do not mean to imply that the
ranking in performance can be judged from a single example) and that small
changes to the equation to be fitted can make big differences in the ability of the
routine to converge to a solution.
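The fit described above can be reproduced outside S-PLUS or SigmaPlot. As a sketch (not the text's own code), SciPy's general nonlinear least-squares routine `curve_fit` can be applied to the hake data with the same starting values used in the S-PLUS `nls` call:

```python
import numpy as np
from scipy.optimize import curve_fit

# Mean length-at-age for female Pacific hake (the table under Figure 2.3).
age = np.array([1.0, 2.0, 3.3, 4.3, 5.3, 6.3, 7.3, 8.3, 9.3,
                10.3, 11.3, 12.3, 13.3])
length = np.array([15.4, 28.0, 41.2, 46.2, 48.2, 50.3, 51.8, 54.3, 57.0,
                   58.9, 59.0, 60.9, 61.8])

def von_bert(t, linf, k, t0):
    """Eq. (2.20): expected length at age t."""
    return linf * (1.0 - np.exp(-k * (t - t0)))

# Under additive normal error (Eq. (2.22)), least squares is the MLE (Eq. (2.23)).
# Starting values mirror those in the text's S-PLUS nls call.
theta, _ = curve_fit(von_bert, age, length, p0=[50.0, 0.1, 0.1])
print(theta)  # asymptotic length near 61, growth rate near 0.3, t0 near zero
```

The estimates land close to those discussed in the text (θ₃ in particular is very small), but, as the text cautions, convergence behavior can differ between packages and parameterizations.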

Generalizing the MLE approach in relation to least squares estimation


Let y = φ(θ₁, θ₂, . . . , θ_k, x) be a model comprising k parameters (e.g., linear
regression has two, the von Bertalanffy function has three) and an independent
variable, x. Assume that the error term, ε, is N(0, σ) and that the observed value y_i
can be predicted from

    y_i = φ(θ₁, θ₂, . . . , θ_k, x_i) + ε        (2.25)

The log-likelihood function is thus

    ln(L) = −n ln(σ√(2π)) − (1/(2σ²)) Σ_{i=1}^{n} (y_i − φ(θ₁, θ₂, . . . , θ_k, x_i))²        (2.26)

The maximum likelihood estimates of the k parameters are obtained by
minimizing the sum of the squared differences between the observed and
predicted values, i.e.,

    Minimize  Σ_{i=1}^{n} (Observed value_i − Predicted value_i)²        (2.27)

(see, for example, exercise 2.5). But remember, you are making an assumption
about how the variability about the predicted value is distributed! Under the
assumption above, the residuals should be normally distributed with mean
zero and variance σ². Checking this assumption is standard practice in linear
regression analysis and the same methods apply to the general case.

Leaving normality
The fundamental assumption of the maximum likelihood approach is
that the data are distributed according to a known distribution. This need not be
the normal distribution. Consider the situation in which there are two outcomes:
for example, in a mate selection experiment, a female might be presented with
two choices (e.g., in acoustically orienting animals, two different songs might be
played and the female choice recorded), or in a genetical study there might
be two alleles at a particular locus of interest. Let the probability of the first
outcome be p, in which case the probability of the second is 1 − p. Given
two outcomes, the resulting distribution can be described by the binomial
probability function: the probability or likelihood of observing the first outcome
r times in n trials is thus

    L = ( n! / (r!(n − r)!) ) p^r (1 − p)^{n−r}        (2.28)

Taking the natural logarithms gives a more easily handled equation

    ln(L) = ln( n! / (r!(n − r)!) ) + r ln(p) + (n − r) ln(1 − p)        (2.29)

To find the maximum likelihood estimate of p we differentiate ln(L) with respect
to p and set the result to zero:

    d ln(L)/dp = r/p − (n − r)/(1 − p)

    d ln(L)/dp = 0  when  r/p = (n − r)/(1 − p)        (2.30)

    i.e., p̂ = r/n

which is the intuitively obvious estimate. Note that I have substituted p̂ for
p in the final equation; it is very important to remember that our estimate
approaches the true value only as the sample size increases.
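A quick numerical check of Eq. (2.30): evaluating the binomial log-likelihood of Eq. (2.29), with its constant term dropped, over a grid of p recovers r/n. A Python sketch:

```python
import numpy as np

def binom_loglik(p, r, n):
    """Eq. (2.29) with the constant ln(n!/(r!(n-r)!)) dropped."""
    return r * np.log(p) + (n - r) * np.log(1.0 - p)

r, n = 68, 100
grid = np.linspace(0.001, 0.999, 9981)  # step of 0.0001
p_hat = grid[np.argmax(binom_loglik(grid, r, n))]
print(p_hat)  # 0.68, i.e., r/n, as Eq. (2.30) predicts
```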

Multiple likelihoods: estimating heritability using the binomial and MLE


Heritability is a parameter that defines the degree to which offspring
resemble their parents due to the additive effect of genes. It is a parameter that
can be used to predict how much a quantitatively varying trait such as body
weight changes when selection is applied to a population. The predictive
equation, known as the breeder's equation, is R = θS, where R is the response
to selection (the difference between the means of the population in each
generation), θ is the heritability (invariably designated as h²; despite this, for
consistency, I shall still use θ), and S is the selection differential (the difference
between the population and parental means). Heritability can be estimated after
one generation of selection by rearranging the breeder's equation to give θ = R/S.
In general, a single generation of selection is insufficient to produce a reliable
estimate, but this problem is ignored here for the purpose of clarity.

Figure 2.4  Illustration of the threshold model. The underlying trait, called the
liability, is normally distributed. Individuals above the threshold, T, display one
morph, whereas individuals below the threshold display the alternate. (Axes:
frequency vs. liability.)
There is a class of traits, known as threshold traits, which are peculiar in that
they are manifested as dichotomous traits, but breeding experiments show them
to be determined by the action of many genes. Examples of threshold traits
include twins versus singletons in sheep, certain diseases such as schizophrenia,
the phenomenon of "jacking" in salmon (early maturation at a very reduced size),
and wing and horn dimorphism in certain insect species. To account for the
quantitative inheritance pattern in these traits, a threshold model has been
proposed: according to this model there is a continuously distributed trait
termed the liability and a threshold of sensitivity. If the value of the liability lies
above the threshold, one outcome results, whereas if the value lies below the
threshold, the alternate outcome results (Figure 2.4). The liability is assumed to

be normally distributed, and thus the proportion, p, lying above the threshold
is given by

    p = (1/(σ√(2π))) ∫_T^∞ e^{−(1/2)((x − μ)/σ)²} dx        (2.31)

where the liability is distributed as N(μ, σ) and T is the threshold value. Without
loss of generality, we can rescale the above by setting σ = 1 and T = 0, leaving us
only a single parameter, the mean liability (μ), to calculate. If p = 0.5 the mean
liability equals 0, whereas if p = 0.8 the mean liability is equal to 0.84. Suppose
we subject a threshold trait to selection by taking as parents only those of a
designated morph (e.g., only winged individuals in a wing-dimorphic species). We
can arbitrarily designate these individuals as lying above the threshold, in which
case their mean liability is the mean of a truncated normal distribution, namely

    μ + (1/(p√(2π))) e^{−μ²/2} = μ + φ(μ)/p

where φ denotes the standard normal density.

Letting the number of the designated morph in the parental generation be r₀
and the total sample size be n₀, we can write the likelihood using the binomial
formula given in Eq. (2.28), which I shall designate as L₀.
Using the breeder's equation, we can predict the mean liability in the offspring
generation as

    μ₁ = μ₀(1 − θ) + θ( μ₀ + φ(μ₀)/p )        (2.32)

and hence the predicted proportion of the designated morph. From the selection
experiment, we observe the proportion of the designated morph in each
generation but directly observe neither the mean liability nor the parameter
we wish to estimate, the heritability of the liability. We thus have two parameters
to estimate: the one of interest (heritability) and a "nuisance" parameter (the mean
liability of the initial population). As in the parental generation, the likelihood of
obtaining the observed number, r₁, of the designated morph from the observed
offspring sample, n₁, is given by the binomial formula (Eq. (2.28)). Designate this
likelihood L₁. We have two likelihoods, L₀ and L₁. The overall likelihood, L₀₁, is
simply the product of the two likelihoods (the sum of the log-likelihoods). Therefore,
we find the combination of μ₀ and θ that maximizes ln(L₀₁).
Suppose in the first sample we observe 50 of the designated morph out
of a total of 100 individuals. We then use the designated morph as parents for the
next generation, obtaining 68 offspring of the designated morph out of a total
sample of 100 offspring. Thus r₀ = 50, n₀ = 100, r₁ = 68, n₁ = 100. Estimates of p and

θ can be obtained using the S-PLUS routine nlminb, which allows for restriction
of parameter values, in this case between 0 and 1 (it is possible to use the
unrestricted minimization function nlmin, but a warning is issued because the search
routine takes parameter values below zero, causing an error in the routine qnorm).
Most packages minimize a function rather than maximize, and hence we use the
negative log-likelihood function. Because constants in the likelihood function do
not affect the value of the estimates, as a simplification these are generally
dropped from the analysis. Suitable S-PLUS coding to estimate p and θ is given
in Appendix C.2.1.
The heritability used to generate the numbers for the offspring generation
was 0.6 (giving an expected number of 68, as actually used): the output from
S-PLUS is 0.5000000 for p (which is simply r₀/n₀) and 0.5861732 for the herit-
ability (θ or h²).
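The joint estimation just described can be sketched in Python with SciPy (the text's actual code, in S-PLUS, is in Appendix C.2.1; the optimizer, bounds, and starting values below are assumptions of this sketch, not the text's):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

r0, n0, r1, n1 = 50, 100, 68, 100  # counts from the worked example

def neg_loglik(params):
    """Minus the joint log-likelihood ln(L01), binomial constants dropped."""
    mu0, h2 = params
    p0 = norm.cdf(mu0)                   # proportion above the threshold (T = 0, sd = 1)
    mu1 = mu0 + h2 * norm.pdf(mu0) / p0  # offspring mean liability, Eq. (2.32)
    p1 = norm.cdf(mu1)                   # predicted morph proportion in offspring
    return -(r0 * np.log(p0) + (n0 - r0) * np.log1p(-p0)
             + r1 * np.log(p1) + (n1 - r1) * np.log1p(-p1))

res = minimize(neg_loglik, x0=[0.1, 0.5], method="L-BFGS-B",
               bounds=[(-3.0, 3.0), (0.01, 0.99)])
mu0_hat, h2_hat = res.x
p_hat = norm.cdf(mu0_hat)
print(p_hat, h2_hat)  # about 0.500 and 0.586, matching the text's S-PLUS output
```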

Logistic regression: connecting the binomial with regression


Suppose that the probability of an event occurring is a function of some
other variable x. For example, the proportion of insects killed by an insecticide
would be expected to increase with the dosage to which they are exposed
(Figure 2.5). We might be inclined to model this relationship using the simple
additive model, p = θ₁ + θ₂x. The problem with this model is that it does not
restrict p to the required range of 0 to 1, and, as can be seen in the example,
the shape of the curve is sigmoidal. This shape is a natural form (but not the only
possibility) when the upper and lower limits are bounded. What we require is a
model in which there is a lower bound to p equal to zero and an upper
bound to p of 1. A model that satisfies these requirements is the logistic equation

    p_i = e^{θ₁ + θ₂x_i} / (1 + e^{θ₁ + θ₂x_i})        (2.33)
This equation can be linearized by the transformation

    ln( p_i / (1 − p_i) ) = θ₁ + θ₂x_i        (2.34)

The left-hand side is termed the logit, and is a contraction of the phrase “logistic
unit.” It is also known as the log odds. Equation (2.34) provides a simple means of
graphically representing the data and crudely estimating the parameter values.
For obvious reasons, the method of maximum likelihood is to be preferred.
To obtain the log-likelihood function, we first note that Eq. (2.29) can be
rearranged as

    ln(L) = ln( n! / (r!(n − r)!) ) + n ln(1 − p) + r ln( p / (1 − p) )        (2.35)
Figure 2.5  Plot of beetle mortality vs. dose of gaseous carbon disulphide. Solid
line shows the fitted curve (see Appendix C.2.2 for coding; data from Dobson 1983
after Bliss 1935). (Axes: proportion killed (p) vs. dose (x).)

    Dose         1.69  1.72  1.76  1.78  1.81  1.84  1.86  1.88
    N (exposed)  59    60    62    56    63    59    62    60
    R (killed)   6     13    18    28    52    53    61    60

The log-likelihood function for the logistic is then

    ln(L) = Σ_{i=1}^{N} [ ln( n_i! / (r_i!(n_i − r_i)!) ) − n_i ln(1 + e^{θ₁ + θ₂x_i}) + r_i(θ₁ + θ₂x_i) ]        (2.36)

The summation sign is required because we have N observations, the ith
observation consisting of r_i "successes" in n_i "trials" (e.g., r_i is the number of
individuals killed when n_i individuals are subjected to dose x_i of insecticide).
Estimates of the parameters θ₁ and θ₂ are obtained by maximizing ln(L)
(equivalently, by minimizing −ln(L)).
Logistic regression can include more than one explanatory variable (e.g., in the
insecticide example we might include body size as a second variable), and is so
widely used that most statistical packages include logistic regression as a specific
option, making it necessary only to give the linear component of the model
(note that in SYSTAT the dialog box for logistic regression is named "logit").

But most programs expect the data to be in the form of one row per individual
with the dependent variable (e.g., mortality) being categorical: in the case of the
beetle data, we would code an individual as 0 if alive and 1 if dead. This is a
convenient method of coding if there are several explanatory variables (e.g., dose
and body size) but is definitely a nuisance for the present example. The data can
still be analyzed in the tabulated form but it is necessary to specify the
log-likelihood function. Programs frequently minimize rather than maximize
functions, and hence it is necessary to use the minus log-likelihood. SYSTAT calls
the function to be minimized the LOSS function. Because the term
Σ_{i=1}^{N} ln( n_i! / (r_i!(n_i − r_i)!) ) is a constant, it can be omitted. In SYSTAT one supplies
the model function (e.g., DEATHS=SAMPLE*exp(b1+b2*DOSE)/(1+exp(b1+b2*DOSE)),
where DEATHS is r_i, SAMPLE is n_i, DOSE is x_i, and b1, b2 are θ₁, θ₂, respectively)
and the loss function (e.g., LOSS = -(DEATHS*(b1+b2*DOSE)-SAMPLE*LOG(1+exp(b1
+b2*DOSE)))), both of which can be done via dialog boxes. In S-PLUS it is necessary
to write a function and use the nonlinear minimizing routine (see Appendix C.2.2
for coding; for an alternative approach that uses 0,1 data and the generalized
linear model routine glm, see Appendix C.2.8).
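As an alternative sketch of the same computation (not the text's SYSTAT or S-PLUS code; starting values and optimizer options are assumptions), the minus log-likelihood of Eq. (2.36), constants dropped, can be minimized directly in Python with SciPy using the beetle data:

```python
import numpy as np
from scipy.optimize import minimize

# Beetle mortality data from Figure 2.5 (Bliss 1935): dose, number exposed, killed.
dose = np.array([1.69, 1.72, 1.76, 1.78, 1.81, 1.84, 1.86, 1.88])
n = np.array([59, 60, 62, 56, 63, 59, 62, 60])
r = np.array([6, 13, 18, 28, 52, 53, 61, 60])

def neg_loglik(theta):
    """Minus Eq. (2.36) with the constant binomial coefficients dropped."""
    b1, b2 = theta
    eta = b1 + b2 * dose
    # log(1 + e^eta) written as logaddexp(0, eta) for numerical stability
    return -np.sum(r * eta - n * np.logaddexp(0.0, eta))

res = minimize(neg_loglik, x0=[-50.0, 30.0], method="Nelder-Mead",
               options={"maxiter": 10000, "xatol": 1e-8, "fatol": 1e-12})
b1_hat, b2_hat = res.x
print(b1_hat, b2_hat)  # intercept strongly negative, slope strongly positive
```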

From binomial to multinomial


In many cases, there are more than two possible outcomes (e.g., several
alleles at a locus). Such a distribution is said to be multinomial. Suppose we have
a sample consisting of a set of categories, such as an age sample of animals: using
the multinomial distribution, the likelihood for the sample is

    L = ( n! / (x₁!x₂! · · · x_k!) ) p₁^{x₁} p₂^{x₂} · · · p_k^{x_k} = ( n! / Π_{i=1}^{k} x_i! ) Π_{i=1}^{k} p_i^{x_i}        (2.37)

where x_i is the number of observations in the ith category (e.g., age class) and
p_i is the true proportion in the ith class. The log-likelihood function is

    ln(L) = ln(n!) − Σ_{i=1}^{k} ln(x_i!) + Σ_{i=1}^{k} x_i ln(p_i)        (2.38)

To find the maximum likelihood estimates (p̂₁, p̂₂, . . . , p̂_k), we could proceed by
differentiating and setting the result to zero, but an easier approach is as
follows: the probability of an animal being in age class i is p_i, and hence the
probability that it is not in age class i is 1 − p_i. From this perspective
we have a simple binomial distribution, all age classes except age class i being
collapsed into one. Hence the maximum likelihood estimator for age class i is
simply x_i/n.
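The collapsing argument can be checked numerically: maximizing Eq. (2.38) under the constraint that the proportions sum to one recovers x_i/n. A Python/SciPy sketch with hypothetical counts:

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([34, 21, 18, 12, 9, 6])  # hypothetical counts per age class
n = x.sum()

def neg_loglik(p):
    """Minus Eq. (2.38) without its constant terms."""
    return -np.sum(x * np.log(p))

# Proportions must be positive and sum to one.
cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
res = minimize(neg_loglik, x0=np.full(x.size, 1.0 / x.size), method="SLSQP",
               bounds=[(1e-9, 1.0)] * x.size, constraints=cons)
p_hat = res.x
print(np.round(p_hat, 4), x / n)  # the numerical optimum matches x_i / n
```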

Combining simulation and MLE to estimate population parameters


Hooded seals are commercially harvested and hence it is essential to be
able to predict the result of particular harvesting strategies on population rates
of increase or decline. To do this analysis, five population parameters are
required:

(1) The number of pups produced at the starting point of the projection
(1945),
(2) The instantaneous rate of natural mortality, M (i.e., the probability of
surviving each year is e^{−M}), and
(3) The proportions of 4, 5, and 6-year-old females that breed (none breed
earlier than 3 years and all breed by 7 years).

The available data on hooded seals consisted of age distributions collected


each year from 1972 to 1978. Jacobsen (1984) constructed a simulation model of
hooded seal population dynamics which generated age distributions for the years
1945–1986. Now consider the year 1972: for this year there is an observed
distribution of ages. Taking the simulation to be the model we can construct the
log-likelihood function using the predicted and observed age distributions
    ln(L₁₉₇₂) = ln(n₁₉₇₂!) − Σ_{i=1}^{k} ln(x_{i,1972}!) + Σ_{i=1}^{k} x_{i,1972} ln(p_{i,1972})        (2.39)

where n₁₉₇₂ is the number of seals sampled in 1972, x_{i,1972} is the number of
seals of age i in the 1972 sample, and p_{i,1972} is the proportion of seals of age i
in 1972 predicted by the simulation model. The first two terms are constant,
and hence maximizing the log-likelihood is accomplished by maximizing the
third term, Σ_{i=1}^{k} x_{i,1972} ln(p_{i,1972}). Each simulation run with a given combination
of the five unknown parameters will produce a likelihood value. The preferred
combination of parameter values is that which maximizes the log-likelihood.
As previously noted, likelihood values are multiplicative: thus the likelihood
for all the years 1972 to 1978 is L₁₉₇₂₋₁₉₇₈ = L₁₉₇₂L₁₉₇₃ · · · L₁₉₇₈, and the preferred
set of population parameters is the set that maximizes Σ_{j=1972}^{1978} Σ_{i=1}^{k} x_{i,j} ln(p_{i,j}).
Whereas the general principle set out above is correct, the execution
proved troublesome, because different combinations of initial pup production
and natural mortality rate produce almost identical age distributions but
widely divergent population projections (Figure 2.6, Table 2.1). The reason is that
population size is not itself constrained: the observed changing age distribution
can be modeled by an initially small population (1945 pup production) and a low
mortality rate or by a very large initial population and a high mortality rate
(Table 2.1). There is thus a ridge of likelihood values corresponding to a positive
relationship between 1945 pup production and natural mortality. Unfortunately,

Table 2.1  Maximum likelihood estimates of the five parameters (pup production in 1945, natural mortality,
and partial recruitment to the breeding stock of females aged 4, 5, and 6 years). Note that the partial recruitment
values change little but that there is a positive correlation between pup production and natural mortality.
Taken from Jacobsen (1984)

    Pup production   Natural      Partial recruitment at           Log-
    in 1945          mortality   4 years  5 years  6 years    likelihood
    64350            0.08        0.40     0.74     0.98        −8308
    79250            0.10        0.40     0.74     0.97        −8306
    98900            0.12        0.41     0.75     0.96        −8304
    125500           0.14        0.42     0.75     0.95        −8302
    162500           0.16        0.42     0.74     0.94        −8301
Figure 2.6  Predicted pup production estimates of hooded seals in the West Ice for the
years 1945–1986 for three combinations of natural mortality (M) and initial pup
production that have virtually identical, and maximum, likelihood (Table 2.1).
Redrawn from Jacobsen (1984). (Axes: pup production vs. year.)

future population trajectories vary very widely (Figure 2.6), from increasing
(hence a sustainable catch rate) to decreasing (hence an unsustainable catch rate).
To constrain the estimates we need at least one population estimate either in the
early years (e.g., between 1945 and 1960) or the later years (e.g., between 1975 and

1985): but note that an estimate around 1968 would be of no use, because all the
trajectories converge at that year. Although the maximum likelihood method
in this instance cannot give good estimates of the parameter values, it does
very clearly demonstrate the problem with the available data and indicates the
type of information that is required.

Interval estimation

Method 1: an exhaustive approach


Having an estimate is of little use if we do not also have a measure of
confidence in the estimate. The usual measure is the 95% confidence interval.
How can we calculate this interval for the likelihood estimate? Consider the
likelihood function for a single parameter, θ, plotted against θ. Each likelihood
represents the relative support for the particular value of θ that generates it.
Thus, for example, if the likelihood at θ_i is 0.1 and that at θ_j is 0.05, we
can state that θ_i has twice the support of θ_j. If we divide throughout by the area
under the distribution, we arrive at a distribution whose total area is equal to 1,
and the 95% confidence region can be approximated by the values of θ that cut off
the tails of the distribution at 0.025 and 0.975. Suppose, for example, we wished
to estimate the confidence limits for the mean; the likelihood for a given
value of μ, say μ_j, is

    L(μ_j) = Π_{i=1}^{n} (1/(σ√(2π))) e^{−(1/2)((x_i − μ_j)/σ)²}        (2.40)
Because σ is the same for all values of μ_j, we can assign it any value we choose (1 is
the simplest value). Similarly, √(2π) is a constant and hence can be dropped.
Therefore, our modified function, L*(μ_j), is

    L*(μ_j) = Π_{i=1}^{n} e^{−(1/2)(x_i − μ_j)²}        (2.41)

Now we calculate L*(μ_j) between limits that enclose most of the probability
distribution (this can be done by trial and error; it is better to have a very large
range rather than a small one) and iterate using some pre-assigned step length,
so that we have a series of equally spaced parameter values, μ₁, μ₂, . . . , μ_N, where
μ₁ is the smallest value and μ_N is the largest. We then divide each L*(μ_j) by the
sum of all values, giving

    L_S(μ_j) = L*(μ_j) / Σ_{i=1}^{N} L*(μ_i)        (2.42)
We then calculate the cumulative sum L_{cum,k} = \sum_{j=1}^{k} L_S(\mu_j), where k
ranges from 1 to N. The upper and lower confidence values are then the values
of μ at which L_{cum,k} = 0.025 and L_{cum,k} = 0.975, respectively.
The above method is numerically intensive but rigorous in that it makes no
assumption about the actual distribution of the likelihood. It can be extended
to any number of parameters. For example, if there are two parameters to
be estimated (1,2), we would vary both parameters and the result would be
a bivariate confidence region.
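The exhaustive approach is straightforward to program. The sketch below is a rough Python translation of the idea (the book's own code is in S-PLUS; the data here are hypothetical), implementing Eqs. (2.41)–(2.42) for the mean of a normal distribution with σ = 1:

```python
import math

def exhaustive_ci(x, lo, hi, step=0.001):
    """95% confidence interval for the mean of x (sigma assumed = 1),
    using the exhaustive method: evaluate the simplified likelihood
    L*(mu) over a grid, normalize so the grid sums to 1 (Eq. 2.42),
    and read off the values cutting the tails at 0.025 and 0.975."""
    n_steps = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n_steps)]
    # Work on the log scale to avoid underflow, then exponentiate (Eq. 2.41).
    loglik = [-0.5 * sum((xi - mu) ** 2 for xi in x) for mu in grid]
    biggest = max(loglik)
    lik = [math.exp(ll - biggest) for ll in loglik]   # rescaled L*(mu_j)
    total = sum(lik)
    cum, lower, upper = 0.0, None, None
    for mu, lk in zip(grid, lik):
        cum += lk / total                             # cumulative sum L_cum,k
        if lower is None and cum >= 0.025:
            lower = mu
        if upper is None and cum >= 0.975:
            upper = mu
    return lower, upper

data = [0.2, -0.5, 1.1, 0.7, -0.1, 0.4, 0.9, -0.3, 0.6, 0.0]  # hypothetical sample
lo95, hi95 = exhaustive_ci(data, -2.0, 2.0)
```

With n = 10 and σ = 1, the interval should be close to the analytical x̄ ± 1.96/√10 ≈ x̄ ± 0.62.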

Method 2: the log-likelihood ratio approach


For large samples, we can make use of the fact that the sampling
distribution of the log-likelihood ratio is approximately

2\left(LL(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k) - LL(\theta_1, \theta_2, \ldots, \theta_k)\right) \sim \chi^2_k    (2.43)

where LL(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k) is the log-likelihood at the maximum likelihood estimates
and LL(\theta_1, \theta_2, \ldots, \theta_k) is the log-likelihood at the true parameter values. Confidence
regions can be approximated by the set of parameter combinations that lie
\chi^2_k/2 units distant from LL(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k). Thus, for a model with one parameter,
the confidence range is given by the two log-likelihoods that give a value of
\frac{1}{2}\chi^2_1 = 1.92. To illustrate the procedure, consider the problem of estimating
the heritability of a threshold trait discussed previously. For simplicity, we
shall assume that the initial proportion is 0.5 and we have a single parameter,
θ (= h²), for which to provide confidence limits. To obtain the lower and upper
confidence values, a simple approach is as follows (see Appendix C.2.3 for
S-PLUS coding):

Step 1: Find the MLE for θ (= \hat\theta)
Step 2: Calculate the log-likelihood at the MLE (i.e., LL(\hat\theta))
Step 3: Iterate over a range of θ_i (e.g., 0.01–0.99) and for each
    calculate LL(θ_i)
Step 4: Calculate Diff = LL(\hat\theta) - LL(\theta_i) - 0.5\chi^2_1
Step 5: Find the values of θ_i at which Diff is equal to zero. There will be
    two values, corresponding to the upper and lower confidence
    limits. These values can be found graphically, as shown in
    Figure 2.7, or numerically, as shown in Appendix C.2.3 (also see
    exercise 2.8).
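The five steps can be sketched in a few lines of code. The fragment below is a Python illustration (not the S-PLUS code of Appendix C.2.3), applied to a hypothetical normal-mean problem with σ = 1; the zero crossings of Diff are located by scanning a grid:

```python
def llr_ci(x, lo, hi, step=0.001):
    """95% CI for a normal mean (sigma = 1) by the log-likelihood ratio
    method: the limits are the values of theta at which
    Diff = LL(theta_hat) - LL(theta) - 0.5*chi2_1 crosses zero."""
    theta_hat = sum(x) / len(x)                  # Step 1: MLE of the mean
    def ll(theta):                               # log-likelihood, constants dropped
        return -0.5 * sum((xi - theta) ** 2 for xi in x)
    ll_max = ll(theta_hat)                       # Step 2
    half_chi2 = 1.92                             # 0.5 * chi-square_1 (95%)
    inside = []                                  # Steps 3-5: scan a grid of theta
    n_steps = int(round((hi - lo) / step)) + 1
    for i in range(n_steps):
        theta = lo + i * step
        if ll_max - ll(theta) - half_chi2 <= 0:  # Diff <= 0: theta inside region
            inside.append(theta)
    return min(inside), max(inside)

data = [0.2, -0.5, 1.1, 0.7, -0.1, 0.4, 0.9, -0.3, 0.6, 0.0]  # hypothetical sample
lo95, hi95 = llr_ci(data, -2.0, 2.0)
```

For this case the limits agree with the analytical result θ̂ ± √(2 × 1.92/n).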

For the von Bertalanffy equation, there are four parameters (three θs and σ).
With more than two parameters the confidence region cannot be visualized.
To obtain a visual picture, we can proceed as follows, using the von Bertalanffy

Figure 2.7 Plot of log-likelihood vs. heritability, θ_i. The dotted line shows the
negative of Diff = LL(\hat\theta) - LL(\theta_i) - 0.5\chi^2_1, where θ_i is a heritability estimate.
The values at which Diff = 0 demark the lower and upper 95% confidence interval.

function as an example (see Appendix C.2.4 for coding). The two parameters
of most interest are θ₁, the asymptotic length, and θ₂, the growth rate.

Step 1: Find the MLEs for all four parameters, which I shall refer to as
    the global MLEs
Step 2: Iterate over a range of values of the two parameters of interest
    (θ₁ and θ₂)
Step 3: At each iteration, fix θ₁ and θ₂ at the iterated values, keeping the
    other two parameters at their global MLEs. Designate these values
    as θ₁*, θ₂*
Step 4: Calculate 2\left(LL(\hat\theta_1, \hat\theta_2, \hat\theta_3, \hat\sigma) - LL(\theta_1^*, \theta_2^*, \hat\theta_3, \hat\sigma)\right) - \chi^2_2, where, for the
    95% region, \chi^2_2 = 5.991
Step 5: Use a numerical method to construct the contour line
    corresponding to zero. This line demarcates the 95% confidence
    region (Figure 2.8).

Method 3: a standard error approach


A simple but approximate method of assessing the variability in a
parameter value is to examine the standard errors of the estimates; roughly
speaking, the 95% confidence limits are ±2× the standard errors. The variance of

Figure 2.8 The 95% confidence ellipse for the estimates L_max (= θ₁) and k (= θ₂),
generated by conditioning on t₀ (= θ₃) and σ. The dot shows the MLE combination.
Coding that generated the data matrix from which the contour was estimated is
given in Appendix C.2.4.

a parameter is approximately equal to the negative of the inverse of the second
derivative of the log-likelihood at the maximum likelihood point:

\sigma_\theta^2 = -\left(\frac{\partial^2 LL}{\partial \theta^2}\right)^{-1}    (2.44)

For example, in the case of the mean of a normal distribution, the variance
of the mean, \sigma_\mu^2 (obtained by taking the derivative of Eq. (2.6)), is

\sigma_\mu^2 = -\left(\sum_{i=1}^{n} -\frac{1}{\sigma^2}\right)^{-1} = \frac{\sigma^2}{n}    (2.45)

and the standard error is thus \sigma/\sqrt{n}, which is the well-known formula. When
there are several estimated parameters, the estimation is somewhat more
tricky, as it is necessary to invert the matrix of second partial derivatives.
Fortunately, the standard errors of parameter estimates are typically given in the
output from statistical packages, along with an approximate t-test for θ = 0,
computed as \hat\theta/\hat\sigma_\theta. The output from S-PLUS for fitting the von Bertalanffy growth
function is shown in Appendix C.2.5. In addition to the standard errors, the
output also includes the correlation matrix: this can be useful for examining the
independence of the parameter estimates. In the present example, the parameter
k (= θ₂) is highly correlated with both L_max (= θ₁) and t₀ (= θ₃), and thus variation in
L_max and k cannot be considered independently, a point made clear by the bivariate
contour interval (Figure 2.8).
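The relationship in Eq. (2.44) is easy to verify numerically. The Python sketch below (hypothetical data; σ treated as known) approximates the second derivative of the log-likelihood for a normal mean by a central difference and recovers the familiar standard error σ/√n:

```python
import math

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

sigma = 2.0                                      # assumed known
data = [1.2, 3.4, 0.8, 2.5, 1.9, 2.2, 3.1, 1.5]  # hypothetical sample
n = len(data)
mu_hat = sum(data) / n                           # MLE of the mean

def ll(mu):
    """Normal log-likelihood with constants dropped."""
    return -0.5 * sum(((x - mu) / sigma) ** 2 for x in data)

# Eq. (2.44): var(mu) is approximately -1 / (second derivative at the MLE)
var_mu = -1.0 / second_derivative(ll, mu_hat)
se_numeric = math.sqrt(var_mu)
se_exact = sigma / math.sqrt(n)                  # Eq. (2.45): sigma / sqrt(n)
```

Because the log-likelihood of a normal mean is exactly quadratic, the central difference reproduces the analytical result to within rounding error.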

Hypothesis testing

There are two basic questions we have to answer having fitted a model:
first, is the model a poor fit to the data, and, second, does the model explain
significantly more variation than a model with fewer parameters? In both cases,
we make use of the chi-square distribution introduced in the previous section.

Testing model fit


The adequacy of a model is defined in relation to a model that has the
same number of parameters as observations and thus completely describes the
data. This model is known as the maximal or saturated model. For example,
consider the log-likelihood function for the mean

\ln(L) = n \ln\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2    (2.46)

In the saturated model, there are n parameters; that is, each observation has a
different mean equal to the observation (μ_i = x_i). The log-likelihood for this model
is thus

\ln(L) = LL_{Sat} = n \ln\left(\frac{1}{\sigma\sqrt{2\pi}}\right)    (2.47)

More generally, define LL_{Sat} as the log-likelihood of the saturated model
with n observations and LL_{MLE} as the log-likelihood at the maximum likelihood
estimated values. Now

D = 2(LL_{Sat} - LL_{MLE}) \sim \chi^2_{n-k}    (2.48)

where k is the number of parameters estimated. D is known as the scaled
deviance (or simply the deviance). If the model fits the data well, D will be
smaller than the critical value of \chi^2_{n-k}.
To illustrate, D for the von Bertalanffy function is

D = \frac{1}{\sigma^2}\sum_{t=1}^{n}\left[l_t - \hat\theta_1\left(1 - e^{-\hat\theta_2(t - \hat\theta_3)}\right)\right]^2 \sim \chi^2_{n-3}    (2.49)

Because \sigma^2 is unknown, we cannot use D directly to test for lack of fit. For the
present data set, in which we have only a single observation per age, we could go
no further. Of course, we can always examine the residual sums of squares to
assess how much variation is accounted for by the fit.
If there are several observations per age, we can approximately test for lack of
fit using a method suggested by Draper and Smith (1981), which is exact for linear
models. We divide the residual sums of squares into a pure error component
(SS_{PE}) and a lack of fit component (SS_{LOF}):

SS_{PE} = \sum_{t=1}^{n} (m_t - 1)\hat\sigma_t^2
SS_{LOF} = SS(\hat\theta_1, \hat\theta_2, \hat\theta_3) - SS_{PE}    (2.50)

where m_t is the number of observations in age group t, \hat\sigma_t^2 is the estimated
variance within age group t, and SS(\hat\theta_1, \hat\theta_2, \hat\theta_3) is the sums of squares at the
maximum likelihood estimates. With no lack of fit,

\frac{SS_{LOF}/(n-3)}{SS_{PE}/(N-n)} \sim F_{n-3,\,N-n}    (2.51)

where N is the total sample size (= \sum_{t=1}^{n} m_t). Thus, if the right-hand side exceeds
the critical point of F_{n-3,N-n}, the model is deemed a poor fit, even though it may
still be a better fit than competing models.
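As a concrete sketch of Eqs. (2.50)–(2.51), the Python fragment below partitions a residual sums of squares into pure error and lack of fit. The replicated data, the fitted sums of squares, and the parameter count are all hypothetical, not the book's example:

```python
# Hypothetical replicated data: observations grouped by age.
groups = [
    [10.1, 10.4, 9.8],   # age 1
    [15.2, 14.9, 15.6],  # age 2
    [18.3, 18.0, 18.9],  # age 3
]

def within_ss(xs):
    """Within-group sums of squares, equal to (m_t - 1) * sigma_hat_t^2."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

ss_pe = sum(within_ss(g) for g in groups)   # Eq. (2.50): pure error component
ss_mle = 1.40                               # hypothetical SS at the fitted parameters
ss_lof = ss_mle - ss_pe                     # Eq. (2.50): lack of fit component

n = len(groups)                             # number of age groups
N = sum(len(g) for g in groups)             # total sample size
p = 2                                       # hypothetical number of fitted parameters
# Eq. (2.51), with the "3" generalized to p fitted parameters
f_ratio = (ss_lof / (n - p)) / (ss_pe / (N - n))
```

f_ratio is then compared with the critical point of F with (n − p, N − n) degrees of freedom.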
For the models based on the binomial, we are generally in a better situation.
For the logistic model, D can be shown to be (Dobson 1983, p. 77)

D = 2\sum_{i=1}^{N} Obs_i \ln\left(\frac{Obs_i}{Exp_i}\right) \sim \chi^2_{N-k}    (2.52)

where Obs_i is the observed number in the ith category (i.e., the observed r_i and
n_i − r_i), Exp_i is the expected number, and N − k is the degrees of freedom, which
is equal to the number of subgroups minus the number of estimated parameters.
For the beetle data, N = 8 and k = 2, and hence \chi^2_{N-k} = 12.59. The estimated value
of D is 13.66, which indicates that the model does not fit the data very well, although
visually the fit certainly appears quite adequate (Figure 2.5; see Appendix C.2.6
for S-PLUS coding), and clearly better than the simpler alternative of a constant
proportion (an issue discussed in the next section). When the expected value is
equal to zero, D is undefined (ln(0) is undefined), and in the present example I added
a very small number to avoid this problem (Appendix C.2.6), though the existence
of the problem suggests that the sample sizes are too small or the model is
inadequate.
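Eq. (2.52) can be computed directly once the expected numbers are in hand. The sketch below is Python with hypothetical dose-response counts (not the beetle data of the text); it assumes the expected values come from an already-fitted logistic model, and it handles a zero observed count by using the limit x ln x → 0:

```python
import math

def binomial_deviance(obs, exp):
    """Eq. (2.52): D = 2 * sum_i Obs_i * ln(Obs_i / Exp_i).

    obs and exp hold the counts in every category, i.e. both the
    "successes" r_i and the complements n_i - r_i for each subgroup.
    A zero observed count contributes 0 (the limit of x*ln(x)),
    one way around the ln(0) problem mentioned in the text."""
    d = 0.0
    for o, e in zip(obs, exp):
        if o > 0:
            d += o * math.log(o / e)
    return 2.0 * d

# Hypothetical: 3 dose groups of n_i = 20 individuals; r = observed deaths,
# e_r = deaths expected under a (hypothetical) fitted logistic model.
r = [4, 11, 17]
e_r = [5.1, 10.2, 16.4]
obs = r + [20 - x for x in r]          # deaths and survivals
exp = e_r + [20 - x for x in e_r]

D = binomial_deviance(obs, exp)
# D is compared with the chi-square critical value on (subgroups - k) df;
# here 3 subgroups and k = 2 fitted parameters give 1 df (3.84 at 5%).
```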
34 Maximum likelihood

Comparing models
I shall only consider models that have the same structure but differ in
the number of parameters. For example, in the case of the von Bertalanffy
function, we might wish to compare a model with θ₃ to one without θ₃:

l_t = \theta_1\left(1 - e^{-\theta_2(t - \theta_3)}\right) \quad \text{vs.} \quad l_t = \theta_1\left(1 - e^{-\theta_2 t}\right)    (2.53)

To do so we can make use of the chi-square property of the deviance. For these
two models the deviances are

D_{n-3} = \frac{1}{\sigma^2}\sum_{t=1}^{n}\left[l_t - \hat\theta_{F,1}\left(1 - e^{-\hat\theta_{F,2}(t - \hat\theta_{F,3})}\right)\right]^2 \sim \chi^2_{n-3}
D_{n-2} = \frac{1}{\sigma^2}\sum_{t=1}^{n}\left[l_t - \hat\theta_{R,1}\left(1 - e^{-\hat\theta_{R,2} t}\right)\right]^2 \sim \chi^2_{n-2}    (2.54)

where the subscripts F and R stand for "Full" and "Reduced" model, respectively.
Now, by the additive nature of chi-square, we have D_{n-2} - D_{n-3} \sim \chi^2_1. But we still
have the problem of the nuisance σ². By constructing the following ratio, we can
both eliminate this parameter and produce an F-statistic:

\frac{D_{n-2} - D_{n-3}}{D_{n-3}/(n-3)} \sim F_{1,\,n-3}    (2.55)

For models in which the maximum likelihood estimates are found by
minimizing the sums of squares, we can write a general formula for comparing
two models:

\frac{(SS_R - SS_F)/(F - R)}{SS_F/(n - F)} \sim F_{F-R,\,n-F}    (2.56)

where SS_F is the sums of squares of the full model with F parameters and SS_R is
the sums of squares of the reduced model with R parameters (F > R). Appendix C.2.7
shows coding to compare the von Bertalanffy model with three vs. two parameters.
The analysis shows that the three-parameter model does not fit significantly
better than the two-parameter model (F_{1,10} = 0.09, P = 0.767). This is
also evident from the standard error of θ₃ (t₀) given in Appendix C.2.5.
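Eq. (2.56) is simple enough to wrap as a helper function. The sketch below is Python; the sums of squares and sample size are hypothetical numbers chosen only to exercise the formula, not the values from the fits reported above:

```python
def f_ratio(ss_r, ss_f, p_r, p_f, n):
    """Eq. (2.56): F statistic comparing a full model (p_f parameters,
    residual sums of squares ss_f) with a nested reduced model
    (p_r parameters, ss_r). Degrees of freedom: (p_f - p_r, n - p_f)."""
    if p_f <= p_r:
        raise ValueError("the full model must have more parameters")
    return ((ss_r - ss_f) / (p_f - p_r)) / (ss_f / (n - p_f))

# Hypothetical: 13 ages, three- vs. two-parameter von Bertalanffy fits
F = f_ratio(ss_r=52.0, ss_f=50.0, p_r=2, p_f=3, n=13)
# F would be compared with the critical value of F with (1, 10) df (4.96 at 5%)
```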
For models other than those for which the maximum likelihood estimates
are obtained by least squares, we can employ the deviances directly:

D_R - D_F \sim \chi^2_{F-R}    (2.57)

Suppose, for example, we wished to compare the logistic fit with a constant
proportion model. The latter model is equivalent to the logistic model in which θ₂
is set equal to zero (i.e., p_i = e^{\theta_1}/(1 + e^{\theta_1}) = constant), that is, a one-parameter
model (see Appendix C.2.8 for coding to compare these models). The deviance for
the two-parameter model is 13.63 and for the one-parameter model it is 287.22:
thus D_1 - D_2 = 273.59, to be compared to \chi^2_1 = 3.84, which is obviously highly
significant (P < 0.0001).
In some instances one might wish to compare several samples: for example,
do the means from two separate populations come from the same statistical
population (i.e., the null hypothesis of μ₁ = μ₂ versus the alternate hypothesis of
μ₁ ≠ μ₂)? This is conceptually and mathematically equivalent to comparing
a two-parameter model with a one-parameter model:

One parameter model: \mu_i = \theta_1
Two parameter model: \mu_i = \theta_1 + d_i\theta_2    (2.58)

where d_i is a "dummy" variable that takes the value 0 for population 1 and 1 for
population 2. The parameters θ are estimated by minimizing sums of squares, and
hence we can use Eq. (2.56) to compare the two models. The two deviances are

D_1 = \frac{1}{\sigma^2}\sum_{j=1}^{2}\sum_{i=1}^{n}(x_{ij} - \bar{x})^2 = \sigma^{-2}SS_1
D_2 = \frac{1}{\sigma^2}\sum_{j=1}^{2}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2 = \sigma^{-2}SS_2    (2.59)

where, for simplicity, I have assumed equal sample sizes. In the two-parameter
model there are 2n data points and 2 parameters, and thus "n − F" is equal to
2n − 2; we test the hypothesis that the two-parameter model explains significantly
more variance than the one-parameter model (i.e., is a better fit to the data) with
the F-statistic

\frac{D_1 - D_2}{D_2/(2n-2)} = \frac{SS_1 - SS_2}{SS_2/(2n-2)} \sim F_{1,\,2n-2}    (2.60)

which the reader will no doubt recognize as the calculation used in a one-way
analysis of variance.
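The equivalence to one-way analysis of variance is easy to check numerically. The Python sketch below uses two small hypothetical samples of equal size, computes SS₁ and SS₂ as in Eq. (2.59), and forms the F-statistic of Eq. (2.60):

```python
pop1 = [4.1, 5.0, 4.6, 5.3, 4.8]   # hypothetical sample from population 1
pop2 = [5.9, 6.4, 5.7, 6.8, 6.1]   # hypothetical sample from population 2
n = len(pop1)                      # equal sample sizes assumed

both = pop1 + pop2
grand_mean = sum(both) / (2 * n)
mean1 = sum(pop1) / n
mean2 = sum(pop2) / n

# SS1: one-parameter model (a single common mean)
ss1 = sum((x - grand_mean) ** 2 for x in both)
# SS2: two-parameter model (a separate mean per population)
ss2 = (sum((x - mean1) ** 2 for x in pop1)
       + sum((x - mean2) ** 2 for x in pop2))

# Eq. (2.60): F with (1, 2n - 2) degrees of freedom
F = (ss1 - ss2) / (ss2 / (2 * n - 2))
```

The same F is obtained from a standard one-way ANOVA of these two samples, since the between-groups term has a single degree of freedom.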
A more complex example is the comparison of two functions that have several
parameters. Consider the problem of comparing two growth curves fitted using
the von Bertalanffy function (Figure 2.9), the two shown corresponding to male
and female curves. The curves could differ in several ways; all three parameters
Figure 2.9 Plot of average length at each age for male and female Pacific hake,
with the estimated von Bertalanffy curves. Data modified from Kimura (1980).

Age     1.0   2.0   3.3   4.3   5.3   6.3   7.3   8.3   9.3   10.3  11.3  12.3  13.3
Female  15.4  28.0  41.2  46.2  48.2  50.3  51.8  54.3  57.0  58.9  59.0  60.9  61.8
Male    15.4  26.9  42.2  44.6  47.6  49.7  50.9  52.3  54.8  56.4  55.9  57.0  56.0

might differ between the populations, or only one. Suppose we wish to test the
hypothesis that a model in which all three parameters differ fits the data better
than one in which none differ between populations: we proceed in the same
manner as above (Appendix C.2.9 shows the coding using the fitting function
nlmin, and C.2.10 shows the coding using the supplied function nls, which fits
a function using least squares. Both methods give identical results and are
presented to illustrate that several routes may be taken to achieve the same test.
Interestingly, nlmin failed to fit the function with dummy variables). First, we fit
the curves separately for the two samples and calculate the combined sums of
squares. Second, we fit a single curve using the combined data. Third, we apply
Eq. (2.60). For the example data set, we have F_{3,20} = 4.7, P = 0.01, indicating that a
model with all three parameters different is to be preferred over a model with
common parameters. This does not indicate that a model with only one or two
common parameters does not fit the data equally well. From the previous
analyses, we might suspect that the parameter θ₃ (= t₀) does not differ between
populations. Therefore, it is reasonable to compare the full model with one
that incorporates a dummy variable for sex but not θ₃. The full model has
six parameters and the reduced model has four parameters (coding in
Appendix C.2.11). The full model is not significantly better than the reduced
model (F_{2,22} = 0.48, P = 0.63).

Summary

(1) The method of maximum likelihood presumes that one can assign
to a set of observations a probability, or likelihood (L), that is a function
of one or more parameters θ₁, θ₂, ..., the values of which are to be
estimated. The parameter values that maximize the probability of
obtaining the observed data are the maximum likelihood estimates.
It is frequently most convenient to work with the log-likelihood.
(2) In many cases, the probability distribution is based on the normal
distribution, leading to the maximum likelihood estimates being
obtained by minimizing the residual sums of squares, that is,
Minimize \sum_{i=1}^{n}(\text{Observed value} - \text{Predicted value})^2. Another commonly
occurring situation is that in which there are two outcomes (e.g., alive
or dead) and the likelihood is based on the logistic model.
(3) The two most commonly used methods of estimating confidence limits
are the log-likelihood ratio approach and the standard error approach.
The first method involves five steps:

Step 1: Find the MLE for θ (= \hat\theta)
Step 2: Calculate the log-likelihood at the MLE (i.e., LL(\hat\theta))
Step 3: Iterate over a range of θ (e.g., 0.01 to 0.99) and for each
    calculate LL(θ)
Step 4: Calculate Diff = LL(\hat\theta) - LL(\theta) - 0.5\chi^2_1
Step 5: Find the values of θ at which Diff is equal to zero. There will be two
    values, corresponding to the upper and lower confidence limits.

(4) The second method of assessing the variability in a parameter value is to
examine the standard errors of the estimates; roughly speaking, the 95%
confidence limits are ±2× the standard error. The variance of a parameter
is approximately equal to the negative of the inverse of the second
derivative at the maximum likelihood point, \sigma_\theta^2 = -(\partial^2 LL/\partial\theta^2)^{-1}. When
several parameters are estimated, the matrix of second derivatives must
be inverted to obtain the variance-covariance matrix.

(5) To examine a model for how well it conforms to the observed data, define
LL_{Sat} as the log-likelihood of the saturated model with n observations and
LL_{MLE} as the log-likelihood at the maximum likelihood estimated values.
The saturated model is that in which the log-likelihood is equal only
to the constant component of the log-likelihood (e.g., for a normal
distribution it would be n \ln(1/(\sigma\sqrt{2\pi}))). Now D = 2(LL_{Sat} - LL_{MLE}) \sim \chi^2_{n-k},
where k is the number of parameters estimated. D is known as the scaled
deviance (or simply the deviance). If the model fits the data well, D will be
smaller than the critical value of \chi^2_{n-k}.
(6) To compare two models that have the same structure but differ in the
number of parameters, we construct either an F or χ² statistic. For models
in which the maximum likelihood estimates are found by minimizing
the sums of squares, a general formula for comparing two models is

\frac{(SS_R - SS_F)/(F - R)}{SS_F/(n - F)} \sim F_{F-R,\,n-F}

For models other than those for which the maximum likelihood
estimates are obtained by least squares, we can generally employ the
deviances directly:

D_R - D_F \sim \chi^2_{F-R}

Further reading

Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. London: Chapman and Hall.
Cox, D. R. and Snell, E. J. (1989). Analysis of Binary Data. London: Chapman and Hall.
Dobson, A. J. (1983). An Introduction to Statistical Modelling. London: Chapman and Hall.
Eliason, S. R. (1993). Maximum Likelihood Estimation. Newbury Park: Sage Publications.
Kimura, D. K. (1980). Likelihood methods for the von Bertalanffy growth curve. Fishery
Bulletin, 77, 765–76.
Stuart, A., Ord, K. and Arnold, S. (1999). Kendall’s Advanced Theory of Statistics: Classical
Inference and the Linear Model, Vol. 2A. London: Arnold.

Exercises

(2.1) Using the 10 values of x given below, and assuming a normal
distribution with σ = 1, plot the log-likelihoods from −3 to +3, using a step
interval of 0.1. Compare the estimate of μ with the arithmetic average.

0.793 0.794 0.892 0.112 1.371 1.417 1.167 0.531 0.921 0.577

Hint: if using S-PLUS, consider the following routines: mean, seq, length, for, max,
plot.
(2.2) Show that \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 is a biased estimator of σ² and that an
unbiased estimator is \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2. Hint: there is no loss in
generality in assuming that μ = 0, which makes the proof simpler.
(2.3) A frequently used distribution for sparsely spatially distributed
data (e.g., the distribution of a rare organism) is the Poisson distribution, which
has probability density function p(r) = e^{-\mu}\mu^r/r!, where p(r) is the probability of r
events occurring (e.g., the probability of a sampling unit containing r individuals).
Show that the maximum likelihood estimate of μ is equal to (Total number of
individuals counted)/(Total number of sampling units).
(2.4) Generate 20 regression lines using the same probability distribution
and estimate the correlation between \hat\theta_1 and \hat\theta_2. Assume θ₁ = 0, θ₂ = 1, the error
term is N(0,1), and there are 10 x values evenly spaced from 1 to 10 (i.e., 1, 2, 3, ...,
9, 10). Hint for coding: consider using the following routines: for, seq, rnorm, lm,
cor.test.
(2.5) Egg production in many organisms follows a triangular pattern, first
increasing with age and then decreasing. A function suggested by McMillan et al.
(1970) to describe this pattern in the fruitfly is y = \theta_1(1 - e^{-\theta_2(x - \theta_3)})e^{-\theta_4 x}, where y
is the number of eggs laid on day x. Assuming that errors are normally distributed
(as discussed for the von Bertalanffy function), estimate the four MLE parameters
for the following data set:

Day        1.5   3     4     5     6     7     8     11    14
Eggs laid  21.6  63.7  61.6  59.9  53.8  55.5  50.8  31.5  24.4

(2.6) A mate choice experiment is run twice, the first time with a sample
size n₁ and the second with a sample size n₂. Show that the maximum likelihood
estimate is (r₁ + r₂)/(n₁ + n₂) rather than ½[r₁/n₁ + r₂/n₂].
(2.7) Using the 10 data points from N(0,1) given in question 2.1, construct a
95% confidence interval for the mean using the exhaustive approach method.
Compare the results with those obtained by the usual method (i.e., ±t*SE =
±2.262*SE). Use a range from −2 to +2 and a step length of 0.01. Hint for coding:
consider using the following routines: rnorm, mean, var, seq, length, prod, for,
sum, cumsum.
(2.8) Using the above data, estimate the 95% confidence limits using the
log-likelihood ratio approach. Hint: check Appendix C.2.3.

(2.9) Consider the von Bertalanffy function l_t = \theta(1 - e^{-k(t - t_0)}), where l_t
is length at age t, k and t₀ are known constants, and θ is an unknown parameter
to be estimated. Show that the maximum likelihood estimate of the standard
error of θ is \sigma/\sqrt{\sum_{t=1}^{n}(1 - e^{-k(t - t_0)})^2}. Hint: make use of the second
derivative.
(2.10) The following mean lengths at age are measured for a particular
species of fish.

Age     1      2      3      4      5      6      7      8      9      10
Length  23.61  43.10  57.54  68.24  76.16  82.03  86.38  89.60  91.99  93.76

Assuming a von Bertalanffy growth curve as in question 2.9, with k = 0.3 and
t₀ = 0.05, estimate θ and the nuisance parameter σ² using the nls routine in S-PLUS
(or another statistical package). Estimate the standard error of θ using the result
from question 2.9 (note that σ² is estimated as described in the main text).
(2.11) The table below shows egg production in a second strain of
Drosophila melanogaster. Fit the function y = \theta_1(1 - e^{-\theta_2(x - \theta_3)})e^{-\theta_4 x} to these
data and test the hypothesis θ₃ = 0.

Day        1     2     3     4     5     6     7     8     9     10    11    12    13    14
Eggs laid  54.8  73.5  78    71.4  75.6  73.2  65.4  61.9  61.7  60.1  55.1  50.4  44.3  42.3

List of symbols used in Chapter 2

Symbols may be subscripted

ε      Error term
θ      Parameter to be estimated
θ̂      Estimate of θ
φ(θ)   Function of θ
σ      Standard deviation
σ²     Variance
μ      Mean
μ̂      Estimate of μ
π      Pi (= 3.14...)
D      Deviance
F      Number of parameters in the full model
L      Likelihood
Exploring the Variety of Random
Documents with Different Content
324 REVUE DU MONDE MUSULMAN organisme vivant, sans
cesse en transformation, et qui doit, avant tout, être facilement
intelligible (i). Al-Karmel « Le Carmel » de Haïfa est dans la
deuxième année de son existence. C'est un organe démocratique,
qui défend, en toute occasion, la cause du prolétariat et de la liberté,
et qui s'efforce de répandre, dans le peuple, les idées de progrès et
de civilisation. Paraît actuellement le samedi, sur huit pages petit in-
folio ; il espère devenir quotidien. Directeur-propriétaire : Nadjîb
Nassâr(2). Un journal dont la création est un peu postérieure à celle
d\Al-Karmel, est Djirâb al-Kourdi « La Besace du Kurde », feuille
humorisAL-KOUDS (JÉRUSALEM) BJ-HEBDOMADAIRE IV»**
PROPHIITAIRX Georges t. Habib Manama. ABONNEMENT Jérrodem
uo an!».. Medjidiea Turquie un an 1 Etranger un «n 20 francs.
Inspruont et annonça à I* 1™ page la ligne 3 Pias i la »•» page . S .
PAYABLE DEVANCE. LL-mi.^ tique hebdomadaire mettant en
application le principe Castigat ridendo mores et, sous une forme
plaisante, travaillant à l'amélioration de la société. De petit format
(in-8), la Djirâb al-Kourdî paraît sur seize pages. Directeur-rédacteur
en chef : Mitrî Hallâdj ; administrateur, Basil Al-Djad'; gérant, Djamîl
Efendi Ramadan (3). Depuis le mois de juin paraît à Beyrouth un
organe médical mensuel, At-Tabîb al-1 Amil « le Médecin-praticien»,
dont le rédacteur en chef, en même temps gérant, est le docteur
Théophile Dabana. Imprimé sur quatre pages in-4, il accepte des
articles sur toutes les (1) Abonnements annuels : Hama et Homs, 1
medjidié, Turquie, 1 medjidié 1/2; Amérique, 10 francs. (2)
Abonnements annuels : Turquie, 2 medjidiés ; étranger, 10 francs.
(3) Abonnements annuels : Haïfa, 1 medjidié ; extérieur, 1 medjidié
1/4.
LA PRESSE MUSULMANE 325 questions de médecine, de
pharmacie et d'hygiène. S'adressant au grand public tout autant
qu'aux médecins, il a obtenu, dès son apparition, un vif succès. C'est
l'imprimerie Al-Idjtihâd qui l'édite. Homs, l'ancienne Emesse,
possède depuis 1909 une revue actuellement hebdomadaire, qui a
pris pour titre le nom même de la ville, Homs, et paraît sur seize
pages petit in-4. Elle publie des articles sur des questions très
variées : la politique y tient beaucoup de place, ainsi que la
littérature, les sciences, l'hygiène, etc., sans parler des nouvelles
intéressant la région. C'est l'évèque Athanase 'Attâ Allah qui la
dirige; administrateur-rédacteur en chef, Constantin-Yannis (1). On
trouve encore à Homs une feuille humoristique hebdomadaire, 1" H I
E E ». 8 t ^J JV- i~^ ^ l^biy. ^ y^t ç.^, jj>^> "J- i^Jj t^J\ A >*&
EUE K. ZACÇA Propriétaire 4 RocUcteui JERUSALEM (Palestine
IBOIIIIEVEITS umxxz Pour tontes annonces rt artJdes p«rt)caf\eres,
on ddl s'adresser I Monsieur te Directeur de I Armaflre i_ R. fltCC*
JsmtalM ANNAFIRE Dâ'at at-Tâsa « l'Écuelle est perdue », qui a un
programme semblable' à celui de Hatti bïl-Khourdj, et de Djirâb al-
Kourdî, dont nous parlons ailleurs. Son premier numéro porte la date
du 2 juillet : il est précédé de cette épigraphe : Al-'Adlas as al-Moulk
« La justice est la base du royaume ». Directeur-gérant : Yoûsouf
Khâlid Al-Masadî (2). Djadida Mourdji'iyoûn a une feuille
hebdomadaire de grand format, Al-Mardj, politique, littéraire et
économique, dont la devise est : Liberté, Égalité, Fraternité, qui
entrera, dans quelques mois, dans la troisième année de son
existence. Le numéro du 3o juin dernier contient un intéressant
article sur la diffusion des journaux en pays ottoman, et ses
conséquences : l'auteur de l'article démontre que la presse (1)
Abonnements annuels : Homs, 1 medjidié 1/2; Turquie, 2 medjidiés ;
étranger, 10 francs. (2) Abonnements annuels : Homs, 3/4 de
medjidié ; Turquie, 1 medjidié ; étranger, 8 francs.
32Ô REVUE DU MONDE MUSULMAN devra vaincre deux
ennemis redoutables : l'ignorance et l'avarice, pour pouvoir remplir
le rôle qui lui est assigné. Directeur-propriétaire : docteur As'ad
Rahhâl ; le Chéïkh Soleïman Dâhir collabore régulièrement à ce
journal (i). Liban. Al-Bourdaounî est l'organe spécial de Zahla,
organe hebdomadaire qui voudrait devenir mensuel. Son premier
numéro est du 25 juin; il débute par ces mots : « La vie est une
lutte. Dans la lutte est la vie. Lutter est le propre de la jeunesse... »
C'est dire qu' Al-Bourdaounî a pour rédacteurs des hommes d'action,
désireux de voir leur pays, le Liban, se relever et entrer résolument
dans la voie du progrès. Propriétaire : Sâlim Ar-Rayyâchi; directeur-
rédacteur en chef: Iskender Ar-Rayyâchî (2). Al-Mouhadhdhib « Le
Censeur », de Zahla également, date de quatre ans ; il espère
devenir quotidien, mais ne paraît, provisoirement, que deux fois par
semaine, le jeudi et le samedi. « Littéraire, scientifique et industriel
», dit son titre. C'est un journal intéressant et bien informé, pour le
Liban comme pour le reste de la Turquie (3). Palestine. Annafire « La
Trompette », dont le directeur-propriétaire est M. Elie K. Zacca, est
maintenant dans la quatrième année de son existence. (1)
Abonnements annuels : Djadida et environs, 2 medjidie's ; Turquie,
2 medjidiés 1/2 ; étranger, 1/2 livre anglaise. (2) Abonnements
annuels : Liban, 1 medjidié 1/4 ; extérieur, 9 francs. (3)
Abonnements annuels : Liban, i5 francs ; extérieur, 20 francs.
LA PRESSE MUSULMANE 327 Elle paraît provisoirement,
deux fois par semaine, le mardi et le vendredi, sur quatre pages in-
folio, mais deviendra quotidienne dans un avenir prochain. Organe
politique, économique et social, Annafire donne des nouvelles de
toute la Turquie (i). Al-Insâf « l'Équité » date de deux ans. Elle est
dirigée par M. Dendli Elias Machhour; elle définit ainsi son
programme: « feuille politique, scientifique, littéraire, d'informations
et humoristique». Paraît le mardi et le samedi ; même format que la
précédente (2). Al-Kouds « Jérusalem », journal bi-hebdomadaire
datant aussi de deux ans, est à recommander à ceux qui
s'intéressent à la vie de la LISSAN-EL-CHARK Palestine. Grâce à ses
correspondants, il est informé de tout ce qui se passe dans la région.
Propriétaire : M. Georges J. Hanania. Paraît le mercredi et le
vendredi (3). An-Nadjâh « Le Salut » est une feuille politique,
littéraire, scientifique et agricole, de grand format, paraissant le
vendredi de chaque semaine. Directeur-gérant : 'Alî Rimâwî. Les
questions politiques y tiennent une large place; les nouvelles de la
région et celles de la Turquie suivent un article de fond de caractère
politique. Comme les précédents, An-Nadjâh est dans la deuxième
année de son existence (4). (1) Abonnements annuels: Jérusalem et
Turquie, 10 francs; étranger, i5 francs. (2) Abonnements annuels :
Turquie, 10 francs; étranger, 12 francs. (3) Abonnements annuels :
Jérusalem, 3 medjidiés et demi ; Turquie, 4 medjidiés ; étranger, 20
francs. (4) Abonnements annuels : Liva de Jérusalem, 2 medjidiés ;
extérieur, 2 medjidiés et demi.
328 REVUE DU xMONDE MUSULMAN Mésopotamie. Ar-
Rakîb « L'Observateur », fondé à Bagdad le 6 moharrem 1327, est
une « feuille arabe-turque servant les intérêts de la patrie, en toute
indépendance, et paraissant provisoirement deux fois par semaine ».
Il est probable qu' Ar-Rakîb, qui achève la première année de son
existence, deviendra plus tard quotidien. Bien que le titre porte «
feuille arabe-turque », il est presque entièrement rédigé en arabe. A
côté des événements politiques et des nouvelles de Turquie, il fait
une part — souvent large — à la littérature. Directeur-gérant :
'Abdul-Latîr Thanyân (1). Ar-Riyâd « Les Prairies », autre journal de
Bagdad, hebdomadaire cette fois, se dit littéraire et commercial. Il
est de fondation récente, et préconise l'union entre Arabes et Turcs.
Les questions économiques y tiennent une très large place ;
beaucoup de nouvelles locales, aussi. Directeur et rédacteur en chef:
Soleïmân Ad-Dakhîl ; gérant, Djâr Allah Ad-Dakhîl (2). (i)
Abonnements d'un an et de six mois: Bagdad, 3o et 17 piastres;
extérieur, 35 et 20 piastres. Pour l'Inde et le golfe Persique, 8
roupies l'année; pour l'Europe, la Perse et l'Egypte, 11 francs. (2)
Adresse; Khan Al-'Adliya Al-Djadîd, Bagdad. — Abonnements
annuels; Bagdad, 1 medjidié ; extérieur, 1 medjidie et demi ;
étranger, 8 francs.
LA PRESSE MUSULMANE 329 Nous trouvons encore à
Bagdad Ar-Rousâfa, journal provisoirement hebdomadaire, politique,
littéraire, scientifique, commercial, se donnant pour mission de
combattre les abus, à la tête duquel est placé Mohammed Sâdîk Al-
A'radjî. Journal surtout local, il présente de l'intérêt en raison des
nombreuses informations qui lui parviennent de tous les points de la
Mésopotamie (i). Al-îkâdh « Le Réveil », de Basra, a été fondé
l'année dernière, et parait chaque semaine, sur quatre pages de
grand format. « Feuille littéraire, d'informations, patriotique,
humoristique », lisons-nous sur le titre, et l'examen d'un numéro
à'Al-Ikddh montre que ce programme est rempli. Les trois premières
pages sont rédigées en arabe, et la quatrième en turc. Directeur
responsable et rédacteur pour la partie turque : Mektoûbîzâdè
rEumer Fevzî ; gérant, Soleïman Faïdî AlMausili (2). Journaux de
vilayets. Edirnè. Selon l'usage, on a donné au journal officiel du
vilayet d'Andrinople, comme titre, le nom même de la ville où il
paraît, en turc Edirné. Ce journal, dont une aimable intervention
nous a assuré le service, n'est pas, comme on pourrait le croire, une
simple énumération des actes administratifs. Il contient, avec des
renseignements d'intérêt général, des articles de fond d'un réel
intérêt. Edirnè est une feuille hebdomadaire, datant de trente et un
ans et
(1) Abonnements d'un an et de six mois : Bagdad, 20 et 12 piastres ; Turquie, 25 et 40 piastres ; étranger, 40 piastres l'année.
(2) Abonnements annuels : 40 piastres ; Inde et golfe Persique, 6 roupies ; Europe, Amérique, Égypte et Perse, 10 francs.
33o REVUE DU MONDE MUSULMAN paraissant chaque
jeudi, sur quatre pages grand in-folio. Il est composé, d'ordinaire, de
la manière suivante : Après le titre, le calendrier de la semaine,
donnant la concordance de l'ère de l'hégire avec l'année
administrative, les heures du lever et du coucher du soleil, etc. On
donne ensuite, s'il y a lieu, le texte des […]
Devons-nous élever nos enfants
? La bravoure et l'audace leur sont nécessaires. Article de fond,
signé Fakhîr ud-Dîn 'Alî. — La Société ottomane d'assistance aux
enfants indigents d'Andrinople : sa situation (177 enfants secourus,
115 garçons, 62 filles). Avec un éditorial insistant sur la nécessité
de fonder partout des associations semblables. — Le mildew:
comment il vient, et comment le faire disparaître. Appel aux
agriculteurs. Par Huseïn 'Avnî, directeur de l'agriculture (à suivre). N° 1525, 30 djoumâdhâ II 1328 (24 juin/7 juillet 1910). — L'éducation
physique: son importance, nécessité de rompre avec les errements
qui l'ont fait négliger jusqu'ici, par Fakhîr ud-Dîn 'Alî. — Ouverture
d'une souscription en faveur des inondés. — Précautions d'hygiène.
— Éphémérides. — Pensées philosophiques (de Sénèque, Locke,
etc.). N° 1526, 7 redjeb 1328 (1er/14 juillet 1910). — L'égalité existe
dans un gouvernement constitutionnel, par Fakhîr ud-Dîn 'Alî. — La
Société ottomane contre les ennemis de l'intérieur (l'alcool, le tabac,
etc. ; nous avons précédemment parlé de cette Société). — Les
efforts des habitants de Drama (questions agricoles). — La Société
ottomane d'assistance aux enfants indigents d'Andrinople (liste de
souscription s'élevant à 738 piastres). — Loi sur les églises et les
écoles. N° 1527, 14 redjeb 1328 (8/21 juillet 1910). — Les idées de
progrès des habitants de Djisr Moustafà Pacha (sur l'achat et
l'emploi des machines agricoles). — Pensées philosophiques. — Loi
sur le recrutement des officiers de réserve. — Communications
officielles, etc. N° 1528, 21 redjeb 1328 (15/28 juillet 1910). — La
fête nationale ottomane. — Le Club commercial hongrois-ottoman
(Association fondée à Buda-Pesth, pour favoriser les rapports
économiques des deux nations, et faisant appel à la fraternité
résultant d'une communauté d'origine). — Distribution des prix. —
Pose de la première pierre d'une école normale d'instituteurs. —
Construction de routes.
N° 1529, 28 redjeb 1328 (22 juillet/4 août 1910). — Le temps favorable à l'éducation ; importance de celle-ci, par Fakhîr ud-Dîn 'Alî. — La Revue du Monde
musulman, par le même (sur le relèvement des peuples musulmans
à la suite des victoires japonaises, et compte rendu d'un article que
leur consacre M. A. Le Chatelier dans la Revue économique
internationale). — Pensées philosophiques. — Souscription en faveur des enfants pauvres. — Commission
d'assistance à la flotte ottomane (prélèvements volontaires faits en
sa faveur par les commerçants). N° 1530, 6 cha'bân 1328 (29 juillet/11 août 1910). — Construction de nouvelles routes ; il faudra
profiter de l'expérience acquise. — Conférence patriotique de Edhem
Roûhî Bey, rédacteur au Balkan de Philippopoli et défenseur de
l'Islam en Bulgarie, venu à Andrinople sous les auspices du Comité
Union et Progrès (compte rendu par Fakhîr ud-Dîn 'Alî). —
Souscription en faveur des enfants pauvres. — Souscription en
faveur de la flotte. N° 1531, 13 cha'bân 1328 (5/18 août 1910). — Les
Chambres de Commerce et leurs avantages, par Fakhîr ud-Dîn 'Ali
(rôle et importance de ces établissements dans le développement
économique d'une nation). — Pensées philosophiques. — Fondation
d'écoles. — Souscriptions, etc. N° 1532, 20 cha'bân 1328 (12/25
août 1910). — Les prédicateurs du ramadan, par Fakhîr ud-Dîn 'Alî
(sur la nécessité, que reconnaît le Mechyakhat, d'envoyer dans les
campagnes des prédicateurs qui gagneront les habitants à la cause
de la Constitution et de la liberté). — Tuez les corbeaux ! par le
même, (à propos d'un article du Kieuylu de Smyrne ; dégâts causés
par les oiseaux; une prime de 10 paras par corbeau abattu sera
donnée par l'administration). — Fondation d'une école ruchdié. —
Pensées philosophiques.
N° 1533, 20 cha'bân 1328 (19 août/1er septembre 1910). — Le service militaire est un devoir sacré
pour tous, par Fakhîr ud-Dîn 'Alî. — Vaccinez vos moutons, par le
même (à propos d'un article du Kieuylu relatif aux épizooties). —
Pensées philosophiques. — Lois et décrets. Khoudâvendikiâr. Khoudâvendikiâr est le journal publié en turc, depuis quarante-deux ans, par le gouvernement du vilayet de ce nom, dont le chef-lieu est
Brousse. Mais il ne faut pas conclure de là que ce journal soit un
organe exclusivement officiel. Il accepte la collaboration de toute
personne pouvant lui donner des articles d'intérêt général ; de la
sorte, il est à la fois le bulletin des actes officiels (1) et un journal
ordinaire. C'est le lundi que paraît le Khoudâvendikiâr. Ses numéros,
de quatre pages in-folio, ont la composition suivante : Actes et
communications du gouvernement local ; Partie officielle (actes du
gouvernement ottoman). Partie non officielle comprenant les actes
intéressant le vilayet : décisions prises par les autorités locales,
mouvement du personnel, etc. Articles sur des sujets variés ;
Annonces officielles ; Nouvelles du vilayet et faits divers. Parmi les
articles les plus importants parus au cours de ces derniers mois,
nous devons relever les « Leçons du régime constitutionnel », Mechroûtiyèt dèrslèri (1),
(1) Rédaction : Gouvernement du vilayet, à Brousse. — Direction : Imprimerie officielle. — Abonnements : un an, 20 piastres ; six mois, 15 piastres. Tout numéro dont la semaine de publication est écoulée se vend 3 piastres.
série d'études sur les avantages de ce régime et les qualités qu'on
est en droit d'exiger de lui, la justice par exemple. « Que veulent-ils
? » Ne istèyorlar ? est consacré à la Société secrète auteur de
projets réactionnaires, qui a été découverte à Constantinople, et
dont les agissements peuvent être l'œuvre de fonctionnaires
mécontents d'avoir perdu leurs emplois. Cette question des agents
mis hors cadre, par mesure d'économie, est des plus graves ; il y a
des droits acquis méritant d'être respectés, et le gouvernement doit
non seulement s'abstenir de toute injustice, mais encore ne pas faire
de mécontents. Le journal cite avec éloge l'exemple de Emîr Agha,
de la tribu de Karaketchili, qui, devant à l'État une somme de 1.316 piastres 30 paras, est venu l'acquitter de lui-même, en une seule
fois. Exemple trop rare, et qui devrait être suivi par tous les
débiteurs du Gouvernement, conclut-on. Comme chez nous, les
annonces légales sont chose fort appréciée dans le monde de la
presse. Et le Khoudâvendikiâr blâme un de ses confrères, qui, pour
s'assurer quelques profits, demande à en avoir le monopole,
promettant de publier ces annonces à demi-tarif. Nous relevons la
fondation d'une Société ottomane industrielle de Brousse, présidée
par Sarrafian Garabet Efendi, et destinée à faire le commerce des
objets manufacturés de provenance ottomane. La Société est fondée
pour cinq ans ; son capital sera fourni par l'émission de 7.500 actions,
d'une livre ottomane chacune. A citer aussi les observations
météorologiques faites à l'observatoire de l'École pratique
d'Agriculture. Mossoul. Mossoul (arabe Mausil, turc Mévsil) est le
journal publié par le gouvernement du vilayet de ce nom. Sa
fondation remontant à vingt-six ans, il est donc l'un des plus anciens
organes de la presse provinciale paraissant actuellement. Rédigé en
langue turque, il paraît, chaque semaine, sur quatre pages petit in-
folio, et insère, dit un avis placé au-dessous du titre, les travaux
profitables aux intérêts de tous (1). En tête de chaque numéro sont
placées les plus importantes des communications officielles. Les
nouvelles du vilayet viennent ensuite. (1) Pour la rédaction,
s'adresser au gouvernement local; c'est l'imprimerie du Vilayet qui se
charge de toutes les questions administratives. Abonnement annuel :
40 piastres (2 medjidiés).
Une rubrique spéciale est
réservée aux mouvements du personnel. Puis, après les actes de
l'autorité locale, viennent des articles spéciaux, consacrés à des
sujets divers d'intérêt général. On trouve enfin les avis officiels. Dans
un pays vivant surtout de l'agriculture, le journal officiel, comme de
juste, traite de préférence des sujets agricoles. Mossoul, comme ses
confrères ottomans, mène une vive campagne en faveur de
l'adoption des méthodes et des appareils modernes, sans lesquels on
ne pourra faire de progrès. Il donne aussi des conseils aux
agriculteurs, surveille l'état des cultures, etc. Nous relevons, parmi
les informations officielles, un arrêté du directeur de l'instruction
publique relatif aux fouilles archéologiques. Il rappelle les mesures,
si rigoureuses, prises pour assurer au Gouvernement la propriété des
objets découverts et empêcher que leurs inventeurs n'en disposent.
Des emprunts, Mouktabasât, aux autres journaux mettent sous les
yeux du lecteur des faits intéressants d'ordres divers. Mais, dans cet
organe ottoman et local, on rencontre peu de nouvelles concernant
l'étranger. L. B.
PRESSE PERSANE
Questions politiques. Les événements se sont rapidement succédé,
depuis deux mois, écrivait en août dernier le Habl oul-Matîn. Un
nouveau cabinet, présidé par Mostooufi'l-Memâlek, a remplacé celui
qui avait été formé au lendemain de la chute de Mohammed 'Alî, et dont les chefs de l'armée victorieuse, Serdâr-é As'ad et Sipahdàr-é
A'zam, avaient pris la direction, et qui, modifié dans sa composition
à plusieurs reprises, a dû abandonner le pouvoir, à la suite de
dissentiments survenus entre les deux chefs. Mostooufi'l-Memâlek,
homme intelligent et capable, a déjà été ministre à plusieurs
reprises, et ses collaborateurs sont des hommes instruits,
connaissant notre langue et favorables à nos idées. Nous devons
souhaiter que les circonstances leur permettent de mener à bien la
tâche difficile qu'ils ont assumée. Les difficultés avec l'extérieur
subsistent toujours; à l'intérieur, les embarras n'ont pas disparu, et,
dès son arrivée au pouvoir, le nouveau ministère a dû avoir recours à
la force, comme on le sait, pour réduire des Modjâhids turbulents,
qui menaçaient l'ordre dans la capitale. Dans le combat, deux héros
de la Révolution, Sattâr Khân et Bagher Khân, ont été blessés. Mais,
quelle que soit l'attitude des nouveaux ministres, elle ne devra pas
faire oublier les immenses services rendus à la cause de la liberté
par leurs prédécesseurs; si des inimitiés personnelles les ont
empêchés de continuer leur œuvre, la Perse constitutionnelle ne leur
en doit pas moins une grande reconnaissance. Voilà l'opinion du
Habl oul-Matîn : le grand périodique de Calcutta, en saluant les
nouveaux ministres, rend justice à leurs éminents prédécesseurs, «
les deux généraux victorieux... ». Il engage leurs successeurs à
travailler à délivrer la Perse du joug étranger, à rester unis, à ne
prendre, comme agents, que des hommes sûrs, partageant leurs
idées, et à se débarrasser de l'ancien personnel attaché à la réaction
et à l'absolutisme. L'article où ces opinions sont émises est suivi
d'une étude due à un modjtehed qui est en même temps un homme
politique, Cheikh Mohammed Takî, fils du « Quatrième Martyr »,
autre docteur célèbre en Perse. Cheikh Mohammed Taki voit dans la
religion en général, dans la religion musulmane en
particulier, une des conditions essentielles de la civilisation et du
progrès, et ne repousse aucun des moyens d'action dont les faits ont
démontré l'utilité. Personne ne déplore autant que lui que la Perse
manque d'hommes instruits pour conduire ses affaires. En attendant
qu'on en ait formé, il faudra avoir recours aux étrangers pour
organiser les tribunaux, les finances, la police, la gendarmerie. Les
conseillers étrangers qu'on engagera seront pris dans les grandes
puissances européennes autres que la Russie et l'Angleterre. Il serait
imprudent de les choisir dans de petits États, car ils seraient soumis
fatalement à l'influence des Anglais et des Russes. Mais surtout, que
l'on écarte avec la plus grande vigilance les espions des légations ;
qu'on ne tolère plus que, grâce à la complaisance, à la cupidité ou à
la maladresse des fonctionnaires, ils puissent être renseignés sur ce
qui se fait ou doit se faire en Perse. Les Persans ne doivent pas non
plus se décrier eux-mêmes et crier que leurs maux sont sans
remède, pareils au Juif qui, lors du siège de Jérusalem par les
Romains, ne cessait de crier : Malheur au peuple d'Israël ! Une
dernière réforme s'impose : la création d'un Sénat, élément
indispensable à l'équilibre politique. Mais ce Sénat ne devra pas être
un élément rétrograde, ne défendant que les intérêts de
l'aristocratie, comme il le fait trop souvent à l'étranger. Ses
membres, dont la moitié sera nommée par le Gouvernement, et
l'autre moitié élue, traiteront avec la même impartialité toutes les
classes de la société. Depuis la publication de cet article, le prince
'Azod ol-Molk, régent de l'Empire, est mort. On lui a aussitôt donné
pour successeur un homme politique en vue, Nâser ol-Molk, qui
voyageait en Europe au moment de sa nomination, et n'est pas
encore rentré en Perse. Nous souhaitons que Nâser ol-Molk, dont on
vante les mérites, puisse faire sortir sa patrie des embarras où elle
se trouve. Le « parti social-démocrate » adresse aux Persans une
longue proclamation. Il commence par rappeler les services qu'il a
rendus au pays en organisant la résistance contre l'ex-Chah, le « prince » Mohammed 'Alî, bourreau de la Perse, en défendant Tauris,
en préparant la prise de Téhéran, à une époque où l'absolutisme
triomphait partout et où les amis de la liberté étaient plongés dans le
désespoir. Depuis, il ne s'est jamais écarté de la ligne de conduite
qu'il s'était tracée. Mais qu'ont fait les hommes possédant le pouvoir,
ou cherchant à s'en emparer? Rien que de contraire aux intérêts de
la Perse. L'un aspire à la dictature, l'autre voudrait monter sur le
trône. Tel autre, malgré son
incapacité notoire,
cherche à devenir ministre de la Guerre, et l'on a vu reparaître, dans
l'armée, les scandales de l'ancien temps, où les plus hauts grades
étaient donnés à des hommes ignorant tout du métier militaire. Le
danger extérieur n'a pas disparu. Les voisins de la Perse songent à
l'envahir ; s'ils parviennent à le faire, toute liberté disparaîtra. Et il se
trouve des journaux pour soutenir cette politique d'iniquité ! Que
dire des mœurs politiques ? Des comités, des partis de toute sorte, y
compris le parti des « adorateurs de l'or », zèrpèrèst, se sont
formés. Ils font de détestable besogne, poursuivant la satisfaction
d'intérêts personnels, semant la discorde et empêchant toute oeuvre
utile. Le parti social-démocrate ne leur ressemble pas. N'ayant en
vue que l'intérêt public, la justice, la liberté, l'égalité, la défense de
la patrie et de l'Islam, il accueille tous les hommes de bonne volonté.
Aussi fait-il, avec confiance, appel à tous les Persans patriotes. «
Échos des Enfers. » — Nous empruntons à la partie française du
Chark les lignes humoristiques qui suivent, et qui concernent la
tentative réactionnaire de Dârâb Mîrzâ, ce prince persan devenu
officier d'artillerie dans l'armée russe, contre la ville de Zendjan (1). «
Sa Majesté magnanime des Enfers, prise d'un subit accès de spleen
et ne sachant comment s'en défaire, se rappela du Téléphonophotographe qu'une maison d'Amérique venait de lui
envoyer à titre gracieux... « Il est notoire que les Américains sont de
grands maîtres dans l'art de la réclame. « — Voyons, qu'est-ce que
cet engin ? dit Sa Majesté Diabolique. « Et le roi des Ténèbres,
s'asseyant sur son trône, fit fonctionner la nouvelle machine. « Cet
appareil merveilleux était la plus récente création du génie humain :
non seulement il imitait à perfection la parole, mais il vivait,
reproduisant avec une réalité surprenante ce qui se passait dans les
quatre coins du monde. « Quoique d'humeur très ombrageuse, le
premier coup-d'œil que le potentat jeta sur l'appareil le dérida, et sur
sa sinistre figure un sourire vraiment diabolique se dessina... « —
Tiens ! tiens ! c'est l'histoire ancienne qui recommence sur le plateau
de l'Iran. (1) Nous donnons cet extrait et le suivant pour montrer à
nos lecteurs français l'intérêt qu'ils pourront trouver à la lecture du
Chark.
« Qu'ils sont bêtes ces êtres
humains pour se chamailler et se manger entre eux !... « Ces
pauvres malheureux oublient certainement que dans mon vaste
empire il n'existe absolument aucun parti politique, aucune crise
ministérielle, qu'il n'y a aucune distinction entre députés et ministres,
orgueilleux ou concussionnaires, intrigants ou imbéciles, et que
l'enfer est rempli des uns et des autres.... « A ce moment, un petit
diablotin, bien gentil et discipliné, s'approcha du trône et, saluant
bien bas Pluton, lui dit très respectueusement : Majesté, on vient de
nous envoyer de là-haut (parlant de la terre) un drôle de
personnage, un prince persan, du nom de Darab Mirza, qui était
colonel au service de la Russie. Il est inculpé d'avoir voulu provoquer
une rébellion en Perse, près de Zendjan. « Il est accompagné de
plusieurs Cosaques russes, et on a trouvé sur lui, lors de son
arrestation, des proclamations, un drapeau et d'autres objets, dont
l'usage aux Enfers n'a pas encore été bien déterminé... « Notre
magistrature ne sait qu'en faire, car on n'a jamais eu de pareils
précédents, cet extraordinaire délit, qui s'appelle sur la terre
provocation, n'étant pas prévu dans notre Code pénal. « L'Impériale
Majesté, diablement intriguée, fit demander le procureur général de
son Empire. « Ce digne magistrat — personnage expérimenté ayant
fait un long stage dans tous les pays où la Justice est en désaccord
flagrant avec dame Thémis, — ne tarda pas d'accourir. « — Votre
opinion sur le cas ? « — Majesté! répondit-il avec une profonde
courbette, nos Codes ne prévoient pas de cas analogues à celui-ci,
car les provocateurs sont un produit exclusif fabriqué spécialement
en Russie. De pareilles atrocités n'ont jamais eu lieu aux Enfers. « Il
est vrai que, pour créer un précédent, on pourrait bien administrer à
l'inculpé une cinquantaine de coups de bâton, mais je ne saurais
conseiller pareil acte de barbarie, car nous sommes au vingtième
siècle et même ici, aux Enfers, nous devons compter avec l'opinion
publique afin de ne pas devenir la risée de l'Univers. Et puis, que
dirait-on de nous au Ciel ?... « — Que faire, alors ? « — Sire, si Votre
Majesté daigne suivre mon respectueux conseil, je prendrai la liberté
de lui exposer... « — Parlez ! « — Majesté ! faites transporter le
prince sur la terre, en Moscovie. Dans ce pays-là les provocateurs
sont bien vus. Vous rendrez par là un
grand service à la
Russie, et puis vous témoignerez de votre magnanimité. « — Parfait
! dit Safar. Et Darab Mirza fut transporté en Russie où on l'éleva en
grade. » A. Le Chark a mené une vive campagne contre les
agissements de la presse et de la police russes. C'est en ces termes
que commence la partie française de son n° 101 : « Messieurs les
correspondants russes, « Certains « Messieurs » de la presse russe
n'y vont pas de main morte. « Leur fantaisie malsaine majeurement
nuisible à notre pays prend de jour en jour une plus grande
extension. « Nouvelles tendancieuses, informations alarmantes,
mensonge le plus cru, tout, enfin, est mis en marche pour nous
discréditer aux yeux de l'Europe, et faire accroire à l'opinion publique
du monde entier que notre pays est un brasier ardent de révoltes et
d'insurrections. « Toutes ces manœuvres sont mises en jeu pour
prouver que nous ne sommes pas, soi-disant, en mesure d'établir
l'ordre, et que, par conséquent, le pays étant en danger, il est
nécessaire d'intervenir dans ses affaires intérieures et d'y garder les
troupes russes. « Après avoir fait circuler des kyrielles plus atroces
les unes que les autres au compte du nord de la Perse, ces «
Messieurs » peu scrupuleux commencèrent à s'attaquer maintenant
au sud du pays... » Le fait qui a motivé cet article est le suivant. Une
dépêche venue de Saint-Pétersbourg, et reproduite par les journaux
du monde entier, annonçait qu'Ispahan venait d'être pris par 3. 000
Kachkaïs révoltés : les Kachkaïs qui gardaient la ville, mécontents de
leur chef, Serdâr As'ad, alors ministre de l'Intérieur, n'avaient rien
fait pour s'opposer à ce coup de force. En réalité, deux chefs
kachkaïs, accompagnés, selon l'usage, d'une escorte armée, étaient
venus rendre une visite amicale au gouverneur d'Ispahan. Après
avoir donné libre cours à son indignation, le Chark conclut en disant
que les correspondants des journaux russes « feraient bien mieux
d'informer la presse étrangère des provocations et des incitations
que les représentants du Gouvernement russe sèment partout ici
d'une main trop prodigue ».
Et nous lisons, dans le n° 102 : « La bureaucratie russe ne trouve plus, à ce qu'il paraît, notre
journal à sa convenance, surtout depuis le jour où nous publions une
page en français... » Ceux des n°s 92, 93 et 94 adressés aux
abonnés de Russie avaient été retournés à l'administration du
journal; mais on avait eu soin d'enlever des bandes les adresses des
destinataires... sans doute pour les remettre à la police, pensait le
Chark, qui juge de semblables procédés bien dignes du pays qui
employait les services des Rahîm Khân et des Dârâb Mîrzâ. Peu de
temps après, le 18 septembre, le Chark était suspendu par arrêté
ministériel. Des attaques contre le prince Farmânfarmâ et contre
Kavâm os-Saltanè, certaines appréciations sur des faits qui s'étaient
passés en province, ont amené cette mesure de rigueur. Questions
économiques (1). Les questions économiques tiennent une grande
place dans les derniers numéros du Habl oul-Matîn et du Medjlis, qui
réclament, pour la Perse, une législation commerciale, dont le besoin
se fait impérieusement sentir, attendu que rien, jusqu'ici, ne peut
être considéré comme en tenant lieu, et des hommes capables,
connaissant à fond « la science du commerce » (2). Les hommes
font défaut. Il ne manque pas de Persans, dans leur pays et à
l'étranger, rompus à la pratique des affaires ; mais rien n'a été fait,
jusqu'ici, pour former les théoriciens nécessaires. D'un côté comme
de l'autre, le Gouvernement actuel a dû accepter la succession des
siècles passés. Le commerce se fait sans loi, sans méthode, sans
sécurité non plus ; les chefs de satrapies, mouloûk at-tawâïf, existent
toujours : de Bender-Bouchir à Chiraz, ils sont une dizaine qui
rançonnent les caravanes, rendant les transactions toujours difficiles
et dangereuses. Pour en venir là, il faudra que la Perse s'adresse à
l'étranger; mais elle ne lui demandera que le nécessaire, et ne le
fera qu'en vue de (1) Rappelons qu'il s'agit ici de la presse persane.
(2) Une loi sur les effets de commerce a été promulguée, et le
Medjlis en a publié le texte. Désormais la circulation de ce genre de
valeurs s'effectuera, en Perse, à peu près de la même manière qu'en
Europe. Le délai accordé pour protester les effets est fixé à
quarante-huit heures, et la formalité devra s'accomplir devant une
Chambre de commerce. En cas de difficultés, les Tribunaux de
commerce apprécieront.