Dependency Parsing
Abstract

Acknowledgements 13

Abbreviations 15

1 Introduction 16
1.1 Scope of the thesis 16

2 Dependency grammar 23
2.1 Overview 23
2.2.1 Definitions 24
2.6 Summary 58

3 Dependency parsers 60
3.1 Dependency in computational linguistics 61
3.1.1 Machine translation systems 61
3.2.2 Expressions 71
3.3 Summary 75

4 The RAND parsers 76
4.1 Overview 76
4.4 Summary 88

5 Hellwig's PLAIN system 90
5.1 Overview 90

6.4 Summary 132

7.5 Summary 149

10 Covington's parser 217
10.1 Overview 217
10.2 Early dependency grammarians 217
10.5 Summary 228

11.3.5 The sequential parser 243
11.4 Summary 251

13 Conclusion 292
List of Figures

2.2 tree diagram (D-marker) for Smart people dislike stupid robots 33
2.7 First phrase structure analysis of They are racing horses 39
2.8 Second phrase structure analysis of They are racing horses 39
2.9 Dependency structure for They are racing horses. The sentence root is racing. 40
7.2 dependency analysis of the sentence Whom did you say it was given to? 139
7.6 a dependency link network for the sentence You can remove the
9.8 the use of visitor links to bind an extracted element to the main verb 189
9.9 the use of the visitor link to relate the extracted element to the sentence 191
9.11 semantic structure is very similar to syntactic structure in WG 192
12.1 PSG and DG analyses of the sentence Tall people sleep in long beds 258

List of Tables

2.4 Subtrees and complete subtrees in the DG analysis of the sentence They are racing horses shown in Figure 2.9. Only complete subtrees have labels.
10.1 main features of Covington's first two dependency parsers 229
Acknowledgements

This thesis may bear one name on its title page but it represents an investment of time and effort, of wise advice and honest criticism, of practical support and unfailing love on the part of many people. I am grateful to them all.

First mention must go to Dick Hudson, who has been so much more than just a thesis supervisor. Over the years he has selflessly given me his time,
and to his family, Gay, Lucy and Alice, who have never failed to respond well during my time in their midst. Special thanks are due to Mark Huckvale, Monika Pounder, and a number of members of the Word Grammar seminar, including Billy Clark, John Fletcher, and And Rosta. I have also benefited enormously from the support and encouragement I have received as a member of the Social and Computer Sciences Research Group at the University of Surrey. I am grateful to all members of the group, and especially to Nigel Gilbert for enabling me to fit thesis-writing into a hectic research schedule, and to Scott McGlashan for his expert assistance with the LaTeX typesetting package. I have gained much from discussions with other people at the University
Peckham for his persistent belief in the value of NLP research and for his practical support, and to Nick Youd, Simon Thornton, Trevor Thomas and

A significant portion of this thesis is devoted to dissecting other people's dependency parsers. I would not have been able to do so without the help of
me. Many of them have read drafts of parts of the thesis, and their comments have been invaluable. They include Doug Arnold, Paulo Baggia, Michael Covington, Peter Hellwig, Gerhard Niedermair, Claudio Rullent, Klaus Schubert,

I have lost track of the number of friends and relations who have helped
making me laugh. The generous gift of Ian and Mair Bunting, who provided the perfect retreat in which to work without fear of interruption, has hastened
family have provided the sort of long-distance support which always feels close at hand.

Most of all, I want to thank Sarah for putting up with my nocturnal writing habits, for believing that I really would finish this thing, and for being my friend.
Abbreviations
Chapter 1

Introduction

1.1 Scope of the thesis
There are, in contemporary linguistic theory, two different views of grammatical
entities. The second view denies grammatical relations basic status, instead seeing them as being derived from more fundamental structures, such as constituent structures. This latter view has predominated throughout most of
the overwhelming majority of proposals which posit a distinct syntactic layer assume that
This asymmetry cannot legitimately be attributed to any established results
grammatical dependency is almost as old as the study of grammar, it has, for
over thirty years ago (see Gaifman 1965), a few years after the first formalization of the class of PSGs (Chomsky 1956). By the time the formal definition of
PSG had been in the public domain for a decade, with large international
strong) with context-free PSG (CFPSG),^ there was little incentive to abandon the now familiar and well-understood formalism in favour of the unfamiliar
in the state it was in around the mid-1960s, with only a handful of groups around the world making any (modest) advances since then (hardly any of
though still modest by PSG standards — number of theoretical linguists con
ture. Unfortunately, almost all linguistic theories based on DG have departed to some extent from the terra firma of formal definition.^ Since the choice
to some lengths to argue the case for DG rather than PSG (for example,
erally not found: proponents of theories based on PSG do not typically support the choice of PSG with arguments for the superiority of PSG over DG (but
see the debate in Hudson 1980a; Dahl 1980; Hudson 1980b; Hietaranta 1981; and Hudson 1981b for some responses to arguments against PSG).
This is to present the issues as being neatly polarized. In fact, most lin
and constituency in a single structure, albeit one which owes more to the
stituent. However, there are complications here since a number of syntactic theories have been charged with uncritically adopting unformalized versions of X-bar theory (Pullum 1985; Kornai and Pullum 1990) — the very charge laid at
less argued against. In the small number of cases in which it achieves passing mention, the same reasons for not using DG are employed: first, the only
no incentive to work with the less familiar system; second, almost nothing else is known formally about DG so until such time as additional solid results
that framework.
Let us consider these points in turn. First, then, the equivalence of DG and CFPSG. In their monograph Linguistics and Information Science Sparck
are trivial from almost every point of view (see Gaifman 1965).
which has hardly ever been raised in the literature. Hays' claim that "a phrase-structure parser can be converted into a dependency parser with only a minor alteration" (Hays 1966b: 79) is presented without argument or illustration so its status is, at best, uncertain. A seminal text in computer science bears the
if the net effects of the program are to remain constant. "The development of
ture" (Goldschlager and Lister 1982: 65). Thus it cannot be taken for granted a priori that familiar phrase structure parsing algorithms will map effortlessly
Winograd writes:
Once again, this claim is presented without further argument or evidence.
dependency parsing systems in existence is severely limited in comparison with the number of phrase structure parsing systems. It is also the case that those
circulated privately. Some accounts have been terse to the point of leaving most
in dependency parsing compare with those which are widely used and well-understood in phrase structure parsing. This study focuses on two hypotheses:

Hypothesis 1
It is possible to construct a fully functional dependency parser

guarantees some measure of structural correspondence at each point in the DG and PSG parse trees (see Chapter 2 below). However, it is not the strongest

Hypothesis 2
It is possible to construct a fully functional dependency parser using

DG rules encode information, as compared with the way in which PSG rules
ceeded beyond the limits of what has been defined in a mathematically rigorous
analysis on the parsing of the context-free backbone of these theories (i.e. that which can be mapped onto a Gaifman grammar). I shall not be concerned in this thesis to make any qualitative judgements between DG and PSG qua descriptive devices.
1.2 Chapter outline

through to the present day are charted in the latter part of the chapter.
the oldest parser is presented first and the most recent parser is presented last. Needless to say, the development phases of some parsers overlapped
dependency parsers. Chapter 12 sets out some elements of a first taxonomy of parsing algorithms.
Chapter 2

Dependency grammar

2.1 Overview
with a notion like grammatical dependency is that it can come to mean all
theoretical linguistics.
defined with full mathematical rigour. Accordingly, these systems are taken as
mars are defined, together with a decision procedure for determining whether
PSG, the equivalence relation employed is described and scrutinized.

In practice, very few — if any — linguists have used Gaifman's system in the description of natural language without making use of various augmentations
Section 2.3. Those which must necessarily be examined in the course of the
chapters. Section 2.4 charts the origins and development of DG in linguistic theory.
DG are identified, namely Case Grammar, Categorial Grammar, and Head-Driven Phrase Structure Grammar. Although a full description of these frameworks is not appropriate here, their basic concepts are introduced and some
2.2 Gaifman grammars

2.2.1 Definitions
Definition

A dependency grammar Δ is a 5-tuple

Δ = (T, C, A, R, G)

where

2. C is a finite set of category symbols. For the purposes of exposition,
Every word belongs to at least one category and every category must
than one category.
which may derive directly from it with their relative positions. For each
the sequence. A rule of the form X(*) allows X to occur without any dependents.

Example

Δ₁ = ( {people, robots, dislike, smart, stupid}, {N, V, A}, {(people, N), (robots, N), (dislike, V), (smart, A), (stupid, A)}, {N(*), N(A,*), V(N,*,N), A(*)}, {V} ).

Convention

thus: *(X).

Following this convention, G of Δ₁ may be represented as *(V).
Convention

words t such that (t, X) is in A.

Convention

defined in Δ.

*(V)
N(*)
N(A,*)
V(N,*,N)
A(*)
N: {people, robots}
V: {dislike}
A: {smart, stupid}
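To make the formalism concrete, Δ₁ can be written down directly as data. The following sketch (in Python) is my own illustration and no part of the original formalism; the names are arbitrary, and '*' marks the position of the governing word in each rule, as above.

    # A minimal encoding of the example grammar (Delta-1).
    # The five components mirror the 5-tuple (T, C, A, R, G).

    T = {"people", "robots", "dislike", "smart", "stupid"}   # words
    C = {"N", "V", "A"}                                      # categories

    # A: the assignment of words to categories
    A = {
        "people": {"N"}, "robots": {"N"}, "dislike": {"V"},
        "smart": {"A"}, "stupid": {"A"},
    }

    # R: for each category, its permitted dependent patterns;
    # ("A", "*") encodes the rule N(A, *).
    R = {
        "N": [("*",), ("A", "*")],    # N(*) and N(A, *)
        "V": [("N", "*", "N")],       # V(N, *, N)
        "A": [("*",)],                # A(*)
    }

    G = {"V"}                         # categories which may govern a sentence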
said to be of category X.

Definition

are true:
tween P and Q.

For every d we define another relation d*, where P d* Q iff there is a sequence P = P_0, P_1, ..., P_n = Q such that P_i d P_{i+1} for 0 ≤ i ≤ n−1.

(c) If P d* Q and R is between P and Q in the sequence (i.e. either S(P) < S(R) < S(Q) or S(P) > S(R) > S(Q)), then R d* Q.

w_1, ..., w_n, and the order in which these words occur in the sentence is
of R.

4. The occurrence which governs the sentence (i.e. which depends on no other) is of a category in G.
Definition

Definition

dependency tree exists for a sequence of words which is not a sentence. A language
tion, and only for these, there are corresponding dependency trees.
grammar of type Δ:

1. one and only one occurrence is independent (i.e. does not depend on any other);
To aid discussion, I shall adopt the following terminology. All occurrences of
word in a sequence (i.e. the word which depends on no other) shall be called

Example

Given these definitions, the sentences in (1) belong to the language defined
tion, sequences which are not well-formed in respect of a particular grammar

Example (2a) is ill-formed because dislike is a V, and Vs require two dependents, one preceding and one following. In this case, no following dependent is present. Example (2b) is ill-formed because all of the words are not connected
category N for dislike). None of the words in (2c) is missing a dependent. However,
V, whereas here they both occur before it. Example (2e) is ill-formed because
Grammars.

follows:

[a, b] shall occupy all the positions from P_a to P_b, where 1 ≤ a ≤ b ≤ Max.

3. For each word w_i in the string retrieve all the classes X_i1 to X_in assigned

4. For each word class X at cell [j, j] in the table (1 ≤ j ≤ Max) determine whether a rule of the form X(*) exists in Δ. If so, insert X(*) in the table at cell

6. Consider each sequence of V adjacent cells in the table. For each sequence which consists of exactly one word class symbol X and V−1 trees, arranged in the order

Y_1, ..., X, ..., Y_{V−1}

follows:
Hays presents his algorithm informally, so it has been necessary to reconstruct
Hays also outlines a generative procedure for enumerating all the strings
hays_generator.pl in Appendix A.3.
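Hays' table-based procedure cannot be reconstructed in full from the surviving description, but the class of grammars it recognizes is clear. The following sketch is therefore my own naive recognizer for Gaifman grammars — reusing the encoding of Δ₁ given earlier — and not a reconstruction of Hays' algorithm; it relies on the fact that every complete subtree occupies a contiguous span of the sentence.

    from functools import lru_cache

    def recognize(words, A, R, G):
        """True iff the word sequence is a sentence of the grammar."""
        n = len(words)

        @lru_cache(maxsize=None)
        def subtree(i, j, cat):
            # Can words[i:j] form a complete subtree of category cat?
            for h in range(i, j):                        # candidate head position
                if cat not in A[words[h]]:
                    continue
                for pattern in R.get(cat, []):
                    star = pattern.index("*")
                    if covers(i, h, pattern[:star]) and \
                       covers(h + 1, j, pattern[star + 1:]):
                        return True
            return False

        @lru_cache(maxsize=None)
        def covers(i, j, cats):
            # Do complete subtrees of the listed categories tile [i, j) in order?
            if not cats:
                return i == j
            return any(subtree(i, k, cats[0]) and covers(k, j, cats[1:])
                       for k in range(i + 1, j + 1))

        return any(subtree(0, n, root) for root in G)

    recognize(tuple("smart people dislike stupid robots".split()), A, R, G)
    # -> True; "smart people dislike" gives False, as in example (2a).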
2.2.3 Representing dependency structures

There are at least three conventions for presenting dependency structures di
for example). Dependencies between word occurrences are signalled by links between nodes. By convention, heads are located nearer the top of the diagram
represented are long and involve a lot of alternation between left-pointing and right-pointing dependencies.
a lower node to a higher node then the symbol corresponding to the lower node
Figure 2.2: tree diagram (D-marker) for Smart people dislike stupid robots
means of directed arcs. I shall adopt the convention of directing arcs from
Figure 2.3 is equivalent to Figures 2.1 and 2.2 in the information it expresses.

Some authors (such as Matthews 1981) draw arc diagrams with the arcs below the symbols in the sentence rather than above them as shown here.
tion appear below the sentence symbols, whilst the rest appear above them
head robots by dislike, which depends on neither stupid nor robots; neither
which connects a word with its node is called its projection. Note that in Figure 2.2, links and projections do not intersect. Such tree diagrams and their
detected. Diagrams like Figure 2.4, and the corresponding syntactic structures

Figure 2.5: arc diagram for *Smart people stupid dislike robots

adjacent.
Figure 2.6: Dependency structure of Old sailors tell tall tales

lexical category) there is a strongly equivalent DG. His proof is too lengthy
Pickering and Barry (1991) have recently called a 'dependency constituent'.)
in Figure 2.6 includes the subtrees shown in Table 2.1. Of these, only those
subtree and every complete subtree is coextensive with a constituent. Two structural entities are coextensive if they refer to exactly the same elements in a string.
Subtrees (Table 2.1):

Old
Old sailors
Old sailors tell
Old sailors tell tall tales
sailors
sailors tell
tell
tell tall tales
tell tales
tall
tall tales
tales

Complete subtrees:

Old
Old sailors
Old sailors tell tall tales
tall
tall tales
LABEL    SUBTREE
Old      Old
sailors  Old sailors
tell     Old sailors tell tall tales
tall     tall
tales    tall tales

which depends on no other word in the same subtree. Labels for the complete subtrees of the dependency tree shown in Figure 2.6 are given in Table 2.3.

Let each phrasal constituent in a PSG also have a label, where the label
given as 'NP', etc).^
corresponding complete subtree. In phrase structure theory, a string is said to
correspond relationally and (ii) every complete subtree has a label which is
minal alphabet, and (ii) for every string over that alphabet, every structure
other.
phrase structure interpretations of which are shown in Figures 2.7 and 2.8.

^All subtree and phrase labels must be unique within each sentence. If necessary this can be effected by providing labels with unique integer subscripts.
Figure 2.7: First phrase structure analysis of They are racing horses

Figure 2.8: Second phrase structure analysis of They are racing horses
Figure 2.9: Dependency structure for They are racing horses. The sentence root is racing.
LABEL    SUBTREE
they     they
are      are
         they racing
         they are racing
racing   they are racing horses
         are racing
         are racing horses
         racing
         racing horses
horses   horses

Table 2.4: Subtrees and complete subtrees in the DG analysis of the sentence They are racing horses shown in Figure 2.9. Only complete subtrees have labels.
Now consider the dependency structure in Figure 2.9. This includes the
ure 2.9 and every complete subtree in Figure 2.9 is coextensive with a constituent.
every complete subtree has a label which is substantively equivalent to the
LABEL  CONSTITUENT
NP     they
S      they are racing horses
AuxP   are
VP     are racing
VP     are racing horses
NP     horses

Table 2.5: Constituents in the phrase structure analysis of the sentence They are racing horses shown in Figure 2.7

However, only Figures 2.7 and 2.9 share substantively equivalent labellings, so only the first phrase structure analysis corresponds to the dependency analysis.
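The labelling test just applied by hand is mechanical, and a short sketch makes the procedure explicit. The code below is my own illustration, assuming a head-index representation of Figure 2.9 (each word records the position of its head, with None for the root):

    def complete_subtrees(words, head):
        # The complete subtree of word i contains i and all of its direct
        # and indirect dependents; its label is word i itself.
        members = {i: {i} for i in range(len(words))}
        for i in range(len(words)):
            j = head[i]
            while j is not None:        # i belongs to every ancestor's subtree
                members[j].add(i)
                j = head[j]
        return {words[i]: " ".join(words[k] for k in sorted(members[i]))
                for i in range(len(words))}

    # Figure 2.9: 'racing' is the root; they, are and horses depend on it.
    print(complete_subtrees(["they", "are", "racing", "horses"], [2, 2, None, 2]))
    # {'they': 'they', 'are': 'are',
    #  'racing': 'they are racing horses', 'horses': 'horses'}

Checking strong equivalence then amounts to comparing this dictionary with a constituent labelling such as that of Table 2.5.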
2.3 Beyond Gaifman grammars

In presenting his work on PSGs, Chomsky frequently and explicitly represented them as a formalization of the structuralist Immediate Constituent model (e.g. Chomsky 1962). This claim has recently been contested by Manaster-Ramer and Kac (1990), thus highlighting some of the difficulties inherent in trying to

The issues are somewhat clearer in the case of DG, since Gaifman, as
translation program. Hays, on the other hand, represents Gaifman's work as
and dependency theory in his 1964 Language paper, his summary of what is
sis).
I have been unable to find any discussions anywhere in the literature which
theories of language have made use of Gaifman's formalism. This contrasts sharply with the uptake of Chomsky's PSG formalism, and particularly CFPSG. The only DGs which incorporate a more or less intact version of Gaifman grammar are those which use it as the base component in a transformational
being allowed to manipulate features in arbitrary ways (e.g. Starosta 1988;
rule for intransitive verbs which enforces subject-verb agreement (adapted from
Its single dependent must be a preceding nominative case noun, also of person 'X' and number 'Y'. 'X' and 'Y' are variables over feature values.

Gaifman's definition of DG. So long as the feature structures are simply arrangements of symbols drawn from a finite set, the generative power remains
what happens when a PSG is augmented by the addition of feature structures. Gazdar has summarized this as follows:
feature systems and (ii) rules are permitted to manipulate the fea

Unfortunately, all of the DGs which introduce feature structures also introduce other extensions, whose effects on the generative capacity of the grammars are
complete subtrees (e.g. prepositional structures in English) have two roots, or rather, a single root which is the union of the features of two of the words in
heads (Hudson 1990: 117), while Pericliev and Ilarionov (1986), Sgall et al.
A thesis of this kind cannot proceed without giving some attention to
grammars, for example those of the Greek scholars of the Alexandrian School, and especially Dionysius Thrax (c. 100 B.C.) whose work drew heavily on the Stoic tradition of linguistic studies. Thrax's Téchnē grammatikē was the in
lus (second century A.D.) whose work "foreshadowed the distinction of subject and object and of later concepts such as government...and dependency" (Robins 1979: 37). The work of Thrax and Apollonius was further developed by a number of Latin grammarians, most notably Priscian (c. 450 A.D.). An
grammarians, most notably Panini (some time between 600 and 300 B.C.). In Panini's grammar "the verb, inflected for person, number, and tense, was
the verb, and of these the most important were the nouns in their different
those of the Basra and Baghdad schools. In Arabic grammar, a governor was said to govern (ʿamila, lit. 'do, operate') a governed (maʿmūl).

Many of the details of modern DG are made explicit for the first time in the writings of Ibn Al-Sarraj (died 928 A.D.). For example, a word may not depend
This finds support in the writings of Jurjani (died 1078), who insists that:
many dependents, although dependents could have only one head. Dependency

^All quotations use Owens' translation and reference Owens (1988) rather than the original sources.
was unidirectional and there was no interdependence. The mediaeval Arabic
the unmarked word order. A detailed guide to mediaeval Arabic grammar can
as the modistic and speculative grammarians, and especially, in the work of Martin of Dacia and Thomas of Erfurt (more details of their work can be found on page 217ff below). According to Herbst et al. (1980: 33) who quote Engelen (1975: 37), some of the central ideas of DG were used in Germany
development of DG was made in the 1950s by the Frenchman Lucien Tesnière. Tesnière was the first person to develop a semi-formal apparatus for describing
rather programmatic volume (Tesnière 1953) which was not very well received by reviewers (for example, see Garey 1954). Tesnière died in 1954 but Jean Fourquet edited his unpublished works into a single volume — Éléments de syntaxe structurale.

Tesnière's posthumous volume consists of three parts labelled 'la connexion' (dependency), 'la jonction' (coordination) and 'la translation' (word class
not. This is now the standard view amongst dependency grammarians.^ (A
'Connexion' section of the book presents in axiomatic fashion many of the

^Brief descriptions of some approaches to coordination in DG can be found on pages 139, 168, and 188. For a useful overview see Hudson (1988b).

principles which have come to define and to distinguish DG. For example (my
1929. Barkhudarov and Princip had done likewise in 1930, and Kruckov and Svetlaev had used stemmas in a book published in early 1936. In spite of this
DG as an explicit system for linguistic description. Certainly it was Tesnière's work which did more than anyone else's to publicize DG. Had his volume been published any time other than in the immediate aftermath of the publication of Chomsky's Syntactic Structures, DG might have been taken seriously by a
valency. The valency of a verb is its potential for having other words depend on it. Thus, an intransitive verb takes one dependent, a transitive verb takes two, a ditransitive three, etc. In addition to these complements which must
adjuncts. Complements subcategorize the verb, whereas adjuncts modify it.
(1987: 61), it can be found in the earlier writings of Kacnel'son (1948: 132)
Baum (1976: 32) claims that Hockett (1958: 249) uses the term 'valence' independently of Tesnière.
hand, concentrate on the pivotal role played by the main verb in natural lan
the general sense of Fillmore (1968)—of the verb. Two largely disjoint re
valency grammar [Valenzbibliographie] which includes 2377 entries, only 294
difficult to see why these separate communities still exist since a number of

The influence of Tesnière's work has reached into almost every part of the world where language is studied, but the effects have not always been the same.
words:
has not been active in the DG field since the early 1960s when he briefly examined dependency grammars from a computational point of view (Gross 1964).

Tesnière's work had more influence in Germany (East and West),
ble PSGs available in the 1960s and 70s. One of the first large-scale
(Helbig and Schenkel 1969), adjectives (Sommerfeldt and Schreiber 1974), and nouns (Sommerfeldt and Schreiber 1977). The Mannheim school under
Ulrich Engel and Helmut Schumacher began by producing an alternative valency dictionary of German verbs (Engel and Schumacher 1976) but they pro
Engel's grammar of German (Engel 1977) was possibly the first attempt to describe all of the major phenomena of a single language within a dependency framework. Other German dependency theorists include Jürgen Kunze and his colleagues in East Berlin (e.g. Kunze 1975) and Heinz Vater who developed a

In Great Britain, John Anderson (also a Germanist) developed 'Case Grammar', a combination of DG and localist case (Anderson 1971; Anderson 1977).
grew out of his earlier research in Systemic Grammar (Hudson 1971) and was
DDG in favour of a new theory, 'Word Grammar' (Hudson 1984), which is
tion to these dependency theories, at least two British scholars have used DG in syntax textbooks (Matthews 1981 and Miller 1985) and Rodney Huddleston has published two grammars of English which incorporate insights from DG
tialov and Igor Mel'cuk — used dependency as the basis of machine trans
has been developing his dependency-based 'Meaning-Text Model' of language (Mel'cuk and Zolkovkij 1970; Mel'cuk 1979; Mel'cuk 1988). Petr Sgall's group
basic and constituent structure plays no part (Sgall et al. 1986). I am aware of some ongoing dependency research in Bulgaria but I have not seen any English papers other than those by Pericliev and Ilarionov (1986) and Avgustinova and Oliva (1990). A number of slavists working in the West have also used some version of DG in their work, e.g. David Kilby (Atkinson et al. 1982)

Jane Robinson, and Stanley Starosta. Hays worked for the RAND Corporation
explored the uses of DG for machine translation and also investigated the

In the late 1950s, Haim Gaifman collaborated with Bar-Hillel and Shamir in a study of Categorial Grammars and PSGs which proved for the first time
the early 1960s, while he was based in the Mathematics Department of the
Gaifman carried out the work described at the beginning of this chapter. His seminal paper Dependency systems and phrase structure systems appeared as a RAND internal report in May 1961, although it was not published in a major journal until 1965, a year after Hays' article making Gaifman's work accessible
was carried out at the end of the 1960s while she was employed at IBM's Watson Research Center. The main objective of her work was to explore the ways in which Fillmorean case grammar could fit into a transformational framework. Her conclusion was that a transformational grammar should have a DG rather than a PSG base component (Robinson 1969; Robinson 1970). The work of Vater mentioned above (Vater 1975) was a development of Robinson's
years has been made by Starosta at the University of Hawaii. Since the early
in Chapter 8.
(Kodama 1982). However, very little theoretical DG work has so far been done in Japan.
last few decades. Three of these merit special attention here, namely Case
2.5.1 Case grammar

Although these sentences vary considerably in their surface forms, the semantic relationships they express remain constant. Punch is the agent of the hitting action; Judy is on the receiving end of the hitting action; the club is the
(Fillmore 1968), formalizes these relationships. The semantic deep structure of
The modality component carries features of tense, mood, aspect, and negation
deep case relations in the sentence. Typically these cases are associated with the main verb. In Fillmore's original version of case grammar there were six
OBJECTIVE. (This number has varied widely between different instantiations of case grammar). The case frame for the verb hit would include agentive, objective, and instrumental case slots, where each slot can be filled by phrases
format, or giving a graphic representation of case structures using arc diagrams. Fillmore himself acknowledges his debt to Tesnière and other dependency grammarians (Fillmore 1977: 60). However, I believe there are good
pendency structure has been found, one option is to use it as a guide to assign a case structure. However, grammatical relations and case relations are not
necessarily coextensive. In (4a), the OBJECTIVE case is realized by the object
by the fact that while some dependency grammarians make extensive use of case in their theories (e.g. Anderson 1971, 1977; Starosta 1988), others make use of alternative semantic frameworks (e.g. Covington 1990b; Hudson 1990). Our concern here is with the construction of syntactic structures and not semantic structures. The question of which semantic framework is most appro
the question we are tackling here. Dependency and case — though superficially
relations holding between events and participants, is also outwith the scope of this thesis. The presence of the word 'dependency' in its title should not
2.5.2 Categorial grammar

Categorial grammars (CGs) trace their origins from a number of devices developed in the field of logical semantics, specifically Lesniewski's theory of semantic categories (Lesniewski 1929) which brought together insights from Husserl's
logical types (Whitehead and Russell 1925). Lesniewski's theory was refined
in which operators/functors are written immediately to the left of their argu
ments). In a grammar of the sort envisaged by Ajdukiewicz, there are two

α/β

where α and β can be either variety of category, primitive or derived. When
states that, given any string of two category symbols α/β and β, replace the string with α.

analysed as follows:

(5)
x     y     z
A/B   B/C   C
      B
A
to form a composite category and the resulting category label is written below that line. The similarities to phrase structure should be obvious; in this case we can generate the same string and the same constituent structure (7) with

(6)
A → A/B B
B → B/C C
A/B → x
B/C → y
C → z

(7)
[A [A/B x] [B [B/C y] [C z]]]
(Bar-Hillel 1953): (i) assignment of words to more than one category was allowed, (ii) a new kind of complex category — α\β — was introduced, and (iii) a new composition rule was introduced to deal with the new kind of category: given a string of any two symbols α and α\β, replace the string with β.

A CG is unidirectional if its complex categories are all either of the form
In his seminal paper The m athem atics o f sentence structure, Joachim Lam-
and raising (Lam bek 1958). These rules — or m inor variants of th em — have
A pplication
X/YY X
YY\X X
These are the rules of com bination we have already encountered. If a noun were
55
C om m u ta tiv ity
(X \Y )/Z X\(Y/Z)
C om position
X /Y Y/Z -4 X/Z
X \Y Y\Z X\Z
Raising
X - , Y /(X \Y )
X - , Y \(Y /X )
th a t it can only occur in object position. The raising rule allows an unm arked
object position.
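To see how the application rules drive a derivation such as (5), consider the following sketch. It is my own illustration, using a deliberately naive string encoding of categories ("A/B", "B\A"); composition and raising could be added in the same style.

    def reduce_once(cats):
        # Apply forward (X/Y Y -> X) or backward (Y Y\X -> X) application
        # to the first adjacent pair that permits it.
        for i in range(len(cats) - 1):
            left, right = cats[i], cats[i + 1]
            if left.endswith("/" + right):                  # X/Y Y -> X
                return cats[:i] + [left[:-(len(right) + 1)]] + cats[i + 2:]
            if right.startswith(left + "\\"):               # Y Y\X -> X
                return cats[:i] + [right[len(left) + 1:]] + cats[i + 2:]
        return None

    def derive(cats):
        while len(cats) > 1:
            cats = reduce_once(cats)
            if cats is None:
                return None                                 # no rule applies
        return cats[0]

    print(derive(["A/B", "B/C", "C"]))                      # -> 'A', as in (5)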
Interest in CG greatly increased during the 1970s due to the influ
mantics and, in particular, his PTQ grammar (Thomason 1974; see also
due to the influence of David Dowty (Dowty 1982, 1988), Mark Steedman (Ades and Steedman 1982; Steedman 1985, 1986), and others. Many different variants of Lambek's rules are currently in circulation. Steedman's Combinatory Categorial Grammar (CCG) offers one of the most interesting examples. CCG analyses have been offered for particularly difficult non-context-free con
other claim for CCG is that it allows incremental (i.e. strict left-to-right) struc

This is understandable, since there is an obvious similarity between a DG rule such as the one shown in (8a) and a CG category such as the one shown in (8b).
(8)
a. S(N,*,N)
b. N\S/N

However, behind these surface similarities lie the rules of CG. The basic
raising.
parsers lies not in their parsing strategy so much as in the particular form and effects of the combination rules they employ. Later I shall note in passing
2.5.3 Head-driven phrase structure grammar

semantics developed by Carl Pollard and Ivan Sag (Pollard and Sag 1988). It differs from standard PSG in the extent to which information is stored in the

(9)

This states that love is a verb which, in its base (infinitival) form subcategorizes
obliqueness, with more oblique arguments appearing to the left of less oblique arguments. This is further illustrated by the following example, which shows

(10)
Figure 2.10: (a) the DG structure; (b) the corresponding HPSG structure with a phrasal node XP above the head X

From the point of view of this survey, the identification of arguments by posi
of information stored at the right (i.e. lexical) level to make it into a DG. It makes use of a small number of phrasal rules. These effectively add an extra (phrasal) layer of structure above each head word. For every phrase of type X
of X are copied to XP. Thus, where DG would build the structure shown in Figure 2.10 (a), HPSG would build the structure shown in Figure 2.10 (b).

close is a question which I shall not address further here. For information on

term 'dependency grammar' as used in this thesis. It has done so first by
rest on a foundation which is expressible in terms of a Gaifman grammar.
which is not, three examples of grammatical theories which lie just outside DG
Chapter 3

Dependency parsers
In the latter part of the last chapter I traced the origins and development of DG in theoretical linguistics. In this chapter I chart the origins and development

The designer of a PSG parser has at his or her disposal the whole computational linguistics literature which describes a host of tried and tested techniques
sibilities for PSG parsing. For example, the designer of a parser must decide whether to build syntactic structure top-down, bottom-up or some combination of the two. Not only is there a serious lack of published descriptions of
parsing and PSG parsing are slight variations on a single theme. This may
parsing when there are no non-terminal nodes in a tree.^ One of the main ob
problem space.

^The extensive bibliography of Natural Language Processing in the 1980s compiled by Gazdar et al. 1987 includes only 9 entries indexed under 'dependency' (excluding non-DG senses of the word).
^As we shall see below, a top-down/bottom-up distinction can be made in connection with dependency parsers but it is not exactly the same as the more familiar PSG distinction (cf. Chapter 12).
This chapter begins with an introduction to dependency in computational
which I shall use for the sake of clarity in the survey of existing dependency
ber of systems have been developed from the late 1950s onwards. These can
tems. The next eight chapters present some of these systems in more detail.

3.1.1 Machine translation systems

The early 1960s

In the early 1960s, there were two major dependency-based MT projects. The first of these was based at the Moscow Academy of Sciences. Amongst the scholars associated with the project were Sergei Fitialov, O.S. Kulagina and Igor Mel'cuk. Very few — if any — documents describing the project in detail
ber of the project papers (Hays 1965). Since these papers are not discussed
circulation, those most immediately relevant to our present concerns are reproduced below.
The coding of words for an algorithm for syntactic analysis (Martem'yanov 1961)

Word classes ADr, ADl, AG, PGr, and PGl are defined, where A = active, P = passive, D = depend, G = govern, r = right, l = left. An active governor sweeps up passive dependents. A parsing routine is discussed in part, including the effect of English inflections on word class.

based NLP systems. (The abstracts tell us, for example, that at least three scholars were involved in the development of at least three parsing algorithms).
The second m ajor dependency-based M T project was sponsored by th e
the work of Tesniere. Instead he learned about DG from the Soviet scholars.
It seems th a t there was a surprising am ount of com m unication betw een the
group in P rague in the early 1960s (Sgall 1963). No inform ation on this work
T he m id 1980s
(Johnson et al. 1985; Arnold 1986; Arnold and des Tombe 1987) which has
a t the Universities of Essex and U trecht (Arnold and Sadler forthcom ing). The
cussed in C hapter 7 of this thesis. Sgall’s group in P rague has recently been de
tion (K irschner 1984; H ajic 1987; Sgall and Panevova 1987; H ajicova 1988).
3.1.2 Speech understanding systems

In at least two projects, dependency parsers have been used to process lattices
that its rules and structures are word-based and can readily be associated with the basic units of recognition in lattices, namely word hypotheses; and second,
that DG is well-suited to combining top-down and bottom-up constraints in a
Niedermair and his colleagues at Siemens in Munich (Niedermair 1986) is
brid of PSG and basic valency theory. A first-phase augmented context-free phrase structure parser identifies and builds the major phrases in a sentence.

3.1.3 Other applications

language interface which — in theory at least — can sit on top of any database with minimal customization. The project has been running since 1982. The
3.1.4 Implementations of theories

So far we have considered only DG-based NLP systems which were designed
or database access. However, a number of systems have been built in order
theories which have most obviously spawned this kind of activity are Lexicase (Starosta 1988) and Word Grammar (Hudson 1984, 1990a). Their implementation has taken place only in recent years. The fact that more dependency-based theories have not been implemented reflects the fact that there has
based on Lexicase have been produced so far (Starosta and Nomura 1986;
Word Grammar have been implemented (e.g. Fraser 1989a; Hudson 1989c). Some of the work done by Sgall's group in Prague is directed towards imple

3.1.5 Exploratory systems

in mind. Some of the most interesting and useful results have emanated from
CETIS Research Centre in Ispra, Italy.^ Other research was carried out by
at the University of Brussels. Although the work these groups carried out is
bibliography.^
Automatic analysis (Lecerf 1960)

The 'conflict' program tests each item against the adjoining, already constructed phrase and either subsumes it as an additional dependent or makes it the governor of a new, extended phrase. The result is a chameleon, looking like both a phrase structure diagram and a dependency diagram.
For at least a decade and a half, Jürgen Kunze's group in East Berlin has been developing a version of DG for use in computer applications. This work
Unfortunately, very little of Kunze's material has been available for inspection at the time of writing.

Since the early 1970s Peter Hellwig has been developing his PLAIN system,

During the last few years, Michael Covington at the University of Georgia
the parsing of free word order languages. Covington's most recent parser is

A simple dependency parser has been designed and implemented by Bengt Sigurd at the University of Lund. This work was inspired by Sigurd's reading of Schubert (1987).

Nagao 1990).
based NLP. This is sum m arised in Figure 3.1. Projects identified by heavy lines
are discussed in detail in C hapters 4-11. Notice how th e early enthusiasm and
alm ost nothing in the late 1960s and throughout th e 1970s. It is interesting to
see how interest has picked up throughout the 1980s and, at the s ta rt of the
search order (left to right, right to left, etc), num ber of passes (single pass,
m ultiple passes, etc), search focus (w hat is being searched for?), and am bi
guity m anagem ent (how are choice points and m ultiple analyses handled?).
Verbal descriptions of the algorithm s are presented b u t these can not always
67
Figure 3.1: timeline of dependency-based NLP projects, 1960-1990: Moscow (MT), RAND (MT), EURATOM, Sgall, Hellwig, Kunze, Kielikone, EUROTRA, DLT, Lexicase, Word Grammar, CSELT, NTT, Covington, MiMo, Sigurd, IBM Tokyo.
task to do this thoroughly. First of all, it could involve the design of a whole new representation language whose syntax and semantics would have to be
each parser in the kind of detail which would make the task comparable to

3.2 PARS: Parsing Algorithm Representation Scheme
3.2.1 Data structures

Constants

Integers and lower case identifiers are allowed. Two list-related constants are

Variables

that they all begin with an upper case character. All variables are global unless
thing pointed to. The context ought to make the interpretation clear in each case.

Other naming conventions include List (a global list), Stack (a global stack),
(assigns) operator.
Stack. The action

pop(Stack)

push(Element)

elements onto stacks other than the default stack by means of the action

push(Stack1, Element)

The action

empty(Stack)

top(Stack)

a list element, then C-1 is the previous element and C+1 is the next element.

append(Element) or append(List1, Element)

and an element can be removed from the list by means of the action

remove(Element) or remove(List1, Element).

first(List1) and last(List1).

The length of a list can be found by means of the action

length(List1)
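The PARS data structures translate almost directly into ordinary programming-language operations. The sketch below shows how I read them, in Python; the class and method names are mine.

    class ParsList:
        # A PARS-style list with a movable current-element pointer C,
        # so that C-1 and C+1 denote the previous and next elements.
        def __init__(self, items=()):
            self.items = list(items)
            self.c = 0                          # index of the current element

        def append(self, element): self.items.append(element)
        def remove(self, element): self.items.remove(element)
        def first(self): return self.items[0]
        def last(self): return self.items[-1]
        def length(self): return len(self.items)
        def prev(self):                         # C-1
            return self.items[self.c - 1] if self.c > 0 else None
        def next(self):                         # C+1
            return (self.items[self.c + 1]
                    if self.c + 1 < len(self.items) else None)

    # The stack actions map onto an ordinary Python list:
    stack = []
    stack.append("x")       # push(Element)
    top = stack[-1]         # top(Stack)
    popped = stack.pop()    # pop(Stack)
    is_empty = not stack    # empty(Stack)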
3.2.2 Expressions

either be simple, consisting of one or more actions, or structured condition

(11)
IF condition(s)
THEN expression(s)
(ELSE expression(s))

(12)
N: expression

where 'N' is an integer.
Conditions

table below.

Operator   Name
=          equals
→          depends
U          unifies

Equality  The = (equals) operator is used to test two items for identity. The

Dependency  The → (depends) operator is used to test for dependency. The test succeeds if the element on the RHS of the operator already depends on the element on the LHS of the operator or if it can be made to depend on the LHS element (i.e. there is nothing in the grammar or the sentence to prevent a

Disjunctions of conditions are possible using the 'V' (or) operator. For example:

(condition1 V condition2)

Actions
C:=1

C thus

C:=6

or thus

C:=C+1

return to a prior state once a goto action has been executed. Expressions are

goto(3)

Succeed and fail  succeed signals that a parse has succeeded; fail signals that parsing has failed. Both actions terminate the parse immediately.

Others  As noted above, other actions include the stack-related pop and push,
3.3 Summary

In this chapter I have charted the rise of 'applied' DG, i.e. DG in service of NLP. I have shown how an increasing number of NLP systems are being based
explore novel parsing algorithms and to implement linguistic theories. Lack of
me) renders it impossible to include here a detailed examination of every system named in the survey. The following eight chapters describe those parsers for which most information is currently available. At least one representative
Chapter 4

The RAND parsers

4.1 Overview

RAND Corporation, Santa Monica, USA and reported, for the most part, by David Hays. Most of the natural language work at RAND was centered on the development of a Russian-English MT system, of which a parser was con
for variable word order languages like Russian — especially as the RAND work preceded developments in PSG for handling variable word order such as scrambling transformations (Ross 1967; Saito 1989) or the ID/LP formalism (cf. Gazdar et al. 1985: 44-50). However, in 1961 — when RAND was just
was far from being the 'natural' choice. Hays claimed that "Phrase structure
certainly have known if there had been any other dependency systems in existence. For an overview of the NLP work carried out at RAND in the early 1960s, see Hays (1961c).
It is hard to overemphasize the importance of the RAND work in the
Shumilina and Volotskaya. Unfortunately, nothing has been found describing their DG work except Hays' abstracts presented on page 62 above. Their work
RAND system set an agenda for future systems to follow. Almost all authors of the other systems described in this thesis acknowledge their debt to Hays
thirty years ago from its present-day condition. Firstly, there were hardware and software limitations which impaired prototyping and which, inevitably,
nowadays taken for granted, were in 1960 still in their infancy or even waiting to be invented. For example, the RAND systems would almost certainly have looked different if their designers had been able to make use of complex
problems and what constituted easy problems were markedly different from present day views. These were days of great optimism in MT. Hays wrote in
1961:

funds to MT projects drying up, and with them the dream of constructing fully

In this chapter I present two parsing algorithms. One of these was imple
clear whether it was ever implemented. It could loosely be described as a 'top-

4.2 The bottom-up algorithm

The bottom-up algorithm was embodied in the RAND SSD ('Sentence Structure Determination') program. The principal references are Hays and Ziehe

4.2.1 Basic principles

There may, in fact, have been several distinct versions of the parser described here. Hays points to the fact that work centred around two 'basic principles'
Basic principle 1: separate word-order and agreement rules

agreement rules.^ This principle led to the development of two sub-programs. The first program selected pairs of words which could serve as candidates to
The second sub-program tested to see whether a dependency relation was possible on the basis of the grammatical features and dependency requirements of
the linking program succeeded then the pair-selection program would try to find a new pair of candidates for linking. If the linking program failed then

^Hays (1961a: 368) states that "this principle has been invented, lost, and re-invented several times."

4.2.2 The parsing algorithm

Pair selection

parser. It works by attempting to link any two words which are immediate neighbours in the input string. Search for immediately adjacent pairs proceeds from left-to-right. An attempt is made to link the current word with its
the two words, the dependent drops out of sight, thus creating a new pair
of the newly created pair becomes the current word. If a dependency is not established, the next word in the string becomes the current word. Leftside neighbours are only checked after a change of current word resulting from a
INITIALIZATION: read input words into a list;
               C := 1.

1. IF C+1 = e
   THEN halt
   ELSE IF C → C+1
        THEN record(C → C+1),
             remove(C+1),
             goto(1)
   ELSE IF C+1 → C
        THEN record(C+1 → C),
             remove(C),
             goto(1)
   ELSE C := C+1,
        goto(2).

2. IF C = e
   THEN halt
   ELSE IF C → C+1
        THEN record(C → C+1),
             remove(C+1),
             goto(2)
   ELSE IF C+1 → C
        THEN record(C+1 → C),
             remove(C),
             C := C+1,
             goto(3)
   ELSE C := C+1,
        goto(2).

3. IF C = 1
   THEN goto(1)
   ELSE IF C-1 → C
        THEN record(C-1 → C),
             remove(C),
             C := C-1,
             goto(2)
   ELSE IF C → C-1
        THEN record(C → C-1),
             remove(C-1),
             goto(3).
actly one word remains visible in the input list at the end of the parse. This implies that all the other words have been successfully linked into the structure
ous sentence. This was a limitation imposed by the then existing technology. The parser favours closer attachments over more distant ones. Hays suggested
parser getting the attachments right first time. (Apparently this was vital:
words a partial ordering for the establishing of their dependency relations. For example, in one trial version of the RAND SSD system a preposition could not be linked to its head until its object had been attached to it.
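Taken together, the three numbered states amount to a single left-to-right scan which steps back to the leftside neighbour after a successful link. The following sketch is my own translation of that control structure into Python; the linkability test is abstracted into a caller-supplied depends(head, dep) function standing in for the agreement tests described below.

    def ssd_parse(words, depends):
        # Greedy adjacency linking in the spirit of the SSD pair-selection
        # routine. depends(h, d) says whether word d may depend on word h.
        visible = list(range(len(words)))  # linked dependents drop out of sight
        links = []
        c = 0
        while c + 1 < len(visible):
            i, j = visible[c], visible[c + 1]
            if depends(words[i], words[j]):    # right neighbour depends on current
                links.append((i, j))
                del visible[c + 1]             # dependent disappears; head stays
            elif depends(words[j], words[i]):  # current depends on right neighbour
                links.append((j, i))
                del visible[c]                 # the head becomes the current word,
                if c > 0:                      # so its leftside neighbour is
                    c -= 1                     # checked next
            else:
                c += 1                         # no link possible; move right
        return len(visible) == 1, links        # success iff one word stays visible

    # A toy linkability test: adjectives depend on a following noun,
    # nouns on the verb (roughly the grammar of Chapter 2).
    POS = {"smart": "A", "stupid": "A", "people": "N", "robots": "N",
           "dislike": "V"}
    OK = {("N", "A"), ("V", "N")}
    print(ssd_parse("smart people dislike stupid robots".split(),
                    lambda h, d: (POS[h], POS[d]) in OK))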
Assign 'urgency' scores  Dependency relations could be assigned 'urgency' scores. Whenever more than one possible link existed, the one with the highest urgency score was allowed to 'win'. This was a simple weighting system. Hays
available, for example, from a hand-analyzed corpus (see Chapter 7 for more on this approach to dependency parsing). Hays does not report the results of any trials which made use of 'urgency' scores and it seems unlikely that his

Linking

to unify two complex feature structures, one associated with each word to be tested. If the test succeeds then unification has already built the new composite structure, otherwise a simple failure is returned. However, in the early 1960s no such luxuries were available and so-called 'agreement tests' constituted a major part of the parsing problem. At least one of Hays' papers (Hays 1966a)
above would be implemented in the agreement testing mechanism. The details of the various kinds of agreement testing are mostly of little relevance to modern readers. However, two of the strategies still hold some interest.

inventory covering all of the various distinctions possible in a grammar. Now imagine converting every possible feature permutation into a distinct atomic symbol. This is effectively what was done in the RAND SSD system. Each
word form was assigned to one of these symbols (or a disjunction of these symbols). For convenience the symbols used were integers. Assume that there were n distinct integer symbols. An n × n array was set up. In order to find out whether a dependency could be established between a word form of type i and another of type j it was necessary to look in the (i, j)-th cell of the matrix. This would indicate whether it was possible to link the words and, if so, what kind of dependency relation was involved and which word was the head. In the RAND system a 4000×4000 cell array was used and it was projected that
agreement testing came to be viewed as such a significant component of the parsing problem.
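In modern terms the array is simply a lookup table keyed by pairs of grammar-code symbols. A sparse sketch of my own (the integer codes and relation names are invented for illustration; the actual RAND inventory is not reproduced here):

    # Missing keys mean 'no dependency possible between these word types'.
    AGREEMENT = {
        (101, 202): ("subject", "second is head"),   # hypothetical codes
        (305, 101): ("attribute", "second is head"),
    }

    def link_test(code_i, code_j):
        # Returns (relation, which word is head) if types i and j may link.
        return AGREEMENT.get((code_i, code_j))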
bek had recently developed (Lambek 1958). Hays' suggestion was to replace the atomic symbols in a category symbol (usually N and S, e.g. S/N) with

In Russian, nouns and adjectives agree in number, gender and case; there are six cases, and the following gender-number categories: masculine singular, feminine singular, neuter singular, and plural. Let each bit-position of a 24-digit binary number correspond to a case-number-gender category, and use the appropriate number as a component of the grammar-code symbol of adjective or noun. Agreement is tested by taking the 'intersection'... If the intersection is zero, the occurrences do not agree. This method is faster in operation and requires no stored agreement tables; it is almost certain to be the method of future operational systems. (Hays 1961a: 373-4).
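The scheme Hays describes is easy to reproduce with integer bitmasks: one bit for each of the 6 × 4 = 24 case and gender-number combinations, with agreement reduced to a bitwise AND. The helper below is my reconstruction of the idea; the orderings of the categories are arbitrary.

    CASES = ["nom", "gen", "dat", "acc", "inst", "prep"]       # six cases
    GENDER_NUMBER = ["masc sg", "fem sg", "neut sg", "pl"]     # four categories

    def bit(case, gn):
        # One bit per case/gender-number combination: 24 bits in all.
        return 1 << (CASES.index(case) * len(GENDER_NUMBER)
                     + GENDER_NUMBER.index(gn))

    def mask(combos):
        m = 0
        for case, gn in combos:
            m |= bit(case, gn)
        return m

    # An ambiguous adjective form and an unambiguous noun form:
    adjective = mask([("nom", "fem sg"), ("gen", "fem sg"), ("dat", "fem sg")])
    noun = mask([("gen", "fem sg")])

    agree = (adjective & noun) != 0   # non-zero 'intersection' -> they agree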
4.3 The top-down algorithm

In this section we examine Hays' other dependency parsing algorithm. It is not clear whether it was ever implemented at the RAND Corporation. Hays de

4.3.1 The parsing algorithm

does not describe the rule system employed by the parser so I shall assume

(13)
a. Xi(Xj1, ..., *, ..., Xjn)
b. Xi(*)
c. *(Xi)

where (13a) shows the case where Xi has dependents Xj1-Xjn. (13b) is the case where Xi can appear in a sentence without dependents. (13c) notates the case where Xi can appear in a sentence without depending on any other word, i.e. it is the sentence root.

can serve as the sentence root, i.e. for which there is an entry of type (13c)
root of a dependency tree. Next, the grammar is searched for a rule of type (13a) listing possible dependents for the root, or a rule of type (13b) showing that the root can occur without dependents. For example, suppose that the
is found, it is matched against the words of the sentence. For example, if the
present in the input sentence. If there is a match then the fact that these
for in the grammar. The same is done for every word in the input string when
word X then no more rules of type X(...) are searched for. A sentence has been successfully parsed if all leaves in the dependency tree have been matched against rules of type (13b) and no words remain in the input string which are

I shall say that a word X for which a rule of type X(...) is found and
list, then expansion replaces one symbol with more than one symbol. For

(14)

(15)

(16)
([V: eats])

(17)

no greater than the number of symbols in the input string. In this respect,
top-down dependency parsing differs crucially from top-down PSG parsing: in top-down dependency parsing an expansion cannot add a symbol which does not appear in the input string. In top-down PSG parsing, of course, extra
free languages (recall Gaifman's result) but unlike a top-down CFPSG parsing algorithm which has not been heuristically constrained, it can never enter
at the mercy of the grammar with which it works. If the CFPSG contains any left-recursive rules then the parser can expect, sooner or later, to blunder into an infinite loop.
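The termination property is worth seeing concretely: every expansion attaches words already present in the input, so the tree can never outgrow the sentence, and left recursion in the grammar is harmless. The sketch below is my own root-first driver over Gaifman rules — reusing the encoding of Δ₁ from Chapter 2 and structurally similar to the recognizer sketched there — not a reconstruction of Hays' unpublished procedure. It records the order in which words are attached, which makes the depth-first, left-to-right expansion order visible.

    def topdown_parse(words, A, R, G):
        # Choose a root word whose category is in G, then expand leaves
        # depth-first, left to right, over contiguous spans of the input.
        def expand(i, j, cat, order):
            for h in range(i, j):                     # candidate head position
                if cat not in A[words[h]]:
                    continue
                for pattern in R.get(cat, []):
                    star = pattern.index("*")
                    left = match(i, h, pattern[:star], order + [words[h]])
                    if left is not None:
                        done = match(h + 1, j, pattern[star + 1:], left)
                        if done is not None:
                            return done
            return None

        def match(i, j, cats, order):
            # Tile the span [i, j) with complete subtrees of the listed
            # categories, in order.
            if not cats:
                return order if i == j else None
            for k in range(i + 1, j + 1):
                sub = expand(i, k, cats[0], order)
                if sub is not None:
                    rest = match(k, j, cats[1:], sub)
                    if rest is not None:
                        return rest
            return None

        for root in G:
            order = expand(0, len(words), root, [])
            if order is not None:
                return order          # words in order of attachment
        return None

    print(topdown_parse("smart people dislike stupid robots".split(), A, R, G))
    # -> ['dislike', 'people', 'smart', 'robots', 'stupid']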
search. If the rightmost available leaf were always to be expanded it would lead to a right-to-left depth-first search. If all nodes at distance d from the root were expanded before any node at distance d+1, a breadth-first search would be implemented. This could also be set up to progress left-to-right or right-to-left.
However, these labels describe the ways in which the branches are added to
built into the trees. For example, a left-to-right depth-first parser would add the words of sentence (18) into the tree in the order: like, giants, jolly, green, corn, golden.

(18)
Table 4.1: main features of Hays' bottom-up dependency parser

4.4 Summary

the sentence rather than from the rules in the grammar. Heads do not search for dependents; neither do dependents search for heads. Instead, the parser searches for potential head-dependent pairs and an agreement matrix ('belong
searching for the other member. The parser produces at most a single analysis

The main features of Hays' first parser are summarized in Table 4.1 (the exact meaning of entries in summary tables will be discussed in Chapter 12).
a dependency tree is constructed although the resulting tree does not depend
is offered.
ble 4.2.
Chapter 5

Hellwig's PLAIN system

suite of NLP computer programs developed by Peter Hellwig at the University
towards his dissertation (Hellwig 1974). Since then he has continued to develop his system. Although the PLAIN system has been implemented in several different locations around the world (e.g. Cambridge, Hawaii, Heidelberg, Kiel, Paris, Pisa, Surrey, Sussex, Zurich) and customized for at least three different languages (English, French and German), Hellwig remains the only author on

Basically, the PLAIN system is a parser. I shall not describe any of its
(DRL). I shall examine the way in which the parser uses unification to build
5.2 D e p e n d e n c y R e p r e s e n ta tio n L a n g u a g e
Hellwig’s prim ary m otivation for basing his parser on dependency is his be
(Hellwig 1986: 198). This can be done w ithin a single representation lan
guage and a single structure. Hellwig contrasts this w ith, for exam ple, LFG
all, no ‘co n stitu en ts’ to be ‘discontinuous’ in DG. As we shall see, this claim
5 .2 .1 T h e fo rm o f D R L e x p r e s s io n s
Figure 5.1: stemma showing a simple dependency structure

The parser is described in the next section. Here, I pursue the question of
where the bracketing represents a tree with nodes and directed arcs. Arcs
an expression representing the stemma shown in Figure 5.1 has the form shown in (19).

(19)
(D (A) (B (C) (E)))
words but they are not expressed by atomic symbols. Rather, they consist
attributes are grouped together in each DRL term, namely a role, a lexeme,

(20)

(21)
(ILLOCUTION: assertion: clse typ<1>
 (PREDICATE: like: verb fin<1> num<1> per<3>
  (SUBJECT: cat: noun num<1> per<3>
   (DETERMINER: the: dete))
  (OBJECT: fish: noun)))
This example shows one term per line with indentation marking the hierarchical structure of the tree represented. The three different types of attribute in each term are separated by single colons. Roles are listed first. These are
tree. So, for example, cat is the SUBJECT of like and fish is the OBJECT of like. Lexemes are listed next. Roles and lexemes express, respectively, the word's syntagmatic and paradigmatic relations. Together they constitute a semantic representation of the sentence. The third part of each term describes the surface properties of the associated word. This consists of a main category
are, by convention, three-character strings. Values are coded as numbers inside angle brackets.
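A DRL term thus bundles a role, a lexeme and a surface description, and an analysis is a recursive nesting of such terms. The following minimal encoding is mine, for illustration, and not PLAIN's internal representation; feature values are held in sets so that disjunctions such as per<1,2> fall out naturally.

    from dataclasses import dataclass, field
    from typing import Dict, List, Set

    @dataclass
    class DRLTerm:
        role: str                        # syntagmatic relation, e.g. "SUBJECT"
        lexeme: str                      # paradigmatic choice, e.g. "cat"
        category: str                    # surface main category, e.g. "noun"
        features: Dict[str, Set[int]] = field(default_factory=dict)
        dependents: List["DRLTerm"] = field(default_factory=list)

    # The core of the analysis in (21), reconstructed:
    subj = DRLTerm("SUBJECT", "cat", "noun", {"num": {1}, "per": {3}},
                   [DRLTerm("DETERMINER", "the", "dete")])
    tree = DRLTerm("PREDICATE", "like", "verb",
                   {"fin": {1}, "num": {1}, "per": {3}},
                   [subj, DRLTerm("OBJECT", "fish", "noun")])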
The analysis employed in PLAIN does not make use of any non-terminal constituents. Neither does it use empty categories. Every node in a dependency tree must correspond to an actual word in the sentence — with one exception. Hellwig argues that
In order to tether this 'clause' item to something which actually occurs in the
all, serves to mark the ending of a main clause and it can — if so desired —
clauses?) Hellwig is aware of these but he argues that the advantage of treating
of clausal semantics" (Hellwig 1986: 196). However, he does not carry his

5.2.2 Word order constraints

person and number, a DRL term can also include positional features which
features are reported in the literature: 'seq', 'adj', and 'lim'. These constrain

1. D precedes H
2. D follows H

1. D immediately precedes H
2. D immediately follows H

lim This feature delimits the outermost dependents of a word and thus can
word may not be 'moved'. Once again, this feature has two values:

1. D is the leftmost dependent of H
Hellwig presents the DRL term in (22) to illustrate the use of these word order

(22)

(23)
The mouse that the cat that likes fish chased squeaks.

lishing a dependency between a head and an 'unmoved' word and establishing a dependency between a head and a word which has 'moved' out of its parent 'constituent'. Covington's system works without any positional constraints at all whereas Hellwig's system can use as many or as few positional constraints
Hellwig's suggestions are fairly tentative since he proceeds to say that "It is likely that appropriate attributes can also be defined for more difficult cases

5.2.3 The base lexicon

left of the assignment arrow, and a DRL term to the right of the arrow. The following examples (from Hellwig 1986: 197) show some entries in the base lexicon.
(24)
a CAT   -> (* cat: noun num<1> per<3>);
b CATS  -> (* cat: noun num<2> per<3>);
c LIKE  -> (* like: verb per<1,2>);
d LIKES -> (* like: verb num<1> per<3>);
e LIKE  -> (* like: verb num<2> per<3>);
f FISH  -> (* fish: noun per<3>);

None of the entries has been assigned a role. This can only occur during parsing. Entry (24a) has a singular number feature num<1> distinguishing it from the plural num<2> in (24b). The person feature per<1,2> of (24c) has a disjunction of values. Entries (24d) and (24e) are required for subject-verb agreement. Entry (24f) has no number feature since fish can be either
5.2.4 The valency lexicon

(25)

variables are known as 'slots' since they can be filled by dependents. The
that the object be a noun which occurs somewhere to the right of its head.
tive but it fails to capture generalizations. Other forms of the verb like will have very similar slots and many other third person singular verbs will have
information in 'completion patterns' and setting up a distinct 'valency lexicon' which associates completion patterns with words. For example, the following

(26)
i.e. (26a) says that the subject will agree with its head in person and number.

(27)
These state that squeak just has a subject slot (it is intransitive) whereas like
lexicon control the unification of terms from the base lexicon with stored com-
pletion patterns. Unification is not confined to this task; it is the principal
tures, and to describe Hellwig's grammar as one variety of DUG (for example

(28)

We have seen how words in the input string can be associated with role, lexeme and morphosyntactic information in DRL terms. We have also seen how words can be given slots into which dependents can fit. In the next section we shall see how potential dependencies are turned into actual dependencies by the parser.

5.3 The parsing algorithm

1986 COLING paper and from personal communication with Hellwig.
tence.

2. A queue indicating the order in which words are to be examined. The queue contains an explicit end-of-queue marker. The parser begins at the left and works towards the right of the sentence so for a sentence with n words (including the period), the queue looks like this: (1, 2, ..., n, end-of-queue).
The parsing algorithm uses these two data structures in the following way:

2. Try to find a slot in another word with which the current word can unify. Only adjacent words are tried. There are two possible outcomes:

(a) A slot is found for the current word. In this case the current word is
the queue.
(b) A slot is not found for the current word. In this case the pointer to

4. Try to find a slot in another word with which the current word can unify. Only words at one remove (i.e. n - 2 or n + 2) are tried. There are two possible outcomes:

(a) A slot is found for the current word. In this case the current word is
the queue.
(b) A slot is not found for the current word. In this case the pointer to

5. Goto 4 until end-of-queue is reached. When this happens move end-of-

The process terminates when steps 2 and 4 are both executed with no change to the queue.
around word 'islands' in the sentence. The object of step 4, which looks beyond
constituent.
This is a multi-pass parser. Dependents search for heads but not vice versa: heads do not search for dependents. Hellwig makes no claims for the validity of
algorithm.
INITIALIZATION: initialize two lists: Pointer_L and Term_L;
    Term_L is an ordered list of DRL terms
        corresponding to the words of the sentence;
    Pointer_L is an ordered list of pointers
        to these DRL terms;
    C is a pointer;
    C↑ is the term pointed to by C;
    C↑:Slot is any valency slot in C↑;
    X and Y are variables;
    e is not an absolute end-of-list marker;
    initialize an empty stack: Stack.

1. IF C=e
   THEN IF top(Stack) = Term_L
        THEN IF length(Term_L) = 1
             THEN succeed
             ELSE fail
        ELSE push(Stack,Term_L),
             remove(Pointer_L,C),
             append(Pointer_L,e),
             C:=first(Pointer_L),
             goto(2)
   ELSE IF C↑ U C↑-1:Slot
        THEN remove(Term_L,C↑),
             remove(Pointer_L,C),
             C:=first(Pointer_L),
             goto(1)
   ELSE IF C↑ U C↑+1:Slot
        THEN remove(Term_L,C↑),
             remove(Pointer_L,C),
             C:=first(Pointer_L),
             remove(Pointer_L,C),
             append(Pointer_L,C),
             C:=first(Pointer_L),
             goto(1)
   ELSE remove(Pointer_L,C),
        append(Pointer_L,C),
        C:=first(Pointer_L),
        goto(1).
2. IF C=e
   THEN remove(Pointer_L,C),
        append(Pointer_L,C),
        C:=first(Pointer_L),
        goto(1)
   ELSE IF C↑ U C↑-2:Slot
        THEN remove(Term_L,C↑),
             remove(Pointer_L,C),
             C:=first(Pointer_L),
             X:=last(Pointer_L),
             remove(Pointer_L,X),
             Y:=last(Pointer_L),
             remove(Pointer_L,Y),
             append(Pointer_L,X),
             append(Pointer_L,Y),
             goto(2)
   ELSE IF C↑ U C↑+2:Slot
        THEN remove(Term_L,C↑),
             remove(Pointer_L,C),
             C:=first(Pointer_L),
             X:=C+2,
             remove(Pointer_L,X),
             append(Pointer_L,X),
             goto(2)
   ELSE goto(1).
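The gist of this control regime is a series of sweeps in which each still-unattached word tries to fill a slot in a neighbour, first at distance one and then at one remove. The Python below is my simplified rendering of that outline, not Hellwig's code: unification of a word's term with another word's slot is abstracted as a black-box function try_fill supplied by the grammar.

    def parse(terms, try_fill):
        """terms: DRL-style terms, one per word, left to right.
        try_fill(head, dep) -> bool: does dep unify with a free slot of
        head (recording the dependency as a side effect)?
        Succeeds if exactly one word, the root, is left unattached."""
        remaining = list(range(len(terms)))      # words still seeking a head
        changed = True
        while changed:
            changed = False
            for distance in (1, 2):              # adjacent, then one remove
                for i in list(remaining):
                    if i not in remaining:       # attached earlier this sweep
                        continue
                    p = remaining.index(i)
                    for q in (p - distance, p + distance):
                        if 0 <= q < len(remaining) and \
                           try_fill(terms[remaining[q]], terms[i]):
                            remaining.remove(i)  # i becomes a dependent, so
                            changed = True       # islands on either side close up
                            break
        return len(remaining) == 1

Note that, as in the original, dependents search for heads and not vice versa: it is always the current word that offers itself as a filler.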
5.4 The well-formed substring table

One of the most interesting and innovative aspects of Hellwig's parser is his use
edges. To begin with, there are as many edges as there are readings for the
Hellwig's WFST is very like this except that his edges are labelled with
When a word becomes a filler for another word's slot, the two are unified and a new edge is inserted in the WFST spanning what was previously spanned by the two edges. Hellwig's WFST for the globally ambiguous sentence Flying
constituent. This is not sufficient for Hellwig's parser, which advertises as one
is discontinuous, simply marking its left and right boundaries does not serve to identify its components since, by virtue of the discontinuity, some of the
each word in the input sentence. Each bit string in an n-word sentence consists of one '1' and n-1 '0's. The ith word is represented by a bit string with the '1'
between two words, their bit strings are added. If the addition involves any
even before the slot features have been checked. If no 'carry' operations are
discontinuous constituents.
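This bookkeeping is easy to emulate. In the sketch below, which is a reconstruction rather than Hellwig's code, each word's bit string is an integer with a single bit set; combining two constituents adds their bit strings, and a carry (equivalently, a shared set bit) signals that the spans overlap and cannot be combined, whatever their slot features say. (The document writes bit strings left to right; Python integer literals read the other way.)

    def word_bits(i, n):
        """Bit string for the i-th word (0-based) of an n-word sentence:
        one '1' and n-1 '0's."""
        return 1 << i

    def combine(a, b):
        """Combine two constituents' bit strings, or None if they overlap
        (i.e. if adding them would involve a carry)."""
        if a & b:              # shared bit: addition would carry
            return None
        return a + b           # carry-free addition, identical to a | b

    # The discontinuous 'say' tree of (31): words 0, 3, 4 and 5 of (29).
    n = 7
    span = combine(combine(word_bits(3, n), word_bits(0, n)),
                   combine(word_bits(4, n), word_bits(5, n)))
    assert span == 0b111001    # members recoverable from the set bits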
For example, the words of sentence (29) would be assigned the initial bit strings shown in (30) (trailing zeros in bit strings and features in DRL slots are suppressed for readability).

[WFST edges for Flying planes can be dangerous: two complete readings and one partial edge]

ILLOC assertion
 (PRED can verb fin
  (MV be verb inf
   (PA dangerous adje))
  (SUBJ flying noun
   (OBJ planes noun)))

ILLOC assertion
 (PRED can verb fin
  (MV be verb inf
   (PA dangerous adje))
  (SUBJ planes noun
   (ATR flying adje)))

(planes noun
 (ATR flying adje))
(29)
(30)
BITSTRING TREE
1 (what pron)
01 (do verb fin
(SUBJECT: _ )
(MAINVERB: _ ))
001 (Danforth noun)
0001 (say verb inf
(DIRECTOBJECT: _ )
(INDIRECTOBJECT: _ ))
00001 (to
( _ )
000001 (George noun)
0000001 (ILLOCUTION question
(PREDICATE: _ )
In (29), What is the direct object of say. The discontinuous tree rooted in 'say'
(31)
BITSTRING TREE
100111 (say verb inf
(DIRECTOBJECT: what )
(INDIRECTOBJECT: to
(George)))
use of a WFST in the management of ambiguity. We shall return to this topic

Table 5.1: main features of Hellwig's dependency parser

5.5 Summary

ture structures. His parser has a bottom-up island-driven control strategy
the precise use of the lim feature is required before the system can be properly evaluated). Words look for heads; they never look for dependents. The parser's efficiency is increased by the use of a WFST which differs from standard WFST parsers in building dependency rather than constituency structures and in rep-

Chapter 6

The Kielikone parser

6.1 Overview
In this chapter I examine the Kielikone dependency parser. Since June 1982 the Finnish National Fund for Research and Development ('SITRA') has spon-
cus of the research is the design and implementation of a Finnish text interface
cations. The overall structure of the interface system — which has recently
modules.

1. A morphological
nent morphs (Jappinen et al. 1983; Jappinen and Ylilammi 1986). This
would be much larger than for a language like English which has much

2. A parser, known as 'ADP' (Augmented Dependency Parser), uncovers
ing of sentences and also for interpreting sentences in their dialogue context. Thus the module embraces both semantics and pragmatics. In early
cific database it is only necessary to write an interpreter to translate UQL

Some of the dependency parsers covered in this thesis are described on the
1987 lists 53 items, of which 14 are specifically concerned with parsing. This
being described at any given point. As we have already seen, many of the components in the system have been given names. When new names appear it is not always clear whether (i) only the names have changed while the
components remain the same, (ii) the new names introduce new components to complement the existing components, or (iii) the new names introduce new components to supersede old components. This would all be self-evident were it not for the fact that SUOMEX is a very complex system and most published papers can only discuss selected sub-parts of it. It is thus necessary to try to guess whether elements which are not mentioned have been left out for lack of space or because they have been quietly dropped from the system. The parser itself suffers from this problem since, as we shall see, its internal structure is also rather complex.

In order to aid exposition, I shall plot the main milestones in the development of the parser before turning to a more detailed examination of the most recent version.
1984 (Nelimarkka et al. 1984a; Nelimarkka et al. 1984b). At that stage the

Functional dependency grammar

clature. Non-terminal phrase nodes or labels do not appear anywhere in the
[Figure 6.1: stemma for Nuorena poika heitti kiekkoa, rooted in heitti, with arcs labelled adverbial/TIME, subject/agent and object/neutral]
refer to a single word which has no dependents. The word on which all others
of dependency are recognized and these are linked with the traditional syntactic functions (or relations) subject, object, adverbial, genitive attribute, etc.
For example, the sentence Nuorena poika heitti kiekkoa ('When he was young the boy used to throw the discus') is given a stemma analysis as shown in Figure 6.1 (example cited in Nelimarkka et al. 1984a: 169). This combination
constituents at the end of the sentence before it has built any at the beginning.
Overall, the strategy is very close to that of an island parser which starts
Notice that the parser only attempts to link immediately adjacent (i.e. neigh-
disappears from sight of the parser. Constituent B now has a new neighbour and so the parser can attempt to establish a new dependency link between them.
constituent, plus two stacks, one storing the left context, the other storing the
R1, or it is pushed onto one stack and the current constituent register is filled

Two-way finite automata

to store information specifying what all constituents may contain. In other words, it is necessary to store for each word type a complete record of all its
[Figure: the register of the current constituent, flanked by a left-context stack (L1, L2, L3) and a right-context stack (R1, R2, R3)]
obligatory and optional dependents. This can then serve as a model for actual occurrences of that word type. For this task the system uses two-way finite automata.
A two-way finite automaton (Levelt 1974) consists of a set of states. One of these is distinguished as the start state and one or more are distinguished as final states. The states are linked by transition arcs. Each arc recognizes a sentence element and moves the reading head either to the right or to the left in the input string. The automaton accepts an input string if it begins in the start state with the first word under the reading head
automaton recognizes functions like subject, object, etc. Each arc traversal also serves to build some structure, namely to insert a dependency relation
the name of the function specified by the arc traversed. States are divided into 'left' and 'right' states indicating the side of the current word on which the automaton is currently operating.
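Such an automaton is straightforward to simulate. The sketch below is illustrative only: the arc table is invented rather than taken from the Kielikone grammar, and since the source's statement of the acceptance condition is incomplete, I assume the automaton must reach a final state having consumed each word exactly once.

    def run(cats, trans, finals, state="start", pos=0):
        """Two-way automaton over a list of word categories.
        trans: (state, category) -> (next_state, function_label, move),
        where move is -1 (left) or +1 (right)."""
        seen, deps = set(), []
        while state not in finals:
            if not (0 <= pos < len(cats)) or pos in seen:
                return None                  # head fell off, or word reread
            key = (state, cats[pos])
            if key not in trans:
                return None                  # no arc recognizes this element
            state, func, move = trans[key]
            seen.add(pos)
            if func:                         # arc traversal builds structure:
                deps.append((func, pos))     # a labelled dependency on the head
            pos += move
        return deps if len(seen) == len(cats) else None

    # A toy subject-verb-object pattern, scanning rightwards throughout:
    trans = {("start", "N"): ("s1", "subject", +1),
             ("s1", "V"): ("s2", None, +1),
             ("s2", "N"): ("end", "object", +1)}
    print(run(["N", "V", "N"], trans, {"end"}))  # [('subject', 0), ('object', 2)]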
It has been known for some time that any language recognized by a two-way automaton is regular (i.e. type 3, the most highly constrained set of languages in the Chomsky Hierarchy). This power is not sufficient for the requirements
made to activate one another. They make use of three 'control' arcs which shift processing from the current word to one of its neighbours. These control operations are 'BuildPhraseOnRight', 'FindRegOnLeft', and 'FindRegOnRight'.
ciated with a given word, the final action of the automaton is to mark the
It is worth noting that automata 'know' nothing about when and why they
ordering of function and control arcs in the automata is said to result in very

6.2.2 A grammar representation language: DPL

It is not clear from the literature whether the representation language described in this section was developed concurrently with the components covered in the
Lehtola 1986). All functions, relations and automata were, at one time, ex-
(Lehtola et al. 1985: 100). However, this can be read as meaning 'the main
grammar writer specifies an inventory of permitted property names and values. These can then be built into descriptions. A number of operators are available
the following:

=   equality
:=  replacement
    insertion

what a DPL entry looks like. This example is taken from Lehtola et al. (1985: 102). I shall not discuss its detail here. The important point to note is that
acknowledged that procedural grammars — other than grammars for tiny frag-
the functions and relations in the grammar. Fairly minor modifications to the
(FUNCTION: Subject
   (RecSubj -> (C := Subject))
)
(RELATION: RecSubj
   ((C = Act <Ind Cond Pot Imper>) (D = -Sentence +Nominal)
    -> (D = PersPron (PersonP R) (PersonN R)
       ((D = Noun) (C = 3P) -> ((C = S) (D = SG))
                               ((C = P) (D = PL))))
   ((D = Part) (C = S 3P)
    -> ((C = 'OLLA)
        => (C := +Existence))
       ((C = -Transitive +Existence))))
Before moving on to examine the next development in the Kielikone project we must note a cryptic comment buried in one of the papers describing the DPL representation language:
stituents in sight of the current word are its immediate left and right neighbours. The above comment seems to suggest that the parser really has three
PARSIFAL system (Marcus 1980) (which has a three-cell lookahead buffer). This
else in the literature points in this direction we must simply place a question
6.2.3 Constraint-based grammar: FUNDPL

unwieldy formalism. The grammar writer was required to work out complex
responded by designing a more user-friendly high-level representation language
writer is no longer required to worry about control issues (at least, not to the
grammars such as LFG (Kaplan and Bresnan 1982), FUG (Kay 1985), and
ally, the job of the parser is to search for an analysis of the sentence which does
grammars are not unification grammars. FUNDPL is simply a high-level in-

Functional schemata

as schemata. Each schema has four parts: pattern, structure, control, and
those in the When slot of the schema. (Presumably the slot is named to signify something like 'when this pattern is matched, use the schema'.) The structure part of the schema lists optional and obligatory dependents for the head of the
FSCHEMA: name
    When = [properties]                 pattern
    Obligatory = (functions)
    Optional = (functions)              structure
    Order = <conc.description>
    TryLeft = <functions>
    TryRight = <functions>              control
    Down
    Up
    Assume = [properties]               assignment
    Lift = (function(attributes))
material is indicated by two consecutive dots (..). Order = <D1..R..D2>
schema consists of heuristic information to guide the parser's search order.
are usually, though not necessarily always, found in particular locations, the heuristic information can cut down average search time considerably. Down and Up are used to change levels between matrix and subordinate sentences. Their use is not well documented. Presumably their purpose is to prevent
not clear how they work and no examples are available. Clearly, the designers of FUNDPL are being somewhat optimistic when they say that their system
The Assume slot assigns new features (e.g. +Phrase) to the regent once the schema has been fully matched and bound. The Lift slot is like the Assume
The example shown in Figure 6.5 appears in Jappinen et al. (1986: 463). It is the functional schema for normal Finnish transitive verbs which allow unlimited adverbials on either side. The schema allows all ordering permutations
(FSCHEMA: VPTrAct
    When = [Verb Act Ind +Transitive]
    Obligatory = (Subject Object)
    Optional = (Adverbial*)
    TryLeft = <Subject Object Adverbial>
    TryRight = <Object Adverbial Subject>
    Assume = [+Phrase +Sentence])
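Rendered as data, such a schema is simply a record. The encoding below is mine (slot names follow the text; this is not the FUNDPL internal format), together with the triggering test that matches a schema's When pattern against a word's properties.

    # The Figure 6.5 schema as a plain record (illustrative encoding only).
    VP_TR_ACT = {
        "name": "VPTrAct",
        "when": {"Verb", "Act", "Ind", "+Transitive"},   # pattern
        "obligatory": {"Subject", "Object"},             # structure
        "optional": {"Adverbial"},                       # '*': any number
        "try_left": ["Subject", "Object", "Adverbial"],  # control: search order
        "try_right": ["Object", "Adverbial", "Subject"],
        "assume": {"+Phrase", "+Sentence"},              # gained when bound
    }

    def triggers(schema, word_properties):
        """A schema is activated when the current word's properties
        include everything in its When slot."""
        return schema["when"] <= word_properties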
Binary relations

stituent. They do not contain any information detailing what might constitute a legitimate dependent of the regent. For example, the schema in Figure 6.5
— are completely distinct from binary relations which define all permitted dependency relations which may hold between pairs of words in Finnish sentences. Binary relations are boolean expressions which succeed if all conditions are met; otherwise they fail. Unfortunately, the literature offers only half a bi-
(CATEGORY : SynCat
< (Word)
(Noun ! Word)
(Proper ! Noun)
(Common ! Noun)
(Pronoun ! Word)
(PersPron ! Pronoun)
(DemPron ! Pronoun)
(IntPron ! Pronoun)
the conditions for both R and D are satisfied then the value of the relation is

Type definitions

higher individuals in the hierarchy. For example, a category SynCat, con-
For example, the following property definition (from Valkonen et al. 1987b: 219) records the fact that 'Polar' can have two values, 'Pos' or 'Neg'. The
(32)
(PROPERTY: Polar < ( Pos ) Neg >)
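The SynCat definition above is a simple is-a tree: 'Noun ! Word' declares Noun an immediate subtype of Word, and so on. Matching a description against such a hierarchy then reduces to walking parent links, as in this sketch (my encoding, not DPL's internals):

    # Parent links read off the (CATEGORY: SynCat ...) definition above.
    PARENT = {"Noun": "Word", "Proper": "Noun", "Common": "Noun",
              "Pronoun": "Word", "PersPron": "Pronoun",
              "DemPron": "Pronoun", "IntPron": "Pronoun"}

    def is_a(cat, ancestor):
        """True if cat equals, or lies below, ancestor in the hierarchy."""
        while cat is not None:
            if cat == ancestor:
                return True
            cat = PARENT.get(cat)
        return False

    assert is_a("PersPron", "Word") and not is_a("Noun", "Pronoun")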
Lexicon

evitably, some features have not been covered. Some of these were left out because they were minor, ephemeral suggestions. Others were left out because the literature contains insufficient or confusing information. For example, Kettunen 1986 mentions a parser called 'DADA' (an acronym from the
again so it is hard to tell whether it was a short-lived alternative to the older

6.3 The parser

The best texts describing the present state of the parser are Valkonen et al. (1987b) and Kettunen (1989). There is not full agreement between these papers — they even disagree about the name of the parser! Valkonen et al. call
6.3.1 The grammar

properties.

2. A lexicon for associating features with words. Recall that
(Jappinen and Ylilammi 1986), which analyzes words into their compo-
determine whether the features of any two words are such as to allow
6.3.2 Blackboard-based control

which appears in Valkonen et al. (1987a: 700) and Valkonen et al. (1987b: 221).
The system has two knowledge sources, a body of functional schemata and
sources do not communicate directly. Instead, they read from and write to
current word its properties are matched against the triggering patterns of the functional schemata (i.e. the values of the When slots in the schemata).
Only one match can be entertained at any one time. A matching schema
be built around the current word. This active environment is located on the blackboard and is monitored by the binary relations. These are used to indicate
[Figure: blackboard architecture: knowledge sources (including the binary dependency relations) read from and write to a blackboard of computational state data under a control component]
where they fit in the above diagram. This process continues until all of the obligatory slots (and perhaps some optional slots) have been filled in the active environment. At this point the local partial dependency tree is complete and processing can shift to another constituent with another active environment, unless, of course, the constituent to be completed has a main verb (+Sentence)
(Hayes-Roth et al. 1983; Nii 1986). The principle behind blackboard systems
a demon watching the blackboard until something appears which that demon
board and returns to a semi-dormant monitoring state. In this way, different
are only two knowledge sources, namely the functional schemata and the bi-
a number of other dependency parsers described in this thesis seem to work
6.3.3 The parsing algorithm

In this section I describe the parsing algorithm. Before getting too close to
The parsing algorithm is defined by a two-way finite automaton. This is not to
writer to define functional schemata and still built on the fly by the FUNDPL

1. One of the schemata associated with the current constituent is activated.

No more than one schema may be active at any one time, i.e. only one constituent may be at step 2 or step 4 in the automaton. However, any number

1. All constituents have heads, whether they consist of single words or com-
comes:
(a) There are no left neighbours or left neighbours are at step 3, pending. Go to step 3.
step 3, pending.
PENDING stack and go to step 1 with the next constituent to the
(b) Failure.

Figure 6.9: the Kielikone parser control strategy automaton
1. IF (C=1 ∨ C-1=top(Stack))
   THEN goto(2)
2. IF (saturated(C) ∨ C+1=e)
   THEN goto(4)
   ELSE push(C),
        C:=C+1,
        goto(1).
3. IF dependency(C,C+1)
   THEN record(C,C+1),
        remove(C+1),
        goto(2)
   ELSE fail.
4. IF (C+1=e & C-1=0 & empty(Stack))
   THEN Result:=C,
        succeed
   ELSE IF C+1=e
        THEN C:=top(Stack),
             pop(Stack),
             goto(3)
   ELSE C:=C+1,
        goto(1).
to be being manipulated at the present moment. Active schemata are not, of themselves, either active or inactive: they are simply representations. They
'expert' on some word in the lexicon. This flavour of system is mentioned

6.3.4 Ambiguity

choice of analyses for homographic words, choice of schemata, and choice of
to ambiguity since it can result in identical structures being built many times over.

6.3.5 Long distance dependencies

Under normal circumstances, dependency relations are established between
marking schemata which can become possible neighbours of moved items as
act as a place-holder for the moved item which is said to be 'captured'. The
diate vicinity of the constituent. For example, a schema might contain the entry:

(DISTANT Object)
A modification to the parser is also required. The parser is given an extra register. Any captured constituents are copied from the 'DistantMember' slot of the 'host' schema into the special purpose register. This register must
requirement of the current constituent, it can be copied from the register into
be satisfied by more conventional means). After initially being copied into the
Grammar (Hudson 1988b: 202ff; also 189 below). However, unlike WG, the Kielikone solution does not appear to handle 'island constraints' (Ross 1967).
a sentence with this kind of prohibited extraction, i.e. it would parse both of

(33)

6.3.6 Statistics and performance

system contains 66 binary relations, 188 functional schemata and 1800 idiosyn-
analyzer contains 35000 entries.
parsed in linear time by parsers based on context free grammars. (Some examples are given in Earley 1970). It is normal to cite worst case or possibly average case complexity rather than best case complexity in order to evaluate a parser. Unfortunately, these figures are not published for the Kielikone system.

6.3.7 Open questions

Theoretical status
Unlike some of the other dependency parsers reviewed in this thesis (e.g. the Lexicase and Word Grammar parsers, Chapters 8 and 9), the Kielikone parser
Finnish has fairly free word order, it does not have a tradition of DG scholarship as is the case with, for example, German and Russian. Indeed, Tarvainen (1977) is one of the few texts which makes any attempt at analysing Finnish
literature.
they were also developing a cognitive model (e.g. Nelimarkka et al. 1984a: 168; Lehtola et al. 1985: 106). Not everyone shares this view. For example, Starosta and Nomura (1986: 127) describe the Kielikone parser as having
Modularization

As they stand, the parser and the grammar are almost distinct — but not quite.
To begin with the grammar, the functional schemata contain Up and Down slots which can be interpreted as control statements. They also con-
the amount of search required of the parser. Jappinen et al. (1988b) have
moves the boundary between grammar and parser. They do this by introducing an ordered set of constituent types to look for (in much the same manner as

Division of labour

is a distinction between functional schemata and binary relations. This might be restated more succinctly by asking why the notion of 'constituent' has been
with the kind of simple pairwise relations familiar from dependency grammar whereas the functional schemata are concerned with larger objects which can be identified with constituents in the traditional sense. In fact, a schema acts
This seems like a harsh criticism with which to conclude this examination of
and, as has been noted above, these have not always been readily interpretable. There has been hardly any real discussion of motives for choices or arguments against possible alternatives. Parsers are notoriously difficult to compare and evaluate. Bald performance figures are not very helpful. What is required is
6.4 Summary

The Kielikone parser works from left to right, bottom-up. With each input
slots and heuristic information. Search proceeds from heads to dependents in
The main features of the Kielikone parser are summarized in Table 6.1.

Table 6.1: main features of the Kielikone dependency parser

Chapter 7

The DLT MT system

7.1 Overview

In this chapter I examine the Distributed Language Translation (DLT) system
and the Dutch Ministry of Economic Affairs. It began in late 1984 and, so
nation 'semi-automatic' will become clear shortly. Unlike some of the other
ing the DLT system, including a six-volume book series published by Foris and
erful language-neutral inference engine which could be simply customized for
too great to risk having to re-build the whole system every time a new language is added. The design adopted in DLT 'distributes' the translation task into two sub-tasks. Firstly, the source language is translated into an intermediate
the target language. This is not obviously a simplification since where there might have been a single language pair, there are now two. The rationale for
system is to write a sub-system for translating between that language and the intermediate representation. Once this has been done, it is possible to translate from the newly added language to all of the other languages in the system without further effort. Thus, if there are ten languages in the system and a
for one language pair instead of ten language pairs. The intermediate represen-
Translation is semi-automatic in DLT in the sense that the system can seek clarification from the user when necessary. When there are no difficulties, the system can translate from source to intermediate to target as though operating in batch mode. When a problem arises in translating from source language to intermediate representation, the system can query the user (in the source
resolve the ambiguity by asking the user to select amongst alternative readings.
shall describe it in more detail in the next section. DLT is controversial in its
are raised which then have to be solved. Instead, they argue for an approach
I shall not investigate this claim here. (For more information see Sadler 1989c.) Instead, I shall focus on the way in which DG is used to rep-
parser.
[Figure: the two-stage DLT translation path, SOURCE to INTERMEDIATE (Translation 1) to TARGET (Translation 2), each stage involving syntactic and lexical components]
7.2 Dependency grammar in DLT

not been able to find any published account of the form of dependency rules
than arbitrary graphs. That is, they must be rooted, directed, acyclic, and non-convergent.
tree.
word or from depending on a single word by virtue of more than one dependency relation.
lead to crossing arcs, were it not for the fact that surface word order is not
is illustrated in Figure 7.2 which shows the analysis for the sentence Whom did you say it was given to? (op. cit.: 103). (DLT dependency trees are usually
one word. Nodes are even prohibited from representing frozen multi-word
Although nodes signifying more than a single word are not allowed, a case is made for allowing nodes to signify less than one word, i.e. a morpheme. The

^Except in the form of features indicating the word's position in the input string.
Figure 7.2: dependency analysis of the sentence Whom did you say it was given to?

arguments hinge around phenomena such as English clitics (can't = can not)
verbs which combine a root with a participle in a single word in some contexts but which separate them into two words in other contexts. Thus, the root and
recognize a root word and its inflectional affix as separate nodes in the tree.
Things are not quite as simple as this, since the DLT grammar recog-
[Figure 7.3: dependency structure for the coordination Tom, Dick and Harry sing]
tures, such as the one shown in Figure 7.3 (Schubert 1987: 114ff; cf. Hellwig's
A number of parsing approaches have been considered in connection with the DLT project, most of them modifications of parsing techniques well tried with PSGs. In this section I shall briefly mention three of these — augmented PSG (APSG), definite clause grammar (DCG), and augmented transition network
In the early stages of the DLT project two parsers were developed for a subset of English in order to compare their computational efficiency. These were based on APSGs (Winograd 1983: 377ff) and ATNs (Woods 1970;
Schubert (1987: 213) argues that far from being tied to PSG, APSG
mented in a parsing environment developed at the University of Amsterdam (van der Korst 1988). However, it stretches the meaning of 'dependency parser' somewhat to designate the APSG parser thus. Rather, it is a PSG
as it goes along. Its input is a PSG, not a DG. According to Korst the gram-
(op. cit.: 6-7). I shall not consider the APSG parser any further here.
Schubert argues that DCGs (Pereira and Warren 1981) are not inherently
To the best of my knowledge no further research has been done towards de-
literature. In fact, an ATN was used in the first DLT prototype which was
[Figure 7.4: top-level ATN for simple Danish sentences, with arcs labelled SUBJ, DOBJ, IOBJ, POBJ, PRED, PREC, VERB, ADVC, INFC, PREA, ADVA, SUBO, VC, LIA and POSTA]
ATNs are very simple and effective for parsing languages with an adjacency
in Figure 7.4 is taken from Schubert (1987: 219). It shows the top-level network for describing the structure of simple Danish sentences. Labelled
amongst the dependents of the verb. Registers are used to ensure that a verb has the correct number of dependents, e.g. that a verb has exactly one subject (as opposed to either any number of subjects or one before, and one after the
This dependency parser implements a top-down, left corner parse strategy.
[Figure: fragment of the ATN showing arcs for NOUN, PRONOUN, PLACEHOLDER, VERB and NUM]
order of the verb's dependents is fairly free. It could be argued that this works
to.
As is normally the case with ATNs, the grammar and the parser are con-
a clean separation should be maintained between grammars and parsers for reasons of clarity and modifiability (e.g. see Gazdar and Mellish 1989: 95ff). Presumably the same conclusions have been reached by the DLT team since
For the second prototype of the DLT system, a completely different approach to parsing has been adopted. In the earlier prototype, fairly standard rule-governed parsers were tried. For the second prototype, experiments are being
Probabilities can be incorporated into grammars in at least two ways. First,
of each rule actually being used in a context in which it could be used. For example, the following notation could be used to indicate that the rule n(det,*) is appropriate for 60% of all nouns and the rule n(det,adj,*) for 20% of all nouns.
This information can be used heuristically during parsing so that the rule with the highest probability is tried first. Alternatively, all possible rules can be tried and all possible analyses built for a sentence. The analysis with the highest overall probability can then be selected.
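Both uses of the figures are simple to state in code. In the sketch below the rule notation and the 60%/20% figures come from the example above, while the data structures and function names are mine.

    # Dependent frames for nouns, with usage probabilities as in the text.
    RULES = {"n": [(("det", "*"), 0.60),
                   (("det", "adj", "*"), 0.20)]}

    def frames_by_likelihood(cat):
        """Heuristic use: try the most probable frame first."""
        return [f for f, p in sorted(RULES.get(cat, []), key=lambda fp: -fp[1])]

    def analysis_probability(rule_probs):
        """Exhaustive use: score a complete analysis as the product of the
        probabilities of the rules it employs; the best-scoring tree wins."""
        score = 1.0
        for p in rule_probs:
            score *= p
        return score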
Such an approach dispenses with the dichotomy between well-formedness and ill-formedness, replacing it instead with a grammaticality continuum ranging from fairly ill-formed
above. Additionally, all other rules possible within the logic of the grammatical
('Annealing Parser for Realistic Input Language'; Haigh et al. 1988) and RAP ('Realistic Annealing Parser'; Atwell et al. 1989) projects use the technique of
with a grammar which does not rule out any structural possibilities a priori, instead assigning very low probabilities to all tree configurations not attested
some kind is produced for every sentence, including those which conventional
ical studies of text corpora. A corpus is first parsed and the analyses verified. The frequency of application of each rule is counted and then used to compute the probability of each rule. These probabilities are then projected from the 'training' corpus to the rest of the language. A rationale for allowing all logically possible rules with very low probability is that no training corpus will ever be large enough to furnish examples of the use of all rules of a natural language. By allowing every logically possible rule with very low probabil-
(van Zuijlen 1989a, 1989b), he has in practice adopted a more straightforward
to be regarded as the heart of the DLT system (Sadler 1989a; Sadler 1989b).
text fully analyzed in another language.^ This can then be treated as a resource for working out correspondences between the languages. Since the analysis of
Table 7.1: different dependency links retrieved from the BKB

Word       Links
you           17
can           10
remove         4
the            9
document      58
from          16
drawer        37
Total        151
base is of no interest: only one language is examined). For his first probabilistic parsing experiment (January to April 1990), van Zuijlen used a BKB tree bank consisting of 1400 dependency trees, representing some 22000 words from a software manual. (This is far too small a tree bank to have any sig-
links retrieved for the input sentence You can remove the document from the drawer is shown in Table 7.1 (all examples from van Zuijlen 1990).
which point towards and away from the word instances, a tally is also kept of the patterning of these links with each other. Thus, a record of the individual
different perspectives, the head perspective and the governor perspective. More

1. the governor label of the head link corresponds to the dependent label

3. the position of the governor should agree with the direction of the dependent link.
The network produced for the test sentence You can remove the document from
original 151 links found in the corpus, only 19 have fulfilled the construction
links which do not form part of any possible coherent parse tree which has a single root to which all other words are subordinate. The removal of impossible links from the network in Figure 7.6 leaves 13 links remaining in the network.
trees in a single graph with structure-sharing (van Zuijlen 1988). However, its

Evaluation  Associated with each link in the network is a pair of numerical
dependency pattern of its governor, taking the governor's other dependents into
[Figure 7.6 diagram: candidate links labelled GOV, SUBJ, OBJ, INFC, PRED, PARC, DET, ATR and PCT connecting the words of the sentence]
Figure 7.6: a dependency link network for the sentence You can remove the document from the drawer

weight and the suitability measures are merged in an adjustable proportion to yield the quality of the analysis represented by a given tree. In this way the alternative readings for the sentence can be compared and a 'best analysis' can be selected. I shall not explore the mathematics of the 'best analysis' selection method here.
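The source does not give the merging formula, so the sketch below simply assumes a linear interpolation between the two measures; the mixing parameter alpha and the function names are mine.

    def link_quality(weight, suitability, alpha=0.5):
        """Merge a link's two scores 'in an adjustable proportion'
        (assumed linear; alpha is the adjustable proportion)."""
        return alpha * weight + (1 - alpha) * suitability

    def best_analysis(trees, alpha=0.5):
        """Score each candidate tree by the total quality of its links
        (each link a (weight, suitability) pair) and keep the best."""
        return max(trees, key=lambda t: sum(link_quality(w, s, alpha)
                                            for w, s in t))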
will be very interesting to watch this research develop and to see what the performance of the parser turns out to be when it has a reasonably large corpus to operate on. In the meantime judgement must be reserved on it until more
shall refrain from presenting a more formal PARS version of the algorithm or a worked example.
provide a starting point for the process of metataxis. I have briefly noted the
PSG and definite clause grammar. I have looked in more detail at a depen-
strategy. Probabilities are used to decide the 'best' analysis when more than

Table 7.2: main features of the DLT ATN dependency parser

The main features of the DLT ATN dependency parser are summarized in Table 7.2.
based on the use of rules and probabilities generated 'on the fly' from a hand-analyzed corpus. The parser mixes bottom-up and top-down search: the actual
the parser begins by constructing as many minimal islands (i.e. word pairs) as it can and then rules out those which are not consistent with a coherent
ble 7.3.
Chapter 8

Lexicase parsers

8.1 Overview

Starosta and his graduate students at the University of Hawaii over the last two decades. It is unique in contemporary linguistic theory for a number of
this a number of papers were soon added (e.g. Starosta 1971a, 1971b). No other theory of natural language mentioned in this thesis has remained so stable for such a long time.^ Second, the theory has been widely field-tested. Lexicase grammars have been written for significant parts of around fifty dif-
disregard for the hard facts of language of which some theories are occasionally accused. A third fact which distinguishes Lexicase from its rivals is that the theory has been all but ignored in the linguistic mainstream. On first
long and which can draw on such an impressive body of descriptive material
praised. Neither of these things has happened to any great extent. Instead, it has been largely ignored. This may be due in part to the fact that the
recently (Starosta 1988). (At present the Lexicase literature runs to some 130 items.) The fact that this introductory volume has received some positive reviews (e.g. Blake 1989; Fraser 1989b; Miller 1990) may signal the awakening of interest in Lexicase (but see Turner 1990 for a searing attack on the same volume). Certainly, some of the main features which have distinguished Lexicase from other theories for most of its existence — its lexicalism, its recognition of head/dependent asymmetries, its extensive use of features — now form part
I shall not attempt to evaluate Lexicase theory here. Rather, I shall sketch the main points of the theory and examine two parsing algorithms developed for use with Lexicase grammars. Section 8.2 provides an overview of Lexicase theory. Section 8.3 describes the two Lexicase parsing algorithms.

8.2 Lexicase theory

ety of generative localistic case grammar" (Starosta 1988: 1). It is panlexicalist in the sense of Hudson (1981a), i.e. the rules of the grammar are lexical rules, expressing relations among lexical items and features within lexical
Robinson 1970; Anderson 1977; Sgall et al. 1986; Mel'cuk 1988). Dependency
generative in the traditional Chomskyan sense — the rules and representations are expressed formally and explicitly and are concerned with a speaker-hearer's
(Fillmore 1968); every nominal constituent is analysed as bearing a syntactic-semantic relation to its regent. However, it has evolved away from mainstream
use of spatially oriented semantic features. Whereas most case grammars are primarily concerned with situations and 'deep' analyses (e.g. Fillmore 1968; Schank 1975), Lexicase tends towards identifying case relations with syntactic relations (in this it accords with Anderson's case grammar (Anderson 1971)).
tion and the requirement that every verb contain a Patient in its case frame

8.2.1 Dependency in Lexicase

3. the one-bar constraint;

Before examining these constraints, it is worth noting that very few discus-
(Kornai and Pullum 1990), only one of which could be said to be equivalent to Starosta's DG.
The lexical leaf constraint

The lexical leaf constraint ensures that all terminal nodes are words. Throughout the years that PSG has been used by linguists, terminal nodes have been used to represent a number of different things besides words, for example
have been introduced into dependency trees. For example, Robinson proposes
syntax. It rules out 'empty category' analyses and the possibility of handling 'movement' by associating moved items with 'gaps'. Starosta sums up the effect of this constraint as follows:
requires this sentence to be analysed in a tree structure with exactly three leaf nodes corresponding to the three words in the sentence; the structure in
of a Lexicase tree for this sentence once we have examined the rest of the constraints.
[Figures 8.1 and 8.2: a phrase structure analysis of Stan invented lexicase (COMP, NP, INFL [+TENSE, +AGR], VP, N, V) contrasted with an analysis having only the three words as leaves]
The optionality constraint

PSG (Emonds 1976: 16; Jackendoff 1977: 36; Kornai and Pullum 1990). Notice that this does not exclude the possibility of a phrase containing more than
struction have a single head" (Starosta 1988: 12). This is misleading: con-
head; rather, they require that every dependent have a single head. The notion
is not clear what work it does. Most constructions are endocentric and have a single head. The rest are exocentric and contain at least two coheads,^ exactly one of which is the lexical head and the rest of which are phrasal heads. Two
lexical head and the noun is the phrasal head. In coordinate constructions the lexical head is the conjunction and the conjuncts are each phrasal heads.^
tric/exocentric distinction (Bloomfield 1933: 194-7) his definitions rested upon
nition, coordinate constructions are endocentric since fish, chips, and fish and

^Conjuncts may, of course, be realized as single lexical items.

chips all have the same distribution. However, by Starosta's optionality-based
which holds between a word (or words) and a whole construction. He reserves
For example, in the sentence I saw big bad John, John is the regent of big; it is also the regent of bad; but it is the head of the whole phrase big bad John.
The one-bar constraint states that "each and every construction (including the sentence) has at least one immediate lexical head, and every terminal node is the head of its own construction" (Starosta 1988: 14). This has the effect of
Every terminal node has its own one-bar projection and every non-terminal
a sister and a dependent of the main verb. Starosta argues that the absence
subject what the verb would have done if it were the head of the sentence.
and 8.2 is to reduce and simplify the range of possible structural analyses. The
Figure 8.3.

^Starosta (personal communication) names the Kielikone project (described in Chapter 6) as his source for this usage.

[Figure 8.3: one-bar analysis of Stan invented lexicase]
The sister head constraint

their dependent sisters" (Starosta 1988: 20). In other words, all grammatical
of X.
The relationship between regents and dependents is antisymmetric; regents are subcategorized by their dependents but dependents cannot impose constraints on their regents. For example, a dependent could not require its regent to precede it.

The features on lexical items constraint
The 'features on lexical items constraint' states that "features are marked only on lexical items, not on non-terminal nodes" (Starosta 1988: 23). This constraint is the final step from a standard X-bar grammar to a DG. If only the lexical items carry features, and lexical items are subcategorized by their dependent sisters, then clearly all the X-bar structure is doing is relating lexical items pair-
further. Since every node in a tree is a one-bar projection of its head lexical item, node labels are predictable and therefore redundant. Thus the analysis
Figure 8.4.

[Figure 8.4: invented [+V] with its dependents Stan [+N] and lexicase [+N]]
This looks remarkably like a traditional dependency stemma except for
the word class features have been shown. However, several different kinds of feature may appear in the lexical entry for a word. These are described in the next section.

8.2.2 Lexical entries in Lexicase

features are binary; a lexical item either has or has not got some property. The presence of property P is identified thus: [+P], its necessary absence thus: [-P]. Contextual features determine which words are dependent on which other
(34)
a [+Det]
b [-fint]
c [-[+Det]]
d [+[+N]_]
e [+_[+N]]
f ([_[+N]])
bearing the feature is a determiner. Example (34b) is a negative non-
Example (34c) is a negative contextual feature indicating that the word bearing the feature does not have a dependent determiner (relative position unstated).
the feature requires a preceding dependent noun. Example (34e) is a positive
pendent noun. Under certain circumstances the expected word may be absent (for example, in the case of 'moved' wh-words). Double contextual features are
izations would be made. However, Starosta takes the traditional view that all
up rules which are responsible for inserting all predictable features into lexical entries. These rules he divides into redundancy rules, subcategorization rules,
interpretation rules, and phonological rules. This classification is purely a descriptive convenience. Each type of rule has the same basic operation: if a set of conditions is met by a word (the left hand side of the rule) then a set of features is added to the word's entry (the right hand side of the rule).

2. inflectional features;

5. case forms.
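The uniform if/then operation described above is easy to sketch. The encoding below is mine, not Lexicase notation: a rule fires on any lexical matrix that contains its condition features and contributes its consequent features, repeating until nothing more can be added.

    def apply_rules(entry, rules):
        """entry: a set of features, e.g. {'+N', '+plrl'};
        rules: (conditions, additions) pairs over feature sets."""
        changed = True
        while changed:
            changed = False
            for cond, add in rules:
                if cond <= entry and not add <= entry:
                    entry |= add          # insert the predictable features
                    changed = True
        return entry

    # An invented redundancy rule: nouns are third person by default.
    print(apply_rules({"+N", "+plrl"}, [({"+N"}, {"+3prs"})]))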
Syntactic categories are atomic. They cannot be defined, for example, as [+N,+V]. Major syntactic categories are drawn from a very small inventory which contains the following items: noun (N), verb (V), adverb (Adv), preposi-
(Det), and conjunction (Cnjn). These major categories are divided into distributional subcategories (e.g. subcategories of N include pronoun and proper

Inflectional features

son, number, gender, case, tense, etc. These features have a central role to

Semantic features
Semantic features serve to distinguish words from each other. It is assumed that the grammar contains enough semantic features to distinguish every lexical item from every other (non-synonymous) item in respect of at least one distinctive semantic feature. In parsing, semantic features have the charac-
absolute. Thus, the verb drink might expect an object marked [+dkbl] (drink-

Case relations

Lexicase assumes five 'deep' case relations, namely AGENT, PATIENT, LO-
case relations in other Fillmorean systems are made by the semantic features

Case forms
Unlike the other features, case forms are not atomic. Rather, they are configurations of surface case markers such as case inflections, word order, pre-
case relations. They are grouped together according to which case relations they identify and on the basis of shared localistic features. Case forms are composed of grammatical features such as 'nominative' or 'ergative' and localistic features such as 'source', 'goal', 'terminus', 'surface', 'association', etc. Starosta and Nomura claim that case forms are used by the parser to recognize
Once again, this must be taken on trust as no documented examples are available.
as a case grammar but this would lead away from my primary objective of
publish a detailed explanation of the place of case relations and forms in parsing
therein are disputed in Starosta's review of that monograph (Starosta 1990).
[Figure: components of the Starosta and Nomura parser: input; pre-processor; morphological analyser; placeholder substitution; the parser proper, with passes for Det's, Adj's, Adv's, conjunctions and orphans; placeholder expansion; output]
8.3 Lexicase parsing

In this section I examine two Lexicase parsers. The first, and better docu-
uct of Francis Lindsey Jr., a graduate student at the University of Hawaii. It

8.3.1 Starosta and Nomura's parser

The principal reference for Starosta and Nomura's parser is Starosta and Nomura (1986).

Components
The pre-processor  The pre-processor replaces each word in the input sentence with a feature matrix, fully specified for all contextual and non-contextual syntactic and semantic features. If a word form in the input sentence could correspond to more than one feature matrix then the word is replaced with a 'cluster', a list of all the possible feature matrices. The out-
finds a word in the input sentence and looks it up in the grammar-lexicon. If the word cannot be found then the morphological analyzer checks to see if
searches are carried out with the stem to see if any other affixes produce homographic words. Once again, all of the possible feature matrices are stored together in a cluster.
matrices. If the only thing the feature matrices have in common is the word form then that is all the placeholder will consist of. The object of placeholder
parse can be produced for the unambiguous parts of the sentence and then,
lexical items to search for dependents. These dependents must satisfy the
not allowed to cross, i.e. Lexicase has an explicit adjacency constraint.^ As
features of the words are checked. If they are violated, the dependency link is discarded immediately. After each word pair has been linked by means of positive contextual features and checked and passed by negative contextual fea-
are checked. If the link violates the implicational features the analysis is not
clusters of items sharing more features in common. The resulting strings are passed through the parser once more to add links that become possible as the new clusters and entries become accessible. This process of placeholder expansion is repeated until all placeholders are eventually resolved into their original constituent entries. This ensures that all possible readings are obtained for a
Parsing algorithm

morphological analysis the second, placeholder substitution the third and then
placeholder substitution should not take place incrementally from left to right. However, this would not buy anything extra since the parser's input is required
effect of the interacting components is to maximize generalizations about the sentence and to proceed, iteratively, to all possible specific analyses. The process produces a maximally general analysis for the whole sentence, then it copies the analysis and adds different, more specific, details to each copy and then repeats the process for each copy. The process runs to completion for
matter how many times it is used. However, this system lacks the elegance and simplicity of a chart parser's single pass through a word string. Even if there were some way for the Lexicase parser to construct a chart-like structure
pass through the word string many times for other reasons.
The parser sweeps through the word string eight times during each iteration of the parser/placeholder expansion cycle. This is because it tries to spot particular kinds of word on each pass. The passes are ordered as follows:
outside these phrasal domains but they need never consider any links
thus become available to subcategorize the head of the phrase in which the PP is located.
sequent linking may take place.
not entirely clear why this phase exists in the parser since all determiners
lexical items, the heads are linked to eligible and accessible dependent
3 and 4 might be, their desired effect is obvious: in English determiners mark the left boundary of NP's and so linking them to their head nouns
ered, the number of linking choices should be extremely limited. Since
to them.
Each of these passes through the sentence could take place in any direction, but the unimplemented status of the parser, coupled with the general lack of published fine-grained detail, rules out any closer analysis of the algorithm.
Parsing a relatively short sentence involved something of the order of 100 passes through the sentence! No performance figures are supplied for the parser since it has never been implemented (although this is not made clear in any published description). The fact that multiple passes are required need not have a negative effect on the efficiency of the parser, since the number of passes is fixed (i.e. independent of input length). However, the fact that the same string has to be processed time and again does beg several questions about the exact nature of the bookkeeping involved.
Once a subtree has been constructed somewhere in a string, does anything prevent the parser from constructing it again on a later pass? By building references to particular kinds of lexical items into the parser, Starosta and Nomura have built in the assumptions (i) that all languages make use of the same inventory of word classes and (ii) that the appropriate order in which to analyse them remains constant between languages. The fact that the parser refers explicitly to things called 'nouns' and 'verbs' ensures that it will fail to work if it is presented with
a perfectly good grammar which happens to use different word class labels. A more robust design would have the parser invoked with two arguments: a pointer to the grammar to use and a pointer to the parse order definition to use. Any inconsistency between these two would lead to the result of the parse being not 'succeed' or 'fail', but 'error'.
8.3.2 Lindsey's parser
Less information is available describing Lindsey's parser. All of the information in this section is drawn from Lindsey (1987).
The lexicon builder
The lexicon builder takes as its input a Lexicase lexicon and a set of Lexicase rules. It uses the rules (which are, of course, statements of predictable information about lexical entries) to expand the lexical entries to produce a fully specified, full-form lexicon. In the case of homographic entries, a master entry containing the feature matrices of all the words with the same form is created. A master entry would take the following form:
(word (category shared features)
      (distribution shared features)
      (other shared features)
      (word1
        (category features specific to word1)
        (distribution features specific to word1)
        (other features specific to word1))
      (word2
        (category features specific to word2)
        (distribution features specific to word2)
        (other features specific to word2)))
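
Read as a data structure, the master entry simply factors out the features shared by all homographs. A small Python sketch, my own rendering rather than Lindsey's (the dict layout and the example homograph are illustrative), shows how such an entry could be unpacked into fully specified entries:

# A hedged sketch of unpacking a master entry: shared features are copied
# into each specific homograph's own feature set.
master = {
    "form": "bank",                      # hypothetical homographic form
    "shared": {"stress": "initial"},
    "words": {
        "bank1": {"cat": "noun", "sem": "institution"},
        "bank2": {"cat": "noun", "sem": "riverside"},
    },
}

def unpack(entry):
    """Expand a master entry into one fully specified entry per homograph."""
    return {name: {**entry["shared"], **specific}
            for name, specific in entry["words"].items()}

print(unpack(master))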
The parser
Once again, the parser examines the contextual features of a head word in order to find its dependents. It assumes that a head and its dependents form a phrase and that a phrase is complete when all dependents of a word have been found. The parser proceeds as follows (quoted directly from Lindsey 1987: 3):
1. Find the entries for each word of the input sentence in the lexicon.
2. Eliminate from consideration all words which because of their position ...
3. Determine which words must be sisters of a head because of the distributional restrictions ...
4. For all words not unambiguously assigned as sisters to some head by the above steps, determine possible alternative assignments. This step ...
5. Unpack master entries and determine which specific homonym successfully satisfies all distributional restrictions. This is done from top down ...
6. Print out those parses in which all words of the input sentence fit into ...
Steps 1 through 4 set up a list of potentially successful parses which can then be examined as the basis of alternative sentence readings once the master entries have been unpacked. Unfortunately, the published description is too terse to be really informative; the wording of steps such as 'determine which words must be sisters of a head' is never expanded upon. Thus, Lindsey's system appears more flexible than Starosta and Nomura's, but in the absence of further detail the two are hard to compare. It is not clear, for example, whether or not there is any loss of analytic accuracy on the part of the simpler system. The same lack of detail precludes a fuller description of Lindsey's algorithm.
8.4 Summary
In this chapter I have briefly reviewed the theory of Lexicase and two parsers which are based on it. It is clear that the theory is much better developed than the parsers based on it. The parsers do not make use of the full power of the theory.
Table 8.1: main features of S tarosta and N om ura’s Lexicase parser
Starosta and Nomura's parser searches top-down for dependents for different classes of word on each of several passes. Unambiguous parts of the sentence are built first and then these 'common' parts are copied to different parse trees, one for each possible reading of the sentence. The main features of the parser are summarized in Table 8.1.
Very few details are available for Lindsey's parser. It seems to share a number of properties with Starosta and Nomura's parser. For example, heads search for their dependents, and homographs are packed into clusters which can later be unpacked and tried in different parse trees. The main (known) features of Lindsey's parser are summarized in Table 8.2.
Chapter 9
was notable for two principal reasons. First, it argued that transformations have no place in linguistic theories. Instead, he argued that all grammatical and lexical (and semantic) knowledge could be viewed as a body of facts about words (Hudson 1981a). Around this time he also argued against constituent structure. The two claims are compatible. In fact, the first implies the second since a collection of facts about words could not include facts about supra-word constituents. The second implies the first since a grammar without constituents leaves the word as the largest available unit of grammatical description.
These ideas were molded into a coherent theory which came to be known as Word Grammar (Hudson 1984). Since then there has been a major revision of the WG notation and a succession of papers by Hudson.
9.2 Word Grammar theory
9.2.1 Facts about words
includes any word-length unit, however specific or general. Facts about words could be stated in a familiar formalism such as PATR-II (Shieber 1986). However, Hudson has evolved his own metalanguage, which has a quasi-English syntax and is often simpler to read than more familiar notations.
The predicate is placed in infix position rather than the more familiar prefix position, for the sake of readability. The chosen ordering is congruent with the normal order of English. The five predicates are the following:
1. is
2. has
3. precedes
4. follows
5. isa
The predicate is
X is Y
(Name1 of Name2)
^It is possible to define a WG system which has only one predicate and which makes the necessary distinctions in terms of features (see Section 9.2.3 below). However, for the sake of clarity of presentation I shall work with the five predicate system here.
[diagram: SUBJECT and OBJECT links over Ollie obeyed Ronnie, words numbered 1 to 3]
'subject' or 'agent') and 'Name2' may be either the atomic name of a word or a complex name such as those in (35):
(35)
a (subject of word2)
b (agent of (referent of word2))
In this way categorial, functional, and semantic information can be combined in a single statement.
The has predicate is used for two main purposes. First, it is used to assign features to words:
(36)
(37)
The second use of the has proposition is in specifying the dependency requirements of words:
(38)
Here the use of a quasi-English formalism is slightly misleading. The use of the predicate has in (38) does not express the fact that some particular finite verb is accompanied by a subject, but rather the general fact that a finite verb has a subject.^ Thus, it could be read as follows: 'A finite verb has a subject'. The general form of such propositions is
A has (Q B)
where A is some named entity, B is the name of a slot (e.g. 'subject') and Q is a 'quantitator'. A quantitator (Hudson's term) specifies the number of slots of a given type that an entity has:
(39)
a a X = one X required
b ano X = at most one X allowed
c mano X = any number of Xs allowed
d many X = two or more Xs allowed
e mony X = one or more Xs allowed
f no X = X prohibited
The utility of these should be fairly obvious. (39a) is used when exactly one filler is required. (39b) is used when a slot may be left empty but can never have more than one filler. For example, a noun can optionally have a dependent relative pronoun. (39c) is the least constrained: any number of fillers will suffice. For example, a noun can be modified by any number of adjectives. (39d) is used when at least two fillers are required.
^Hudson intends his theory to be based on the notion of 'prototypes' (for useful introductions see Lakoff 1985 and Taylor 1989).
The principal use for this is coordinate constructions, where a conjunction must conjoin at least two conjuncts. (39e) is used when at least one filler of the specified type is required. For example, a whole has mony parts. (39f) is a simple prohibition stating that a word cannot have a slot of some stated kind. In general, a WG grammar follows the closed world hypothesis, i.e. anything which is not explicitly licensed is prohibited. Hudson has more recently introduced a more flexible form of quantitator (Hudson 1990: 23-4). The new kind of quantitator is structured rather than atomic. Its structure is [i-j], where i and j are integers: i indicates the minimum number of fillers for the slot and j indicates the maximum number of fillers for the slot. Equivalences between the old and new systems are given in (40). I shall use the old system in all examples.
(40) a a X = [1-1]
     b ano X = [0-1]
     c mano X = [0-_]
     d many X = [2-_]
     e mony X = [1-_]
     f no X = [0-0]
The question of whether quantitators have the effect of creating multiple slots with identical properties or allowing single slots to have multiple fillers has to be worked out for any implementation, but it has no theoretical importance.
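
One way to make the quantitator system concrete is to treat each quantitator as a (minimum, maximum) pair, following the equivalences in (40). The following Python sketch is my own illustration; the function name and the representation are assumptions, not part of WG:

# A minimal sketch of the quantitator system: each quantitator maps to a
# (min, max) pair, with None standing for 'no upper bound'.
QUANTITATORS = {
    "a":    (1, 1),     # exactly one filler required
    "ano":  (0, 1),     # at most one filler allowed
    "mano": (0, None),  # any number of fillers allowed
    "many": (2, None),  # two or more fillers required
    "mony": (1, None),  # one or more fillers required
    "no":   (0, 0),     # filler prohibited
}

def slot_satisfied(quantitator, n_fillers):
    """Check whether n_fillers is consistent with the quantitator."""
    lo, hi = QUANTITATORS[quantitator]
    return n_fillers >= lo and (hi is None or n_fillers <= hi)

assert slot_satisfied("a", 1) and not slot_satisfied("a", 0)
assert slot_satisfied("mano", 7) and not slot_satisfied("no", 1)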
(41)
In these examples, the second argument has the form (a X). This use of a should not be confused with the use shown in (39a) and (40a). This version is an instance marker: it marks its argument as an instance of the general case (a X). The two versions appear in complementary distribution: the quantitator only appears in has propositions; the instance marker only appears in is propositions.
The predicates precedes and follows
precedes and follows are used to express relative linear orderings. For example:
(42)
In fact, only one of the two predicates is strictly necessary. For example, (43) shows the same facts as (42) but uses only one predicate.
(43)
a (subject of word2) precedes word2
b word2 precedes (object of word2)
There is therefore no reason why an implementation should have to include both predicates. See Section 9.2.3 for further examples of the use of positional constraints.
The predicate isa
The isa predicate is used to relate entities to more general entities. For example:
(44)
a APPLE isa common-noun
b common-noun isa noun
c noun isa word
[diagram: the entity hierarchy, in which entity dominates situation, event, action, communication, speech, and word, with theme, companion, and dependent on parallel branches]
words. As it stands, this system has no mechanism for making or using generalizations. A grammar cannot realistically specify all the properties of every word individually; it must make generalizations over classes of words.
9.2.2 Generalizations about words
top of the hierarchy is shown in Figure 9.2 (from Hudson 1990: 76).
[Figure 9.2: the top of the word type hierarchy, rooted in word]
The details need not concern us here. There is a sub-hierarchy which is rooted in the word entity and which could be described as the 'word type hierarchy'. Part of the point of the hierarchy is that a property stated of a general word type also holds of 'DOG' or any other specific common noun. Any such property is said to be 'inheritable' by the lower node from the higher node. The simplest version of the property inheritance principle is the following:
(45)
IF X isa Y,
P is true of Y
THEN P is true of X
Most generalizations which can be made about language have exceptions. These are accommodated by stating the exceptional properties in relation to the highest node for which they hold true. The property inheritance principle is then revised as follows:
(46)
IF X isa Y,
P is true of Y,
not: not: (P is true of X)^
THEN X has (Q P)
^'It is not the case that X is prohibited from having the property P'.
The usual properties are assumed for an entity unless there are good reasons to assume otherwise. For example, the usual way to form plural nouns in English is to add the S-morpheme ('m S') to the noun stem. This generalization can be made for all nouns. The relevant proposition is shown in (47).
(47)
However, there are a small number of nouns (e.g. salmon) which exceptionally do not follow the normal plural rule. These would have to be specially marked, as in (48).
(48)
In the case of words such as hoof, which have coexistent default and exceptional plural forms, the exceptional form is stated, as in (49), and nothing is added to block the default form from being generated by (47).
(49)
Thus, hoofs can be generated using (47) and hooves can be generated using (49).
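
The interplay of defaults, blocking, and coexisting exceptional forms can be sketched in Python. This is my own illustration of the inheritance logic of (45)-(49); the data structures and names are invented, and the isa chain contains only what the example needs:

# A minimal sketch of default inheritance with exceptions.
ISA = {"HOOF": "noun", "SALMON": "noun"}

# The default is stored on the most general node for which it holds.
DEFAULTS = {"noun": {"plural": lambda stem: stem + "s"}}

# Exceptions either block the default (salmon) or coexist with it (hoof).
EXCEPTIONS = {
    "SALMON": {"plural": lambda stem: stem, "blocks_default": True},
    "HOOF":   {"plural": lambda stem: stem[:-1] + "ves", "blocks_default": False},
}

def plurals(lexeme, stem):
    """Collect all plural forms, letting an exception block or coexist
    with the default inherited from higher in the isa hierarchy."""
    forms = []
    exc = EXCEPTIONS.get(lexeme)
    if exc:
        forms.append(exc["plural"](stem))
    if not (exc and exc["blocks_default"]):
        node = lexeme
        while node in ISA:               # climb the isa chain
            node = ISA[node]
            if node in DEFAULTS:
                forms.append(DEFAULTS[node]["plural"](stem))
                break
    return forms

print(plurals("HOOF", "hoof"))      # ['hooves', 'hoofs']
print(plurals("SALMON", "salmon"))  # ['salmon']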
[Figure 9.4: the hierarchy of grammatical relations, distinguishing head and dependent, predependent and postdependent, and direct, indirect, and oblique objects]
in relation to word-sized units in WG; syntax and semantics can also make use of hierarchies of other entities. The hierarchy in Figure 9.4 shows how the grammatical relations (i.e. types of dependency) are related to one another. A proposition such as (50) is thus a special case of the more general (51).
(50)
X has (a subject)
(51)
X has (a predependent)
Table 9.1: inheriting properties for w1
[table: stored propositions in the left column, propositions added for w1 (e.g. w1 isa DOG) in the right]
word order.
(52)
Each word instance is given a name such as w1 ('word 1'). Each word is analysed to establish its lexeme and morphosyntactic features. Once its lexeme has been found, the word instance inherits that lexeme's properties.
(53)
The left-hand column of Table 9.1 lists propositions stored in the grammar, while the column on the right lists the new propositions added for w1. Although absent from Table 9.1, constraints on slots are also inherited during the process.
Fuller accounts can be found in Fraser and Hudson (1990), Hudson (1990a: chapter 5), and Fraser and Hudson (1992).
9.2.3 A single-predicate system
I have already noted that the predicates precedes and follows are not both necessary. In fact, as Hudson points out (Hudson 1990: 24ff), only one predicate is needed. When every proposition is represented with the symbol ':', the grammar begins to look very similar to any other unification-based grammar. For example, the following examples show the same facts in the two notations:
(54)
(55)
a verb has ([1-1] subject)
b (quantity of (subject of verb)) : [1-1]
cat : Verb
arg1 : Subject
(56)
(57)
[diagram: dependency links over People with spare cash spend it in Capri]
The word described is the one referred to by the most deeply embedded concept to the left of the ':' predicate (i.e. 'word'). In this notation WG resembles other unification-based formalisms (e.g. GPSG, LFG, and Hellwig's DUG). This is not, however, to claim that they are identical nor that the insights typically expressed in these frameworks are the same. (For example, WG provides a much richer system of grammatical relations.) A useful exercise would be to convert WG grammars into a more familiar notation. This is a first step towards theory comparison. The next step goes beyond the scope of the present work.
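
As a first, very rough illustration of such a conversion (my own sketch, with invented names, not a published translation scheme), a has proposition can be rendered as a nested attribute-value structure in Python:

# An illustrative conversion of WG 'has' propositions into attribute-value
# structures, following the correspondence suggested by (54)-(55).
def has_to_avm(entity, quantitator, slot):
    """Render 'entity has (quantitator slot)' as a nested feature structure."""
    return {"cat": entity,
            slot: {"quantity": quantitator}}

# 'verb has ([1-1] subject)' becomes:
print(has_to_avm("verb", [1, 1], "subject"))
# {'cat': 'verb', 'subject': {'quantity': [1, 1]}}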
9.2.4 Syntax in WG
Apart from coordination, no constituent structure is used (Hudson 1989b; Hudson 1990: chapter 14).
[diagram: dependency and coordination structure for {[Big Mark] and [wee Nicki]} live in Edinburgh]
structure and its component conjuncts. Dependencies between elements of the coordinate structure and elements outside the coordinate structure are handled specially. Hudson states that, apart from the exceptional case of coordination, all other syntactic structure is carried by dependencies between single words. Figure 9.7 shows an example of such a dependency structure.
[diagrams: a dependency structure for a sentence containing want to leave; SUBJECT and VISITOR links for Cats ... adore]
Figure 9.8: the use of visitor links to bind an extracted element to the main verb
head. In this analysis, the extracted word is first bound to the main verb by a visitor link:
(58)
The structure is shown in Figure 9.8 (visitor links are drawn below the sentence). A further rule then establishes the relation between the verb and cats. The general form of the rule for identifying normal dependents is given in (59).
[diagram: OBJECT, SUBJECT, and VISITOR links for Cats ... adore]
Figure 9.9: the use of the visitor link to relate the extracted element to the main verb as its object
(59)
This is not in itself a convincing argument for the use of visitor links, since a simple rule could have allowed the object to depend on the verb directly without the mediation of the visitor. (60) offers a better example, since there is more intervening material between the extracted element and its verb.
(60)
Only one extra rule is required to copy the visitor link from verb to verb, thus passing the extracted element down to the most deeply embedded verb.
(61)
[diagram: visitor links copied from verb to verb across cats think you know adore]
Figure 9.10: the use of visitor links to interpret the object of an embedded sentence
This rule allows sentences like (60) to be analysed without difficulty. The full treatment is presented in Hudson (1988b). It must be admitted, however, that WG is a relatively informal formalism, whose formal properties and consequences are as yet undefined.
9.2.5 Semantics in Word Grammar
[diagram: words linked by dotted lines to conceptual referents c1 ... c6]
Figure 9.11: semantic structure is very similar to syntactic structure in WG
The elements in semantic structure to which words are linked are called 'referents'. The two basic kinds of relation which may hold between referents are dependency and identity. A simple example of a possible WG semantic rule which relates the two levels is (62).
(62)
The diagram shown in Figure 9.11 (from Hudson 1990: 123) should serve to illustrate the extent of congruence between syntactic and semantic structures in WG. In the diagram, the labels c1, c2, etc. are the conceptual referents of the words to which they are linked by dotted lines. Arrows between referents show dependencies between the referents themselves. This degree of isomorphism between syntax and semantics allows the semantics simply to be 'read off' the syntactic structure in many cases. One of the benefits is that a parser can produce semantic analyses for a respectable range of sentences with minimal effort required (Fraser 1988).
However, it would be foolish to pretend that all semantic analyses are equally easy. Some difficult problems remain to be solved. To date, semantics in WG has received far less attention than syntax, and it is to be hoped that this imbalance will be corrected before too long. In the meantime, the only comprehensive published account is Hudson (1990).
9.3 Word Grammar parsing
The first WG parser was written by Hudson in BBC Basic, running on a home computer with just 32K of RAM; it was, in its author's words, written with "more enthusiasm than programming skills". However modest the parser may have been, it became the inspiration for my own larger scale parser (written in Prolog) which formed the basis of my Masters dissertation (Fraser 1985). The main strengths of this system were its complete separation of grammar and parser and its simple but effective implementation of default inheritance. These strengths were not matched by attention to the most important task of a parser, namely building appropriate syntactic structures.
Hudson's 1984 monograph also attracted a group of interested computationalists. This group, and especially Derek Brough, wrote a number of very small trial parsers (Brough 1986). Hudson himself had moved on from his computational small beginnings and was now using Prolog on a much more powerful machine. Hudson (1986a) reports Hudson's first parser written in Prolog.
In late 1986 I started to work as Hudson's research assistant and began work on a new parser. Like all WG parsers developed up until then, this one incorporated an explicit check on the adjacency of words to be linked. This was very expensive computationally. Further parsers were written at University College London and, in 1987, by Phil Grantham, a postgraduate student at Sheffield. Although Hudson and I worked closely together during this period, we kept the implementational and algorithmic details of the two parsers separate. In the remaining sections of this chapter, these two parsers are described in more detail.
9.3.1 Fraser's parser
Objectives of the parser
I had two main objectives in writing my parser. The first objective, which it shared with my earlier WG parsers, was simply to see what a WG parser would look like. Once a number of trial systems had been constructed I felt that I was in a position to identify their weaknesses. The most striking weakness was the time they took to parse sentences. They ran very slowly, even when working with small grammars. One earlier parser had taken 254 seconds to find a first reading for a seven word sentence, in spite of the fact that the program was running on a single-user Sun workstation (Fraser 1988: 58). At least part of the reason for the poor performance could be traced to the implementation: the parser had made extensive use of the assert and retract predicates to add facts to, and remove facts from, the Prolog database. Alarmed by the poor performance times, I carried out a series of benchmark tests and discovered that it was considerably cheaper to maintain environment lists which could be passed between predicates than to write to and erase from the Prolog database. This problem was easily solved, and the parser described here seldom asserts and never retracts during parsing.
Amongst these were the role of the adjacency constraint in the parser and the treatment of the theory as it is presented in Hudson (1984). In this respect these parsers differ from all of the others discussed in this thesis.
When profiling my earlier parsers I discovered that most of the processing time was spent hypothesizing that pairs of words were adjacent and then discovering that they were not. Given an n word sentence, it is possible to hypothesize dependency relations between any word and every one of the other words (i.e. n-1 candidates for each word). It struck me that this was approaching the problem the wrong way round. If the parser could be prevented from ever hypothesizing a relation between two words unless they were adjacent, this ought to avoid a great deal of wasted effort. The solution was to use an explicit stack and to stipulate that the only place which could be searched for a dependent or head for the current word was at the top of the stack. The main difficulty for this approach was in establishing dependencies between word pairs when one of the words had been extracted. The solution I adopted was to distinguish between dependencies which could be established by direct search and those which could be derived from the dependency relations already in existence.
My second objective was efficiency in the discovery of all possible readings for a sentence. I did not want the parser to duplicate work needlessly, and I wanted to keep to a minimum the amount of useless structure which would be built. Needless to say, this could not prevent the parser from duplicating effort in some cases.
The parser
The parser is built around an analogy with chemistry and the process of chemical bonding. An atom with an overall positive or negative charge is an ion; in the analogy, a single word is said to have a valency. Where ions have positive charge, words have a requirement for dependents; where ions have negative charge, words have a requirement for a head. When a positively charged ion meets a negatively charged ion (and other factors permit) the two ions bond to form a single molecule. Any imbalance in charge between the two ions remains as a charge on the resulting molecule. In similar fashion, a word which requires (or allows) a dependent can bond with a word which requires a head to form a molecule unless any constraints prevent it. Any dependency slots (charges) not involved in this bond remain as properties of the molecule. Molecules can bond with other molecules in just the same way. In my model a molecule with saturated valency can serve as a sentence. For obvious reasons, the parsing procedure presented here is called the bonding algorithm.
Each molecule is a list with the four elements shown in (63).
(63)
[Negative-list, Positive-list, Subordinates, Derivable]
Each element of the Negative-list and the Positive-list is a slot of the form shown in (64).
(64)
[NUMBER, TYPE, SLOT-LABEL, SLOT-TYPE, POSITION]
NUMBER is a unique identifier for the word which has the slot (e.g. w5). TYPE is the word type of that word. SLOT-LABEL names the type of dependency relation which must hold between the word and its slot filler (e.g. subject). SLOT-TYPE is the word type required of the slot filler (e.g. noun). POSITION indicates the filler's position relative to the word which has the slot. There are three values for POSITION, namely before, after and either.
Nothing hinges on the decision to associate positive charge with dependency requirements and negative charge with head requirements rather than vice versa. The significant point to note is that they are opposite and complementary. Derivable is a list of slots together with information detailing how to derive their fillers from dependency relations already established.
Stacks  The bonding algorithm makes use of a single parse stack, and only the top of the stack is ever inspected. The parser works in a strictly left-to-right manner. It reads one word at a time, constructing for each word a frame of slots and constraints on fillers. The information which is used to build the frame is inherited from the grammar. For example, the propositions shown in (65) could be inherited for the first word of a sentence.
(65)
word-1 isa proper-noun
(66)
The same information can be expressed much more compactly when it is converted into molecule format. The molecule which would be constructed for word-1 is (67); the molecule in (68) corresponds to a finite verb with the properties in (66).
(67)
(68)
[ [],
  [ [2, finite verb, [a, subject], noun, after],
    [2, finite verb, [a, object], noun, after] ],
  [],
  [] ]
For the sake of simplicity I have ignored the requirement in English for subject-verb agreement. This can be accommodated in the framework but it would require some digression.
Parsing proceeds by bonding molecules to form larger molecules. In general, if some element of the Positive-list of one molecule can be combined with some element of the Negative-list of another molecule then the second molecule can be merged into the first to form a new, larger molecule. It will be convenient to identify the elements of Positive-lists and Negative-lists by means of the names given in (69).
(69)
An attempt can be made to bond (67) and (68) by trying to unify an element from one Negative-list with an element from the Positive-list of the other molecule. We shall say that the two unify if the following conditions hold:
Let us consider the first element of the Positive-list of (68) and the first (and only) element of the Negative-list of (67). These are shown in (70).
(70)
1. 'proper-noun isa noun' succeeds; and
therefore all of the conditions are satisfied and the molecules may bond. The result of the bond is shown in (71).
(71)
[ [],
  [ [2, finite verb, [a, object], noun, after],
    [1, proper-noun, [ano, post-adjunct], preposition, after] ],
  [ [subject, 2, 1] ],
  [] ]
Several interesting things have happened here. First of all, the matching elements — a Negative element of (67) and a Positive element of (68) — have collapsed into a single element which is recorded in the Subordinates list (read as 'word 1 is the subject of word 2'). Secondly, some of the slots of (67) have been deleted. The reason for this will soon become apparent.
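
The bond operation itself can be sketched as follows. This is my own re-expression in Python of the step just described, using dicts rather than the list format of (63)-(64); agreement, quantitators and position checks are omitted, and only the type-subsumption test is modelled:

# A hedged sketch of bonding two molecules.
TYPE_CHAIN = {"proper-noun": "noun"}     # toy isa hierarchy

def isa(specific, general):
    while specific != general:
        specific = TYPE_CHAIN.get(specific)
        if specific is None:
            return False
    return True

def bond(head_mol, dep_mol):
    """Bond dep_mol into head_mol as a dependent; return the merged
    molecule, or None if no positive/negative slot pair unifies."""
    for slot in head_mol["positive"]:
        for need in dep_mol["negative"]:
            if isa(need["type"], slot["slot_type"]):
                return {
                    # a dependent needs no further head, so its remaining
                    # negative slots are dropped
                    "negative": head_mol["negative"],
                    # unused positive slots of both molecules are pooled
                    "positive": [s for s in head_mol["positive"] if s is not slot]
                                + dep_mol["positive"],
                    # the satisfied slot becomes a subordinate record
                    "subordinates": head_mol["subordinates"] + dep_mol["subordinates"]
                                    + [(slot["label"], slot["owner"], need["owner"])],
                    "derivable": head_mol["derivable"] + dep_mol["derivable"],
                }
    return None

noun = {"negative": [{"owner": 1, "type": "proper-noun"}],
        "positive": [], "subordinates": [], "derivable": []}
verb = {"negative": [],
        "positive": [{"owner": 2, "label": "subject", "slot_type": "noun"}],
        "subordinates": [], "derivable": []}
print(bond(verb, noun)["subordinates"])   # [('subject', 2, 1)]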
Bonding is only ever attempted between two molecules at a time, namely the top two molecules on the parse stack. I shall refer to the top-most molecule as M1 and the next one down as M2. To begin with, a test is made to see if M2 can depend on M1 (i.e. if some element in M1's Positive-list will unify with some element in M2's Negative-list). If this test succeeds then the two molecules bond to form a new one. Otherwise a test is made to see if M1 can depend on M2 (i.e. if some element in M1's Negative-list will unify with some element in M2's Positive-list). Again, if they unify, the two molecules bond to form a new one. This becomes the new M1 and the next highest stack element becomes available as M2. If two molecules will not bond, then the stack remains unchanged. The next word of the sentence is read and converted into a molecule, which is pushed onto the stack. By the end of a sentence, there should be exactly one molecule left on the stack. If there is more than one then the parser has failed to find a single dependency structure for the input string.
The parser only ever searches its immediate left context. In this way the operation of the stack implicitly applies the adjacency constraint. Thus, one of the objectives of the parser has been satisfied: the parser never attempts to link words between which no dependency could possibly be involved. There is only one place to look for a head or for a dependent, namely the top of the stack. Recall that when we combined molecules (67) and (68), we produced a new molecule (71). However, some slots disappeared in the process, including the one shown in (72).
(72)
It should be obvious that the first word of a sentence cannot possibly have dependents which precede it. The parser therefore applies a heuristic which states that any M1 which has optional slots for dependents with the before order feature will have these options closed if it is found that there is nothing else on the stack. This is because there are not, and never will be, any available fillers. This accounts for the disappearance of slot (72a). Likewise, if M1 has non-optional slots for preceding fillers and there is nothing else on the stack, then no single dependency structure will ever be able to link all of the words in the string into a coherent sentence. This fact can be used to spot impossible analyses before further structure is built fruitlessly. If this heuristic were not applied, parsing could continue until the end of the input string before the failure was detected. There is a second rule which states that when an M1 becomes the head of an M2, any optional after slots the M2 may have had are removed. This is because structures of the relevant kind could never be completed. Had the slot been obligatory and not just optional, failure would result.
Thus, at the cost of two simple tests at bonding time, the amount of needless processing can be significantly reduced. I shall show below how the parser's algorithm operates.
INITIALIZATION: read input words into a list
               (in molecule format);
               C is the current word in the list;
               C:=1;
               X is a pointer;
               X:=1;
               initialize an empty stack;
               the result is stored in the variable Result.

1. IF empty(Stack)
   THEN IF obligatory_slots(before, X)
        THEN fail
        ELSE strip_optional_slots(before, X),
             push(X),
             C:=C+1,
             X:=C,
             goto(2)
   ELSE IF X -> top(Stack)
        THEN IF obligatory_slots(after, top(Stack))
             THEN fail
             ELSE strip_optional_slots(after, top(Stack)),
                  record(X -> top(Stack)),
                  pop(Stack),
                  goto(1)
   ELSE IF top(Stack) -> X
        THEN IF obligatory_slots(before, X)
             THEN fail
             ELSE strip_optional_slots(before, X),
                  record(top(Stack) -> X),
                  X:=top(Stack),
                  goto(1)
   ELSE push(X),
        C:=C+1,
        X:=C,
        goto(2).
2. IF C=e
   THEN Result:=top(Stack),
        pop(Stack),
        IF empty(Stack)
        THEN succeed
        ELSE fail
   ELSE goto(1).
In (73) the object the thesis has been extracted out of its normal post-verbal position.
(73)
The thesis I wrote
In this case, the thesis is recognized as the visitor of wrote. When the thesis becomes visitor of wrote it is absorbed into the molecule headed by wrote, and this is where the Derivable list finds its use. The Derivable list contains identity propositions. In this case, there is a proposition which equates the object of a tensed verb with the visitor of that verb, so an object dependency can be added to the parse record. In this way, the parser is able to build all of the dependencies, including those involving extraction.
Additional optimizations  The root of a sentence differs from all of the other words in the sentence (it does not require a head). This makes it easily identifiable during parsing. One useful observation is that no dependency relation will ever cross the root. Therefore, when the root is pushed onto the stack, the stack must be empty, otherwise the molecule or molecules left on the stack will never be integrated into the molecule headed by the root. This can also improve the average performance of the parser. Apart from the hand-analyzed BKB corpus compiled for the DLT project (see Chapter 7), the only corpora available to me were ones I had analysed myself. These were very small, exploratory corpora, which had no claims whatsoever to representativeness. For each sentence I calculated the stack depth which would be required if it were assumed that the sentence was being parsed by an incremental, left to right parser. The required depth seldom exceeded three, and certainly never exceeded four. If this result could be shown to be valid for a corpus of significant size, it would have implications for the parser: the chance of finding a successful result would be very slim from a parser state in which four or more molecules were resident on the stack. This constraint is fragile — after all, it is possible to construct sentences which violate it. Nevertheless, exploiting it allows unpromising analyses to be pruned while maintaining consistently high levels of efficiency. The parser analyzes a typical sentence in about a quarter of a second, depending on the number of lexical items. This is roughly 1/64 of the time taken by the parser's immediate predecessor.
Raw timings are heavily dependent on the hardware and software platforms used. Some implementation-independent measure is therefore also useful, and the asymptotic complexity of the parser is of interest. It is worth pointing out that the optimizations to the bonding algorithm described above do not affect the asymptotic complexity; they only affect the size of the constant in the calculation. Given an unambiguous grammar, the maximum amount of work required to find the dependents of a word and the head of a word is constant for all words. Therefore the parse time grows linearly with the length of the input. Although no formal complexity proof has been constructed, it is hard to see how the formal result could differ from the one arrived at informally here. Empirical experiments with the parser support this result. (The average time of 0.23-0.25 secs per sentence was cited above.) This assumes that the parser avoids being forced to backtrack. Like most parsers which make no use of charts, this one can backtrack, and the costs of backtracking have not been taken into consideration in arriving at this figure. It seems reasonable, nevertheless, to credit the parser with linear average-case complexity.
9.3.2 Hudson's parser
Objectives of the parser
Hudson had two main objectives in writing his WG parser. First, he was interested in developing a tool which would help him to write consistent large scale grammars of natural languages. It is very difficult when writing realistically-sized grammars to maintain internal consistency and to anticipate all the consequences of the addition of some new grammar rule or rules. One solution to this problem is to build a computational environment, a 'grammarian's workbench', which allows the grammar writer to modify the grammar and then to test it against a set of sentences whose correct structures are known. Ideally, any modifications to the grammar should increase the number of test sentences which the parser analyses correctly. Since Hudson's parser is intended as such a tool, grammars are written in the notation reviewed above. This is automatically compiled into a denser, less readable system-internal representation. It is not necessary to be familiar with this representation here. Second, Hudson wanted it to be easy for new grammars to be tried. This requires that the analysis system be completely separate from the grammar. In Hudson's system the two are clearly distinct, even to the extent of residing in different computer files. The grammars are collections of declarative facts which can be slotted into the parser, which itself manipulates only very general objects (which are not specific to any language or construction).
techniques which are computationally efficient but cognitively unmotivated. The theory calls for incremental processing of sentences and the generation of all possible readings. This is a simplification of what humans seem to do. There is evidence that while people do entertain alternative readings, they do so for a limited period only before selecting some particular reading for a word. This means that most of the time alternative analyses are not carried all the way through the sentence. 'Garden path' sentences (Marcus 1980) illustrate this phenomenon. Hudson's parser is thus only a model of certain aspects of incrementality and ambiguity handling, since it always processes incrementally and always builds all possible readings. Nor can it handle sentences which are to some extent ill-formed. That is, parsers of this kind do not reflect the full range of human performance. Human processing also appears to proceed in the direction of the greatest net gain in information (in this respect it is unlike most parsers; see Fraser and Wooffitt 1990). This could be expected to have a profound effect on the design of a computer model. Sadly, all WG parsers produced to date have ignored these observations.
The parser
The parser processes each sentence from left to right, one word at a time. When a word is read into the parser it is segmented into a stem and possibly an affix. The stem is used to locate where to attach the word instance in the grammar network. I shall not describe the morphological analyzer here; details can be found in Hudson (1989c: 327ff), where a fuller treatment is given. Readings of a word are numbered 1, 2, etc. Thus, any word in the system is identified by a two element list. The first element identifies its position, the second element identifies its reading. In sentence (74), the first occurrence of saw would have a nominal and a verbal reading. Thus, two distinct words would be identified: [2,1] and [2,2].
(74)
Next, the parser has to inherit properties for each reading identified. It is not necessary to do this in a piecemeal, lazy way. In this parser all inheritable properties are collected together at once and built into a feature structure associated with the word instance. The feature structure includes properties inherent to the word such as tense, number, etc. It also includes information about the word's possible dependents and, more controversially, about its head. That is, most words other than finite verbs require a head:
(75)
A finite verb, on the other hand, need not have a head but may serve as a head:
(76)
The algorithm always tries to link a word to a preceding word. It never searches the right context. It begins by trying to find a dependent for the current word, starting with the closest preceding headless word and working back towards the first word. This process continues until all dependents are found for the current word or until no more options are available. Next, the current word searches the previous context trying to find another word (only roots of partial trees are considered) which could serve as its head. If it is successful then the next word is read in, morphologically analysed, assigned default properties and made the current word in the parser. If no head is found for the current word then checks are made to see whether (on the basis of local knowledge) the current word could possibly be the sentence root or, alternatively, if it could depend on a word which has not been read in yet. If either of these options is not ruled out then the next word becomes the current word. If neither option survives, an attempt is made to treat the current word as a conjunct root and to copy any dependency relations which hold between any other conjunct roots and words outside the coordinate structure. If none of these tests succeeds then the parse has failed. The parse succeeds when the final word has been processed and all of the words are subordinate to a single root.
Hudson describes his algorithm as follows (quoted from Hudson 1989c: 334):
1. ... of W.
   (a) ... has no head;
   (b) Otherwise, go to 2.
2. ...
3. Try to take W as a word which need not have a preceding head, either ...
4. Try to take W as the root of a conjunct which shares its external relations ...
INITIALIZATION: read input words into a list;
               C is the current word in the list;
               C:=1;
               initialize a stack, Stack;
               push(Stack, C);
               C:=2;
               X is a global variable;
               the result is stored in the variable Root;
               the action 'root(Root)' succeeds if Root
               does not require a head.

1. IF C=e
   THEN goto(3)
   ELSE IF empty(Stack)
        THEN goto(2)
   ELSE IF C -> top(Stack)
        THEN record(C -> top(Stack)),
             remove(top(Stack)),
             pop(Stack),
             goto(1)
   ELSE X:=C-1,
        goto(2).

2. IF X=0
   THEN push(C),
        C:=C+1,
        goto(1)
   ELSE IF X -> C
        THEN record(X -> C),
             C:=C+1,
             goto(1)
   ELSE X:=X-1,
        goto(2).
3. Root:=top(Stack),
   pop(Stack),
   IF empty(Stack), root(Root)
   THEN succeed
   ELSE fail.
Unlike Hudson's previous parsers (Hudson 1985b; Hudson 1986a), this one is claimed to follow the grammar directly. This would be an unreasonable claim to make for the traditional version of the theory, and the matter deserves further investigation.
We have seen how structured names for word instances distinguish them on the basis of position and reading. However, this does not cover all possible ambiguities. There may also be ambiguities of attachment. For example, in sentence (77) the phrase with a telescope could modify either saw or the man. The alternative analyses are shown in Figures 9.13 and 9.14. (The standard notation is used.)
(77)
[Figures 9.13 and 9.14: the two attachments of with a telescope in saw the man with a telescope]
The parser creates new identifiers during the parse to cope with cases like this. Needless to say, two instances sharing the same position in the sentence may not enter into any dependency relation with one another. By means of the above naming convention, all possible readings can be kept apart; the sentences tested include a variety of different kinds of complement and adjunct structures.
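
The bookkeeping behind these two-part identifiers is simple enough to sketch in Python (my own illustration; the class and its names are not Hudson's):

# A sketch of [position, reading] word-instance identifiers: a fresh
# reading number is minted each time a new instance is needed at a position.
from itertools import count

class InstanceNamer:
    def __init__(self):
        self._readings = {}          # position -> counter of readings

    def new_instance(self, position):
        c = self._readings.setdefault(position, count(1))
        return [position, next(c)]

namer = InstanceNamer()
print(namer.new_instance(2))   # [2, 1]  e.g. nominal reading of 'saw'
print(namer.new_instance(2))   # [2, 2]  e.g. verbal reading of 'saw'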
9.4 Summary
Both parsers described here work bottom-up, left to right, with a single pass.
Table 9.2: main features of Fraser’s Word Grammar parser
This is made possible by the fact that WG words are subcategorized for heads as well as for dependents. The parsers differ in respect of their treatment of word instances. Hudson's parser is much slower and much more thorough, generating all possible readings. The main features of my parser are summarized in Table 9.2. Those of Hudson's parser are summarized in Table 9.3.
Chapter 10
Covington's parser
his publications. Syntactic Theory in the High Middle Ages (Covington 1984) is a study of the mediaeval grammar which informs his work in DG. Section 10.3 describes the unification-based grammatical formalism Covington assumes, and Section 10.4 describes his parser.
10.2 Early dependency grammarians
Covington traces the roots of DG back to the Modistae, a group of mediaeval grammarians starting with Martin of Dacia in the mid 1200s who attempted to make 'modes of signifying' the basis of all grammatical analysis (Covington 1984: 25). One of the most important Modistic insights was that the relation between two words in a construction is not symmetrical; one of the words is the dependens, the other the terminans. This historical work informs Covington's approach, but that is not to say that his parser slavishly follows its dictates. His parser is not an implementation of mediaeval theory; his grammar is stated in terms of feature structures of the kind that are commonly used in unification-based grammars (Shieber 1986). The following rule:
(78)
[category: X]    [category: Y]
[gender:   G]    [gender:   G]
[number:   N]    [number:   N]
[case:     C]    [case:     C]
indicates that a word of category Y with gender G, number N and case C can depend on a word of category X with gender G, number N and case C. The rule says nothing about word order. By convention the head is always written first. If the feature structure corresponding to some word unifies with the left hand side of the rule and the feature structure corresponding to some other word unifies with the right hand side of the rule, the two words can enter into a dependency relationship in which the head is the word whose feature structure unified with the left hand side.
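
The agreement-by-shared-variable mechanism which rule (78) relies on can be sketched as follows. This is a minimal Python illustration of unification over flat feature structures, with '?'-prefixed strings standing in for shared variables; it is not Covington's implementation:

# A sketch of rule application: both sides of a rule are matched against
# words while threading one shared variable environment, so that the same
# variable must receive the same value everywhere it occurs.
def unify_into(pattern, word, bindings):
    for key, pv in pattern.items():
        wv = word.get(key)
        if wv is None:
            return False
        if isinstance(pv, str) and pv.startswith("?"):
            if pv in bindings and bindings[pv] != wv:
                return False
            bindings[pv] = wv
        elif pv != wv:
            return False
    return True

def rule_applies(head_pat, dep_pat, head_word, dep_word):
    bindings = {}
    return (unify_into(head_pat, head_word, bindings)
            and unify_into(dep_pat, dep_word, bindings))

head_pat = {"category": "noun", "gender": "?G", "number": "?N", "case": "?C"}
dep_pat  = {"category": "adj",  "gender": "?G", "number": "?N", "case": "?C"}
noun = {"category": "noun", "gender": "fem", "number": "sg", "case": "nom"}
adj  = {"category": "adj",  "gender": "fem", "number": "sg", "case": "nom"}
print(rule_applies(head_pat, dep_pat, noun, adj))   # True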
A simple semantics can be built into this framework as follows:
(79)
[category:  verb  ]    [category:  noun]
[number:    N     ]    [number:    N   ]
[person:    P     ]    [person:    P   ]
[semantics: X(Y,Z)]    [semantics: Y   ]
                       [case:      nom ]
This rule allows subjects to depend on verbs and also ensures that the subject supplies the first semantic argument of the verb. This kind of simple semantics is used to manage optionality and obligatoriness in the grammar. If an argument is obligatory then it is also unique: unifying a filler with the argument instantiates a variable in the verb's feature matrix which cannot be subsequently reinstantiated. Therefore there cannot be multiple matches. If this semantic constraint were not present, the above rule could be used to provide the verb with as many 'subjects' as there were candidate nouns. A check is carried out at the end of the parse to ensure that no semantic arguments remain uninstantiated. In order to add optional dependents to a word, the rules relating to these dependents must be formulated so as not to clash with existing features.
Even variable word order languages place some constraints on order. These could be accommodated by marking rules where necessary as 'head first' or 'head last' and requiring the parser to respect the marks. Contiguity could be enforced by marking the head of the constituent in question with a feature contig which would be copied recursively to all its dependents. An explicit check would ensure that all words bearing the feature are contiguous in the input.
10.4 Covington's parser
The parser makes an initial pass through the sentence, looking up each word in the lexicon and replacing the word in the input string with its feature structure.^
^Covington's paper describing his parser (Covington 1990a) won first prize in the Social Sciences, Humanities and Arts section of IBM's Supercomputing Competition (see The Finite String 16:3, September 1990, page 31; LSA Bulletin 129, October 1990, page 16).
There is no reason why this lexical scan phase should not be interleaved with parsing proper. Two lists are maintained by the parser: 'PrevWordList', which contains all words that have been input to the parser so far, and 'HeadList', which contains only words which are not dependents of other words. At the start of parsing both of these lists are empty. At the end, HeadList should contain a single item, the only word without a head left in the sentence, i.e. the sentence root.
Covington's parsing algorithm
1. Search PrevWordList for a word on which the current word can depend, trying the most recent one on the first try; if there is none, add the current word to HeadList.
2. Search HeadList for words that can depend on the current word (there can be any number), and establish dependencies for any that are found, removing each from HeadList.
INITIALIZATION: read input words into a list;
               C is the current word in the list;
               C:=1;
               initialize two empty stacks: Stack1 and Stack2;
               push(Stack1, C);
               C:=2;
               Root is the result variable;
               X is a global variable;
               X:=1.

1. IF C=e
   THEN goto(4)
   ELSE IF X=0
        THEN push(Stack2, C),
             goto(2)
   ELSE IF X -> C
        THEN record(X -> C),
             goto(2)
   ELSE X:=X-1,
        goto(1).

2. IF empty(Stack1)
   THEN goto(3)
   ELSE IF C -> top(Stack1)
        THEN record(C -> top(Stack1)),
             pop(Stack1),
             goto(2)
   ELSE push(Stack2, top(Stack1)),
        pop(Stack1),
        goto(2).

3. IF empty(Stack2)
   THEN X:=C,
        C:=C+1,
        goto(1)
   ELSE push(Stack1, top(Stack2)),
        pop(Stack2),
        goto(3).

4. Root:=top(Stack1),
   pop(Stack1),
   IF empty(Stack1)
   THEN succeed
   ELSE fail.
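
For comparison with the PARS description, the quoted two-step algorithm can also be sketched directly in Python. This is my own rendering, with an abstract can_depend(dependent, head) test standing in for the unification of feature structures; no adjacency constraint is applied:

# A sketch of Covington's first algorithm: head search right-to-left
# through all previous words, then dependent search over headless words.
def parse(words, can_depend):
    prev_words, head_list, links = [], [], []
    for current in words:
        # 1. look (most recent first) for a head for the current word
        head = next((w for w in reversed(prev_words)
                     if can_depend(current, w)), None)
        if head is not None:
            links.append((head, current))
        else:
            head_list.append(current)
        # 2. let headless earlier words become dependents of current
        for w in head_list[:]:
            if w is not current and can_depend(w, current):
                links.append((current, w))
                head_list.remove(w)
        prev_words.append(current)
    root = head_list[0] if len(head_list) == 1 else None
    return root, links

A call such as parse(sentence, can_depend) succeeds when exactly one headless word, the sentence root, remains.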
Notice that the parser begins by searching for a word on which the present word may depend and afterwards searches for words which can depend on the present word. This is unusual; for example, my own dependency parser and those of Hudson, and Starosta and Nomura, all begin by searching for dependents for the current word and thereafter proceed to searching for a head for the current word. The reason for proceeding in this way is simple: attaching dependents first means that no potential head is ruled out prematurely. Perhaps this difference is not yet relevant to Covington's system, since he has so far tested his algorithm only against data from Russian and Latin, both of which have variable word order and rich case systems.
It remains to be seen how the parser would fare with a fairly fixed order, virtually case-free language like English. Covington claims that his algorithm can be constrained as follows:
1. When looking for the word on which the current word depends, consider only the most recent word and the words on which it, directly or indirectly, depends.
2. When looking for dependent words, consider only those most recently added.
A PARS description of Covington’s modified algorithm is given below.
1. IF C=e
   THEN goto(5)
   ELSE IF C-1 -> C
        THEN record(C-1 -> C),
             goto(3)
   ELSE X:=C-1,
        goto(2).

2. IF H -> X
   THEN IF H -> C
        THEN record(H -> C),
             goto(3)
        ELSE X:=H,
             goto(2)
   ELSE push(Stack2, C),
        goto(3).

3. IF empty(Stack1)
   THEN goto(4)
   ELSE Top:=top(Stack1),
        IF C -> Top
        THEN record(C -> Top),
             pop(Stack1),
             IF top(Stack1)=(Top-1)
             THEN goto(3)
             ELSE goto(4)
        ELSE pop(Stack1),
             pop(Stack1),
             IF top(Stack1)=(Top-1)
             THEN push(Stack2, Top),
                  goto(3)
             ELSE push(Stack1, Top),
                  goto(4).

4. IF empty(Stack2)
   THEN C:=C+1,
        goto(1)
   ELSE push(Stack1, top(Stack2)),
        pop(Stack2),
        goto(4).

5. Root:=top(Stack1),
   pop(Stack1),
   IF empty(Stack1)
   THEN succeed
   ELSE fail.
The two parsers are similar in spirit, although Hudson's parser can deal with phenomena such as coordination and movement which Covington's cannot handle. Links would not be established in the same order in both parsers since, as I have already pointed out, Hudson's parser searches for dependents first and Covington's parser searches for heads first. This difference is not trivial; in many cases it determines whether a parse is found at all, as (80) shows.
(80)
its head. This fails. Since only blue and any word on which blue depends (in this case none) may be considered as a head for cheese, cheese must be added to the list of headless words. There are no more words in the sentence, so parsing terminates. However, cheese has not been linked to its head like. The parsing algorithm has failed to find a structure for (80). Having read an earlier draft of this chapter, Covington reports that a new version of his parser, in which dependents are searched for first, has now been published (Covington 1990b). It appears that when an analysis fails, the parser uses Prolog's backtracking facility to 'unpick' what has been built back to the point where the wrong choice was made and then starts building a new analysis. This approach to recovery from failure (it is also the mechanism by which multiple parses are obtained) can be costly. In fairness, it must be said that he presents his parser as a prototype, so it is probably too early to demand efficiency from the parser. The time required to parse an n-word sentence can be estimated as follows:
2. Without backtracking, this would require, at most, examining every combination of words; with backtracking, at worst the parser must repeat all its previous work every time it parses another word.
Covington offers three responses: (i) the complexity is due to allowing discontinuous constituents, not to the use of dependency; (ii) worst case complexity is irrelevant to natural language processing (after all, humans are typically unable to process 'worst cases'); and (iii) the complexity can be reduced by putting arbitrary limits on how far away from the current word the search for heads and dependents may proceed. It is usually assumed that, other things being equal, the parser with the lowest complexity is to be preferred over any alternatives. The arguments he offers could be made for
any high-complexity parser, so they do not distinguish his parser from others of similar complexity. Neither do they justify the selection of this parser over alternatives. Phenomena such as coordination are notoriously problematic for DGs; his parser does not handle them. Since he has placed special emphasis on variable word order languages, it is not clear that he has selected the task to which dependency parsers are best-suited. After all, his parser assumes that all words are accessible to each other. The parsing complexity may be high but the algorithmic complexity is low. If, on the other hand, he modifies his parser to restrict the search, the parsing complexity is reduced but the algorithmic complexity is increased. Furthermore, unrestricted search is more than is needed for the phenomena in English. What is required is an adjacency constraint plus some principled way of analysing the small number of discontinuous constituents which regularly occur in fairly fixed word order languages like English.
It has as its primary objective the parsing of variable word order languages. I have presented two versions of the parser. Whereas the first version has no adjacency constraint at all, the second version does include one. Both versions implement left to right, bottom-up, depth-first search. Both also rely on backtracking, and only the second can use an adjacency constraint to parse some sentences correctly. Each version yields one parse at a time. The main features of both versions are summarized in Table 10.1.
Table 10.1: main features of Covington's first two dependency parsers
Chapter 11
Language Analysis') analyzes and 'understands' information from a news agency wire by using a mixture of PSG and DG (Danieli et al. 1987). PSG is used to describe the internal structure of major phrases, while DG captures dependencies between major phrases. The rationale for this approach is that phrase structure is useful for local, fixed-order configurations while, by and large, syntactic and semantic dependencies are isomorphic. This is claimed to assist in early disambiguation since semantic constraints can be applied as soon as a dependency is proposed. The system bears a striking similarity to Niedermair's divided valency-oriented parser. Applying such ideas to speech turns out to be non-trivial, as we shall see in Section 11.2. The parser is described in Section 11.3.
One of the most important differences between spoken language and written language concerns word boundaries. In written text, word boundaries are clearly indicated by the presence of a space. In speech, by contrast, while pauses may occur between words, there are no guarantees that this will happen in every case. Rather the opposite is the case: it is normal for words to run together, the final segment of one word merging with the initial segment of the following word. Thus, the speech recognition problem does not consist solely in the identification of what lies between word boundaries; the boundaries themselves must be hypothesized. For example, the same stretch of signal might be segmented as I see or icy. Given the limitations of present speech recognition technology, there will typically be several competing segmentations and, for the signal chunk between each hypothesized pair of word boundaries, there will be several different word candidates. Far from producing a single word string, a recognizer produces a 'lattice' of word hypotheses, only one path through which corresponds to the 'correct' analysis of the sentence. Figure 11.1 shows a very simple lattice based on the two words I know. This is a portion of a larger lattice presented in Phillips (1988). In reality, most lattices are likely to be much more complicated than this.
[Figure 11.1: a fragment of a word lattice for I know, with competing hypotheses including inner, honour, owner, and army]
Not all paths through a lattice are equally likely. When a speech recognizer constructs a word hypothesis it weighs the evidence for and against the validity of the hypothesis and assigns a numeric 'confidence score' to the hypothesis. If the score is high enough, the hypothesis is entered in the lattice; otherwise it is discarded. Since all words in the lattice have an associated confidence score, it is possible to rank-order paths through the lattice.
I shall show in the next section how the SYNAPSIS parser is able to analyze some sentences without considering every path, by exploiting constraints from the levels of syntax and semantics. Ideally there will be a single path through the lattice which is fully coherent. The problem of understanding a spoken sentence should thus reduce to the task of constructing a lattice and then parsing every path to find the syntactically and semantically coherent one(s).
T here is a simple reason why this approach is im practical for most purposes:
there are too m any possible paths through the lattice. Speech understanding is
a real tim e activity. Most speech interfaces are conceived w ith the aim of facil
itatin g rapid hands-off interaction w ith a com puter. T here may simply be in
sufficient tim e for all possible paths to be considered (assum ing th e constraints
an in terp retatio n w ithin the lim its of the desired response time. In the case of
‘conversational’ com puter systems such as the Sundial system (Peckham 1991),
rapid response m ay be necessary for other reasons. For example, it has been
shown th a t in everyday hum an-hum an conversation, speakers seldom leave u n
speaker B a question and speaker B does not respond w ithin the crucial %1
second period, speaker A will feel compelled to take th e initiative and begin
ization and it is unlikely th a t the phenom enon transfers exactly to hum an-
m ents (Fraser and G ilbert 1991b) in which subjects conversed w ith a sim ulated
com puter, suggest th a t a related phenom enon can be found in hum an-com puter
interactions (Fraser and G ilbert 1991a, Fraser et al. forthcom ing). Clearly, a
silence’.
The combinatorics are daunting indeed. A ten word sentence, analysed as a lattice consisting of ten edges, each having four competing hypotheses, would yield more than a million possible paths. Real recognizers typically produce more than four hypotheses for each edge. For example, the CSELT speech recognizer constructs lattices containing approximately fifty times the number of actual words uttered. I shall use a much lower figure to illustrate the nature of the problem: a lattice containing ten competing hypotheses for each word. This lattice would yield more than ten billion possible paths. Phillips reports that "An actual parser I have used would usually find a parse [for a ten word sentence] after ... for each position" (Phillips 1988). Assuming it were possible to produce one hundred parses per second for a ten word sentence, it would take about eleven hundred days to try every path.
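
The arithmetic behind these figures is easily checked (a simple illustration; the parse rate is the one assumed above):

# Path counts for a ten-edge lattice, one hypothesis chosen per edge.
edges = 10
paths_4 = 4 ** edges             # 1,048,576: "more than a million"
paths_10 = 10 ** edges           # 10,000,000,000: "more than ten billion"

parses_per_second = 100
seconds = paths_10 / parses_per_second
print(paths_4, paths_10, seconds / 86400, "days")   # roughly 1157 days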
According to Gazdar and Mellish, ambiguity is arguably the single most important problem in natural language processing. However, the word class or word sense ambiguity which preoccupies most computational linguists, and to which Gazdar and Mellish refer, is normally considered from the perspective of written text. When the input is a lattice, and the indeterminacy of the acoustic signal is compounded with the indeterminacy of the grammar, the combinatory explosion of possible paths becomes still more severe. SYNAPSIS uses the information contained in the acoustic signal to limit the search space of the parser, and the information contained in the grammar to limit the hypotheses that need be entertained. As such, its concerns are different from those of the other parsers I have described and it is not readily comparable with them at an algorithmic level. However,
its use of dependency grammar makes it particularly relevant to our present concerns. It also serves to illustrate a further application of dependency parsing. The sections that follow describe the form of syntactic and semantic information used by the parser; Section 11.3.5 describes the basic SYNAPSIS parsing strategy and Section 11.3.6 outlines a refinement of it.
11.3.1 Overview of SYNAPSIS
The SYNAPSIS (SYNtax-Aided Parser for Semantic Interpretation of Speech) parser was developed to interpret spoken natural language requests (Fissore et al. 1988). The database used during development covered geographical facts. Knowledge of syntax and semantics is used to make predictions in the interpretation of a lattice. That is, knowledge of syntax and semantics generates expectations about what may be present in the signal, and so the parser had to implement a top-down strategy. On the other hand, since it was observed that correct words were usually — though not always — amongst the best-scoring hypotheses, bottom-up evidence could also be exploited.
Since the search space is so large, it was considered appropriate to apply the strongest available constraints as early as possible. A semantic representation based on caseframes (Fillmore 1968) was adopted, for reasons which are partly practical, although they have to do with the usual range of issues which confront linguists. The designers required that the semantics be formally explicit, descriptively adequate, and compositional. Since the basic perceptive units in the lattice are words, it is desirable that single words should trigger semantic rules. This is true of caseframes, which are associated with single words. (Effectively the caseframe expresses the semantic valency of the word.) Another motivation is the desire to "correlate semantic significance with acoustic evidence", since a semantically significant word "tends to be uttered more clearly, and hence is easily recognized with good acoustical score" (ibid.). For these reasons, caseframes have been adopted in a number of speech understanding systems (e.g. Brietzmann and Ehrlich 1986).
Caseframes encode only semantic slots for a given word, and constraints on the fillers of those slots. A parser could not operate efficiently on semantic constraints alone. For example, the semantic caseframe for the verb put will indicate that it requires a PATIENT of some material type (i.e. the thing which is 'put') and a GOAL of type LOCATION (i.e. where the PATIENT is 'put'). This says nothing about the realization of these cases. For example, it places no constraints on the relative ordering of put, its PATIENT, and its GOAL. Syntactic constraints are therefore combined with the semantic constraints in order to maximize the useful information in top-down predictions.
The way in which syntactic and semantic information is combined is of particular interest. One option would have been to attach word order information directly to the case slots, after the fashion of Conceptual Dependency (Schank 1975; Schank and Riesbeck 1981). This would produce a 'semantic grammar'. There are a number of arguments against this way of tackling the problem. Firstly, it misses a lot of syntactic generalizations. Most semantic grammars are written piecemeal, with new 'concepts' being added when required and (usually barely adequate) word order features being added to each new semantic entry. There is no natural way of capturing general syntactic constructions such as relative clauses. Secondly, the grammars are necessarily tied to some semantic domain. A semantic grammar developed in the context of Italian geographical queries could not simply be transferred to another domain. Thirdly, semantic grammars are not readily modifiable since syntactic and semantic information is inextricably mixed. (For a discussion of the shortcomings of semantic grammars see Ritchie 1983.)
The SYNAPSIS solution is to maintain a strict separation between syntax and semantics during grammar development and to ensure that each is stated declaratively in its own formalism. In this way, formal rigour and consistency can be maintained. When the syntax and the semantics are completed for some phase of the project, they are automatically compiled into a unified framework similar to a feature grammar with mixed syntactic and semantic features. Grishman observes that some parsers work by matching "semantic ... patterns and then applying (limited) syntactic checks, whereas most parsers are guided by syntactic patterns and then apply semantic checks" (Grishman 1986: 121). SYNAPSIS treats neither syntax nor semantics as primary, but instead it merges the two into a genuinely mixed grammar which is nonetheless easily portable and modifiable.
Syntax in SYNAPSIS is expressed in terms of DG. The choice is natural, given the adoption of caseframe semantics. Both systems are word-based and both directly encode the notion of a head or governor and a set of dependents, notions which will by now be thoroughly familiar to DG practitioners. In order to avoid terminological commitment to regarding either syntax or semantics as basic, rules containing merged syntactic and semantic constraints are called 'knowledge sources'.
11.3.2 Dependency grammar
Formally, the grammar is a pair DG = {C, R}, where C is a set of word classes and R a set of rules of the form shown in (81):
(81)
where each Xi ∈ C and n > 0.
The rules are a slight modification of the Gaifman rule format. Notice that because the grammar is defined over word classes, rules can only mention classes, not individual words in the grammar. This makes it difficult to express the strongest possible predictions which the grammar ought to be able to make, namely predictions of single words. For example, the English verb depend requires a nominal subject and a complement which must be the word on. This observation is very robust and could be used to direct word recognition with pinpoint accuracy. It would be natural to express it with a rule such as (82):
(82)
depend = NOUN * on
but the closest the formalism can come is the following:
(83)
DEPEND = NOUN * ON
where 'DEPEND' and 'ON' are classes which each possess exactly one member.
Rules of this kind are appropriate for describing formal languages but, like standard phrase structure rules, they are ill-equipped for making the full range of generalizations relevant to natural languages. To express, for example, morphosyntactic agreement, it is necessary to augment the basic rule set. Previous chapters have documented how a popular approach has been to define DGs in terms of complex feature sets which are combinable by unification. In SYNAPSIS, agreement conditions take the form of a word class label (which must be present in the rule) followed by feature specifications in which a variable (a character preceded by '?') may be used. Where the same variable occurs more than once, each occurrence must take the same value. For example, the rule
R u le : VERB = ART A D J NOUN * ART A D J NOUN
the article, adjective, and noun preceding it m ust agree w ith it in num ber and
w ith each other in gender. T he article, adjective, and noun following the verb
are not required to agree w ith it at all b u t they m ust agree w ith each other in
gender and num ber. (This exam ple is appropriate for Italian b u t not wholly
appropriate for English’s much sparser agreem ent system ). Published accounts
do not m ake clear how the particu lar symbol in a rule (e.g. one of the two
(Shieber 1986).
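The effect of shared '?' variables can be mimicked directly with Prolog variables. The following fact is a loose illustration of the idea rather than the SYNAPSIS rule syntax (the predicate name and feature layout are assumptions):

% Num is shared between the verb and every word of the preceding
% group, which also shares a gender GL internally; the following
% group agrees only among itself (NR and GR are independent of Num and GL).
agree_rule(verb(Num),
           [art(Num,GL), adj(Num,GL), noun(Num,GL)],
           [art(NR,GR),  adj(NR,GR),  noun(NR,GR)]).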
11.3.3 Caseframes

A caseframe represents the semantic valency of a head word. It contains any number of case slots (i.e. roles) and constraints on the types of possible slot fillers. Some slots must be filled; others are optional: they correspond to necessary parts of the state, action, or entity the caseframe describes, but they need not be overtly realized. Further details are unnecessary for the purposes of the present discussion. The example shown in Figure 11.2 should serve to illustrate what a caseframe looks like:

[LOCATED-IN-REGION]
  --> (AGNT:Compulsory) -- [MOUNT+PROVINCE+LAKE]
  --> (LOC:Compulsory)  -- [REGION]

The compulsory slots must both be filled in a semantically well-formed utterance. Notice that both the syntax and the semantics are word-based.
11.3.4 Knowledge sources

The dependency rules and the caseframes are not used serially, with one rule set applied after the other during recognition. Instead, the syntactic and semantic rules are combined to form a unified representation in which both kinds of constraint apply at the same time. In principle, the combining of syntactic and semantic constraints could be done 'on the fly' during sentence processing, thus creating the required combinations only as needed. However, this would almost certainly be costly in terms of processing time and it could result in the same combination having to be performed many times during a single recognition run. The solution adopted is to pre-compile the syntactic and semantic information into its unified format.

Figure 11.3 shows a dependency rule for a present indicative verb with two dependents, one preceding it and the other following it. The following noun must agree in number with the verb. Comments are preceded by ';;'.

VERB(prop) = NOUN(interr-indir-loc) <GOVERNOR> NOUN(subj)
;; Features and agreement
<GOVERNOR> (MOOD ind) (TENSE pres) (NUMBER ?X) ...
NOUN-1 ...
NOUN-2 (NUMBER ?X)
The caseframe in Figure 11.4 describes the TO-HAVE-SOURCE relation:

[TO-HAVE-SOURCE]
  --> (AGNT:Compulsory) -- [RIVER]
  --> (LOC:Compulsory)  -- [MOUNT]

Combining the semantic information expressed in Figure 11.4 with the syntactic information shown in Figure 11.3 produces the knowledge source (KS) shown in Figure 11.5. (All of these data structures are taken from Giachin and Rullent's published descriptions.)

;; Composition
TO-HAVE-SOURCE = MOUNT <HEADER> RIVER
;; Constraints
<HEADER>-MOUNT ((H-cat VERB) (S-cat NOUN) (H-feat MOOD ind TENSE pres...)...)
<HEADER>-RIVER ...
;; Header activation condition
ACTION(TO-HAVE-SOURCE)
;; Meaning
(TO-HAVE-SOURCE ! * agnt 1 loc 0)

The first 'constraint' entry states that the head word must be a present indicative verb and that the MOUNT element must be realized as a noun. The 'header activation condition' is a flag to tell the parser how to use the KS. The 'meaning' entry is used to construct the compositional semantics of the utterance.
Having described the data structures which SYNAPSIS employs, we are now ready to consider the parsing procedures it uses. Two versions of the parser have been built: a sequential version and a parallel version.

11.3.5 The sequential parser
The rules and caseframes are used to verify hypotheses drawn from an existing lattice; they are not used to guide the construction of the lattice. One consequence of this design is that the parser waits until the speaker has finished before starting to analyse what has been said. The argument advanced by the SYNAPSIS designers is that a real-time system cannot afford to start parsing as soon as the left-hand side of the lattice has been built, since there is always the possibility that word recognition may be locally poor and this would lead to a lot of wasted effort. It is much more prudent, they argue, to wait until the end of the sentence and then begin parsing from the word in the lattice with the highest confidence score. It is likely that the best-scoring word will have been recognized correctly, and this allows fairly reliable predictions to be made about the rest of the utterance. Their claim is that this non-linear incremental process results in less wasted effort overall. How the end of an utterance is detected is not made clear, although one can imagine a number of heuristics which might be useful, such as timing pauses against some threshold, or arbitrarily insisting that a sentence may not exceed some fixed maximum duration.
When a KS is activated by a word hypothesis it is instantiated as a dependency instance (DI). The tree structure derives from the fact that the instantiated header has unfilled case slots. One way of viewing DIs is as phrase hypotheses built around a word hypothesis. A DI typically has unfilled slots. If the DI is of type T, then this can be used to instantiate further hypotheses. Suppose, for example, that a slot is found to require a filler which is syntactically a NOUN and semantically a REGION. All the word hypotheses in the lattice are checked using the VERIFY operator. When several hypotheses meet these conditions, the best-scoring hypothesis is activated while the others are stored in a 'waiting' list until such time as they are needed. When a word is added to an existing DI, the confidence score of the word hypothesis and the quality factor of the DI are combined to produce a new quality factor for the DI. The best way to compute this new quality factor is an open research question. Versions of SYNAPSIS have been tried out which calculate quality factors on the basis of joint probabilities (i.e. the sum of the word hypotheses' scores).
The parser alternates back and forth between deduction and activation cycles. Deduction starts from the best DI and attempts to extend it, either by using the DI to fill a slot in another DI (PREDICTION), or by finding a filler in the word lattice for one of its case-slots (VERIFY). Deduction continues while the best DI scores at least as well as the best unused word hypothesis. When the best DI has a quality factor worse than the best word hypothesis, the activation cycle begins and the next highest-scoring word hypothesis is instantiated as a new DI. The parse is complete when a single DI with no unfilled compulsory case slots covers the whole utterance.
A breadth-first strategy is adopted at choice points when no measure of 'goodness' is available to guide the choice. For example, it is not clear from the literature how indeterminacies caused by lexical ambiguities are resolved. One solution might be to rank-order knowledge sources having the same header type and try them in turn. The problems posed by various discontinuous constituents are less severe for Italian than for English, and the system handles them without difficulty.
Jollies

Function words cause serious problems for all speech recognition systems. Because most function words are both short and typically unstressed, they are often not recognized at all. If the function words are not recognized, they are absent from the lattice. This can cause problems for parsers when they try to work out exactly which word it is they are dealing with. In general, the longer a word is, the easier it is to identify with confidence; the shorter a word is, the harder it is to recognize, so function words are especially vulnerable.

In the SYNAPSIS system, words which are considered to have only a functional role are known by the charming name jollies. A robust speech parser ought to be able to find them if they are present in the lattice, since some jollies make a useful contribution to parsing. For example, if they are recognized they help to ensure that the correct path through the lattice is temporally coherent. Not all jollies are short, and some may have good confidence scores associated with them; a jolly is searched for when "there are substantial reasons to consider it" (Giachin and Rullent 1988: 199). All jollies are treated as terminal slots in their KS. There may be syntactic or even semantic constraints attached to them, but since jollies are assumed to have no semantic predictive power, they are not available for manipulation by the standard operators. Instead, a special operator deals with them.

There are three JOLLY-TYPES: SHORT-OR-INESSENTIAL, LONG-OR-ESSENTIAL, and UNKNOWN. The type of the jolly slot is worked out during compilation. If a jolly is short or inessential, the lattice is not searched in order to find it. However, it is necessary to assign a short time period to the slot, just in case a jolly is present. If this time period were not inserted, the correct path through the lattice would not be temporally coherent. To allow for a range of durations, the time period is given fuzzy boundaries. If a jolly is long or essential, the lattice is searched to see whether a word of appropriate duration can be found. This is done just in case a long jolly with a good score has in fact been recognized.

Consider a DI headed by the word Monte. It has two slots, neither of which has yet been filled. The SPEC slot will eventually be filled with quale, while the JOLLY slot corresponding to da is marked 'missing'. Notice that this does not necessarily mean that it is absent from the lattice.
Statistics

The version of SYNAPSIS reported in the literature makes use of around 150 KSs and has a 1011-word lexicon. No details of the linguistic coverage are available, although the grammar is said to have a branching factor of about 35. SYNAPSIS was tested on 150 lattices produced from normal utterances. A partially built DI from this domain looks like this:

Type:   TO-HAVE-SOURCE
Header: NASCE
Left:   MOUNT    Right: RIVER
(to be solved)

Utterances with missing jollies were often analyzed correctly. This figure did not increase significantly as the number of missing jollies per utterance increased. Thus, the system's performance is respectable by current standards.
11.3.6 The parallel parser

The sequential parser takes a long time to analyze each sentence in the test set (the average sentence length was 7-8 words). This is clearly too long for most practical purposes. In response to the need for better speed, a parallel version was developed in which several processes (DPSs) each run a version of the parser. However, each DPS only has access to a subset of KSs. Thus, each DPS is a specialist in part of the grammar.

Figure 11.7: a single parse tree

Distributing the knowledge base does not automatically yield a speed-up. If anything, the opposite could be expected, since there is now a communications overhead. The tree format recalls the motivation for the form of dependency trees in the DLT project, namely the need to represent dependency structure independently of word order. Each node in this tree represents a single word. Since a parse tree (DI) can now be distributed amongst several DPSs, it is possible for different parts of one tree to be held by different processors: one processor might know about KSs of type MOUNT while another knows about KSs of type RIVER. The left and right branches of the DI shown in Figure 11.7 are held by separate processors in Figure 11.8.

Figure 11.8: a distributed representation of the same parse tree

The parallel parser runs on Symbolics Lisp Machines communicating via Ethernet. The system has been implemented. This sketch of the parallel version of SYNAPSIS has necessarily been brief.
11.4 Summary

SYNAPSIS combines two word-based formalisms: dependency rules at the syntactic level and caseframes at the semantic level. DG builds on the notion of a syntactic head. Before parse time, DG rules and caseframes are combined to form syntactico-semantic knowledge sources. Something similar might have been achieved using a formalism in the style of Lexicase, with its battery of syntactic, semantic, and case features.

The special requirements of speech parsing have led to the development of a parallel version of the SYNAPSIS parser. This also marks SYNAPSIS out from the other parsers surveyed. Whereas parsers may differ in terms of the order in which they construct parse trees, each individual parser is only capable of constructing a parse tree in one order for each grammar. If SYNAPSIS were run repeatedly on the same lattice, the parallel version would most probably add branches to its parse tree in a different order each time, serving to illustrate some novel ways in which DGs can be applied in the parsing of speech.

The main features of the serial SYNAPSIS parser are summarized in Table 11.1.

Table 11.1: main features of the SYNAPSIS dependency parser
Chapter 12

Elements of a taxonomy of dependency parsing

Let the teacher, or the man of science who does not always fully appreciate grammar, consider for a moment the mental processes a boy is putting himself through when he parses a sentence, and he will see that there is in intelligent and accurate parsing a true discipline of the understanding. (Laurie 1893: 92)
In this chapter I examine a number of dependency parsing parameters to see how they compare with the corresponding parameters of PSG parsing. In so doing, I draw together the threads of the preceding survey of dependency parsers. The parameters I shall investigate are those I have used in summarizing the properties of each parser surveyed, namely origin of search (Section 12.1), manner of search (Section 12.2), order of search (Section 12.3), number of passes (Section 12.4), focus of search (Section 12.5), and ambiguity management (Section 12.6). In addition, I shall examine the role of the adjacency constraint (Section 12.7).

12.1 Origin of search

In PSG parsing, search can proceed bottom-up, top-down, or some mixture of the two. At a coarse level, the same is true of dependency parsing. Table 12.1 records the origin (and direction) of search for each of the parsers surveyed.

Table 12.1: origin of search—summary

Dependency Parser    Search origin

Do the familiar terms 'bottom-up' and 'top-down' have their usual meanings when applied to dependency parsing? A first answer must be 'yes'. Bottom-up parsing starts from the words in a string and uses a grammar to combine the words into constructions. In the case of dependency parsing, the constructions are head-dependent pairings rather than phrases, but search still begins with the words of the string.

The terms are not, however, exactly the same for dependency and constituency parsers. Whereas in PSG there may be many nodes intermediate between the start symbol at the 'top' and the word instances at the 'bottom', in DG this is not the case. The start symbol of a DG is a word or, at worst, a word class. There are no nodes intermediate between the 'top' (start) node and the 'bottom' (word) nodes attached to it. The number of nodes in a dependency tree may not exceed the number of words in the sentence whose structure the tree describes.
In the three following subsections I shall examine these concepts in more
detail.
12.1.1 Bottom-up dependency parsing

The majority of the parsers surveyed search bottom-up, i.e. eight out of twelve, with one unspecified. This probably reflects the general preponderance of bottom-up methods in working parsers.

A bottom-up PSG parser attempts to take some contiguous group of words and replace them by a single phrase; a bottom-up DG parser attempts to take a group of words (in many cases, exactly two words) and replace them by the head word alone. Because each reduction replaces two or more words while retaining a lexical head, there is a strict upper bound on the number of reduction steps required (ignoring any requirements for backtracking). Shift-reduce parsing can be adapted to DG if the rules it uses are restricted so that, for a rule of the form a -> b, where a is a single symbol and b is a string of symbols containing a, the symbol a is the head. The parser shifts words onto a stack and compares the top stack elements with the body of each rule. Two outcomes are possible:

(a) they match, in which case the matching words are popped off the stack and the word which matched the head of the rule is pushed back onto the stack;

(b) they do not match, in which case the next word is shifted.

When all words have been read: if there is exactly one element on the stack then succeed, otherwise fail.

A Prolog implementation is given in shift_reduce.pl in Appendix A.3.
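The core of such a recognizer can be written in a few lines of Prolog. This is a minimal sketch of my own rather than the appendix listing; the grammar facts follow the compiled drule/3 and root/1 format described in Appendix A, and the input is a list of word classes rather than words:

% The grammar used in the traces below.
drule(v, [n], [p]).          % [1] v(n,*,p)
drule(n, [a], []).           % [2] n(a,*)
drule(p, [],  [n]).          % [3] p(*,n)
root(v).                     % [4] *(v)

% sr(+Input, +Stack): shift symbols onto the stack; reduce whenever
% a rule body (left deps, head, right deps) sits on top of it.
sr([], [X]) :- root(X).
sr(Input, Stack0) :- reduce(Stack0, Stack), sr(Input, Stack).
sr([W|Ws], Stack) :- sr(Ws, [W|Stack]).

% The top of the stack holds, from the top down: the right dependents
% (reversed), then the head, then the left dependents (reversed).
reduce(Stack, [H|Rest]) :-
    drule(H, L, R),
    reverse(R, RR), append(RR, [H|T], Stack),
    reverse(L, RL), append(RL, Rest, T).

% Example query: sr([a,n,v,p,a,n], []). succeeds, mirroring the
% DG trace for 'Tall people sleep in long beds' given below.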
The following grammar fragments illustrate the two varieties:

PSG              DG
V -> N V P       [1] v(n,*,p)
N -> A N         [2] n(a,*)
P -> P N         [3] p(*,n)
                 [4] *(v)

PSG and DG analyses of the sentence Tall people sleep in long beds are shown in Figure 12.1.

Traces of the two shift-reduce parses are given below; parse trees are not shown. The number of each rule used to effect a reduction is given in brackets. First the PSG version:

□ A
□ A N
□ N [2]
□ N V
□ N V P
□ N V P A
□ N V P A N
□ N V P N [2]
□ N V P [3]
□ V [1]
A N V P A N
Tall people sleep in long beds
a n v p a n

Figure 12.1: PSG and DG analyses of the sentence Tall people sleep in long beds

Then the DG version:

□ a
□ a n
□ n [2]
□ n v
□ n v p
□ n v p a
□ n v p a n
□ n v p n [2]
□ n v p [3]
□ v [1]
In this case, the numbers of shift and reduce operations are identical for the PSG and DG systems. The number of shift operations is fixed for all versions of both systems: it equals the number of words in the sentence being parsed. The smallest number of reduce operations possible for any sentence is also the same, in principle, for PSG and DG, namely 1. The maximum number of reduction operations is also the same for PSG parsing and DG parsing, with one important exception which I shall describe shortly.

In PSG parsing, the number of reductions equals the number of phrases. The maximum number of phrases for an arbitrary sentence is achieved with a binary branching phrase structure tree. The number of phrases in a binary branching tree over n words is n - 1. In DG parsing, the number of reductions equals the number of dependencies found by reduction (I use the term primary dependencies to mean dependencies which are found in this way). The maximum number of primary dependencies for an n-word sentence is n - 1.

This equivalence excludes those PSGs which allow unit rewrite rules, i.e. rules of the form

a -> b

in which both a and b are single non-terminal symbols. In this case, the maximum number of reductions required is not bounded, given an arbitrary sentence and an arbitrary grammar. I know of no version of DG which would allow the equivalent of a unit rewrite rule.

The shift-reduce parser described above must have all of the relevant words on the stack before it can effect a reduction. All of the other bottom-up dependency parsers I have described establish dependency links between heads and dependents one at a time, without waiting for all of the other dependency relations involving the same head. This results in the incremental building of dependency structures. This process is centred on relations rather than constructions.
Incremental parsing can be contrasted with shift-reduce parsing in the following way. In shift-reduce parsing, words (or word class labels or feature structures) are put on the stack and grammar rules are used to license reductions. In incremental parsing, sentence words are used to pick out rules headed by these words and these rules are then put on the stack. A slightly more complex reduction rule is required (here a and b are arbitrary strings of dependent symbols, including the empty string):

The Rule of Reduction

1. If a rule of the form X(a,Y,*,b) is the top element of the stack and the next element is a rule of the form Y(*), then pop the top two stack elements and push a new element of the form X(a,*,b) onto the stack.

2. If a rule of the form X(*,a) is the top element of the stack and the next element is a rule of the form Y(*,X,b), then pop the top two stack elements and push a new element of the form Y(*,a,b) onto the stack.

If all words in the input sentence have been read and the only rule on the stack has the form X(*) and there is a rule of the form *(X) in the grammar, then the parse succeeds.

A trace of the bottom-up incremental parse of the sentence Tall people sleep in long beds, using the same grammar as before, is presented below. Stack items are separated by means of the '|' marker. The bracketed numbers indicate which clause of the Rule of Reduction licensed each reduction.
□ a(*)                       tall
□ a(*) | n(a,*)              people
□ n(*)                       [1]
□ n(*) | v(n,*,p)            sleep
□ v(*,p)                     [1]
□ v(*,p) | p(*,n)            in
□ v(*,n)                     [2]
□ v(*,n) | a(*)              long
□ v(*,n) | a(*) | n(a,*)     beds
□ v(*,n) | n(*)              [1]
□ v(*)                       [2]
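The Rule of Reduction can be prototyped directly. The sketch below is my own minimal encoding, not the appendix code: a rule X(a,*,b) is represented as a term x(A,B), words are mapped to rules by a hypothetical word_rule/2 table, and the two clauses of reduce_all/2 correspond to cases [1] and [2] of the rule:

word_rule(tall,   a([],[])).        % a(*)
word_rule(people, n([a],[])).       % n(a,*)
word_rule(sleep,  v([n],[p])).      % v(n,*,p)
word_rule(in,     p([],[n])).       % p(*,n)
word_rule(long,   a([],[])).
word_rule(beds,   n([a],[])).
root(v).                            % *(v), as before

incr([], [R]) :- R =.. [X,[],[]], root(X).   % one saturated rule left
incr([W|Ws], Stack0) :-
    word_rule(W, R),
    reduce_all([R|Stack0], Stack),
    incr(Ws, Stack).

% Case 1: top = X(a,Y,*,b), next = Y(*): absorb the left dependent Y.
reduce_all([Top,Next|Rest], Out) :-
    Top =.. [X,L0,B], append(A,[Y],L0),
    Next =.. [Y,[],[]],
    New =.. [X,A,B],
    reduce_all([New|Rest], Out).
% Case 2: top = X(*,a), next = Y(*,X,b): X fills Y's next right slot.
reduce_all([Top,Next|Rest], Out) :-
    Top =.. [X,[],A],
    Next =.. [Y,[],[X|B]],
    append(A,B,AB), New =.. [Y,[],AB],
    reduce_all([New|Rest], Out).
reduce_all(Stack, Stack).            % no (further) reduction applies

% Example query: incr([tall,people,sleep,in,long,beds], []).
% succeeds, reproducing the trace above step for step.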
A Prolog implementation of a dependency parser which makes use of the Rule of Reduction can be found in Appendix A.3. The contrast between PSG-style shift-reduce parsing (in which the DG formalism provides an upper bound on the number of reductions which the PSG formalism does not) and incremental dependency parsing will recur in what follows.

12.1.2 Top-down dependency parsing

Several of the parsers surveyed are classified as top-down parsers in Table 12.1. These each implement top-down search in distinct ways, which I shall call deep top-down parsing, shallow top-down parsing, and divide and conquer parsing.
Deep top-down parsing

The DLT ATN parser is an example of a deep top-down dependency parser. It begins by expanding networks before consulting the words of the sentence. The (simplified) main verb network for English could involve a number of jump arcs to other networks which could themselves contain jump arcs, and so on. Thus, it is possible for the parser to build quite a deep search tree on the basis of the network before the first word is ever examined. When that word is examined, it has the function of either confirming or disconfirming the hypothesized structure.

Abstracting away from the detail of the ATN implementation of this search method, I shall try to show how it might work given a more conventional Gaifman-type DG. First though, I shall reconsider non-ATN top-down PSG parsing, which works by expanding the start symbol and then expanding each successive left-most symbol until a terminal is encountered.

For example, Figure 12.2 shows a PSG analysis of the sentence A cat sleeps on the computer. The parser begins by taking the start symbol (S) and seeing how it could be expanded (S -> NP VP). It selects the left-most symbol (NP) and finds an expansion for it (NP -> Det N). Once more it selects the left-most symbol (Det), and this time it is a terminal. (For ease of exposition I ignore the distinction between words and word classes.) Now, for the first time, it is possible to establish contact between the hypothesized structure and the actual words of the sentence. An examination of the first word confirms the hypothesis so far, and the hypothesis may be extended with the expansion of the next left-most symbol, and so on.

Figure 12.2: phrase structure analysis of A cat sleeps on the computer (nodes: S, NP, VP, Det, N, V, PP)
If this process is used directly with a DG, problems are encountered. The start symbol (v) corresponds to a terminal (sleeps), but this word is not located at the start of the sentence. There are two options: either the parser can look for a rule to expand for the start symbol, or it can search for the start symbol in the sentence. In this section I shall explore the first option; in the next I shall explore the other.

Suppose the sentence to be parsed is The cat sleeps on the computer, and the start symbol expands as in (84):

(84)
v(n, *, n)

The left-most symbol can be selected. Like all symbols in a DG, this one must identify a word. Thus, it is necessary to see whether or not this matches the first word in the sentence. Since it does not, it is necessary to find a rule headed by n, such as (85):

(85)
n(det, *)

Once again, the left-most symbol must be compared with the first sentence word. This time a match is found. It is still necessary to check whether or not 'det' may occur without left side dependents, before going any further. If it can (it can), then it is necessary to try to find the next left-most dependent. This involves selecting the left-most of det's right side dependents (if it has any) and then expanding leftward once again, testing each expansion against the next sentence word (the fact that the n and v hypothesized so far occupy their actual position in the sentence as words 3 and 2 is not known until parsing successfully completes). Since this parsing method builds a structure of arbitrary depth before it finds a sentence word, I call it deep top-down parsing. It is a close analogue of top-down PSG parsing, but unfortunately it carries an overhead not found in top-down PSG parsers, namely the necessity to check each left-most symbol against the first sentence word after every expansion.
It is possible for a deep top-down parser to enter a loop from which it cannot escape. The following rules illustrate this, assuming that a right to left expansion order has been adopted:

(86)
(a) p(*,n)
(b) n(*,p)
(c) n(*)

When searching for an 'n', the first 'n'-headed rule the parser encounters tells it to search for a 'p', whose rule in turn sends it back to 'n', and so on without end. However, a dependency tree for an n-word sentence may contain at most n nodes, and the depth of hypothesized structure can be checked against this number. This test can be used to terminate fruitless searches, whether or not they loop.
Shallow top-down parsing

The alternative is to search for the start symbol in the sentence itself (i.e. to find the root symbol). Assume that the start symbol is v and once again the words of the sentence are scanned for a match. This word becomes the hypothesized sentence root. Now the grammar is searched for a rule headed by v. If one is found (e.g. v(n, *, n)), the left-most dependent is sought in the words to the left of the root. This process continues recursively until the first word is found and there is no more left context to search. At this stage, the most deeply embedded right context is selected and searched, once more from the left. When all words in that right context are accounted for, control passes back to the next most deeply embedded process which has a right context to search. In this way all of the words to the left of the root can be parsed. The same process can now begin for the root's right context. Parsing succeeds when heads have been found for all the words in the left and right contexts of the root (and the left and right contexts of each dependent, recursively).

The procedure can be stated in terms of a symbol S to search for and a word list W. Initially, S is the root symbol. For descriptive simplicity I assume here that rules allow at most one dependent on each side of the head.
PROCEDURE divide-conquer(S, W)
IF S is in W
THEN split W at S into a left part WL and a right part WR;
     search the grammar for left side (L) and right side (R) dependents
     for S;
     divide-conquer(L, WL); divide-conquer(R, WR)
ELSE fail.

Parsing works by recursively calling the same procedure with a shorter word string to search in each call. For obvious reasons I call shallow top-down parsing of this kind divide and conquer parsing.

Gaifman-style grammars permit rules allowing more than one dependent on each side of the head. A somewhat more complex algorithm is required to deal with this. It functions, when more than one dependent is allowed, by repeatedly invoking the basic divide-conquer procedure, with each dependent and the part of the string still unaccounted for serving as inputs on each procedure call. The parse succeeds if each dependent accounts for different parts of the string, and all of the string is accounted for; a Prolog sketch is given below.
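Here is a minimal Prolog sketch of the divide and conquer recognizer just described. It is not the divide.pl listing from the appendix; the predicate names are mine, the input is a list of word classes, and deps/2 generalizes mildly to a list of dependents per side:

dc_rule(v,   [n],   [n]).     % v(n,*,n)
dc_rule(n,   [det], []).      % n(det,*)
dc_rule(det, [],    []).      % det(*)
dc_root(v).

% dc_parse(+Words): Words is a sentence rooted in a dc_root class.
dc_parse(Words) :- dc_root(S), dc(S, Words).

% dc(+Class, +Words): Words is exactly a subtree headed by Class.
dc(S, Words) :-
    append(Left, [S|Right], Words),    % find the head in the string
    dc_rule(S, LDeps, RDeps),
    deps(LDeps, Left),
    deps(RDeps, Right).

% Each dependent accounts for a contiguous, disjoint chunk of its
% side, and the whole side must be accounted for.
deps([], []).
deps([D|Ds], Words) :-
    append(Chunk, Rest, Words),
    dc(D, Chunk),
    deps(Ds, Rest).

% Example query: dc_parse([det,n,v,det,n]).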
Assume that it takes a constant time k to examine a word in the string and see whether or not it is the word being sought. In the best case, the word being sought will always be at the start of the string, so the time taken to find each word will be exactly k, and parsing an n-word sentence with an unambiguous grammar takes time proportional to kn. In the worst case, the word being sought will always be at the end of the search string. Thus, for an n-word sentence, it will take kn to find the sentence root, k(n - 1) to find its first dependent, and so on. Thus, divide and conquer parsing with an unambiguous grammar takes, at worst, time proportional to kn + k(n - 1) + ... + k, i.e. quadratic in n.

Now assume that the grammar is ambiguous. In the worst case, any word may have to be tried in any position, giving

kn x k(n - 1) x k(n - 2) x ... x 2k x k

steps, so the time taken to parse an n-word sentence with an ambiguous grammar is, in the worst case, proportional to n!. (For n = 3, for instance, the product is 3k x 2k x k.) Presumably this figure can be improved upon substantially in the average case.
The divide and conquer variety of shallow top-down dependency parsing has not previously been described in the literature, unless this is what Hays intended: he gives only an outline sketch of his top-down parser and it is not clear if he ever implemented it.
The attraction of the divide and conquer variety of parser lies not in the serial version of the algorithm, but rather in the parallel version. What the parser does is to take a string, divide it in two, decide what to search for in each half, and then proceed to repeat this process for each half. Once a string has been halved, search in one half can take place independently of search in the other. It is necessary for both searches to succeed in order for the original search to succeed, but otherwise there is no connection between the two. Thus, every time a process divides a string it can activate two new processes, one for each substring. The original process simply has to wait to receive the root of the subtrees describing its left and right contexts, in which case it can succeed and inform the process which created it. Alternatively, one of the processes it spawned will fail to find what it was looking for, in which case the original process fails too.

Consider the case of parallel divide and conquer parsing with an unambiguous grammar: the picture does not change dramatically. The best and worst case parsing times remain the same. In the best case the word being searched for is always found first (O(n)). In the worst case the word being searched for is always at the end of the string (O(n^2)). However, the ambiguous case is different: a search for an ambiguous word can activate as many processes as there are different readings for that word. Each process would then be required to find all possible dependents for a word, given a dependency rule. Thus, all possible dependency trees of depth one would be found concurrently. The principal cost would then lie in communication, rather than in search. Much more work needs to be devoted to this topic.

Divide and conquer parsing, in its capacity to divide a string into two substrings, each with a separate 'things to look out for' list, appears to have no counterpart in PSG parsing.^

^Notice that this kind of parsing has a lot of similarities to old-fashioned schoolroom parsing: 'first find the main verb, then find its subject and its objects, then...'
In describing their Lexicase parser, Starosta and Nomura claim that their parser does not work top-down from a distinguished sentence root. However, it does try to expand nodes which have been designated a priori. For example, the first step of their algorithm reads ... (Starosta and Nomura 1986: 131). What is this if not an attempt to build structure by expanding categories in a designated order, not necessarily starting with the root symbol?

12.1.3 Mixed top-down and bottom-up dependency parsing

The CSELT parser implements a mixed top-down and bottom-up strategy. It begins by selecting a word in the sentence, not on the basis of some distinguished start symbol in the grammar but rather on the basis of the recognition confidence scores in the word lattice.
An attempt is then made to find a rule headed by this kind of word. When a rule is found, it is associated with the word in the lattice. The rule is used to search top-down for dependents for the word. When dependents are found, the cycle repeats itself for each of them. The two opposite search approaches can be interleaved very simply and efficiently.

What this survey of search methods illustrates is the proximity of 'top' and 'bottom' in dependency structures. The 'top' of a DG search space is a symbol which is also a symbol in the string. Here then is potential cause for confusion, even equivocation; but here, too, is something which clearly distinguishes dependency parsers from PSG parsers. It is this proximity of 'top' and 'bottom' which makes shallow top-down parsing possible. One can imagine something like a shallow top-down PSG parser, for example, one which uses an X-bar or lexicalized grammar to identify the sentence root and each of its subordinates. However, such a parser would still have to negotiate intermediate nodes. A shallow top-down dependency parser may never loop indefinitely, since every search is anchored to a word in the string. To summarize:

1. The search path between the start symbol in a PSG and the string to be parsed may be arbitrarily long; in a DG the 'top' is itself an element of the string.

2. The proximity of 'top' and 'bottom' allows dependency parsing search to alternate simply and usefully between top-down and bottom-up modes.
12.2 Search manner

There is an exact correspondence between manner of search (i.e. depth-first versus breadth-first) for PSG parsers and manner of search for dependency parsers. Either a parser extends one search path as far as possible (depth-first) or it extends all search paths together, one step at a time (breadth-first). In scored search, the best-scoring option is selected at each choice point. This too has an exact PSG counterpart. Standard techniques for pruning problem spaces can also be employed in dependency parsers, e.g. beam search, which takes a middle line between depth-first and breadth-first search. Table 12.2 records the manner of search for each of the parsers surveyed.

Table 12.2: manner of search—summary

Dependency Parser    Search manner
12.3 Search order

There is a limit to the number of possible search orders for an n-word sentence. (By 'search order' I mean the order in which words are considered for inclusion in the analysis.) The orders adopted by the parsers surveyed are summarized in Table 12.3: eight out of twelve parsers operate left to right, with three search orders unspecified. None operates right to left, but I can see no reason in principle why any of these parsers should not be able to search in this way with equal success.

Left to right order has obvious attractions. It is usually the order in which sentences are presented to the parser, and it is not necessary to wait until the last word has been typed or spoken before parsing can begin. There is particular interest in left to right parsing when the parser not only considers the words in the order in which they appear in the sentence, but also adds them to the developing syntactic structure in (more or less) that order, thus allowing the sentence to be interpreted incrementally.

Table 12.3: order of search—summary

Dependency Parser    Search order
Incremental interpretation is of interest both to psycholinguists (e.g. Marslen-Wilson and Tyler 1980) and to computational linguists who want to build responsive interactive systems. One variety of Categorial Grammar builds on the theory of combinators (Curry and Feys 1958; Turner 1979). This variety of CG is known as Combinatory Categorial Grammar (CCG). A property of combinators is that the order in which they apply is unimportant; the result is always the same. This (along with the rules of functional composition and type raising) leads to the possibility of producing a strict left to right analysis of a sentence. The price paid is that the same analysis can be derived in every order. This leads to the so-called spurious ambiguity problem (also known as the problem of derivational equivalence). The price of being able to interpret a sentence left to right incrementally is thus considerable extra work, and much of the development of CCG parsers has been devoted towards trying to solve the spurious ambiguity problem (Hepple 1987). Proposed solutions either eliminate equivalent derivations at parse time, at the price of a checking overhead, or place restrictions on the grammar.

By incremental interpretation I mean the following: as soon as two words which bear a direct dependency relation to each other become available in a sentence (i.e. as soon as the second word is read), the words can be related and accordingly interpreted. Thus, a dependency parser need not wait until a complete constituent has been built before interpretation can take place.^

All of the surveyed DG parsers which operate left to right with a single pass permit incremental interpretation of this kind. Other parsers use different orders. The CSELT parser always selects the highest-scoring word to process next, regardless of its position. The probabilistic DLT parser enters edges in a graph and then tries to navigate through the graph. There is no necessity for the edges to be entered in any specific order, and it is easy to imagine edges being entered in parallel. In summary, the range of search orders resembles that of conventional PSGs, with most parsers opting for a left to right approach in practice.

^Except in the case of shift-reduce dependency parsers of the sort shown in shift_reduce.pl in Appendix A.3.
Starosta and Nomura suggest that the choice of search order should be guided by properties of the language being parsed:

[The Lexicase parser] scans from left to right or vice versa, depending ...

Search order interacts with the focus of the parser. A left to right parser in which heads seek dependents would have to read up to the final word of a sentence in which all dependents precede their heads before it could build any structure. A left to right parser in which dependents seek heads would build almost all structure before reaching the final word. A parser in which heads and dependents seek each other would fall somewhere between these extremes.
12.4 Number of passes

The number of passes made by the parsers in the survey, by which is meant the number of times the read head of each parser scans a sentence during the parse, is summarized in Table 12.4. Nine of the parsers make a single pass through the sentence. Recall that confining the number of passes to one is a prerequisite for incremental interpretation.

The parsers of Hellwig, Lindsay, and Starosta and Nomura all require more than one, and possibly very many, passes. Starosta and Nomura's parser is organized around a repeated expansion cycle, and there may be many such cycles. In general, increasing the number of passes increases the inefficiency of a parser (since the same symbols are examined more than once), exactly as for a PSG parser, except where this interacts with certain search focus variables, as discussed below.

Table 12.4: number of passes—summary

Dependency Parser    Number of passes
12.5 Search focus

There is no PSG parsing parameter corresponding to what I shall call 'search focus'. A discussion of PSG parsers would not contain such a section. Dependency parsing is driven by the search for dependency relations between words. Suppose that X and Y are two words; there are a number of distinct ways in which a relation between them may be sought, and I refer to the option chosen by a parser as its 'focus of search'. The parsers surveyed identify eight different foci of search. These are summarized in Table 12.5.

Table 12.5: focus of search—summary

Dependency Parser    Search focus
12.5.1 Network navigation

Network navigation parsers are of marginal interest in this context since they do not search directly for relations between the words of the sentence being parsed. The only example of a network navigation parser in this survey is the DLT ATN parser.

12.5.2 Pair selection

The earliest dependency parsers worked by taking pairs of words from the sentence being parsed and consulting a look-up table to find out whether or not a pair of words could be linked. I described the two relevant operations of the RAND parser as 'pair selection' and 'agreement testing'. Pair selection involved looking up a 4000 x 4000 matrix to find out whether or not the words could be linked and, if so, which was the head and which was the dependent.

The focus of search is thus a pair of words. As we shall see, all of the other foci involve searching for a partner for a single current word.
12.5.3 Heads seek dependents

Several of the parsers surveyed (Hays' top-down parser, the Kielikone parser, the Lexicase parsers of Starosta and Nomura, and Lindsay, and my Divide and Conquer parser) always search for dependents for the current word. In the course of searching for a dependent (A) for the current word (B), the word which, in reality, should be the current word's head (C) may be tested to see if it can be a dependent of the current word. The test will fail and search will move on to consider another word. The inverse dependency relationship will not be tested until word C becomes the current word, at which point the original word B can be attached as its dependent. This focus is compatible with both top-down and bottom-up processing, as the surveyed systems illustrate: Starosta and Nomura's parser, for example, operates bottom-up.
12.5.4 Dependents seek heads

Hellwig's parser illustrates the fact that a diametrically opposite search focus also works. In his parser, all search is directed towards finding a head for the current word. Notice, however, that in his system words do not subcategorize for their heads. Rather, it is necessary to go and look in the subcategorization frames (slots) of other words in order to see if the current word can depend on them.
12.5.5 Heads seek dependents or dependents seek heads

The CSELT parser, as we have seen, alternates between top-down and bottom-up processing, according to the current state of the parse and the lattice. It also alternates between searching for dependents (VERIFY) and searching for a head (PREDICTION). The exact progression from one search focus to the other cannot be defined a priori since this depends on the recognition confidence scores in the lattice.
12.5.6 Heads seek dependents and dependents seek heads

The DLT probabilistic parser works by searching an annotated corpus for every occurrence of each word in the sentence. A record is made of all of the upward and downward relations in which each word is found to participate. These relations then serve as templates of relations into which each sentence word could possibly enter. Some pairs of templates will be inverse copies of each other, and these select each other during a process analogous to unification. Thus, all words search for all of their heads and dependents, and all words are simultaneously the objects of search.
12.5.7 Heads seek dependents then dependents seek heads

The WG parsers written by Hudson and myself begin by searching for dependents for the current word. Once all available (i.e. adjacent) dependents have been found, the focus of search shifts, and a head is sought for the current word. The insight embodied in this strategy is that, under normal circumstances in a relatively fixed word order language like English, the head of a word does not intervene between that word and its dependents, whereas the dependents of a word may intervene between that word and its head.

The rationale for changing the focus of search for the current word is that it allows the parser to construct as much structure involving the current word as possible before moving on. In fact, it makes it possible to build structure incrementally in a single linear pass through the sentence. This is not possible with either of the single-focus strategies described above.

The parsers which search for dependents only are Hays' top-down parser, the Kielikone parser, the Lexicase parsers of Starosta and Nomura, and Lindsay, and my Divide and Conquer parser. I have previously described Hays' top-down parser and my Divide and Conquer parser as single pass parsers, but this is slightly misleading since the single pass tracks not from left to right, but from root to leaves of the dependency tree. This point was also made by Proudian and Pollard (1985) and quoted above. I have also described the Kielikone parser as a single pass system but this too disguises some important details. Whenever a dependent cannot be found for the current word, search is suspended and another word becomes current. Thus, while words enter the parser one at a time from the left and there is never any attempt to perform the same operation on the same word more than once, words do not become current in strict linear order from left to right through the sentence. The same word can become current for several non-consecutive periods of time. Without this flexibility, the parser would not be able to parse most sentences. Both of the Lexicase parsers make many passes through the sentence. The motivations and effect are much the same as for the Kielikone parser, although the Kielikone parser achieves its goal with much greater efficiency.

Only Hellwig's parser searches for a head for the current word without ever searching for dependents. As the preceding discussion suggests, this focus will not work for a single pass parser; Hellwig's parser makes multiple passes through a sentence.

Because the WG parsers search first for dependents and then for heads for each word, they are able to parse in a single linear pass from the beginning to the end of the sentence. Once a word has been passed it need never become current again. Thus, the strategy of seeking dependents and then seeking heads for the current word facilitates weak incremental interpretation.
12.5.8 Dependents seek heads then heads seek dependents

An earlier version of Hudson's parser searched first for a head for the current word and then for its dependents. I have shown how this ordering is less effective than its opposite for a language like English. Hudson agrees with this analysis and now advocates searching for dependents before heads.
12.6 Ambiguity management

The ambiguity management techniques used by the parsers surveyed are summarized in Table 12.6. This thesis describes a substantial number of dependency parsing techniques, introduces some new ones, and mentions quite a few more in passing. Clearly, a significant amount of effort has been and is being directed towards extending what is known about dependency parsing. However, very little of this effort has yet been directed towards the management of ambiguity in dependency parsing.

Table 12.6: ambiguity management—summary

Dependency Parser    Ambiguity management
Some information is available on ambiguity management for eleven of the parsers surveyed. Several can deliver at most a single analysis, no matter how many possible analyses there are for the sentence being parsed. The DLT ATN parser either finds an analysis or fails. It cannot undo any incorrect choices which may have led to a dead end in the parsing of an otherwise acceptable sentence. Hays' bottom-up parser also delivers at best a first parse. Those parsers which can deliver more than one parse have the capability to deliver a larger number. In fact, they may build all or most of any possible alternative parse trees. The DLT probabilistic dependency parser selects the parse which has the best global score; the CSELT parser likewise ranks each analysis in respect of its global score, which is some function of the recognition confidence scores of the words used in the analysis.

Hellwig represents lexical ambiguity in terms of a 'placeholder' or 'master entry' which contains only the intersection of the features of a word's different readings. (In his system there is no formal difference between lexical and syntactic ambiguity.) As much structure as possible is built using the shared master entry before it is split into some number of disjoint, more specific structures. This calls for multiple parser passes, but it is supposed to deliver all readings for an ambiguous sentence. No figures are available of this ambiguity management strategy in operation. I have been able to use backtracking, where required, to generate all possible parses. Both of Covington's parsers make use of backtracking in the same way. My Bonding parser also uses backtracking to undo mistakes and to generate multiple analyses. Hudson's parser builds all possible parse trees in parallel. Again, this works, but at potentially great cost.
Hellwig's dependency WFST parser has the only system for managing ambiguity in this survey which could form the basis of an efficient solution. WFST parsing is known to be an effective way of avoiding duplication of effort in finding all possible parses for some sentence. WFSTs have traditionally been used with PSGs, but nothing prevents their use with DGs; my own implementation appears in Appendix A.3. A parser based on the same principles of WFST usage to minimise search is more efficient than otherwise equivalent parsers which do not check a data structure of intermediate results before searching. However, even greater efficiency can result if a table is used to record current hypotheses as well as completed substructures, i.e. a chart proper rather than a WFST in which only complete substrings (the result of chains of hypotheses) are stored.

What does a hypothesis look like in a standard PSG chart parser? Suppose the grammar contains the rule S -> NP VP. Three dotted versions of it may be recorded in a chart.
(a) S -> .NP VP
(b) S -> NP .VP
(c) S -> NP VP.

Version (a) hypothesizes an S for which no supporting material has yet been found. In (b), the movement of the dot in the right hand side of the rule to a position after NP indicates that an NP has been found, thus offering partial support for the hypothesis. The position of the dot at the right extreme of the right hand side in (c) indicates that the whole hypothesis has been confirmed.

Dotted rules are recorded as edges between numbered word boundaries: the first word in a string is usually identified by the edge which goes from node 0 to node 1, and an edge spanning the first three words runs from node 0 to node 3. Following Gazdar and Mellish (1989: 194ff), I shall represent an edge as a triple of start node, finish node, and dotted rule. Initially, an inactive edge (one with the dot at the extreme right hand side of the rule hypothesis) can be placed in the chart for every word class assignment allowed by the grammar for the words in the sentence.

New edges are licensed by two rules; I describe the bottom-up version of only one of these here. Proceeding bottom-up: if the chart contains an inactive edge <i,j,A -> W1.> and there is a rule in the grammar of the form B -> A W2, add an edge <i,i,B -> .A W2> to the chart. The fact that the new edge begins and ends at the same node simply results from the fact that no part of it has yet been attested in the string. The fundamental rule combines an active edge with an adjacent inactive one: given edges <i,j,A -> W1 .B W2> and <j,k,B -> W3.>, add edge <i,k,A -> W1 B .W2> to the chart (Gazdar and Mellish 1989: 195).

Dotted rules can be adapted to DG. In a dotted dependency rule, everything to the left of the dot has already been attested and nothing to the right of the dot has yet been attested. Thus the following sample dotted dependency rules are possible.

(a) verb(.noun,*,prep)
(b) verb(noun,.*,prep)
(c) verb(noun,*,.prep)
(d) verb(noun,*,prep.)

(The head verb itself counts as attested once the dot has passed the position in which it occurs.)
Example (a) hypothesizes a verb with a preceding nominal dependent and a following prepositional dependent, where nothing has yet been found in support. In example (b), the noun has been found, and in (c), the head verb has also been found. In example (d), the dot is at the extreme right hand side of the body of the rule, thus indicating that the whole structure has been attested and the edge is now inactive.

The bottom-up and fundamental rules of PSG chart parsing can also be adapted for dependency chart parsing. The bottom-up rule: if the chart contains an inactive edge <i,j,A(W1.)> and there is a rule in the grammar of the form B(A,W2), add an edge <i,i,B(.A,W2)> to the chart, where A and B are categories and W1 and W2 are strings of categories. The fundamental rule: if the chart contains edges <i,j,A(W1,.B,W2)> and <j,k,B(W3.)>, where A and B are categories and W1, W2, and W3 are strings of categories interspersed, possibly, with the head marker and attested words, then add edge <i,k,A(W1,B,.W2)> to the chart.
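These two rules translate almost directly into Prolog over a simple edge representation. The fragment below is my own sketch, not the appendix code; the edge/5 and rule/2 term shapes are assumptions:

% A dotted dependency rule X(W1 . W2) is held on an edge:
% edge(I, J, Cat, Done, ToDo) spans chart nodes I..J; Done lists the
% symbols already attested, ToDo those still right of the dot.
% Grammar rules are rule(Head, Body), with '*' marking the head slot:
rule(v, [n,*,p]).     % v(n,*,p)
rule(n, [a,*]).       % n(a,*)
rule(p, [*,n]).       % p(*,n)

% Bottom-up rule: a complete A-edge starting at node I licenses an
% empty B-edge there for every rule whose body begins with A. (Rules
% whose body begins with '*' are triggered by scanning the head word
% itself, which this sketch omits.)
bottom_up(edge(I,_,A,_,[]), edge(I,I,B,[],Body)) :-
    rule(B, Body),
    Body = [A|_].

% Fundamental rule: an active edge whose next symbol is B combines
% with an adjacent complete B-edge; the dot moves over B.
fundamental(edge(I,J,A,Done,[B|Rest]), edge(J,K,B,_,[]),
            edge(I,K,A,[B|Done],Rest)).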
If these rules of dependency chart parsing are applied, all possible dependency structures (and sub-structures) for an input string can be produced efficiently. A file in Appendix A.3 implements a dependency chart parser. Careful comparison of this file with gazdar_mellish.pl will reveal that the two are virtually identical in most respects, particularly in their control structure. One difference worth noting is that dependency grammar rules of the form X(*) have no direct PSG correlates. They cannot be used as the basis for hypotheses, since an equivalent hypothesis edge would predict nothing beyond the word itself. In this they differ from unit rewrite PSG rules, which do generate hypotheses. However, this difference does not interfere with the basic control structure of the parsing algorithm.
12.7 Adjacency as a constraint on search

Most of the parsers surveyed constrain search by requiring that a head and its dependent be adjacent. This is most clearly illustrated in the case of parsers like my Bonding parser, which only needs to look at the top of a stack. This constraint is also built into most PSG parsers, since phrases are typically contiguous. At the opposite extreme, some parsers allow a dependency to be sought between any two words in a sentence.

Systems like Kielikone and Hudson's parser operate within the constraints of a modified definition of adjacency. In Hudson's system, for example, a 'moved' word is bound by a dummy dependency relation to a nearby word until a later stage of the parse establishes a link between it and its actual head. This requires the principles of dependency structure to be relaxed so that a word may depend on the same head by means of more than one dependency relation (i.e. the moved word must be related by the dummy relation to one head and by a genuine relation to the same or another head).

The dependency approach to parsing has the advantage of making a number of constraints explicit which are usually implicit in PSG. In this way, it allows the grammar writer and the parser designer to consider each constraint independently and to experiment with different versions of the constraints. For example, Hudson found the adjacency constraint to be too tight for his purposes so he revised it. He is not alone; almost all DG theories and a number of the parsers surveyed relax adjacency in some way. Tinkering with the basic constraints of PSG in this way is almost unheard of (although when someone does this it tends to revolutionize the way linguists think).

The challenge is to constrain search as tightly as possible while maximizing the number of phenomena which can be covered. The strict adjacency constraint built into many of the parsers surveyed is too strict to allow for the parsing of variable word order languages. However, even variable word order languages place some limits on discontinuity.

Hellwig has taken an interesting step in exploring one way in which well-motivated relaxations of adjacency might be implemented. This involves increasing the search space during parsing so that the top stack element is not the only one to be examined. However, search in his system is still strictly bounded: the parser examines the top stack element in a first parsing cycle, and then searches the next-to-top element in the next parsing cycle. Thus, the claim implicit in the design of the system is that a dependent may not be separated from its head by more than one subtree. In this way, head-dependent pairs which are not adjacent in the standard sense can be found, so long as they satisfy this looser requirement. Unless notions like adjacency are strictly and formally defined in the systems assumed, all results will be uncertain at best and useless at worst. Regrettably, strict formal definition has been the exception rather than the rule.
12.8 Summary

The purpose of this chapter has been to identify the main parameters of dependency parsing and to draw out some principles and techniques. Variation was found in search origin, search manner, search order, number of passes, search focus, ambiguity management, and the use of adjacency to constrain search. Substantial similarities with standard PSG parsing were found. The main differences concern search origin, search focus, and the use of an adjacency constraint.

DG trees can be seen as a special case of PSG trees in which every node directly dominates exactly one terminal symbol. One consequence of this is that familiar terms, such as 'top-down' and 'bottom-up', cannot be borrowed into dependency parsing without modification. I have redefined such terms for the purposes of dependency parsing, and have added some new distinctions, such as the distinction between 'deep' top-down parsing and 'shallow' top-down parsing.
For a given word, the object of search may be to find a head for the word, or to find its dependents, or both. The parsers surveyed exhibit eight distinct search foci, although others may be possible. The issue of what to search for has no close analogue in PSG parsing. The choice of search focus can determine a number of design features, and may even determine whether or not a parser is capable of discovering valid analyses. I have shown how different parsers embody different attempts to balance the requirements of constrained search within the context of known linguistic phenomena such as discontinuity. The management of ambiguity warrants special mention here, since very few dependency parsing systems take this problem seriously. The special requirements of at least some extended versions of DG mean that, for them, existing tools for the management of ambiguity in constituency parsers are likely to be inappropriate.
Chapter 13

Conclusion

At the beginning of this thesis I set out the formal properties of DGs, as defined by Gaifman. These properties place DGs in weak equivalence with a subclass of the CFPSGs, namely the class in which every phrase contains exactly one distinguished daughter, its lexical head. The differences between the grammatical systems, then, are not significant either in terms of their formal power or their adequacy for describing natural language. As with PSGs, however, many of the linguists who use DG have added extensions to the basic formalism, thereby creating new kinds of system of uncertain formal power. In this thesis I have concentrated on the basic formalism and systems directly related to it.

It is only comparatively recently (within the last decade or so) that most phrase structure grammarians have come to assume that every phrase does, indeed, have a head. Thus, head-driven parsing using PSGs has emerged as a live research topic even more recently. The principal difference between DG and PSG is that DG rules necessarily identify the head of each construction, whereas PSG rules only identify the head of a phrase if some additional constraint is supplied (as in the case of versions of X-bar grammar). Head-marking is intrinsic to DG, but extrinsic to PSG as originally defined. One would therefore expect to find a much longer record of work on head-driven parsing in the field of dependency parsing.

The survey of existing dependency parsing systems does not satisfy these expectations. There has been very little systematic investigation of what it means to parse with head-marked rules. Some parsers (for example, the DLT ATN and DOG systems) make no special use of heads at all. On the other hand, there have been hardly any visible attempts to relate developments in dependency parsing to mainstream parsing research, for example by trying to borrow an existing PSG parsing technique while attempting to make use of the headedness of DG rules.
Almost all of the dependency parsers constructed so far operate bottom-up incrementally; the rest of the design space has been explored only patchily. One task of this thesis has been to explore the space within which dependency parsing algorithms are located. That space turns out to be, on almost every count, the same as that occupied by PSG parsing algorithms. It has not been necessary to introduce completely new terms to describe what is going on in dependency parsing algorithms; existing terms will suffice. However, some minor divergences have come to light as, for example, in the case of top-down parsing, which I have subcategorized into deep and shallow variants. Whereas deep top-down parsing can be implemented either in a dependency framework or in a PSG framework, shallow top-down parsing is only available to head-marking frameworks.

Within this common design space, some combinations of properties have been much more thoroughly explored than others. In this respect, well-known techniques such as chart parsing, though extensively reported in the PSG parsing literature, are not represented in this survey. I have attempted to make good some of these deficits by describing what dependency versions of such techniques would look like. The exercise may help point up the similarities and differences between dependency parsing and PSG parsing. Two hypotheses framed the investigation.

Hypothesis 1

I have offered at least two existence proofs of this hypothesis in the text. In Chapter 12 I showed how the shift-reduce technique applied in PSG parsing could be taken over into dependency parsing. The PSG and DG versions of the algorithm differ only trivially in the way in which rules are matched against the stack; when parses are followed through for the same sentence with equivalent grammars, the operation of the parsers is identical, shift for shift, reduction for reduction. I also took Gazdar and Mellish's existing implementation of a PSG bottom-up chart parser and showed how, with only the most modest of changes to the code, and none at all to the control structure, it could be converted into a dependency chart parser. This should not be surprising, since this is a very weak hypothesis. It is well known that such transfers are routinely possible.
Hypothesis 2
It is possible to construct a fully functional dependency parser using techniques which have no PSG counterpart.

The main evidence offered for this hypothesis is the divide and conquer algorithm. This works on the principle that top-down parsing need never hypothesize structure without lexical support, since every search is for a specific head, which can always be identified in the rule. This is not the case in an ordinary PSG. The attraction of the divide and conquer algorithm lies in the possibilities it raises for dividing up the parse problem and solving (conquering) each sub-problem independently, perhaps in parallel.

Parsing algorithms can be divided into those which make use of the notion 'head' and those which do not. While most of the standard PSG parsing algorithms are not head-driven, a small number (which have emerged only recently) are. By contrast, by no means all dependency parsers make significant use of information about heads.

The overwhelming weight of opinion in linguistic theory supports the marking of heads in phrases, but remarkably little progress has yet been won by putting head-marked rules to work in parsing. The record of dependency parsing turns out to be generally disappointing, not least because the systems which have been developed have never been systematically related to any other (more mainstream) parsing results. I offer this thesis as a first step towards the integration of dependency parsing with the broader study of natural language parsing.
Appendix

Prolog Listings

A.1 Introduction

The programs listed below use a restricted sub-set of Prolog to encode the algorithms described in the main text. This sub-set is entirely consistent with standard 'Edinburgh' syntax (Clocksin and Mellish 1987). However, a small number of non-standard predicates has been utilised to set up the environment in which the main algorithms are located. The most common of these is ensure_loaded/1, whose argument may be the name of a Quintus library file. The only such file to be loaded is files, which provides the one non-standard library predicate used in the programs listed here, file_exists/1, which succeeds if its argument names an existing file. Quintus Prolog also requires dynamic predicates to be declared using declarations of the form:

dynamic Predicate/N.

Predicate is the name of the dynamic predicate; N is its arity. Both Predicate and N must be instantiated. Each dynamic predicate declaration may simply be commented out for use with Prologs which do not require such declarations.

The listings set out below present a diverse range of recognition and parsing algorithms which are united in their use of dependency grammars, but divided in the ways in which they manipulate their data structures, including the grammar rules themselves. A common grammar compilation methodology has been used for those algorithms which make use of Gaifman-style dependency grammars (see Chapter 2 for details). The grammar writer writes rules in standard Gaifman notation and these are compiled into forms appropriate (i.e. efficient) for each algorithm. The compilation process only needs to be carried out once for each grammar and algorithm pairing. The code for the Gaifman dependency grammar rule compiler is listed in the file dg_compile.pl.

Section A.2 indexes each predicate which appears in the listing according to the file in which it is defined. The files themselves are given in alphabetical order in Section A.3.
A.2 Index of predicates

Predicate                                File
add_spans_including_trees/3              hays_parser.pl
allowed_char/1                           dg_compile.pl
alpha_numeric/1                          dg_compile.pl
append/3                                 lib.pl
assert_if_new/1                          lib.pl
begin_new_line/0                         map_to_dcg.pl
build_cat_list/2                         hays_generator.pl
concat/3                                 lib.pl
conquer/5                                divide.pl
construct_assignments/0                  map_to_dcg.pl
construct_assignments/2                  map_to_dcg.pl
construct_call/0                         map_to_dcg.pl
construct_embedded_call/0                map_to_dcg.pl
construct_rules/0                        map_to_dcg.pl
cross_product/3                          lib.pl
dep_write/3                              map_to_dcg.pl
dcg_generate/0                           dcg.pl
dcg_parse/0                              dcg.pl
dg_compile/1                             dg_compile.pl
dg_compile_loop/2                        dg_compile.pl
divide/4                                 divide.pl
divide_conquer/0                         divide.pl
divide_conquer/1                         divide.pl
dot                                      lib.pl
drule/3                                  dg_compile.pl
each_member/2                            lib.pl
each_tree/4                              hays_generator.pl
embedded_stage_two/3                     hays_generator.pl
embedded_x_product/3                     lib.pl
enumerate/0                              hays_generator.pl
enumerate/1                              hays_generator.pl
enumerate_loop/0                         hays_generator.pl
enumerate_surface/1                      hays_generator.pl
extract_any_sub_string_with_trees/4      hays_parser.pl
extract_sub_string_and_trees/5           hays_parser.pl
ff_drule/3                               dg_compile.pl
flush_comment/3                          dg_compile.pl
flushline/2                              dg_compile.pl
generate_one_root/1                      dcg.pl
generate_tree/1                          hays_generator.pl
get_all_chars/1                          dg_compile.pl
get_all_chars2/3                         dg_compile.pl
grammar_present/2                        dg_compile.pl
group/4                                  dg_compile.pl
in_word/2                                lib.pl
incorporate/2                            dg_compile.pl
init/1                                   divide.pl
initialize_parse_table/2                 hays_parser.pl
known_tree/1                             hays_generator.pl
lower_case/1                             dg_compile.pl
map_to_dcg/2                             map_to_dcg.pl
multi_line/1                             dg_compile.pl
note_grammar_present/2                   dg_compile.pl
numeric/1                                dg_compile.pl
padding_char/1                           dg_compile.pl
parse_increasing_substrings/1            hays_parser.pl
print_set/1                              dcg.pl
purge_grammar_rules/0                    lib.pl
read_in/1                                lib.pl
readword/3                               lib.pl
restsent/3                               lib.pl
return_admissible_trees/2                hays_parser.pl
reverse/2                                lib.pl
reverse/3                                lib.pl
rff_drule/3                              dg_compile.pl
root/1                                   dg_compile.pl
saturate/2                               dg_compile.pl
sentence_length/1                        hays_parser.pl
separator/1                              dg_compile.pl
show_complete_tree/0                     hays_parser.pl
spans/3                                  hays_parser.pl
special_char/1                           dg_compile.pl
sr_recognize/0                           shift_reduce.pl
sr_recognize/1                           shift_reduce.pl
sr_recognize_loop/2                      shift_reduce.pl
sr_reduce/2                              shift_reduce.pl
stage_one/1                              hays_generator.pl
stage_two/2                              hays_generator.pl
surface/2                                hays_generator.pl
tabular_parse/0                          hays_parser.pl
tokenize/1                               dg_compile.pl
upper_case/1                             dg_compile.pl
whittle/5                                divide.pl
word_class/2                             dg_compile.pl
word_classify/2                          divide.pl
word_exs/3                               map_to_dcg.pl
write_sentence_list/1                    lib.pl
writeln/1                                lib.pl
A.3 Code listings
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: dcg.pl
%
% WRITTEN BY: Noriticin M. Fraser
%
% DESCRIPTION: A definite clause grammar incorporating some
% notions from dependency grammar. For more
% information on definite clause grammars see
% Pereira and Warren (1980).
%
% VERSION HISTORY: 1.0 November 28, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%<
%
% LOAD DECLARATIONS
:- ensure_loaded(lib).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%:
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * «
*
* dcg_parse/0.
*
* I ?- dcg_parse.
* I: the big mouse chased the timid cat.
*
/***************************************************************************
*
* d cg _ g en er a te/0 .
302
*
* Generate all strings (and associated syntactic parse trees) defined
* by the DCG.
*/
dcg_generate
setof ( R o o t ,root(Root),Set),
generat@_one_root(Set),
nl.
/***************************************************************************
*
* generate_one_root(+RootList).
*
* Generate all possible strings for a given sentence root.
*/
generate_one_root([]).
generate_one_root( [First IRest] ) :-
Rule =.. [First,T r e e ] ,
setof( [String,Tree],phrase(Rule.String).Set).
print_set(Set).
generate_one_root(Rest).
/***************************************************************************
*
* print_set(+ResultList).
*
* Print out a list of String/Tree generation result pairs, one
* pair at a time.
*/
print_set( [] )
nl.
print_set([[String.Tree]I Rest])
writeln(['String : ’.String]).
writeln([’Tree: ’.Tree]),
nl,
print_set(Rest).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% THE GRAMMAR
% A very simple definite clause grammeir to illustrate how to
% build dependency trees using DCGs.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
303
noun(X) --> det(Det), [Head],
        { class(Head,noun),
          X = noun(Det,*) }.
noun(X) --> det(Det), adj(Adj), [Head],
        { class(Head,noun),
          X = noun(Det,Adj,*) }.
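%
% Illustrative note (added; not part of the original listing): assuming
% det//1 and adj//1 rules of the same shape as noun//1 above (their
% definitions are not legible in this scan), the phrase "the big mouse"
% is accepted by the second noun rule, which binds
%       X = noun(Det,Adj,*)
% a flat dependency tree whose head position is marked by *.
%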
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% VALID SENTENCE ROOTS
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
root(i_verb).
root(t_verb).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% WORD CLASS ASSIGNMENTS
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
class(big,adj).
class(fierce,adj).
class(timid,adj).
class(a,det).
class(the,det).
class(cat,noun).
class(dog,noun).
class(mouse,noun).
class(snored,i_verb).
class(ran,i_verb).
class(chased,t_verb).
class(likes,t_verb).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: dg_compile.pl
%
% WRITTEN BY: Norman M. Fraser
%
% DESCRIPTION: Compile a standard Gaifman format dependency
%              grammar into several different forms, namely:
%              Gaifman Prolog form, full form, and reversed
%              full form.
%
% VERSION HISTORY: 1.0 August 12, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
% library(files) is a Quintus Prolog library. To run with other
% prologs replace call to file_exists/1 in dg_compile/2 with the
% local equivalent.
%
:- ensure_loaded(library(files)).
:- ensure_loaded(lib).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% DYNAMIC PREDICATE DECLARATIONS
:- dynamic multi_line/l.
:- dynamic root/1.
:- dynamic word_class/2.
:- dynamic drule/3.
:- dynamic ff_drule/3.
:- dynamic rff_drule/3.
:- dynamic grammar_present/2.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/***************************************************************************
*
* dg_compile(+File).
* dg_compile(+Compilation,+File).
*
 * Compile a Gaifman dependency grammar into a variety of
* Prolog-readable forms. Three compilations are supplied.
*
* Gaifman dependency grammars allow rules of the following three
* varieties:
*
* (i) *(X)
* (ii) X(*)
 * (iii) X(Y1,Y2,...,Yi,*,Yj,...,Yn-1,Yn)
*
 * GAIFMAN PROLOG FORM
 * Gaifman Prolog Form (GPF) is the simplest Prolog implementation of
 * Gaifman's rule system, therefore it may be regarded as the
 * canonical implementation. A grammar in standard Gaifman form can be
* compiled into GPF as follows:
*
 * (1) Replace every rule of type 1 with a GPF rule of type 'root(X).'
* (2) Replace every rule of type 2 with a GPF rule of type
* ’drule(X, [],[]).'
* (3) Replace every rule of type 3 with a GPF rule of type
 * 'drule(X,A,B).' where A is a Prolog list consisting of Y1-Yi in
 * the same order as they appear in the original rule, and B is a
 * Prolog list consisting of Yj-Yn in the same order as they appear
 * in the original rule. If nothing precedes in the original rule,
 * then A = []; if nothing follows in the original rule then B = [].
*
* To compile a Gaifman grammar contained in a file called 'grammar1' into
* GPF, use:
*
* dg_compile(gpf,grammar1).
*
* Since GPF is the default compilation, the same result may be achieved
* using:
*
* dg_compile(grammar1).
*
* FULL FORM
* Full form dependency rules are produced using the following mapping :
*
* (1) Replace every rule of type 1 with a full form rule of type 'root(X).'
* (2) Replace every rule of type 2 with a full form rule of type
* ’ff_drule(X,[X]).'
* (3) Replace every rule of type 3 with a full form rule of type
* 'ff_drule(X,A).’ where A is the Prolog list consisting of the
* concatenation of Yl-Yi, X, and Yj-Yn in that order.
*
*
* To compile a Gaifman grammar contained in a file called 'grammar1' into
* full form, use:
*
* dg_compile(ff,grammar1).
*
 * REVERSED FULL FORM
 * Reversed full form dependency rules are produced using the following
 * mapping:
 *
 * (1) Replace every rule of type 1 with a full form rule of type 'root(X).'
 * (2) Replace every rule of type 2 with a full form rule of type
 *     'rff_drule(X,[X]).'
 * (3) Replace every rule of type 3 with a full form rule of type
 *     'rff_drule(X,A1).' If A is a Prolog list consisting of the
 *     concatenation of Y1-Yi, X, and Yj-Yn in that order, then A1
 *     is the mirror image of that list.
*
* To compile a Gaifman grammar contained in a file called 'grammar1' into
* reversed full form, use:
*
 * dg_compile(rff,grammar1).
*
* To compile the same source file into all three formats at the same
* time use:
*
 * dg_compile(all,grammar1).
*
 * The output of dg_compile/1 and dg_compile/2 is written directly to
* the Prolog internal database (user).
*/
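/*
 * Worked example (added for illustration; derived from the mapping
 * described above, not part of the original listing). The Gaifman
 * source rules
 *
 *      *(V)
 *      V(N,*)
 *      N(*)
 *
 * compile, under dg_compile(all,File), into the database facts
 *
 *      root(V).                % rule type (i)
 *      drule(V,[N],[]).        % GPF, rule type (iii)
 *      drule(N,[],[]).         % GPF, rule type (ii)
 *      ff_drule(V,[N,V]).      % full form
 *      ff_drule(N,[N]).
 *      rff_drule(V,[V,N]).     % reversed full form
 *      rff_drule(N,[N]).
 */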
dg_compile(File) :-
        dg_compile(gpf,File).
dg_compile(Compilation,File) :-
        (
        file_exists(File)
        ;
        writeln(['Unknown file: ',File]),
        abort
        ),
        writeln(['Compiling ',File,' into ',Compilation,' format.']),
        see(File),
        retractall(multi_line(_)),
        assert(multi_line(off)),
        tokenize(FirstRule),
        dg_compile_loop(Compilation,FirstRule),
        told,
        note_grammar_present(Compilation,File),
        close_all_streams,
        writeln('Grammar compilation completed.').
dg_compile_loop(Compilation,eof([])).
dg_compile_loop(Compilation,eof(Rule)) :-
        dot,
        phrase(valid_rule(X),Rule),
        incorporate(Compilation,X).
dg_compile_loop(Compilation,[]) :-
        tokenize(Rule),
        dg_compile_loop(Compilation,Rule).
dg_compile_loop(Compilation,FirstRule) :-
        dot,
        phrase(valid_rule(X),FirstRule),
        incorporate(Compilation,X),
        tokenize(NextRule),
        dg_compile_loop(Compilation,NextRule).
incorporate(all,dependency_rule(Head,Before,After)) :-
        assertz(drule(Head,Before,After)),
        append(Before,[Head|After],Phrase),
        assertz(ff_drule(Head,Phrase)),
        reverse(Phrase,RevPhrase),
        assertz(rff_drule(Head,RevPhrase)).
incorporate(gpf,dependency_rule(Head,Before,After)) :-
        assertz(drule(Head,Before,After)).
incorporate(gpf_sat,dependency_rule(Head,Before,After)) :-
        saturate(Before,Before1),
        saturate(After,After1),
        assertz(gpf_sat_drule(Head,Before1,After1)).
incorporate(ff,dependency_rule(Head,Before,After)) :-
        append(Before,[Head|After],Phrase),
        assertz(ff_drule(Head,Phrase)).
incorporate(ff_sat,dependency_rule(Head,Before,After)) :-
        saturate(Before,Before1),
        saturate(After,After1),
        Head1 =.. [Head,*],
        append(Before1,[Head1|After1],Phrase),
        assertz(ff_sat_drule(Head1,Phrase)).
incorporate(rff,dependency_rule(Head,Before,After)) :-
        append(Before,[Head|After],Phrase),
        reverse(Phrase,RevPhrase),
        assertz(rff_drule(Head,RevPhrase)).
incorporate(rff_sat,dependency_rule(Head,Before,After)) :-
        saturate(Before,Before1),
        saturate(After,After1),
        Head1 =.. [Head,*],
        append(Before1,[Head1|After1],Phrase),
        reverse(Phrase,RevPhrase),
        assertz(rff_sat_drule(Head1,RevPhrase)).
incorporate(_,sentence_root(Root)) :-
        assertz(root(Root)).
incorporate(_,class_assign(_,[])).
incorporate(_,class_assign(Class,[FirstWord|Rest])) :-
        assertz(word_class(FirstWord,Class)),
        incorporate(_,class_assign(Class,Rest)).
saturate([],[]).
saturate([First|Rest],[New|Result]) :-
        New =.. [First,*],
        saturate(Rest,Result).
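/*
 * Example (added for illustration): saturate/2 wraps each dependent
 * category in a 'saturated' term, e.g.
 *
 *      ?- saturate([det,adj],X).
 *      X = [det(*),adj(*)]
 */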
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% TOKENIZE A DEPENDENCY GRAMMAR
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/***************************************************************************
*
* tokenize(-ListOfTokens)
*
 * Produce a list of tokens for the current line in the standard input.
 */
tokenize(Result) :-
        get_all_chars(ListOfChars),
        group(ListOfChars,[],[],Result).
/***************************************************************************
*
 * get_all_chars(-ListOfChars)
*
* Construct a list of all legitimate characters on the current line
* (in reverse order).
*/
get_all_chars(AllChars) :-
        get0(C),
        get_all_chars2(C,[],AllChars).
get_all_chars2(C,Result,eof(Result)) :-
        end_of_file(C).
get_all_chars2(C,Result,Result) :-
        multi_line(off),
        newline(C).
get_all_chars2(C,Current,Result) :-
        comment(C),
        flushline(C,C1),
        get_all_chars2(C1,Current,Result).
get_all_chars2(ThisChar,[LastChar|Current],Result) :-
        asterisk(ThisChar),
        oblique(LastChar),
        flush_comment(120,120,C1),
        get_all_chars2(C1,Current,Result).
get_all_chars2(C,Current,Result) :-
        close_curly(C),
        retractall(multi_line(_)),
        asserta(multi_line(off)),
        get0(C1),
        get_all_chars2(C1,[C|Current],Result).
get_all_chars2(C,Current,Result) :-
        open_curly(C),
        retractall(multi_line(_)),
        asserta(multi_line(on)),
        get0(C1),
        get_all_chars2(C1,[C|Current],Result).
get_all_chars2(C,Current,Result) :-
        allowed_char(C),
        get0(C1),
        get_all_chars2(C1,[C|Current],Result).
get_all_chars2(C,Current,Result) :-
        write('Illegal character ignored: '),
        put(C),
        write(' (ASCII '),
        write(C),
        write(')'),
        nl,
        get0(C1),
        get_all_chars2(C1,Current,Result).
/***************************************************************************
 *
 * group(+InList,?Current_Word,+Current_List,-Result)
 *
 * Tokenize a list of character codes.
 */
group(eof(Anything),One,Two,eof(Result)) :-
        group(Anything,One,Two,Result).
group([],[],Result,Result).
group([],Current_List,So_Far,[Current_Atom|So_Far]) :-
        name(Current_Atom,Current_List).
group([H|T],[],So_Far,Result) :-
        special_char(H),
        name(Current_Atom,[H]),
        group(T,[],[Current_Atom|So_Far],Result).
group([H|T],[],So_Far,Result) :-
        padding_char(H),
        group(T,[],So_Far,Result).
group([H|T],Current_List,So_Far,Result) :-
        alpha_numeric(H),
        group(T,[H|Current_List],So_Far,Result).
group([H|T],Current_List,So_Far,Result) :-
        separator(H),
        name(Current_Atom,Current_List),
        group([H|T],[],[Current_Atom|So_Far],Result).
/***********************************************************************
*
* Character manipulation utilities and definitions
*
***********************************************************************/
/***************************************************************************
*
 * flushline/2.
 *
 * Flush the input buffer to the next end of line.
 */
flushline(C,C) :-
        end_of_file(C).
flushline(C,C1) :-
        newline(C),
        get0(C1).
flushline(_,C) :-
        get0(C1),
        flushline(C1,C).
/***************************************************************************
 *
 * flush_comment(+CurrentChar,+PreviousChar,-ReturnChar).
 *
 * Flush the input buffer to the end of the next multiline comment.
 */
flush_comment(C,_,C) :-
        end_of_file(C).
flush_comment(C,C1,C2) :-
        oblique(C),
        asterisk(C1),
        get0(C2).
flush_comment(C1,_,C3) :-
        get0(C2),
        flush_comment(C2,C1,C3).
allowed_char(C) :-
        padding_char(C).
allowed_char(C) :-
        alpha_numeric(C).
allowed_char(C) :-
        special_char(C).
separator(C) :-
        padding_char(C).
separator(C) :-
        special_char(C).
padding_char(C) :-
        space(C).
padding_char(C) :-
        tab_char(C).
padding_char(C) :-
        comma(C).
padding_char(C) :-
        period(C).
padding_char(C) :-
        newline(C).
alpha_numeric(C) :-
        lower_case(C).
alpha_numeric(C) :-
        upper_case(C).
alpha_numeric(C) :-
        underscore(C).
alpha_numeric(C) :-
        numeric(C).
lower_case(C) :-
        C >= 97,
        C =< 122.
upper_case(C) :-
        C >= 65,
        C =< 90.
numeric(C) :-
        C >= 48,
        C =< 57.
special_char(C) :-
        open_bracket(C).
special_char(C) :-
        close_bracket(C).
special_char(C) :-
        colon(C).
special_char(C) :-
        asterisk(C).
special_char(C) :-
        open_curly(C).
special_char(C) :-
        close_curly(C).
special_char(C) :-
        oblique(C).
end_of_file(-1).        % EOF
tab_char(9).            % tab
newline(10).            % nl
space(32).              % ' '
comment(37).            % %
open_bracket(40).       % (
close_bracket(41).      % )
asterisk(42).           % *
comma(44).              % ,
dash(45).               % -
period(46).             % .
oblique(47).            % /
colon(58).              % :
underscore(95).         % _
open_curly(123).        % {
close_curly(125).       % }
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% A DEFINITE CLAUSE GRAMMAR FOR DEPENDENCY GRAMMAR RULES
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% VALID RULE TYPES
%
valid_rule(X) -->
        dependency_rule(X).
valid_rule(X) -->
        class_assignment(X).
valid_rule(X) -->
        root_declaration(X).
% WORD CLASS ASSIGNMENT RULES
%
class_assignment(X) -->
        [A], colon_string(B), set_of_words(C),
        {atom(A),
         X = class_assign(A,C)}.
colon_string(X) -->
        [':'].
set_of_words(X) -->
        open_set(A), word_list(X), close_set(C).
open_set(X) -->
        ['{'].
close_set(X) -->
        ['}'].
word_list(X) -->
        [A],
        {atom(A),
         X = [A]}.
word_list(X) -->
        [A], word_list(B),
        {atom(A),
         X = [A|B]}.
%
% SENTENCE ROOT RULES
%
root_declaration(X) -->
        asterisk_string(A), open_brkt_string(B), [C], close_brkt_string(D),
        {atom(C),
         X = sentence_root(C)}.
asterisk_string(X) -->
        ['*'].
open_brkt_string(X) -->
        ['('].
close_brkt_string(X) -->
        [')'].
%
% DEPENDENCY RULES
%
dependency_rule(X) -->
        [A], open_brkt_string(B), asterisk_string(C), close_brkt_string(D),
        {atom(A),
         X = dependency_rule(A,[],[])}.
dependency_rule(X) -->
        [A], open_brkt_string(B), word_list(C), asterisk_string(D),
        close_brkt_string(E),
        {atom(A),
         X = dependency_rule(A,C,[])}.
dependency_rule(X) -->
        [A], open_brkt_string(B), asterisk_string(C), word_list(D),
        close_brkt_string(E),
        {atom(A),
         X = dependency_rule(A,[],D)}.
dependency_rule(X) -->
        [A], open_brkt_string(B), word_list(C), asterisk_string(D),
        word_list(E), close_brkt_string(F),
        {atom(A),
         X = dependency_rule(A,C,E)}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: divide_conquer.pl
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/**************************************************************************
*
* divide_conquer/l.
* divide_conquer/0.
*
 * Parse a string. Version with filename argument loads a Gaifman Prolog
* Form grammar. The parser is based on the 'divide and conquer' algorithm.
* The basic idea is to use the head of a rule to split the string to be
* parsed in two and then to recurse down each half in turn.
*/
divide_conquer(File) :-
        (
        file_exists(File),
        purge_grammar_rules,
        dg_compile(File)
        ;
        writeln(['ERROR! Non-existent grammar file: ',File,'.']),
        abort
        ),
        divide_conquer.
divide_conquer :-
        write('Type the sentence to be parsed (end with a full stop)'),
        nl,
        write(': '),
        read_in(Sentence),
        word_classify(Sentence,Class_List),
        init(Class_List).
divide_conquer :-
        writeln(['*** PARSER FAILED ***']).
/***************************************************************************
 *
 * word_classify(+Classless,-Classified).
 *
 * Take a list of unclassified words and return a list of word classes,
 * basing assignments on the current grammar.
 */
word_classify(['.'],[]).
word_classify([Word|Rest_Words],[Class|Class_List]) :-
        word_classify(Rest_Words,Class_List),
        word_class(Word,Class).
/***************************************************************************
*
* init(+String).
*
* Begin the parse.
*/
init(List) :-
        root(Start),
        drule(Start,Left_Deps,Right_Deps),
        divide(List,Left,Right,Start),
        conquer(Start,Left,Left_Deps,[],Report1),
        conquer(Start,Right,Right_Deps,[],Report2),
        writeln(['Root = ',Start]),
        writeln(['Leftside = ',Report1]),
        writeln(['Rightside = ',Report2]).
/***************************************************************************
 *
 * divide(+String,-LeftPart,-RightPart,+Head).
 *
 * Find Head in String and return the substring to its left as LeftPart
 * and the substring to its right as RightPart.
 */
divide([],_,_,_).
divide([H|T],[],T,H).
divide([H|T],[H|Left],Right,Root) :-
        divide(T,Left,Right,Root).
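/*
 * Example (added for illustration): dividing the category string
 * [det,n,tv,det,n] around the head tv:
 *
 *      ?- divide([det,n,tv,det,n],Left,Right,tv).
 *      Left = [det,n], Right = [det,n]
 */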
/***************************************************************************
 *
 * conquer(+Head,+String,+Dependents,-Remainder_of_Substring,-Report).
 *
 * Find trees rooted in each of the Dependents in String. These will
 * each depend on Head. Return any of String not accounted for as
 * Remainder. Report what has been found.
 */
conquer(_,[],[],[],[]).                 % SUCCEED: all satisfied
conquer(_,[],[_|_],_,_) :-              % FAIL: deps but no words
        !,
        fail.
conquer(Head,[Dep],[Dep],[],[(Dep,Head)]) :-
        drule(Dep,[],[]).               % SUCCEED: only dep matches only word
conquer(Head,String,[Dep],Remainder,[(Dep,Head)|Report3]) :-
        drule(Dep,Left_Deps,Right_Deps),        % ONE DEP: divide and conquer
        divide(String,Left,Right,Dep),
        conquer(Dep,Left,Left_Deps,[],Report1),
        conquer(Dep,Right,Right_Deps,Remainder,Report2),
        append(Report1,Report2,Report3).
conquer(Head,String,[First_Dep|Rest_Deps],Remainder,Report5) :- % MANY DEPS
        drule(First_Dep,Left_Deps,Right_Deps),
        divide(String,Left,Right,First_Dep),
        conquer(First_Dep,Left,Left_Deps,[],Report1),
        whittle(First_Dep,Right,Right_Deps,Remainder2,Report2),
        conquer(Head,Remainder2,Rest_Deps,Remainder,Report3),
        append(Report1,Report2,Report4),
        append(Report3,Report4,Report5).
conquer(Head,[First_Word|Rest_Words],[First_Dep|Rest_Deps],Remainder,
                [(First_Dep,Head)|Report1]) :-
        drule(First_Dep,[],[]),
        conquer(Head,Rest_Words,Rest_Deps,Remainder,Report1).
/***************************************************************************
 *
 * whittle(+Head,+String,+Dependents,-Remainder,-Report).
 *
 */
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: gazdar_mellish.pl
%
% WRITTEN BY: Gerald Gazdar & Chris Mellish, with minor
%             modifications by Norman M. Fraser.
%
% DESCRIPTION: Contains the concatenation of several files
%              (namely: buchart1.pl, chrtlib1.pl, library.pl,
%              psgrules.pl, lexicon.pl, examples.pl) from the
%              program listings in Gazdar & Mellish (1989).
%              Some minor changes have been made to make the
%              program run under Quintus Prolog. A few
%              predicates which are irrelevant here have
%              been removed (mostly from library.pl).
%
% VERSION HISTORY: January 16, 1993 (date created in this form)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% ORIGINAL NOTICE FOLLOWS:
%
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% Example code from the book "Natural Language Processing in Prolog" %
% published by Addison Wesley %
% Copyright (c) 1989, Gerald Gazdar & Christopher Mellish.           %
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
%
% Reproduced by kind permission.
%
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /
%
% buchart1.pl  A bottom-up chart parser
%
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /
parse(V0,Vn,String) :-
        start_chart(V0,Vn,String).      % defined in chrtlib1.pl
%
add_edge(V0,V1,Category,Categories,Parse) :-
        edge(V0,V1,Category,Categories,Parse),!.
%
add_edge(V1,V2,Category1,[],Parse) :-
        assert_edge(V1,V2,Category1,[],Parse),
        foreach(rule(Category2,[Category1|Categories]),
                add_edge(V1,V1,Category2,[Category1|Categories],[Category2])),
        foreach(edge(V0,V1,Category2,[Category1|Categories],Parses),
                add_edge(V0,V2,Category2,Categories,[Parse|Parses])).
add_edge(V0,V1,Category1,[Category2|Categories],Parses) :-
        assert_edge(V0,V1,Category1,[Category2|Categories],Parses),
        foreach(edge(V1,V2,Category2,[],Parse),
                add_edge(V0,V2,Category1,Categories,[Parse|Parses])).
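/*
 * Illustration (added; hypothetical edges, not from the original
 * listing): by the third add_edge clause above, asserting the active
 * edge edge(0,1,s,[vp],[NP]) and then finding the inactive edge
 * edge(1,2,vp,[],VP) yields the completed edge edge(0,2,s,[],[VP,NP])
 * -- the fundamental rule of chart parsing.
 */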
/**************************************************************************/
%
% chrtlib1.pl  Library predicates for database chart parsers
%
/**************************************************************************/
%
% start_chart
%   uses add_edge (defined by particular chart parser) to insert inactive
%   edges for the words (and their respective categories) into the chart
%
start_chart(V0,V0,[]).
start_chart(V0,Vn,[Word|Words]) :-
        V1 is V0+1,
        foreach(word(Category,Word),
                add_edge(V0,V1,Category,[],[Word,Category])),
        start_chart(V1,Vn,Words).
% test
% allows use of test sentences (in examples.pl) with chart parsers
%
test(String) :-
        V0 is 1,
        initial(Symbol),
        parse(V0,Vn,String),
        foreach(edge(V0,Vn,Symbol,[],Parse),
                mwrite(Parse)),
        retractall(edge(_,_,_,_,_)).
%
% foreach - for each X do Y
%
foreach(X,Y) :-
        X,
        do(Y),
        fail.
foreach(X,Y) :-
        true.
do(Y) :- Y,!.
%
% mwrite prints out the mirror image of a tree encoded as a list
%
mwrite(Tree) :-
        mirror(Tree,Image),
        write(Image),
        nl.
%
% mirror - produces the mirror image of a tree encoded as a list
%
mirror([],[]) :- !.
mirror(Atom,Atom) :-
        atomic(Atom).
mirror([X1|X2],Image) :-
        mirror(X1,Y2),
        mirror(X2,Y1),
        append(Y1,[Y2],Image).
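%
% Example (added for illustration): mirror/2 reverses a tree encoded as
% a list at every level, e.g.
%
%       ?- mirror([a,[b,c]],Image).
%       Image = [[c,b],a]
%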
%
% assert_edge
%   asserta(edge(...)), but gives option of displaying nature of edge created
%
assert_edge(V1,V2,Category1,[],Parse1) :-
        asserta(edge(V1,V2,Category1,[],Parse1)).
%       dbgwrite(inactive(V1,V2,Category1)).
assert_edge(V1,V2,Category1,[Category2|Categories],Parse1) :-
        asserta(edge(V1,V2,Category1,[Category2|Categories],Parse1)).
%       dbgwrite(active(V1,V2,Category1,[Category2|Categories])).
%
/**************************************************************************/
%
% library.pl  A collection of utility predicates
%
/**************************************************************************/
%
% '--->' an arrow for rules that distinguishes them from DCG ('-->') rules
%
?- op(255,xfx,--->).
%
% definitions to provide a uniform interface to DCG-style rule format:
%   the 'word' predicate is used by the RTNs and other parsers
%   the 'rule' clause that subsumes words is used by the chart parsers
%
word(Category,Word) :-
        (Category ---> [Word]).
%
rule(Category,[Word]) :-
        use_rule,
        (Category ---> [Word]).
%
% in order for the clause above to be useful,
% use_rule. needs to be in the file.
%
rule(Mother,List_of_daughters) :-
        (Mother ---> Daughters),
        not(islist(Daughters)),
        conjtolist(Daughters,List_of_daughters).
%
% conjtolist - convert a conjunction of terms to a list of terms
%
conjtolist((Term,Terms),[Term|List_of_terms]) :- !,
        conjtolist(Terms,List_of_terms).
conjtolist(Term,[Term]).
%
% islist(X) - if X is a list, C&M 3rd ed. p52-53
%
islist([]) :- !.
islist([_|_]).
%
% read_in(X) - convert keyboard input to list X, C&M 3rd ed. p101-103
%
read_in([Word|Words]) :-
        get0(Character1),
        readword(Character1,Word,Character2),
        restsent(Word,Character2,Words).
%
restsent(Word,Character,[]) :-
        lastword(Word),!.
restsent(Word1,Character1,[Word2|Words]) :-
        readword(Character1,Word2,Character2),
        restsent(Word2,Character2,Words).
%
readword(Character1,Word,Character2) :-
        single_character(Character1),!,
        name(Word,[Character1]),
        get0(Character2).
readword(Character1,Word,Character2) :-
        in_word(Character1,Character3),!,
        get0(Character4),
        restword(Character4,Characters,Character2),
        name(Word,[Character3|Characters]).
readword(Character1,Word,Character2) :-
        get0(Character3),
        readword(Character3,Word,Character2).
%
restword(Character1,[Character2|Characters],Character3) :-
        in_word(Character1,Character2),!,
        get0(Character4),
        restword(Character4,Characters,Character3).
restword(Character,[],Character).
%
single_character(33).   % !
single_character(44).   % ,
single_character(46).   % .
single_character(58).   % :
single_character(59).   % ;
single_character(63).   % ?
%
in_word(Character,Character) :-
        Character > 96,
        Character < 123.        % a-z
in_word(Character,Character) :-
        Character > 47,
        Character < 58.         % 0-9
in_word(Character1,Character2) :-
        Character1 > 64,
        Character1 < 91,
        Character2 is Character1 + 32.  % A-Z
in_word(39,39).         % '
in_word(45,45).         % -
%
lastword('.').
lastword('!').
lastword('?').
%
% testi - get user's input and pass it to test predicate, then repeat
%
testi :-
        write('End with period and <CR>'),
        read_in(Words),
        append(String,[Period],Words),
        nl,
        test(String),
        nl,
        testi.
%
% dbgwrite - a switchable tracing predicate
%
dbgwrite(Term) :-
        dbgon,
        write(Term),
        nl,!.
dbgwrite(Term).
%
dbgwrite(Term,Var) :-
        dbgon,
        integer(Var),
        tab(3 * (Var - 1)),
        write(Term),
        nl,!.
dbgwrite(Term,Var) :-
        dbgon,
        write(Term), write(" "), write(Var),
        nl,!.
dbgwrite(Term,Var).
%
dbgon.  % retract this to switch dbg tracing off
/**************************************************************************/
%
% psgrules.pl An example set of CF-PSG rules
%
/**************************************************************************/
%
% DCG style format
%
/* op(255,xfx,--->). */
%
initial(s). % used by chart parsers
%
s ---> (np, vp).
np ---> (det, nb).
nb ---> n.
nb ---> (n, rel).
rel ---> (wh, vp).
vp ---> iv.
vp ---> (tv, np).
vp ---> (dv, np, pp).
vp ---> (sv, s).
pp ---> (p, np).
%
/**************************************************************************/
%
% lexicon.pl  An example lexicon
%
/**************************************************************************/
/* :- op(255,xfx,--->). */
/**************************************************************************/
%
% examples.pl  A set of test examples
%
/**************************************************************************/
% A set of test examples - predicate 'test' must be defined for parser
%
test1 :-
        test([kim,died]).
test2 :-
        test([sandy,saw,a,duck]).
test3 :-
        test([kim,knew,sandy,knew,lee,died]).
test4 :-
        test([the,woman,gave,a,duck,to,her,man]).
test5 :-
        test([lee,handed,a,duck,that,died,to,the,woman]).
%
/**************************************************************************/
%
% Necessary addition for Quintus Prolog compatibility
%
/**************************************************************************/
not(X) :-
\+X.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: hays_parser.pl
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
:- ensure_loaded(lib).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/***************************************************************************
*
* tabular_parse/0.
*
* Parse a string. After initializing the table with each category
* licensed by the string and the grammar, make multiple passes, on
* each pass considering only sub-strings one word longer than in the
* last pass. For each saturated dependency, record (i) the creation
* of a new saturated head and (ii) the tree rooted in that head.
* To conclude, signal either success or failure and, if success,
* return all well-formed trees which span the entire input string.
 */
tabular_parse :-
        retractall(sentence_length(_)),
        retractall(spans(_,_,_)),
        read_in(String),
        initialize_parse_table(String),
        parse_increasing_substrings(1),
        (
        show_complete_tree
        ;
        writeln('PARSE FAILED')
        ).
/***************************************************************************
*
* initialize_parse_table(+String).
*
* Given a list of words, initialize the sub-string table with all
* their possible category assignments.
 */
initialize_parse_table(WordString) :-
        initialize_parse_table(WordString,0).
% initialize_parse_table/2.
initialize_parse_table(['.'],N) :-
        assert(sentence_length(N)).
initialize_parse_table([First|Rest],M) :-
        findall(Class,word_class(First,Class),Bag),
        N is M+1,
        add_spans_including_trees(Bag,M,N),
        initialize_parse_table(Rest,N).
add_spans_including_trees([],_,_).
add_spans_including_trees([First|Rest],M,N) :-
        assert(spans(M,[First,*],N)),
        add_spans_including_trees(Rest,M,N).
/***************************************************************************
*
* parse_increasing_substrings(+Length).
*
 * Extract all strings of length Length from the table and attempt to
 * parse them. If parsing succeeds record the head and the dependency
 * structure in the table.
 */
parse_increasing_substrings(N) :-
        sentence_length(N).
parse_increasing_substrings(N) :-
        gpf_sat_drule(Head,Before,After),
        append(Before,[Head|After],Body),
        length(Body,N),
        extract_sub_string_and_trees(N,Start,Body,Trees,Finish),
        NewHead =.. [Head,*],
        NewTree =.. [Head|Trees],
        assert_if_new(spans(Start,[NewHead,NewTree],Finish)),
        fail.
parse_increasing_substrings(M) :-
        N is M+1,
        parse_increasing_substrings(N).
/***************************************************************************
*
* extract_sub_string_and_trees(+N,-Start,-Result,-Trees,-Finish).
*
 * Extract a sub-string from the table, N units long where each unit
 * is a single word or a fully-connected dependency tree. Returns both
 * the sub-string and the corresponding trees. Also returns the Start
 * and Finish addresses of the sub-string.
 */
extract_sub_string_and_trees(N,Start,Result,Trees,Finish) :-
        extract_any_sub_string_with_trees(Start,Result,Trees,Finish),
        length(Result,N).
% extract_any_sub_string_with_trees/4.
extract_any_sub_string_with_trees(Start,[Label],[Tree],Finish) :-
        spans(Start,[Label,Tree],Finish).
extract_any_sub_string_with_trees(Start,[Label|Substring],[Tree|TreeList],Finish) :-
        spans(Start,[Label,Tree],Intermed),
        extract_any_sub_string_with_trees(Intermed,Substring,TreeList,Finish).
/***************************************************************************
*
* show_complete_tree/0.
*
* Succeeds if a root edge (and associated tree) spans the whole sentence
 * in the sub-string table. Writes out all spanning trees to the standard
 * output.
 */
show_complete_tree :-
        sentence_length(N),
        findall([Label|Tree],spans(0,[Label|Tree],N),TreeBag),
        return_admissible_trees(TreeBag,Admit),
        writeln('PARSE SUCCEEDED'),
        each_member(Admit,writeln).
return_admissible_trees([],[]).
return_admissible_trees([[Label,Tree]|Rest],[Tree|Result]) :-
        Label =.. [Root|_],
        root(Root),
        return_admissible_trees(Rest,Result).
return_admissible_trees([First|Rest],Result) :-
        return_admissible_trees(Rest,Result).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: hays_recognizer.pl
%
% WRITTEN BY: Norman M. Fraser
%
% DESCRIPTION: A recognizer for determining whether an arbitrary
%              string belongs to the language generated by a
%              given grammar. This is an implementation of the
%              algorithm described by David Hays in Language 40(4):
% 516-517, 1964.
%
% VERSION HISTORY: 1.0 August 8, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
:- ensure_loaded(lib).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% DYNAMIC PREDICATE DECLARATIONS
:- dynamic sentence_length/l.
:- dynamic spans/3.
/***************************************************************************
*
* recognize/0.
*
* Try to recognize a string. After initializing the table with each
 * category licensed by the string and the grammar, make multiple
* passes, on each pass considering only sub-strings one word longer
* than in the last pass. For each new saturated dependency, record
* the creation of a new saturated head. Signal either success or
* failure in recognizing the string.
*/
recognize :-
        retractall(sentence_length(_)),
        retractall(spans(_,_,_)),
        read_in(String),
        initialize_table(String),
        apply_rules_of_increasing_length(1),
        (
        complete_span ->
        writeln('PARSE SUCCEEDED')
        ;
        writeln('PARSE FAILED')
        ).
/***************************************************************************
*
* initialize_table(+CatStr).
*
* Given a list of words, initialize the sub-string table with all
* their possible category assignments.
*/
initialize_table(WordString) :-
        initialize_table(WordString,0).
% initialize_table/2.
initialize_table(['.'],N) :-
        assert(sentence_length(N)).
initialize_table([First|Rest],M) :-
        findall(Class,word_class(First,Class),Bag),
        N is M+1,
        add_spans(Bag,M,N),
        initialize_table(Rest,N).
add_spans([],_,_).
add_spans([First|Rest],M,N) :-
        assert(spans(M,First,N)),
        add_spans(Rest,M,N).
/***************************************************************************
*
* apply_rules_of_increasing_length(+Length).
*
 * Extract all strings of length Length from the table and attempt to
 * parse them. If parsing succeeds record the head and boundaries of
 * the new edge in the table.
 */
apply_rules_of_increasing_length(N) :-
        sentence_length(N).
apply_rules_of_increasing_length(N) :-
        gpf_sat_drule(Head,Before,After),
        append(Before,[Head|After],Body),
        length(Body,N),
        extract_sub_string(N,Start,Body,Finish),
        New =.. [Head,*],
        assert_if_new(spans(Start,New,Finish)),
        fail.
apply_rules_of_increasing_length(M) :-
        N is M+1,
        apply_rules_of_increasing_length(N).
/***************************************************************************
*
* extract_sub_string(+N,-Start,-Result,-Finish).
*
* Extract a sub-string from the table, N units long where each unit
* is a single word or a fully-connected dependency tree. Also returns
 * the Start and Finish addresses of the sub-string.
*/
extract_sub_string(N,Start,Result,Finish) :-
        extract_any_sub_string(Start,Result,Finish),
        length(Result,N).
% extract_any_sub_string/3.
extract_any_sub_string(Start,[Label],Finish) :-
        spans(Start,Label,Finish).
extract_any_sub_string(Start,[Label|Substring],Finish) :-
        spans(Start,Label,Intermed),
        extract_any_sub_string(Intermed,Substring,Finish).
/***************************************************************************
*
* complete_span/0.
*
* Succeeds if a root edge spans the whole sentence in the sub-string table.
 */
complete_span :-
        sentence_length(N),
        spans(0,Label,N),
        Label =.. [Root,*],
        root(Root).
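/*
 * Example (added; hypothetical table contents): for a three-word
 * sentence, sentence_length(3), a table entry spans(0,v(*),3) together
 * with root(v) makes complete_span/0 succeed, since the saturated head
 * v(*) covers positions 0 to 3.
 */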
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: hays_generator.pl
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
% library(files) is a Quintus Prolog library. To run with other
% prologs replace call to file_exists/1 in enumerate/1 with the
% local equivalent.
:- ensure_loaded(library(files)).
:- ensure_loaded(lib).
:- ensure_loaded(dg_compile).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% DYNAMIC PREDICATE DECLARATION
:- dynamic known_tree/l.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/***************************************************************************
*
* enumerate(+File).
*
* The top level predicate. Enumerates all the strings generated by a
 * dependency grammar in Gaifman Prolog Form contained in File.
*/
enumerate(File) :-
        (
        file_exists(File),
        purge_grammar_rules,
        dg_compile(gpf,File)
        ;
        writeln(['ERROR! Non-existent grammar file: ',File,'.']),
        abort
        ),
        retractall(known_tree(_)),
        enumerate_loop.
/***************************************************#***********************
*
* enumerate/0.
*
 * An alternative top level predicate. Enumerates all the strings generated
 * by the dependency grammar in Gaifman Prolog Form which has already
 * been compiled.
*/
enumerate :-
        (
        grammar_present(gpf,_)
        ;
        writeln('ERROR! GPF grammar not loaded.'),
        abort
        ),
        retractall(known_tree(_)),
        enumerate_loop.
/**************************************************»*************************
*
* enumerat e_loop/0.
*
* A failure-driven loop which forces backtracking through all possible
* strings generated by the g r a m m a r .
*/
enumerate_loop :-
        generate_tree(Tree),
        (
        known_tree(Tree) ->
        fail
        ;
        assert(known_tree(Tree)),
        build_cat_list(Tree,CatStr)
        ),
        enumerate_surface(CatStr),
        fail.
enumerate_loop.
/***************************************************************************
*
* generate.tree(-Tree).
*
* Generating a dependency tree is a two-stage process as described by
* Hays.
*/
generate_tree(Tree) :-
        stage_one(Root),
        stage_two(Root,Tree).
/***************************************************************************
*
* stage_one(-Root).
*
 * The first stage retrieves a permissible sentence root from the
 * grammar.
 */
stage_one(Root) :-
        root(Root).
/***************************************************************************
*
* stage_two(+Root,-Tree).
* stage_two(+Root,-Tree,+N).
*
* The second stage constructs a Tree rooted in Root and well-formed
* according to the rules of the grammar being used. N is a counter
* which keeps track of the depth of the tree. When max_tree_depth(Max),
* N = Max, enumeration is aborted.
*/
stage_two(Root,Tree) :-
        stage_two(Root,Tree,1).
stage_two(Root,Tree,_) :-
        drule(Root,[],[]),
        Tree =.. [Root,*].
stage_two(Root,Tree,N) :-
        drule(Root,Before,After),
        embedded_stage_two(Before,BeforeTrees,N),
        embedded_stage_two(After,AfterTrees,N),
        append([Root|BeforeTrees],[*|AfterTrees],ListOfTrees),
        Tree =.. ListOfTrees.
embedded_stage_two(_,[],Max) :-
        max_tree_depth(Max),
        writeln('Maximum depth reached in search tree. Pruning...'),
        !.
embedded_stage_two([],[],_).
embedded_stage_two([Head|Tail],[HeadTree|TailTrees],M) :-
        N is M+1,
        stage_two(Head,HeadTree,N),
        embedded_stage_two(Tail,TailTrees,M).
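/*
 * Example (added; hypothetical GPF grammar with root(v),
 * drule(v,[n],[]) and drule(n,[],[])): stage one selects v, and stage
 * two then constructs Tree = v(n(*),*), a v-rooted sentence with a
 * single n dependent to its left.
 */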
/***************************************************************************
*
 * max_tree_depth(-Integer).
 *
 * This is required to avoid infinite looping. The maximum may be reset
 * to any positive integer value, as required.
 */
max_tree_depth(20).
/***************************************************************************
 *
 * build_cat_list(+Tree,-CatList).
 *
 */
each_tree(_,[],Result,Result).
each_tree(Root,[*|Rest],Current,Result) :-
        append(Current,[Root],New),
        each_tree(_,Rest,New,Result).
each_tree(Root,[Terminal|Rest],Current,Result) :-
        Terminal =.. [Name,*],
        append(Current,[Name],New),
        each_tree(Root,Rest,New,Result).
each_tree(Root,[Tree|Rest],Current,Result) :-
        build_cat_list(Tree,Res1),
        append(Current,Res1,New),
        each_tree(Root,Rest,New,Result).
/***************************************************************************
*
 * enumerate_surface(+CatList).
 *
 * Find all grammatically possible surface strings which instantiate a
 * list of word categories. Write each of these to the standard output.
 */
enumerate_surface(CatList) :-
        findall(String,surface(CatList,String),All),
        each_member(All,write_sentence_list),
        !.
/***************************************************************************
*
* surface(+CatList,-SurfList).
*
* Return a single list of surface forms (words) for a given list
* of word categories.
*/
surface([],[]).
surface([Cat|Rest],[Word|Result]) :-
        word_class(Word,Cat),
        surface(Rest,Result).
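/*
 * Example (added for illustration): with word_class(the,det) and
 * word_class(cat,n) in the database,
 *
 *      ?- surface([det,n],Words).
 *      Words = [the,cat]
 */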
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: incremental_shift_reduce.pl
%
% WRITTEN BY: Norman M. Fraser
%
% DESCRIPTION: An incremental bottom-up shift-reduce dependency
%              parser.
%
% VERSION HISTORY: 1.0 August 8, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
:- ensure_loaded(library(files)).
:- ensure_loaded(lib).
:- ensure_loaded(dg_compile).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/****************************************************************************
 *
 * incremental_parse(+File).
 *
 * Load a dependency grammar from File in Gaifman Prolog Form, then call
 * incremental_parse/0.
 */
incremental_parse(File) :-
        (
        file_exists(File),
        purge_grammar_rules,
        dg_compile(File)        %% in Gaifman Prolog Form
        ;
        writeln(['ERROR! Non-existent grammar file: ',File,'.']),
        abort
        ),
        incremental_parse.
/****************************************************************************
 *
 * incremental_parse/0.
 *
 * Prompt for an input string. Pass the string into the parser.
 */
incremental_parse :-
        write('Please type string to be parsed'),
        nl,
        write(': '),
        read_in(Input),
        ibu_parse(Input).
/****************************************************************************
 *
 * ibu_parse(+Input).
 *
 * The top level parse predicate.
 */
ibu_parse(Input) :-
        ibu_parse_loop(Input,[]),
        write('Parse succeeded'),
        nl.
ibu_parse(_) :-
        write('Parse failed'),
        nl.
/****************************************************************************
*
* ibu_parse_loop(+Input,-Result).
*
 * The main parse loop. There are three possibilities: terminate,
 * reduce, and shift. Result reporting is suppressed here to emphasize
 * the simplicity of the algorithm.
 */
ibu_parse_loop(['.'],[dr(Root,[],[])]) :-       %% TERMINATE
        root(Root).
ibu_parse_loop(Input,[First|[Second|Rest]]) :-  %% REDUCE
        reduce_inc(First,Second,Result),
        ibu_parse_loop(Input,[Result|Rest]).
ibu_parse_loop([Word|Rest],Stack) :-            %% SHIFT
        word_class(Word,Class),
        drule(Class,Before_Deps,After_Deps),
        reverse(Before_Deps,Before_Deps1),
        ibu_parse_loop(Rest,[dr(Class,Before_Deps1,After_Deps)|Stack]).
/****************************************************************************
*
* reduce_inc(+StackTop,+StackNext,-NewTop).
*
 * The rules of reduction. The second and third rules basically do the
 * same thing but two clauses are required because of the way in which
 * Prolog constructs lists.
 */
reduce_inc(dr(X,[Y|Alpha],Beta),dr(Y,[],[]),dr(X,Alpha,Beta)).
reduce_inc(dr(X,[],Alpha),dr(Y,[],[X]),dr(Y,[],Alpha)).
reduce_inc(dr(X,[],Alpha),dr(Y,[],[X|Beta]),dr(Y,[],[Alpha|Beta])).
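/*
 * Worked example (added; hypothetical grammar with root(n),
 * drule(n,[det],[]) and drule(det,[],[])): parsing the class string
 * [det,n,'.'] proceeds as
 *
 *      SHIFT det        stack = [dr(det,[],[])]
 *      SHIFT n          stack = [dr(n,[det],[]),dr(det,[],[])]
 *      REDUCE (rule 1)  stack = [dr(n,[],[])]
 *      TERMINATE        input = ['.'] and root(n) holds
 */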
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: lib.pl
%
% WRITTEN BY: Norman M. Fraser
%
% DESCRIPTION: A library of mostly general-purpose predicates.
%              Originally designed for use with a variety
%              of programs making use of dependency grammars,
%              hence the presence of more specific predicates
%              such as gpf_rules_present/0.
%
% VERSION HISTORY: 1.0 August 8, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/****************************************************************************
 *
 * append(?List1,?List2,?Result).
 *
 * Append List1 and List2 to form Result. Can also be used in reverse
 * to split Result into pairs of sub-lists.
 */
/*
append([],List,List).
append([Head|Tail1],List,[Head|Tail2]) :-
        append(Tail1,List,Tail2).
*/
/****************************************************************************
 *
 * assert_if_new(+Clause).
 *
 * If Clause exists in the database then do nothing; otherwise add it.
 */
assert_if_new(Clause) :-
        Clause =.. [Head|Body],
        clause(Head,Body),
        !.
assert_if_new(Clause) :-
        assert(Clause).
/****************************************************************************
 *
 * concat(?Prefix,+SuffixChars,?Whole).
 *
 * Append a character string to an atom.
 */
concat(Prefix,SuffixChars,Whole) :-
        name(Prefix,PrefixChars),
        append(PrefixChars,SuffixChars,WholeChars),
        name(Whole,WholeChars).
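/*
 * Example (added for illustration): since name/2 converts between
 * atoms and character-code lists,
 *
 *      ?- concat(grammar,".pl",X).
 *      X = 'grammar.pl'
 */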
/***************************************************************************
*
 * cross_product(+List1,+List2,-Result).
 *
 * Produces the cross product of two lists, List1 and List2.
 */
cross_product([],_,[]).
cross_product([H|T],In,Out) :-
        embedded_x_product(In,H,Intermed1),
        cross_product(T,In,Intermed2),
        append(Intermed1,Intermed2,Out).
% embedded_x_product/3.
embedded_x_product([],_,[]).
embedded_x_product([H|T],Const,[[Const,H]|Result]) :-
        embedded_x_product(T,Const,Result).
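/*
 * Example (added for illustration):
 *
 *      ?- cross_product([a,b],[1,2],X).
 *      X = [[a,1],[a,2],[b,1],[b,2]]
 */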
/***************************************************************************
*
 * dot/0.
 *
 * Write a dot to the standard output. Used for registering activity
 * in lengthy processes.
 */
dot :-
        write(user,'.'),
        flush_output(user).
/***************************************************************************
*
* each_member(+List,+Predicate).
*
* Applies a Predicate of arity=l to each item in List. Predicate
* will normally have side effects. For example, a typical usage
* would be to write each member of a list: each.member(List,write).
*/
each_member([],_).
each_member([Argument|Rest],Predicate) :-
        Term =.. [Predicate,Argument],
        call(Term),
        each_member(Rest,Predicate).
/***************************************************************************
*
* purge_grammar_rules/0.
*
* Retract dependency grammar rules (of all formats) from the Prolog
* database.
*/
purge_grammar_rules :-
        retractall(drule(_,_,_)),
        retractall(gpf_sat_drule(_,_,_)),
        retractall(ff_drule(_,_)),
        retractall(rff_drule(_,_)),
        retractall(root(_)),
        retractall(word_class(_,_)).
/***************************************************************************
*
* read_in(-ListOfAtoms).
*
* Read a sentence terminated by a legitimate last character from the
* standard input. Convert input to lower case and filter excluded
* characters. Return a list of atoms terminated by a fullstop.
*
* From Clocksin & Mellish (1987) Programming in Prolog. Berlin:
* Springer-Verlag. (3rd Edition). 101-103.
*/
read_in([Word|Words]) :-
        get0(Character1),
        readword(Character1,Word,Character2),
        restsent(Word,Character2,Words).
% Given a word and the word after it, read in the rest of the
% sentence.
restsent(Word,Character,[]) :-
        lastword(Word),!.
restsent(Word1,Character1,[Word2|Words]) :-
        readword(Character1,Word2,Character2),
        restsent(Word2,Character2,Words).
readword(Character1,Word,Character2) :-
        single_character(Character1),!,
        name(Word,[Character1]),
        get0(Character2).
readword(Character1,Word,Character2) :-
        in_word(Character1,Character3),!,
        get0(Character4),
        restword(Character4,Characters,Character2),
        name(Word,[Character3|Characters]).
readword(Character1,Word,Character2) :-
        get0(Character3),
        readword(Character3,Word,Character2).
restword(Character1,[Character2|Characters],Character3) :-
        in_word(Character1,Character2),!,
        get0(Character4),
        restword(Character4,Characters,Character3).
restword(Character,[],Character).
% These characters form words on their own.
single_character(33).   % !
single_character(44).   % ,
single_character(46).   % .
single_character(58).   % :
single_character(59).   % ;
single_character(63).   % ?
% These characters can appear within a word. The third in_word clause
% converts characters to lowercase.
in_word(Character,Character) :-
        Character > 96,
        Character < 123.        % a-z
in_word(Character,Character) :-
        Character > 47,
        Character < 58.         % 0-9
in_word(Character1,Character2) :-
        Character1 > 64,
        Character1 < 91,
        Character2 is Character1 + 32.  % A-Z
in_word(39,39).         % '
in_word(45,45).         % -
lastword('.').
lastword('!').
lastword('?').
/***************************************************************************
*
 * reverse(+ForwardList,-BackwardList).
 *
 * Reverse ForwardList to produce BackwardList.
 */
reverse(In,Out) :-
        reverse(In,[],Out).
reverse([],Out,Out).
reverse([First|Rest],Temp,Out) :-
        reverse(Rest,[First|Temp],Out).
/***************************************************************************
*
* writeln(+Data).
*
* Write Data to the standard output ending with a newline, where Data
* is either an atom or a list of atoms.
*/
writeln( [] ) :-
nl.
writeln([H|T]) :-
write(H),
writeln(T).
writeln(X)
write(X),
nl.
/***************************************************************************
*
* write_sentence_list(List).
*
* List is a list of atoms. Write each atom to the standard output,
* separated by a space character.
*/
write_sentence_list([]) :-
        nl.
write_sentence_list([First|Rest]) :-
        write(First),
        write(' '),
        write_sentence_list(Rest).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: map_to_dcg.pl
%
% WRITTEN BY: Norman M. Fraser
%
% DESCRIPTION: Map a Gaifman-format dependency grammar into
%              a definite clause grammar.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
:- ensure_loaded(lib).
:- ensure_loaded(dg_compile).
%
%
% DYNAMIC PREDICATE DECLARATIONS
:- dynamic max_no_deps/2.
/***************************************************************************
*
* map_to_dcg(+InFile,+OutFile).
*
* Read a Gaifman format dependency grammar from InFile. Write a definite
* clause grammar to OutFile.
*/
map_to_dcg(InFile,OutFile) :-
        dg_compile(InFile),
        tell(OutFile),
        write('%%% DCG GENERATED FROM THE DEPENDENCY GRAMMAR: '),
        write(InFile),
        write(' %%%'),
        nl, nl,
        write(':- ensure_loaded(lib).'),
        nl, nl,
        write('%% PARSE PREDICATES'),
        nl,
        construct_call,
        nl, nl,
        retractall(max_no_deps(_,_)),
        write('%% RULES'),
        nl,
        construct_rules,
        nl, nl,
        write('%% WORD CLASS ASSIGNMENTS'),
        nl,
        construct_assignments,
        told.
/***************************************************************************
 *
 * construct_call/0.
 *
 * Construct a 'dcg_parse' predicate for parsing with the grammar.
 */
construct_call :-
        write('dcg_parse :-'),
        begin_new_line,
        write('write(''Please type the sentence to be parsed''),'),
        begin_new_line,
        write('nl,'),
        begin_new_line,
        write('read_in(Input),'),
        begin_new_line,
        write('dcg_parse1(Input).'),
        nl, nl,
        construct_embedded_call.
/***************************************************************************
*
* construct_embedded_call/0.
*
* Construct a parse predicate for each different type of root
* allowed by the DG.
♦/
construct_embedded_call :-
        retract(root(Root)),
        write('dcg_parse1(Input) :-'),
        begin_new_line,
        write('phrase(rule_'),
        write(Root),
        write('(Tree),Input,[''.'']),'),
        begin_new_line,
        write('write(''PARSE SUCCEEDED: ''),'),
        begin_new_line,
        write('write(Tree),'),
        begin_new_line,
        write('nl, nl.'),
        nl,
        construct_embedded_call.
construct_embedded_call :-
        write('dcg_parse :-'),
        begin_new_line,
        write('write(''PARSE FAILED''),'),
        begin_new_line,
        write('nl, nl.'),
        nl.
/***************************************************************************
 *
 * begin_new_line/0.
 *
 * Initialize a new line of Prolog code.
 */
begin_new_line :-
        nl,
        tab(8).
/***************************************************************************
*
* construct_rules/0.
*
* Add a DCG rule for every DG rule in the grammar. Ensure that DCG
* rules return a parse tree as their result.
*/
construct_rules :-
        retract(drule(Head,Pre,Post)),
        write('rule_'),
        write(Head),
        write('(X) --> '),
        dep_write(Pre,'A',1),
        write(' '),
        write('word_'),
        write(Head),
        write(','),
        dep_write(Post,'B',1),
        nl,
        tab(8),
        write('{ X =.. ['),
        write(''''),
        write(Head),
        write(''''),
        retract(max_no_deps('A',Amax)),
        write_exs('A',1,Amax),
        write(',*'),
        retract(max_no_deps('B',Bmax)),
        write_exs('B',1,Bmax),
        write('] }.'),
        nl,
        construct_rules.
construct_rules.
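/*
 * Illustration (added; hypothetical GPF rule drule(n,[a],[])): the
 * clauses above emit, modulo whitespace, the DCG rule
 *
 *      rule_n(X) -->  rule_a(A1), word_n,
 *              { X =.. ['n',A1,*] }.
 *
 * Compare the generated grammar1.dcg listing reproduced in Section A.4.
 */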
/***************************************************************************
*
* dep_write/3.
*
* Map a list of dependents for a head onto a list of calls to DCG
* rules.
*/
dep_write([],Prefix,N) :-
        assert(max_no_deps(Prefix,N)).
dep_write([First|Rest],Prefix,M) :-
        write(' '),
        write('rule_'),
        write(First),
        write('('),
        write(Prefix),
        write(M),
        write('),'),
        N is M+1,
        dep_write(Rest,Prefix,N).
/ ***************************************************************************
*
 * write_exs/3.
 *
 * Write result variables from all DCG rules which are called within
 * some rule.
 */
write_exs(_,Max,Max).
write_exs(Prefix,M,Max) :-
        write(','),
        write(Prefix),
        write(M),
        N is M+1,
        write_exs(Prefix,N,Max).
/***************************************************************************
 *
 * construct_assignments/0.
 * construct_assignments/2.
 *
 */
construct_assignments([],_) :-
        nl.
construct_assignments([Word|Rest],Class) :-
        write('word_'),
        write(Class),
        write(' --> ['),
        write(Word),
        write('].'),
        nl,
        construct_assignments(Rest,Class).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: nmf_chart.pl
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% Example code from the book "Natural Language Processing in Prolog" %
% published by Addison Wesley                                        %
% Copyright (c) 1989, Gerald Gazdar & Christopher Mellish.           %
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
:- ensure_loaded(dg_compile).
:- dynamic edge/5.
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /
%
% buchart1.pl  A bottom-up chart parser
%
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /
%
% This new initialization predicate loads a dependency grammar (as
% defined in File) in full form.
%
initialize_dchart(File) :- %% NEW PREDICATE
        (
        file_exists(File),
        purge_grammar_rules,
        dg_compile(ff,File)     %% load a DG in full form
        ;
        writeln(['ERROR! Non-existent grammar file: ',File,'.']),
        abort
        ).
dchart_parse(V0,Vn,String) :-
        start_chart(V0,Vn,String).      % defined in chrtlib1.pl
%
add_edge(_,_,_,['*'],_).                %% NEW CLAUSE - no dependents
add_edge(V0,V1,Category,Categories,Parse) :-
        edge(V0,V1,Category,Categories,Parse),!.
add_edge(V1,V2,Category1,[],Parse) :-
        assert_edge(V1,V2,Category1,[],Parse),
        foreach(rule(Category2,[Category1|Categories]),
                add_edge(V1,V1,Category2,[Category1|Categories],[Category2])),
        foreach(edge(V0,V1,Category2,[Category1|Categories],Parses),
                add_edge(V0,V2,Category2,Categories,[Parse|Parses])).
add_edge(V0,V1,Category1,[Category2|Categories],Parses) :-
        assert_edge(V0,V1,Category1,[Category2|Categories],Parses),
        foreach(edge(V1,V2,Category2,[],Parse),
                add_edge(V0,V2,Category1,Categories,[Parse|Parses])).
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /
%
% chrtlib1.pl  Library predicates for database chart parsers
%
/***************************************************************************/
%
% start_chart
%   uses add_edge (defined by particular chart parser) to insert inactive
%   edges for the words (and their respective categories) into the chart
%
start_chart(V0,V0,[]).
start_chart(V0,Vn,[Word|Words]) :-
        V1 is V0+1,
        foreach(word(Category,Word),
                add_edge(V0,V1,Category,[],[Word,Category])),
        start_chart(V1,Vn,Words).
% test
% allows use of test sentences (in examples.pl) with chart parsers
%
test(String) :-
        V0 is 1,
%       initial(Symbol),                %% OLD VERSION
        root(Symbol),                   %% NEW VERSION
        dchart_parse(V0,Vn,String),     %% NAME CHANGE
        foreach(edge(V0,Vn,Symbol,[],Parse),
                mwrite(Parse)),
        retractall(edge(_,_,_,_,_)).
%
foreach(X,Y) :-
        X,
        do(Y),
        fail.
foreach(X,Y) :-
        true.
do(Y) :- Y,!.
%
mwrite(Tree) :-
        mirror(Tree,Image),
        write(Image),
        nl.
%
mirror([],[]) :- !.
mirror(Atom,Atom) :-
        atomic(Atom).
mirror([X1|X2],Image) :-
        mirror(X1,Y2),
        mirror(X2,Y1),
        append(Y1,[Y2],Image).
%
% assert_edge
%   asserta(edge(...)), and displays the nature of the edge created
%
assert_edge(V1,V2,Category1,[],Parse1) :-
        asserta(edge(V1,V2,Category1,[],Parse1)),
        dbgwrite(inactive(V1,V2,Category1)).
assert_edge(V1,V2,Category1,[Category2|Categories],Parse1) :-
        asserta(edge(V1,V2,Category1,[Category2|Categories],Parse1)),
        dbgwrite(active(V1,V2,Category1,[Category2|Categories])).
/**************************************************************************/
%
% '--->' an arrow for rules that distinguishes them from DCG ('-->') rules
%
?- op(255,xfx,--->).
%
word(Category,Word) :-
%       (Category ---> [Word]).         %% OLD VERSION
        word_class(Word,Category).      %% NEW VERSION
%
%rule(Mother,List_of_daughters) :-      %% OLD VERSION
%       (Mother ---> Daughters),
%       not(islist(Daughters)),
%       conjtolist(Daughters,List_of_daughters).
rule(Head,['*']) :-                     %% NEW VERSION
        ff_drule(Head,[Head]).
rule(Head,Dependents) :-
        ff_drule(Head,Dependents),
        Dependents \== [Head].
%
% conjtolist - convert a conjunction of terms to a list of terms
% (NOW REDUNDANT)
%conjtolist((Term,Terms),[Term|List_of_terms]) :- !,
%       conjtolist(Terms,List_of_terms).
%conjtolist(Term,[Term]).
%
% islist(X) - if X is a list, C&M 3rd ed. p52-53
%
islist([]) :- !.
islist([_|_]).
%
% read_in(X) - convert keyboard input to list X, C&M 3rd ed. p101-103
%
read_in([Word|Words]) :-
        get0(Character1),
        readword(Character1,Word,Character2),
        restsent(Word,Character2,Words).
%
restsent(Word,Character,[]) :-
        lastword(Word),!.
restsent(Word1,Character1,[Word2|Words]) :-
        readword(Character1,Word2,Character2),
        restsent(Word2,Character2,Words).
%
readword(Character1,Word,Character2) :-
        single_character(Character1),!,
        name(Word,[Character1]),
        get0(Character2).
readword(Character1,Word,Character2) :-
        in_word(Character1,Character3),!,
        get0(Character4),
        restword(Character4,Characters,Character2),
        name(Word,[Character3|Characters]).
readword(Character1,Word,Character2) :-
        get0(Character3),
        readword(Character3,Word,Character2).
%
restword(Character1,[Character2|Characters],Character3) :-
        in_word(Character1,Character2),!,
        get0(Character4),
        restword(Character4,Characters,Character3).
restword(Character,[],Character).
%
%
single_character(33).   % !
single_character(44).   % ,
single_character(46).   % .
single_character(58).   % :
single_character(59).   % ;
single_character(63).   % ?
%
in_word(Character,Character) :-
        Character > 96,
        Character < 123.        % a-z
in_word(Character,Character) :-
        Character > 47,
        Character < 58.         % 0-9
in_word(Character1,Character2) :-
        Character1 > 64,
        Character1 < 91,
        Character2 is Character1 + 32.  % A-Z
in_word(39,39).         % '
in_word(45,45).         % -
%
lastword('.').
lastword('!').
lastword('?').
%
testi :-
        write('End with period and <CR>'),
        read_in(Words),
        append(String,[Period],Words),
        nl,
        test(String),
        nl,
        testi.
%
dbgon.  % retract this to switch dbg tracing off
test1 :-
        test([kim,died]).
test2 :-
        test([sandy,saw,a,duck]).
test3 :-
        test([kim,knew,sandy,knew,lee,died]).
test4 :-
        test([the,woman,gave,a,duck,to,her,man]).
test5 :-
        test([lee,handed,a,duck,that,died,to,the,woman]).
%
/***************************************************************************/
%
% Necessary addition for Quintus Prolog compatibility
%
/***************************************************************************/
not(X) :-
        \+X.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: shift_reduce.pl
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% LOAD DECLARATIONS
% library(files) is a Quintus Prolog library. To run with other
% Prologs, replace the call to file_exists/1 in enumerate/1 with the
% local equivalent.
%
:- ensure_loaded(library(files)).
:- ensure_loaded(lib).
:- ensure_loaded(dg_compile).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/***************************************************************************
 *
 * sr_reduce(+File).
 *
 */
/***************************************************************************
*
* sr_recognize/0.
*
* An alternative top level predicate. Assumes a Gaifman dependency
* grammar in saturated reversed full form has already been loaded.
*/
sr_recognize :-
    (
        grammar_present(rff_sat,_)
    ;
        writeln('ERROR! Saturated reversed full form DG not loaded'),
        abort
    ),
    read_in(Input),
    sr_recognize_loop(Input,[]).
/***************************************************************************
 *
 * sr_recognize_loop(+String,+Stack).
 *
 * The main program loop. Clauses 1 and 2 trap the succeed and fail
 * cases. Clause 3 attempts to reduce the stack. If all else fails,
 * clause 4 shifts the next word from the input onto the stack.
 */
sr_recognize_loop([_],[TreeRoot]) :-
    TreeRoot =.. [Root|_],
    root(Root),
    writeln('RECOGNIZED').
sr_recognize_loop([_],[_]) :-
    writeln('NOT RECOGNIZED').
sr_recognize_loop(Input,Stack) :-             % Reduce
    sr_reduce(Stack,Result),
    sr_recognize_loop(Input,Result).
sr_recognize_loop([Word|Rest],Stack) :-       % Shift
    word_class(Word,Class),
    sr_recognize_loop(Rest,[Class|Stack]).
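%
% Schematic run (illustrative only; it assumes word classes such as
% word_class(the,'Det'), word_class(woman,'N'), word_class(saw,'TV'),
% and that dg_compile derives facts like rff_sat_drule('Det',['N','Det'])
% and rff_sat_drule('TV',['Det','TV','Det']) from grammar1):
%
%   input: [the,woman,saw,a,duck,'.']
%
%   shift the     stack = ['Det']
%   shift woman   stack = ['N','Det']
%   reduce        stack = ['Det']               % by Det(*,N)
%   shift saw     stack = ['TV','Det']
%   shift a       stack = ['Det','TV','Det']
%   shift duck    stack = ['N','Det','TV','Det']
%   reduce        stack = ['Det','TV','Det']    % by Det(*,N)
%   reduce        stack = ['TV']                % by TV(Det,*,Det)
%   RECOGNIZED
%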
/***************************************************************************
 *
 * sr_reduce(+BeforeStack,-AfterStack).
 *
 * Perform reductions on BeforeStack as licensed by dependency grammar
 * rules in saturated reversed full form.
 */
sr_reduce([],_) :-
    !,
    fail.
sr_reduce(Stack,[Head|Result]) :-
    append(Str,Result,Stack),
    rff_sat_drule(Head,Str).
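%
% Single reduction, illustratively (same assumed rff_sat_drule/2 fact
% as above; the daughter string is stored reversed because the stack
% holds the most recent category leftmost):
%
%   ?- sr_reduce(['N','Det','TV','Det'], After).
%   After = ['Det','TV','Det']
%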
A.4 Sample grammar
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FILENAME: grammar1
%
% Dependency rules take one of three forms. Form (i) licenses X as a
% sentence root; form (ii) says X takes no dependents of its own; form
% (iii) says X governs the dependents Y1...Yn, with '*' marking the
% position of X itself among them:
%
%   (i)   *(X)
%   (ii)  X(*)
%   (iii) X(Y1,Y2,...,Yi,*,Yj,...,Yn-1,Yn)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% EXAMPLES
%
%
% SENTENCE ROOT
%
*(DTV)
*(IV)
*(TV)
%
% DEPENDENCY RULES
%
A(*)
Det(*,N)
DTV(Det,*,Det,Det)
DTV(Det,*,Det,Prep)
IV(Det,*,Prep)
N(*)
N(A,*)
N(A,A,*)
N(*,Prep)
N(A,*,Prep)
N(A,A,*,Prep)
Prep(*,Det)
TV(Det,*,Det)
TV(Det,*,Det,Prep)
%
DTV: {gave}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FILENAME: grammar1.dcg
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
:- ensure_loaded(lib).
dcg_parse1(Input) :-
    phrase(rule_DTV(Tree),Input,[_]),
    write('PARSE SUCCEEDED : '),
    write(Tree),
    nl, nl.
dcg_parse1(Input) :-
    phrase(rule_IV(Tree),Input,[_]),
    write('PARSE SUCCEEDED : '),
    write(Tree),
    nl, nl.
dcg_parse1(Input) :-
    phrase(rule_TV(Tree),Input,[_]),
    write('PARSE SUCCEEDED : '),
    write(Tree),
    nl, nl.
dcg_parse1(_) :-
    write('PARSE FAILED'),
    nl, nl.
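%
% Usage sketch (illustrative): parse a read-in sentence, leaving the
% terminating punctuation as the single unconsumed token. The terminal
% predicates word_A, word_Det, word_DTV, etc. are assumed to be
% generated elsewhere (e.g. by dg_compile) from the lexicon:
%
%   ?- read_in(Sentence), dcg_parse1(Sentence).
%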
%% RULES
rule_A(X) --> word_A,
    { X =.. ['A',*] }.
rule_Det(X) --> word_Det, rule_N(B1),
    { X =.. ['Det',*,B1] }.
rule_DTV(X) --> rule_Det(A1), word_DTV, rule_Det(B1), rule_Det(B2),
    { X =.. ['DTV',A1,*,B1,B2] }.
rule_DTV(X) --> rule_Det(A1), word_DTV, rule_Det(B1), rule_Prep(B2),
    { X =.. ['DTV',A1,*,B1,B2] }.
rule_IV(X) --> rule_Det(A1), word_IV, rule_Prep(B1),
    { X =.. ['IV',A1,*,B1] }.
rule_N(X) --> word_N,
    { X =.. ['N',*] }.
rule_N(X) --> rule_A(A1), word_N,
    { X =.. ['N',A1,*] }.
rule_N(X) --> rule_A(A1), rule_A(A2), word_N,
    { X =.. ['N',A1,A2,*] }.
rule_N(X) --> word_N, rule_Prep(B1),
    { X =.. ['N',*,B1] }.
rule_N(X) --> rule_A(A1), word_N, rule_Prep(B1),
    { X =.. ['N',A1,*,B1] }.
rule_N(X) --> rule_A(A1), rule_A(A2), word_N, rule_Prep(B1),
    { X =.. ['N',A1,A2,*,B1] }.
rule_Prep(X) --> word_Prep, rule_Det(B1),
    { X =.. ['Prep',*,B1] }.
rule_TV(X) --> rule_Det(A1), word_TV, rule_Det(B1),
    { X =.. ['TV',A1,*,B1] }.
rule_TV(X) --> rule_Det(A1), word_TV, rule_Det(B1), rule_Prep(B2),
    { X =.. ['TV',A1,*,B1,B2] }.
Bibliography
Atwell, E., T. O'Donoghue, and C. Souter (1989). The COMMUNAL RAP: a probabilistic approach to natural language parsing. Technical report, University of Leeds.

Bar-Hillel, Y. (1953). A quasi-mathematical notation for syntactic description. Language, 29: 47-58. Also in: Y. Bar-Hillel (ed) Language and Information. Reading, Mass.: Addison-Wesley. 61-74.
Chomsky, N. (1962). A transformational approach to syntax. In Proceedings of the Third Texas Conference on Problems of Linguistic Analysis in English, pages 124-58, Austin.

Chomsky, N. (1981). Lectures on Government and Binding. Foris, Dordrecht.

Clocksin, W. and C. Mellish (1987). Programming in Prolog. Springer-Verlag, Berlin, third edition.

Covington, M. A. (1984). Syntactic Theory in the High Middle Ages: Modistic models of sentence structure. Cambridge University Press, Cambridge.

Covington, M. A. (1986). Grammatical theory in the middle ages. In T. Bynon and F. Palmer, editors, Studies in the History of Western Linguistics. Cambridge University Press, Cambridge.

Covington, M. A. (1988). Parsing variable word order languages with unification-based dependency grammar. Technical Report ACMC 01-0022, Advanced Computational Methods Center, University of Georgia.

Covington, M. A. (1990a). A dependency parser for variable word order languages. Technical Report AI-1990-01, Artificial Intelligence Program, University of Georgia.

Covington, M. A. (1990b). Parsing discontinuous constituents in dependency grammar. Computational Linguistics, 16: 234-6.

Covington, M. A., D. Nute, and A. Vellino (1987). Prolog Programming in Depth. Scott, Foresman, Glenview, Illinois.

Curry, H. B. and R. Feys (1958). Combinatory Logic, volume 1. North-Holland, Amsterdam.

Dahl, Ö. (1980). Some arguments for higher nodes in syntax: a reply to Hudson's 'Constituency and Dependency'. Linguistics, 18: 485-8.

Danieli, M., F. Ferrara, R. Gemello, and C. Rullent (1987). Integrating semantics and flexible syntax by exploiting isomorphism between grammatical and semantical relations. In Proceedings of the Third Conference of the European Chapter of the Association for Computational Linguistics, pages 278-83, Copenhagen.

de Groot, A. W. (1949). Structurele Syntaxis. Servire, The Hague.

Devos, M., G. Adriaens, and Y. Willems (1988). The Parallel Expert Parser (PEP): a thoroughly revised descendant of the Word Expert Parser (WEP). In COLING-88, pages 142-7.
Dowty, D. R. (1982). Grammatical relations and Montague grammar. In P. Jacobson and G. Pullum, editors, The Nature of Syntactic Representation. D. Reidel, Dordrecht.

Dowty, D. R. (1988). Type raising, functional composition and non-constituent conjunction. In R. Oehrle, E. Bach, and D. Wheeler, editors, Categorial Grammars and Natural Language Structures. D. Reidel, Dordrecht.

Dowty, D. R., R. E. Wall, and S. Peters (1981). Introduction to Montague Semantics. D. Reidel, Dordrecht, Holland.
Flickinger, D. P. (1987). Lexical rules in the hierarchical lexicon. PhD thesis, Stanford.
Garey, H. B. (1954). Review of Lucien Tesnière: Esquisse d'une Syntaxe Structurale. Language, 30: 512-13.

Giachin, E. P. and C. Rullent (1989). A parallel parser for spoken natural language. In IJCAI-89, pages 1537-42, Detroit.

Haddock, N. J. (1987). Incremental interpretation and combinatory categorial grammar. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, pages 661-3, Milan.
Haigh, R., G. Sampson, and E. Atwell (1988). Project APRIL - a progress report. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics, pages 104-112, Buffalo.
Hays, D. G. and T. W. Ziehe (1961). Studies in machine translation 10: Russian sentence-structure determination. Technical Report RM-2538, The Rand Corporation, Santa Monica, Ca.

Hepple, M. (1987). Methods for parsing combinatory grammars and the spurious ambiguity problem. Master's thesis, University of Edinburgh.
Huddleston, R. D. (1988). English Grammar: An Outline. Cambridge University Press, Cambridge.
Hudson, R. A. (1989c). Towards a computer-testable Word Grammar of English. In UCL Working Papers in Linguistics, Volume 1, pages 321-39. University College London.
Kacnel'son, S. (1948). O grammaticeskoj kategorii. Vestnik Leningradskogo Universiteta, 2: 114-134.

Kay, M. (1986). Algorithm schemata and data structures in syntactic processing. In B. J. Grosz, K. Sparck Jones, and B. L. Webber, editors, Readings in Natural Language Processing, pages 35-70. Morgan Kaufmann, Los Altos, CA. (First appeared in 1980).

Kornai, A. and G. K. Pullum (1990). The X-bar theory of phrase structure. Language, 66: 24-50.

Lambek, J. (1958). The mathematics of sentence structure. American Mathematical Monthly, 65: 154-70.
Laurie, S. (1893). Lectures on Language and Linguistic Method in the School. James Thin, Edinburgh.

Marslen-Wilson, W. and L. Tyler (1980). The temporal structure of spoken language understanding. Cognition, 8: 1-74.
Matsunaga, S. and M. Kohda (1988). Linguistic processing using a dependency structure grammar for speech recognition and understanding. In COLING-88, pages 402-7, Budapest.

Nichols, J. (1986). Head-marking and dependent-marking grammar. Language, 62: 56-119.
Niedermair, G. T. (1986). Divided and valency-oriented parsing in speech understanding. In COLING-86, pages 593-5, Bonn.
Pollard, C. and I. A. Sag (1988). Information-based Syntax and Semantics. CSLI Lecture Notes 13. CSLI, Stanford, CA.
Schank, R. C. and C. K. Riesbeck, editors (1981). Inside Computer Understanding: Five programs plus miniatures. Lawrence Erlbaum Associates, Hillsdale, NJ.
Sparck Jones, K. and M. Kay (1973). Linguistics and Information Science. Academic Press, London.

Steedman, M. J. (1990). Grammar, interpretation, and processing from the lexicon. In W. Marslen-Wilson, editor, Lexical Representation and Process. MIT Press, Cambridge, MA.
Tesnière, L. (1959). Éléments de Syntaxe Structurale. Librairie Klincksieck, Paris.
Wilks, Y. (1975). An intelligent analyser and understander of English. Communications of the Association for Computing Machinery, 18: 264-74.