
Dependency Parsing

Norman MacAskill Fraser

Thesis submitted for the degree of PhD

University College London

January 1993
Abstract

Syntactic structure can be expressed in terms of either constituency or dependency. Constituency relations hold between phrases and their constituent lexical or phrasal parts. Dependency relations hold between individual words. Almost all results in formal language theory relate to constituency grammars, of which the phrase structure grammars are best known. In the realm of natural language description, almost all major linguistic theories express syntactic structure in terms of constituency. This dominance carries over into natural language processing, where most parsers are designed to discover the vertical constituency relations which hold between words and phrases, rather than the horizontal dependency relations which hold between pairs of words.

This thesis introduces dependency grammars, their formal properties, their origins in linguistic theory and, particularly, their use in parsers for natural language processing. A survey of dependency parsers — the most comprehensive to date — is presented. It includes detailed discussions of twelve published dependency parsing algorithms. The survey highlights similarities and differences between dependency parsing and mainstream phrase structure grammar parsing. In particular, it examines the hypotheses that (i) it is possible to construct a fully functional dependency parser based on an established phrase structure parsing algorithm without altering any fundamental aspects of the algorithm, and (ii) it is possible to construct a fully functional dependency parser using an algorithm which could not be applied without substantial modification in a fully functional phrase structure parser.

Elements of a taxonomy of dependency parsing are outlined. These include variables in origin, manner, order, and focus of search, as well as in the number of passes made during parsing, techniques for the management of ambiguity, and the use of an adjacency constraint to limit search.

Computer implementations of a number of original dependency parsing algorithms are presented in an Appendix, together with new implementations of established algorithms.
Contents

Acknowledgements 13

Abbreviations 15

1 Introduction 16
1.1 Scope of the thesis 16
1.2 Chapter outline 22

2 Dependency grammar 23
2.1 Overview 23
2.2 Gaifman grammars 24
2.2.1 Definitions 24
2.2.2 A recognizer for Gaifman grammars 30
2.2.3 Representing dependency structures 32
2.2.4 The generative capacity of Gaifman grammars 36
2.3 Beyond Gaifman grammars 41
2.4 Origins in linguistic theory 43
2.5 Related grammatical formalisms 51
2.5.1 Case grammar 52
2.5.2 Categorial grammar 53
2.5.3 Head-driven phrase structure grammar 57
2.6 Summary 58

3 Dependency parsers 60
3.1 Dependency in computational linguistics 61
3.1.1 Machine translation systems 61
3.1.2 Speech understanding systems 63
3.1.3 Other applications 64
3.1.4 Implementations of theories 64
3.1.5 Exploratory systems 65
3.2 PARS: Parsing Algorithm Representation Scheme 69
3.2.1 Data structures 69
3.2.2 Expressions 71
3.3 Summary 75

4 The RAND parsers 76
4.1 Overview 76
4.2 The bottom-up algorithm 78
4.2.1 Basic principles 78
4.2.2 The parsing algorithm 79
4.3 The top-down algorithm 85
4.3.1 The parsing algorithm 85
4.4 Summary 88

5 Hellwig's PLAIN system 90
5.1 Overview 90
5.2 Dependency Representation Language 91
5.2.1 The form of DRL expressions 91
5.2.2 Word order constraints 94
5.2.3 The base lexicon 96
5.2.4 The valency lexicon 96
5.3 The parsing algorithm 98
5.4 The well-formed substring table 102
5.5 Summary 106

6 The Kielikone parser 107
6.1 Overview 107
6.2 Evolution of the parser 109
6.2.1 The earliest version: two way finite automata 109
6.2.2 A grammar representation language: DPL 113
6.2.3 Constraint based grammar: FUNDPL 115
6.3 The parser 120
6.3.1 The grammar 120
6.3.2 Blackboard-based control 121
6.3.3 The parsing algorithm 123
6.3.4 Ambiguity 128
6.3.5 Long distance dependencies 128
6.3.6 Statistics and performance 129
6.3.7 Open questions 130
6.4 Summary 132

7 The DLT MT system 134
7.1 Overview 134
7.2 Dependency grammar in DLT 137
7.3 An ATN for parsing dependencies 140
7.4 A probabilistic dependency parser 143
7.5 Summary 149

8 Lexicase parsers 151
8.1 Overview 151
8.2 Lexicase theory 152
8.2.1 Dependency in Lexicase 153
8.2.2 Lexical entries in Lexicase 159
8.3 Lexicase parsing 164
8.3.1 Starosta and Nomura's parser 164
8.3.2 Lindsey's parser 170
8.4 Summary 172

9 Word Grammar parsers 174
9.1 Overview 174
9.2 Word Grammar theory 175
9.2.1 Facts about words 175
9.2.2 Generalizations about words 181
9.2.3 A single-predicate system 186
9.2.4 Syntax in WG 187
9.2.5 Semantics in Word Grammar 191
9.3 Word Grammar parsing 193
9.3.1 Fraser's parser 194
9.3.2 Hudson's parser 208
9.4 Summary 215

10 Covington's parser 217
10.1 Overview 217
10.2 Early dependency grammarians 217
10.3 Unification-based dependency grammar 218
10.4 Covington's parser 220
10.5 Summary 228

11 The CSELT lattice parser 230
11.1 Overview 230
11.2 The problem: lattice parsing 231
11.3 The solution: the SYNAPSIS parser 235
11.3.1 Overview of SYNAPSIS 235
11.3.2 Dependency grammar 238
11.3.3 Caseframes 240
11.3.4 Knowledge sources 241
11.3.5 The sequential parser 243
11.3.6 The parallel parser 249
11.4 Summary 251

12 Elements of a taxonomy of dependency parsing 254
12.1 Search origin 254
12.1.1 Bottom-up dependency parsing 256
12.1.2 Top-down dependency parsing 261
12.1.3 Mixed top-down and bottom-up dependency parsing 269
12.2 Search manner 271
12.3 Search order 272
12.4 Number of passes 275
12.5 Search focus 276
12.5.1 Network navigation 277
12.5.2 Pair selection 277
12.5.3 Heads seek dependents 278
12.5.4 Dependents seek heads 278
12.5.5 Heads seek dependents or dependents seek heads 279
12.5.6 Heads seek dependents and dependents seek heads 279
12.5.7 Heads seek dependents then dependents seek heads 279
12.5.8 Dependents seek heads then heads seek dependents 281
12.6 Ambiguity management 281
12.7 Adjacency as a constraint on search 288
12.8 Summary 289

13 Conclusion 292
List of Figures

2.1 stemma for Smart people dislike stupid robots 33
2.2 tree diagram (D-marker) for Smart people dislike stupid robots 33
2.3 arc diagram for Smart people dislike stupid robots 34
2.4 dependency tree for *Smart people stupid dislike robots 35
2.5 arc diagram for *Smart people stupid dislike robots 35
2.6 Dependency structure of Old sailors tell tall tales 36
2.7 First phrase structure analysis of They are racing horses 39
2.8 Second phrase structure analysis of They are racing horses 39
2.9 Dependency structure for They are racing horses. The sentence root is racing. 40
2.10 syntactic structure in DG (a) and in HPSG (b) 58
3.1 dependency-based NLP projects 68
5.1 stemma showing a simple dependency structure 92
5.2 Hellwig's WFST for Flying planes can be dangerous 104
6.1 a functional dependency structure 110
6.2 left and right context stacks 112
6.3 a DPL definition of Subject 115
6.4 the general form of functional schemata 117
6.5 a schema for Finnish transitive verbs 118
6.6 the binary relation 'Subject' 118
6.7 the 'SynCat' category 119
6.8 architecture of the Kielikone parser 122
6.9 the Kielikone parser control strategy automaton 126
7.1 the Distributed Language Translation system 137
7.2 dependency analysis of the sentence Whom did you say it was given to? 139
7.3 the use of comma in coordinate structure analyses 140
7.4 an ATN for parsing Danish sentences 142
7.5 an ATN for parsing Danish subjects 143
7.6 a dependency link network for the sentence You can remove the document from the drawer 148
8.1 a syntactic structure with empty nodes 155
8.2 a syntactic structure without empty nodes 155
8.3 a syntactic structure constrained by the one-bar constraint 158
8.4 a Lexicase syntactic structure 159
8.5 components of Starosta & Nomura's Lexicase parser 164
8.6 a master entry showing the intersection of the feature sets of two homographic words 171
9.1 dependency structure of Ollie obeyed Ronnie 177
9.2 part of the WG ontological hierarchy 181
9.3 part of the WG word type hierarchy 182
9.4 part of the WG grammatical relation hierarchy 184
9.5 a WG dependency analysis 187
9.6 the use of constituency in WG 188
9.7 a structure permitted by WG's version of adjacency 189
9.8 the use of visitor links to bind an extracted element to the main verb 189
9.9 the use of the visitor link to relate the extracted element to the main verb as its object 190
9.10 the use of visitor links to interpret the object of an embedded sentence 191
9.11 semantic structure is very similar to syntactic structure in WG 192
9.12 a prohibited dependency structure 203
9.13 with a telescope depends on saw 215
9.14 with a telescope depends on the man 215
11.1 a simple lattice for the uttered words I know 232
11.2 a SYNAPSIS caseframe 241
11.3 a SYNAPSIS dependency rule 242
11.4 another SYNAPSIS caseframe 242
11.5 a SYNAPSIS knowledge source 242
11.6 a simplified DI showing jolly slots 249
11.7 a single parse tree 250
11.8 a distributed representation of the same parse tree 251
12.1 PSG and DG analyses of the sentence Tall people sleep in long beds 258
12.2 phrase structure of A cat sleeps on the computer 263
12.3 dependency structure of A cat sleeps on the computer 264
List of Tables

2.1 Subtrees in Figure 2.6 37
2.2 Complete subtrees in Figure 2.6 37
2.3 Complete subtree labels in Figure 2.6 38
2.4 Subtrees and complete subtrees in the DG analysis of the sentence They are racing horses shown in Figure 2.9. Only complete subtrees are labelled. 40
2.5 Constituents in the phrase structure analysis of the sentence They are racing horses shown in Figure 2.7 41
4.1 main features of Hays' bottom-up dependency parser 88
4.2 main features of Hays' top-down dependency parser 89
5.1 main features of Hellwig's dependency parser 106
6.1 main features of the Kielikone dependency parser 133
7.1 different dependency links retrieved from the BKB 146
7.2 main features of the DLT ATN dependency parser 150
7.3 main features of the DLT probabilistic dependency parser 150
8.1 main features of Starosta and Nomura's Lexicase parser 173
8.2 main features of Lindsey's Lexicase parser 173
9.1 inheriting properties for w1 185
9.2 main features of Fraser's Word Grammar parser 216
9.3 main features of Hudson's Word Grammar parser 216
10.1 main features of Covington's first two dependency parsers 229
11.1 main features of the SYNAPSIS dependency parser 253
12.1 origin of search—summary 255
12.2 manner of search—summary 272
12.3 order of search—summary 273
12.4 number of passes—summary 276
12.5 focus of search—summary 277
12.6 ambiguity management—summary 282
Acknowledgements

This thesis may bear one name on its title page but it represents an investment of time and effort, of wise advice and honest criticism, of practical support and unfailing love on the part of many people. I am grateful to them all.

First mention must go to Dick Hudson, who has been so much more than just a thesis supervisor. Over the years he has selflessly given me his time, enthusiasm and insight. He has listened patiently to all of my hair-brained ideas and helped me to have fewer of them. My heartfelt thanks go to him and to his family, Gay, Lucy and Alice, who have never failed to respond positively to my all too frequent disruptions of their domestic lives.

I am very grateful to Neil Smith and all members of the Department of Phonetics and Linguistics at University College London for supporting me so well during my time in their midst. Special thanks are due to Mark Huckvale, Monika Pounder, and a number of members of the Word Grammar seminar, including Billy Clark, John Fletcher, and And Rosta. I have also benefited enormously from the support and encouragement I have received as a member of the Social and Computer Sciences Research Group at the University of Surrey. I am grateful to all members of the group, and especially to Nigel Gilbert for enabling me to fit thesis-writing into a hectic research schedule, and to Scott McGlashan for his expert assistance with the LaTeX typesetting package. I have gained much from discussions with other people at the University of Surrey, particularly Grev Corbett and Ron Knott.

The finishing touches were added while I was a member of the Speech and Language Division of Logica Cambridge Ltd. I am grateful to Jeremy Peckham for his persistent belief in the value of NLP research and for his practical support, and to Nick Youd, Simon Thornton, Trevor Thomas and Ave Wrigley for daily stimulus.

A significant portion of this thesis is devoted to dissecting other people's dependency parsers. I would not have been able to do so without the help of those individuals who made otherwise unobtainable information available to me. Many of them have read drafts of parts of the thesis, and their comments have been invaluable. They include Doug Arnold, Paulo Baggia, Michael Covington, Peter Hellwig, Gerhard Niedermair, Claudio Rullent, Klaus Schubert, Stan Starosta and Job van Zuijlen.

I have lost track of the number of friends and relations who have helped me by providing practical support, by telling me to get on with it, and by making me laugh. The generous gift of Ian and Mair Bunting, who provided the perfect retreat in which to work without fear of interruption, has hastened the completion of this thesis by an enormous amount. Likewise, the practical support of Jim and Rilla Cannon, whose hospitality knows no bounds. My family have provided the sort of long-distance support which always feels close at hand.

Most of all, I want to thank Sarah for putting up with my nocturnal writing habits, for believing that I really would finish this thing, and for being my friend.

Thank you very much.
Abbreviations

APSG augmented phrase structure grammar
ATN augmented transition network
BFP best fit principle
CCG combinatory categorial grammar
CD conceptual dependency
CFPSG context-free phrase structure grammar
CG categorial grammar
CNF Chomsky normal form
DCG definite clause grammar
DDG daughter dependency grammar
DG dependency grammar
DUG dependency unification grammar
FUG functional unification grammar
GB government-binding theory
GPSG generalized phrase structure grammar
HPSG head-driven phrase structure grammar
ID immediate dominance
LFG lexical-functional grammar
LP linear precedence
MT machine translation
NLP natural language processing
PSG phrase structure grammar
TAG tree-adjoining grammar
UCG unification categorial grammar
WFST well-formed substring table
Chapter 1

Introduction

The intuitive appeals of the two theories cannot be discussed, since intuitions are personal and irrational. (Hays 1964: 522)

1.1 Scope of the thesis

There are, in contemporary linguistic theory, two different views of grammatical relations. The first of these sees relations of grammatical dependency as basic: syntactic structures are essentially networks of grammatically related entities. The second view denies grammatical relations basic status, instead seeing them as being derived from more fundamental structures, such as constituent structures. This latter view has predominated throughout most of this century, first in Immediate Constituent (IC) analysis (Bloomfield 1914, 1933), and later, from the mid-1950s onwards, in Phrase Structure Grammar (PSG) (Chomsky 1957).

The domination of constituency-based approaches has not been limited to theoretical linguistics. In computational linguistics also, the overwhelming majority of proposals which posit a distinct syntactic layer assume that that layer is based on constituent structure rather than dependency structure. This asymmetry cannot legitimately be attributed to any established results showing the superiority of one system over the other in respect of descriptive adequacy, or any other substantive function: no such results exist. However, this is not to say that the asymmetry is inexplicable. Although the notion of grammatical dependency is almost as old as the study of grammar, it has, for most of its existence, remained just that: a notion.

The first rigorous formalization of a dependency grammar (DG) came just over thirty years ago (see Gaifman 1965), a few years after the first formalization of the class of PSGs (Chomsky 1956). By the time the formal definition of a DG was published in a wide circulation journal, the corresponding definitions of PSG had been in the public domain for a decade, with large international programmes of research in formal language theory and theoretical linguistics building on a PSG foundation. DG as an explicitly articulated system thus entered an arena in which PSG was already well-established. Given that the earliest published formal accounts of DG established its equivalence (weak and strong) with context-free PSG (CFPSG)¹, there was little incentive to abandon the now familiar and well-understood formalism in favour of the unfamiliar and comparatively less well understood formalism.

A remarkable situation now obtains. Formal work in DG is virtually frozen in the state it was in around the mid-1960s, with only a handful of groups around the world making any (modest) advances since then (hardly any of which has ever been published in English). In contrast, a much larger — though still modest by PSG standards — number of theoretical linguists continues to assume some version of DG as the foundation of syntactic structure. Unfortunately, almost all linguistic theories based on DG have departed to some extent from the terra firma of formal definition.² Since the choice of DG as basic is a minority preference, those making the choice have gone to some lengths to argue the case for DG rather than PSG (for example, Hudson 1984: 92-8, forthcoming; Starosta 1988: 35-6). The opposite is generally not found: proponents of theories based on PSG do not typically support the choice of PSG with arguments for the superiority of PSG over DG (but see the debate in Hudson 1980a; Dahl 1980; Hudson 1980b; Hietaranta 1981; and Hudson 1981b for some responses to arguments against PSG).

¹ Given a definition of equivalence to be described in Chapter 2 below.
² The passing allusion to Pullum's (1985) iconoclastic paper 'Assuming some version of the X-bar theory' is thus intentional.

The principal argument offered by proponents of DG is that PSG approaches introduce a redundant layer of structure. Lexical-Functional Grammar (LFG) offers a particularly clear illustration of this, with its c-structure (constituent structure) and separate f-structure (functional structure), the latter being constructed by reference to the former (Kaplan and Bresnan 1982). In a DG approach a single structure suffices. The position adopted by many advocates of PSG is that it is unnecessary, not to say impossible, to argue against moving targets such as the underformalized versions of DG on offer.

This is to present the issues as being neatly polarized. In fact, most linguists nowadays work with hybrid systems which express both dependency and constituency in a single structure, albeit one which owes more to the PSG tradition than to the DG tradition. The most widespread example is X-bar grammar (originally proposed by Harris 1951), which augments a CFPSG by distinguishing one element in each constituent as the head of that constituent. However, there are complications here since a number of syntactic theories have been charged with uncritically adopting unformalized versions of X-bar theory (Pullum 1985; Kornai and Pullum 1990) — the very charge laid at the door of certain DG theories!

The general paucity of formal results concerning DG carries over from theoretical to computational linguistics. Here DG is scarcely mentioned, far less argued against. In the small number of cases in which it achieves passing mention, the same reasons for not using DG are employed: first, the only existing formal results show the equivalence of DG and CFPSG so there is no incentive to work with the less familiar system; second, almost nothing else is known formally about DG so until such time as additional solid results become available there is no incentive to invest effort in trying to work within that framework.

Let us consider these points in turn. First, then, the equivalence of DG and CFPSG. In their monograph Linguistics and Information Science, Sparck Jones and Kay provide a brief introduction to DG and then furnish an account for why DG is not mentioned again:

We have put phrase structure and dependency together in the same class because it is easy to show that the differences between them are trivial from almost every point of view (see Gaifman 1965). It is also possible to write grammatical rules in a suitable notation which describes a single language and which assigns to each sentence of that language both phrase-structure and dependency trees (see Kay 1965; Robinson 1967). In this paper we shall make no further references to dependency grammar, intending what we say about phrase-structure grammar to be understood as applying also to dependency with occasional minor modifications (Sparck Jones and Kay 1973: 83-4).

Sparck Jones and Kay's observation that it is possible to devise a meta-formalism which includes both dependency and constituency information is useful from a descriptive point of view. However, the point it misses is that the equivalence of the formalisms or the possibility of devising a meta-formalism leaves open the question of whether phrase structure parsing and dependency parsing can be achieved by means of identical algorithms. This is a question which has hardly ever been raised in the literature. Hays' claim that "a phrase-structure parser can be converted into a dependency parser with only a minor alteration" (Hays 1966b: 79) is presented without argument or illustration so its status is, at best, uncertain. A seminal text in computer science bears the title Algorithms + Data Structures = Programs (Wirth 1975). It is well understood that a change in data structure may necessitate a change in algorithm if the net effects of the program are to remain constant. "The development of the algorithm ... is intimately linked to the choice of an appropriate data structure" (Goldschlager and Lister 1982: 65). Thus it cannot be taken for granted a priori that familiar phrase structure parsing algorithms will map effortlessly into the dependency parsing domain.

The second criticism of DG in computational linguistics is that where DG has been employed, for example in parsing systems, the resulting systems have not been constructed on a principled or even well-defined foundation. Winograd writes:

The formal theory of dependency grammar has emphasized ways of describing structures rather than how the system's permanent knowledge is structured or how a sentence is processed. It does not address in a systematic way the problem of finding the correct dependency structure for a given sequence of words. In systems that use dependency as a way of characterizing structure, the parsing process is generally of an ad hoc nature (Winograd 1983: 75).

Once again, this claim is presented without further argument or evidence. The absence of empirical data which characterizes these claims is not as surprising as it might first seem when it is understood that the number of dependency parsing systems in existence is severely limited in comparison with the number of phrase structure parsing systems. It is also the case that those descriptions of dependency parsing systems which have appeared in print have, on the whole, been published in relatively obscure sources or have only been circulated privately. Some accounts have been terse to the point of leaving most of the detail unreported. No survey or comparative account of dependency parsers is currently in existence.

One of the chief objectives of this thesis is to fill this gap in the literature by presenting an extensive survey of existing dependency parsing systems, the first such survey to be prepared.

The availability of this survey material presents a unique opportunity to consider from a base of empirical fact how the parsing algorithms employed in dependency parsing compare with those which are widely used and well-understood in phrase structure parsing. This study focuses on two hypotheses:

Hypothesis 1
It is possible to construct a fully functional dependency parser based directly on an established phrase structure parsing algorithm without altering any fundamental aspects of the algorithm.

This hypothesis is a strong version of Hays' (1966b: 79) claim. It is motivated by Gaifman's definition of strong equivalence between DG and PSG which guarantees some measure of structural correspondence at each point in the DG and PSG parse trees (see Chapter 2 below). However, it is not the strongest possible hypothesis, since it stops short of predicting that a dependency parser can be constructed based on any phrase structure parsing algorithm.

Hypothesis 2
It is possible to construct a fully functional dependency parser using an algorithm which could not be used without substantial modification in a fully functional conventional phrase structure parser.

This hypothesis is motivated by an appreciation of the particular way in which DG rules encode information, as compared with the way in which PSG rules encode information.

As I have previously noted, most linguistically motivated DGs have proceeded beyond the limits of what has been defined in a mathematically rigorous way. It is impossible to undertake a survey of dependency parsing systems without encountering some of these devices of unknown formal power. While noting in passing these extensions where relevant, I shall concentrate my analysis on the parsing of the context-free backbone of these theories (i.e. that which can be mapped onto a Gaifman grammar). I shall not be concerned in this thesis to make any qualitative judgements between DG and PSG qua descriptive devices.
1.2 Chapter outline

What follows divides conceptually into three parts:

1. Chapter 2 introduces dependency grammar. It presents a formal account of DG and outlines the equivalence relation used to compare DG with PSG. The development of DG from its origins in the classical world through to the present day is charted in the latter part of the chapter.

2. Chapters 3 to 11 present the most detailed review and critique of dependency parsers yet assembled. Chapter 3 describes the growth of the use of DG in computational systems for natural language processing. Chapters 4 to 11 are each devoted to the description and evaluation of a different dependency parser or closely related family of dependency parsers. The chapters are arranged in approximate chronological order: the oldest parser is presented first and the most recent parser is presented last. Needless to say, the development phases of some parsers overlapped, so the ordering of chapters must be regarded as no more than a rough guide to the relative age of the systems reported therein.

3. Finally, drawing heavily on the preceding analyses of existing dependency parsers, Chapter 12 sets out some elements of a first taxonomy of dependency parsing, defines some technical vocabulary for the field and specifies the range of relevant variables. The two hypotheses stated above are examined in Chapter 13 in light of the survey of existing dependency parsing algorithms.
Chapter 2

Dependency grammar

"It all depends."
C.E.M. Joad,
BBC Radio 'Brains Trust',
1942-1948

2.1 Overview

Before proceeding with a survey of parsing systems based on DG it is necessary to be clear about exactly what a DG is. One of the dangers when working with a notion like grammatical dependency is that it can come to mean all things to all people. The purpose of this chapter is therefore to furnish an unambiguous definition of DG, to introduce some terminology, and to review where systems approximating to this definition of DG have been employed in theoretical linguistics.

Section 2.2 introduces Gaifman grammars, the only version of DG to be defined with full mathematical rigour. Accordingly, these systems are taken as a stable reference point in this thesis. The formal properties of Gaifman grammars are defined, together with a decision procedure for determining whether or not a given string is accepted or rejected by an arbitrary Gaifman grammar. Alternative conventions for portraying dependency structures diagrammatically are introduced. Although there is insufficient space here to reproduce the rather lengthy proof which establishes the strong equivalence of DG and PSG, the equivalence relation employed is described and scrutinized.

In practice, very few — if any — linguists have used Gaifman's system in the description of natural language without making use of various augmentations of unknown formal power. These augmentations are flagged in Section 2.3. Those which must necessarily be examined in the course of the survey of dependency parsing systems are described in greater detail in later chapters. Section 2.4 charts the origins and development of DG in linguistic theory.

In Section 2.5, three grammatical formalisms bearing some similarities to DG are identified, namely Case Grammar, Categorial Grammar, and Head-Driven Phrase Structure Grammar. Although a full description of these frameworks is not appropriate here, their basic concepts are introduced and some reasons for excluding them from this study are provided.
2.2 Gaifman grammars

2.2.1 Definitions

The first formal definition of DG was offered by Haim Gaifman (1965). In this section, I present his definition along with illustrative examples.¹

¹ In this re-presentation of Gaifman's (1965) work, the logic and substance of his definition is maintained but the manner of exposition has been altered to render the material more transparent.

Definition

A dependency grammar Δ is a 5-tuple

Δ = (T, C, A, R, G)

where

1. T is a finite set of word symbols, i.e. the terminal symbols. For the purposes of exposition, the letters u, v, w, x, y, z, with or without subscripts, will denote members of this set.
2. C is a finite set of category symbols. For the purposes of exposition, the letters U, V, W, X, Y, Z, with or without subscripts, will denote members of this set.

3. A is a set of assignment rules, whose elements are all members of T × C. Every word belongs to at least one category and every category must have at least one word assigned to it. A word may be assigned to more than one category.

4. R is a set of rules which give for each category the set of categories which may derive directly from it with their relative positions. For each category X, there is a finite number of rules of the form

X(Y1, ..., Yi, *, Yi+1, ..., Yn)

(where Y1 to Yn are members of C) indicating that Y1 ... Yn may depend on X in the order given, where * marks the position of X in the sequence. A rule of the form X(*) allows X to occur without any dependents.

5. G is a subset of C whose members are those categories which may govern a sentence, i.e. the start symbols.

Example

Δ₁ is an example of a dependency grammar, where Δ₁ = ({people, robots, dislike, smart, stupid}, {N, V, A}, {(people, N), (robots, N), (dislike, V), (smart, A), (stupid, A)}, {N(*), N(A,*), V(N,*,N), A(*)}, {V}).

Convention

By convention, the fact that some X is a member of G may be indicated thus: *(X).

Following this convention, G of Δ₁ may be represented as *(V).

Convention

By convention, A may be represented as follows: for each distinct category X in C create a correspondence of the form X : L, where L is the set of all words x such that (x, X) is in A.

Thus, A of Δ₁ may be represented as {N:{people, robots}, V:{dislike}, A:{smart, stupid}}.

Convention

To improve readability, a grammar of type Δ may be represented by writing each member of G on a line by itself, followed by each member of R on a line by itself, followed by each member of A on a line by itself. T and C are implicitly defined in A.

Thus, Δ₁ may be represented as follows:

*(V)
N(*)
N(A,*)
V(N,*,N)
A(*)
N:{people, robots}
V:{dislike}
A:{smart, stupid}
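
For concreteness, Δ₁ in this last notation maps directly onto a small machine-readable grammar. The sketch below is illustrative only: it is not the code of Appendix A, and the predicate names root_category/1, rule/3 and word/2 are invented purely for exposition; a fact rule(Cat, Left, Right) encodes a rule Cat(Left, *, Right).

    % An illustrative Prolog encoding of the example grammar Delta-1.
    root_category(v).            % G = {V}: only V may govern a sentence

    rule(n, [],  []).            % N(*)
    rule(n, [a], []).            % N(A,*)
    rule(v, [n], [n]).           % V(N,*,N)
    rule(a, [],  []).            % A(*)

    word(people,  n).            % the assignment rules A
    word(robots,  n).
    word(dislike, v).
    word(smart,   a).
    word(stupid,  a).

Nothing in the definitions depends on this particular encoding; it is introduced only so that later examples can be stated concretely.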

The next definition elucidates the relationship between sentences of a language and the grammar of type Δ which generates that language.

In this definition it is necessary to make reference to occurrences of words or categories in a sequence. An occurrence is an ordered pair (x, i), where x is the word or category and i is the position number of x in the sequence. P, Q and R, with or without subscripts, denote occurrences of words or categories. If P = (X, i) then S(P), the sequence number of P, is defined to be i; P is said to be of category X.

Definition

A sentence x1 x2 ... xm is analyzed by a grammar of type Δ iff the following are true:

1. A sequence of categories X1 X2 ... Xm can be formed such that xi is of category Xi for 1 ≤ i ≤ m.

2. A 2-place relation d can be established between pairs of words in x1 x2 ... xm. PdQ signifies the fact that P depends on Q, i.e. the relation d holds between P and Q.

For every d we define another relation d* where Pd*Q iff there is a sequence P0, P1, ..., Pn such that P0 = P, Pn = Q and Pi d Pi+1 for every 0 ≤ i ≤ n-1.

The relation d is constrained in the following ways:

(a) For no P, Pd*P.

(b) For every P, there is at most one Q such that PdQ.

(c) If Pd*Q and R is between P and Q in sequence (i.e. either S(P) < S(R) < S(Q) or S(P) > S(R) > S(Q)), then Rd*Q.

(d) The whole set of occurrences is connected by the relation d.

3. If P is an occurrence of Xj, if the occurrences that depend on it are P1, P2, ..., Pn, if Ph is an occurrence of category Zh for h = 1, ..., n, and if the order in which these occurrences appear in the sentence is P1, ..., Pk, P, Pk+1, ..., Pn, then Xj(Z1, ..., Zk, *, Zk+1, ..., Zn) is a rule of R. In the case that no occurrence depends on P, Xj(*) is a rule of R.

4. The occurrence which governs the sentence (i.e. which depends on no other occurrence) is an occurrence of a word whose category is a member of G.

The structure corresponding to a sentence of a language generated by a grammar of type Δ is called a dependency tree.

Definition

A dependency tree for a sentence x1 ... xn consists of the string of categories X1 ... Xn together with the relation d.

Definition

A language is weakly generated by a dependency grammar iff for every sentence in that language there is a corresponding dependency tree and no dependency tree exists for a sequence of words which is not a sentence. A language is strongly generated by a dependency grammar iff it is weakly generated by that dependency grammar and, for every syntactically correct interpretation, and only for these, there are corresponding dependency trees.

The above definitions can be summarized informally as follows. In the structure corresponding to a sentence of a language generated by a dependency grammar of type Δ:

1. one and only one occurrence is independent (i.e. does not depend on any other);

2. all other occurrences depend on some element;

3. no occurrence depends on more than one other; and

4. if A depends directly on B and some occurrence C intervenes between them (in linear order of string), then C depends directly on A or on B or on some other intervening element (Robinson 1970: 260).
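
To make these conditions concrete, consider the sentence Smart people dislike stupid robots, which is analyzed by Δ₁ (its structure is drawn in Figures 2.1 to 2.3 below). The category sequence is A N V A N, and the relation d consists of exactly four pairs: smart d people, stupid d robots, people d dislike, and robots d dislike. The occurrence dislike depends on no other occurrence, so it governs the sentence, and its category V is a member of G. Each local configuration matches a rule of R: N(A,*) for people and for robots, V(N,*,N) for dislike, and A(*) for smart and for stupid. Condition 2(c) is also satisfied: for instance, stupid stands between robots and its head dislike, and stupid is itself a subordinate of dislike.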

To aid discussion, I shall adopt the following terminology. All occurrences of words in a sentence shall be called words. Where the intention is to refer to words in the lexicon, this will be stated explicitly. The single independent word in a sequence (i.e. the word which depends on no other) shall be called the root. One word W1 is said to be a subordinate of another word W2 if W1 depends on W2 or on another subordinate of W2, i.e. W1 depends directly or indirectly on W2. The word on which another word depends shall be called its head. The requirement that a head-dependent pair either be next to each other or separated by direct or indirect dependents of themselves (point 4 above) is known as the adjacency constraint.

Example

Given these definitions, the sentences in (1) belong to the language defined by Δ₁, whereas the sequences in (2) are outside of that language. (By convention, sequences which are not well-formed in respect of a particular grammar are prefixed by '*'.)

(1) a People dislike robots.
    b Stupid people dislike smart robots.
    c Smart robots dislike people.
    d People dislike smart people.

(2) a *Smart people dislike.
    b *Stupid dislike robots.
    c *Stupid robots.
    d *Robots people dislike.
    e *Robots smart dislike people.

Example (2a) is ill-formed because dislike is a V, and Vs require two dependents, one preceding and one following. In this case, no following dependent is present. Example (2b) is ill-formed because all of the words are not connected together by dependency. The sequence is divided into two parts: stupid (which requires a head) and dislike robots (which requires a preceding dependent of category N for dislike). None of the words in (2c) is missing a dependent. However, the independent word robots is of category N, but only words of category V may govern a sentence. In example (2d), none of the words is missing a dependent and the independent element dislike belongs to the required category V. However, the dependents of V are required to occur one on either side of V, whereas here they both occur before it. Example (2e) is ill-formed because of the inappropriate position of smart. Either it is a dependent of robots, in which case it should precede that word, or it is a dependent of people. If it is a dependent of people then it precedes it as it ought, but smart and people are separated by the word dislike, which is dependent on neither.

I shall henceforth refer to dependency grammars of type Δ as Gaifman Grammars.

2.2.2 A recognizer for Gaifman grammars

So far, I have characterized Gaifman grammars in terms of constraints on the well-formedness of grammar rules and dependency structures. In this section I describe a decision procedure — a recognizer — which accepts all and only the well-formed strings of the language described by a Gaifman grammar. The recognizer is based on one described by Hays (1964: 516-17).

The principal data structure used by the recognizer is a table. To determine whether or not a string is generated by a Gaifman grammar Δ, proceed as follows:

1. Starting from 1, and counting upwards in units of 1, assign an integer to each word in the string, working from left to right. The integer assigned to a word shall be known as the position of that word. Let Max equal the position of the rightmost word.

2. Set up a table, having Max positions, numbered from 1 to Max. A cell [a, b] shall occupy all the positions from a to b, where 1 ≤ a ≤ b ≤ Max.

3. For each word Wi in the string retrieve all the classes X1 to Xn assigned to that word by assignment rules of the form W : {X1, ..., Xn} in A. If p is the position of Wi, write X1 to Xn in the table at cell [p, p].

4. For each word class X at cell [j, j] in the table (1 ≤ j ≤ Max) determine whether a rule of the form X(*) exists in Δ. If so, insert X(*) in the table at cell [j, j].

5. Let V be a variable. Set V = 2.

6. Consider each sequence of V adjacent cells in the table. For each sequence which consists of exactly one word class symbol X and V-1 trees Y1, ..., Ym (where m = V-1), arranged in the order

Y1, ..., Yi, X, Yi+1, ..., Ym

search in Δ for a corresponding rule of the form

X(Z1, ..., Zi, *, Zi+1, ..., Zm).

If the root of each tree Yn in the table is identical to the corresponding dependent Zn in the grammar rule, then insert a new tree in the table, occupying the cell which spans the whole sequence (i.e. from the leftmost position covered by the first item in the sequence to the rightmost position covered by the last item). The form of the new tree should be as follows:

X(Y1, Y2, ..., Yi, *, Yi+1, ..., Ym)

7. If V = Max then go to step 8, otherwise increment V and go to step 6.

8. If a tree exists in the table occupying cell [1, Max] then succeed if the root of the tree is of some type X such that a rule of the form *(X) exists in Δ. Otherwise fail.

Hays presents his algorithm informally, so it has been necessary to reconstruct
some of the details in the above account.

A Prolog implementation of this recognition algorithm can be found in the
file hays_recognizer.pl in Appendix A.3.
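By way of illustration, the following is a minimal sketch (mine, not Hays') of a
recognizer for the same class of grammars. Instead of Hays' table it uses naive
backtracking search: a complete subtree headed by a word of a given category is
recognized by recognizing the subtrees of its left dependents, then the head word
itself, then the subtrees of its right dependents, so every complete subtree covers
a contiguous substring and the adjacency constraint is enforced by construction.
The grammar facts are my reconstruction of the example grammar of (1) and (2)
above, and the predicate names (word/2, rule/3, root/1, recognize/1, subtree/3)
are assumptions of the sketch rather than anything taken from Hays.

    % Assignment rules (assumed reconstruction of the example grammar).
    word(people,  n).      word(robots,  n).
    word(smart,   adj).    word(stupid,  adj).
    word(dislike, v).

    % rule(Category, LeftDependentCategories, RightDependentCategories).
    rule(v,   [n],   [n]).     % V(N, *, N)
    rule(n,   [],    []).      % N(*)
    rule(n,   [adj], []).      % N(Adj, *)
    rule(adj, [],    []).      % Adj(*)

    root(v).                   % *(V): only a V may govern a sentence.

    % recognize(+Words): Words is a well-formed sentence of the grammar.
    recognize(Words) :-
        root(Cat),
        subtree(Cat, Words, []).

    % subtree(+Cat, +Ws0, -Ws): a complete subtree headed by a word of
    % category Cat spans the words between Ws0 and the remainder Ws.
    subtree(Cat, Ws0, Ws) :-
        rule(Cat, LeftCats, RightCats),
        subtrees(LeftCats, Ws0, [Head|Ws1]),
        word(Head, Cat),
        subtrees(RightCats, Ws1, Ws).

    subtrees([], Ws, Ws).
    subtrees([Cat|Cats], Ws0, Ws) :-
        subtree(Cat, Ws0, Ws1),
        subtrees(Cats, Ws1, Ws).

The query recognize([stupid, people, dislike, smart, robots]) succeeds and
recognize([robots, smart, dislike, people]) fails, as required by (1) and (2).
Unlike Hays' table-driven procedure, this naive version can loop on grammars
in which a category's leftmost dependent may itself head a subtree of the same
category; it is offered only as a compact statement of what is being recognized.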

Hays also outlines a generative procedure for enumerating all the strings
generated by a Gaifman grammar (Hays 1964: 514-15). A Prolog implementation
of a reconstructed version of Hays' procedure can be found in the file
hays_generator.pl in Appendix A.3.

2.2.3 Representing dependency structures

There are at least three conventions for presenting dependency structures
diagrammatically: stemmas, tree diagrams and arc diagrams.

The first representational scheme — due to Tesnière (1959) — presents
words as nodes in a graph which is known as a stemma (see Figure 2.1,
for example). Dependencies between word occurrences are signalled by links
between nodes. By convention, heads are located nearer the top of the diagram
than their dependents. The first occurrence in a sentence is positioned furthest
to the left in a diagram and the nth occurrence appears to the right of the
n-1th occurrence and to the left of the n+1th occurrence. For simplicity,
category labels are usually omitted from diagrams of all types.

A lthough stem m as contain th e appropriate am ount of inform ation, they

can som etim es prove to be difficult to read, especially when th e sentences

represented are long and involve a lot of alternation betw een left-pointing and

right-pointing dependencies.

In the second type of diagram, exemplified in Figure 2.2, dependency is
represented by the relative vertical position of nodes in a tree; if a line connects
a lower node to a higher node then the symbol corresponding to the lower node
depends on the one corresponding to the higher node. I shall call diagrams of
this kind tree diagrams. They are also known as D-markers.

Figure 2.1: stemma for Smart people dislike stupid robots

Figure 2.2: tree diagram (D-marker) for Smart people dislike stupid robots

Figure 2.3: arc diagram for Smart people dislike stupid robots

The third diagrammatic convention represents dependency relations by
means of directed arcs. I shall adopt the convention of directing arcs from
heads to dependents, although (unfortunately) there is no generally accepted
convention and it is not unusual to find examples in the literature of arcs being
oppositely directed. I shall refer to diagrams of this kind as arc diagrams.
Figure 2.3 is equivalent to Figures 2.1 and 2.2 in the information it expresses.

Some authors (such as M atthew s 1981) draw arc diagram s w ith the arcs

below the symbols in the sentence rath er th an above them as shown here.

H udson sometimes divides the arcs so th a t those having a designated func­

tion appear below th e sentence symbols, whilst th e rest appear above them

(H udson 1988b: 202; page 189 below).

The adjacency constraint is satisfied in the sentence Smart people dislike
stupid robots, as can be seen in the dependency structure variously represented
in Figures 2.1, 2.2 and 2.3. The constraint is violated in the dependency
structure shown in Figure 2.4.

In Figure 2.4, stupid violates the constraint: it is separated from its
head robots by dislike, which depends on neither stupid nor robots and is not
a subordinate of either. In a tree diagram, the dotted line
which connects a word with its node is called its projection. Note that in
Figure 2.2, links and projections do not intersect. Such tree diagrams and their
corresponding syntactic structures are said to be projective. In Figure 2.4 a
link and a projection intersect at precisely the point where ill-formedness was
detected. Diagrams like Figure 2.4, and the corresponding syntactic structures,
are said to be non-projective.

Figure 2.4: dependency tree for *Smart people stupid dislike robots

Figure 2.5: arc diagram for *Smart people stupid dislike robots


The vocabulary of projectivity is rooted in the imagery of tree diagrams.
I shall henceforth make use of the more neutral terms adjacent and
non-adjacent.

The arc diagram corresponding to Figure 2.4 is shown in Figure 2.5. Notice
that arcs never cross in arc diagrams of structures which satisfy the adjacency
constraint, whereas arcs do cross where the structures violate the adjacency
constraint. (The only exception to this generalization is discussed below.)

In general, I shall use arc diagram s to represent dependency structures;

when describing a particular dependency system reported in the literatu re I

shall use the representation norm ally employed by proponents of th a t system .

Figure 2.6: Dependency structure of Old sailors tell tall tales

2.2.4 The generative capacity of Gaifman grammars

As well as providing a formally explicit definition of one class of DG, Gaifman

went on to investigate the generative capacity of the class. He did this by

com paring his DG w ith phrase stru ctu re gram m ar.

He concluded th a t for every DG there is a strongly equivalent CFPSG

and for a subclass of C FPSG s (in which every phrase is a projection of a

lexical category) th ere is a strongly equivalent DG. His proof is too lengthy

to reproduce here; it can be found in G aifm an (1965). Definitions of strong

equivalence between th e two systems can be found in Hays (1961b) and in

G aifm an (1965: 320-25).

Let a subtree be a connected subset of a dependency tree. (This is what
Pickering and Barry (1991) have recently called a ‘dependency constituent’.)
Let a complete subtree consist of some element of a tree, plus all other
elements directly or indirectly dependent on it. Thus, the dependency tree
in Figure 2.6 includes the subtrees shown in Table 2.1. Of these, only those
shown in Table 2.2 are complete subtrees.

A phrase structure and a dependency structure, both defined over the same
string, correspond relationally if every constituent is coextensive with a
subtree and every complete subtree is coextensive with a constituent. Two
structural entities are coextensive if they refer to exactly the same elements
in a string.

Let each subtree have a label, where the label is that word in the subtree
which depends on no other word in the same subtree.

Old
Old sailors
Old sailors tell
Old sailors tell tall tales
sailors
sailors tell
tell
tell tall tales
tell tales
tall
tall tales
tales

Table 2.1: Subtrees in Figure 2.6

Old
Old sailors
Old sailors tell tall tales
tall
tall tales

Table 2.2: Com plete subtrees in Figure 2.6

LABEL     SUBTREE
Old       Old
sailors   Old sailors
tell      Old sailors tell tall tales
tall      tall
tales     tall tales

Table 2.3: Complete subtree labels in Figure 2.6

Labels for the complete
subtrees of the dependency tree shown in Figure 2.6 are given in Table 2.3.
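The definitions of complete subtree and label are easy to operationalize. The
following is a minimal sketch (my own encoding, not part of Gaifman's or Hays'
apparatus) in which the dependency structure of Figure 2.6 is stated as Prolog
facts and a predicate enumerates each complete subtree together with its label;
the fact and predicate names (depends/2, position/2, complete_subtree/2) are
assumptions of the sketch.

    % depends(Dependent, Head): the dependency structure of Figure 2.6.
    depends(old,     sailors).
    depends(sailors, tell).
    depends(tall,    tales).
    depends(tales,   tell).

    % position(Word, N): left-to-right string position of each word.
    position(old, 1).  position(sailors, 2).  position(tell, 3).
    position(tall, 4). position(tales, 5).

    % subordinate(X, H): X is directly or indirectly dependent on H.
    subordinate(X, H) :- depends(X, H).
    subordinate(X, H) :- depends(X, M), subordinate(M, H).

    % complete_subtree(Label, Words): Words is the complete subtree whose
    % label is Label, i.e. Label plus all of its subordinates, in string order.
    complete_subtree(Label, Words) :-
        position(Label, _),
        findall(P-W,
                ( (W = Label ; subordinate(W, Label)), position(W, P) ),
                Pairs),
        keysort(Pairs, Sorted),
        findall(W, member(_-W, Sorted), Words).

The query complete_subtree(tell, Words) binds Words to [old, sailors, tell,
tall, tales], and backtracking through complete_subtree(Label, Words)
enumerates exactly the five labelled complete subtrees of Table 2.3.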

Let each phrasal constituent in a PSG also have a label, where the label

is conventionally understood (for example, th e label of a noun phrase is often

given as ‘N P ’, etc).^

In dependency theory, a string is said to derive from the label of the
corresponding complete subtree. In phrase structure theory, a string is said to
derive from the label of the corresponding constituent. A label accounts for
the set of strings that derive from it. Two labels are substantively equivalent
if they account for the same set of strings.

A phrase structure and a dependency structure correspond if (i) they
correspond relationally and (ii) every complete subtree has a label which is
substantively equivalent to the label of the coextensive constituent.

A DG is strongly equivalent to a PSG if (i) they have the same terminal
alphabet, and (ii) for every string over that alphabet, every structure
attributed by either grammar corresponds to a structure attributed by the
other.

Let us consider, by way of example, the ambiguous sentence (3), the two
phrase structure interpretations of which are shown in Figures 2.7 and 2.8.
(The linguistic plausibility of these analyses is not an issue here.)

(3) They are racing horses.

^All subtree and phrase labels must be unique within each sentence. If necessary this
can be effected by providing labels with unique integer subscripts.

Figure 2.7: First phrase structure analysis of They are racing horses

Figure 2.8: Second phrase structure analysis of They are racing horses

Figure 2.9: Dependency structure for They are racing horses. The sentence
root is racing.

LABEL     SUBTREE
they      they
are       are
          they racing
          they are racing
racing    they are racing horses
          are racing
          are racing horses
          racing
          racing horses
horses    horses

Table 2.4: Subtrees and complete subtrees in the DG analysis of the sentence
They are racing horses shown in Figure 2.9. Only complete subtrees are labelled.

Now consider th e dependency stru ctu re in Figure 2.9. This includes the

subtrees shown in Table 2.4.

T he constituents in Figure 2.7 are shown in Table 2.5 (ignoring th e initial

category assignm ents).

Since every constituent in Figure 2.7 is coextensive with a subtree in
Figure 2.9 and every complete subtree in Figure 2.9 is coextensive with a
constituent, the structures correspond relationally. Since it is also the case that
every complete subtree has a label which is substantively equivalent to the
label of the coextensive constituent, the structures correspond. Close examination
of Figure 2.8 reveals that it also corresponds relationally to Figure 2.9.

LABEL   CONSTITUENT
NP      they
S       they are racing horses
AuxP    are
VP      are racing
VP      are racing horses
NP      horses

Table 2.5: Constituents in the phrase structure analysis of the sentence They
are racing horses shown in Figure 2.7

However, only Figures 2.7 and 2.9 share substantively equivalent labellings so

only these structures can be said to correspond.

2.3 Beyond Gaifman grammars

In presenting his work on PSG s, Chom sky frequently and explicitly represented

them as a form alization of the stru ctu ralist Im m ediate C onstituent model (e.g.

Chom sky 1962). This claim has recently been contested by M anaster-R am er

and Kac (1990), thus highlighting some of the difficulties inherent in trying to

formalize a pre-existing linguistic notion faithfully.

T he issues are som ewhat clearer in the case of DG, since Gaifman, as

au th o r of th e form alization, makes no claims regarding its relation to any

existing notion other th an th a t em bodied in a RAND C orporation m achine

translation program . Hays, on the other hand, represents G aifm an’s work as

being a formalization of the linguistic notion of dependency. For example,

following a discussion of the different linguistic notions underlying IC theory

and dependency theory in his 1964 Language paper, his sum m ary of w hat is

to follow includes the following statem ent:

Section 2 presents a form alism for the theory, identifying th e com­

ponents of any dependency gram m ar (Hays 1964: 512, my em pha­

sis).

I have been unable to find any discussions anyw here in th e literatu re which

investigate this assertion by reference to actual linguistic theories which claim

to be based on some notion of dependency.

W h at is noticeable is th a t few of the self-proclaimed dependency-based

theories of language have m ade use of G aifm an’s formalism. This contrasts

sharply w ith the uptake of Chom sky’s PSG formalism, and particularly C F ­

PSG. T he only DGs which incorporate a more or less intact version of Gaifman

gram m ar are those which use it as the base com ponent in a transform ational

gram m ar (Hays 1964: 522-4; Robinson 1970) or as th e transcription system

on one stra tu m of a stratificational gram m ar (Hays 1964: 522-4). Otherwise,

alternative quasi-formalisms are employed.

It is common to find versions of DG which make use of complex feature
structures instead of or as well as word category labels, with dependency rules
being allowed to manipulate features in arbitrary ways (e.g. Starosta 1988;
Covington 1990b). Consider the following illustrative example of a dependency
rule for intransitive verbs which enforces subject-verb agreement (adapted from
Covington 1990b: 234):

    [category: verb, person: X, number: Y]
        ( [category: noun, person: X, number: Y, case: nominative], * )

Here the head is of syntactic category ‘verb’, of person ‘X’ and number ‘Y’.
Its single dependent must be a preceding nominative case noun, also of person
‘X’ and number ‘Y’. ‘X’ and ‘Y’ are variables over feature values.
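In a unification-based implementation the sharing of ‘X’ and ‘Y’ comes for free.
The following is a minimal sketch of one possible Prolog encoding of the rule
above (the fixed-arity fs/4 term and the rule/3 predicate are my own devices,
not Covington's notation), in which unification of the shared Person and Number
variables enforces the agreement.

    % fs(Category, Person, Number, Case): a crude fixed-arity feature structure.
    % rule(HeadFS, PrecedingDependentFSs, FollowingDependentFSs).
    rule(fs(verb, Person, Number, _),
         [fs(noun, Person, Number, nominative)],   % one preceding dependent
         []).                                      % no following dependents

A head analysed as fs(verb, 3, sing, _) can then take fs(noun, 3, sing, nominative)
as its dependent, but not fs(noun, 3, plur, nominative).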

This kind of augm entation could easily b e formalized as an extension to

G aifm an’s definition of DG. So long as the feature stru ctu res are sim ply a r­

rangem ents of symbols draw n from a finite set, th e generative pow er rem ains

unchanged. T he proof is trivial: any arrangem ent of features m ay b e ‘frozen’

and treated as though it were an atomic symbol.^ This is directly analogous to


^Obviously, this is just a sketch of the proof. The proof itself would first have to define
precisely the notational extension to Gaifman’s formalism.

w hat happens when a PSG is augm ented by th e addition of feature structures.
G azdar has sum m arized this as follows:

If we take the class of context-free phrase stru c tu re gram m ars and

modify it so th a t (i) gram m ars are allowed to m ake use of finite

feature systems and (ii) rules are perm itted to m anipulate th e fea­

tures in arb itra ry ways, then w hat we end up w ith is equivalent to

w hat we started out w ith (G azdar 1988: 69).

U nfortunately, all of the DGs which introduce feature structures also introduce

other extensions, whose effects on the generative capacity of th e gram m ars are

unknow n. For example, in H udson’s Word G ram m ar, a word m ay depend on

more th a n one head (Hudson 1990: 113-20). In S ta ro sta ’s Lexicase, certain

com plete subtrees (e.g. prepositional structures in English) have two roots, or

rather, a single root which is the union of th e features of two of th e words in­

cluded in th e subtree (S tarosta 1988: 232-4). H udson offers a revised version

of the adjacency constraint whose definition includes a reference to m ultiple

heads (Hudson 1990: 117), while Pericliev and Ilarionov (1986), Sgall et al.

(1986), Schubert (1987), and Covington (1990b) advocate abandoning the a d ­

jacency co n strain t altogether!

A thesis of this kind can not proceed w ithout giving some atte n tio n to

these theoretical extensions. However, as previously indicated, these features

m ust be regarded as lying on the periphery of th e study.

2.4 Origins in linguistic theory

The concept of grammatical dependency is found in some of the earliest known
grammars, for example those of the Greek scholars of the Alexandrian School,
and especially Dionysius Thrax (c. 100 B.C.) whose work drew heavily on the
Stoic tradition of linguistic studies. Thrax's Téchnē grammatikē was the inspiration
for the grammar of the later Alexandrian scholar Apollonius Dyscolus
(second century A.D.) whose work “foreshadowed the distinction of subject
and object and of later concepts such as government...and dependency”
(Robins 1979: 37). The work of Thrax and Apollonius was further developed
by a number of Latin grammarians, most notably Priscian (c. 450 A.D.). An
independent (earlier) strand of grammatical study was pursued by the Sanskrit
grammarians, most notably Panini (some time between 600 and 300 B.C.). In
Panini's grammar “the verb, inflected for person, number, and tense, was
taken as the core of the sentence... Other words stood in specific relations to
the verb, and of these the most important were the nouns in their different
case inflexions” (Robins 1979: 145).
case inflexions” (Robins 1979: 145).

Particularly clear early articulations of the central concepts of dependency
can be found in the writings of the medieval Arabic grammarians, especially
those of the Basra and Baghdad schools. In Arabic grammar, a governor
was said to govern (ʿamila, lit. ‘do, operate’) a governed (maʿmūl).
Many of the details of modern DG are made explicit for the first time in the
writings of Ibn Al-Sarraj (died 928 A.D.). For example, a word may not depend
on more than one other. Sarraj writes:
on more th a n one other. Sarraj writes:

It is not permitted to have two governors governing a single item.
(Owens 1988: 43)

(All quotations use Owens' translation and reference Owens (1988) rather than
the original sources.)

Heads and dependents were required to be adjacent. Again, Sarraj writes:

T he separation between the governor and th e governed by some­


thing not related to either is disliked. (Owens 1988: 46)

This finds support in the writings of Ju rjan i (died 1078), who insists th at:

You cannot separate a governor and a governed w ith a foreign


elem ent. (Owens 1988: 49)

In common with modern versions of dependency theory, governors could have
many dependents, although dependents could have only one head. Dependency
was unidirectional and there was no interdependence. The mediaeval Arabic
grammarians also observed that, for Arabic at least, governor-governed was
the unmarked word order. A detailed guide to mediaeval Arabic grammar can
be found in Owens (1988).

Dependency is also found in the work of mediaeval European scholars such
as the modistic and speculative grammarians, and especially in the work of
Martin of Dacia and Thomas of Erfurt (more details of their work can be
found on page 217ff below). According to Herbst et al. (1980: 33), who quote
Engelen (1975: 37), some of the central ideas of DG were used in Germany
by Meiner in the eighteenth century and later by others including Behaghel,
Bühler and Neumann.

M ost com m entators agree th a t the m ost significant contribution to th e

developm ent of DG was m ade in the 1950s by the Frenchm an Lucien Tesnière.

Tesnière was the first person to develop a semi-formal ap p aratu s for describing

dependency structures. Tesniere’s ideas were initially collected in a slim and

ra th e r program m atic volume (Tesnière 1953) which was not very well received
by reviewers (for example, see G arey 1954). Tesnière died in 1954 but Jean

Fourquet edited his unpublished works into a single volume — Élém ents de

Syntaxe Structurale — which was published in 1959. This book presents a

coherent and com prehensive account of Tesnière’s work in DG.

Tesnière's posthumous volume consists of three parts labelled ‘la connexion’
(dependency), ‘la jonction’ (coordination) and ‘la translation’ (word class
transformation). He argued that whereas all other constructions could be
analysed in terms of word-word dependencies, coordinate constructions could
not. This is now the standard view amongst dependency grammarians. (A
few dependency grammarians — including Mel'cuk (1988: 26ff) — hold that
coordinate constructions can also be analyzed in terms of dependency. Brief
descriptions of some approaches to coordination in DG can be found on
pages 139, 168, and 188; for a useful overview see Hudson (1988b).) The
‘Connexion’ section of the book presents in axiomatic fashion many of the
principles which have come to define and to distinguish DG. For example (my
translation, Tesnière's emphases):

The structural connections establish relations of dependence between
the words. As a rule, each link unites a superior term with
an inferior term.

The superior term is called the regent. The inferior term is called
the subordinate.

The upward relation can be expressed by saying that the subordinate
depends on the regent, and the downward relation by saying
that the regent commands or governs the subordinate.

In principle, a subordinate can only depend on a single regent. In
contrast, a regent can command several subordinates.

The node formed by the regent which commands all the subordinates
of the phrase is the node of nodes or central node. It is
the core of the phrase, of which it assures the structural unity by
tying the separate elements into a single structure. It is identified
with the phrase. (Tesnière 1959: 13-15)

In a footnote on Tesnière’s page 15 he tells how he first conceived of th e idea

of th e stem m a in June 1932. He started using it in his private research in

1933 and in his publications in 1934. In 1936, whilst on a trip to th e U.S.S.R.,

he discovered th a t he was not th e only person to have this idea.^^ Usakov,

Smirnova and Sceptova had published an article using stem m as as early as

1929. B arkhudarov and Princip had done likewise in 1930, and Kruckov and

Svetlaev had used stem m as in a book published in early 1936. In spite of this

— in W estern E urope a t least — Tesnière is usually nam ed as th e originator of


®Les connexions structurales établissent entre les mots des rapports de d ép en d a n ce.
Chaque connexion unit en principe un terme su p é rieu r à un terme inférieur.
^Le terme supérieur reçoit le nom de rég issa n t. Le terme inférieur reçoit le nom de
su b o r d o n n é.
®0n exprime la connexion supérieur en disant que le subordonné d ép en d du régissant,
et la connexion inférieur en disant que le régissant co m m a n d e ou ré g it le subordonné.
®En principe, un subordonné ne peut dépendre que d’u n se u l régissant. Au contraire
un régissant peut commander p lu siers subordonnés.
i^Le nœude formé par le régissant qui commande tous les subordonnés de la phrase est le
n œ u d d es n œ u d s ou n œ u d central. Il est au centre de la phrase, dont il assure l’unité
structurale en en nouant les divers éléments en un seul faisceau. Il s’identifie avec la phrase.
“J ’ai eu la joie de constater que l’idée du stemma y avait germé de façon indépendante”.

DG as an explicit system for linguistic description. C ertainly it was Tesnière’s

work which did more th a n anyone else’s to publicize DG. Had his volume been

published any tim e other than in the im m ediate afterm ath of th e publication
of C hom sky’s Syntactic Structures^ DG m ight have been taken seriously by a

m uch wider audience.

Amongst the most influential of Tesnière's ideas were those relating to
valency. The valency of a verb is its potential for having other words depend
on it. Thus, an intransitive verb takes one dependent, a transitive verb takes
two, a ditransitive three, etc. In addition to these complements, which must
be present in a well-formed structure, a verb may also take some number of
adjuncts. Complements subcategorize the verb, whereas adjuncts modify it.
The term ‘valency’ was borrowed from molecular physics where it is used to
describe the attractive potential of a molecule. Tesnière is often cited as
the originator of the term ‘valency’ in linguistics but, according to Schubert
(1987: 61), it can be found in the earlier writings of Kacnel'son (1948: 132)
[‘sintaksiceskaja valentost’] and de Groot (1949: 111) [‘syntactische valentie’].
Baum (1976: 32) claims that Hockett (1958: 249) uses the term ‘valence’
independently of Tesnière.

“On peut ainsi comparer le verbe à une sorte d'atome crochu susceptible d'excercer son
attraction sur un nombre plus ou moins élevé d'actants, selon qu'il comporte un nombre plus
ou moins élevé de crochets pour les maintenir dans sa dépendance. Le nombre de crochets
que présente un verbe et par conséquent le nombre d'actants qu'il est susceptible de régir,
constitue ce que nous appellerons la valence du verbe” (page 238).

The relationship between valency and dependency is rather opaque. Early
dependency theorists tended to concentrate on formal issues and to see verbal
valency as just one specific example of the general case of dependency
— in other words, all words have a valency. Valency theorists, on the other
hand, concentrate on the pivotal role played by the main verb in natural language
sentences. They tend to focus particularly on the case relations — in
the general sense of Fillmore (1968) — of the verb. Two largely disjoint research
communities have sprung up. In a recently published bibliography of
valency grammar [Valenzbibliographie] which includes 2377 entries, only 294
are indexed as relating to ‘dependency’ (Schumacher 1988). It is somewhat
difficult to see why these separate communities still exist since a number of
linguistic theories appear to bridge the perceived gap quite effectively (e.g.
Heringer 1970; Anderson 1977; Starosta 1988).

T he influence of Tesnière’s work has reached into alm ost every p a rt of the

world where language is studied, b u t th e effects have not always been the same.

In Tesnière’s native France and throughout th e R om ance language areas his

insights have been frequently applauded b u t seldom adopted. In Schubert’s

words:

In works w ritten in French, Spanish, Italian and o ther Romance


languages, Tesnière is referred to as a classic of linguistics, b u t
hardly anybody has taken up th e essence of his ideas and w ritten
for exam ple a dependency syntax of French or a valency dictionary
of Spanish (Schubert 1987: 22).

M aurice Gross is sometimes cited as a French dependency gram m arian but he

has not been active in the DG field since the early 1960s when he briefly exam ­

ined dependency gram m ars from a com putational point of view (Gross 1964).

Tesnière’s work had more influence in G erm any (E ast and W est),

where it was judged to be more ap p ro p riate for describing Germ an

word order variation and agreem ent p attern s th a n th e rath er inflexi­

ble PSG s available in the 1960s and 70s. O ne of the first large-scale

uses of dependency in the description of German was by Hans-Jürgen

Heringer, who combined constituency and dependency in a single represen­

ta tio n (H eringer 1970). Two schools arose w ithin dependency-based stu d ­

ies of language in the late 1960s a t Leipzig and a t M annheim . The

Leipzig school — which is chiefly associated w ith G erhard Helbig — con­

centrated on th e com pilation of valency dictionaries for G erm an verbs

(Helbig and Schenkel 1969), adjectives (Som m erfeldt and Schreiber 1974),

and nouns (Som m erfeldt and Schreiber 1977). T h e M annheim school under

Ulrich Engel and Helmut Schumacher began by producing an alternative va­

lency dictionary of G erm an verbs (Engel and Schum acher 1976) b u t they pro­

gressed to apply the insights of DG in the general description of languages.

E ngel’s gram m ar of G erm an (Engel 1977) was possibly the first a tte m p t to

describe all of the m ajor phenom ena of a single language w ithin a dependency

framework. Other German dependency theorists include Jürgen Kunze and his

colleagues in E ast Berlin (e.g. Kunze 1975) and Heinz V ater who developed a

transform ational generative version of DG (V ater 1975).

From Germany, interest in DG spread throughout N orthern Europe, often

being prom ulgated by Germ anists. It was introduced in F inland by Kalevi

Tarvainen (Tarvainen 1977), in Sweden by Henrik N ikula (N ikula 1976), and

in D enm ark by C atharine Fabricius-Hansen (Fabricius-H ansen 1977).

In G reat B ritain, John Anderson (also a G erm anist) developed ‘Case G ram ­

m a r’, a com bination of DG and localist case (A nderson 1971; A nderson 1977).

M ore recently, Anderson has been involved in th e developm ent of a

dependency-based theory of phonology (A nderson and D urand 1986). R ichard

H udson’s theory of ‘D aughter Dependency G ram m ar’ (DDG) (H udson 1976)

grew out of his earlier research in Systemic G ram m ar (H udson 1971) and was

a com bination of constituency and dependency. He subsequently abandoned

DDG in favour of a new theory, ‘Word G ram m ar’ (H udson 1984), which is

based on dependency alone. Hudson has recently published w hat is probably

th e first m ajor theoretically-m otivated DG of English (H udson 1990). In addi­

tion to these dependency theories, at least two B ritish scholars have used DG

in syntax textbooks (M atthew s 1981 and Miller 1985) and R odney H uddleston

has published two gram m ars of English which incorporate insights from DG

(H uddleston 1984; H uddleston 1988).

In th e early 1960s a num ber of Soviet scholars — including Sergei Fi-

tialov and Igor M el’cuk — used dependency as the basis of m achine tra n s­

lation system s. Since then, Mel’cuk (now a t th e U niversity of M ontreal)

has been developing his dependency-based ‘M eaning-Text M odel’ of language

(M el’cuk and Zolkovkij 1970; Mel’cuk 1979; M el’cuk 1988). P e tr Sgall’s group

at Charles U niversity in Prague has produced a general theory of language

stru ctu re called ‘Functional G enerative D escription’ in which dependency is

basic and constituent stru ctu re plays no p a rt (Sgall et al. 1986). I am aware

of some ongoing dependency research in Bulgaria b u t I have not seen any En­

glish papers other th an those by Pericliev and Ilarionov (1986) and Avgusti-

nova and Oliva (1990). A num ber of slavists working in th e W est have also

used some version of DG in their work, e.g. David K ilby (A tkinson et al. 1982)

and Johanna Nichols (Nichols 1978; Nichols 1986).

It is w orth pausing to reflect th a t so far in our discussion of th e develop­

m ent of DG we have considered only E uropean scholars (w ith th e exception

of Panini, the mediaeval Arabic gram m arians, and Jo h an n a Nichols who is

based a t Berkeley). If constituency gram m ar can be regarded as th e product

of N orth A m erican scholarship (and especially of Bloomfield and Chomsky)

then DG can be regarded as a distinctively E uropean developm ent. However,


although the vast m ajority of work in DG has been carried out in Europe,

some work has been done in N orth America and Jap an .

T he m ain figures associated w ith DG in N orth Am erica are D avid Hays,

Jane Robinson, and Stanley Starosta. Hays worked for the RAND Corporation

in the early 1960s, on a large R ussian-English m achine translation project. He

explored the uses of DG for m achine translation and also investigated the

formal properties of DGs. His work is described in more detail in Chapter 4.

In the late 1950s, Haim Gaifman had collaborated with Bar-Hillel and Shamir

in a study of C ategorial G ram m ars and PSG s which proved for th e first tim e

the generative equivalence of the two formalisms (Bar-Hillel et al. 1960). In

th e early 1960s, while he was based in th e M athem atics D epartm ent of the

University of California a t Berkeley, G aifm an undertook consultancy work

for th e RAND C orporation. It was there, while working w ith Hays, th a t

G aifm an carried out the work described at the beginning of this chapter. His

sem inal paper Dependency system s and phrase structure system s appeared as

a RAND internal report in May 1961, although it was not published in a m ajor

journal u n til 1965, a year after H ays’ article m aking G aifm an’s work accessible

to linguists had appeared in Language (Hays 1964). R obinson’s work in DG

was carried out at the end of the 1960s while she was employed a t IBM ’s

W atson Research Center. The m ain objective of her work was to explore

the ways in which Fillm orean case gram m ar could fit into a transform ational

fram ework. Her conclusion was th a t a transform ational gram m ar should have a

DG ra th e r than a PSG base com ponent (Robinson 1969; Robinson 1970). The

work of V ater m entioned above (Vater 1975) was a developm ent of R obinson’s

ideas. The largest single contribution to DG in North America in recent

years has been m ade by S tarosta a t the U niversity of Hawaii. Since th e early

1970s S tarosta has been developing a dependency-based theory of language

called ‘Lexicase’ (S tarosta 1988). Lexicase has been used in th e description of

around fifty different languages, m any of them so-called ‘exotic’ languages. It

is unlikely th a t any other dependency-based theory has been so widely field-

tested. For th a t m atter, it is unlikely th a t m any theories of any variety have


been so widely field-tested. An extended description of Lexicase can be found

in C hapter 8.

T he m ain figure associated w ith DG in Jap an is Tokum i K odam a

(K odam a 1982). However, very little theoretical DG work has so far been

done in Japan.

2.5 Related grammatical formalisms

A num ber of frameworks bearing similarities to DG have emerged during the

last few decades. Three of these m erit special a tte n tio n here, nam ely Case

G ram m ar, Categorial G ram m ar, and Head-Driven PSG.


Robinson subsequently abandoned DG in favour of augmented PSG.

2.5.1 Case grammar

Consider the following sentences:

(4) a P unch hit Judy w ith a club,


b P unch used a club to hit Judy,
c Ju d y was hit w ith a club (by Punch).

A lthough these sentences vary considerably in their surface forms, th e sem antic

relationships they express rem ain constant. Punch is the agent of th e hitting

action; Judy is on the receiving end of th e hittin g action; the club is the

instrum ent of the h ittin g action.

Case gram m ar, developed in th e late 1960s by Charles Fillmore

(Fillm ore 1968), formalizes these relationships. The sem antic deep structure of

a sentence is held to consist of two com ponents, a modality and a proposition.

T he m odality com ponent carries features of tense, m ood, aspect, and negation

relating to the sentence as a whole. T he proposition com ponent records th e

deep case relations in the sentence. Typically these cases are associated w ith

th e m ain verb. In Fillm ore’s original version of case gram m ar there were six

deep cases: a g e n t i v e , i n s t r u m e n t a l , d a t i v e , f a c t it iv e , l o c a t iv e , and

OBJECTIVE. (This num ber has varied widely between different instantiations

of case gram m ar). T he case fram e for the verb hit would include agentive,

objective, and in stru m en tal case slots, where each slot can be filled by phrases

of the appropriate sem antic type.
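A case frame of this kind is no more than a small data structure. The following
sketch (my own notation, not Fillmore's) records the frame for hit and the deep
case assignments of (4a) and (4c); the point to note is that one and the same
assignment underlies both surface forms.

    % case_frame(Verb, DeepCases).
    case_frame(hit, [agentive, objective, instrumental]).

    % deep_cases(Example, CaseFillerPairs).
    deep_cases(ex_4a, [agentive-punch, objective-judy, instrumental-club]).
    deep_cases(ex_4c, [agentive-punch, objective-judy, instrumental-club]).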

T he sim ilarities between case gram m ar and DG should be ap p aren t. It

is easy to envisage w riting a set of case gram m ar rules in modified G aifm an

form at, or giving a graphic representation of case structures using arc dia­

gram s. Fillm ore him self acknowledges his debt to Tesniere and o ther depen­

dency gram m arians (Fillm ore 1977: 60). However, I believe there are good

reasons for keeping DG and case gram m ar clearly separated. DG as I have

described it so far, is concerned w ith surface sy ntactic structure. Once a de­

pendency stru c tu re has been found, one option is to use it as a guide to assign

a case structure. However, gram m atical relations and case relations are n o t

necessarily coextensive. In (4a), the OBJECTIVE case is realized by the object
grammatical relation (Judy). In (4c), the OBJECTIVE case is realized by the
subject grammatical relation.

The logical separation of dependency and case is dem onstrated in practice

by the fact th a t while some dependency gram m arians make extensive use of

case in their theories (e.g. Anderson 1971, 1977; S tarosta 1988), others make

use of alternative sem antic frameworks (e.g. Covington 1990b; Hudson 1990).

O ur concern here is w ith the construction of syntactic structures and not se­

m antic structures. The question of which sem antic framework is m ost appro­

p riate when starting from a dependency tree is an interesting one, b u t it is not

the question we are tackling here. Dependency and case — though superficially

sim ilar — are logically distinct.

A useful introduction to case gram m ar is provided by Bruce and Moser

(1987). Applications of case gram m ar in NLP are described in Somers (1987).

It is w orth noting in passing th a t C onceptual D ependency (CD)

(Schank 1972; 1975), which is a generalization of case gram m ar for describing

relations holding between events and p articipants, is also outw ith th e scope

of this thesis. T he presence of the word ‘dependency’ in its title should not

be allowed to lead to confusion: DG is concerned prim arily w ith syntactic

dependency relations; CD is not.

2.5.2 Categorial grammar

C ategorial gram m ars (CGs) trace their origins from a num ber of devices devel­

oped in th e field of logical sem antics, specifically Lesniewski’s theory of sem an­

tic categories (Lesniewski 1929) which brought together insights from H usserl’s

Bedeutungskategorien (Husserl 1900) and W hitehead and R ussell’s theory of

logical types (W hitehead and Russell 1925). Lesniewski’s theory was refined

by Ajdukiewicz (Ajdukiewicz 1935) who applied the resulting system in the

specification of Polish notation languages (parenthesis-free logical languages

in which operators/ functors are w ritten im m ediately to the left of their argu-

ments). In a grammar of the sort envisaged by Ajdukiewicz, there are two

distinct types of category: primitive or fundamental categories, which are
denoted by unitary symbols (e.g. S, N), and derived or operator categories,
which are denoted by complex symbols of the form:

α/β

where α and β can be either variety of category, primitive or derived. When
complex categories appear within a category, it is customary to place brackets
around the embedded categories. A grammar consists of a single rule which
states that, given any string of two category symbols α/β and β, replace the
string with α. This rule is suggestive of cancelling in fractions, e.g. 3/2 × 2 =
3.

Consider a language with three items in its alphabet: x, y, and z. x has
category A/B; y has category B/C; z has category C. The string xyz would be
analysed as follows:

(5)
      x     y     z
     A/B   B/C    C
           ---------
               B
     -----------------
             A

By convention, a line is drawn below two adjacent categories which combine

to form a com posite category and the resulting category label is w ritten below

th a t line. The sim ilarities to phrase stru ctu re should be obvious; in this case

we can generate th e same string and th e same constituent stru ctu re (7) w ith

th e following set of PS rules:

(6)  A → A/B B
     B → B/C C
     A/B → x
     B/C → y
     C → z

(7)  [A [A/B x] [B [B/C y] [C z]]]

Three extensions to Ajdukiewicz's scheme were introduced by Bar-Hillel
(Bar-Hillel 1953): (i) assignment of words to more than one category was
allowed, (ii) a new kind of complex category — α\β — was introduced, and
(iii) a new composition rule was introduced to deal with the new kind of
category: given a string of any two symbols α and α\β, replace the string
with β.

A CG is unidirectional if its complex categories are all either of the form
α\β or of the form α/β. A grammar with both types of complex category is
called a bidirectional CG.

In his seminal paper The m athem atics o f sentence structure, Joachim Lam-

bek proposed four different CG rules: application, com m utativity, composition,

and raising (Lam bek 1958). These rules — or m inor variants of th em — have

now become the standard rules of CG.

Application

X/Y Y → X
Y Y\X → X

These are the rules of com bination we have already encountered. If a noun were

assigned the category N, an intransitive verb would be assigned the category

N\S. T hus, by the second clause of the application rule, a noun-intransitive

verb sequence such as John snores would cancel to S.

Commutativity

(X\Y)/Z → X\(Y/Z)

Composition

X/Y Y/Z → X/Z
X\Y Y\Z → X\Z

Raising

X → Y/(X\Y)
X → Y\(Y/X)

The m otivation for raising is as follows. Suppose th a t the pronoun he is as­

signed the category S /(N \S ) to indicate th a t it can only occur in subject

position, and the pronoun him is assigned th e category (S /N )\S to indicate

th a t it can only occur in object position. The raising rule allows an unm arked

noun to assum e either of these categories and to appear in either subject or

object position.
Interest in CG greatly increased during the 1970s due to the influence
of Richard Montague's work in truth-conditional model-theoretic semantics
and, in particular, his PTQ grammar (Thomason 1974; see also
Dowty et al. 1981). Interest in CG continued to increase throughout the 1980s,

due to the influence of David Dowty (Dowty 1982, 1988) M ark Steedm an

(Ades and Steedm an 1982; Steedm an 1985, 1986), and others. M any different

variants of Lam bek’s rules are currently in circulation. Steedm an’s C om bina­

tory Categorial G ram m ar (CCG ) offers one of the m ost interesting examples.

CCG analyses have been offered for particularly difficult non-context-free con­

structions such as the notorious D utch cross-serial coordinate structures. An­

other claim for CCG is th a t it allows increm ental (i.e. strict left-to-right) struc­

tu re building, and thus it facilitates on-line in terp retatio n (H addock 1987).

CG and DG are widely held to be notational variants (Lyons 1968: 231).

This is understandable, since there is an obvious sim ilarity betw een a DG rule

such as the one shown in (8a) and a CG category such as th e one shown in

(8b).

(8)
a S(N,*,N)
b N \S /N

However, behind these surface similarities lie the rules of CG. The basic

rule of combination in DG is something like C G ’s rule of functional application.

DG has nothing corresponding to the rules of com m utativity, composition, or

raising.

Most CG parsers adopt a basic shift-reduce strategy. The interest of these
parsers lies not in their parsing strategy so much as in the particular form
and effects of the combination rules they employ. Later I shall note in passing
how some similarities emerge between DG and CG parsing in the context of
incremental shift-reduce parsing.
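To make the comparison concrete, the following is a minimal sketch (mine, not
taken from any of the systems cited) of shift-reduce categorial parsing using only
the two application rules. The terms f(X, Y) and b(Y, X) stand for the categories
X/Y and Y\X respectively; the predicate names are assumptions of the sketch.

    % reduce(+Stack0, -Stack): combine the two topmost categories on the stack.
    reduce([Y, f(X, Y) | Rest], [X | Rest]).    % forward application:  X/Y Y  => X
    reduce([b(Y, X), Y | Rest], [X | Rest]).    % backward application: Y Y\X  => X

    % parse(+Categories, +Stack, -Result): the shift-reduce loop.
    parse([], [Result], Result).
    parse(Cats, Stack0, Result) :-              % reduce
        reduce(Stack0, Stack1),
        parse(Cats, Stack1, Result).
    parse([Cat|Cats], Stack, Result) :-         % shift
        parse(Cats, [Cat|Stack], Result).

    % parse_words(+Words, -Result): look up each word's category and parse.
    parse_words(Words, Result) :-
        maplist(category, Words, Cats),
        parse(Cats, [], Result).

    category(john,   n).
    category(snores, b(n, s)).                  % N\S: an intransitive verb

The query parse_words([john, snores], Result) binds Result to s. It is in this
shift-reduce setting that the similarities with DG parsing mentioned above emerge.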

2.5.3 Head-driven phrase structure grammar

H ead-driven P hrase Structure G ram m ar (H PSG ) is a theory of syntax and

sem antics developed by Carl Pollard and Ivan Sag (Pollard and Sag 1988). It

differs from stan d ard PSG in the extent to which inform ation is stored in the

gram m ar in relation to head words. For example, p a rt of th e lexical entry for

love is shown below.

(9)

(love, V[BSE, SUBCAT (NP, NP)])

This states th a t love is a verb which, in its base (infinitival) form subcategorizes

for two noun phrase complements. T he SUBCAT list is ordered according to

obliqueness, w ith more oblique argum ents appearing to th e left of less oblique

argum ents. This is further illustrated by th e following exam ple, which shows

the by-phrase of passive loved on the left of th e SUBCAT list.

(10)

(loved, V[PAS, SUBCAT ((PP[BY ]), NP)])

Figure 2.10: syntactic structure in DG (a) and in HPSG (b)

From the point of view of this survey, the identification of arguments by position
on an obliqueness list is no more than an implementational detail.
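For completeness, the obliqueness-ordered SUBCAT list is itself just a list-valued
attribute; the following one-clause-per-entry sketch (my own encoding, not
Pollard and Sag's attribute-value notation) restates entries (9) and (10).

    % lexical_entry(Word, VerbForm, SubcatList).
    % The SUBCAT list is ordered by obliqueness, more oblique arguments first.
    lexical_entry(love,  bse, [np, np]).
    lexical_entry(loved, pas, [pp(by), np]).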

At first sight, th e HPSG representation appears to have ju st th e right kind

of inform ation stored at the right (i.e. lexical) level to make it into a DG. It

is certainly possible to envisage using an HPSG lexicon to produce stan d ard

dependency structures. However, in addition to its lexical rules, H PSG also

makes use of a small num ber of phrasal rules. These effectively add an ex tra

(phrasal) layer of stru ctu re above each head word. For every phrase of type X

an X P is constructed. X becomes one of the daughters of X P and th e features

of X are copied to XP. Thus, where DG would build the stru ctu re shown in

Figure 2.10 (a), HPSG would build the stru ctu re shown in Figure 2.10 (b).

T he relationship between H PSG and DG is certainly very close. Ju st how

close is a question which I shall not address further here. For inform ation on

HPSG parsing see P roudian and Pollard (1985).

2.6 Summary

This chapter has a ttem p ted to delineate exactly w hat is understood by th e

term ‘dependency gram m ar’ as used in this thesis. It has done so first by

presenting a detailed formal definition of G aifm an gram m ar, whose purpose

here is to act as a cardinal point to which other versions of DG m ay be related;

and second, by chronicling the rise and spread in linguistics of theoretical

approaches which, although they may include additional features, ap p ear to

rest on a foundation which is expressible in term s of a G aifm an gram m ar.

Key figures and schools in the development of DG were identified. To assist in

identifying th e boundary between th a t which is included in this study and th a t

which is n ot, three examples of gram m atical theories which lie ju st outside DG

were isolated, and the reasons for their exclusion given.

Chapter 3

Dependency parsers

In the latter part of the last chapter I traced the origins and development of DG
in theoretical linguistics. In this chapter I chart the origins and development
of DG in computational linguistics.

T he designer of a PSG parser has at his or her disposal th e whole com puta­

tional linguistics literature which describes a host of tried and tested techniques

and algorithm s. However, it does more th an this. It defines th e space of pos­

sibilities for PSG parsing. For exam ple, th e designer of a parser m ust decide
w hether to build syntactic stru ctu re top-dow n, bottom -up or some com bina­

tion of the two. Not only is there a serious lack of published descriptions of

dependency parsing techniques^, b u t there is an even m ore serious absence of

definitions of the problem space. It is sometimes naively assum ed th a t DG

parsing and PSG parsing are slight variations on a single them e. This may

tu rn out to be the case b ut there can be no a priori guarantees. For example,

it m ay not make sense to talk about top-dow n versus botto m -u p dependency

parsing when there are no non-term inal nodes in a tree.^ O ne of th e m ain ob­

jectives of this thesis is to begin th e task of charting th e dependency parsing

problem space.
^The extensive bibliography of N a tu ra l Language P rocessing in the 1980s compiled by
Gazdar et al. 1987 includes only 9 entries indexed under ‘dependency’ (excluding non-DG
senses of the word).
^As we shall see below, a top-down/bottom-up distinction can be made in connection
with dependency parsers but it is not exactly the same as the more familiar PSG distinction
(cf. Chapter 12).

This chapter begins with an introduction to dependency in computational
linguistics (Section 3.1). This is followed by an introduction to PARS (Section 3.2),
a language for the description of dependency parsing algorithms,
which I shall use for the sake of clarity in the survey of existing dependency
parsers which follows this chapter.

3.1 Dependency in computational linguistics

A lthough com putational DG is lacking in theoretical underpinnings, a num ­

ber of system s have been developed from th e late 1950s onwards. These can

be roughly divided into machine translation systems, speech understanding

system s, other applications, im plem entations of theories, and exploratory sys­

tem s. The next eight chapters present some of these systems in more detail.

3.1.1 Machine translation systems

Dependency-based machine translation (MT) research has taken place in two
periods: the first half of the 1960s and the second half of the 1980s.

The early 1960s

In the early 1960s, there were two m ajor dependency-based M T projects. The

first of these was based at the Moscow Academy of Sciences. Amongst the

scholars associated w ith the project were Sergei Fitialov, O.S. K ulagina and

Igor M el’cuk. Very few — if any — docum ents describing th e project in detail

are available in English. However, an annotated bibliography on dependency

theory released by Hays in March 1965 contains English ab stracts of a num ­

b er of the project papers (Hays 1965). Since these papers are not discussed

elsewhere in th e English literature and since H ays’ bibliography is not in wide

circulation, those most im m ediately relevant to our present concerns are re­

produced below.

The coding of words for an algorithm for syntactic analysis
(Martem'yanov 1961)
Word classes ADr, ADl, AG, PGr, and PGl are defined, where
A = active, P = passive, D = depend, G = govern, r = right, l = left. An
active governor sweeps up passive dependents. A parsing routine
is discussed in p a rt, including the effect of English inflections on
word class.

An algorithm for syntactical analysis of language texts — general principles
and some results
(Mel'cuk 1962)
A dependency parser is outlined. The units of syntactic analysis are
‘content com binations’, i.e. syntagm as (governor and dependent),
phraseological com binations, etc., given in th e form of configura­
tions, each giving a pair of objects to be sought, a search rule,
conditions, actions, etc. These are listed in a syntactic dictionary.
T he algorithm th a t uses this list consists of 67 stan d ard (Kulagina)
operators. T he Russian configuration list has 263 lines. A bout 250
auxiliary operators are used. A flowchart and configuration list are
given.

Obtaining all admissible variants in syntactic analysis of text by means of
computer
(Slutsker 1963)
Assume a gram m ar th a t specifies w hat pairs of words can be con­
nected as governor and dependent. To find all projective parses of
a sentence, first set up a square m atrix w ith Wij = 1 if the gram m ar
allows word i to depend on word j . A parse of the sentence can be
specified by a m atrix with a single non-zero elem ent in each row,
chosen among those w ith W{j = 1. P rojectivity can be interpreted
in term s of incom patibilities in th e m atrix; all elem ents incom pati­
ble w ith unit elem ents unique in th e rows can be erased. T hen, by
a backw ard procedure, all parses can be found.

It is unfortunate th a t more inform ation is not available on th e Moscow M T

project. However, on the basis of these brief ab stracts it is possible to infer th a t

a significant am ount of effort was directed tow ards developing dependency-

based N LP systems. (T he abstracts tell us, for exam ple, th a t a t least three

scholars were involved in the developm ent of a t least three parsing algorithm s).

The second m ajor dependency-based M T project was sponsored by th e

RAND corporation and led by David Hays. T h e RAND project aim ed to

build a Russian-English MT system. It appears th a t Hays had no contact w ith

the work of Tesniere. Instead he learned about DG from the Soviet scholars.

It seems th a t there was a surprising am ount of com m unication betw een the

two research groups (especially considering th e prevailing Cold W ar clim ate

and the defence significance of Russian-English M T). T he RAND DG work is

sum m arized in C hapter 4 of this thesis.

A th ird stran d of dependency-based MT work was begun by P e tr Sgall’s

group in P rague in the early 1960s (Sgall 1963). No inform ation on this work

is available at th e tim e of writing.

The mid 1980s

In the mid 1980s three large dependency-based MT projects were undertaken.
The first of these is the European Community EUROTRA project
(Johnson et al. 1985; Arnold 1986; Arnold and des Tombe 1987) which has

led more recently to an offshoot dependency-based project called MiMo based

a t the Universities of Essex and U trecht (Arnold and Sadler forthcom ing). The

second, the D utch D istributed Language T ranslation (DLT) project, is dis­

cussed in C hapter 7 of this thesis. Sgall’s group in P rague has recently been de­

veloping an M T system based on the model of Functional G enerative Descrip­

tion (K irschner 1984; H ajic 1987; Sgall and Panevova 1987; H ajicova 1988).

It is not clear w hether this is a continuation of th e work begun in th e 1960s,

or w hether it represents a completely new venture.

3.1.2 Speech understanding systems

In at least two projects, dependency parsers have been used to process lattices

o u tp u t by speech recognition systems. The claimed advantages of DG are first,

th a t its rules and structures are word-based and can readily be associated w ith

th e basic units of recognition in lattices, nam ely w ord hypotheses; and second,

th a t DG is well-suited to combining top-down and bottom -up constraints in a

way which is particularly useful for processing lattices.

The first speech understander to make substantial use of dependency is the

Italian CSELT system (late 1980s), which is a speech interface to a database.

The CSELT parser is described in C hapter 11.

The second dependency-based speech understander was developed in Jap an

at N T T Tokyo and th e University of Y am agata. It is described in M atsunaga

and K ohda (1988).

The speech understanding system developed for the SPICOS project by

Niederm air and his colleagues a t Siemens in M unich (N iederm air 1986) is

sometimes m entioned in discussions of DC. However, their system is a hy­

brid of PSC and basic valency theory. A first-phase augm ented context free

phrase stru ctu re parser identifies and builds the m ajor phrases in a sentence.

A second-phase parser establishes binary relations between th e m ajor phrases

on the basis of sem antic caseframe entries. This is an interesting approach,

m otivated by the particular problem s encountered in parsing speech. However,

it would be misleading to describe it as a dependency parser.^

3.1.3 Other applications

At least one m ajor NLP project has investigated th e use of dependency in

a practical application other th a n M T or speech understanding. This is the

Finnish Kielikone project whose aim is to produce a general-purpose n atu ral

language interface which — in theory a t least — can sit on top of any database

w ith m inim al custom ization. T he project has been running since 1982. The

Kielikone parser is described in C hapter 6.


®This view has recently been confirmed by Gerhard Niedermair (personal
communication).

3.1.4 Implementations of theories

So far we have considered only DG-based NLP system s which were designed

with some particular application in mind, such as M T, speech understanding,

or database access. However, a num ber of systems have been built in order

to test th e coverage and coherence of particular linguistic theories. T he two

theories which have most obviously spawned this kind of activity are Lexicase

(S tarosta 1988) and W ord G ram m ar (Hudson 1984, 1990a). Their im plem en­

tation has taken place only in recent years. T he fact th a t more dependency-

based theories have not been im plem ented reflects th e fact th a t there has

been a shortage of well-developed theories to im plem ent. A t least two parsers

based on Lexicase have been produced so far (S taro sta and N om ura 1986;

Lindsey 1987). These are described in C hapter 8. Several parsers based on

W ord G ram m ar have been im plem ented (e.g. Fraser 1989a; Hudson 1989c).

These are described in C hapter 9.

Some of the work done by Sgall’s group in Prague is directed tow ards imple­

m enting th e theory of Functional G enerative D escription (e.g. Petkevic 1988),

although this seems to be less em phasized th an the design of specific applica­


tions.

3.1.5 Exploratory systems

It would be misleading to suggest th a t all work in dependency parsing has

been carried out w ith specific applications or theoretical linguistic objectives

in m ind. Some of the m ost interesting and useful results have em anated from

exploratory research directed towards investigating the computational
properties of DGs and trying out various novel parsing algorithms.

E arly in th e 1960s, a DG research group was form ed at th e EURATOM

CETIS Research C entre in Ispra, Italy.^ O ther research was carried out by

a group funded by EURATOM and under the leadership of Lydia Hirschberg


^EURATOM = European Atomic Energy Community. CETIS = Centre Européen pour
le Traitement de l’Information Scientifique.

at th e U niversity of Brussels. A lthough the work these groups carried out is

widely referenced, most of it is described in EU RATOM internal reports and

is otherw ise unavailable. The following abstracts appear in H ays’ annotated

bibliography.^

Automatic analysis
(Lecerf 1960)
The ‘conflict’ program tests each item against th e adjoining, al­
ready constructed phrase and either subsumes it as an additional
dependent or makes it the governor of a new, extended phrase. The
result is a chameleon, looking like both a phrase stru ctu re diagram
and a dependency diagram.

Conditional relaxation of the Projectivity Hypothesis


(Hirschberg 1961)
W hen parsing is blocked and a subtree exists headed by a unit th a t
dem ands a governor, remove th a t subtree and continue. W hen a
tree for a sentence is otherwise complete, look for th e governor in
the subtree headed by the nearest preceding node. M any exam ­
ples are given. There are also fixed non-projective com binations
in m any languages. An annex classifies French dependency types
by value. The highest value obtains when governor and dependent
require one another; the lowest, when neither calls specifically for
the other.

For at least a decade and a half, Jürgen Kunze's group in East Berlin has

been developing a version of DG for use in com puter applications. This work

could be expected to be of considerable significance in dependency parsing.

U nfortunately, very little of K unze’s m aterial has been available for inspection

at th e tim e of w riting.

Since the early 1970s P eter Hellwig has been developing his PLA IN system ,

chiefly at the U niversity of Heidelberg. PLAIN is a suite of program s centred

around a dependency parser. W hile Hellwig is actively involved in a num ber of

N LP projects to develop applications, th e PLA IN system seems to be prim arily

a research environm ent. The PLAIN system is described in C h ap ter 5.


^Hays himself spent 1962-63 at the EURATOM CETIS Centre, Ispra, Italy on leave from
RAND.

During the last few years, Michael Covington at th e U niversity of Georgia

has developed a num ber of simple dependency parsers in order to explore

the parsing of free word order languages. C ovington’s most recent parser is

described in C hapter 10.

A simple dependency parser has been designed and im plem ented by Bengt

Sigurd at the University of Lund. This work was inspired by Sigurd’s reading

of Schubert (1987).

Very recently a group at IBM ’s Tokyo Research Laboratory has be­

gun to experim ent w ith dependency-based NLP systems (M aruyam a 1990;

Nagao 1990).

I have presented a very brief historical overview of the field of dependency-

based NLP. This is sum m arised in Figure 3.1. Projects identified by heavy lines

are discussed in detail in C hapters 4-11. Notice how th e early enthusiasm and

associated research effort — much of it associated w ith M T — dwindled to

alm ost nothing in the late 1960s and throughout th e 1970s. It is interesting to

see how interest has picked up throughout the 1980s and, at the s ta rt of the

1990s, the field is blossoming once more.

C hapters 4-11 present overviews and critiques of twelve parsers. I present a


sum m ary table for each algorithm noting the following features: search origin

(top-dow n, bottom -up, etc), search m anner (breadth-first, depth-first, etc),

search order (left to right, right to left, etc), num ber of passes (single pass,

m ultiple passes, etc), search focus (w hat is being searched for?), and am bi­

guity m anagem ent (how are choice points and m ultiple analyses handled?).

Verbal descriptions of the algorithm s are presented b u t these can not always

be as perspicuous as m ight be desired. Consequently, th e informal verbal de­

scriptions are accompanied by slightly more formal descriptions. In order to

facilitate understanding and comparison of th e parsers it is useful to abstract

away from th e m any different notations used, and to represent th e parsing

algorithm s in a clear and theory-neutral fashion. It would be an enormous

[Figure 3.1, a timeline chart, is not reproduced here. It plots the dependency-based
NLP projects named in this chapter, Moscow (MT), RAND (MT), EURATOM, Sgall, Hellwig,
Kunze, Kielikone, EUROTRA, DLT, Lexicase, Word Grammar, CSELT, NTT, Covington, MiMo,
Sigurd, and IBM Tokyo, against the years 1960 to 1990.]

Figure 3.1: dependency-based NLP projects

task to do this thoroughly. First of all, it could involve th e design of a whole

new representation language whose syntax and sem antics would have to be

defined. Second, it would involve representing th e knowledge pertaining to

each parser in the kind of detail which would make the task com parable to

re-im plem enting the algorithm s. T he solution adopted here is a compromise.

A representation called PARS is introduced in th e next section. It is in tu ­

itively simple b u t lacking in formal rigour. T he prim ary purpose of PARS is

to achieve expository clarity in descriptions of parsing algorithm s. I make no

stronger claims for the representation.

3.2 PARS: Parsing Algorithm Representation Scheme

In this section a simple quasi-formal language (PARS) for describing depen­

dency parsing algorithm s is outlined. Its purpose is exposition ra th e r than

im plem entation so it is defined rath er less rigorously th an would be required


in a more form al specification. There is a tradition in com puter science of
using languages of this type (sometimes known as pseudo-Pascal) to describe

algorithm s (e.g. Goldschlager and Lister 1982). PARS is unusual in being


designed specifically to serve as a general-purpose representation scheme for

dependency parsing algorithm s. I shall use PARS to describe m any of the

dependency parsing algorithm s described in the following chapters.

3.2.1 Data structures

Constants

Integers, and lower case identifiers are allowed. Two list-related constants are

recognized. ‘0 ’ is the ‘begin-list’ m arker, ‘e’ is the ‘end-list’ marker.

Variables

Variables can be distinguished from other d a ta structures in PARS by the fact

th a t they all begin w ith an upper case character. All variables are global unless

otherw ise indicated.

By convention, the variable C is used to identify the current word in the

input list of words by means of its sequential position in th e list. Because of

PARS's expository function, this variable is used fairly loosely. Sometimes it

is used as a norm al variable, sometimes as a pointer, sometimes it refers to the

thing pointed to. The context ought to make the in terp retatio n clear in each

case.

O ther nam ing conventions include List (a global list). Stack (a global stack),

and Top (the top elem ent on the stack).

As we shall see below, values are assigned to variables by means of th e :=

(assigns) operator.

Variables can be used as pointers. When X is a pointer, X↑ is the element
to which it points.

Stacks and Lists

A stack is a last-in-first-out d ata structure. The default nam e for a stack is

Stack. T he action

pop(Stack)

discards the topm ost item on th e stack. T he action

push(Element)

pushes Element (any variable or constant) onto Stack. It is possible to push

elem ents onto stacks other th an the default stack by means of th e action

push(Stack1, Element)

which pushes Element onto Stack1 (some stack).

T he action

empty(Stack)

returns ‘tru e ’ if Stack is empty, otherwise it returns ‘false’. The action

top(Stack)

returns th e top elem ent of Stack w ithout popping it.

A list is an ordered sequence of elements. T he begin m arker is ‘O’. T he end

m arker is ‘e’. Elem ents in a list are addressed by pointers. If C is a pointer to

a list elem ent, then C-1 is the previous elem ent and C+1 is the next elem ent.

An elem ent can be added to the tail of a list by means of th e action

append(Element) or append(List1, Element)

and an elem ent can be removed from the list by means of th e action

remove(Element) or remove(List1, Element).

T he first and last elements of a list are returned by th e following actions:

first(List1)

and last(List1).

The length of a list can be found by means of the action

length(List1)

3.2.2 Expressions

T he basic com ponents of PARS descriptions are expressions. Expressions can

either be simple, consisting of one or more actions, or stru ctu red condition-

action sequences, as shown below:

(11)

I F condition(s)
T H E N expression(s)
(E L S E expression(s))

In addition, expressions m ay be labelled, as follows:

( 12)

N: expression

where ‘N ’ is an integer.

Expressions end w ith a full stop.

Conditions

Conditions can be of several different varieties. Each variety is associated with

a different operator. T he general purpose operators are sum m arized in the

table below.

Operator   Name
=          equals
→          depends
U          unifies

Equality   The = (equals) operator is used to test two items for identity. The
test succeeds if the items are identical.

Dependency   The → (depends) operator is used to test for dependency. The
test succeeds if the element on the RHS of the operator already depends on
the element on the LHS of the operator or if it can be made to depend on the
LHS element (i.e. there is nothing in the grammar or the sentence to prevent a
dependency relation from being established). The detailed articulation of this
operator will vary from system to system.

U nification The U (unifies) operator is used to test w hether or not two

feature structures unify. T he test succeeds if the structures unify. As well as


producing a tru th value, a successful test also results in th e unification of the

feature structures tested as a side effect.

O th er As was noted above, the empty(Stack) action returns a tru th value


and can be used as a condition in expressions.

T he condition saturated(C) succeeds if all of th e valency requirem ents of

some word C are satisfied.

Conjunctive and disjunctive conditions   Conditions may be conjoined
using the & (and) operator. For example:

(condition1 & condition2)

Disjunctions of conditions are possible using the 'V' (or) operator. For
example:

(condition1 V condition2)

Actions

Assignment   Values are assigned to variables using the := operator. Thus
C:=l

assigns the value 1 to C. If C equals, say, 5, then it is possible to reassign

C thus

C:=6

or thu s

C:=C+1

(the result is th e same in both cases).

Record   The record(X) action makes a record of X. For example,

record(C → C+1)

makes a record of the fact th a t a dependency has been established in which C

is the head of th e next word in th e global queue.

Goto   The goto(Label) action shifts control to the expression identified by

Label. T he sta te before a goto action is not stacked. It is not possible to

re tu rn to a prior state once a goto action has been executed. Expressions are

usually identified by integers. For example,

goto(3)

Length   The length(List) action returns an integer corresponding to the
number of elements in List, excluding the end-of-list marker.
Succeed and fail   succeed signals that a parse has succeeded, fail signals

that parsing has failed. Both actions term inate the parse immediately.

Others   As noted above, other actions include the stack-related pop and push,

and the list-related append and remove.
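
For readers who prefer a concrete rendering, these primitives map straightforwardly
onto an ordinary programming language. The following Python fragment is a minimal
sketch of such a mapping (the names and the treatment of labels are mine, not part
of PARS); labelled expressions and goto are most naturally re-expressed as a loop
over a current label.

    # Sketch: PARS-style primitives rendered in Python (illustrative only).
    stack = []                               # the default Stack
    def push(element, s=stack): s.append(element)
    def pop(s=stack): s.pop()                # discards the topmost item
    def top(s=stack): return s[-1]
    def empty(s=stack): return not s

    links = []                               # record(X → Y) becomes links.append((X, Y))

    def run(expressions, label=1):
        # goto(Label) is modelled by having each labelled expression return
        # the next label; the loop stops when an expression returns None.
        while label is not None:
            label = expressions[label]()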

3.3 Summary

In this chapter I have charted the rise of ‘applied’ DG, i.e. DC in service of

NLP. I have shown how an increasing num ber of NLP system s are being based

on DG in M T, speech understanding, and database access systems. Separate

strands of research are devoted to building NLP systems whose object is to

explore novel parsing algorithm s and to im plem ent linguistic theories. Lack of

published m aterial (or lack of m aterial published in a language accessible to

me) renders it impossible to include here a detailed exam ination of every sys­

tem nam ed in the survey. The following eight chapters describe those parsers

for which most inform ation is currently available. A t least one representative

of each of the categories m entioned above is included in this collection. W here

possible and helpful, parsing algorithm s are described in the special-purpose

description language, PARS.

The following chapters constitute th e most thorough exam ination of the

practice of dependency parsing yet assembled. C hapter 12 builds on this

m aterial w ith a view to outlining some elements of a general taxonom y of

dependency parsing algorithms.

Chapter 4

The RAND parsers

4.1 Overview

In this chapter I present the earliest dependency parsers described in this

survey. T he parsers were produced in the early 1960s by researchers at the

RAND C orporation, S anta Monica, USA and reported, for th e m ost p art, by

D avid Hays. M ost of the n atu ral language work at RAND was centered on
the developm ent of a Russian-English M T system , of which a parser was con­

sidered to be a vital p art. T he choice of DG as th e basis of th e system could


be regarded as n a tu ral considering th e difficulties involved in w riting PSGs

for variable word order languages like Russian — especially as the RAND

work preceded developm ents in PSG for handling variable word order such as

scram bling transform ations (Ross 1967; Saito 1989) or the ID /L P formalism

(cf. G azdar et al. 1985: 44-50). However, in 1961 — when RAND was just

one of m any groups involved in building R ussian-English M T systems — DG

was far from being the ‘n a tu ra l’ choice. Hays claimed th a t “P h rase structure

theories underlie all M T systems being developed in th e U nited States, except

th a t of th e RAND C orporation” (Hays 1961b: 258). As a leading figure in

M T in th e U nited States who was soon to become president of th e Association

for M achine T ranslation and C om putational Linguistics, Hays would almost

certainly have known if there had been any other dependency systems in ex­

istence. For an overview of the NLP work carried out a t RAND in the early

1960s, see Hays (1961c).
It is hard to overemphasize the importance of the RAND work in the

developm ent of dependency parsing. It was probably th e first m ajor project in

com putational linguistics in the W estern world to be based on DG. A lthough

Tesniere’s Éléments de Syntaxe Structurale was published shortly before the

RAND M T project got under way, it is not referenced in any of th e available

publications by Hays or his colleagues. Instead, th e RAND work seems to


draw on an older Russian literature. In fact, Hays reports that several Soviet

M T projects m ade use of the notions of dependency. Leading figures in these

projects are nam ed as K ulagina, Moloshnaya, Paduceva, Revzin, Shelimova,

Shum ilina and Volotskaya. U nfortunately, nothing has been found describing

their DG work except H ays’ abstracts presented on page 62 above. T heir work

in other areas of formal linguistics is described in P ap p (1966) and Kiefer

(1968). As th e first widely publicized NLP system based on dependency, the

RAND system set an agenda for future systems to follow. Almost all authors

of the o ther system s described in this thesis acknowledge their debt to Hays

and his colleagues.

It m ust be rem em bered th a t com putational linguistics was ra th e r different

th irty years ago from its present-day condition. Firstly, there were hardw are

and software lim itations which im paired prototyping and which, inevitably,

coloured th e way th a t researchers viewed the problems to be modelled. We

shall see in this chapter some suggestions which seem ra th e r old-fashioned to

m odern eyes. Secondly, m any techniques of linguistic description which are

nowadays taken for granted, were in 1960 still in their infancy or even w ait­

ing to be invented. For example, the RAND systems would alm ost certainly

have looked different if their designers had been able to m ake use of complex

feature unification. Thirdly, the prevailing views on w hat constituted difficult

problem s and w hat constituted easy problems were m arkedly different from

present day views. These were days of great optim ism in M T. Hays w rote in

1961:

M achine translation is no doubt the easiest form of au to m atic


language-data processing... In 10 years we will find th a t M T is too
routine to be interesting to ourselves or to others. (Hays 1961c:
25)

Of course, events proved him wrong. The US N ational Academ y of Sciences

produced a dam ning report on M T in 1966 which resulted in all US government

funds to M T projects drying up, and w ith them th e dream of constructing fully

functional M T systems. This precipitated the demise of th e RAND M T project

and th e virtual disappearance of DG from W estern com putational linguistics

until the emergence of a new wave of DG research in the 1980s.

In this chapter I present two parsing algorithm s. One of these was imple­

m ented in th e RAND M T system and could loosely be described as a ‘bottom -

u p ’ algorithm . T he other is described by Hays in ab stract term s and it is not

clear w hether it was ever im plem ented. It could loosely be described as a ‘top-

dow n’ algorithm . A third algorithm is described very briefly in Hays (1966b).

U nfortunately, insufficient detail is given to reconstruct th e algorithm .

4 .2 T h e b o tto m -u p a lg o r ith m

The bottom -up algorithm was em bodied in th e RAND SSD (‘Sentence S truc­

ture Determination') program. The principal references are Hays and Ziehe

and Hays (1961a).

4 .2 .1 B a s ic p r in c ip le s

There may, in fact, have been several distinct versions of th e parser described

here. Hays points to the fact th a t work centred around two ‘basic principles’

which could be ‘preserved through a variety of technical v ariatio n s’.

Basic principle 1: separate word-order and agreement rules

The first basic principle was that word-order rules should be isolated from
agreement rules.^ This principle led to the development of two sub-programs.

The first program selected pairs of words which could serve as candidates to

enter into a dependency relationship on the basis of th eir relative positions.

The second sub-program tested to see w hether a dependency relation was pos­

sible on the basis of the gram m atical features and dependency requirem ents of

each word. T he sub-program s could thus be thought of as working alternately;

th e first program selected a pair for th e second program to link or reject. If

the linking program succeeded then the pair-selection program would try to

find a new pair of candidates for linking. If the linking program failed then

the pair-selection program would have to find an alternative pair to be linked.

Basic principle 2: adjacency

T he second basic principle stated th a t ‘two occurrences can be connected only

if every intervening occurrence depends, directly or indirectly, on one or the

other of th e m ’. In other words, this was an explicit adjacency constraint. This,

in tu rn , ensured th a t the class of languages recognized was exactly the class


of context-free languages.

4.2.2 The parsing algorithm

T he parsing algorithm iterates through the pair-selection/linking cycle until

there are no more pairs left to select.

Pair selection

T he pair selection procedure effectively embodies th e control strateg y of the

parser. It works by attem pting to link any two words which are im m ediate
^Hays (1961a: 368) states that “this principle has been invented, lost, and re-invented
several times.”

neighbours in the input string. Search for im m ediately adjacent pairs pro­

ceeds from left-to-right. An attem p t is made to link th e current word w ith its

rightside im m ediate neighbour. If a dependency can be established between

the two words, the dependent drops out of sight, thus creating a new pair

of im m ediately adjacent elements to be tested. The word which is th e head

of the newly created pair becomes the current word. If a dependency is not

established, the next word in the string becomes th e current word. Leftside

neighbours are only checked after a change of current word resulting from a

failure to establish any dependency links.

The algorithm can be described more formally in PARS as follows.

INITIALIZATION: read input words into a list;
               C:=1.

1. IF C+1 = e
   THEN halt
   ELSE IF C → C+1
        THEN record(C → C+1),
             remove(C+1),
             goto(1)
   ELSE IF C+1 → C
        THEN record(C+1 → C),
             remove(C),
             goto(1)
   ELSE C:=C+1,
        goto(2).

2. IF C=e
   THEN halt
   ELSE IF C → C+1
        THEN record(C → C+1),
             remove(C+1),
             goto(2)
   ELSE IF C+1 → C
        THEN record(C+1 → C),
             remove(C),
             C:=C+1,
             goto(3)
   ELSE C:=C+1,
        goto(2).

3. IF C=1
   THEN goto(1)
   ELSE IF C-1 → C
        THEN record(C-1 → C),
             remove(C),
             C:=C-1,
             goto(2)
   ELSE IF C → C-1
        THEN record(C → C-1),
             remove(C-1),
             goto(3).

Algorithm 4.1: Hays' bottom-up parser

T he parser succeeds in producing an analysis for the whole sentence if ex­

actly one word rem ains visible in the input list at the end of th e parse. This

implies th a t all the other words have been successfully linked into th e stru ctu re

and so have disappeared from view.
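
For concreteness, the adjacent-pair linking idea can also be sketched in a few lines
of Python. This is a simplification of my own, not a transcription of Algorithm 4.1
(in particular it does not reproduce Hays' exact ordering of left- and right-neighbour
checks), and can_depend stands in for the agreement test described below.

    def hays_bottom_up(words, can_depend):
        # Left-to-right adjacent-pair linking in the spirit of Algorithm 4.1.
        # can_depend(head, dep) is True if dep may depend on head.
        # Returns the recorded links, or None if parsing fails.
        visible = list(words)        # words not yet attached to a head
        links = []                   # recorded (head, dependent) pairs
        changed = True
        while changed:
            changed = False
            i = 0
            while i + 1 < len(visible):
                left, right = visible[i], visible[i + 1]
                if can_depend(left, right):      # right-hand neighbour depends on left
                    links.append((left, right))
                    del visible[i + 1]           # the dependent drops out of sight
                    changed = True
                elif can_depend(right, left):    # left-hand word depends on its neighbour
                    links.append((right, left))
                    del visible[i]
                    changed = True
                    i = max(i - 1, 0)            # re-examine the new left-hand pair
                else:
                    i += 1                       # no link: move on to the next word
        return links if len(visible) == 1 else None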

T he parser reported by Hays produces only a single analysis for an am bigu­

ous sentence. This was a lim itation im posed by the then existing technology.

It has to be assum ed by most designers th a t th e cost of a search for


all possible structures is too great to be borne in practice; heuristic
devices of various types therefore appear in m ost SSD program s.
(Hays 1961a: 370)

T he parser favours closer attachm ents over more d istan t ones. Hays suggested

th ree kinds of heuristic which could be used to increase th e likelihood of th e

parser getting the attachm ents right first time. (A pparently this was vital:

th ere are no references to the possibility of backing up after wrong choices.)

Word-centred ordering   Hays' first suggestion was to specify for certain

words a p artial ordering for the establishing of th eir dependency relations. For

exam ple, in one trial version of th e RAND SSD system a preposition could not
be linked to its head until its object had been attached to it.

Dependency-centred ordering   Dependency relations could be labelled ac-

cording to gram m atical type (such as subject). A p artial ordering could th en

be established am ongst types (for example, find subjects before objects).

Assign 'urgency' scores   Dependency relations could be assigned 'urgency'

scores. W henever more th an one possible link existed, the one w ith th e highest

urgency score was allowed to ‘w in’. This was a simple weighting system . Hays

only suggests local scoring of alternative analyses. It would be interesting to

investigate the use of global scoring techniques to choose between alternative

analyses. Of course, b o th approaches presuppose th a t some reliable weights are

available, for example, from a hand-analyzed corpus (see C hapter 7 for more
on this approach to dependency parsing). Hays does not report th e results of

any trials which m ade use of ‘urgency’ scores and it seems unlikely th a t his

suggestion was im plem ented.

Linking

T he parsing algorithm presented above shows the order in which w ord-pairs

should be exam ined to check the possibility of establishing a dependency rela­

tion between them . In a m odern-day system this would constitute m ost of th e

work of th e parser. T he test for dependency would simply involve an a tte m p t

to unify two complex feature structures, one associated w ith each word to be

tested. If the test succeeds then unification has already built the new com posite

stru ctu re, otherw ise a simple failure is returned. However, in th e early 1960s

no such luxuries were available and so-called ‘agreem ent te sts’ co n stitu ted a

m ajor p a rt of the parsing problem . At least one of H ays’ papers (Hays 1966a)

is entirely devoted to this subject. If they were used, th e heuristics m entioned

above would be im plem ented in the agreem ent testing m echanism. T he de­

tails of the various kinds of agreem ent testing are m ostly of little relevance to

m odern readers. However, two of the strategies still hold some interest.

Table look-up   Imagine a feature-based grammar including a large feature

inventory covering all of the various distinctions possible in a gram m ar. Now

im agine converting every possible feature p erm u tatio n into a distinct atom ic

symbol. This is effectively w hat was done in the RAND SSD system . Each

word form was assigned to one of these symbols (or a disjunction of these

symbols). For convenience the symbols used were integers. Assume th a t there

were n distinct integer symbols. An n x n array was set up. In order to find

out w hether a dependency could be established betw een a word form of type i

and another of type j it was necessary to look in the (i, j)-th cell of the matrix.

This would indicate w hether it was possible to link th e words and, if so, w hat

kind of dependency relation was involved and which word was th e head. In

th e RAND system a 4000x4000 cell array was used and it was projected th a t

a 50000x50000 array would eventually be required! It is little wonder th a t

agreem ent testing came to be viewed as such a significant com ponent of the

parsing problem.
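
A toy version of such a table can be sketched as follows. The encoding is
hypothetical (it is not the RAND scheme): grammar-code symbols are small
integers, and a cell records whether a link is possible, which word is the
head, and what kind of relation it is.

    # Sketch of an agreement table: cell (i, j) says whether a word of code i
    # and a word of code j may be linked, and if so how.  Codes and labels
    # here are invented for illustration only.
    N = 4                                     # number of distinct grammar codes
    table = [[None] * N for _ in range(N)]    # None = no link possible
    # e.g. code 1 (a finite verb) may govern code 2 (a nominative noun) as subject:
    table[1][2] = ("head=first", "subject")

    def agreement_test(i, j):
        return table[i][j]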

Bit encoding   One of Hays' suggestions to improve the efficiency of agreement
testing was a modification of the categorial grammar system that Lambek had
recently developed (Lambek 1958). Hays' suggestion was to replace the atomic
symbols in a category symbol (usually N and S, e.g. S/N) with

complex symbols. He writes:

In Russian, nouns and adjectives agree in num ber, gender and case;
there are six cases, and the following gender-num ber categories:
masculine singular, feminine singular, neuter singular, and plural.
Let each bit-position of a 24-digit binary num ber correspond to a
case-num ber-gender category, and use the ap p ro p riate num ber as
a com ponent of the gram m ar-code symbol of adjective or noun.
Agreem ent is tested by taking the ‘intersection’.. .If th e intersec­
tion is zero, the occurrences do not agree. This m ethod is faster in
operation and requires no stored agreem ent tables; it is alm ost cer­
tain to be the m ethod of future operational system s. (Hays 1961a:
373-4).

T here is no evidence th a t this approach was ever tried at RAND. A recent

parsing system which includes a similar strategy using a UCG is described in

A ndry and T hornton (1991) and A ndry et al. (1992).
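
The idea can be sketched in a few lines of Python (the encoding below is mine,
not Hays'): each of the 24 case, gender and number combinations gets one bit
position, a word form's grammar code is the set of combinations it is compatible
with, and agreement holds when the bitwise intersection is non-zero.

    def bit(case, gender_number):
        # One of the 24 Russian case x gender/number combinations (sketch).
        cases = ["nom", "acc", "gen", "dat", "ins", "prep"]
        gn = ["masc-sg", "fem-sg", "neut-sg", "pl"]
        return 1 << (cases.index(case) * len(gn) + gn.index(gender_number))

    # a noun form that is unambiguously nominative feminine singular:
    noun = bit("nom", "fem-sg")
    # an adjective form syncretic between nom fem sg and nom pl:
    adj = bit("nom", "fem-sg") | bit("nom", "pl")

    def agree(a, b):
        return a & b != 0          # non-zero intersection means agreement is possible

    assert agree(noun, adj)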

4.3 The top-down algorithm

In this section we exam ine Hays’ other dependency parsing algorithm . It is not

clear w hether it was ever im plem ented at the RAND C orporation. Hays de­

scribes it in an introductory textbook on com putational linguistics (Hays 1967)

so it is possible th a t it was invented for purely pedagogical purposes.

4.3.1 The parsing algorithm

This parser is in th e m inority am ongst the dependency parsers described in

this survey in th a t it embodies a top-dow n control strategy. H ays’ exposition

does not describe the rule system employed by the parser so I shall assum e

th a t dependency rules are expressed in Gaifman form at. Rules m ay th u s take

the following forms:

(13)

a   Xi(Xj1, ..., *, ..., Xjn)
b   Xi(*)
c   *(Xi)

where (13a) shows the case where Xi has dependents Xj1-Xjn. (13b) is the
case where Xi can appear in a sentence without dependents. (13c) notates the
case where Xi can appear in a sentence without depending on any other word,
i.e. it is the sentence root.
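
As a concrete (and purely hypothetical) encoding of this rule format, the three
rule types might be stored as follows; the entries anticipate the small grammar
used for sentence (14) below.

    # Hypothetical encoding of Gaifman-format rules: '*' marks the position of
    # the head itself among its dependents.
    rules = {
        "V": [("N", "*", "N")],    # type (13a): V(N, *, N)
        "N": [("*",)],             # type (13b): N(*), an N may occur with no dependents
    }
    root_categories = {"V"}        # type (13c): *(V), a V may serve as sentence root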

T he parsing algorithm begins by scanning the sentence for a word which

can serve as the sentence root, i.e. for which there is an entry of ty p e (13c)

in the gram m ar. H aving found th e sentence root, th e algorithm makes it th e

root of a dependency tree. N ext, the gram m ar is searched for a rule of type

(13a) listing possible dependents for th e root, or a rule of type (13b) showing

that the root can occur w ithout dependents. For example, suppose th a t the

sentence root is R; the grammar is searched for a rule of type R(...). If a rule
is found, it is matched against the words of the sentence. For example, if the
rule R(Q,*,S) is found, checks are made to see if the pattern 'Q...R...S' is
present in the input sentence. If there is a m atch then th e fact th a t these

dependents have been found is recorded in the dependency tree. If there is no

m atch then an alternative rule specifying dependents for th e root is searched

for in the gram m ar. The same is done for every word in th e in p u t string when

it becomes a leaf in th e dependency tree. If a rule of type (13b) m atches any

word X then no more rules of type X(...) are searched for. A sentence has

been successfully parsed if all leaves in th e dependency tree have been m atched

against rules of type (13b) and no words rem ain in the in p u t string which are

not linked in the dependency tree.

I shall say that a word X for which a rule of type X(...) is found and

m atched, has been expanded. If the dependency tree is represented as a nested

list, th en expansion replaces one symbol w ith more th a n one symbol. For

exam ple, consider the following sentence:

(14)

Simpson eats haggis

Assum e th a t the sentence is pre-processed w ith a word class recognizer:

(15)

[N: simpson] [V: eats] [N: haggis]

If th e gram m ar contains a rule of the form *(V ), th e dependency tree will


initially look like this:

(16)

([V: eats])

If the grammar contains a rule of the form V(N, *, N), then the dependency

tree can be expanded to look like this:

(17)

( ([N: simpson]) [V: eats] ([N: haggis]) )

T hus, it should be clear th a t successful expansion operations increase th e size

of the tree. Note, however, that the number of nodes in the final tree (17) is
no greater than the number of symbols in the input string. In this respect,

top-dow n dependency parsing differs crucially from top-dow n PSG parsing: in

top-down dependency parsing an expansion can not add a symbol which does

not appear in the input string. In top-dow n PSG parsing, of course, ex tra

non-term inal symbols can be inserted by expansion operations. This leads

to the possibility in a top-down PSG parser of an infinite succession of non­

term inal symbol insertions, as in the case of left recursion. T he dependency

parsing algorithm described here is capable of recognizing exactly the context

free languages (recall G aifm an’s result) but unlike a top-down C FPSG parsing

algorithm which has not been heuristically constrained, it can never enter

infinite loops, given an arbitrary gram m ar. Thus, it m ust be regarded as

being more robust th an a top-dow n C FPSG parsing algorithm which is always

a t the mercy of the gram m ar w ith which it works. If the C FPSG contains any

left recursive rules then the parser can expect, sooner or later, to blunder into an

infinite loop.

T he order in which symbols are expanded is not crucial to H ays’ algorithm ,

although it m ay be im p o rtan t in some applications. If the leftm ost available

leaf were always to be expanded this would lead to a left-to-right depth-first

search. If the rightm ost available leaf were always to be expanded it would lead

to a right-to-left depth-first search. If all nodes a t distance d from the root were

expanded before any nodes a t distance d + 1 were expanded, a breadth-first

search would be im plem ented. This could also be set up to progress left-to-

right, right-to-left or middle-out, all at level d before moving on to level d + 1.

However, these labels describe the ways in which the branches are added to

dependency trees ra th e r th an the order in which words in th e sentence are

built into the trees. For example, a left-to-right depth-first parser would add

th e words of sentence (18) into the tree in th e order: like, giants, jolly, green,
corn, golden.

(18)

Jolly green giants like golden corn

Table 4.1: main features of Hays' bottom-up dependency parser

Search origin          bottom-up
Search manner          depth-first
Search order           left to right
Number of passes       one
Search focus           pair-based
Ambiguity management   first parse only (heuristics guide choices)

Hays' top-down parser is intuitively simple but since it is best described for-

mally in term s of recursive procedure calls, a PARS description of the algorithm

is not particularly illum inating. T he subject of top-dow n dependency parsing

is addressed in C hapter 12, where a top-down algorithm is presented in detail.

4.4 Summary

H ays’ first parsing algorithm processes sentences from left-to-right. It is

bottom -up, in th e sense th a t it starts building stru ctu re from th e words in

the sentence ra th e r than from the rules in the gram m ar. Heads do not search
for dependents; neither do dependents search for heads. Instead, th e parser

searches for potential head-dependent pairs and an agreem ent m atrix (‘belong­

ing’ to neither word) indicates w hether th e potential dependency can become

an actual dependency. There is never an instance of one m em ber of th e pair

searching for the other member. T he parser produces at m ost a single analysis

for each input sentence by means of depth-first search.

The m ain features of H ays’ first parser are sum m arized in Table 4.1 (the

exact m eaning of entries in sum m ary tables will be discussed in C h ap ter 12).

Hays' second parsing algorithm processes sentences from heads to depen-

dents. It is top-dow n in the sense th a t it builds stru c tu re from the rules in

the gram m ar ra th e r th an from the words in th e sentence. Hays leaves many

of the details of his algorithm unspecified or underspecified. I have a ttem p ted

to show how different search strategies offer variations on th e order in which


Table 4.2: main features of Hays' top-down dependency parser

Search origin          top-down
Search manner          unspecified
Search order           unspecified
Number of passes       one
Search focus           heads seek dependents
Ambiguity management   unspecified

a dependency tree is constructed although the resulting tree does not depend

on th e order in which branches are added. No strategy for handling am biguity

is offered.

T he m ain (known) features of H ays’ second parser are sum m arized in T a­

ble 4.2.

Chapter 5

Hellwig's PLAIN system

5.1 Overview

T he PLA IN system (‘Program s for Language Analysis and INference’) is a

suite of NLP com puter programs developed by P eter Hellwig a t the University

of Heidelberg. The system originated in work Hellwig did in th e early 1970s

tow ards his dissertation (Hellwig 1974). Since then he has continued to develop

his system . A lthough the PLAIN system has been im plem ented in several

different locations around the world (e.g. Cam bridge, Hawaii, Heidelberg, Kiel,
Paris, Pisa, Surrey, Sussex, Zurich) and custom ized for at least three different
languages (English, French and G erm an), Hellwig rem ains the only au th o r on

th e PLA IN bibliography (a copy of which is included in Hellwig 1985: 79).

Basically, the PLAIN system is a parser. I shall not describe any of its

incidental capabilities here. Instead, I shall detail th e form and content of

th e gram m ar th a t PLAIN uses. All linguistic knowledge is w ritten in a sin­

gle feature-based representation called ‘D ependency R epresentation Language’

(D RL). I shall examine the way in which the parser uses unification to build

structures, including discontinuous constituents. I shall also show how a chart

can be used to increase the efficiency of th e parser.

5.2 Dependency Representation Language

Hellwig’s prim ary m otivation for basing his parser on dependency is his be­

lief th a t DG provides a framework w ithin which “functional, lexical, m or­

phological and positional features can be processed sm oothly in parallel”

(Hellwig 1986: 198). This can be done w ithin a single representation lan­

guage and a single structure. Hellwig contrasts this w ith, for exam ple, LFG

(K aplan and Bresnan 1982) which builds a c-structure to represent th e syntac­

tic constituent stru ctu re of a sentence and a distinct f-structure to represent

the functional dependency relationships between functors and argum ents. He

describes his dependency system in th e following way:

T he salient point of this formalism is th a t the functional, th e lexe-


m atic and the m orphosyntactic properties coincide in every term ,
as they do in the elements of n atu ral language. To p u t it in th e
term inology of LFG: f-structure and c-structure are totally synchro­
nized. Since this cannot be achieved in a phrase stru ctu re represen­
tation, it is often assum ed th a t there is a fundam ental divergence
between form and function in n atu ral language. (Hellwig 1986:
196).

In effect, Hellwig is offering an existence proof th a t form and function do coin­

cide in n a tu ra l language, at least to th e extent th a t they have been m odelled

in the PLA IN system.

A secondary argum ent Hellwig offers for using DG is th a t it deals w ith

discontinuous constituents ra th e r more elegantly th an PSG. There are, after

all, no ‘co n stitu en ts’ to be ‘discontinuous’ in DG. As we shall see, this claim

takes us beyond th e power of Gaifman gram m ars.

5.2.1 The form of DRL expressions

All linguistic inform ation is represented in a unified framework, DRL. Hellwig

describes it in the following terms:

G ram m ar formalisms and com puter languages are usually devel­


oped independently. DRL is both a t th e same tim e. In th e sam e

Figure 5.1: stemma showing a simple dependency structure

spirit as Prolog is tailor-m ade for the purposes of logic, DRL


has been particularly adapted to represent linguistic structures.
W hereas the interpreter for Prolog includes a theorem prover, the
interpreter for DRL is linked w ith a parser. (Hellwig 1986: 195)

The parser is described in the next section. Here, I pursue the question of

linguistic representation. A DRL stru ctu re consists of a bracketed expression,

where the bracketing represents a tree w ith nodes and directed arcs. Arcs

are directed from the node represented by an outer bracketing to th e nodes

represented by each bracketing it contains. Each node is a lexical item . Thus,

an expression representing the stem m a shown in Figure 5.1 has the form shown

in (19).

(19)

(D (A) (B (C) (E)))

In a DRL expression, the nodes A -E (called ‘term s’) correspond to single

words b u t they are not expressed by atom ic symbols. R ather, they consist

of collections of features in the form of attribute-value pairs. Three types of

a ttrib u te s are grouped together in each DRL term , nam ely a role, a lexeme,

and a complex m orphosyntactic category.

Sentence (20) would be represented by th e DRL expression shown in (21).

( 20 )

The cat likes fish

(21)

(ILLOCUTION: assertion: else typ<1>
  (PREDICATE: like: verb fin<1> num<1> per<3>
    (SUBJECT: cat: noun num<1> per<3>
      (DETERMINER: the: dete))
    (OBJECT: fish: noun)))

This exam ple shows one term per line w ith indentation m arking the hierar­

chical stru ctu re of the tree represented. The three different types of attrib u te

in each term are separated by single colons. Roles are listed first. These are

syntactico-sem antic functions. They can be thought of as labels on arcs in the

tree. So, for example, cat is the SU B JEC T of like and fish is th e O B JE C T

of like. Lexemes are listed next. Roles and lexemes express, respectively, the

w ord’s syntagm atic and paradigm atic relations. Together they constitute a

sem antic representation of the sentence. The th ird p art of each term describes

the surface properties of the associated word. This consists of a m ain category

— usually a word class — followed by a set of attribute-value pairs. A ttributes

are, by convention, three-character strings. Values are coded as num bers inside

angle brackets.
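
One way to picture a DRL term concretely is as a nested record. The following
Python rendering of (21) is purely illustrative: the field layout is mine, not
Hellwig's.

    # (role, lexeme, morphosyntactic features, dependents) for example (21)
    term = ("ILLOCUTION", "assertion", {"typ": 1}, [
        ("PREDICATE", "like", {"cat": "verb", "fin": 1, "num": 1, "per": 3}, [
            ("SUBJECT", "cat", {"cat": "noun", "num": 1, "per": 3}, [
                ("DETERMINER", "the", {"cat": "dete"}, []),
            ]),
            ("OBJECT", "fish", {"cat": "noun"}, []),
        ]),
    ])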

The analysis employed in PLAIN does not make use of any non-term inal
constituents. N either does it use em pty categories. Every node in a depen­

dency tree m ust correspond to an actual word in the sentence — w ith one
exception. Hellwig argues th a t

T here m ust be som ething to denote the suprasegm ental meaning


th a t a clause conveys in addition to th e semantics of its con­
stitu en ts. As a necessary extension of DG, the yield of a clause
is — so to speak — lexicalized... and represented by a term th a t
dom inates the corresponding list (Hellwig 1986: 196).

In order to te th e r this ‘clause’ item to som ething which actually occurs in the

sentence, Hellwig associates it w ith the sentence-final period. T he period, after

all, serves to m ark the ending of a m ain clause and it can — if so desired —

be viewed as a word in a w ritten sentence. Several objections can be raised

to this approach. (W hat about spoken language? W h at about subordinate

clauses?) Hellwig is aware of these but he argues th a t th e advantage of treatin g

the period as clause head is th a t it allows a fully consistent system in which

all nodes correspond to actually occurring ‘w ords’ in th e input sentence. He

steps into much more dangerous territory when he goes on to suggest th a t

“punctuation in w ritten language can be interpreted as a similar lexicalization

of clausal sem antics” (Hellwig 1986: 196). However, he does not carry his

suggestion any further in practice.

5.2.2 Word order constraints

In addition to the more familiar surface p roperty features such as finiteness,

person and num ber, a DRL term can also include positional features which

act as constraints on the relative ordering of words in a sentence. Three such

features are reported in the literature: ‘seq’, ‘a d j’, and ‘lim ’. These constrain

th e relative positions of a dependent (D) and a head (H) as follows:

seq   This feature relates to linear sequence. It has two possible values:

1. D precedes H

2. D follows H

adj   This feature relates to the immediate adjacency of items. It has two


possible values:

1. D im m ediately precedes H

2. D im m ediately follows H

lim This feature delimits the outerm ost dependents of a word and th u s can

be used to m ark a ‘b arrier’ across which other dependents of th e sam e

word m ay not be ‘m oved’. Once again, this feature has two values:

1. D is th e leftmost dependent of H

2. D is the rightm ost dependent of H
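
A rough rendering of the first two features as a check on word positions is
given below (a sketch only, with my own names; 'lim' is omitted because it
refers to a head's whole set of dependents rather than to a single pair).

    def position_ok(dep_pos, head_pos, feats):
        # Check 'seq' and 'adj' constraints on a dependent at dep_pos whose
        # prospective head is at head_pos (positions count words left to right).
        seq, adj = feats.get("seq"), feats.get("adj")
        if seq == 1 and not dep_pos < head_pos:
            return False                      # D must precede H
        if seq == 2 and not dep_pos > head_pos:
            return False                      # D must follow H
        if adj == 1 and dep_pos != head_pos - 1:
            return False                      # D must immediately precede H
        if adj == 2 and dep_pos != head_pos + 1:
            return False                      # D must immediately follow H
        return True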

Hellwig presents th e DRL term in (22) to illustrate th e use of these word order

features. T he term describes sentence (23), due to Pereira 1981.

(22)

(ILLOCUTION: assertion: adj<1>
  (PREDICATE: squeak: adj<1>
    (SUBJECT: mouse: adj<1>
      (DETERMINER: the: seq<1>)
      (ATTRIBUTE: chase: adj<2>
        (OBJECT: that: lim<1>)
        (SUBJECT: cat: adj<1>
          (DETERMINER: the: adj<1>)
          (ATTRIBUTE: like: adj<2>
            (SUBJECT: that: lim<1>)
            (OBJECT: fish: adj<2>)))))))

(23)
T he mouse th a t the cat th a t likes fish chased squeaks.

T he purpose of these positional features is to produce analyses of sentences —

including sentences w ith discontinuous constituents — which do not make use

of transform ations, m etarules or SLASH feature passing, and which leave no

gaps or traces. In this respect, Hellwig’s system is similar to C ovington’s (de­

scribed in C hapter 10 below): neither recognizes th e existence of constituents,


either explicitly by m eans of non-term inal phrase labels or im plicitly by means

of an adjacency constraint, so for them there is no difference between estab­

lishing a dependency between a head and an ‘unm oved’ word and establishing

a dependency betw een a head and a word which has ‘m oved’ out of its parent

‘co n stitu en t’. C ovington’s system works w ithout any positional constraints at

all whereas Hellwig’s system can use as m any or as few positional constraints

as are required. B oth systems can be constrained to accept only contiguous

groups of dependents if necessary. Hellwig’s claim is to be able to set positional

constraints so as to allow the kind of discontinuous constituency found in n a t­

ural language and to disallow the sort of discontinuous constituency prohibited

in n a tu ra l language (e.g. movements across barriers). If Hellwig is correct then

his system will be impressive indeed. In fact, there is a clue to indicate th a t

Hellwig’s suggestions are fairly ten tativ e since he proceeds to say th a t “It is

likely th a t appropriate attributes can also be defined for more difficult cases

of extraposition” (Hellwig 1986: 197), thereby suggesting th a t these have not

yet been fully explored.

5.2.3 The base lexicon

A base-lexicon is required to associate word forms in th e input sentence with

lexemes and clusters of m orphosyntactic features. The base lexicon consists

of a collection of assignm ents. An assignm ent consists of a word form to the

left of the assignm ent arrow, and a DRL term to th e right of th e arrow. The

following examples (from Hellwig 1986: 197) show some entries in th e base

lexicon.

(24)

a  CAT   -> (* cat: noun num<1> per<3>);
b  CATS  -> (* cat: noun num<2> per<3>);
c  LIKE  -> (* like: verb per<1,2>);
d  LIKES -> (* like: verb num<1> per<3>);
e  LIKE  -> (* like: verb num<2> per<3>);
f  FISH  -> (* fish: noun per<3>);

None of the entries has been assigned a role. This can only occur during

parsing. E n try (24a) has a singular num ber feature num< 1 > distinguishing

it from the plural num< 2 > in (24b). T he person feature p e r < 1,2 > of (24c)

has a disjunction of values. Entries (24d) and (24e) are required for subject-

verb agreem ent. E ntry (24f) has no num ber feature since fish can be either

singular or plural. Since features are constraints, absence of a feature m eans

absence of any associated constraint.
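
Viewed simply as data, the base lexicon is a mapping from word forms to one or
more term skeletons. A hypothetical Python rendering of entries (24a-f):

    # word form -> list of (lexeme, features) alternatives; absence of a
    # feature means absence of the corresponding constraint (e.g. FISH has no 'num').
    base_lexicon = {
        "CAT":   [("cat",  {"cat": "noun", "num": 1, "per": 3})],
        "CATS":  [("cat",  {"cat": "noun", "num": 2, "per": 3})],
        "LIKE":  [("like", {"cat": "verb", "per": (1, 2)}),
                  ("like", {"cat": "verb", "num": 2, "per": 3})],
        "LIKES": [("like", {"cat": "verb", "num": 1, "per": 3})],
        "FISH":  [("fish", {"cat": "noun", "per": 3})],
    }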

5.2.4 The valency lexicon

As well as a base lexicon it is necessary to m aintain inform ation detailing

the kinds of dependents a word m ay have. It would be possible to enter th e

inform ation directly in the base lexicon, for example:

(25)

(*: like: verb fin<1> num<1> per<3>
  (SUBJECT: _ : noun num<1> per<3> adj<1>)
  (OBJECT: _ : noun seq<2>));

The _ characters are variables. In an analysis of a sentence they would be
replaced with lexemes corresponding to the SUBJECT and OBJECT words.
_ variables are known as 'slots' since they can be filled by dependents. The

SU B JE C T slot can be read as saying th a t the subject m ust be a singular third

person noun which im m ediately precedes its head. T he O B JE C T slot requires

th a t the object be a noun which occurs somewhere to the right of its head.

T he technique of storing valency inform ation in the base lexicon is effec­

tive b ut it fails to capture generalizations. O ther forms of the verb like will

have very sim ilar slots and m any other third person singular verbs will have

identical slots. G eneralizations can be m ade very simply by storing th e shared

inform ation in ‘com pletion p a tte rn s’ and setting up a distinct ‘valency lexicon’

which associates com pletion p attern s w ith words. For example, the following

com pletion p attern s would be set up for SU B JE C T (a) and O B JE C T (b):

(26)

a  (*: +subject: verb fin<1>
     (SUBJECT: _ : noun num<C> per<C> adj<1>));

b  (*: +object
     (OBJECT: _ : noun seq<2>));

T he feature value ‘C ’ is used to copy feature values from heads to dependents,

i.e. (26a) says th a t the subject will agree w ith its head in person and num ber.

E ntries in the valency lexicon look like those in (27).

(27)

a (:->(*: squeak) (: +subject));


b (:->(*: like) (& (: +subject)
(: +object)));

T hese state th a t squeak just has a subject slot (it is intransitive) whereas like

has b o th subject and object slots (it is transitive). Entries in th e valency

lexicon control the unification of term s from the base lexicon w ith stored com­

pletion p attern s. Unification is not confined to this task; it is th e principal

structure-building operation in th e gram m ar. For this reason, Hellwig term s

his gram m ar Dependency Unification G rammar (DUG). I prefer to retain this

label to designate any DG based on th e unification of complex feature stru c­

tures, and to describe Hellwig’s gram m ar as one variety of DUG (for exam ple

M cG lashan 1992 describes another variety of DUG).

It is possible to have a disjunction of slots (indicated by a com m a at the

head of a list of disjuncts) where a dependent can be in sta n tiated in more th an

one way. For example, Hellwig analyzes relative pronouns as th e subjects of

em bedded sentences. Thus the +subject completion p a tte rn can be expanded

at least to the following:

(28)

(*: +subject: verb fin<1> per<3>
  (, (SUBJECT: _ : pron rel<1,C> lim<1>)
     (SUBJECT: _ : noun num<C> per<C> adj<1>)));

We have seen how words in the input string can be associated w ith role, lexeme

and m orphosyntactic inform ation in DRL term s. We have also seen how words
can be given slots into which dependents can fit. In th e next section we shall

see how potential dependencies are turned into actual dependencies by the

parser.
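
Since unification is the structure-building operation here, a toy version of the
test is worth making explicit. The sketch below handles flat feature sets only;
real DUG terms are nested and also carry devices such as the copy value 'C'.

    def unify(f1, f2):
        # Unify two flat feature sets; return the merged set, or None on a
        # clash.  A stand-in for the test that lets a word fill a valency slot.
        merged = dict(f1)
        for attr, val in f2.items():
            if attr in merged and merged[attr] != val:
                return None
            merged[attr] = val
        return merged

    # the SUBJECT slot of LIKES against the base-lexicon entry for CAT:
    slot = {"cat": "noun", "num": 1, "per": 3}
    cat  = {"cat": "noun", "num": 1, "per": 3}
    assert unify(slot, cat) is not None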

5.3 The parsing algorithm

T he literatu re does not contain a full, clear exposition of th e PLA IN parsing

algorithm . T he content of this section has been constructed from Hellwig’s

1986 COLING paper and from personal com m unication w ith Hellwig.

T he parser m aintains two d a ta structures:

1. A list of DRL expressions corresponding to th e words of the in p u t sen­

tence.

2. A queue indicating the order in which words are to be exam ined. The

queue contains an explicit end-of-queue m arker. The parser begins at

the left and works towards the right of the sentence so for a sentence

w ith n words (including the period), the queue looks like this: (1, 2, . . . ,

n, end-of-queue).

The parsing algorithm uses these two d a ta structures in the following way:

1. M ake the word at the head of th e queue the current word.

2. Try to find a slot in another word w ith which th e current word can unify.

O nly adjacent words are tried. There are two possible outcomes:

(a) A slot is found for the current word. In this case th e current word is

unified with th e slot of its head to form a single partial dependency

structure. T he pointer to this new stru ctu re is placed at th e end of

the queue.

(b) A slot is not found for the current word. In this case the pointer to

the current word is moved to the end of th e queue.

3. G oto 2 until end-of-queue is reached. W hen this happens move end-of-

queue to the end of the queue and proceed to 4.

4. Try to find a slot in another word w ith which th e current word can unify.

Only words at one remove (i.e. n - 2 or n + 2) are tried. There are two

possible outcomes:

(a) A slot is found for the current word. In this case the current word is

unified w ith th e slot of its head to form a single p artial dependency

structure. T he pointer to this new stru ctu re is placed at th e end of

the queue.

(b) A slot is not found for the current word. In this case th e pointer to

the current word is moved to the end of th e queue.

5. G oto 4 until end-of-queue is reached. W hen this happens move end-of-

queue to the end of the queue and goto 2.

T he process term inates when steps 2 and 4 are b o th executed w ith no change

to th e queue.

Hellwig (p.c.) describes this as an island parser. It builds up structure

around word ‘islands’ in the sentence. T he object of step 4, which looks beyond

the im m ediate context of an island, is to detect moved p arts of a discontinuous

constituent.

This is a m ulti-pass parser. D ependents search for heads b u t not vice versa:

heads do not search for dependents. Hellwig makes no claims for th e validity of

the parser as a psychological model; its m otivation is purely im plem entational

and p a rt of the ongoing program m e of research is devoted to parallelizing the

algorithm .

INITIALIZATION: initialize two lists: Pointer_L and Term_L;
               Term_L is an ordered list of DRL terms
               corresponding to the words of the sentence;
               Pointer_L is an ordered list of pointers
               to these DRL terms;
               C is a pointer;
               C↑ is the term pointed to by C;
               C↑:Slot is any valency slot in C↑;
               X and Y are variables;
               e is not an absolute end-of-list marker;
               initialize an empty stack: Stack.

1. IF C=e
   THEN IF top(Stack) = Term_L
        THEN IF length(Term_L) = 1
             THEN succeed
             ELSE fail
        ELSE push(Stack,Term_L),
             remove(Pointer_L,C),
             append(Pointer_L,e),
             C:=first(Pointer_L),
             goto(2)
   ELSE IF C↑ U C-1↑:Slot
        THEN remove(Term_L,C↑),
             remove(Pointer_L,C),
             C:=first(Pointer_L),
             goto(1)
   ELSE IF C↑ U C+1↑:Slot
        THEN remove(Term_L,C↑),
             remove(Pointer_L,C),
             C:=first(Pointer_L),
             remove(Pointer_L,C),
             append(Pointer_L,C),
             C:=first(Pointer_L),
             goto(1)
   ELSE remove(Pointer_L,C),
        append(Pointer_L,C),
        C:=first(Pointer_L),
        goto(1).

2. IF C=e
   THEN remove(Pointer_L,C),
        append(Pointer_L,C),
        C:=first(Pointer_L),
        goto(1)
   ELSE IF C↑ U C-2↑:Slot
        THEN remove(Term_L,C↑),
             remove(Pointer_L,C),
             C:=first(Pointer_L),
             X:=last(Pointer_L),
             remove(Pointer_L,X),
             Y:=last(Pointer_L),
             remove(Pointer_L,Y),
             append(Pointer_L,X),
             append(Pointer_L,Y),
             goto(2)
   ELSE IF C↑ U C+2↑:Slot
        THEN remove(Term_L,C↑),
             remove(Pointer_L,C),
             C:=first(Pointer_L),
             X:=C+2,
             remove(Pointer_L,X),
             append(Pointer_L,X),
             goto(2)
   ELSE goto(1).

Algorithm 5.1: Hellwig's dependency parsing algorithm

5.4 The well-formed substring table

One of the m ost interesting and innovative aspects of Hellwig’s parser is his use

of a well-formed substring table (WFST) to optimize processing in the parsing
of sentences with ambiguity. WFST parsing has been developed in the context
of PSG and has not been explored to any great extent in dependency-based
systems. The normal conception of a WFST is of a structure with nodes and

edges. To begin w ith, there are as m any edges as there are readings for the

words in the input sentence. W hen a constituent is built an edge is inserted

which spans all of the words which the constituent contains.

Hellwig’s W FST is very like this except th a t his edges are labelled w ith

DRL descriptions of th e words spanned. These descriptions may contain slots.

W hen a word becomes a filler for another word’s slot, th e two are unified and

a new edge is inserted in the W FST spanning w hat was previously spanned

by the two edges. Hellwig’s W FST for th e globally am biguous sentence Flying

planes can be dangerous (Hellwig 1988: 243) is shown in Figure 5.2.

However, the standard view of a WFST assumes that constituents are continuous. An edge serves to mark everything between its end-points as belonging to one constituent. The edge is labelled with the name of that constituent. This is not sufficient for Hellwig's parser, which advertises as one of its benefits the ability to parse discontinuous constituents. If a constituent is discontinuous, simply marking its left and right boundaries does not serve to identify its components since, by virtue of the discontinuity, some of the material between the endpoints will not belong to the constituent.

Hellwig's solution is to adopt a word-centred rather than a constituent-centred approach to WFST parsing. This he does by assigning a bit string to each word in the input sentence. Each bit string in an n-word sentence consists of one '1' and n-1 '0's. The ith word is represented by a bit string with the '1' in ith position. Before any attempt is made to establish a dependency relation between two words, their bit strings are added. If the addition involves any 'carry' operations (i.e. a 1 is added to a 1) then the dependency is prohibited even before the slot features have been checked. If no 'carry' operations are involved, the process may proceed. In this way a WFST can be built up for discontinuous constituents.

For example, the words of sentence (29) would be assigned the initial bit strings shown in (30) (trailing zeros in bit strings and features in DRL slots
[Figure 5.2 (diagram): a chart over the words Flying planes can be dangerous whose edges are labelled with DRL terms, from single-word terms up to two complete readings, both rooted in (ILLOC assertion (PRED can verb fin (MV be verb inf (PA dangerous adje)) ...)): one in which flying governs planes as its OBJ(ect), and one in which planes governs flying as its ATR(ibute).]

Figure 5.2: Hellwig's WFST for Flying planes can be dangerous

are suppressed for readability).

(29)

What did Danforth say to George?

(30)

BITSTRING  TREE
1          (what pron)
01         (do verb fin
             (SUBJECT: _ )
             (MAINVERB: _ ))
001        (Danforth noun)
0001       (say verb inf
             (DIRECTOBJECT: _ )
             (INDIRECTOBJECT: _ ))
00001      (to
             ( _ ))
000001     (George noun)
0000001    (ILLOCUTION question
             (PREDICATE: _ ))

In (29), What is the direct object of say. The discontinuous tree rooted in say is represented unproblematically in Hellwig's WFST as shown in (31).

(31)

BITSTRING  TREE
100111     (say verb inf
             (DIRECTOBJECT: what)
             (INDIRECTOBJECT: to
               (George)))

What Hellwig has done is to discard the notion of 'constituency' and replace it with the notion of 'consistency'.
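To make the mechanism concrete, the following short sketch (mine, not code from the PLAIN system) expresses the bit-string test in Python: each word is a single set bit, a dependency may be attempted only when the two coverage masks share no bits (the 'no carry' condition), and merging two edges is then simple addition (equivalently, bitwise OR).

# Minimal sketch of the word-centred 'consistency' test used in place of
# contiguity when merging WFST edges.

def initial_mask(i):
    """Bit string for the i-th word (0-based): a single 1 in position i."""
    return 1 << i

def can_combine(head_mask, dep_mask):
    """A dependency may be attempted only if adding the two bit strings
    involves no 'carry', i.e. the word sets covered do not overlap."""
    return head_mask & dep_mask == 0

def combine(head_mask, dep_mask):
    """With no overlap, addition equals bitwise OR: the new edge covers the
    union of the two (possibly discontinuous) word sets."""
    return head_mask | dep_mask

# The words of 'What did Danforth say to George?':
what, did, danforth, say, to, george = (initial_mask(i) for i in range(6))

# The discontinuous tree rooted in 'say' covers what, say, to and George:
say_tree = say
for dep in (what, to, george):
    assert can_combine(say_tree, dep)
    say_tree = combine(say_tree, dep)

print(format(say_tree, '06b')[::-1])   # prints 100111, as in (31)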

What is missing from the PLAIN literature is a description of exactly how the WFST is used in the parsing algorithm to increase the efficiency of the parser. Hellwig consistently describes his system as a 'chart parser', thereby implying a more sophisticated control mechanism than is necessary in a simple WFST parser. The omission is particularly disappointing since Hellwig's system is, to the best of my knowledge, the only dependency parser to make use of a WFST in the management of ambiguity. We shall return to this topic in Section 12.6 below.

Table 5.1: main features of Hellwig's dependency parser

Search origin           bottom-up
Search manner           depth-first
Search order            left to right
Number of passes        at least two
Search focus            dependents seek heads
Ambiguity management    WFST (adjacency not enforced)

5.5 Summary

Hellwig uses a simple unification grammar expressed in terms of complex feature structures. His parser has a bottom-up island-driven control strategy which is claimed to be able to build discontinuous constituents without recourse to special registers or feature passing (although more information on the precise use of the lim feature is required before the system can be properly evaluated). Words look for heads; they never look for dependents. The parser's efficiency is increased by the use of a WFST which differs from standard WFST parsers in building dependency rather than constituency structures and in representing non-contiguous collections of dependents.

The main features of Hellwig's parser are summarized in Table 5.1.

Chapter 6

The Kielikone parser

6.1 Overview

In this chapter I examine the Kielikone dependency parser. Since June 1982 the Finnish National Fund for Research and Development ('SITRA') has sponsored a research project known as 'Kielikone' at the Helsinki University of Technology. The aim of the project is the development of a computer system for the automatic interpretation of written Finnish. The main application focus of the research is the design and implementation of a Finnish text interface to computer databases. However, the object is to produce an interface which is independent of any single database so that it can be ported to many applications. The overall structure of the interface system, which has recently come to be known as 'SUOMEX', is described in Jappinen et al. (1988a).

Sentence processing in the Kielikone system is achieved by four distinct modules.

1. A morphological analyser known as 'MORPHO' breaks words down into their component morphs (Jappinen et al. 1983; Jappinen and Ylilammi 1986). This is vital in an agglutinating language like Finnish since a full form lexicon would be much larger than for a language like English which has much less morphological variation. By 1987 the lexicon contained over 35000 lexical entries (i.e. stems) (Valkonen et al. 1987a).

2. A parser, known as 'ADP' (Augmented Dependency Parser), uncovers the dependency structure of sentences. It is this module which will be the focus of investigation in this chapter.

3. A logical analyser is responsible for constructing the propositional meaning of sentences and also for interpreting sentences in their dialogue context. Thus the module embraces both semantics and pragmatics. In early 1987 this module was referred to as 'DIALOG' (Jappinen et al. 1987: preface); by 1988 its name seemed to have changed to 'AWARE' (Jappinen et al. 1988a: 335).

4. The fourth module appears not to have a name. It serves as the buffer between the natural language understanding module and the database. Its task is to transform interpretations of Finnish sentences into sequences of formal database queries. In order to make this a general purpose portable interface, queries are couched in a database interlingua called 'UQL' (Universal Query Language). To interface the system to any specific database it is only necessary to write an interpreter to translate UQL queries into the format expected by the specific database, e.g. SQL.

Some of the dependency parsers covered in this thesis are described on the basis of just one or two papers or reports. With the Kielikone parser there is an abundance of documentation. A Kielikone bibliography published in 1987 lists 53 items, of which 14 are specifically concerned with parsing. This abundance of literature is obviously very welcome to the student of dependency parsing. However, it does introduce some problems of version control. During the lifetime of the project a number of changes in direction have been made and it is difficult to keep track of exactly which incarnation of the system is being described at any given point. As we have already seen, many of the components in the system have been given names. When new names appear it is not always clear whether (i) only the names have changed while the components remain the same, (ii) the new names introduce new components to complement the existing components, or (iii) the new names introduce new components to supersede old components. This would all be self evident were it not for the fact that SUOMEX is a very complex system and most published papers can only discuss selected sub-parts of it. It is thus necessary to try to guess whether elements which are not mentioned have been left out for lack of space or because they have been quietly dropped from the system. The parser itself suffers from this problem since, as we shall see, its internal structure is also rather complex.

6.2 Evolution of the parser

In order to aid exposition, I shall plot the main milestones in the development of the parser before turning to a more detailed examination of the most recent version.

6.2.1 The earliest version: two-way finite automata

The earliest descriptions of the parser appeared in 1984 (Nelimarkka et al. 1984a; Nelimarkka et al. 1984b). At that stage the developers of the parser were emphasizing three main points:

1. The grammar was based on the notion of functional dependency.

2. 'Constituents' were built middle out.

3. The parser built structure using two-way finite automata.

Functional dependency grammar

The parser builds dependency structures consisting of pairs of words in binary antisymmetric dependency relationship with each other. The words involved in dependency relationships are identified using a 'regent-dependent' nomenclature. Non-terminal phrase nodes or labels do not appear anywhere in the
                    heitti
    adverbial      subject      object
      TIME          agent       neutral
    Nuorena         poika       kiekkoa

Figure 6.1: a functional dependency structure

system. However, the term 'constituent' is used consistently to refer to a word plus all of its (direct or indirect) dependents. It is even (confusingly) used to refer to a single word which has no dependents. The word on which all others depend (directly or indirectly) in a constituent is the 'head'. Different kinds of dependency are recognized and these are linked with the traditional syntactic functions (or relations) subject, object, adverbial, genitive attribute, etc. These, in turn, are associated with semantic interpretations such as AGENT, NEUTRAL, DIRECTIVE, etc.

For example, the sentence Nuorena poika heitti kiekkoa ('When he was young the boy used to throw the discus') is given a stemma analysis as shown in Figure 6.1 (example cited in Nelimarkka et al. 1984a: 169). This combination of dependency, syntactic function, and deep case is what is referred to by the term 'functional dependency grammar'.

Middle-out structure building

The parser is described as being strongly data driven, left-to-right, and bottom-up. It is also described as building a constituent from the middle outwards. This seems slightly inconsistent: left-to-right suggests one control strategy, middle-out suggests another. In fact, the parser is only left-to-right in the sense that it sees word 1 before it sees word 2. It may actually end up building constituents at the end of the sentence before it has built any at the beginning.
Overall, the strategy is very close to that of an island parser which starts constructing 'islands' as close to the beginning of the sentence as it can.

Suppose that the string the parser is operating on consists of constituents C_1 ... C_n (remember, a single word can be a constituent and, if the constituent consists of more than one word, only the head is visible externally). Middle-out control works as follows:

1. Try to recognize C_{i-1} as a dependent of C_i.

2. Try to recognize C_{i+1} as a dependent of C_i.

3. Shift the focus to C_{i-1} or C_{i+1}.

Notice that the parser only attempts to link immediately adjacent (i.e. neighbouring) constituents. If constituent A meets the dependency requirements of constituent B, then constituent A is 'absorbed' into constituent B and so disappears from sight of the parser. Constituent B now has a new neighbour and so the parser can attempt to establish a new dependency link between them.

The parser can be envisaged as consisting of a register holding the current constituent, plus two stacks, one storing the left context, the other storing the right context (see Figure 6.2, due to Lehtola et al. 1985).

The current constituent C either establishes a dependency link with L1 or R1, or it is pushed onto one stack and the current constituent register is filled from the top of the other stack. The parser is constructed so as always to search the immediate left context first.
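The following sketch (my reconstruction in Python, not the Kielikone code) captures the essentials of this regime: each constituent tries to absorb an immediately adjacent neighbour, immediate left neighbour first, until a single constituent remains. It works over a flat list of constituents rather than the register-and-stacks machinery of Figure 6.2, and it omits backtracking over alternative attachments; can_depend stands in for the grammar's binary dependency test.

class Constituent:
    def __init__(self, form):
        self.form, self.deps = form, []

def parse_middle_out(consts, can_depend):
    """Absorb adjacent neighbours until one constituent (the head) remains."""
    consts = list(consts)
    while len(consts) > 1:
        changed = False
        for i, c in enumerate(consts):
            if i > 0 and can_depend(c, consts[i - 1]):        # immediate left first
                c.deps.append(consts.pop(i - 1)); changed = True; break
            if i + 1 < len(consts) and can_depend(c, consts[i + 1]):
                c.deps.append(consts.pop(i + 1)); changed = True; break
        if not changed:
            raise ValueError("no parse: no adjacent pair can be linked")
    return consts[0]

# A toy dependency table standing in for the grammar's binary relations:
LINKS = {('heitti', 'Nuorena'), ('heitti', 'poika'), ('heitti', 'kiekkoa')}
words = [Constituent(w) for w in 'Nuorena poika heitti kiekkoa'.split()]
root = parse_middle_out(words, lambda h, d: (h.form, d.form) in LINKS)
print(root.form, [d.form for d in root.deps])   # heitti ['poika', 'Nuorena', 'kiekkoa']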

Two-way finite automata

The grammar stores information concerning binary dependency relations and their corresponding functions. However, it is also necessary in this system to store information specifying what all constituents may contain. In other words, it is necessary to store for each word type a complete record of all its

            The register of the
            current constituent

        L1                  R1
        L2                  R2
        L3                  R3

    The left            The right
    constituent         constituent
    stack               stack

Figure 6.2: left and right context stacks

obligatory and optional dependents. This can then serve as a model for actual occurrences of that word type. For this task the system uses two-way finite automata.

A two-way finite automaton (Levelt 1974) consists of a set of states. One of these is distinguished as the start state and one or more are distinguished as final states. The states are linked by transition arcs. Each arc recognizes a sentence element and moves the reading head either to the right or to the left in the input string. The automaton accepts an input string if it begins in the start state with the first word under the reading head and proceeds to a final state, leaving the reading head pointing to the right of the last word in the input string.
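As an aside, the textbook definition just given can be rendered directly in a few lines of Python. The transition table, state names and example below are illustrative only and do not come from the Kielikone literature; loop detection and undefined transitions are not handled.

def run_2dfa(transitions, start, finals, tape):
    """transitions maps (state, symbol) to (next_state, move), move being
    +1 (right) or -1 (left); accept iff the head falls off the right end of
    the input while the automaton is in a final state."""
    state, pos = start, 0
    while 0 <= pos < len(tape):
        state, move = transitions[(state, tape[pos])]
        pos += move
    return pos == len(tape) and state in finals

# Example: a (one-way) table accepting strings over {a, b} that end in b.
trans = {('q0', 'a'): ('q0', +1), ('q0', 'b'): ('q1', +1),
         ('q1', 'a'): ('q0', +1), ('q1', 'b'): ('q1', +1)}
print(run_2dfa(trans, 'q0', {'q1'}, list('aab')))   # True
print(run_2dfa(trans, 'q0', {'q1'}, list('aba')))   # False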

The standard idea of a two-way finite automaton is modified somewhat in the Kielikone system. Instead of recognizing words in the input string, each automaton recognizes functions like subject, object, etc. Each arc traversal also serves to build some structure, namely to insert a dependency relation between two neighbouring words. The dependency relation is labelled with the name of the function specified by the arc traversed. States are divided into 'left' and 'right' states indicating the side of the current word on which dependents so marked will be found. Thus, contra Covington (1990b), relative position is expressed explicitly in the grammar of a free word order language.

It has been known for some time that any language recognized by a two-way automaton is regular (i.e. type 3, the most highly constrained set of languages in the Chomsky Hierarchy). This power is not sufficient for the requirements of natural language. To increase the recognition power, several automata are made to activate one another. They make use of three 'control' arcs which shift processing from the current word to one of its neighbours. These control operations are 'BuildPhraseOnRight', 'FindRegOnLeft', and 'FindRegOnRight'.

When an automaton has found all of the obligatory dependencies associated with a given word, the final action of the automaton is to mark the head '+phrase', thus indicating that the constituent is complete. Other, more specific, features may also be used, e.g. '±sentence', '±nominal', '±main'.

It is worth noting that automata 'know' nothing about when and why they were activated. This distributed control (or 'local control' as it is referred to by Kielikone researchers) ensures that parsing is strongly data driven. Careful ordering of function and control arcs in the automata is said to result in very little backtracking being necessary.

Automata are fairly complex objects in the Kielikone system. The only automaton to be described in the Kielikone literature can be found in Lehtola et al. (1985: 99).

6.2.2 A grammar representation language: DPL

It is not clear from the literature whether the representation language described in this section was developed concurrently with the components covered in the section above or whether it represents a subsequent step.

The language, 'DPL' (Dependency Parser Language), is a representation language developed as part of the Kielikone project (Lehtola et al. 1985; Lehtola 1986). All functions, relations and automata were, at one time, expressed in this unified representation language.

Given that DPL abbreviates 'Dependency Parser Language', it seems somewhat incongruous that "the main object in DPL is a constituent" (Lehtola et al. 1985: 100). However, this can be read as meaning 'the main object in DPL is a word plus all its properties, including its dependents'. The grammar writer specifies an inventory of permitted property names and values. These can then be built into descriptions. A number of operators are available to relate objects to each other and to perform actions on objects, including the following:

=      equality
:=     replacement
:-     insertion
< >    mark the scope of an implicit disjunction
( )    mark the scope of an implicit conjunction
->     perform all operations on the right
=>     terminate execution after first successful operation

The definition of Subject shown in Figure 6.3 should serve to illustrate what a DPL entry looks like. This example is taken from Lehtola et al. (1985: 102). I shall not discuss its detail here. The important point to note is that the grammar writer is forced to write a procedural grammar. It is generally acknowledged that procedural grammars (other than grammars for tiny fragments) are much harder to write, to understand, to modify and to port than declarative grammars, so it could be argued that DPL is not the best representation on which to base a parser. Notice that the grammar writer is charged with the task of defining the automata in DPL as well as the task of defining the functions and relations in the grammar. Fairly minor modifications to the grammar could be expected to require a lot of hard work.
(FUNCTION: Subject
   ( RecSubj -> (C := Subject))
)

(RELATION: RecSubj
   ((C = Act < Ind Cond Pot Imper >) (D = -Sentence +Nominal)
    -> (D = PersPron (PersonP R) (PersonN R)
       ((D = Noun) (C = 3P) -> ((C = S) (D = SG))
                               ((C = P) (D = PL))))
   ((D = Part) (C = S 3P)
    -> ((C = 'OLLA)
        => (C :- +Existence))
       ((C = -Transitive +Existence) ) ) )

Figure 6.3: a DPL definition of Subject

Before moving on to examine the next development in the Kielikone project we must note a cryptic comment buried in one of the papers describing the DPL representation language:

    An automaton can refer up to three constituents to the right or left using indexed names: L1, L2, L3, R1, R2 or R3 (Lehtola et al. 1985: 101).

Everything else in the Kielikone literature seems to suggest that the only constituents in sight of the current word are its immediate left and right neighbours. The above comment seems to suggest that the parser really has three-cell lookahead and lookback buffers, rather like Marcus's deterministic PARSIFAL system (Marcus 1980) (which has a three-cell lookahead buffer). This would be a very important point if it were the case. However, since nothing else in the literature points in this direction we must simply place a question mark beside the above remark, and proceed.

6.2.3 Constraint based grammar: FUNDPL

As I have previously observed, DPL presented the grammar writer with a fairly unwieldy formalism. The grammar writer was required to work out complex control issues. This problem was acknowledged by the Kielikone team who responded by designing a more user-friendly high-level representation language called 'FUNDPL' (FUNctional DPL).¹ FUNDPL is built on top of DPL so its functionality is exactly the same. The crucial difference is that the grammar writer is no longer required to worry about control issues (at least, not to the same extent). FUNDPL is described in Jappinen et al. (1986).

FUNDPL is basically a constraint system. As such, it is claimed to be related to other constraint-based grammars such as LFG (Kaplan and Bresnan 1982), FUG (Kay 1985), and GPSG (Gazdar et al. 1985). In common with these systems FUNDPL allows the grammar to be written as a set of well-formedness constraints. Conceptually, the job of the parser is to search for an analysis of the sentence which does not violate any constraints. However, unlike these other systems, FUNDPL grammars are not unification grammars. FUNDPL is simply a high level interpreter which maps declarative FUNDPL structures onto procedural DPL structures. The main benefit of FUNDPL is that DPL, with all of its procedural complexity, is no longer visible to the grammar writer. It is no longer necessary to think in terms of two-way finite automata.

Functional schemata

FUNDPL constraint structures for the description of constituents are known as schemata. Each schema has four parts: pattern, structure, control, and assignment, as shown in Figure 6.4.

A schema is triggered by matching the properties of a constituent with those in the When slot of the schema. (Presumably the slot is named to signify something like 'when this pattern is matched, use the schema'.) The structure part of the schema lists optional and obligatory dependents for the head of the constituent. The Order slot specifies any ordering (concatenation) restrictions which may apply. For example, Order = <D1 D2 R> states that D1 must


¹The pronunciation of this acronym is not known.

FJSCHEMA: name
   When       = [properties]              -- pattern
   Obligatory = (functions)               -- structure
   Optional   = (functions)
   Order      = <conc. description>
   TryLeft    = <functions>               -- control
   TryRight   = <functions>
   Down
   Up
   Assume     = [properties]              -- assignment
   Lift       = function(attributes)

Figure 6.4: the general form of functional schemata

precede D2 which in turn must precede the regent. Irrelevant intervening material is indicated by two consecutive dots (..). Order = <D1..R..D2> requires D1 to appear somewhere to the left of R and D2 to appear somewhere to the right of R. The Order slot may be empty. The control part of the schema consists of heuristic information to guide the parser's search order. This is stored in the TryLeft and TryRight slots. If a word's dependents are usually, though not necessarily always, found in particular locations, the heuristic information can cut down average search time considerably. Down and Up are used to change levels between matrix and subordinate sentences. Their use is not well documented. Presumably their purpose is to prevent constituents at one level from being confused with those at another level; it is not clear how they work and no examples are available. Clearly, the designers of FUNDPL are being somewhat optimistic when they say that their system "liberates a grammar writer from control anxieties" (Jappinen et al. 1986).

The Assume slot assigns new features (e.g. +Phrase) to the regent once the schema has been fully matched and bound. The Lift slot is like the Assume slot except that it copies features from a dependent to the regent. For example, 'Lift=Subject(Case)' copies the Subject's case feature to the regent.

The example shown in Figure 6.5 appears in Jappinen et al. (1986: 463). It is the functional schema for normal Finnish transitive verbs which allow unlimited adverbials on either side. The schema allows all ordering permutations

(FJSCHEMA: VPTrAct
   When       = [Verb Act Ind +Transitive]
   Obligatory = (Subject Object)
   Optional   = (Adverbial*)
   TryLeft    = <Subject Object Adverbial>
   TryRight   = <Object Adverbial Subject>
   Assume     = [+Phrase +Sentence])

Figure 6.5: a schema for Finnish transitive verbs

((R = Verb Act
      <(<Ind Cond Imper Pot IIpartis> (PersonP D)(PersonN D)
        -Negative -Auxiliary)
       (Auxiliary IIpartis Nom -Negative)
       (Negative <(Imper Pr <(S 2P) Neg>)
                  (Cond Pr S 3P) (Pot Pr Neg)
                  (IIpartis Nom)> -Auxiliary)>)
 (D = PersPron Nom))...

Figure 6.6: the binary relation 'Subject'

among dependents but it 'prefers' SVO order.
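Purely for illustration, a schema of this kind can be thought of as a record structure along the following lines. This is a hypothetical Python rendering of the Figure 6.5 example, not FUNDPL syntax; the field names simply mirror the slots described above.

from dataclasses import dataclass, field

@dataclass
class Schema:
    name: str
    when: set             # triggering pattern matched against a head's properties
    obligatory: tuple     # functions that must be filled before the schema completes
    optional: tuple = ()
    try_left: tuple = ()  # heuristic search order to the left of the head
    try_right: tuple = () # heuristic search order to the right of the head
    assume: set = field(default_factory=set)  # features assigned on completion

    def triggered_by(self, head_properties):
        return self.when <= set(head_properties)

vp_tr_act = Schema(
    name='VPTrAct',
    when={'Verb', 'Act', 'Ind', '+Transitive'},
    obligatory=('Subject', 'Object'),
    optional=('Adverbial*',),
    try_left=('Subject', 'Object', 'Adverbial'),
    try_right=('Object', 'Adverbial', 'Subject'),
    assume={'+Phrase', '+Sentence'},
)

print(vp_tr_act.triggered_by(['Verb', 'Act', 'Ind', '+Transitive', 'Pr']))   # True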

Binary relations

Notice that functional schemata specify the possible components of a constituent. They do not contain any information detailing what might constitute a legitimate dependent of the regent. For example, the schema in Figure 6.5 records that a transitive verb requires a subject but it says nothing about what may legitimately serve as a subject. In the FUNDPL system, functional schemata (which are generalized descriptions of the structure of constituents) are completely distinct from binary relations, which define all permitted dependency relations which may hold between pairs of words in Finnish sentences. Binary relations are boolean expressions which succeed if all conditions are met, otherwise they fail. Unfortunately, the literature offers only half a binary relation by way of example. This half, which is part of the binary relation 'Subject', is shown in Figure 6.6 (Valkonen et al. 1987b).

The regent R must be an active verb. Further restrictions appear within

(CATEGORY: SynCat
   <(Word)
    (Noun ! Word)
    (Proper ! Noun)
    (Common ! Noun)
    (Pronoun ! Word)
    (PersPron ! Pronoun)
    (DemPron ! Pronoun)
    (IntPron ! Pronoun)
    ...

Figure 6.7: the 'SynCat' category

the disjunctive angle brackets. '-' expresses negation. The dependent D must be a personal pronoun. The significance of round brackets is not clear. If the conditions for both R and D are satisfied then the value of the relation is 'True', i.e. a dependency relation can be established between them.
T y p e d efinitions

A FU N D PL gram m ar includes type definitions of three varieties: C A TE­

G O R IES, FEA TU R ES, and PR O PE R T IE S.

C A TEG O RY definitions set up hierarchical relations am ongst names. This

allows properties to be inherited autom atically by lower individuals from

higher individuals in the hierarchy. For example, a category SynC at, con­

sisting of a word class hierarchy, would be defined as shown in Figure 6.7

(Valkonen et al. 1987b: 219).

T he ‘!’ symbol can be read as ‘isa’.

FEATURE definitions record the names and possible values of features.

PROPERTIES are like features except that they can have default values. For example, the following property definition (from Valkonen et al. 1987b: 219) records the fact that 'Polar' can have two values, 'Pos' or 'Neg'. The value of 'Polar' is 'Pos' by default.

(32)
(PROPERTY: Polar <(Pos) Neg>)

Lexicon

The FUNDPL lexicon records idiosyncratic, non-inferrable features for words. Thus it consists of word:feature-structure pairings.

This concludes my sketch of the evolution of the Kielikone parser. Inevitably, some features have not been covered. Some of these were left out because they were minor, ephemeral suggestions. Others were left out because the literature contains insufficient or confusing information. For example, Kettunen 1986 mentions a parser called 'DADA' (an acronym from the unlikely designation 'Dependency Analysis is Dependency Analysis'!) and describes it as being part of the Kielikone system. The parser is never heard of again so it is hard to tell whether it was a short-lived alternative to the older system or simply a confusion of names.

In the next section I explain how the FUNDPL components I have described fit together in the most recent version of the Kielikone parser.

6.3 The parser

The best texts describing the present state of the parser are Valkonen et al. (1987b) and Kettunen (1989). There is not full agreement between these papers; they even disagree about the name of the parser! Valkonen et al. call the parser ADP and describe FUNDPL as a declarative high level language. Kettunen consistently refers to FUNDPL as a parser, even in the title of his paper Evaluating FUNDPL, a dependency parser for Finnish. However, since Kettunen's usage seems to be idiosyncratic I shall ignore it.

6.3.1 The grammar

The grammar accepted by the parser is written in FUNDPL. It consists of the four components described in the previous section, namely

1. Type definitions, consisting of definitions for categories, features and properties.

2. A lexicon for associating features with words. Recall that the SUOMEX system includes a morphological analyzer, MORPHO (Jappinen and Ylilammi 1986), which analyzes words into their component morphs. The role of the lexicon in the grammar is simply to add information which cannot be predicted from general principles.

3. Binary dependency relations, which are boolean evaluation functions to determine whether the features of any two words are such as to allow them to enter a dependency relationship.

4. Functional schemata, consisting of definitions of the structure of constituents. These may be under-specified so, for example, relative word order may not be defined, thus allowing any ordering.

6.3.2 Blackboard-based control

The structure of the parsing system is represented by the diagram in Figure 6.8 which appears in Valkonen et al. (1987a: 700) and Valkonen et al. (1987b: 221).

The account of the system's structure offered by its designers proceeds as follows.

The system has two knowledge sources, a body of functional schemata and a body of binary relations (i.e. boolean expressions). These two knowledge sources do not communicate directly. Instead, they read from and write to a shared data structure known as a 'blackboard'. When a word becomes the current word its properties are matched against the triggering patterns of the functional schemata (i.e. the values of the When slots in the schemata).

Only one match can be entertained at any one time. A matching schema is used to create an 'active environment' associated with the constituent to be built around the current word. This active environment is located on the blackboard and is monitored by the binary relations. These are used to indicate when the properties of a regent and a candidate dependent are such as to
[Figure 6.8 (diagram): the BLACKBOARD holds an active environment description, partial solutions (local trees), and other computational state data; the KNOWLEDGE SOURCES are the functional schemata and the binary dependency relations; CONTROL is provided by a scheduler for knowledge sources, with control flow and data flow linking the three.]

Figure 6.8: architecture of the Kielikone parser

allow a dependency link to be established. When the prevailing conditions allow linking, the partial dependency tree is built by "dependency function applications" (Valkonen et al. 1987b: 221). It is not clear what these are or where they fit in the above diagram. This process continues until all of the obligatory slots (and perhaps some optional slots) have been filled in the active environment. At this point the local partial dependency tree is complete and processing can shift to another constituent with another active environment, unless, of course, the constituent to be completed has a main verb (+Sentence) as head in which case the parse is complete.

The blackboard is a well known data structure in artificial intelligence (Hayes-Roth et al. 1983; Nii 1986). The principle behind blackboard systems is that several component processes (or knowledge sources) can collaborate in the construction of objects residing on the blackboard. The order in which objects are added to the blackboard is determined by the availability of information to the processes. Thus, a knowledge source can be thought of as
a demon watching the blackboard until something appears which that demon is able to process. The demon writes the resulting structure to the blackboard and returns to a semi-dormant monitoring state. In this way, different knowledge sources can collaborate to achieve some task.

An example of this kind of blackboard system is the HEARSAY-II speech understander (Erman et al. 1981) which used a blackboard to keep track of the sentence analysis being developed by several different knowledge sources.

Whether or not this degree of architectural sophistication is really necessary in a dependency parser is open to question. The motivation for using a blackboard is usually that it is necessary to apply several knowledge sources to each structure in order to generate a solution. In the Kielikone parser there are only two knowledge sources, namely the functional schemata and the binary relations. It is not even clear that these need to be separate knowledge sources. The division is not justified anywhere in the Kielikone literature and a number of other dependency parsers described in this thesis seem to work adequately without any such division of labour.

6.3.3 The parsing algorithm

In this section I describe the parsing algorithm. Before getting too close to the detail it is worth attending to the designers' high-level description of what their system does:

    In analysis two abstract levels exist. On the regent level (R-level) are those constituents which lack dependents to fill some required functional roles. On the dependent level (D-level) are those constituents which have become full phrases (marked by the feature +Phrase) and are therefore candidates for functional roles... The underlying abstract view is this. A word enters the parsing process via R-level. When all dependents of the constituent (the word) have been bound (from D-level), it descends to D-level. There it remains until it itself becomes bound as a dependent. Then it vanishes from sight (Jappinen et al. 1986).

The parsing algorithm is defined by a two-way finite automaton. This is not to be confused with the two-way finite automata originally used by the grammar writer to define functional schemata and still built on the fly by the FUNDPL interpreter. The parsing algorithm embodied in the automaton consists of five main steps, namely:

1. One of the schemata associated with the current constituent is activated.

2. Search for left-side dependents for the current constituent.

3. The current constituent is waiting for the building of the right context.

4. Search for right-side dependents for the current constituent.

5. The schema associated with the current constituent has been fully matched and becomes inactive. The current constituent is now a completed (partial) dependency tree.

No more than one schema may be active at any one time, i.e. only one constituent may be at step 2 or step 4 in the automaton. However, any number of constituents may be at step 3. These are termed 'pending' constituents and are implemented as a PENDING stack. Parsing starts with the first word and proceeds to the right. A sentence is well-formed if the parsing process yields a single constituent in step 5.

We shall now consider each step in the algorithm in greater detail:

1. All constituents have heads, whether they consist of single words or complex dependency structures. A schema, whose When features match the head features of the constituent, is activated. It is not clear whether matching must be exact or more like unification, i.e. there is a match if there is no conflict. Move to step 2.

2. Left-side dependents are searched for on the basis of the dependency requirements stated in the active schema. There are two possible outcomes:

   (a) There are no left neighbours or left neighbours are at step 3, pending. Go to step 3.

   (b) The left neighbour is in step 5 (i.e. is a complete constituent). Binary relation tests are carried out to establish whether or not it is a suitable dependent. If it is then the left neighbour is subsumed in the current constituent which re-enters step 2 (now with a new left neighbour). If binary relations fail, the active schema enters step 3, pending.

3. There are two possibilities here:

   (a) There are no right neighbours. Go to step 5.

   (b) There are right neighbours. Push the current constituent on the PENDING stack and go to step 1 with the next constituent to the right (i.e. read in the next word).

4. Search for right-side dependents. If binary relation tests succeed then subsume each dependent in the current constituent. Return to step 3.

5. There are two possibilities:

   (a) If no constituents remain other than the current constituent then the sentence has been successfully parsed. If right-side constituents exist then go to step 1 with the next constituent as input (i.e. get next word from input). If neither of these succeed then go to step 4 and pop PENDING.

   (b) Failure.

The control strategy automaton is shown in Figure 6.9.

The above description has been constructed following published descriptions of the Kielikone parser as closely as possible. A PARS description of the Kielikone parsing algorithm is given below:

Figure 6.9: the Kielikone parser control strategy automaton

INITIALIZATION: read input words into a list;
                C is the current word;
                C:=1;
                initialize an empty stack;
                Result is the result variable;
                'saturated(C)' is a condition which succeeds iff
                  C's valency requirements have been satisfied.

1. IF (C=1 ∨ C-1=top(Stack))
   THEN goto(2)
   ELSE IF (saturated(C-1) & C -> C-1)
        THEN record(C -> C-1),
             remove(C-1),
             goto(1)
        ELSE C:=top(Stack),
             pop(Stack),
             goto(3).

2. IF (saturated(C) ∨ C+1=e)
   THEN goto(4)
   ELSE push(C),
        C:=C+1,
        goto(1).

3. IF (saturated(C+1) & C -> C+1)
   THEN record(C -> C+1),
        remove(C+1),
        goto(2)
   ELSE fail.

4. IF (C+1=e & C-1=0 & empty(Stack))
   THEN Result:=C,
        succeed
   ELSE IF C+1=e
        THEN C:=top(Stack),
             pop(Stack),
             goto(3)
        ELSE C:=C+1,
             goto(1).

Algorithm 6.1: the Kielikone dependency parsing algorithm
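For readers who prefer running code to PARS notation, the following Python sketch is my own, deliberately simplified rendering of the same control regime: words are read left to right, each new constituent first absorbs complete (saturated) constituents to its left, then waits on a PENDING stack for its right context, and at the end of the sentence the stack is unwound from the right. It is less faithful than Algorithm 6.1 (in particular it omits chronological backtracking and the re-activation of pending constituents mid-sentence); schema_for, saturated and can_depend are assumed helpers supplied by the grammar.

def parse_kielikone(words, schema_for, saturated, can_depend):
    """Return the root constituent, or raise if no complete tree is found."""
    pending = []                               # constituents awaiting dependents
    for word in words:
        current = schema_for(word)             # step 1: activate a schema
        # step 2: absorb complete constituents immediately to the left
        while pending and saturated(pending[-1]) and can_depend(current, pending[-1]):
            current.deps.append(pending.pop())
        pending.append(current)                # step 3: wait for the right context
    # steps 4-5: unwind, letting each constituent absorb its right neighbour
    while len(pending) > 1:
        right = pending.pop()
        if saturated(right) and can_depend(pending[-1], right):
            pending[-1].deps.append(right)
        else:
            raise ValueError("no parse")
    root = pending[0]
    if not saturated(root):
        raise ValueError("no parse")
    return root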

The basic parsing strategy should be obvious. Each schema becomes active and continues to be active until either it builds a complete constituent or it goes to sleep to wait for the constituents it requires to be built. As the Kielikone parser is described, an active schema is just the data structure that happens to be being manipulated at the present moment. Schemata are not, of themselves, either active or inactive: they are simply representations. They are interpreted as being active or inactive according to whether the parser is currently trying to satisfy the dependencies specified by them. The situation would be completely different if each schema were actually a process rather than a representation. This would make for rather an interesting parser which would bear a family resemblance to a Word Expert Parser (Small 1983), a parser which consists of a set of interacting processes, each of which is an 'expert' on some word in the lexicon. This flavour of system is mentioned briefly in the closing remarks of Valkonen et al. (1987a: 702):

    We argue that our blackboard-based computational model also gives a good basis for parallel parsing. There should be an own processor for each word of the input sentence. The partial dependency trees would be built in parallel and sent to the main process that links them into a parse tree covering the whole sentence.

For a parallel Word Expert Parser see Devos et al. (1988).

127
6.3.4 Ambiguity

Ambiguities arise in the system due to indeterminacies of three distinct kinds: choice of analyses for homographic words, choice of schemata, and choice of dependency relations. In parsing, a record is kept of all choice points and exhaustive enumeration of all possible readings of a sentence is produced by chronological backtracking. This is not a computationally efficient approach to ambiguity since it can result in identical structures being built many times over.

6.3.5 Long distance dependencies

Under normal circumstances, dependency relations are established between immediately neighbouring constituents. However, this is not possible in the case of long-distance dependencies where, by definition, part of a constituent is moved out of its normal position into another, inaccessible, position.

    Long-distance dependency is caused by an element which has moved from the local environment of a regent to the local environment of another regent (Valkonen et al. 1987b: 220).

In order to deal with long-distance dependencies, a minor modification is made to the grammar and the parser. The modification to the grammar involves marking schemata which can become possible neighbours of moved items as having a special (optional) 'DistantMember' dependency function. This can act as a place-holder for the moved item which is said to be 'captured'. The schema from which the constituent can be moved is marked with a 'DISTANT' clause indicating which dependents could possibly be moved out of the immediate vicinity of the constituent. For example, a schema might contain the entry:

(DISTANT Object)

indicating that an object could be a possible candidate for movement.

A modification to the parser is also required. The parser is given an extra register. Any captured constituents are copied from the 'DistantMember' slot of the 'host' schema into the special purpose register. This register must be checked in addition to a constituent's immediate neighbours during the parsing process. If the item in the register is found to satisfy a dependency requirement of the current constituent, it can be copied from the register into the current constituent as a dependent. (I assume, although this is not stated explicitly, that the register is only checked if a dependency can not be satisfied by more conventional means.) After initially being copied into the special register, the captured constituent is no longer visible in the constituent which captured it. This could be described as a 'swooping' analysis. The 'DistantMember' dummy dependency is similar to the 'Visitor' relation in Word Grammar (Hudson 1988b: 202ff; also 189 below). However, unlike WG, the Kielikone solution does not appear to handle 'island constraints' (Ross 1967). One such constraint stipulates that extraction out of a complex noun phrase (e.g. the claim that Saddam is a wonderful host) is prohibited. There is nothing in the Kielikone parser's treatment of movement to stop it from accepting a sentence with this kind of prohibited extraction, i.e. it would parse both of the sentences in (33).

(33)

a Nobody believes the claim that Saddam is a wonderful host.

b *What does nobody believe the claim that Saddam is?

'DistantMember' is directly analogous to the HOLD register in Augmented Transition Networks (Woods 1970) and is thus subject to the same kinds of criticisms. (It is an ad hoc device, it is descriptively inadequate, etc.)
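The mechanism can be pictured with the following small sketch (mine, not Kielikone code), in which a single special-purpose register is consulted only after the conventional, adjacent candidates have failed; can_depend again stands in for the binary relation tests, and constituents are assumed to carry a deps list as in the earlier sketches.

class Register:
    def __init__(self):
        self.captured = None                  # at most one captured constituent

def fill_slot(current, neighbours, register, can_depend):
    """Try adjacent fillers first, then the 'captured' item in the register."""
    for candidate in neighbours:
        if can_depend(current, candidate):
            current.deps.append(candidate)
            return True
    if register.captured is not None and can_depend(current, register.captured):
        current.deps.append(register.captured)   # 'swooped' from the register
        register.captured = None
        return True
    return False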

6.3.6 Statistics and performance

The most recent available figures (Valkonen et al. 1987b: 225) report that the system contains 66 binary relations, 188 functional schemata and 1800 idiosyncratic lexical entries. The lexicon of the separate MORPHO morphological analyzer contains 35000 entries.

It is claimed that a recent modification to the parser (discussed below) parses unambiguous sentences in linear time. This sounds impressive but is misleading. It is not unusual for dependency parsers to operate in linear time on unambiguous sentences (for example, my own parser described in Chapter 9 does so). It is also the case that there exists a class of ambiguous languages (which is hard to describe in intuitively comprehensible terms) which can be parsed in linear time by parsers based on context free grammars. (Some examples are given in Earley 1970.) It is normal to cite worst case or possibly average case complexity rather than best case complexity in order to evaluate a parser. Unfortunately, these figures are not published for the Kielikone system.

6.3.7 Open questions

Theoretical status

Unlike some of the other dependency parsers reviewed in this thesis (e.g. the Lexicase and Word Grammar parsers, Chapters 8 and 9), the Kielikone parser is not based on a linguistically motivated theory. In spite of the fact that Finnish has fairly free word order, it does not have a tradition of DG scholarship as is the case with, for example, German and Russian. Indeed, Tarvainen (1977) is one of the few texts which makes any attempt at analysing Finnish syntax in terms of DG and this work is not mentioned in the (English) Kielikone literature.

There seems to be some uncertainty as to the status of the Kielikone parser. Obviously, it is an NLP system with a clear application in view, namely the design of a portable natural language interface to computer databases. However, from the early days of the project the designers have claimed that they were also developing a cognitive model (e.g. Nelimarkka et al. 1984a: 168; Lehtola et al. 1985: 106). Not everyone shares this view. For example, Starosta and Nomura (1986: 127) describe the Kielikone parser as having "evolved from the computational rather than the linguistic direction". If the claim that the Kielikone parser is a cognitive model is to be taken seriously it must be backed up by argumentation and evidence. At the moment this is conspicuous by its absence.

Modularization

As they stand, the parser and the grammar are almost distinct, but not quite.

To begin with the grammar, the functional schemata contain Up and Down slots which can be interpreted as control statements. They also contain heuristic TryLeft and TryRight slots whose sole purpose is to reduce the amount of search required of the parser. Jappinen et al. (1988b) have recently proposed an optimization of the parsing algorithm which clearly removes the boundary between grammar and parser. They do this by introducing an ordered set of constituent types to look for (in much the same manner as Starosta and Nomura 1986):

    The basic left-corner-up algorithm can be modified so that it hierarchically first builds nominal LGT's [Locally Governed Trees] without prepositional modifiers, then LGT's governed by prepositions and postpositions, then nominal LGT's with postpositional modifying nominal LGT's, and finally the LGT governed by the finite verb (Jappinen et al. 1988b: 277).

Division of labour

One of the outstanding questions surrounding the Kielikone parser is why there is a distinction between functional schemata and binary relations. This might be restated more succinctly by asking why the notion of 'constituent' has been retained at all. In the present system, the binary relations are concerned with the kind of simple pairwise relations familiar from dependency grammar whereas the functional schemata are concerned with larger objects which can be identified with constituents in the traditional sense. In fact, a schema acts just like an X-bar immediate dominance (ID) rule.

In criticizing the Kielikone approach, Kettunen claims that:

    It seems evident that the lexicon should be working more actively in a dependency parser. In FUNDPL this is not the case. As such, FUNDPL is not modelling dependency grammar properly (Kettunen 1989).

This seems like a harsh criticism with which to conclude this examination of the Kielikone parser. However, the Kielikone researchers have left themselves open to criticism. Although they have been prolific in their output, it has consisted almost exclusively of descriptions of the systems they have built, and, as has been noted above, these have not always been readily interpretable. There has been hardly any real discussion of motives for choices or arguments against possible alternatives. Parsers are notoriously difficult to compare and evaluate. Bald performance figures are not very helpful. What is required is a clear statement of the decisions which the parser embodies and some strong arguments for these decisions.

6.4 Summary

The Kielikone parser works from left to right, bottom-up. With each input word it associates an active schema, i.e. a frame consisting of dependency slots and heuristic information. Search proceeds from heads to dependents in a single pass through the sentence.

The parser is based on a blackboard architecture. While the basic idea of the parser is fairly clear, my attempts to reconstruct the algorithm on the basis of published accounts have not met with complete success.

The main features of the Kielikone parser are summarized in Table 6.1.

Table 6.1: main features of the Kielikone dependency parser

Search origin           bottom-up
Search manner           depth-first
Search order            left to right
Number of passes        one
Search focus            heads seek dependents
Ambiguity management    chronological backtracking (heuristics guide search)

Chapter 7

The DLT MT system

7.1 Overview

In this chapter I examine the Distributed Language Translation (DLT) systems produced by Buro voor Systeemontwikkeling ('BSO/Research'¹).

I begin with an overview of the DLT system. In Section 7.2 I consider in more detail the DLT DG formalism. In Section 7.3 I review the approach to parsing adopted in the first system prototype. In Section 7.4 I consider the more radical solution suggested for the second prototype: a probabilistic dependency parser.

The DLT project is a large MT project jointly funded by BSO/Research and the Dutch Ministry of Economic Affairs. It began in late 1984 and, so far, 50 person-years have been invested in it. The aim of the project is to construct a semi-automatic MT system. The precise meaning of the designation 'semi-automatic' will become clear shortly. Unlike some of the other projects described here, there is an abundance of published material describing the DLT system, including a six-volume book series published by Foris and devoted entirely to DLT. For present purposes the most interesting of these are Schubert (1987) and Maxwell and Schubert (1989).

An important design consideration was the need to give the system a powerful language-neutral inference engine which could be simply customized for any language pair. The effort involved in constructing an MT system is much

¹Since 1 July 1990 BSO/Research has been known as 'BSO/Language Systems'.

134
too great to risk having to re-build the whole system every time a new language is added. The design adopted in DLT 'distributes' the translation task into two sub-tasks. Firstly, the source language is translated into an intermediate representation. Secondly, the intermediate representation is translated into the target language. This is not obviously a simplification since where there might have been a single language pair, there are now two. The rationale for this approach is that all that is required in order to add a new language to the system is to write a sub-system for translating between that language and the intermediate representation. Once this has been done, it is possible to translate from the newly added language to all of the other languages in the system without further effort. Thus, if there are ten languages in the system and a new language is to be added, this necessitates the development of a translator for one language pair instead of ten language pairs. The intermediate representation used in the DLT system is a slightly modified version of Esperanto(!). In the early prototypes English is the source language and French is the target.
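The arithmetic behind this rationale is easily checked; the small calculation below is mine, not from the DLT literature. Direct transfer needs one translator per unordered language pair, whereas the interlingua design needs only one language-to-interlingua pair per language.

def pairs_direct(n):
    """Unordered language pairs needed under direct transfer."""
    return n * (n - 1) // 2

def pairs_interlingua(n):
    """One language<->interlingua pair per language."""
    return n

for n in (10, 11):
    print(n, pairs_direct(n), pairs_interlingua(n))
# Going from 10 to 11 languages costs 55 - 45 = 10 new direct-transfer pairs,
# but only one new interlingua pair, as the text observes.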

Translation is semi-automatic in DLT in the sense that the system can seek clarification from the user when necessary. When there are no difficulties, the system can translate from source to intermediate to target as though operating in batch mode. When a problem arises in translating from source language to intermediate representation, the system can query the user (in the source language). For example, if a source sentence is ambiguous, the system is able to resolve the ambiguity by asking the user to select amongst alternative readings.

The syntactic framework used in the DLT system is a version of DG. I shall describe it in more detail in the next section. DLT is controversial in its failure to construct explicit meaning representations for the sentences to be translated. Most MT systems first construct a semantic analysis of the source sentence and then use it to generate a sentence in the target language. DLT workers have argued that this sort of content-oriented approach is a kind of 'analytic overkill'. In trying to make the semantics explicit, a lot of problems

are raised which then have to be solved. Instead, they argue for an approach to translation which focuses on form rather than content. Schubert writes:

    There are a good deal of form correspondences, short cuts from form to form, which can and should be used. These correspondences are mostly not found in the directly visible syntactic form of texts, but at the next level of abstraction, the level of syntactic functions that are inferrable from syntactic form (Schubert 1987: 202).

In order to effect the mapping from syntactic structures of one language to syntactic structures of another language, a higher-level, dual language 'contrastive syntax' is required. The name by which this contrastive syntax is known is metataxis, from Tesnière's term 'métataxe'.²

    Metataxis...is a process which starts with syntactically analysed source language texts as the input and results in a synthesis of syntactically correct texts in a target language (Schubert 1987: 125).

It is claimed that a metataxis approach to MT does not dispense with semantics altogether, but rather that 'the semantic information is used implicitly'.

    In a metataxis-oriented semantic transfer process, it is possible to keep deep cases implicit and use semantic relators that are rather straightforwardly inferrable from syntactic functions (op. cit.: 203).

I shall not investigate this claim here. (For more information see Sadler 1989c.) Instead, I shall focus on the way in which DG is used to represent sentence structure and the way in which that structure is built by a parser.

The DLT system is summarized in Figure 7.1, which is based on a diagram in Witkam (1989: 142).

²"La traduction d'une langue à l'autre oblige à faire appel à une structure différente. Nous donnerons à ce changement structural le nom de métataxe" (Tesnière 1959: 283). ['Translation from one language to another makes it necessary to appeal to a different structure. We shall give this structural change the name metataxis.']

[Figure 7.1 (diagram): Translation 1 maps the SOURCE language to the INTERMEDIATE representation, and Translation 2 maps the INTERMEDIATE representation to the TARGET language, each translation operating at the SYNTACTIC and LEXICAL levels.]

Figure 7.1: the Distributed Language Translation system

7.2 Dependency grammar in DLT

Although the DLT system has been well-publicized, my discussion of the version of DG on which it is based will be hampered by the fact that I have not been able to find any published account of the form of dependency rules adopted. The remarks in this section will accordingly be confined to a discussion of general constraints on well-formed sentences.
sion of general constraints on well-formed sentences.

M any of the constraints on well-formedness are expressed in term s of tree

geometry. In DLT, dependency structures are required to be ‘tru e trees’ rath er

th a n arb itra ry graphs. T h a t is, they m ust be rooted, directed, acyclic, and

non-convergent.

Rooted  The root of the tree represents the single independent element to which all other words in the sentence must be subordinate.

Directed  The directedness of the arcs indicates the direction of the dependency relation holding between heads and dependents.

Acyclic  The fact that the tree must be acyclic precludes the possibility of interdependence. Word A can not be head of word B in respect of one dependency relation and dependent of word B in respect of another dependency relation, as this would lead to the presence of a cycle in the tree.

Non-convergent  Links in the tree may never converge on a node. The effect of this is to prevent a word from depending on more than one other word or from depending on a single word by virtue of more than one dependency relation.
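These four conditions can be checked mechanically over a set of word-to-word links. The following is a minimal sketch in the style of the Prolog programs in the appendices (the dep/2 representation is an assumption of the sketch, not part of DLT, and relation labels are omitted for brevity):

    % A structure over Words is a list of dep(Head, Dependent) links; the
    % argument order encodes the directedness of each arc.
    true_tree(Words, Links) :-
        single_root(Words, Links, _Root),
        non_convergent(Words, Links),
        acyclic(Links).

    % rootedness: exactly one word has no head at all
    single_root(Words, Links, Root) :-
        findall(W, (member(W, Words), \+ member(dep(_, W), Links)), [Root]).

    % non-convergence: no word has more than one incoming link
    non_convergent(Words, Links) :-
        forall(member(W, Words),
               ( findall(H, member(dep(H, W), Links), Heads),
                 length(Heads, N),
                 N =< 1 )).

    % acyclicity: no word heads a chain of links that leads back to itself
    acyclic(Links) :-
        forall(member(dep(H, _), Links), \+ path(H, H, Links, [H])).

    path(X, Y, Links, _)       :- member(dep(X, Y), Links).
    path(X, Y, Links, Visited) :-
        member(dep(X, Z), Links),
        \+ member(Z, Visited),
        path(Z, Y, Links, [Z|Visited]).

For example, true_tree([a,b,c], [dep(a,b), dep(a,c)]) succeeds, whereas adding dep(b,c) makes the structure convergent and the test fails.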

So far, this definition of well-formed dependency structures is entirely standard. Where it differs from the conventional model is in making no use of a projectivity or adjacency constraint. In terms of tree geometry, this would lead to crossing arcs, were it not for the fact that surface word order is not preserved in DLT dependency trees.^

Dependency syntax does not rely on the contiguity principle. Word order may well play a role in syntactic form, but as soon as a word by means of its syntactic form has been assigned a dependency type label, syntactic form has fulfilled its function and need not be rendered in the tree. Dependency trees thus do not represent word order. They are not projective, at least not in the present model (op. cit.: 64).

The DLT dependency grammar de-couples word order from dependency. This is illustrated in Figure 7.2, which shows the analysis for the sentence Whom did you say it was given to? (op. cit.: 103). (DLT dependency trees are usually represented as Tesnierian stemmas. Arcs are labelled with the name of the type of dependency relation involved, although I have omitted labels here for the sake of readability.) Reading from right to left, notice that you precedes did (unlike in the sentence), and whom is in object position in the embedded sentence, rather than in its 'moved' sentence-initial position.

The arcs in a DLT tree represent dependency relations but what do the nodes represent? The simple answer is that most of the time they represent words, where 'word' is defined crudely in terms of a string of characters bounded by space characters. A node is never allowed to represent more than one word. Nodes are even prohibited from representing frozen multi-word foreign-language borrowings such as ipso facto.

Although nodes signifying more than a single word are not allowed, a case is made for allowing nodes to signify less than one word, i.e. a morpheme.

^Except in the form of features indicating the word's position in the input string.

[Figure omitted: a stemma rooted in did, with you and say beneath it, and was, given and whom on successively lower levels.]

Figure 7.2: dependency analysis of the sentence Whom did you say it was given to?

The arguments hinge around phenomena such as English clitics (can't = can not), possessives (Elizabeth, the Queen of England's), and the class of German verbs which combine a root with a participle in a single word in some contexts but which separate them into two words in other contexts. Thus, the root and the participle must be identified by different nodes in the dependency tree.

A more accurate characterization of the restriction on nodes is that they may only be used to represent morphemes, or morpheme strings smaller than or coextensive with the words in which they appear. It is necessary to allow morpheme strings to be represented by nodes since it would not be helpful to recognize a root word and its inflectional affix as separate nodes in the tree.

Things are not quite as simple as this, since the DLT grammar recognizes punctuation symbols as having a place in the structural analysis of sentences. For example, the period is used to mark the end of a sentence (van Zuijlen 1990) and the comma is used as a conjunction in coordinate structures, such as the one shown in Figure 7.3 (Schubert 1987: 114ff; cf. Hellwig's use of punctuation in DG described on page 93 above).

[Figure omitted: a stemma for a coordinate structure in which sing dominates and, which in turn links the conjuncts Tom, Dick and Harry, with the comma treated as a conjunction.]

Figure 7.3: the use of comma in coordinate structure analyses

7.3 An ATN for parsing dependencies

A number of parsing approaches have been considered in connection with the DLT project, most of them modifications of parsing techniques well tried with PSGs. In this section I shall briefly mention three of these — augmented PSG (APSG), definite clause grammar (DCG), and augmented transition network (ATN) grammar — which were briefly investigated during the development of the first DLT prototype.

In the early stages of the DLT project two parsers were developed for a subset of English in order to compare their computational efficiency. These were based on APSGs (Winograd 1983: 377ff) and ATNs (Woods 1970; Woods 1987). It appears that the ATN grammar performed best; I shall discuss it further below.

Schubert (1987: 213) argues that far from being tied to PSG, APSG is a general-purpose formalism for the description of trees which is "suited for dependency parsing as well." The APSG-based parser was implemented in a parsing environment developed at the University of Amsterdam (van der Korst 1988). However, it stretches the meaning of 'dependency parser' somewhat to designate the APSG parser thus. Rather, it is a PSG parser which is able to map constituent structure onto dependency structure as it goes along. Its input is a PSG, not a DG. According to van der Korst the grammar contains 49 non-terminal categories and 27 lexical/punctuation categories (op. cit.: 6-7). I shall not consider the APSG parser any further here.

Schubert argues that DCGs (Pereira and Warren 1981) are not inherently inappropriate for expressing (or parsing) dependency relations. He continues:

I am not aware of an implementation of DCGs involving dependency syntax, at least not for a complete syntax of a language. Within the DLT machine translation project, a small word parser has been implemented (by Job van Zuijlen) which builds up dependency trees for morphemes of complex Esperanto words (Schubert 1987: 214).

To the best of my knowledge no further research has been done towards developing a dependency version of DCG.^ Van Zuijlen's DCG morphological analyzer is reported in van Zuijlen (1986a, 1986b).
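The point is easily illustrated. The following is a minimal sketch (it is not the dcg.pl program of Appendix A.3, and the toy vocabulary is assumed): a DCG can return dependency trees simply by letting each non-terminal's argument be a term node(Word, Dependents) rather than a constituent label.

    % each analysis is a term node(Word, Dependents)
    sentence(node(V, [Subj]))           --> noun_phrase(Subj), verb(V).
    noun_phrase(node(N, [node(D, [])])) --> det(D), noun(N).
    noun_phrase(node(N, []))            --> noun(N).

    det(the)    --> [the].
    noun(dog)   --> [dog].
    noun(cat)   --> [cat].
    verb(barks) --> [barks].

    % ?- phrase(sentence(T), [the, dog, barks]).
    % T = node(barks, [node(dog, [node(the, [])])])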

Turning to the ATN-version of DG, we find slightly more details in the literature. In fact, an ATN was used in the first DLT prototype, which was completed in 1988. Schubert writes:

For the DLT machine translation system, Witkam (1983: IV-87ff) designed an ATN for Esperanto, which is basically constituency-based and for which he had constituency trees in mind (Witkam 1983: IV-72ff). When dependency syntax was chosen for the DLT system, it was easy to equip this same ATN with tree-building actions for dependency trees (Schubert 1986: 11ff, 99ff). No rearrangements whatsoever were required in the ATN in order to shift from assumed constituency trees to dependency trees (op. cit.: 213).
^A very simple DCG for parsing sentences and constructing dependency trees can be found in the file dcg.pl in Appendix A.3. The file also includes a predicate dcg_generate which generates all strings and trees allowed by the grammar. The program in map_to_dcg.pl (also in Appendix A.3) can be used to map an arbitrary Gaifman grammar into an equivalent DCG.

[Figure omitted: the top-level network, with labelled sub-networks SUBJ, DOBJ, IOBJ, POBJ, PRED, PREC, ADVC, INFC, PREA, ADVA, SUBO, VC, LIA and POSTA arranged around a VERB arc.]

Figure 7.4: an ATN for parsing Danish sentences

ATNs are very simple and effective for parsing languages with an adjacency constraint (i.e. contiguous constituents) in terms of DG. The example network in Figure 7.4 is taken from Schubert (1987: 219). It shows the top-level network for describing the structure of simple Danish sentences. Labelled boxes denote named networks; unboxed labels on arcs indicate words to be consumed. Notice that there is considerable scope for variation of word order amongst the dependents of the verb. Registers are used to ensure that a verb has the correct number of dependents, e.g. that a verb has exactly one subject (as opposed to either any number of subjects or one before, and one after, the verb). Figure 7.5 shows the separate SUBJECT network.

This dependency parser implements a top-down, left-corner parse strategy.

[Figure omitted: the SUBJECT sub-network, with arcs labelled NOUN, PRONOUN, PLACEHOLDER and VERB, and a NUM register test.]

Figure 7.5: an ATN for parsing Danish subjects
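The register mechanism just mentioned amounts to counting: whatever order the dependents are consumed in, the subject count must be exactly one when the network is left. A minimal sketch of that test (the label subj and the list representation are assumptions of the sketch):

    % Deps is the list of dependent types consumed by the network, in any order.
    exactly_one_subject(Deps) :-
        check(Deps, 0).

    check([], 1).                            % accept only if one subject was seen
    check([subj|Rest], 0) :- check(Rest, 1).
    check([Dep|Rest], N)  :- Dep \= subj, check(Rest, N).

    % ?- exactly_one_subject([dobj, subj, advc]).   succeeds
    % ?- exactly_one_subject([dobj, advc]).         fails (no subject)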

ATNs impose an explicit search ordering, although in this case the relative order of the verb's dependents is fairly free. It could be argued that this works against one of DG's greatest assets, namely its orientation to relationships amongst words, rather than sequencing of words, which is what ATNs orient to.

As is normally the case with ATNs, the grammar and the parser are conflated. In fact, this is a procedural grammar. In line with the prevailing view in computer science and computational linguistics, I endorse the view that a clean separation should be maintained between grammars and parsers for reasons of clarity and modifiability (e.g. see Gazdar and Mellish 1989: 95ff). Presumably the same conclusions have been reached by the DLT team since they have now abandoned the use of ATNs.

7.4 A probabilistic dependency parser

For the second prototype of the DLT system, a completely different approach to parsing has been adopted. In the earlier prototype, fairly standard rule-governed parsers were tried. For the second prototype, experiments are being carried out with probabilistic parsing methods.

Probabilities can be incorporated into grammars in at least two ways. First, grammar rules can be augmented with probabilities reflecting the probability of each rule actually being used in a context in which it could be used. For example, the following notation could be used to indicate that the rule n(det,*) is appropriate for 60% of all nouns and the rule n(det,adj,*) for 20% of all nouns.

    Pr(n(det,*)) = 0.6
    Pr(n(det,adj,*)) = 0.2

This information can be used heuristically during parsing so that the rule with the highest probability is tried first. Alternatively, all possible rules can be tried and all possible analyses built for a sentence. The analysis with the highest probability (calculated from the joint probabilities of all the rules used) is selected. In this way probabilities are used to choose amongst analyses in a language whose boundaries are fixed.
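The second alternative can be sketched very simply (the probability facts repeat the figures above; everything else, including the representation of an analysis as the list of rules it used, is an assumption of the sketch):

    rule_probability(n(det,*),     0.6).
    rule_probability(n(det,adj,*), 0.2).

    % an analysis is represented here simply by the list of rules it used
    analysis_probability([], 1.0).
    analysis_probability([Rule|Rules], P) :-
        rule_probability(Rule, P1),
        analysis_probability(Rules, P2),
        P is P1 * P2.

    % choose the analysis with the highest joint probability
    best_analysis(Analyses, Best) :-
        findall(P-A, (member(A, Analyses), analysis_probability(A, P)), Scored),
        msort(Scored, Ranked),               % ascending by probability
        last(Ranked, _-Best).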

The second way in which probabilities can be built into a grammar dispenses with the dichotomy between well-formedness and ill-formedness, replacing it instead with a grammaticality continuum ranging from fairly ill-formed constructions through very well-formed constructions. In this approach the core rules of the grammar may be assigned probabilities in the fashion shown above. Additionally, all other rules possible within the logic of the grammatical framework may be allowed with very low probability. For example, the APRIL ('Annealing Parser for Realistic Input Language'; Haigh et al. 1988) and RAP ('Realistic Annealing Parser'; Atwell et al. 1989) projects use the technique of simulated annealing to reduce the amount of search required in order to parse with a grammar which does not rule out any structural possibilities a priori, instead assigning very low probabilities to all tree configurations not attested in the corpus. The object of this approach is to ensure that an analysis of some kind is produced for every sentence, including those which conventional parsers would simply reject as ungrammatical.

It is normal for the probabilities attached to rules to be derived from empirical studies of text corpora. A corpus is first parsed and the analyses verified. The frequency of application of each rule is counted and then used to compute the probability of each rule. These probabilities are then projected from the 'training' corpus to the rest of the language. A rationale for allowing all logically possible rules with very low probability is that no training corpus will ever be large enough to furnish examples of the use of all rules of a natural language. By allowing every logically possible rule with very low probability it may be possible to make a parser robust enough to produce reasonable analyses, even for structures not attested in the training corpus.

As far as I am aware, Job van Zuijlen of BSO/Research is the first person to implement a probabilistic dependency parser. While he has investigated the theoretical possibilities of using simulated annealing in dependency parsers (van Zuijlen 1989a, 1989b), he has in practice adopted a more straightforward approach in the probabilistic dependency parser he has actually implemented (van Zuijlen 1990). First of all it was necessary to obtain a syntactically analyzed corpus in order to compile a set of rule probabilities. The Bilingual Knowledge Bank (BKB) is a corpus-based knowledge source which has come to be regarded as the heart of the DLT system (Sadler 1989a; Sadler 1989b). Put simply, it consists of a fully analyzed text in one language and the same text fully analyzed in another language.® This can then be treated as a resource for working out correspondences between the languages. Since the analysis of a language in the BKB includes preferred (hand-constructed) parse trees, it can be used to generate rules and associated probabilities of occurrence. (For the purposes of probabilistic parsing the fact that it is a bilingual knowledge base is of no interest: only one language is examined.)


®According to van Zuijlen (personal communication) a simple rule-based dependency parser and a graphical tree editor were used to assist the human analyzer. I have no further information on the rule-based parser.

Table 7.1: different dependency links retrieved from the BKB

    Word        Links
    you            17
    can            10
    remove          4
    the             9
    document       58
    from           16
    drawer         37
    Total         151

For his first probabilistic parsing experiment (January to April 1990), van Zuijlen used a BKB tree bank consisting of 1400 dependency trees, representing some 22000 words from a software manual. (This is far too small a tree bank to have any significance outside of an exploratory experiment.) Corpus-based probabilistic parsing proceeds in four stages which are identified as Retrieval, Construction, Generation, and Evaluation.

Retrieval  For each word in the input sentence, the corpus is searched. All of the occurrences of the word in the corpus are identified and a record is kept of all the different pairwise dependency relations in which the word-instances in the corpus participate. For example, the number of different dependency links retrieved for the input sentence You can remove the document from the drawer is shown in Table 7.1 (all examples from van Zuijlen 1990).

In addition to the information regarding the separate dependency links which point towards and away from the word instances, a tally is also kept of the patterning of these links with each other. Thus, a record of the individual links and the collective patterns is assembled.
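A minimal sketch of the retrieval step (the corpus_link/3 representation of the tree bank, the relation labels and the sample facts are assumptions of the sketch): every distinct pairwise link in which a word form occurs anywhere in the tree bank is collected and counted.

    % corpus_link(Governor, Label, Dependent): one link from the tree bank
    corpus_link(remove,   subj, you).
    corpus_link(remove,   obj,  document).
    corpus_link(can,      infc, remove).
    corpus_link(document, det,  the).

    links_for(Word, Links) :-
        findall(corpus_link(G, L, D),
                ( corpus_link(G, L, D), (G == Word ; D == Word) ),
                All),
        sort(All, Links).                    % sort/2 removes duplicate links

    link_count(Word, N) :-
        links_for(Word, Links),
        length(Links, N).

    % ?- link_count(remove, N).   gives N = 3 with the sample facts above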

Construction  A network is constructed by finding pairs of links which 'fit together'. Intuitively, these links are descriptions of the same relation from different perspectives, the head perspective and the governor perspective. More formally, a link can be added to the network if the following conditions hold (they are rendered schematically after the list):

1. the governor label of the head link corresponds to the dependent label of the governor link;

2. the dependent link is present in one or more of the dependency patterns of the governor; and

3. the position of the governor agrees with the direction of the dependent link.
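A possible rendering of these three conditions (the link and pattern representation below is an assumption of the sketch, not van Zuijlen's): a dependent-side link records the label under which a word at some position seeks a governor in a given direction; a governor-side link records the label under which a word accepts a dependent; attested dependent patterns are stored per governor word.

    % dep_link(Word, Pos, Label, Dir): Word at position Pos seeks a governor
    %   under Label, with the governor to its left or right (Dir).
    % gov_link(Word, Pos, Label):      Word at Pos accepts a dependent under Label.
    % gov_pattern(Word, Labels):       one dependent pattern attested for Word.
    gov_pattern(remove, [subj, obj]).          % hypothetical attested pattern

    % condition 1 is the shared variable Label in the two link terms
    compatible(dep_link(_Dep, DepPos, Label, Dir), gov_link(Gov, GovPos, Label)) :-
        gov_pattern(Gov, Pattern),             % condition 2: the label occurs in
        member(Label, Pattern),                %   a pattern of the governor
        agrees(Dir, DepPos, GovPos).           % condition 3: direction vs position

    agrees(right, DepPos, GovPos) :- GovPos > DepPos.
    agrees(left,  DepPos, GovPos) :- GovPos < DepPos.

    % ?- compatible(dep_link(you, 1, subj, right), gov_link(remove, 3, subj)).
    % succeeds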

The network produced for the test sentence You can remove the document from the drawer is represented in Figure 7.6. Dependency links are portrayed as connected rectangles. Solid rectangles identify dependents, dashed rectangles identify heads. The arrow points from dependent to head. Note that of the original 151 links found in the corpus, only 19 have fulfilled the construction conditions for inclusion in the network.

Generation  In the generation phase the network is processed to remove links which do not form part of any possible coherent parse tree which has a single root to which all other words are subordinate. The removal of impossible links from the network in Figure 7.6 leaves 13 links remaining in the network. (These generate four different trees.)

Van Zuijlen has developed a method for representing multiple dependency trees in a single graph with structure-sharing (van Zuijlen 1988). However, its complexity is such that it can not be described here.

Evaluation  Associated with each link in the network is a pair of numerical values. The weight of a link is an indication of how well a dependent fits in the dependency pattern of its governor, taking the governor's other dependents into account. The suitability of a link is an indication of how well a particular word is suited to having a specific function with respect to a particular governor.

[Figure omitted: the link network for the sentence You can remove the document from the drawer, with dependency links drawn as paired solid (dependent) and dashed (head) rectangles carrying labels such as GOV, SUBJ, INFC, OBJ, PARC, DET, ATR1 and PCT.]

Figure 7.6: a dependency link network for the sentence You can remove the document from the drawer
The weight and the suitability measures are merged in an adjustable proportion to yield the quality of the analysis represented by a given tree. In this way the alternative readings for the sentence can be compared and a 'best analysis' can be selected. I shall not explore the mathematics of the 'best analysis' selection method here.®
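The merging step itself is simple to state. The following sketch assumes a linear combination of the two measures and a summation over links; neither assumption is taken from the published account:

    % quality of one link: weight and suitability mixed in proportion Alpha (0..1)
    link_quality(Weight, Suitability, Alpha, Quality) :-
        Quality is Alpha * Weight + (1 - Alpha) * Suitability.

    % one possible tree score: the sum of its link qualities
    tree_quality(Links, Alpha, Quality) :-
        findall(Q,
                ( member(link(W, S), Links), link_quality(W, S, Alpha, Q) ),
                Qs),
        sum_list(Qs, Quality).

    % ?- tree_quality([link(0.8, 0.5), link(0.6, 0.9)], 0.5, Q).   gives Q = 1.4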

This parser represents an interesting innovation in both the fields of dependency parsing and probabilistic parsing. The association of probabilities with pairwise dependencies is, to the best of my knowledge, without precedent. It will be very interesting to watch this research develop and to see what the performance of the parser turns out to be when it has a reasonably large corpus to operate on. In the meantime judgement must be reserved on it until more results become available. Because of the extent to which this parser differs from the others in this thesis, detailed comparisons are difficult to make. I shall refrain from presenting a more formal PARS version of the algorithm or a worked example.

7.5 Summary

This chapter has presented an overview of the Distributed Language Translation MT project, which is based on the idea of metataxis or contrastive syntax. I have shown how the functional structures represented by dependency trees provide a starting point for the process of metataxis. I have briefly noted the existence of small experimental quasi-dependency parsers based on augmented PSG and definite clause grammar. I have looked in more detail at a dependency parser which is no more than a slight modification to a conventional augmented transition network. This implements a top-down, left-to-right parsing strategy. Probabilities are used to decide the 'best' analysis when more than one is possible. However, the binary distinction between well-formedness and


®van Zuijlen (personal communication) says "In future work I hope to include incremental evaluation in order to control the size of the solution space during parsing" (original emphasis).

Table 7.2: main features of the DLT ATN dependency parser

    Search origin           top-down
    Search manner           depth-first
    Search order            left to right
    Number of passes        one
    Search focus            network navigation
    Ambiguity management    first parse only

Table 7.3: main features of the DLT probabilistic dependency parser

    Search origin           bottom-up
    Search manner           breadth-first
    Search order            unspecified, unimportant
    Number of passes        one
    Search focus            heads and dependents seek each other simultaneously
    Ambiguity management    highest-scoring parse selected

ill-formedness is strictly maintained.

The main features of the DLT ATN dependency parser are summarized in Table 7.2.

The latest parser to be developed in the project is much more radical, being based on the use of rules and probabilities generated 'on the fly' from a hand-analyzed corpus. The parser mixes bottom-up and top-down search: the actual words of the sentence are used to construct a grammar which thereafter guides search. Direction of processing is not crucial to the parser's control strategy (i.e. there is nothing inherently left-to-right or right-to-left about it). Rather, the parser begins by constructing as many minimal islands (i.e. word pairs) as it can and then rules out those which are not consistent with a coherent analysis or with what is known about the co-occurrence of dependency links.

The main features of the DLT probabilistic parser are summarized in Table 7.3.

Chapter 8

Lexicase parsers

8.1 Overview

Lexicase (Starosta 1988) is a grammatical theory developed by Stanley Starosta and his graduate students at the University of Hawaii over the last two decades. It is unique in contemporary linguistic theory for a number of reasons. First, it is old. The version of the theory in use today can be traced back to a class handout produced by Starosta in 1970 (Starosta 1970). To this a number of papers were soon added (e.g. Starosta 1971a, 1971b). No other theory of natural language mentioned in this thesis has remained so stable for such a long time.^ Second, the theory has been widely field-tested. Lexicase grammars have been written for significant parts of around fifty different languages including many so-called 'exotic' (i.e. not Indo-European) languages. Apparently the theory's longevity does not stem from the sort of disregard for the hard facts of language of which some theories are occasionally accused. A third fact which distinguishes Lexicase from its rivals is that the theory has been all but ignored in the linguistic mainstream. On first inspection it seems strange that a theory which has been in existence for so long and which can draw on such an impressive body of descriptive material should receive so little critical attention. If the theory were worthless it ought to have been exposed as such; if it were outstanding it ought to have been
to have been exposed as such; if it were outstanding it ought to have been


^This may be interpreted positively as evidence of the theory's proximity to the truth, or negatively as evidence of the fact that the theory has not been subject to the critical attention of the wider linguistics community.

praised. Neither of these things has happened to any great extent. Instead, it has been largely ignored. This may be due in part to the fact that the first book-length introduction to Lexicase theory only became available fairly recently (Starosta 1988). (At present the Lexicase literature runs to some 130 items.) The fact that this introductory volume has received some positive reviews (e.g. Blake 1989; Fraser 1989b; Miller 1990) may signal the awakening of interest in Lexicase (but see Turner 1990 for a searing attack on the same volume). Certainly, some of the main features which have distinguished Lexicase from other theories for most of its existence — its lexicalism, its recognition of head/dependent asymmetries, its extensive use of features — now form part of the tool chest of mainstream linguistics.

I shall not attempt to evaluate Lexicase theory here. Rather, I shall sketch the main points of the theory and examine two parsing algorithms developed for use with Lexicase grammars. Section 8.2 provides an overview of Lexicase theory. Section 8.3 describes the two Lexicase parsing algorithms.

8.2 Lexicase theory

Starosta describes Lexicase as a "panlexicalist monostratal dependency variety of generative localistic case grammar" (Starosta 1988: 1). It is panlexicalist in the sense of Hudson (1981a), i.e. the rules of the grammar are lexical rules, expressing relations among lexical items and features within lexical entries. Larger structures are seen as sequences of words linked by dependency relations. Lexicase is monostratal in that it accounts for the systematic relationships among words in sentences by means of lexical rules rather than syntactic transformations. The grammar refers to only one level of representation — the surface level. This is a feature which Lexicase shares with most dependency-based theories of language (for notable exceptions see Robinson 1970; Anderson 1977; Sgall et al. 1986; Mel'cuk 1988). Dependency in Lexicase will be described in more detail in the next section. Lexicase is

generative in the traditional Chomskyan sense — the rules and representations are expressed formally and explicitly and are concerned with a speaker-hearer's linguistic competence. Lexicase is a case grammar in the Fillmorean tradition (Fillmore 1968); every nominal constituent is analysed as bearing a syntactic-semantic relation to its regent. However, it has evolved away from mainstream case approaches in a number of respects. It is localistic (Hjelmslev 1935; Hjelmslev 1937; Anderson 1971), that is, it places strong emphasis on the use of spatially oriented semantic features. Whereas most case grammars are primarily concerned with situations and 'deep' analyses (e.g. Fillmore 1968; Schank 1975), Lexicase tends towards identifying case relations with syntactic relations (in this it accords with Anderson's case grammar (Anderson 1971)). Other distinctive features of case in Lexicase are the feature-based formalization and the requirement that every verb contain a Patient in its case frame (the so-called Patient Centrality hypothesis (Starosta 1978)).

8.2.1 Dependency in Lexicase

Starosta presents his dependency system as a highly constrained version of X-bar theory. However, he introduces a number of constraints on the form of his X-bar grammar, namely:

1. the lexical leaf constraint;

2. the optionality constraint;

3. the one-bar constraint;

4. the sisterhead constraint; and

5. the features on lexical items constraint.

Before examining these constraints, it is worth noting that very few discussions of X-bar theory make clear exactly how constrained an X-bar system needs to be. There are, of course, many possible instantiations of X-bar grammar (Kornai and Pullum 1990), only one of which could be said to be equivalent to Starosta's DG.

The lexical leaf constraint

The lexical leaf constraint ensures that all terminal nodes are words. Throughout the years that PSG has been used by linguists, terminal nodes have been used to represent a number of different things besides words, for example morphemes and dummy symbols. GB theory (Chomsky 1981) allows empty categories such as PRO and t and sub-lexical morphemes such as Tense and AGR. Amongst dependency grammarians the same sort of non-word nodes have been introduced into dependency trees. For example, Robinson proposes a sub-lexical T (tense) morpheme (Robinson 1970) and Anderson advocates a phonetically null ∅ node (Anderson 1971: 43).

It is hard to over-emphasize the importance of the lexical leaf constraint in Lexicase. It makes explicit the distinction between morphology and syntax: the associated claim is that the morphological structure of words is irrelevant to syntax. It rules out 'empty category' analyses and the possibility of handling 'movement' by associating moved items with 'gaps'. Starosta sums up the effect of this constraint as follows:

The Lexicase representation thus sticks quite close to the lexical ground, accepting as possible grammatical statements only those which can be predicated of the actual strings of lexical items which constitute the atoms of the sentence. This constraint plus [the other constraints] limit the class of possible grammars by excluding otherwise plausible analyses and deciding on equally plausible analyses formulatable within the constrained Lexicase framework (Starosta 1988: 13).

The analysis in Figure 8.1 is prohibited in Lexicase. The lexical leaf constraint requires this sentence to be analysed in a tree structure with exactly three leaf nodes corresponding to the three words in the sentence; the structure in Figure 8.2 is closer to the Lexicase analysis. We shall see the actual form of a Lexicase tree for this sentence once we have examined the rest of the constraints.

[Figure omitted: a phrase structure tree for Stan invented lexicase with COMP, NP, INFL (+TENSE, +AGR), VP, N and V nodes, some of them empty.]

Figure 8.1: a syntactic structure with empty nodes

[Figure omitted: a tree whose only leaves are Stan, invented and lexicase.]

Figure 8.2: a syntactic structure without empty nodes

The optionality constraint

The optionality constraint states that every non-head daughter in a rule is optional. This is the standard understanding of 'optionality' as embodied in X-bar PSG (Emonds 1976: 16; Jackendoff 1977: 36; Kornai and Pullum 1990). Notice that this does not exclude the possibility of a phrase containing more than one obligatory element; indeed, this is the normal case in exocentric constructions such as prepositional phrases. Starosta argues that "unlike conventional versions of dependency grammar... Lexicase does not require that every construction have a single head" (Starosta 1988: 12). This is misleading: conventional versions of DG do not require that every construction have a single head; rather, they require that every dependent have a single head. The notion 'head of a construction' is at best derivative in many dependency theories. However, Lexicase retains the idea of the construction or phrase (although it is not clear what work it does). Most constructions are endocentric and have a single head. The rest are exocentric and contain at least two coheads, exactly one of which is the lexical head and the rest of which are phrasal heads. Two kinds of exocentric construction are recognized, namely prepositional phrases and coordinate constructions. In prepositional phrases the preposition is the lexical head and the noun is the phrasal head. In coordinate constructions the lexical head is the conjunction and the conjuncts are each phrasal heads.^

It is important to understand Starosta's use of the terms 'endocentric', 'exocentric', and 'head'. In Bloomfield's seminal discussion of the endocentric/exocentric distinction (Bloomfield 1933: 194-7) his definitions rested upon the substitutability of one word in a construction for the construction as a whole. The distribution of poor John and John is identical, so John is the head of an endocentric construction. Neither in nor Wales has the same distribution as in Wales, so the construction is exocentric. By Bloomfield's definition, coordinate constructions are endocentric since fish, chips, and fish and

^Conjuncts may, of course, be realized as single lexical items.

chips all have the same distribution. However, by Starosta's optionality-based conjunction-as-head definition, coordinate constructions are exocentric. It is clear that when Starosta refers to a 'head' he is referring to a relationship which holds between a word (or words) and a whole construction. He reserves the term 'regent' to describe a word in relationship with a dependent word.^ For example, in the sentence I saw big bad John, John is the regent of big; it is also the regent of bad; but it is the head of the whole phrase big bad John.

The one-bar constraint

The one-bar constraint states that "each and every construction (including the sentence) has at least one immediate lexical head, and every terminal node is the head of its own construction" (Starosta 1988: 14). This has the effect of guaranteeing that only single-bar phrases are possible nodes in a Lexicase tree. Every terminal node has its own one-bar projection and every non-terminal node is an X-bar which is a maximal projection of its head X.

The most important consequence of the one-bar constraint is that it is no longer possible to analyse a sentence as consisting of an NP followed by a VP. Rather, a sentence is analysed as a V-bar, and the subject can be analysed as both a sister and a dependent of the main verb. Starosta argues that the absence of a VP removes the need to introduce an abstract INFL node to do to the subject what the verb would have done if it were the head of the sentence.

The effect of the one-bar constraint on the sentence shown in Figures 8.1 and 8.2 is to reduce and simplify the range of possible structural analyses. The overall shape of the tree thus constrained would be similar to that shown in Figure 8.3.
Figure 8.3.
^Starosta (personal communication) names the Kielikone project (described in Chapter 6) as his source for this usage.
[Figure omitted: a flat one-bar tree over Stan invented lexicase.]

Figure 8.3: a syntactic structure constrained by the one-bar constraint

The sisterhead constraint

The sisterhead constraint states that "lexical items are subcategorized only by their dependent sisters" (Starosta 1988: 20). In other words, all grammatical relationships are statable in terms of regent-dependent pairings. Any word which depends directly or indirectly on X is said to be in the syntactic domain of X.

The relationship between regents and dependents is antisymmetric; regents are subcategorized by their dependents but dependents can not impose constraints on their regents. For example, a dependent could not require its regent to precede it.

The features on lexical items constraint

The 'features on lexical items constraint' states that "features are marked only on lexical items, not on non-terminal nodes" (Starosta 1988: 23). This constraint is the final step from a standard X-bar grammar to a DG. If only the lexical items carry features, and lexical items are subcategorized by their dependent sisters, then clearly all the X-bar structure is doing is relating lexical items pairwise. This can be clarified by simplifying the Lexicase tree representation further. Since every node in a tree is a one-bar projection of its head lexical item, node labels are predictable and therefore redundant. Thus the analysis of the sentence Stan invented Lexicase can be represented finally as shown in Figure 8.4.

[Figure omitted: a stemma in which invented [+V] dominates Stan [+N] and lexicase [+N].]

Figure 8.4: a Lexicase syntactic structure

This looks remarkably like a traditional dependency stemma except for the presence of a feature matrix attached to each word. In Figure 8.4 only the word class features have been shown. However, several different kinds of feature may appear in the lexical entry for a word. These are described in the next section.

8.2.2 Lexical entries in Lexicase

Associated with each word in the lexicon is a bundle of features. Features can be divided into contextual and non-contextual features. Non-contextual features are binary; a lexical item either has or has not got some property. The presence of property P is identified thus: [+P], its necessary absence thus: [-P]. Contextual features determine which words are dependent on which other words. They can be viewed as well-formedness conditions on the dependency trees associated with the words in a sentence. Contextual features can be positive, negative, or implicational. The following exemplify some uses to which features can be put:

(34)
a [+Det]
b [-fint]
c [-[+Det]]
d [+[+N]_]
e [+_[+N]]
f ([_[+N]])

Example (34a) is a positive non-contextual feature indicating that the word bearing the feature is a determiner. Example (34b) is a negative non-contextual feature indicating that the word bearing the feature is not finite. Example (34c) is a negative contextual feature indicating that the word bearing the feature does not have a dependent determiner (relative position unstated). Example (34d) is a positive contextual feature indicating that the word bearing the feature requires a preceding dependent noun. Example (34e) is a positive contextual feature indicating that the word bearing the feature requires a following dependent noun. Example (34f) is an implicational contextual feature indicating that the word bearing the feature is expected to have a following dependent noun. Under certain circumstances the expected word may be absent (for example, in the case of 'moved' wh-words). Double contextual features are prohibited. That is, a contextual feature may not be included within another contextual feature. An exhaustive listing of the formal properties of Lexicase features can be found in Starosta (1988: 57).
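For concreteness, contextual features of this kind can be given a simple machine representation (the one below is an assumption of the sketch, not Starosta's notation): a word token carries its position and a list of feature atoms, and a contextual feature is checked against a candidate dependent's position and features.

    % word(Form, Position, Features): a word token in the sentence.
    % ctx(pos, Required, before) corresponds to (34d); ctx(pos, Required, after)
    % to (34e); ctx(neg, Prohibited, _) to (34c).

    satisfies(ctx(pos, Required, before), word(_, RPos, _), word(_, DPos, DFeats)) :-
        DPos < RPos, subset(Required, DFeats).
    satisfies(ctx(pos, Required, after),  word(_, RPos, _), word(_, DPos, DFeats)) :-
        DPos > RPos, subset(Required, DFeats).

    % a negative contextual feature is violated by any dependent carrying
    % the prohibited features
    violates(ctx(neg, Prohibited, _), word(_, _, DFeats)) :-
        subset(Prohibited, DFeats).

    % ?- satisfies(ctx(pos, [n], before), word(invented, 2, [v]),
    %              word(stan, 1, [n, prpn])).      succeeds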

If a Lexicase grammar were to consist solely of a number of lexical entries consisting of contextual and non-contextual features, then no useful generalizations would be made. However, Starosta takes the traditional view that a grammar should consist of a set of generalizations and a lexicon should be a repository for exceptions. It just happens that all grammatical rules in a Lexicase grammar are generalizations about lexical items. Accordingly, he sets up rules which are responsible for inserting all predictable features into lexical entries. These rules he divides into redundancy rules, subcategorization rules, inflectional redundancy rules, morphological rules, derivation rules, semantic

interpretation rules, and phonological rules. This classification is purely a descriptive convenience. Each type of rule has the same basic operation: if a set of conditions is met by a word (the left-hand side of the rule) then a set of features is added to the feature matrix of that word.^
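Schematically (the rule/2 format and the example rule are assumptions of the sketch), every rule type reduces to the same operation: if the conditions are a subset of a word's matrix, add the associated features, and repeat until nothing new can be added.

    % rule(Conditions, FeaturesToAdd)
    rule([n, prnn], [definite]).          % hypothetical redundancy rule

    expand(Matrix0, Matrix) :-
        rule(Conditions, Added),
        subset(Conditions, Matrix0),
        \+ subset(Added, Matrix0),        % the rule still contributes something
        union(Added, Matrix0, Matrix1),
        !,
        expand(Matrix1, Matrix).
    expand(Matrix, Matrix).

    % ?- expand([n, prnn], M).    gives M = [definite, n, prnn]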

We shall briefly consider the range of features utilized within Lexicase. There are five basic types:

1. syntactic category features;

2. inflectional features;

3. semantic features;

4. case relations; and

5. case forms.

Syntactic category features

Syntactic categories are atomic. They can not be defined, for example, as [+N,+V]. Major syntactic categories are drawn from a very small inventory which contains the following items: noun (N), verb (V), adverb (Adv), preposition or postposition (P), sentence particle (SPart), adjective (Adj), determiner (Det), and conjunction (Cnjn). These major categories are divided into distributional subcategories (e.g. subcategories of N include pronoun and proper noun) and this subclassification is indicated by the addition of extra features (e.g. [+prnn], [+prpn]).

As will become clear in the following discussion, syntactic category features play a very important part in the functioning of a Lexicase parser.

Inflectional features

Inflectional features correspond to the traditional inflectional categories of person, number, gender, case, tense, etc. These features have a central role to play in agreement so they are also important in parsing.

"^Starosta’s most recent work seems to suggest that there may be some slight formal
differences amongst rule types (Starosta forthcoming).

Semantic features

Semantic features serve to distinguish words from each other. It is assumed that the grammar contains enough semantic features to distinguish every lexical item from every other (non-synonymous) item in respect of at least one distinctive semantic feature. In parsing, semantic features have the character of selectional restrictions. These restrictions are implicational rather than absolute. Thus, the verb drink might expect an object marked [+dkbl] (drinkable) but in the absence of such an object a metaphorical reading would be forced. This seems very close to the position adopted in Wilks' Preference Semantics (Wilks 1975).

Case relations

Lexicase assumes five 'deep' case relations, namely AGENT, PATIENT, LOCUS, CORRESPONDENT and MEANS. The Patient Centrality Hypothesis (Starosta 1978, 1988: 128ff) asserts that there is a PATIENT in the case frame of every verb, i.e. every sentence contains a PATIENT. The inventory of case relations is kept to only five since many of the distinctions typically made by case relations in other Fillmorean systems are made by the semantic features in Lexicase. Starosta and Nomura cryptically claim that:

The... reduced non-redundant case relation inventory improves the efficiency of case related parsing procedures... It is necessary to refer to case relations in parsing structures containing multi-argument predicates, in accounting for anaphora and semantic scope phenomena and text coherence, and of course in translation (Starosta and Nomura 1986: 128).

Unfortunately there appear to be no published accounts of how these case relations should be used in the parser.
relations should be used in the parser.

Case forms

Unlike the other features, case forms are not atomic. Rather, they are configurations of surface case markers such as case inflections, word order, pre- and post-positions, relator nouns, etc., which function to mark the presence of case relations. They are grouped together according to which case relations they identify and on the basis of shared localistic features. Case forms are composed of grammatical features such as 'nominative' or 'ergative' and localistic features such as 'source', 'goal', 'terminus', 'surface', 'association', etc. Starosta and Nomura claim that case forms are used by the parser to recognize the presence of particular case relations. They state that

this means that in parsing, such information is obtainable directly by simply accessing the lexical entries of the case-markers rather than by more complex inference procedures needed to identify the presence of the more usual Fillmore-type case relations (ibid.).

Once again, this must be taken on trust as no documented examples are available.

At the conclusion of this overview of Lexicase, it may be observed that the theory makes use of dependency, although the variety of dependency adopted is defined in terms of a very highly constrained X-bar system. It also makes use of many different kinds of features, representing many different things. A considerable number of pages could be devoted to exploring Lexicase's status as a case grammar but this would lead away from my primary objective of investigating dependency parsing. Starosta and his colleagues have yet to publish a detailed explanation of the place of case relations and forms in parsing so I shall not second-guess what might be intended. A more detailed and critical discussion of case in Lexicase can be found in Valency and Case in Computational Linguistics (Somers 1987), although many of the points made therein are disputed in Starosta's review of that monograph (Starosta 1990).

The next section investigates how some of the featural constraints of Lexicase are employed in parsing.

[Figure omitted: components labelled INPUT, Pre-processor, Morphological Analyser, Placeholder Substitution, Placeholder Expansion, Parser, passes for Det's, Adj's, Adv's, Conjunctions and Orphans, and OUTPUT.]

Figure 8.5: components of Starosta & Nomura's Lexicase parser

8.3 Lexicase parsing

In this section I examine two Lexicase parsers. The first, and better documented, parser was developed by Stanley Starosta and Hirosato Nomura (NTT Research Labs, Tokyo) and reported in COLING '86. The second is the product of Francis Lindsey Jr., a graduate student at the University of Hawaii. It is described in a short technical report.

8.3.1 Starosta and Nomura's parser

The principal reference for Starosta and Nomura's parser is Starosta and Nomura (1986).

Components

The overall architecture of the parser is shown in Figure 8.5.

The pre-processor  The pre-processor replaces each word in the input sentence with a feature matrix, fully specified for all contextual and non-contextual syntactic and semantic features. If a word form in the input sentence could correspond to more than one feature matrix then the word is replaced with a 'cluster', a list of all the possible feature matrices. The output of the pre-processor is a string of feature matrices and clusters of feature matrices corresponding to the words of the input sentence.

Morphological analyzer  The pre-processor is a basic look-up system which finds a word in the input sentence and looks it up in the grammar-lexicon. If the word can not be found then the morphological analyzer checks to see if the form matches any known stem-affix pattern. If a match is found, further searches are carried out with the stem to see if any other affixes produce homographic words. Once again, all of the possible feature matrices are stored together in a cluster.

Placeholder substitution  Every cluster of feature matrices is temporarily replaced by a 'placeholder' which consists of the intersection of all feature matrices. If the only thing the feature matrices have in common is the word form then that is all the placeholder will consist of. The object of placeholder substitution is to minimize the amount of processing which has to be done. A parse can be produced for the unambiguous parts of the sentence and then, when it becomes necessary to try to integrate different readings for the ambiguous parts, the placeholder can be expanded and different possibilities tried without any need to reprocess common parts of the input.

Parser  The parser uses the positive contextual syntactic features of head lexical items to search for dependents. These dependents must satisfy the criteria of the contextual features and they must be accessible. According to the definition of Lexicase, dependency relations (branches in a tree) are not allowed to cross, i.e. Lexicase has an explicit adjacency constraint.^ As soon as a potential link between words is established, the negative contextual features of the words are checked. If they are violated, the dependency link is discarded immediately. After each word pair has been linked by means of positive contextual features and checked and passed by negative contextual features, the implicational semantic contextual features (selectional restrictions) are checked. If the link violates the implicational features the analysis is not abandoned but it is marked as semantically anomalous.

Placeholder expansion  Each string that contains a placeholder is expanded into separate structures by replacing the placeholder clusters with subclusters of items sharing more features in common. The resulting strings are passed through the parser once more to add links that become possible as the new clusters and entries become accessible. This process of placeholder expansion is repeated until all placeholders are eventually resolved into their original constituent entries. This ensures that all possible readings are obtained for a sentence without any sequences of words having to be reparsed.

Parsing algorithm

Clearly this is a multi-pass system. Pre-processing constitutes the first pass, morphological analysis the second, placeholder substitution the third, and then some number of iterations through the parser/placeholder expansion cycle. In principle, there is no reason why pre-processing, morphological analysis and placeholder substitution should not take place incrementally from left to right. However, this would not buy anything extra since the parser's input is required to be a string of feature matrices and placeholders corresponding to the whole sentence.

The parser/placeholder expansion process is necessarily cyclic since the

^Since Lexicase is defined as a highly constrained X-bar grammar, the adjacency constraint is basic and non-negotiable. In DGs which do not owe a debt to X-bar grammar, the adjacency constraint is an optional extra. It can be used, not used, modified, or whatever.

effect of the interacting components is to maximize generalizations about the sentence and to proceed, iteratively, to all possible specific analyses. The process produces a maximally general analysis for the whole sentence, then it copies the analysis and adds different, more specific, details to each copy and then repeats the process for each copy. The process runs to completion for each candidate sentence. The effect of the parser/placeholder expansion cycle is similar to that of a chart parser in that it only builds structure once, no matter how many times it is used. However, this system lacks the elegance and simplicity of a chart parser's single pass through a word string. Even if there were some way for the Lexicase parser to construct a chart-like structure in a single pass in order to manage ambiguity, the parser is still required to pass through the word string many times for other reasons.

The parser sweeps through the word string eight times during each iteration of the parser/placeholder expansion cycle. This is because it tries to spot particular kinds of word on each pass. The passes are ordered as follows:

1. Prepositions. The parser attempts to link each preposition with an accessible N, V, or P by means of contextual features. The object of this pass is to link P's with their dependents to form PP's which delimit closed domains whose internal non-head constituents are then inaccessible to external heads or dependents. Subsequent passes may search inside or outside these phrasal domains but they need never consider any links between internal and external items. Recall that PP's are considered to be exocentric and that P's and N's have the status of coheads. When a P and an N are linked to form a PP, their non-contextual features combine to form a virtual matrix for that phrase. The features of both coheads thus become available to subcategorize the head of the phrase in which the PP is located.

2. Verbs. Verbs are linked to their dependents next to form 'sentences'. Once again, this has the effect of delimiting domains within which subsequent linking may take place.

3. Nouns. Nouns are next to be linked to their dependents.

4. Determiners. Determiners are linked with accessible nouns next. It is not entirely clear why this phase exists in the parser since all determiners are dependents of nouns in Lexicase, so step 3 should already have linked them to their regent nouns. It must be assumed that what is going on is that determiners select their heads rather than vice versa. This is in direct contradiction of Starosta and Nomura's description of the operation of their parser: "Based on the positive contextual features of head lexical items, the heads are linked to eligible and accessible dependent items" (Starosta and Nomura 1986: 131). Whatever the status of steps 3 and 4 might be, their desired effect is obvious: in English determiners mark the left boundary of NP's and so linking them to their head nouns has the effect of closing off domains of government.

5. Adjectives. Link each adjective with an adjacent noun. The same points apply here as in step 4.

6. Adverbs. Link each adverb with a head verb or adjective. Once again the objections of steps 4 and 5 apply.

7. Conjunctions. Link each conjunction with one or more major constituents. Since most of the constituents will already have been discovered, the number of linking choices should be extremely limited. Since coordinate constructions are exocentric, the non-contextual features combine to form a virtual matrix for the whole construction.

8. Orphanage. Link all remaining free nouns, determiners, adjectives, adverbs, prepositions and verbs with an accessible head. All unattached lexical items will be found embedded inside other constructions and the attachment possibilities will be extremely limited. The exceptions are adverbs and PP's which tend to have more possible attachments available to them.

Each of these passes through the sentence could take place in any direction but it makes sense to proceed from head to dependent. Therefore, passes could be expected to proceed from left to right in head-first languages and from right to left in head-second languages.

The presence of apparent contradictions in the published description of the parser, coupled with the general lack of published fine-grained detail, rules out the possibility of a more explicit PARS description of Starosta and Nomura's algorithm.

Given the algorithm described here, it would not be surprising to find that parsing a relatively short sentence involved something of the order of 100 passes through the sentence! No performance figures are supplied for the parser since it has never been implemented (although this is not made clear in any published description). The fact that multiple passes are required need not have a negative effect on the efficiency of the parser, since the number of passes is fixed (i.e. independent of input length). However, the fact that the same string has to be processed time and again does beg several questions about the exact nature of the data structures used and the information represented. For example, if a subtree has been constructed somewhere in a string, does anything prevent the algorithm looking (pointlessly) at the corresponding substring in subsequent passes? Unfortunately, answers to important questions of this kind are not supplied in any published accounts.

A fundamental problem with this algorithm is that it does not maintain a distinction between grammar and parser. By building searches for specific kinds of lexical items into the parser, Starosta and Nomura have built in the assumptions (i) that all languages make use of the same inventory of word classes and (ii) that the appropriate order in which to analyse them remains constant between languages. The fact that the parser refers explicitly to things called 'nouns' and 'verbs' ensures that it will fail to work if it is presented with a perfectly good grammar which happens to use different word class labels (such as 'a' and 'b') to identify nouns and verbs. In practice this objection could be mitigated if the algorithm were re-designed so that the parse order (e.g. P, V, N, Det, Adj, Adv, Conj, Orphans) was defined in a separate declarative database rather than being hard-wired into the algorithm. The parser would be invoked with two arguments: a pointer to the grammar to use and a pointer to the parse order definition to use. Any inconsistency between these two would lead to the result of the parse being not 'succeed' or 'fail', but 'error'.
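A minimal sketch of that re-design (entirely schematic; link_pass/3 is left as a stub): the pass order is ordinary data consulted at run time rather than a sequence of steps wired into the parser.

    % the parse order is data, paired with the grammar it belongs to
    parse_order(english_lexicase, [p, v, n, det, adj, adv, cnjn, orphans]).

    parse(GrammarId, Sentence, Parsed) :-
        parse_order(GrammarId, Order),
        run_passes(Order, Sentence, Parsed).

    run_passes([], Parsed, Parsed).
    run_passes([Class|Rest], Sentence0, Parsed) :-
        link_pass(Class, Sentence0, Sentence1),   % one sweep for one word class
        run_passes(Rest, Sentence1, Parsed).

    % stub: a real pass would add dependency links for words of class Class
    link_pass(_Class, Sentence, Sentence).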

8.3.2 Lindsey's parser

Lindsey’s parser is sim pler th a n S tarosta and N om ura’s b u t unfortunately even

less inform ation is available describing it. All of the inform ation in this section

has been gleaned from Lindsay (1987).

Lindsey's parsing system — known as 'FLX' — was written in Common

LISP and runs on an HP-9000 Series 300 Bobcat w orkstation. It is based on


Lexicase. The system consists of two prim ary com ponents, th e lexicon builder
and the parser.

The lexicon builder

The lexicon builder takes as its input a Lexicase lexicon and a set of Lexicase

rules. It uses the rules (which are, of course, statem ents of predictable infor­

m ation about lexical entries) to expand out the lexical entries to produce a

fully specified, full form lexicon. In the case of homographic entries, a master entry containing as its matrix the intersection of the features shared by the matrices of all the words with the same form is created. A master entry would have the form shown in Figure 8.6.

(word (category shared features)
      (distribution shared features)
      (other shared features)
      (word1
        (category features specific to word1)
        (distribution features specific to word1)
        (other features specific to word1))
      (word2
        (category features specific to word2)
        (distribution features specific to word2)
        (other features specific to word2)))

Figure 8.6: a master entry showing the intersection of the feature sets of two homographic words
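To make the intersection idea concrete, the following short Prolog sketch (my own illustration, with invented feature names; it is not taken from Lindsey's FLX code) computes the shared matrix of two homographs and the features left over for each specific reading:

    % two invented homographic entries with partially overlapping matrices
    entry(bank1, [category(noun), distribution(count), sense(river_side)]).
    entry(bank2, [category(noun), distribution(count), sense(institution)]).

    % the shared part of two matrices is their intersection
    shared_features(F1, F2, Shared) :-
        findall(F, (member(F, F1), member(F, F2)), Shared).

    % a master entry pairs the shared features with the residue of each reading
    master_entry(W1, W2, master(Shared, W1-Specific1, W2-Specific2)) :-
        entry(W1, F1),
        entry(W2, F2),
        shared_features(F1, F2, Shared),
        subtract(F1, Shared, Specific1),
        subtract(F2, Shared, Specific2).

Here master_entry(bank1, bank2, M) binds M to a term whose shared part contains category(noun) and distribution(count), with the two sense features kept apart, mirroring the layout of Figure 8.6.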

The parser

Once again, the parser examines the contextual features of a head word in order

to establish dependencies. T he parser assumes th a t each w ord is the head of a

phrase and th a t a phrase is com plete when all dependents of a word have been

found. The parser proceeds as follows (quoted directly from Lindsey 1987: 3):

1. Find the entries for each word of th e input sentence in the lexicon.

2. E lim inate from consideration all words which because of their position

or word class could not be sisters of a head.

3. D eterm ine which words m ust be sisters of a head because of the distri­

butional requirem ents of the head.

4. For all words not unam biguously assigned as sisters to some head by

the above steps, determ ine possible alternative assignm ents. This step

provides for m ultiple parses.

5. U npack m aster entries and determ ine which specific hom onym success­

fully satisfies all distributional restrictions. This is done from top down,

exam ining the parses given by step 4.

6. P rin t out those parses in which all words of th e in p u t sentence fit into

one hierarchical structure.

Steps 1 through 4 set up a list of potentially successful parses which can then
be exam ined as the basis of alternative sentence readings once the m aster

entries have been unpacked. T hus, th e algorithm allows th e simple p arts of

th e parse to be constructed and then reused in successive attem p ts to integrate

alternative readings for words.

U nfortunately, the parsing algorithm is described in term s which are too

terse to be really inform ative. The words “determ ine which words m ust be

sisters of a head because of the distributional requirem ents of th e h ead ” are

tantalizing in w hat they w ithhold rath er th an in w hat they tell. Lindsey’s

examples do not shed light on this process. However, it is clear th a t th e parser

is distinct from the gram m ar in this system. Lindsey writes:

This com plete parsing program is designed as a m odular rule ap ­


plication system . T he lexicon builder, given the appropriate m in­
imally specified lexicon an d Lexicase rules, may be used to create
fully specified lexicons for any lan g u ag e... T he parser is also a flex­
ible program since it is nothing more th an a program to determ ine
possible (one-bar) dependency relationships betw een item s in an
input string on th e basis of features associated w ith those item s
(Lindsey 1987: 4-5).

Thus, Lindsey’s system is more flexible th a n S tarosta and N om ura’s but in­

sufficient docum entation is available to m ake a more informed comparison. It

is not clear, for exam ple, w hether or not th ere is any loss of analytic accuracy

on th e p a rt of th e simpler system.

Insufficient inform ation is available to construct a PARS description of

Lindsey’s algorithm .

8.4 Summary

In this chapter I have briefly reviewed th e theory of Lexicase and two parsers

which are based on it. It is clear th a t the theory is m uch b e tte r developed

th a n the parsers based on the theory. T he parsers do not m ake use of th e full

range of Lexicase resources, such as case relations and case forms.

Table 8.1: main features of S tarosta and N om ura’s Lexicase parser

Search origin top-down


Search m anner breadth-first
Search order left to right
N um ber of passes at least eight
Search focus heads seek dependents
A m biguity m anagem ent packing/unpacking

Table 8.2: m ain features of Lindsey’s Lexicase parser

Search origin unspecified


Search m anner breadth-first
Search order unspecified
N um ber of passes m ulti-pass
Search focus heads seek dependents
A m biguity m anagem ent packing/unpacking

S tarosta and N om ura’s parser searches top-dow n for dependents for dif­

ferent classes of word on each of several passes. Unam biguous p arts of the
sentence are built first and then these ‘com m on’ p arts are copied to different

parse trees, one for each possible reading of the sentence. T he m ain features

of S tarosta and N om ura’s parser are sum m arized in Table 8.1.

Very few details are available for Lindsey’s parser. It seems to share a

num ber of properties w ith S tarosta and N om ura’s parser. For example, heads

seek dependents, and am biguity is m anaged by packing ambiguous words into

clusters which can later be unpacked and tried in different parse trees. The

m ain (known) features of Lindsey’s parser are sum m arized in Table 8.2.

Chapter 9

Word Grammar parsers

9.1 Overview

In 1976 R ichard Hudson published a m onograph introducing his theory of

‘D aughter Dependency G ram m ar’ (DDG) (Hudson 1976). This publication

was notable for two principal reasons. F irst, it argued th a t transform ations

were unnecessary in syntax — a heretical position in th e linguistic clim ate of

the day. Second, it argued th a t dependency as well as constituency should

have a place in syntactic theory.


By the end of the 1970’s Hudson was arguing against w hat he perceived to
be an under-m otivated and artificial distinction between gram m ar and lexicon

in linguistic theories. Instead, he argued th a t all gram m atical and lexical (and

sem antic) inform ation should be stored in a single homogeneous representation

w ithin a single com ponent — th e so-called ‘pan-lexicon’ — which could be

viewed as a body of facts about words (Hudson 1981a). A round this tim e he

also published an im portant paper in Linguistics (H udson 1980a) arguing th a t

while dependency is necessary in syntax, constituency is not. Clearly, these

two positions — the ‘pan-lexicalist’ and ‘dependency only’ positions — are

com patible. In fact, the first implies th e second since a collection of facts about

words could not include facts about supra-w ord constituents. T he second

implies the first since a grammar without constituents leaves the word as the

largest unit of analysis.

These ideas were molded into a coherent theory which came to be known as

W ord G ram m ar (W G) (Hudson 1983). The first m onograph-length description

of the theory appeared as Hudson (1984). Since th e publication of th a t text

th ere has been a m ajor revision of the W G notation and a succession of papers

describing W G treatm en ts of various ‘test case’ constructions such as coordi­

nation (Hudson 1988a), extraction (Hudson 1988b), gapping (H udson 1989b),

and passives (Hudson 1989a). The state-of-the-art in W G is detailed in a re­

cent m onograph (Hudson 1990), which includes a gram m ar of a substantial

fragm ent of English.

Section 9.2 introduces W G theory. Section 9.3 provides an overview of

W G parsing, and presents W G parsers developed by myself and by Richard

Hudson.

9.2 Word Grammar theory

9.2.1 Facts about words

A WG consists of a body of facts about words. In this section I describe the form that these facts take and the information they contain.

First of all, it is w orth pointing out th a t ‘w ord’ in th e context of WG

includes any w ord-length unit, however specific or general. Thus, the first

word of this sentence, th e lexeme ‘PLIM SO LL’, th e word-type ‘n o u n ’ and the

relation ‘su b ject’ are all words.

Each lexical e n try is essentially a complex feature structure. As such it

could be represented in a standard DAG form at such as the one provided by

PATR-II (Shieber 1986). However, Hudson has evolved his own metalanguage

which has a quasi-English syntax and is often simpler to read th an more fa­

m iliar DAG structures.

A lexical entry is viewed as a collection of propositions. Each proposition


has the general form at

Argument1 Predicate Argument2

The predicate is placed in infix position rather than the more familiar prefix position

Predicate(Argument1, Argument2)

for the sake of readability. The chosen ordering is congruent with the normal SVO order of English predicate-argument structure. However, nothing rests on the predicate-argument order of the notation. Any ordering would do so long as it was used consistently.


Five predicates appear in WG propositions.^ These predicates are the following:

1. is

2. has

3. precedes

4. follows

5. isa

The predicate is

The is predicate is used to express identity between arguments. Thus

X is Y

identifies X and Y as being alternative names for the same object. An object can be identified in more than one way because of the facility for relative naming in WG. In the sentence Ollie obeyed Ronnie shown in Figure 9.1, Ollie could be described either as 'word 1' or as 'the subject of word 2'.

Relative names are expressed in the form

(Name1 of Name2)

^It is possible to define a WG system which has only one predicate and which makes the necessary distinctions in terms of features (see Section 9.2.3 below). However, for the sake of clarity of presentation I shall work with the five predicate system here.

[Figure 9.1: dependency structure of Ollie obeyed Ronnie (words numbered 1-3; SUBJECT and OBJECT arcs link obeyed to Ollie and Ronnie)]

Where 'Name1' must be the atomic name of a relational concept (such as 'subject' or 'agent') and 'Name2' may be either the atomic name of a non-relational concept (such as 'noun' or 'word2') or another relative name. Thus, the following are both possible:

(35)

a (subject of word2)
b (agent of (referent of word2))

T he identity predicate is can be used to unify sets of propositions (alternatively

conceived of as feature structures) associated w ith labellings in th e system. In

this way categorial, functional, and sem antic inform ation can be combined in

the property stru ctu re of a given word instance.

The predicate has

T he has predicate is used for two main purposes. F irst, it is used to assign

features to words. For example,

(36)

noun has (a num ber)

It should be obvious th a t values can be assigned to features using is proposi­


tions:

(37)

(num ber of wordN) is singular

T he second use of the has proposition is in specifying th e dependency require­

m ents of a word. For example,

(38)

finite verb has (a subject)

Here the use of a quasi-English formalism is slightly misleading. The use of the

predicate has in (38) does not express the fact th a t some particu lar finite verb is

in possession of a subject. R ather, it expresses th e fact th a t th e prototypical

finite verb has a subject.^ T hus, it could be read as follows: ‘A finite verb

typically has a subject (slot)’.

W G has a m echanism for distinguishing optional and obligatory depen­


dents, as well as for signaling a num ber of more subtle distinctions. The

general form at of ‘slot’ propositions is:

A has (Q B)

where A is some nam ed entity, B is th e nam e of a slot (e.g. ‘su b jec t’) and Q
is a 'quantitator'. A quantitator (Hudson's term) specifies the number of slots of the variety specified by B. To date, most of Hudson's writings have made use of the following set of quantitators:

(39)
a a X = one X required
b ano X = at most one X allowed
c mano X = any number of Xs allowed
d many X = two or more Xs allowed
e mony X = one or more Xs allowed
f no X = X prohibited

T he u tility of these should be fairly obvious. (39a) is used when exactly one

filler is required, as in th e case of subjects. (39b) applies when a slot is optional

b u t can never have more th a n one filler. For exam ple, a noun can optionally

have a dependent relative pronoun. (39c) is the least constrained — any num ­

b er of fillers will suffice. For exam ple, a noun can be modified by any num ber

of adjectives. (39d) is used when at least two fillers are required. The principal use for this is coordinate constructions where a conjunction must conjoin at least two conjuncts. (39e) is used when at least one filler of the specified type is required. For example, a whole has mony parts. (39f) is a simple prohibition stating that a word can not have a slot of some stated kind. In general, a WG grammar follows the closed world hypothesis, i.e. anything which is not explicitly allowed is considered to be implicitly forbidden. However, there are cases when explicit prohibitions are required, as we shall discover in section 9.2.2.

^Hudson intends his theory to be based on the notion of 'prototypes' (for useful introductions see Lakoff 1985 and Taylor 1989).

In recent presentations of the theory, Hudson has adopted an alternative, more flexible form of quantitator (Hudson 1990: 23-4). The new kind of quantitator is structured rather than atomic. Its structure is [i-j] where i and j are integers, i indicates the minimum number of fillers for the slot and j indicates the maximum number of fillers for the slot. Equivalences between the old and new systems are given in (40). I shall use the old system in all examples.

(40) a a X = [1-1]
b ano X = [0-1]
c mano X = [0-_]
d many X = [2-_]
e mony X = [1-_]
f no X = [0-0]
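The correspondence in (40) can be stated directly. The following lines are a minimal Prolog sketch (my own encoding, not Hudson's), in which the atom inf stands for the absent upper bound written '_' above:

    % old-style quantitators expressed as [Min, Max] ranges
    quantitator(a,    [1, 1]).     % exactly one filler required
    quantitator(ano,  [0, 1]).     % at most one filler allowed
    quantitator(mano, [0, inf]).   % any number of fillers allowed
    quantitator(many, [2, inf]).   % two or more fillers required
    quantitator(mony, [1, inf]).   % one or more fillers required
    quantitator(no,   [0, 0]).     % no filler permitted

    % a slot with N fillers satisfies quantitator Q if N lies inside Q's range
    satisfies(Q, N) :-
        quantitator(Q, [Min, Max]),
        N >= Min,
        ( Max == inf -> true ; N =< Max ).

For example, satisfies(ano, 0) and satisfies(ano, 1) succeed, while satisfies(ano, 2) fails.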

The question of whether quantitators have the effect of creating multiple slots with identical properties or allowing single slots to have multiple fillers has to be worked out for any implementation but it has no theoretical importance.

C onstraints can be placed on the range of p o ten tial slot-fillers by m eans of

identity (is) propositions. For example,

(41)

a (subject of verb) is (a noun)


b (pre-adjunct of noun) is (a adjective)
c (comp of preposition) is (a noun)

In these examples, the second argum ent has th e form (a X). This use of a

should not be confused w ith the use shown in (39a) and (40a). This version is

simply used to distinguish the general case X from an instance of th e general

case (a X). The two versions appear in complementary distribution: the quantitator only appears in has propositions; the instance marker only appears in is propositions.

The predicates precedes and follows

precedes and follows are used to express relative linear orderings. For example:

(42)

a (subject of word2) precedes word2


b (object of word2) follows word2
Only one of these predicates is required to express linearization constraints. For example, (43) shows the same facts as (42) but uses only one predicate.

(43)
a (subject of word2) precedes word2
b word2 precedes (object of word2)

Redundancy is allowed to aid readability. There is no reason why an implementation should have to include both predicates. See Section 9.2.3 for further examples of the use of positional constraints.

The predicate isa

The isa predicate is used to relate entities to more general entities. For example:

(44)

a APPLE isa common-noun
b common-noun isa noun
c noun isa word

I say that the isa predicate is used to relate entities to entities, rather than entities to classes, because WG assumes that the isa relation is a relation of instances to prototypes rather than a relation of members to classes. Unlike the is relation, the isa relation is antisymmetric.
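For readers who prefer a machine-oriented view, the five predicates can be written down directly as Prolog facts. The encoding below is my own illustration (it is not Hudson's notation): each proposition becomes prop(Predicate, Argument1, Argument2), and a relative name such as (subject of word2) becomes of(subject, word2).

    prop(isa, 'APPLE', common_noun).            % (44a): APPLE isa common-noun
    prop(is,  of(subject, word2), word1).       % word1 is the subject of word2
    prop(has, noun, slot(a, number)).           % (36): noun has (a number)
    prop(precedes, of(subject, word2), word2).  % (42a): the subject precedes word2
    prop(follows,  of(object, word2),  word2).  % (42b): the object follows word2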

[Figure 9.2: part of the WG ontological hierarchy, with entity at the top and lower concepts including person, thing, relation, set, situation, theme, event, companion, action, dependent, communication, speech and word]

So far, I have described th e kinds of predicates which can appear in propo­

sitions. I have presented propositions as devices for expressing facts about

words. As it stands, this system has no m echanism for m aking or using gen­

eralizations. An adequate gram m ar m ust consist of more th a n a list of entries

specifying all the properties of every word. It m ust make generalizations over

collections of words. In the next section I describe how th e isa predicate is

used to m ake generalizations by allowing th e properties of general cases to be

transferred to specific instances.

9.2.2 Generalizations about words

All entities in a W G are thought of as belonging to a single, vast ontological

hierarchy. E ntities in the hierarchy are related by isa relations. P a rt of the

top of the hierarchy is shown in Figure 9.2 (from Hudson 1990: 76).

[Figure 9.3: part of the WG word type hierarchy. The node word subsumes noun, verb, adword and conjunction; noun subsumes common noun, proper noun and pronoun, beneath which sit the lexemes DOG, SIMPSON and HIM]

The connections between lower and higher concepts in the hierarchy represent isa relations. The hierarchy includes non-linguistic, as well as linguistic

entities. T he details need not concern us here. (For more inform ation on

th e kinds of knowledge which a W G hierarchy represents see H udson 1985a;

Hudson 1986b; Hudson 1990: chapter 4). One p a rt of th e hierarchy of imme­

diate relevance to this discussion is th a t p a rt which comes below th e ‘word’

en tity and which could be described as the ‘word type hierarchy’. P a rt of the

word type hierarchy is shown in Figure 9.3.

T he purpose of this hierarchical organization is to facilitate generalization.

A ny p roperty which is shared by m ost or all common nouns is stored in rela­

tion to the ‘com m on-noun’ node in the hierarchy ra th e r th an a t the level of

‘D O G ’ or any other specific common noun. Any such pro p erty is said to be

‘inheritable’ by the lower node from th e higher node. T he sim plest version of

inheritance can be defined as follows (where P is any proposition):

(45)

IF X isa Y,
   P is true of Y
THEN P is true of X
M ost generalizations which can be m ade about language have got exceptions.

Exceptions can be accom m odated w ithin th e inheritance fram ework by stating

th e exceptional properties in relation to the highest node for which they hold

true. T he property inheritance principle is then revised as follows:

(46)

IF X isa Y,
   P is true of Y,
   not: not: (P is true of X)^a
THEN X has (Q P)

^a 'It is not the case that X is prohibited from having the property P'.

Since inheritance is overrideable, it is often referred to as default inheritance.

T he usual properties are assumed for an entity unless there are good reasons

(i.e. contradictory propositions) for thinking otherwise. For example, the

usual way to form plural nouns in English is to add the S-morpheme (‘m S’) to

th e noun stem . This generalization can be m ade for all nouns. The relevant

proposition would look som ething like (47).

(47)

(plural of noun) is ((stem of noun) + mS)

However, there are a small num ber of nouns (e.g. salmon) which exceptionally
do not follow the norm al plural rule. These would have to be specially marked

so as to override the general rule. For example,

(48)

a (plural of SALMON) is <salmon>
b not: (plural of SALMON) is (<salmon> + mS)

In the case of words such as hoof which have coexistent default and exceptional plural forms, a proposition such as (49) is added to introduce the exceptional form and nothing is added to block the default form from being generated by (47).

(49)

(plural of HOOF) is <hooves>

Thus, hoofs can be generated using (47) and hooves can be generated using (49).
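The interaction of the default rule (47) with the exceptional facts in (48) and (49) can be sketched in a few Prolog clauses. This is my own toy encoding (the names plural_rule/2 and blocked/2 are invented, with blocked/2 standing in for the explicit not: proposition in (48b)):

    % part of the isa hierarchy
    isa(salmon, noun).
    isa(hoof,   noun).
    isa(dog,    noun).

    isa_star(X, X).
    isa_star(X, Z) :- isa(X, Y), isa_star(Y, Z).

    % (47): the default plural rule is stated once, at the level of 'noun';
    % (48a) and (49) state exceptional forms for individual lexemes
    plural_rule(noun,   stem_plus_mS).
    plural_rule(salmon, zero_plural).
    plural_rule(hoof,   hooves_form).

    % (48b): the default is explicitly blocked for SALMON but not for HOOF
    blocked(salmon, stem_plus_mS).

    % a plural is available if stated for the word itself or inherited from a
    % more general node, provided it is not explicitly blocked for that word
    plural(Word, Form) :-
        isa_star(Word, Node),
        plural_rule(Node, Form),
        \+ blocked(Word, Form).

Under this encoding plural(salmon, F) yields only the exceptional form, while plural(hoof, F) yields both hooves_form and the inherited stem_plus_mS, mirroring the coexistence of hooves and hoofs; plural(dog, F) simply inherits the default.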

[Figure 9.4: part of the WG grammatical relation hierarchy, with head and dependent at the top; dependent subsumes predependent and postdependent, below which come subject, visitor, pre-adjunct, complement and post-adjunct, and then object, oblique, direct object and indirect object]

Default inheritance is used in most modern linguistic theories (Gazdar 1987), although few of them make this explicit — HPSG (Flickinger et al. 1985; Flickinger 1987) is a notable exception. It is widely assumed that the primary use of default inheritance in linguistics is for expressing generalizations in morphology. However, since everything is expressed in relation to word-sized units in WG, syntax and semantics can also make use of default inheritance. The portion of the WG inheritance hierarchy presented in Figure 9.4 shows how the grammatical relations (i.e. types of dependency relation) can also be arranged in an inheritance hierarchy.

T hus, proposition (50) implies proposition (51).

(50)

X has (a subject)

(51)

X has (a predependent)

Table 9.1: inheriting properties for w1

Stored propositions                          Added propositions
w1 isa DOG
DOG has (a structure)                        w1 has (a structure)
DOG isa common-noun                          w1 isa common-noun
common-noun isa noun                         w1 isa noun
noun has (a number)                          w1 has (a number)
noun has (mano pre-adjunct)                  w1 has (mano pre-adjunct)
noun isa word                                w1 isa word
word has (a head)                            w1 has (a head)
word follows (pre-dependent of word)         w1 follows (pre-dependent of word)
word precedes (post-dependent of word)       w1 precedes (post-dependent of word)

T he following simple (overrideable) propositions tak e care of norm al English

word order.

(52)

a word has (a dependent)


b (pre-dependent of word) precedes word
c (post-dependent of word) follows word

When a sentence is analysed in WG, every word is assigned a unique identifier such as w1 ('word 1'). Each word is analysed to establish its lexeme and morphosyntactic features. Once its lexeme has been found, the word instance can be attached to the bottom of the inheritance hierarchy underneath its lexeme. It can then inherit as many properties as possible from higher nodes. Consider w1, the first word in sentence (53).

(53)

Dogs chase large white rabbits.

In Table 9.1, the column on the left shows propositions contained in the grammar, while the column on the right lists the new propositions added for w1. Only a representative sample of propositions are shown.

T he effect of the inheritance process is to build up a feature set for the

word. A lthough absent from Table 9.1, constraints on slots are also inherited

during the process.

More detailed introductions to inheritance in WG can be found in Fraser

and Hudson (1990), Hudson (1990a: chapterS), and Fraser and H udson (1992).

9.2.3 A single-predicate system

I have already noted that the predicates precedes and follows are not both necessary. In fact, as Hudson points out (Hudson 1990: 24ff), only one predicate is really required, namely the is predicate. If this predicate is instead represented with the symbol ':', the grammar begins to look very similar to any other unification-based grammar. For example, the following examples show equivalent (a) standard WG five-predicate propositions, (b) WG one-predicate propositions, and (c) unification grammar feature structures (Shieber 1986).

(54)

a DOG isa noun
b (category of DOG) : noun
c DOG [cat: Noun]

(55)

a verb has ([1-1] subject)
b (quantity of (subject of verb)) : [1-1]
c [cat: Verb, arg1: Subject]

(56)

a (subject of verb) is (a noun)
b (subject of verb) : (a noun)
c [cat: Verb, subject: [cat: Noun]]

(57)

a (pre-dependent of word) precedes word
b (position of (predependent of word)) : before it
c [cat: Word, predep: [posn: before]]

[Figure 9.5: a WG dependency analysis of the sentence People with spare cash spend it in Capri]

In his single-predicate version of WG, Hudson introduces extra 'positional names': before, after, adjacent-to and next-to. 'it' identifies the word referred to by the most deeply embedded concept to the left of the ':' predicate (i.e. 'word').

The purpose of this section is to emphasize the similarities between the expressiveness of the WG formalism and the expressiveness of other unification-based formalisms (e.g. GPSG, LFG, and Hellwig's DUG). This is not, however, to claim that they are identical nor that the insights typically expressed in these frameworks are the same. (For example, WG provides a much richer system of quantitators than any of the other frameworks.) The extent to which one theory differs from another is a complex question and one which can only be hindered by differences of notation. I have tried to show how easy it can be to convert WG grammars into a more familiar notation. This is a first step towards theory comparison. The next step goes beyond the scope of the present work.

9.2.4 Syntax in WG

Syntactic stru ctu re is expressed in term s of dependencies between word pairs,

with th e sole exception of coordinate constructions for which m inim al con­

stituent stru ctu re is used (Hudson 1989b; Hudson 1990: chapter 14). T he

sentence shown in Figure 9.5 illustrates a typical W G dependency analysis.

The sentence shown in Figure 9.6 is an exam ple of th e use of constituency

187
ni
{[Big Mark] and
ri
[wee Nicki]}
r ïi
live in Edinburgh

Figure 9.6: the use of constituency in W G

in W G. T he brackets simply serve to identify the boundaries of the coordinate

stru c tu re and its com ponent conjunct s. Dependencies betw een elem ents of

the coordinate structure and elem ents outside the coordinate stru ctu re are

controlled by the Dependency in C oordinate Structure (DIGS) principle. This

states th a t:

any word which is outside a coordination C b u t which is in a depen­


dency relation D to some conjunct-root of one conjunct of C m ust
also be in relation D to one conjunct root in every o th er conjunct
of C (Hudson 1990: 413).

A ‘conjunct-root’ is simply a head of a conjunct. In the case of Figure 9.6, the


conjunct-roots are Mark and Nicki.

A part from the exceptional case of coordination, all other syntactic stru c ­

tu re is expressed in term s of pairwise dependencies.

W G makes use of a modified adjacency principle since, under certain cir­

cum stances, words are allowed to depend on more th a n one head.

The Adjacency Principle


D is adjacent to H provided th a t every word betw een D and H
is a subordinate either of H, or of a m u tu al head of D and H
(Hudson 1990: 117).

The sentence in Figure 9.7 shows an example of a dependency structure which is permitted by WG's adjacency principle but forbidden by the standard version of adjacency as defined by Gaifman (1965) (I is separated from its head to by want, which does not depend on either I or to).

[Figure 9.7: a structure permitted by WG's version of adjacency (I want to leave)]

[Figure 9.8: the use of visitor links to bind an extracted element to the main verb: Cats I adore, with a SUBJECT link above the sentence and a VISITOR link below it]

T he W G analysis of extraction relies upon a word having more th a n one

head. In this analysis, the extracted word is first bound to th e m ain verb by

a sem antically em pty dependency link known as the ‘visitor’ relation. T he

gram m ar would include rules such as those in (58).

(58)

a finite verb has (ano visitor)


b (visitor of verb) precedes (subject of verb)

Thus, in the sentence Cats I adore, cats is bound as the visitor of adore as shown in Figure 9.8 (visitor links are drawn below the sentence).

It is a simple matter to use the visitor link to establish the object relation between the verb and cats. The general form of the rule for identifying normal post-dependents with the unusual visitor link is shown in (59).

[Figure 9.9: the use of the visitor link to relate the extracted element to the main verb as its object: Cats I adore, with SUBJECT and OBJECT links above the sentence and a VISITOR link below it]

(59)

(visitor of word) is (a (post-dependent of word))

This would lead to the analysis shown in Figure 9.9.

Since the visitor relation is semantically vacuous, the propositional content of the sentence is the same as it would be if extraction had not taken place.

However, the presence of the visitor link introduces m arkedness to th e con­

struction, as would be expected. T he exam ple sentence does not represent a

convincing argum ent for the use of visitor links since a simple rule could have

allowed the object to depend on th e verb directly w ithout the m ediation of the

visitor. (60) offers a b e tte r exam ple since th ere is m ore intervening m aterial

betw een th e extracted item and its head.

(60)

C ats I think you know I adore

O nly one e x tra rule is required to copy th e visitor link from verb to verb, thus

producing a ‘hopping’ analysis. T he rule appears in (61).

(61)

(visitor of word) is (a (visitor of (com plem ent of w ord)))

[Figure 9.10: the use of visitor links to interpret the object of an embedded sentence: Cats I think you know I adore, with visitor links passed from think to know to adore]

This rule allows sentences like (60) to be analysed w ithout difficulty. The

resulting stru ctu re is shown in Figure 9.10.

A m ore detailed exposition of the use of visitor links in W G can be found

in Hudson (1988b).
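The effect of rules (59) and (61) on sentence (60) can be sketched with a handful of Prolog clauses. The predicate names below are invented, and the complement and visitor facts stand in for dependencies assumed to have been established already by the parser:

    % dependencies assumed for 'Cats I think you know I adore'
    complement(think, know).
    complement(know,  adore).
    visitor(think, cats).         % cats is bound directly as the visitor of think

    % rule (61): a visitor of a verb may serve as the visitor of its complement
    visitor(C, X) :- complement(V, C), visitor(V, X).

    % rule (59): a visitor may fill a post-dependent slot; adore still has an
    % unfilled object slot, so its visitor can be interpreted as its object
    needs_object(adore).
    object(V, X) :- needs_object(V), visitor(V, X).

The query object(adore, X) then succeeds with X = cats, the visitor link having hopped from think to know to adore as in Figure 9.10.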

A part from (i) allowing constituency in coordinate constructions, (ii) al­

lowing m ultiple heads, and (iii) providing a modified adjacency principle, W G

abides by th e definition of DG supplied by Gaifman. Exceptions (i)-(iii) m ay

be regarded as extensions to the expressiveness of the stan d ard dependency

form alism , whose form al properties and consequences are as yet undefined

formally.

9.2.5 Semantics in Word Grammar

Sem antics in W G relies upon two basic premises:

1. V irtually every word in a sentence is linked to a single element in the

sem antic structure.

2. There is a high degree of congruence between syntactic dependencies and semantic dependencies.

[Figure 9.11: semantic structure is very similar to syntactic structure in WG: Fred loves Jane for her wealth, with conceptual referents c1-c6 linked to the six words]

The elem ents in sem antic structu re to which words are linked are called ‘refer­

e n ts’. These are taken to be m ental concepts rath er th a n objects in th e world.

T he two basic kinds of relation which m ay hold between referents are depen­

dency and identity. A simple exam ple of a possible W G sem antic rule which

is parasitic upon the syntactic structure, is given in (62).

(62)

(referent of (subject of LOVE)) is (actor of (referent of LOVE))
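A rule of this kind lends itself to being applied mechanically. The fragment below is my own toy rendering (invented predicate names, with referents labelled c1 and c2 in the spirit of Figure 9.11) of how an actor link can be read off an established subject dependency:

    % syntactic dependency and word-to-referent links for 'Fred loves Jane'
    subject(loves, fred).
    referent(fred,  c1).
    referent(loves, c2).

    % (62): the referent of the subject of LOVE is the actor of the referent of LOVE
    actor(Event, Actor) :-
        subject(Verb, Subj),
        referent(Verb, Event),
        referent(Subj, Actor).

The query actor(c2, A) then binds A to c1 without any further machinery.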

T he diagram shown in Figure 9.11 (from H udson 1990: 123) should serve to

illustrate the extent of congruence between sy ntactic and sem antic structures

in W G . In th e diagram , the labels c l, c2, etc. are the conceptual referents of the

words to which they are linked by d o tted lines. Arrows betw een referents show

sem antic dependencies. Equality operators between referents show identity.

This degree of isom orphism between syntax and sem antics allows th e se­

m antics simply to be ‘read off’ th e syntactic stru ctu re in m any cases. One of

my own early W G parsers succeeded in constructing sem antic structures for

a respectable range of sentences with m inim al effort required (Fraser 1988).

However, it would be foolish to pretend th a t all sem antic analyses are equally

easy. Some difficult problems rem ain to be solved. To d ate, sem antics in W G

has not received as much atten tio n as syntax. It is to be hoped th a t this

im balance will be corrected before too long. In th e m eantim e, the only com­

p u ter system to a ttem p t any W G sem antic analysis, other th a n my own, is

G orayska’s small-scale ‘W G sem antic analyzer’ (Gorayska 1987).

9.3 Word Grammar parsing

In early 1985 R ichard H udson produced a very m odest W G parser w ritten

in BBC Basic and running on a home com puter w ith ju st 32K of RAM

(Hudson 1985b). At th a t stage Hudson described himself as “an am ateu r w ith

m ore enthusiasm th an program m ing skills” . However m odest the parser may

have been, it becam e the inspiration for my own larger scale parser (w ritten in

Prolog) which formed the basis of my M asters dissertation (Fraser 1985). The

m ain strengths of this system were its com plete separation of gram m ar and

parser and its simple b u t effective im plem entation of default inheritance. These

features have continued to inform subsequent system s developed by Hudson

and myself a t U niversity College London. U nfortunately, m y parser failed to

solve the problem of im plem enting th e adjacency principle and so it failed in

the m ost im p o rtan t task of a parser, nam ely building appropriate syntactic

structures.

E arly in 1986 we becam e aware of a group of com puter scientists at Im perial

College, London who were beginning to show interest in th e ideas contained in

H udson’s 1984 m onograph. This group, and especially Derek Brough, w rote a

num ber of very small trial parsers (Brough 1986). Around this tim e a former

student of H udson’s, M ax Volino, also wrote a small parser based on WG.

H udson himself had moved on from his com putational small beginnings and

was now using Prolog on a much more powerful m achine. Hudson (1986a)

reports H udson’s first parser w ritten in Prolog.

In late 1986 I sta rte d to work as H udson’s research assistant and began

to develop some of the ideas first presented in my M asters dissertation. This

soon resulted in the production of a parser which combined an inheritance

m echanism w ith a functional (though clumsy) parsing strategy (Fraser 1988).

Like all W G parsers developed up until then, this one incorporated an explicit

check on the adjacency of words to be linked. This was very expensive com­

putationally so the parser ran rath er slowly. It was th e first W G parser to

build simple sem antic structures as well as syntactic structures. L ater th a t

year, B arbara G orayska (a former doctoral stu d en t of H udson’s) produced a

more sophisticated W G sem antic analyzer (Gorayska 1987) although it was

not p a rt of a parsing system . Parsers loosely based on W G were also produced

as final year projects by Francis Bell, an undergraduate a t Westfield College in

London and, in 1987, by Phil G rantham , a p o stg rad u ate stu d en t at Sheffield

Polytechnic (G ran th am 1987).

D uring th e period 1987-8, the two largest scale W G parsers produced to


date were being developed in parallel a t U niversity College London by Hudson

and myself. W hile we exchanged views and insights on theoretical m atters

during this period, we kept th e im plem entational and algorithm ic details of the

system s to ourselves, thus ensuring th a t two distinct im plem entations evolved.

In the rem aining sections of this chapter, these two parsers are described in

more detail.

9.3.1 Fraser's parser

Objectives of the parser

I had two m ain objectives in w riting my parser. The first objective, which it

shared with m y earlier W G parsers, was simply to see w hat a W G parser would

look like. Could it be a m inor m odification of an existing parsing algorithm

or would it involve distinct problem s requiring distinct solutions? Once a few

trial systems had been constructed I felt th a t I was in a position to identify

some problems which seemed to be common to all of th e parsers. Solving these

problems becam e the principal focus of the parser I rep o rt here.

T he m ain difficulty which plagued th e early W G parsers was th e tim e

they took to parse sentences. They ran very slowly, even when working with

small gram m ars on powerful machines. My best W G parser up to th a t tim e

had taken 254 seconds to find a first reading for th e seven word sentence

This sentence was analyzed by a computer, even though the grammar-lexicon

contained little more th an w hat was required to process th e sentence and in

spite of the fact th a t the program was running on a single-user Sun w orkstation

(Fraser 1988: 58). At least p art of th e reason for th e poor perform ance could

be a ttrib u te d to some features of the version of Prolog I was using. My program

had made extensive use of the assert and retract predicates to add facts to,

and remove facts from th e Prolog database. A larm ed by th e poor perform ance

tim es, I carried out a series of benchm ark tests and discovered th a t it was

much quicker to m aintain a record of th e current state of th e parse in long

environm ent lists which could be passed betw een predicates th a n to w rite to

and erase from the Prolog database. This problem was easily solved, and the
parser described here seldom asserts and never retracts during parsing.

However, not all speed-related problems sprung from th e m undane details

of th e im plem entation. Some had more significant theoretical origins. Chief

am ongst these were the role of the adjacency constraint in th e parser and the

question of how best to generate all readings for a sentence.

In all of th e W G parsers available up to th a t tim e, the adjacency constraint

was im plem ented as an explicit perm issibility check on a hypothesized depen­

dency relation between two words, no doubt because th a t is th e way in which

it is presented in H udson (1984). In this respect these parsers differ from all

of th e other parsers described in this thesis which either have no adjacency

constraint or which build the constraint into th e p arser’s control strategy. By

profiling my earlier parsers I discovered th a t most of the processing tim e was

devoted to selecting potential word pairs, checking th a t they could contract a

dependency relation, checking w hether potential dependency pairs were ad ja­

cent and then discovering th a t they were not. Given an n word sentence, it is

possible to hypothesize dependency relations between any word and every one

of the other words in the sentence, i.e. n — 1 other words. T he sentence as a

whole (assuming no lexical am biguity) could generate a m axim um num ber of

n{n — 1) hypothetical relations. Most of these relations would be rejected by

the adjacency constraint (and, of course, by the dependency requirem ents of

each word). It struck me th a t this was approaching the problem the wrong way

round. If a parser were constructed in such a way th a t it never hypothesized

a relation between two words unless they were adjacent, this ought to avoid a

considerable am ount of w asted effort.

T he solution to this problem was to construct the parser around an ex­

plicit stack and to stipulate th a t the only place which could be searched for

a dependent or head for the current word was a t the top of the stack. The
m ain difficulty for this approach was in establishing dependencies betw een

word pairs when one of the words had been extracted. T he solution I adopted

was to separate dependency relations into those which had to be discovered by

search and those which could be derived from the dependency relations already
in existence.

The second problem I addressed in my parser was how to increase effi­

ciency in the discovery of all possible readings for a sentence. I did not w ant

to use a chart because I was unsure how to represent discontinuous groups

of dependents and, more problem atically, how to deal w ith th e uncertainties

raised by th e possibility of m ultiple headedness. Choosing a com pletely differ­

ent approach I decided to construct a backtracking parser which was designed

in such a way as to spot im plausible analyses as early as possible, thus keeping

to a m inim um the am ount of useless stru ctu re which would be built. Needless

to say, this could not prevent the parser from duplicating effort in some cases.

The parser

As we have seen, one of the central claims of DG in general and W G in par­

ticular is th a t a gram m ar need only refer to word-sized units. However, there

is no theoretical reason why a W G parser should share th e sam e restrictions

as the gram m ar it uses. I propose th a t, while a gram m ar m ay only refer to

word-sized item s, a parser should be allowed to refer in addition to two other

kinds of d a ta structure, namely molecules and stacks.

Following th e exam ple of Tesnière, I draw an analogy from m olecular chem­

istry and the process of chemical bonding. An atom w ith an overall positive

or negative charge is called an ion and is said to have a valency. Similarly, a

single word is said to have a valency. W here ions have positive charge, words

have a requirem ent for dependents; where ions have negative charge, words

have a requirem ent for a head. W hen a positively charged ion m eets a neg­

atively charged ion (and other factors perm it) the two ions bond to form a

single molecule. Any im balance in charge between the two ions remains as

a p roperty of the molecule (although ultim ately it is th e p ro p erty of a single

nucleus). In sim ilar fashion, a word which requires (or allows) a dependent

can bond w ith a word which requires a head to form a molecule unless any

constraints prevent it. Any dependency slots (charges) not involved in this

bond rem ain as properties of the molecule. Molecules can bond w ith other

molecules. W ell-formedness is analogous to molecular stability in chem istry

— in my m odel a molecule w ith satu rated valency can serve as a sentence (so

long as its root does not require a head).

For obvious reasons, the parsing procedure presented here is called the

bonding algorithm.

Molecules  A molecule is a structure consisting of a root word plus all of its

subordinates discovered so far. Molecules are 4-tuples of th e form shown in

(63).

(63)
[Negative-list, Positive-list, Subordinates, Derivable]

Negative-list is a list of unfilled head slots. Positive-list is a list of unfilled

dependent slots. T he general form of a slot is shown in (64).

(64)
[NUMBER, TYPE, SLOT-LABEL, SLOT-TYPE, POSITION]

NUMBER is a unique identifier for the word which has the slot (e.g. w5). TYPE is that word's word type (e.g. verb). SLOT-LABEL identifies the kind of dependency relation which must hold between the word and its slot filler (e.g. subject). SLOT-TYPE is the word type required of the slot filler (e.g. noun). POSITION indicates the filler's position relative to the word which has

the slot. There are three values for POSITION, namely before, after and either.

There is, of course, a certain am ount of arbitrariness in th e association of

positive charge w ith dependency requirem ents and negative charge w ith head
requirem ents ra th e r th an vice versa. The significant point to note is th a t they

are m utually a ttractiv e opposites.

Subordinates is a structured list containing a record of all of th e root w ord’s

subordinates and the dependency relations involved.

Derivable is a list of slots and inform ation detailing how to derive th eir

fillers from existing dependency relations.

Stacks  The bonding algorithm makes use of a single parse stack, and only

molecules m ay be pushed onto it. T he way in which th e stack is used ensures

th a t only adjacent words can be bonded.

Preliminaries  The parser works in a left-to-right, bottom-up, single-pass

m anner. T he parser reads one word at a tim e, constructing for each word a

fram e of slots and constraints on fillers. T he inform ation which is used to build

a fram e is obtained from the gram m ar by a process of property inheritance.

For example, the propositions shown in (65) could be inherited for the first

word of sentence (66).

(65)
word-1 isa proper-noun
word-1 has (a head)
word-1 has (mano pre-adjunct)
word-1 has (ano post-adjunct)
(pre-adjunct of word-1) is (a adjective)
(post-adjunct of word-1) is (a preposition)
(pre-adjunct of word-1) precedes word-1
(post-adjunct of word-1) follows word-1

(66)

John loves M ary

The same inform ation can be expressed much more com pactly when it is con­

verted into molecule form at. T he molecule which would be constructed for

word-1 is shown in (67).

(67)

[ [ [ 1, proper-noun, [a, head], word, either] ],
  [ [ 1, proper-noun, [mano, pre-adjunct], adjective, before],
    [ 1, proper-noun, [ano, post-adjunct], preposition, after] ],
  [],
  []
]
(68) shows the molecule initially constructed for the second word of sentence

( 66 ).

(68)

[ [],
  [ [ 2, finite verb, [a, subject], noun, before],
    [ 2, finite verb, [a, object], noun, after] ],
  [],
  [] ]
For the sake of simplicity I have ignored th e requirem ent in English for subject-

verb agreem ent. This can be accom m odated in the fram ework but it would

require some digression.

Molecular bonding  At the heart of the parser lies a process for combining

molecules to form larger molecules. In general, if some elem ent of the Positive-

list of one molecule can be combined w ith some elem ent of the Negative-list

of another molecule then the second molecule can be merged into the first to

produce a new, larger molecule.

In order to facilitate description of th e process of m olecular bonding, I shall

identify the elem ents of Positive-lists and Negative-lists by m eans of the names

given in (69).

(69)

[ number, type, [quantitator, slot], slot-type, order ]

An a tte m p t can be m ade to bond (67) and (68) by trying to unify an ele­

m ent from one Negative-list w ith an elem ent from th e Positive-list of th e other

molecule. We shall say th a t the two unify if the following conditions hold:

1. A isa B, where A is the Negative-list type and B is the Positive-list slot-type; and

2. C isa D, where C is the Positive-list type and D is the Negative-list slot-type; and

3. the Positive and Negative orders unify (before unifies with before, after unifies with after, either unifies with anything, but before and after will not unify with each other).

Let us consider th e first element of the Positive-list of (68) and the first (and

only) elem ent of the Negative-list of (67). These are shown in (70).

(70)

-ve [ 1, proper-noun, [a, head], word, either ]
+ve [ 2, finite verb, [a, subject], noun, before ]

W hen we try to unify these lists we find th at:

1. 'proper-noun isa noun' succeeds; and

2. 'finite verb isa word' succeeds; and

3. 'unify either with before' succeeds

therefore all of the conditions are satisfied and the molecules m ay bond. The

stru ctu re of the resulting molecule is shown in (71).

(71)

[ [],
[ [ 2, finite verb, [a, object], noun, after],
[ 1, proper-noun, [ano, post-adjunct], preposition, after] ],
[ subject, 2, 1],
[] ]
Several interesting things have happened here. First of all, th e m atching el­

em ents — a Negative elem ent of (67) and a Positive elem ent of (68) — have

collapsed into a single elem ent which is recorded in th e Subordinates list (read

this as ‘the subject of word-2 is w ord-1’). In addition, two Positive elements

of (67) have been deleted. T he reason for this will soon become apparent.
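The three unification conditions can be written out almost verbatim. The sketch below is my own rendering (it is not the parser's actual code, and it uses underscored atoms such as proper_noun in place of the hyphenated names above); the two list elements of (70) serve as test data:

    % a small isa hierarchy, just enough for the example
    isa(proper_noun, noun).
    isa(noun, word).
    isa(finite_verb, verb).
    isa(verb, word).

    isa_star(X, X).
    isa_star(X, Z) :- isa(X, Y), isa_star(Y, Z).

    % positional features unify if they are equal or if either one is 'either'
    order_unify(either, _).
    order_unify(_, either).
    order_unify(O, O).

    % Neg is an unfilled head slot [Number, Type, Slot, Slot-type, Order];
    % Pos is an unfilled dependent slot of the same shape
    can_bond([_, DepType,  _, RequiredHeadType, NegOrder],
             [_, HeadType, _, RequiredDepType,  PosOrder]) :-
        isa_star(DepType,  RequiredDepType),    % condition 1
        isa_star(HeadType, RequiredHeadType),   % condition 2
        order_unify(NegOrder, PosOrder).        % condition 3

    % ?- can_bond([1, proper_noun, [a, head],    word, either],
    %             [2, finite_verb, [a, subject], noun, before]).
    % succeeds, so the two molecules may bond.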

Using the stack  Only two molecules are available for bonding at any time,

nam ely th e top two molecules on the parse stack. I shall refer to th e top-m ost

molecule as M l and the next one down as M2. To begin w ith, a test is made

to see if M2 can depend on M l (i.e. if some elem ent in M l’s Positive-list will

unify w ith some elem ent in M 2’s Negative-list). If this test succeeds then the

two molecules bond to form a new molecule. If not, then a te st is m ade to

see if M l can depend on M2 (i.e. if some elem ent in M l’s Negative-list will

unify w ith some elem ent in M 2’s Positive-list). Again, if they unify, the two

molecules bond to form a new one. This becomes the new M l and th e next

highest stack elem ent becomes available as M2. If two molecules will not bond,

then the stack rem ains unchanged. The next word of th e sentence is read and

a new molecule is constructed and added to th e parse stack.

By the end of a sentence, there should be exactly one molecule left on the

stack. If there is more th a n one then th e parser has failed to find a single

dependency structure for the input string.
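To show the stack discipline in isolation, here is a self-contained toy sketch (my own code, not the parser itself). The full slot-unification test of the previous section is abbreviated to a simple can_head/3 table so that only the behaviour of the stack is on display:

    % stand-in for the real bonding test: which word may head which, and how
    can_head(loves, john, subject).
    can_head(loves, mary, object).

    % a molecule is reduced here to m(RootWord, Dependents)
    parse(Words, Tree) :-
        foldl(shift, Words, [], [Tree]).    % success: one molecule left on the stack

    % build a molecule for the next word, bond it with the top of the stack
    % as often as possible, then push the result
    shift(W, Stack, NewStack) :-
        absorb(m(W, []), Stack, NewStack).

    absorb(M, [Top|Rest], NewStack) :-
        bond(Top, M, Merged), !,            % only the stack top is ever inspected
        absorb(Merged, Rest, NewStack).
    absorb(M, Stack, [M|Stack]).

    % the earlier molecule may depend on the later one, or vice versa
    bond(m(D, DDeps), m(H, HDeps), m(H, [rel(R, m(D, DDeps))|HDeps])) :-
        can_head(H, D, R).
    bond(m(H, HDeps), m(D, DDeps), m(H, [rel(R, m(D, DDeps))|HDeps])) :-
        can_head(H, D, R).

    % ?- parse([john, loves, mary], T).
    % T = m(loves, [rel(object, m(mary, [])), rel(subject, m(john, []))]).

A real molecule also carries its unfilled slots, which is what the obligatory and optional slot tests in the algorithm presented below inspect before allowing a bond.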

T he parser only ever searches its im m ediate left context. In this way the

operation of the stack implicitly applies the adjacency constraint. T hus, one

of th e objectives of the parser has been satisfied: the parser never attem p ts

to establish a dependency relationship between a pair of words unless they

are adjacent. Note also, th a t (unlike in earlier W G parsers) th ere is no search

involved. There is only one place to look for a head or for a dependent, nam ely

M2. If it is not there then there is no need to look any further.

A nother strength of this stack-based approach is th a t it provides neat ways

of identifying and closing down doomed search paths as early as possible —

thus satisfying th e other m ain objective of th e parser. Recall th a t when we

com bined molecules (67) and (68), we produced a new molecule (71). However,

in th e process, we lost the two slots shown in (72).

(72)

a [ 1, proper-noun, [mano, pre-adjunct], adjective, before]
b [ 1, proper-noun, [ano, post-adjunct], preposition, after]

It should be obvious th a t the first word of a sentence can not possibly have

a pre-adjunct. However, it is possible to appeal to a more general principle

which states th a t any M l which has optional slots for dependents w ith the

before order feature, will have these options closed if it is found th a t there is

nothing else on the stack. This is because there are not, and never will be, any

available fillers. This accounts for the disappearance of slot (72a). Likewise,

if M l has non-optional slots for preceding fillers and th ere is nothing else on

the stack, then no single dependency stru ctu re will ever be able to link all

of th e words in the string into a coherent sentence. This fact can be used

to spot impossible analyses before fu rth er stru ctu re is built fruitlessly. If this

heuristic were not applied, parsing could continue until the end of th e input

string before the problem was spotted.

T he reason for the erasure of the post-adjunct slot (72b) is th a t th ere is a

rule which states that when an M1 becomes the head of an M2, any optional after slots the M2 may have had are removed. This is because structures of the sort shown in Figure 9.12 can not occur.

[Figure 9.12: a prohibited dependency structure]

Had the slot been obligatory and not ju st optional, this would have signalled

th a t further processing would be pointless: no successful parse could ever

result.

Thus, a t the cost of two simple tests a t bonding tim e, th e am ount of need­

less processing can be significantly reduced. I shall show below how th e parser’s

efficiency can be further enhanced by exam ining th e gross characteristics of the

stack whenever a molecule is pushed onto it.

First, though, here is a PARS description of my parsing algorithm.^

^It is necessary to define an extra condition 'obligatory_slots(X,Y)' for this PARS description. This condition succeeds if word Y has any obligatory slots in position X (e.g. obligatory_slots(before, C)), otherwise it fails. It is also necessary to define a special action 'strip_optional_slots(X,Y)' which strips out any optional slots belonging to Y with positional feature X (e.g. strip_optional_slots(after, C)).

INITIALIZATION: read input words into a list
                (in molecule format);
                C is the current word in the list;
                C := 1;
                X is a pointer;
                X := 1;
                initialize an empty stack;
                the result is stored in the variable Result.

1. IF empty(Stack)
   THEN IF obligatory_slots(before, X)
        THEN fail
        ELSE strip_optional_slots(before, X),
             push(X),
             C := C + 1,
             X := C,
             goto(2)
   ELSE IF X —> top(Stack)
        THEN IF obligatory_slots(after, top(Stack))
             THEN fail
             ELSE strip_optional_slots(after, top(Stack)),
                  record(X —> top(Stack)),
                  pop(Stack),
                  goto(1)
        ELSE IF top(Stack) —> X
             THEN IF obligatory_slots(before, X)
                  THEN fail
                  ELSE strip_optional_slots(before, X),
                       record(top(Stack) —> X),
                       X := top(Stack),
                       pop(Stack),
                       goto(1)
             ELSE push(X),
                  C := C + 1,
                  X := C,
                  goto(2).

2. IF C = e
   THEN Result := top(Stack),
        pop(Stack),
        IF empty(Stack)
        THEN succeed
        ELSE fail
   ELSE goto(1).

Algorithm 9.1: Fraser's 'bonding' algorithm

Derived dependency relations  Consider sentence (73), in which the ob-

ject the thesis has been extracted out of its norm al post-verbal position.

(73)
T he thesis I w rote

A t a certain point in the analysis of this sentence, M l will be the molecule

I wrote (headed by wrote) and M2 will be th e molecule the thesis (headed

— according to norm al W G practice — by th e determ iner the). Recall from


our discussion of visitors th a t a tensed verb m ay have a preceding visitor. In

this case, the (thesis) is recognized as the visitor of wrote. W hen the (thesis)

becomes visitor of wrote it is absorbed into the molecule headed by wrote and

disappears from view. However, it is still necessary to identify the (thesis) as

th e object of wrote. This is where the ‘Derivable’ com ponent of a molecule

finds its use. T he Derivable list contains identity propositions. In this case,

th ere is a proposition which equates the object of a tensed verb w ith th e visitor

of th a t tensed verb. T he Derivable list is checked after each new dependency

relation is established and any additional relations which m ay be derived are

added to the parse record. In this way, th e parser is able to build all of the

m ultiple-headed structures which are sanctioned by WG theory.

Additional optimizations  The root of a sentence differs from all of the

other words in a sentence in th a t it has an em pty N egative-list (i.e. it does not

require a head). This makes it easily identifiable during parsing. One useful

consequence of the adjacency constraint is th a t no (non-derived) dependency

relation will ever cross the root. Therefore, when th e root is pushed onto the

stack, the stack must be empty, otherw ise th e molecule or molecules left on

the stack will never be integrated into the molecule headed by th e root. This

is a robust test which, together w ith those already m entioned, contributes to

th e p arser’s early recognition of fruitless search paths.

T here is a t least one fragile — but nonetheless useful — heuristic which

can also improve the average perform ance of the parser. A part from th e hand-

analyzed BKB corpus compiled for th e DLT project (7), the only corpora

analyzed in term s of dependency stru ctu re known to me were constructed


by Dick Hudson, Monika Pounder and myself at U niversity College London.

These were very small, exploratory corpora, which had no claims w hatsoever

to statistical significance. However, a striking feature of th e dependency trees

was observable. If an arbitrary word in any of th e corpora were chosen, and

it were assumed th a t the sentence were being parsed by an increm ental, left

to right parser like the one I have ju st described, th en a t th e chosen point

in th e analysis, the m axim um num ber of unsatisfied dependencies hardly ever

exceeded three, and certainly never exceeded four. If this result could be shown

to be valid for a corpus of significant size, it would have im plications for the

design of backtracking parsers. If it is valid, th en th e chances of a successful

result would be very slim from a parser sta te in which four or more molecules

were resident on the stack. This constraint is fragile — after all, it is possible

to stack up arbitrarily m any adjectives before a noun — b u t it m ay prove to

have a useful heuristic function in th e m ajority of cases.

Implementation details  The parser is implemented in Poplog Prolog on a Sun 3/52 workstation. It can analyze a wide range of English constructions while maintaining consistently high levels of efficiency. The parser analyzes

sentences left to right increm entally in real tim e — it takes 0.23-0.25secs to

establish a dependency relation, w ith a vocabulary of approxim ately 500 lex­

ical item s. This is roughly 1/64 of the tim e taken by th e parser’s im m ediate

predecessor.

Complexity   The absolute time taken by the parser is, of course, depen-

dent on the hardw are and software platform s used. Some im plem entation-

independent m easure of perform ance is more desirable. In particular, the

asym ptotic com plexity of the parser is of interest. It is w orth pointing out

th a t the optim izations to the bonding algorithm described above do not affect

the asym ptotic complexity; they only affect the size of the constant in the

calculation.

Assuming for a m om ent th a t the parser operates w ith a completely unam ­

biguous gram m ar, the m axim um am ount of work required to find the depen­

dents of a word and the head of a word is constant for all words. Therefore the

parser takes tim e proportional to n, i.e. it operates in linear tim e. A lthough no

form al com plexity proof has been constructed, it is hard to see how th e formal

result could differ from the one arrived a t informally here. Em pirical experi­

m ents w ith th e parser support this result. (T he average tim e of 0.23-0.25 secs

taken to establish a dependency relation was constant even for sentences of

m ore th an forty words in length.)

Given the prevalence of am biguity in n a tu ral language, it is unrealistic to

suppose th a t a practical version of th e parser would be able to o perate w ithout

being forced to backtrack. Like m ost parsers which make no use of charts, the

tim e taken to find every reading for an am biguous sentence is proportional to

nⁿ in the worst case. The effects of stack-related early recognition of failure

have not been taken into consideration in arriving at this figure. It seems

likely th a t addition of a chart to th e parser would result in polynom ial tim e

complexity.

9.3.2 Hudson's parser

Objectives of the parser

H udson had two main objectives in writing his W G parser. F irst, he was in ter­

ested in developing a tool which would help him to w rite consistent large scale

gram m ars of n atu ral languages. It is very difficult when w riting realistically-

sized gram m ars to m aintain internal consistency and to an ticip ate all the con­

sequences of the addition of some new gram m ar rule or rules. One solution to

this problem is to build a com putational environm ent, a ‘gram m arians work­

b en ch ’, which allows the gram m ar w riter to modify the gram m ar and then to

check the consequences of the m odification by parsing a set of test sentences.

T he test sentences have previously been parsed ‘by h a n d ’ so th e targ et struc­

tures are known. Ideally, any modifications to th e gram m ar should increase the

num ber of test sentences which th e parser analyses correctly. Since H udson’s

w orkbench is designed to be used by linguists rath er th a n com puter scientists,

gram m ar rules can be w ritten in a slightly modified dialect of the W G notation

reviewed above. This is autom atically compiled into a denser, less readable

system -internal representation. It is not necessary to be fam iliar w ith this rep­

resentation in order to understand the algorithm . A g ram m arian ’s workbench

should be usable w ith a range of gram m ars so th a t alternative analyses can be

tried. This requires th a t the analysis system be com pletely separate from the

gram m ar. In H udson’s system (as in my own) th e parser and th e gram m ar

are clearly distinct, even to the extent of residing in different com puter files.

T he gram m ars are collections of declarative facts which can be slotted into the

procedural parser. The only linguistic objects th a t th e parser knows ab o u t are

very general objects (which are not specific to any language or construction)

such as ‘d ep en d en t’, ‘h ead ’ and ‘w ord’.

H udson’s second object in w riting his parser was to produce a m odel of

hum an sentence processing. T he desire to produce a parser which is, in some

sense, a cognitive model leads to a design strategy which eschews parsing

techniques which are com putationally efficient but cognitively unm otivated. In

so far as W G has am bitions to be a theory with claims to make about cognition

— and it does — th e aim of building a com putational cognitive model should be

satisfied by following the theory as closely as possible in th e im plem entation.

T he theory calls for increm ental processing of sentences and the generation

of all possible alternative analyses for each sentence. This is a considerable

simplification of w hat hum ans seem to do. There is evidence th a t while people

do process sentences increm entally, they do so by entertaining several analyses

for a lim ited period only before selecting some particular reading for a word.

This m eans th a t m ost of the tim e alternative analyses are not carried all the

way through a sentence. One consequence of this is th a t it is possible to

m ake a wrong decision which subsequently has to be undone. G arden p ath

sentences (M arcus 1980) illustrate this phenom enon. H udson’s parser is thus

only a model of certain aspects of increm entality and am biguity handling since

it always processes increm entally and always builds all possible readings for

a sentence in parallel. W hat is perhaps W G ’s m ost interesting cognitively-

m otivated principle, the ‘Best Fit Principle’ (B F P ), is not m odelled a t all in

th e parser. The B FP is designed to allow the gram m ar to be used to analyse

sentences which are to some extent ill-formed. T h at is, they do not reflect

anything in the com petence gram m ar directly. T he B F P is worded as follows:

T he B est F it P rin cip le


An experience E is interpreted as an instance of some concept C if
more inform ation can be inherited about E from C th a n from any
alternative to C (Hudson 1990: 47).

In effect, th e B F P is a pragm atic principle which always steers processing

in the direction of the greatest net gain in inform ation (in this respect it is

ra th e r like th e Principle of Relevance of Sperber and W ilson 1986). It calls

for constraint relaxation in m atching a word instance to its model in th e isa

hierarchy since it is accepted th a t the m atch m ay not be exact. (For a review

of some constraint relaxation techniques in n atu ral language processing see

Fraser and Wooffitt 1990.) This could be expected to have a profound effect

on the design of a com puter model. Sadly, all W G parsers produced to d ate

im plem ent an ‘Exact F it Principle’ rath er th a n a B FP. It is to be hoped th a t

the next generation of W G parsers will tackle this problem.
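
A future implementation might approximate the BFP by scoring candidate classifications according to how much information each would allow the instance to inherit, roughly as follows (an illustrative sketch only; no existing WG parser works this way, and inherited_info/2 is a hypothetical interface to the isa hierarchy):

    % Toy isa information (illustrative): the facts an instance would
    % inherit from each candidate concept.
    inherited_info(common_noun, [word, noun, has(head), has(number)]).
    inherited_info(proper_noun, [word, noun, has(head)]).

    % best_fit(+Candidates, -Best): prefer the concept from which the
    % most information can be inherited, as the BFP requires.
    best_fit(Candidates, Best) :-
        findall(N-C,
                ( member(C, Candidates),
                  inherited_info(C, Facts),
                  length(Facts, N) ),
                Scored),
        sort(0, @>=, Scored, [_-Best|_]).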

The parser

H udson’s parser is w ritten in Prolog2 and runs on an IBM XT. It processes

each sentence from left to right, one word a t a tim e. W hen a word is read into

the system it is first analyzed morphologically into a stem and (optionally)

an affix. The stem is used to locate where to a tta ch the word instance in th e

inheritance hierarchy. T he affix (or absence of one) is used to determ ine th e

w ord’s m orphosyntactic features. I shall not describe th e m orphological a n a ­

lyzer here. Details can be found in Hudson (1989c: 327ff); a fuller treatment

of morphology in WG can be found in Hudson (1990a: Appendix 8). Each

word is assigned a unique identifier which, for convenience, is an integer co r­

responding to the position of the word in th e input string. A dditionally, each

reading of a word is assigned a distinct num ber which is also an integer. T h e

first reading found is assigned th e identifier 1, th e second reading is assigned

2, etc. Thus, any word in the system is identified by a two elem ent list. T h e

first element identifies its position, the second element identifies its reading. In

sentence (74), the first occurrence of saw would have a nominal and a verbal

reading. Thus, two distinct words would be identified: [2,1] and [2,2]. The

second occurrence of saw similarly has two readings, distinguished as [4,1] and

[4,2]. (In the latter case the ambiguity is only local.)

(74)

I saw his saw

Next, the parser has to inherit properties for each reading identified. It is not

clear whether it is better to inherit properties all at once or in a demand-driven

way. In this parser all inheritable properties are collected together at once and

built into a feature structure associated with the word instance. The feature

stru ctu re includes properties inherent to the word such as tense, num ber, etc.

It also includes inform ation about the w ord’s possible dependents and, more

controversially, about its head. T h a t is, m ost words other th a n finite verbs

will have associated w ith them a proposition such as

(75)

[6,1] has (a head)

and possibly an additional proposition which identifies th e kind of word which

m ay serve as a head:

(76)

(head of [6,1]) is (a noun)
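
In Prolog terms, the propositions in (75) and (76) might be stored along the following lines (an illustrative rendering; the actual feature-structure encoding in Hudson's parser is denser than this):

    % Propositions (75) and (76) for word instance [6,1]:
    %   "[6,1] has (a head)"  and  "(head of [6,1]) is (a noun)".
    needs_head([6,1]).
    head_type([6,1], noun).

    % Toy lexical fact (illustrative): word instance [4,1] is a noun.
    isa([4,1], noun).

    % A candidate H is an acceptable head for W if W needs a head and H
    % is of the type that W's head is required to be.
    acceptable_head(W, H) :-
        needs_head(W),
        head_type(W, Type),
        isa(H, Type).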

The algorithm always tries to link a word to a preceding word. It never searches

the right context. It begins by trying to find a dependent for th e current word,

startin g w ith the closest preceding headless word and working back towards

the first word. This process continues until all dependents are found for the

current word or until no more options are available. Next, the current word

searches the previous context trying to find another word (only roots of p artial

trees are considered) which could serve as its head. If it is successful then the

next word is read in, morphologically analysed, assigned default properties and

m ade the current word in the parser. If no head is found for th e current word

then checks are m ade to see w hether (on th e basis of local knowledge) the

current word could possibly be the sentence root or, alternatively, if it could

depend on a word which has not been read in yet. If either of these options is

not ruled out th en the next word becomes the current word. If neither option

is possible then an a tte m p t is m ade to take th e current word as th e root of a

conjunct in a coordinate structure. If this is possible then it is necessary to

copy any dependency relations which hold between any other conjunct roots

and words outside the coordinate structure. If none of these tests succeeds

th en the parse has failed. T he parse succeeds when the final word has been

processed and all of the words are subordinate to a single root.
Hudson describes his algorithm as follows (quoted from H udson 1989c:

334):

1. try to take the nearest preceding word X th a t has no head as a dependent

of W.

(a) If successful, repeat 1, w ith reference to th e last word before X th a t

has no head;

(b) O therwise, go to 2.

2. Try to take a root of the nearest preceding word Y as head of W.

(a) If successful, stop.

(b) O therw ise, go to 3.

3. Try to take W as a word which need not have a preceding head, either

because it needs no head at all, or because it m ay have a following head.

(a) If successful, stop.

(b) O therw ise, go to 4.

4. Try to take W as the root of a conjunct which shares its external relations

w ith earlier conjunct-roots of a coordination.

(a) If successful, stop.

(b) Otherw ise, fail.

A lgorithm 9.2 presents a PARS description of H udson’s parsing algorithm

(om itting the conjunct root test for sim plicity).

INITIALIZATION: read input words into a list;
C is the current word in the list;
C:=1;
initialize a stack, Stack;
push(Stack, C);
C:=2;
X is a global variable;
the result is stored in the variable Root;
the action ‘root(Root)’ succeeds if Root
does not require a head.

1. IF C = e
   THEN goto(3)
   ELSE IF empty(Stack)
        THEN goto(2)
   ELSE IF C → top(Stack)
        THEN record(C → top(Stack)),
             remove(top(Stack)),
             pop(Stack),
             goto(1)
   ELSE X:=C-1,
        goto(2).

2. IF X = 0
   THEN push(C),
        C:=C+1,
        goto(1)
   ELSE IF X → C
        THEN record(X → C),
             C:=C+1,
             goto(1)
   ELSE X:=X-1,
        goto(3).

3. Root:=top(Stack),
   pop(Stack),
   IF (root(Root) & empty(Stack))
   THEN succeed
   ELSE fail.

Algorithm 9.2: Hudson's dependency parsing algorithm

Unlike H udson’s previous parsers (Hudson 1985b; H udson 1986a), this one

includes no explicit adjacency test. Instead, adjacency checking is im plicit

in th e parsing algorithm . This is interesting since Hudson is concerned with

cognitive modelling. By making th e adjacency principle inhere in th e parser,

he is making the claim th a t the adjacency constraint applies in all languages.

T his would b e an unreasonable claim to m ake for the trad itio n al version of the

adjacency constraint. However, it m ay not be unreasonable given H udson’s

revision of th e principle. This is an em pirical question which awaits further

investigation.

We have seen how structured nam es for word instances distinguish them

on th e basis of position and reading. However, this does not cover all possible
am biguities. T here m ay also be am biguities of attach m en t. For example, in

sentence (77) the phrase with a telescope could modify either saw or the man.

T he alternative analyses are shown in Figures 9.13 and 9.14. (T he stan d ard

W G analysis requires nouns to depend on determ iners ra th e r th a n vice versa.)

(77)

I saw th e m an w ith a telescope.

To distinguish th e different attachm ents, it is necessary to add another

com ponent to a word instance’s identifier. So, for example, th e instance of

with which depends on saw m ight be identified as [5,1,1], whereas th e instance

which depends on telescope would be identified as [5,1,2]. A lthough not shown

in th e form al specification of th e algorithm , the parser m ust generate new

[Tree diagrams omitted: Figure 9.13 attaches with a telescope to saw; Figure 9.14 attaches it to the man.]

Figure 9.13: with a telescope depends on saw

Figure 9.14: with a telescope depends on the man

identifiers during the parse to cope w ith cases like this. Needless to say, two

instances sharing the sam e position in the sentence may not enter into any

dependency relationship w ith each other whatsoever.
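
The three-part identifiers, and the restriction just mentioned, can be pictured as follows (a sketch; the predicate names are illustrative):

    % Word instances are named [Position, Reading, Attachment]; the two
    % hypothesized instances of 'with' in sentence (77):
    instance([5,1,1]).   % the instance of 'with' that depends on 'saw'
    instance([5,1,2]).   % the instance that depends on 'telescope'

    % Two instances may only be considered for a dependency if they
    % occupy different positions in the sentence.
    may_depend([P1|_], [P2|_]) :-
        P1 \== P2.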

By m eans of the above nam ing convention, all possible readings for the

sentence are generated breadth-first. T he parser consequently runs ra th e r

slowly b u t, given its academic rath er th an engineering m otivations, this is not


a serious fault.

To date, th e parser has been tested w ith a fairly small gram m ar b u t it


has been able to handle an impressive range of English constructions. These

include a variety of different kinds of com plem ent and adjunct structures,

shared dependent (a.k.a. m ultiple head) structures, negatives, and coordinate

constructions, including examples w ith gapping.

9.4 Summary

B oth parsers described here work bottom -up, left to right, w ith a single pass.

Furtherm ore, b o th altern ate between dependent-seeking and head-seeking.

Table 9.2: main features of Fraser’s Word Grammar parser

Search origin bottom -up


Search m anner depth-first
Search order left to right
Num ber of passes one
Search focus heads seek dependents;
then dependents seek heads
A m biguity m anagem ent chronological backtracking;
(early identification of failure)

Table 9.3: main features of H udson’s W ord G ram m ar parser

Search origin bottom -up


Search m anner breadth-first
Search order left to right
N um ber of passes one
Search focus heads seek dependents;
then dependents seek heads
A m biguity m anagem ent all trees constructed in parallel

This is made possible by the fact that WG words are subcategorized for heads
as well as for dependents. The parsers differ in respect of their treatment

of ambiguity. My parser aims to produce a first parse as quickly as possible

by spotting problems early and backtracking over the shortest possible distances.

Hudson's parser is much slower and much more thorough, generating

all possible parses breadth-first without the help of a chart.

The main features of my parser are summarized in Table 9.2. Those of

Hudson's parser are summarized in Table 9.3.

Chapter 10

Covington's parser

10.1 Overview

In this chapter I describe a dependency parser written by Michael Covington,

a research scientist at the University of Georgia, USA. Covington is unusual

in that he brings together expertise in classics and history of linguistics with

more contemporary interests in artificial intelligence. A comparison of two of

his publications, Syntactic Theory in the High Middle Ages (Covington 1984)

and Prolog Programming in Depth (Covington et al. 1987), serves to illustrate

his unusual blend of interests.

Section 10.2 presents a brief review of some of Covington's work on mediaeval

grammar which informs his work in DG. Section 10.3 describes the

unification-based grammatical formalism Covington assumes, and Section 10.4

describes his dependency parser.

10.2 Early dependency grammarians

Covington's work in the history of linguistics is more pertinent to the concerns

of this thesis than might at first be apparent. Covington traces the origins

of DG back to the Modistae, a group of mediaeval grammarians starting with

Martin of Dacia in the mid 1200s who attempted to make ‘modes of signifying’

the basis of all grammatical analysis (Covington 1984: 25). One of the most

important principles of modistic syntax is that the relation between two words

in a construction is not symmetrical; one of the words is the dependens, the

other is th e terminans. Thom as of E rfurt offers a m etaphorical definition in

his Grammatica Speculativa:

Ju st as a com posite entity in n atu re consists of m atter and form,


of which one is actual and the other is potential, in th e same way
construction in language comes about through th e exerting and
fulfilling of dependencies. T he dependent constructible is th e one
th a t by virtue of some mode of signifying seeks or requires a te r­
m inus to fulfill its dependency; the term inant is the constructible
th a t by virtue of some mode of signifying gives or supplies th a t
term inus (C ovington 1984: 48, C ovington’s translation).

Superim posed on the dependens-term inans relation is another, the relation of

prim um to secundum. Covington notes th a t

the relation of prim u m to secundum is similar to th e basic relation


posited by m odern dependency gram m ar, in th a t th e secundum
presupposes th e presence of the prim um (C ovington 1986: 31).

This concern of C ovington’s with th e origins of gram m atical theory in general

and DG in particular informs his work in parsing. In introducing his parser

he ties it to the work of the M odistae:

In a sense, the algorithm is not new; there is good evidence th a t


it was known 700 years ago. B ut it has not been im plem ented on
com puters [before] (Covington 1990a: 1).

To say th a t C ovington’s work is informed by m ediaeval gram m atical theory is

not to say th a t his parser slavishly follows its dictates. His parser is n o t an

im plem entation of th e gram m atical theory of Thom as of Erfurt!

10.3 U n ific a tio n -b a se d d e p e n d e n c y g ra m m a r

Covington bases his DG on a variation of Miller's ‘D-rules’ (Miller 1985). Instead

of using atomic symbols like N and V he uses feature structures of the

kind th a t are commonly used in unification-based gram m ars (Shieber 1986).

T he following rule:

(78)

    category: X          category: Y
    gender:   G          gender:   G
    number:   N          number:   N
    case:     C          case:     C
indicates th a t a word of category Y w ith gender G, num ber N and case C can

depend on a word of category X w ith gender G, num ber N and case C. The

rule says nothing about word order. By convention th e head is always w ritten

first. If th e feature structure corresponding to some word unifies w ith the left
h and side of the rule and the feature stru ctu re corresponding to some other

w ord unifies w ith the right hand side of the rule, th e two words can en ter into a

dependency relationship in which the head is the word whose feature structure

m atches the left hand side of the rule.

A simple sem antics can be built into this fram ew ork as follows:

(79)

    category: verb              category: noun
    number:   N                 number:   N
    person:   P                 person:   P
    semantics: X(Y,Z)           semantics: Y
                                case:     nom

This rule allows subjects to depend on verbs and also ensures th a t th e subject

becomes the v erb ’s first argum ent.
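
Since Covington's parser is written in Prolog, a rule like (79) can be expressed directly as a clause over Prolog terms, with unification doing the work. The following is a simplified sketch using fixed-arity terms rather than the open feature structures Covington actually uses:

    % word(Category, Number, Person, Case, Semantics): a simplified,
    % fixed-arity stand-in for Covington's feature structures.
    % A nominative noun may depend on a verb agreeing with it in number
    % and person; unification makes the noun's semantics the first
    % semantic argument of the verb, as in rule (79).
    can_depend(word(verb, Num, Pers, _,   sem(_Pred, SubjSem, _Obj)),
               word(noun, Num, Pers, nom, SubjSem)).

    % Example query:
    %   ?- V = word(verb, sing, 3, _, sem(sleep, S, none)),
    %      N = word(noun, sing, 3, nom, john),
    %      can_depend(V, N).
    %   S = john.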

This kind of simple sem antics is used to m anage optionality and obliga­

toriness in the gram m ar. If an argum ent is obligatory then it is also unique.

Once an obligatory argum ent is found it in stan tiates a variable in th e feature

m atrix which can not be subsequently rein stan tiated . Therefore th ere can not

be m ultiple m atches. If this sem antic constraint were not present, th e above

rule could be used to provide the verb w ith as m any ‘su b jects’ as there were

nouns in the sentence. There m ust be an explicit check at th e end of parsing

to ensure th a t no sem antic argum ents rem ain u n in stan tiated . In order to add

optional dependents to a word, the rules relating to these dependents must

be w ritten so as to add feature-value pairs rath er th a n to supply values for

existing features.
Even variable word order languages place some constraints on order such

as the requirem ent th a t prepositions precede their nouns. This is handled by

m arking rules where necessary as ‘head first’ or ‘head la s t’ and requiring the

dependents to be ordered accordingly. The gram m ar and parser Covington

describes do not provide a m echanism for handling strict contiguity require­

m ents. Covington proposes a scheme for im plem enting these by m arking th e

head of the constituent in question w ith a feature contig which would be copied

recursively to all its dependents. An explicit check would ensure th a t all words

bearing this feature were contiguous.

10.4 Covington's parser

Covington declares his principal objective in writing his parser to be the

investigation of parsing techniques for languages with variable word order and,

in particular, languages with discontinuous constituents. The parser is implemented

in VM/Prolog on an IBM 3090 Model 400-2VF computer.^ There is

no morphological analyser; all forms of a word are stored in the lexicon. The

features used in lexical entries include:

phon the word's phonological or orthographic form;

cat the word's syntactic category;
case, num, gen, pers grammatical agreement features;
id a unique identifier for each word;
dep an open list containing pointers to the feature structures of all the word's
dependents.

The parser makes an initial pass through the sentence, looking up each word in

the lexicon and replacing the word in the input string with its feature structure.
^Covington's paper describing his parser (Covington 1990a) won first prize in the Social
Sciences, Humanities and Arts section of IBM's Supercomputing Competition (see The
Finite String 16:3, September 1990, page 31; LSA Bulletin 129, October 1990, page 16).

T here is no reason why this lexical scan phase should not be interleaved with

the linking procedure in an increm ental parser.
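
A lexical entry of the kind just described, and the initial lexical scan, might look roughly as follows in Prolog (an illustrative sketch; the word canis and the flat fs/8 term are assumptions, not Covington's actual encoding):

    % lexical_entry(Form, FeatureStructure): one stored word form.  The
    % dep feature is an open list (here simply an unbound variable) so
    % that pointers to dependents can be added as parsing proceeds.
    lexical_entry(canis,
                  fs(phon(canis), cat(noun), case(nom), num(sing),
                     gen(masc), pers(3), id(_Id), dep(_Deps))).

    % The initial pass: replace each word of the input string by its
    % feature structure.
    lexical_scan([], []).
    lexical_scan([Word|Words], [FS|FSs]) :-
        lexical_entry(Word, FS),
        lexical_scan(Words, FSs).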

Two lists are m aintained by the parser: ‘PrevW ordL ist’ which contains all

words th a t have been input to the parser so far, and ‘H eadL ist’ which contains

only words which are not dependents of other words. At th e start of parsing

b o th of these lists are empty. At the end, HeadList should contain a single

item , th e only word w ithout a head left in the sentence, i.e. the sentence root.

Parsing proceeds by processing each of the words in th e sentence in turn,

as follows (quoted from Covington 1990a: 19):

Covington's parsing algorithm

1. Search PrevW ordList for a word on which the current word can depend.

If there is one, establish the dependency; if there is more th an one, use

th e m ost recent one on the first try; if th ere is none, add th e current

w ord to HeadList.

2. Search HeadList for words th a t can depend on the current word (there

can be any num ber), and establish dependencies for any th a t are found,

rem oving them from HeadList as this is done.
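
Before turning to the PARS description, the two steps can be pictured in Prolog roughly as follows (a sketch; can_govern/2 stands in for the grammar check, and the real parser also records each dependency it establishes rather than merely testing for it):

    % Toy grammar check (illustrative): a noun can govern an adjective
    % and a verb can govern a noun.
    can_govern(word(noun), word(adj)).
    can_govern(word(verb), word(noun)).

    % process_word(+Word, +PrevWordList, +HeadList0, -HeadList)
    % Step 1: look for a head for Word among the previously seen words
    % (most recent first); if none is found, Word joins HeadList.
    % Step 2: remove from HeadList every word that can now depend on Word.
    process_word(Word, PrevWordList, HeadList0, HeadList) :-
        (  member(Head, PrevWordList),
           can_govern(Head, Word)
        -> HeadList1 = HeadList0
        ;  HeadList1 = [Word|HeadList0]
        ),
        exclude(can_govern(Word), HeadList1, HeadList).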

INITIALIZATION: read input words into a list;
C is the current word in the list;
C:=1;
initialize two empty stacks: Stack1 and Stack2;
push(Stack1, C);
C:=2;
Root is the result variable;
X is a global variable;
X:=1.

1. IF C = e
   THEN goto(4)
   ELSE IF X = 0
        THEN push(Stack2, C),
             goto(2)
   ELSE IF X → C
        THEN record(X → C),
             goto(2)
   ELSE X:=X-1,
        goto(1).

2. IF empty(Stack1)
   THEN goto(3)
   ELSE IF C → top(Stack1)
        THEN record(C → top(Stack1)),
             pop(Stack1),
             goto(2)
   ELSE push(Stack2, top(Stack1)),
        pop(Stack1),
        goto(2).

3. IF empty(Stack2)
   THEN X:=C,
        C:=C+1,
        goto(1)
   ELSE push(Stack1, top(Stack2)),
        pop(Stack2),
        goto(3).

4. Root:=top(Stack1),
   pop(Stack1),
   IF empty(Stack1)
   THEN succeed
   ELSE fail.

Algorithm 10.1: Covington's dependency parsing algorithm
(no adjacency requirement)

Notice th a t the parser begins by searching for a word on which the present

word m ay depend and afterw ards searches for words which can depend on

the present word. This is unusual; for example, my own dependency parser
and those of Hudson, and S tarosta and N om ura all begin by searching for

dependents for the current word and thereafter proceed to searching for a

head for the current word. T he reason for proceeding in this way is simple.

If a word has b o th a head and a dependent occurring on th e same side, the

dependent is alm ost always closer to th e word th a n th e head. By searching for

the dependent first, the possibility of considering th e dependent as a p otential

head is ruled out. Perhaps this difference is not yet relevant to C ovington’s

system since he has so far tested his algorithm only against d a ta from Russian

and Latin, bo th of which have variable word order and rich case systems.

As the parser stands, it could be expected to produce spurious parses for a

fixed order, virtually case-free language like English. C ovington claims th a t his

parser could be modified to respect th e sort of adjacency required for English

by modifying his two algorithm steps as follows:

Modifications to algorithm to introduce adjacency

1. W hen looking for the word on which the current word depends, consider

only th e previous word and all words on which it directly or indirectly

depends.

2. W hen looking for potential dependents of th e current word, consider only

a contiguous series of members of HeadList beginning w ith th e one most

recently added.

A PARS description of Covington’s modified algorithm is given below.

INITIALIZATION: read input words into a list;
C is the current word in the list;
C:=1;
initialize two empty stacks: Stack1 and Stack2;
push(Stack1, C);
C:=2;
Root is the result variable;
X is a global variable;
Top is a global variable;
H is a local variable
(it is not bound between subroutine calls).

1. IF C = e
   THEN goto(5)
   ELSE IF C-1 → C
        THEN record(C-1 → C),
             goto(3)
   ELSE X:=C-1,
        goto(2).

2. IF H → X
   THEN IF H → C
        THEN record(H → C),
             goto(3)
        ELSE X:=H,
             goto(2)
   ELSE push(Stack2, C),
        goto(3).

3. IF empty(Stack1)
   THEN goto(4)
   ELSE Top:=top(Stack1),
        IF C → Top
        THEN record(C → Top),
             pop(Stack1),
             IF top(Stack1)=(Top-1)
             THEN goto(3)
             ELSE goto(4)
        ELSE pop(Stack1),
             pop(Stack1),
             IF top(Stack1)=(Top-1)
             THEN push(Stack2, Top),
                  goto(3)
             ELSE push(Stack1, Top),
                  goto(4).

4. IF empty(Stack2)
   THEN C:=C+1,
        goto(1)
   ELSE push(Stack1, top(Stack2)),
        pop(Stack2),
        goto(4).

5. Root:=top(Stack1),
   pop(Stack1),
   IF empty(Stack1)
   THEN succeed
   ELSE fail.

Algorithm 10.2: Covington's dependency parsing algorithm
(including adjacency requirement)

C ovington’s claim is th a t with these requirem ents added, th e algorithm

would be equivalent to th a t of Hudson (1989c). Certainly, th e algorithm s are

similar in spirit although H udson’s parser can deal w ith phenom ena such as

coordination and movement which Covington’s can not handle. Links would

not be established in th e same order in bo th parsers since, as I have already

pointed out, H udson’s parser searches for dependents first and C ovington’s

parser searches for heads first. This difference is not trivial. In m any cases it

leads to C ovington’s parser failing to find an analysis where H udson’s parser

succeeds. Consider sentence (80).

(80)

I like blue cheese

W hen cheese is being parsed, th e algorithm requires blue to be considered as

its head. This fails. Since only blue and any word on which blue depends (in

this case none) m ay be considered as a head for cheese^ cheese m ust be added

to HeadList. Next HeadList is searched in order to find dependents for cheese.

T he only dependent which is found is blue so it is removed from HeadList.

T here are no more words in the sentence so parsing term inates. However,

cheese has not been linked to its head like. T he parsing algorithm has failed to

find a stru ctu re for (80). Having read an earlier draft of this chapter, C ovington

accepts these criticisms. A modified version of his parser, in which dependents

are searched for first, has now been published (Covington 1990b). It appears

to work unproblem atically.


C ovington’s main interest, however, is in th e version of his parser which

has no adjacency constraint. He points out th a t though his parser is capable

of finding discontinuous constituents, it nonetheless ‘prefers’ analyses in which

constituents are continuous. This is because it always begins searching as close

as possible to th e current word, and works backwards. W hen an analysis fails,

the parser uses Prolog’s backtracking facility to ‘unpick’ w hat has been built

back to the point where the wrong choice was m ade and th en starts building a

new analysis. This approach to recovery from failure (it is also th e m echanism

which produces exhaustive enum eration of all possible readings of th e sentence)

is com putationally expensive since there is no way of preventing backtracking

from discarding structure which will have to be rebuilt. In C ovington’s favour.

it m ust be said th a t he presents his parser as a pro to ty p e so it is probably too

early to criticize it on grounds of im plem entational inefficiency.

Covington does, however, address th e question of the tim e complexity of his

parser. T he tim e required to parse an n-word sentence using the most efficient

CFPSG algorithms is proportional to at most n³. Covington suggests that

th e sam e is tru e of any dependency parser w ith an adjacency constraint. He

m akes his case as follows (quoted from Covington 1988):

1. A dependency parser m ust attach every word in th e sentence (except the

m ain verb) to some other word.

2. Without backtracking, this would require, at most, examining every com-

bination of two words, checking whether a dependency relation between

them is possible. There are n² such combinations.

3. However, the dependency parser m ay have to backtrack, i.e., discard


attachm ents already m ade and replace th em w ith other possibilities. At

w orst it m ust repeat all its previous work every tim e it parses another

word, thus introducing another factor of n. Hence the to tal worst-case

time is proportional to n³.

Inevitably, parsing w ithout an adjacency constraint will be more complex since

th e search space will be larger. Covington suggests a worst case tim e p ro p o r­

tional to nⁿ. In defence of a parser with such a high complexity he notes that

(i) th e com plexity is due to allowing discontinuous constituents, not to the use

of dependency; (ii) worst case complexity is irrelevant to n atu ral language pro­

cessing (after all, hum ans are typically unable to process ‘worst cases’); (iii)

the com plexity can be reduced by p u ttin g a rb itra ry limits on how far away

from th e current word the search for heads and dependents may proceed.

However, even if we were to accept his observations, it is still the case

th a t, other things being equal, the parser w ith th e lowest com plexity is to be

prefered over any alternatives. The argum ents he offers could be m ade for

any high-com plexity parser so they do not distinguish his parser from others

of sim ilar complexity. N either do they justify the selection of this parser over

others of lesser complexity.

C ovington acknowledges th a t coordinate constructions pose a problem for

DGs; his parser does not handle them . Since he has placed special em pha­

sis on producing a variable word order parser, it can be argued th a t he has

selected th e task to which dependency parsers are best-suited. A fter all, his

parser operates w ith the m inim um of constraints; it spots possible dependency

pairs and thereafter has no furth er constraints to check to see if th e words

are accessible to each other. The parsing com plexity m ay be high b u t the

algorithm ic com plexity is low. If, on the other hand, he modifies his parser

so th a t it embodies an ex tra adjacency constraint, the search space is reduced

but the algorithm ic com plexity is increased. Furtherm ore — in addition to the

problems I have already noted — the adjacency constraint he proposes is not

sufficient to allow the parser to produce correct analyses of norm al movement

phenom ena in English. W hat is required is an adjacency con strain t plus some
principled way of analysing th e small num ber of discontinuous constituents

which regularly occur in fairly fixed word order languages like English.

10.5 Summary

C ovington’s parser is loosely inspired by th e work of th e medieval M odistae.

It has as its prim ary objective th e parsing of variable word order languages.

I have presented two versions of the parser. W hereas th e first version has

no adjacency constraint a t all, the second version does include one. B oth

parsers im plem ent left to right, bottom -up, depth-first search. T hey both also

establish dependencies by first seeking heads and then seeking dependents. As

I have observed, this results in th e failure of th e version w ith an adjacency

constraint to parse some sentences correctly. Each version yields one parse

only, although it is possible to produce all parses by forced backtracking.

Table 10.1: main features of Covington’s first two dependency parsers

Search origin bottom -up


Search m anner depth-first
Search order left to right
N um ber of passes one
Search focus dependents seek heads;
then heads seek dependents
A m biguity m anagem ent chronological backtracking

T he m ain features of both versions of C ovington’s parser are sum m arized

in Table 10.1.

A more recent version reverses the order of search so th a t dependents are

searched for before heads. This algorithm is virtually identical to th a t of


H udson, as described in PARS in the last chapter.

Chapter 11

The CSELT lattice parser

11.1 Overview

In this chapter I describe the SYNAPSIS parser developed a t th e C entro Studi

e Laboratori Telecomunicazioni^ (CSELT) in Turin. This is not the only parser

to be produced a t CSELT which makes use of th e notions of dependency, or at

least valency. A system called SHEILA (‘Syntax Helping Expectations In L an­

guage A nalysis’) analyzes and ‘u n d erstan d s’ inform ation from a news agency

wire by using a mixture of PSG and DG (Danieli et al. 1987). PSG is used

to construct the m ajor phrases of a sentence; DG is used to establish depen­

dencies between major phrases. The rationale for this approach is that phrase

stru ctu re parsing is well-understood and consequently should b e used where

possible and effective, i.e. in building immediate constituents. However, de-

pendency is useful for linking the m ajor constituents of th e sentence because,

by and large, syntactic and sem antic dependencies are isomorphic. This is

claimed to assist in early disam biguation since sem antic constraints can be

brought to bear im m ediately a syntactic dependency is postulated. This sys­

tem bears a striking sim ilarity to N iederm air’s divided valency-oriented parser

(Niederm air 1986; briefly described on page 64, above).

T he object of the SYNAPSIS parser differs significantly from th a t of all

the other parsers described here: it is designed specifically for the purpose

of analyzing spoken rath er th a n w ritten language. T he difference turns out


^The research division of the Italian telecommunications company.

to be non-trivial as we shall see in Section 11.2. The parser is described in

Section 11.3.

11.2 The problem: lattice parsing

One of the m ost im portant differences between spoken language and w ritten

language is the markedness of word boundaries. In w ritten language, word

boundaries are clearly indicated by the presence of a space. In com puter sys­

tem s it is norm al to regard the space not simply as a gap — an absence of

writing — bu t rather, as an explicit boundary marker. In spoken language,

while pauses m ay occur betw een words, there are no guarantees th a t this will

happen in every case. R ather the opposite is the case: it is norm al for words to

be run together to the extent th a t the final segment of a word is coarticulated

w ith the initial segment of the following word. Thus, th e speech recognition

problem does not consist solely in th e identification of w hat lies between word

boundaries; it also requires the hypothesization of the boundaries themselves.

If th e set of hypotheses is to include th e correct segm entation then it is likely


to have to contain some alternatives. For example, a short speech signal could

be segmented as I see or icy. Given the lim itations of present speech recog­

nition technology, most sentences are analysed in term s of m any alternative

segm entations and, for the signal chunk between each hypothesized pair of

word boundaries, there will be several different word candidates. Far from

o u tp u ttin g a single string of words, a connected speech recognizer typically

outputs a lattice o f hypothesized paths, one of which hopefully corresponds to

the ‘correct’ analysis of the sentence. Figure 11.1 shows a very simple lattice

based on the two words I know. This is a portion of a larger lattice presented in

Phillips (1988). In reality, m ost lattices are likely to be m uch m ore com plicated

th a n this.

Not all paths through a lattice are equally likely. W hen a speech recognizer

constructs a word hypothesis it weighs the evidence for and against the validity

[Lattice diagram omitted. Each edge carries competing word hypotheses, e.g. I, eye, oh, our, hour; know, no, nor, now, near, mayor, more, mere; in, an, on, own, iron, earn, him, am, aim, arm; inner, honour, owner, army; ear, our, hour.]

Figure 11.1: a simple lattice for the uttered words I know

of the hypothesis and assigns a num eric ‘confidence score’ to the hypothesis.

If th e confidence score is greater th a n some threshold th en th e hypothesis is

entered in the lattice, otherw ise it is discarded. Since all words in the lattice
have an associated confidence score, it is possible to rank-order p ath s through

th e lattice on the basis of confidence scores. In an ideal system operating

under ideal circum stances, the highest-scoring p a th would correspond to the

‘correct’ analysis. However, there are no guarantees th a t this will be th e case.


In fact, there are no guarantees th a t th e ‘correct’ analysis will be represented

in th e lattice a t all, although p arts of it almost certainly will. We shall see in

the next section how th e SYNAPSIS parser is able to analyze some sentences

correctly, even when certain words are missing from th e lattice.

Clearly, m ost of the paths through th e lattice will be incoherent a t the

levels of syntax and sem antics. Ideally there will be a single p a th through

th e lattice which satisfies the higher level constraints, although th e possibility

of there being more cannot be ruled out a priori. T he task of recognizing a

spoken sentence should thus reduce to the task of constructing a lattice and

then parsing every p a th to find the syntactically and sem antically coherent

one(s).
T here is a simple reason why this approach is im practical for most purposes:

there are too m any possible paths through the lattice. Speech understanding is

a real tim e activity. Most speech interfaces are conceived w ith the aim of facil­

itatin g rapid hands-off interaction w ith a com puter. T here may simply be in­

sufficient tim e for all possible paths to be considered (assum ing th e constraints

of state-of-the-art com puting technology) if th e speech interface is to produce

an in terp retatio n w ithin the lim its of the desired response time. In the case of

‘conversational’ com puter systems such as the Sundial system (Peckham 1991),

rapid response m ay be necessary for other reasons. For example, it has been
shown th a t in everyday hum an-hum an conversation, speakers seldom leave u n ­

filled pauses of more th an about 1 second (Jefferson 1988). If speaker A asks

speaker B a question and speaker B does not respond within the crucial 1

second period, speaker A will feel compelled to take th e initiative and begin

a new conversational tu rn . T here are certainly exceptions to this general­

ization and it is unlikely th a t the phenom enon transfers exactly to hum an-

com puter conversations. However, initial results from ‘W izard of O z’ experi­

m ents (Fraser and G ilbert 1991b) in which subjects conversed w ith a sim ulated

com puter, suggest th a t a related phenom enon can be found in hum an-com puter

interactions (Fraser and G ilbert 1991a, Fraser et al. forthcom ing). Clearly, a

conversational com puter m ust be able to u nderstand an u tteran ce and generate

a reply before the hum an user starts responding verbally to an ‘accountable

silence’.

T he lattice to be searched by a speech recognizer is typically very large

indeed. A ten word sentence, analysed as a lattice consisting of ten edges, each

having four com peting hypotheses, would yield m ore th a n a million possible

paths. M ost speech recognition systems construct lattices containing m any

m ore th a n four hypotheses for each edge. For exam ple, th e CSELT speech

recognizer constructs lattices containing approxim ately fifty tim es the num ber

of actual words u ttered . I shall use a much lower figure to illustrate the n atu re

of the lattice parsing problem. Suppose a ten word sentence is analysed as a

lattice containing ten com peting hypotheses for each word. This lattice would

yield m ore th an ten billion possible paths. Phillips reports th a t “An actual

parser I have used would usually find a parse [for a ten word sentence] after

trying a couple of hundred million paths — an average of six or seven words

for each position” (Phillips 1988). Assuming it were possible to produce one

hundred parses per second for a ten word sentence, it would take about eleven

and a half days to produce one hundred million parses!
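
For the record, the arithmetic behind these figures is straightforward:

    4^10  = 1,048,576             (just over a million paths)
    10^10 = 10,000,000,000        (ten billion paths)
    10^8 parses at 100 parses per second = 10^6 seconds, i.e. roughly 11.6 days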

According to G azdar and Mellish, “Am biguity is arguably the single most

im p o rtan t problem in N L P” (G azdar and Mellish 1989: 7). It introduces the

possibility of m ultiple syntactic analyses of p arts or all of a sentence. However,

th e word class or word sense am biguity which preoccupies m ost com putational
linguists and to which G azdar and Mellish refer, is norm ally considered from

th e startin g point of a string of distinguished words. W hen th e startin g point is

a lattice, and the indeterm inacy of th e acoustic signal is com pounded w ith the

indeterm inacy of th e gram m ar, the com binatory explosion of possible paths

from signal to analysis is alarm ing.

It is unrealistic to expect a parser to search a lattice and find a solution by

‘b ru te force’ w ithin a reasonable tim e period. The m agnitude of the problem

precludes th e use of such a technique. The CSELT SYNAPSIS parser is an

a tte m p t to solve th e problem by applying appropriate ‘intelligence’ ra th e r th a n

‘b ru te force’. It is an a tte m p t to use th e inform ation to be found in th e acoustic

signal to lim it the search space of th e parser, and the inform ation contained

in the gram m ar to constrain th e search space of th e word recognizer. As

such, its concerns are different from those of the other parsers I have described

and it is not readily com parable w ith them a t an algorithm ic level. However,

th e fact th a t it is b o th based on DG and algorithm ically innovative makes

it particularly relevant to our present concerns. It also serves to illustrate a

prom ising application of DG in NLP.

11.3 The solution: the SYNAPSIS parser

Section 11.3.1 provides a brief overview of the SYNAPSIS parser. This is

followed in Sections 11.3.2 to 11.3.4 by a more detailed exam ination of the

form of syntactic and sem antic inform ation used by th e parser. Section 11.3.5

describes the basic SYNAPSIS parsing strategy and Section 11.3.6 outlines a

suggestion for parallelizing the parser.

11.3.1 Overview of SYNAPSIS

T he SYNAPSIS (SYN tax-A ided P arser for Sem antic In terp retatio n of Speech)

parser is p a rt of a larger question-answ ering system for extracting inform ation

from a database by means of relatively unconstrained spoken n atu ral lan­

guage requests (Fissore et al. 1988). The d atabase used during developm ent

contained inform ation about the geography of Italy. T he earliest references

to SYNAPSIS in the literatu re are dated 1988, although SUSY, th e overall


speech understanding system (recognizer + parser + generator + synthesizer)

of which SYNAPSIS is ju st one p a rt, is described in Poesio and R ullent (1987).

T he principle m otivating th e design of SYNAPSIS was th a t syntactic, and

indeed, sem antic constraints should be brought to bear as early as possible

in the interp retatio n of a lattice. T h a t is, knowledge of syntax and sem antics

should provide expectations to guide search in th e lattice, th u s ensuring th a t

syntactically or sem antically impossible structures were not considered. T he

parser had to im plem ent a top-down strategy. On th e o ther hand, since it was

observed th a t correct words were usually — though not always — am ongst the

highest confidence-scoring words, a useful search strategy would be to consider

th e highest-scoring words first. Therefore, th e parser should em body bottom-up


features as well.

Since the search space is so large, it was considered ap p ro p riate to apply

as m any top-dow n constraints on search as possible. Sem antic constraints, as

well as syntactic constraints should be allowed to trickle down. A sem antic

representation based on caseframes (Fillm ore 1968) was adopted, for reasons

which have as much to do with th e specific problems of speech recognition as

they have to do w ith the usual range of issues which confront linguists. The

conventional m otivations for choosing caseframes concern th e requirem ent th a t

the sem antics be form ally explicit, descriptively adequate, and com positional.

Caseframe sem antics satisfy these criteria. A first speech-related m otivation is

th a t the sem antics be w ord-based, ra th e r th a n phrase-based. Since th e prim i­

tive units in the lattice are words, it is desirable th a t single words should trig ­

ger sem antic rules. This is tru e of caseframes which are associated w ith single

words. (Effectively th e caseframe expresses the sem antic valency of the word).

A nother m otivation is the desire to “correlate sem antic significance w ith acous­

tic certainty” (G iachin and R ullent 1989: 1538). It is claim ed th a t caseframes


facilitate this because “the header word, being th e m ost ‘m eaningful’ one,

tends to be u tte re d more clearly, and hence is easily recognized w ith good

acoustical score” (ibid.). For these reasons, caseframes have been adopted in a

num ber of speech understanding system s (e.g. B rietzm ann and Ehrlich 1986;

Hayes et al. 1986).

Caseframes encode only sem antic slots for a given word, and constraints

on th e sem antic ty p e of each slot-hller. However, it is not sufficient to rely

on sem antic constraints alone. For exam ple, th e sem antic caseframe for the

verb put will indicate th a t it requires a PA TIEN T of some m aterial type (i.e.

th e thing which is ‘p u t’) and a GOAL of type LOCATION (i.e. where the

PA T IEN T is ‘p u t’). This says nothing about the realization of these cases.

For exam ple, it places no constraints on the relative ordering of put, its PA ­

T IE N T and its GOAL. It is necessary to combine syntactic constraints w ith

sem antic constraints in order to maxim ize th e useful inform ation in top-dow n

predictions.
T he way in which syntactic and sem antic inform ation is combined is of

vital im portance. One approach would be to add simple positional features

to the case slots, after the fashion of C onceptual D ependency (Schank 1975;

Schank and Riesbeck 1981). This would produce a ‘sem antic gram m ar’. There

are a num ber of argum ents against this way of tackling the problem . Firstly, it

misses a lot of syntactic generalizations. Most sem antic gram m ars are w ritten

piecemeal w ith new ‘concepts’ being added when required and (usually barely

adequate) word order features being added to each new sem antic entry. There

is typically no principaled way of dealing with general types of construction

such as relative clauses. Secondly, th e gram m ars are necessarily tied to some

sem antic dom ain. A sem antic gram m ar developed in th e context of Italian

geography could not readily be ported to a stock control application in spite of

th e fact th a t m any sentence types would be common to b o th domains. Thirdly,

sem antic gram m ars are not readily modifiable since sy ntactic and sem antic

constraints tend to be mixed up together in a collection of ad hoc rules. (For

a discussion of the shortcom ings of sem antic gram m ars see R itchie 1983.)

T he approach adopted in SYNAPSIS is to keep a sharp distinction be­

tween syntax and sem antics during gram m ar developm ent and to ensure th a t

appropriate generalizations are m ade w ithin distinct knowledge bases. In this

way, formal rigour and consistency can be m aintained. W hen th e syntax and

th e sem antics are com pleted for some phase of th e project, they are auto­

m atically compiled into a unified fram ework sim ilar to a feature gram m ar

w ith mixed syntactic and sem antic features. G rishm an observes th a t parsers

based on conceptual dependency can be characterized as being “guided by se­

m an tic...p attern s and then applying (lim ited) sy ntactic checks, whereas most

parsers are guided by syntactic p a tte rn s and then apply sem antic checks”

(G rishm an 1986: 121). SYNAPSIS tre a ts neither syntax nor sem antics as

prim ary, bu t instead it merges the two into a genuinely mixed gram m ar which

is nonetheless easily portable and modifiable.
Syntax in SYNAPSIS is expressed in term s of DG. T he choice is n a tu ­

ral, given the adoption of caseframe sem antics. B oth system s are w ord-based

and both directly encode the notion of a head or governor and a set of de­

pendents or modifiers. I have already observed th a t syntactic and sem antic

dependencies are isom orphic in m any cases. H udson’s description of W G can

be taken as a particularly em phatic expression of a widely-held view am ongst

DG practitioners:

T he parallels [between syntactic stru ctu re and sem antic structure]


are in fact very close — virtually every word is linked to a single
elem ent of the sem antic stru ctu re, and the dependency relations
betw een the words are typically m atched by one of tw o relations
betw een th eir meanings: dependency or identity. M oreover, if word
A depends on word B, and th e sem antic relation betw een th em is
dependency, th en the dependency nearly always goes in th e same
direction as in th e syntax — th e m eaning of A depends on th a t of
B (Hudson 1990: 123).

In order to avoid term inological com m itm ent to regarding either sy n tax or

sem antics as basic, rules containing merged syntactic and sem antic constraints

are given the neutral name knowledge sources.^

These knowledge sources are used by th e parser in a m ixed top-dow n and

b o tto m up control strategy which em bodies the principles of best-first search.

11.3.2 Dependency grammar

T he DG used in SYNAPSIS is defined as a tuple

DG = {C, R}

in which C is a set of lexical categories and R is a set of rules of the form:

(81)

a. X0 = X1 X2 ... Xi * Xi+1 ... Xn

b. X0 = *
^Although the term ‘knowledge source’ is typically associated with blackboard systems,
the SYNAPSIS system is not described as such in any of the published accounts I have seen.

Xi ∈ C and n > 0.

S tan d ard constraints on sentence well-formedness apply. This is a very slight

m odification to the Gaifman rule form at. Notice th a t because the gram m ar is

defined as a tuple, rath er th an as a 4-tuple, it is not possible to refer to specific

words in the gram m ar. This makes it difficult to express th e strongest possible

predictions which the gram m ar ought to be able to make, nam ely predictions of

single words. For example, the English verb depend requires a nom inal subject

and a com plem ent which m ust be the word on. This observation is very robust

and could be used to direct word recognition w ith pinpoint accuracy. It would

be simple to express the rule in a 4-tuple DG as follows:

(82)

depend = NOUN * on

T he best th a t can be done in a DG of th e sort used in th e SYNAPSIS project

is the following:

(83)
DEPEND = NOUN * ON

where ‘D E P E N D ’ and ‘O N ’ are classes which each possess exactly one member.

G aifm an form at rules and their im m ediate n otational relatives m ay be ap­

propriate for describing formal languages b u t, like stan d ard phrase stru ctu re

rules, they are ill-equipped for making th e full range of generalizations relevant

to the syntax of n a tu ra l languages. In order to cope w ith phenom ena such as

m orphosyntactic agreem ent, it is necessary to augm ent the basic rule set. P re­

vious chapters have docum ented how a popular approach has been to define

DGs in term s of complex feature sets which are com binable by unification.

T he approach adopted in SYNAPSIS is to a tta c h conditions to rules. These

conditions take the form of a word class label (which m ust be present in the

rule) followed by arbitrarily m any feature-value pairs. Instead of a value, a

variable (a character preceded by ‘?’) may be used. W here the same variable

is used in two conditions applying to th e same rule, coreference is indicated.

For example,
Rule:        VERB = ART ADJ NOUN * ART ADJ NOUN

Conditions:  VERB: (PERSON (3)) (NUMBER ?X)
             ART:  (NUMBER ?X) (GENDER ?Y)
             ADJ:  (NUMBER ?X) (GENDER ?Y)
             NOUN: (NUMBER ?X) (GENDER ?Y)
             ART:  (NUMBER ?Z) (GENDER ?W)
             ADJ:  (NUMBER ?Z) (GENDER ?W)
             NOUN: (NUMBER ?Z) (GENDER ?W)

T he rule and conditions indicate th a t if th e head verb is in th e th ird person,

the article, adjective, and noun preceding it m ust agree w ith it in num ber and

w ith each other in gender. T he article, adjective, and noun following the verb

are not required to agree w ith it at all b u t they m ust agree w ith each other in

gender and num ber. (This exam ple is appropriate for Italian b u t not wholly

appropriate for English’s much sparser agreem ent system ). Published accounts

do not m ake clear how the particu lar symbol in a rule (e.g. one of the two

‘N O U N ’ symbols) is distinguished in the conditions.

It would be straightforw ard to convert a gram m ar expressed in this form

into a unification-based representation such as PATR-II

(Shieber 1986).
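
To give a flavour of such a conversion, the agreement rule above might come out roughly as follows in a Prolog rendering of the unification idea (an illustrative sketch, not the format actually compiled by the CSELT system):

    % Each pre- or post-head slot is a term cat(Category, Number, Gender);
    % shared Prolog variables play the role of ?X, ?Y, ?Z and ?W in the
    % conditions above.
    rule(verb(person(3), number(NumX)),
         [ cat(art,  NumX, GenY),     % pre-head article
           cat(adj,  NumX, GenY),     % pre-head adjective
           cat(noun, NumX, GenY),     % pre-head noun
           head,                      % position of the governing verb
           cat(art,  NumZ, GenW),     % post-head article
           cat(adj,  NumZ, GenW),     % post-head adjective
           cat(noun, NumZ, GenW) ]).  % post-head noun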

11.3.3 Caseframes

A caseframe represents the sem antic valency of a head word. It contains any

num ber of case slots (i.e. roles) and constraints on the types of possible

slot fillers. Some slots m ust be filled; others are optional; they correspond to

necessary p arts of the state, action, or entity the caseframe describes b u t they

do not necessarily have to be m ade explicit in linguistic accounts of th e state,

action, or entity. (This is rem iniscent of W ilks’ (1875) Preference Semantics.

Caseframes in SYNAPSIS are represented in term s of Conceptual Graphs

(Sowa 1984). A detailed introduction to th e conceptual graph notation is

unnecessary for the purposes of the present discussion. T he exam ple shown

240
[LOCATED-IN-REGION]
(AGNT:Compulsory) [MOUNT+PROVINCE+LAKE]
—> (LOC:Compulsory) — [REGION]

Figure 11.2: a SYNAPSIS caseframe

in Figure 11.2 should serve to illustrate w hat a caseframe looks like. (The

exam ple is taken from Giachin and Rullent 1989: 1538.)

This indicates th a t the word whose m eaning is identified as ‘[LOCATED-

IN -R EG IO N ]’ requires an A G EN T of type M OUNT or PR O V IN C E or LAKE

and a LOCATION of type REGION . N either slot m ay be left unfilled in a

sem antically well-formed utterance. Notice th a t both the syntax and the se­

m antics are expressed in declarative formalisms.

11.3.4 Knowledge sources

The dependency rules and the caseframes are not used serially, with one rule set producing an initial analysis which is passed to the other for completion. Instead, the syntactic and semantic rules are combined to form a unified syntactico-semantic grammar in which both types of constraint apply at the same time. In principle, the combining of syntactic and semantic constraints could be done 'on the fly' during sentence processing, thus creating the resources to meet the particular needs of the moment. In practice, this would almost certainly be costly in terms of processing time and it could result in the same combination having to be performed many times during a single recognition session. The obvious solution — and the one adopted in SYNAPSIS — is to pre-compile the syntactic and semantic information into its unified format.

Figure 11.3 shows parts of a syntactic dependency rule. It refers to a present indicative verb with two dependents, one preceding it and the other following it. The following noun must agree in number with the verb. Comments are preceded by ';;'.

VERB(prop) = NOUN(interr-indir-loc) <GOVERNOR> NOUN(subj)
;; Features and agreement
<GOVERNOR> (MOOD ind) (TENSE pres) (NUMBER ?X) ...
NOUN-1 ...
NOUN-2 (NUMBER ?X)

Figure 11.3: a SYNAPSIS dependency rule

[TO-HAVE-SOURCE]
    -> (AGNT:Compulsory) -> [RIVER]
    -> (LOC:Compulsory) -> [MOUNT]

Figure 11.4: another SYNAPSIS caseframe

Figure 11.4 is a caseframe indicating that the word whose meaning is identified as '[TO-HAVE-SOURCE]' requires an AGENT of type RIVER and a LOCATION of type MOUNTAIN. Neither slot may be left unfilled in a semantically well-formed utterance.

Combining the semantic information expressed in Figure 11.4 with the syntactic information shown in Figure 11.3 produces the knowledge source (KS) shown in Figure 11.5. (All of these data structures are taken from Giachin and Rullent 1988: 198.)

;; Composition
TO-HAVE-SOURCE = MOUNT <HEADER> RIVER
;; Constraints
<HEADER>-MOUNT ((H-cat VERB) (S-cat NOUN) (H-feat MOOD ind TENSE pres ...) ...)
<HEADER>-RIVER ...
;; Header activation condition
ACTION(TO-HAVE-SOURCE)
;; Meaning
(TO-HAVE-SOURCE ! * agnt 1 loc 0)

Figure 11.5: a SYNAPSIS knowledge source

The 'composition' entry indicates that the syntactico-semantic head, which is of semantic type TO-HAVE-SOURCE, must be preceded by an element of semantic type MOUNT and followed by an element of semantic type RIVER. The first 'constraint' entry states that the head word must be a present indicative verb and that the MOUNT element must be realized as a noun. The 'header activation condition' is a flag to tell the parser how to use the KS. The 'meaning' entry is used to construct the compositional semantics of the construction headed by a verb of semantic type TO-HAVE-SOURCE.

Having examined the knowledge representations used in SYNAPSIS, we are now ready to consider the parsing procedures it uses. Two versions of the parser will be presented: a straightforward sequential parser and a parallel version designed to decrease the amount of time required to produce plausible interpretations of spoken sentences.

11.3.5 The sequential parser

The input to the parser is an entire lattice. In other words, syntactic and semantic constraints are used together to find a plausible path through an existing lattice; they are not used to guide the construction of the lattice. One argument in favour of left-to-right incremental processing is that a real-time system cannot afford to wait until the end of the sentence has been reached before starting to analyse what has been said. The argument advanced by the SYNAPSIS designers is that a real-time system cannot afford to start parsing as soon as the left-hand side of the lattice has been built since there is always the possibility that word recognition may be locally poor and this would lead to a lot of wasted effort. It is much more prudent, they argue, to wait until the end of the sentence and then begin parsing from the word in the lattice with the highest confidence score. There is a reasonable chance that the highest scoring word will have been recognized correctly, and this allows fairly reliable top-down predictions to be used to guide search in the less well-scored parts of the utterance. Their claim is that this non-linear incremental process results in a quicker and, more importantly, a more reliable result than would be produced by a left-to-right analysis. It is worth flagging one problem with the CSELT approach, namely the fact that identifying the end of a spoken utterance is a non-trivial task. Full stops are not typically vocalized! It is possible to imagine a number of heuristics which might be useful, such as timing pauses against some threshold, or arbitrarily insisting that a sentence may not exceed n seconds in duration. However, none of these would be foolproof. It is not clear how SYNAPSIS copes with this problem.

What I have just outlined is a best-first parsing strategy which begins with the highest-scoring word hypothesis and uses it to generate predictions which can be tested against the next highest-scoring hypotheses and so on. A parser scheduler controls the process by means of a number of operators:

ACTIVATION  This operator selects the highest-scoring word hypothesis and finds a KS for which it could be the header. The word hypothesis and the prediction are combined to produce a tree-structured deduction instance (DI). The tree structure derives from the fact that the instantiated header has unfilled case slots. One way of viewing DIs is as phrase hypotheses.

VERIFY  This operator is used to fill a case slot in the current DI with a word hypothesis.

MERGE  This operator is like VERIFY except that it is used to fill a case slot in the current DI with another DI rather than a word hypothesis. In other words, it is used to merge two tree structures.

PREDICTION  This operator is used if the current DI is a fact, i.e. it has no unfilled slots. If the DI is of type T, then this can be used to instantiate another DI having a slot for a filler of type T.

At least one other operator, SUBGOALING, is available. It is rather more complex since it is used to decompose and rearrange existing tree structures. It is not necessary to be familiar with its action in order to appreciate the general strategy of the parser.

Roughly speaking, parsing proceeds as follows. To begin with, the highest-scoring word is used to construct a DI (ACTIVATION). Next, the empty slots in the DI are used to generate predictions. For example, a slot may require a filler which is syntactically a NOUN and semantically a REGION. All the word hypotheses in the lattice are checked using the VERIFY operator. When several hypotheses meet these conditions, the best scoring hypothesis is activated while the others are stored in a 'waiting' list until such time as the current score is worse than their score. In the meantime, the best-scoring hypothesis is used as the filler for the relevant case slot.

When a word is used to create a new DI, the word's confidence score is assigned to the DI where it is known as the quality factor of the DI. When a word is added to an existing DI, the confidence score of the word hypothesis and the quality factor of the DI are combined to produce a new quality factor for the DI. The best way to compute this new quality factor is an open research question. Versions of SYNAPSIS have been tried out which calculate quality factors on the basis of joint probabilities (i.e. the sum of the word hypotheses scores), and of score density (with or without shortfall) (Woods 1982).

Once there is at least one DI available in the system, control passes back and forth between deduction and activation cycles. Deduction starts from the highest-scoring DI and tries to extend it in the following ways:

1. if it is a fact DI (i.e. it has no empty slots), by making it the filler for a slot in another DI (PREDICTION), or

2. finding a filler in the word lattice for one of its case-slots (VERIFY), or

3. merging it with another DI (MERGE).

The highest-scoring candidate is always chosen first, whether it is a DI or a word hypothesis. When the best DI has a quality factor worse than the best word hypothesis, the activation cycle begins and the next highest-scoring word in the lattice is extracted and used to construct a new DI (ACTIVATION).

The parse is complete when a single DI with no unfilled compulsory case slots covers the same time period as the entire lattice.
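The control regime just described can be summarized schematically. The Prolog skeleton below is my own sketch of the alternation between deduction and activation, not CSELT's code; the helper predicates best_word/3, activation/2, best_di/3, complete/2 and deduce/4 are hypothetical placeholders for the operators and tests described above and are left undefined.

% Schematic best-first control loop (illustrative only).
parse(Lattice, Result) :-
    best_word(Lattice, Word, _Score),        % highest-scoring word hypothesis
    activation(Word, FirstDI),               % ACTIVATION: build the first DI
    cycle([FirstDI], Lattice, Result).

% Stop when some DI with no unfilled compulsory slots covers the lattice.
cycle(DIs, Lattice, DI) :-
    member(DI, DIs),
    complete(DI, Lattice), !.
% Otherwise pick the best-scoring candidate, whether DI or word hypothesis.
cycle(DIs, Lattice, Result) :-
    best_di(DIs, BestDI, QualityFactor),
    best_word(Lattice, Word, WordScore),
    (   QualityFactor >= WordScore
    ->  deduce(BestDI, DIs, Lattice, DIs1)   % PREDICTION, VERIFY or MERGE
    ;   activation(Word, NewDI),             % ACTIVATION of a new DI
        DIs1 = [NewDI|DIs]
    ),
    cycle(DIs1, Lattice, Result).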

The parser is described as following a best-first search strategy but the (available) SYNAPSIS literature does not indicate whether a depth-first or a breadth-first strategy is adopted at choice points with no measure of 'goodness' available to guide the choice. For example, it is not clear from the literature how indeterminacies caused by lexical ambiguities are resolved. One solution might be to rank order knowledge sources having the same header type and to insist that the highest-ranking knowledge source be used first. Taking this suggestion further, knowledge sources having the same header type could be assigned probabilities relative to each other (established on the basis of corpus analysis). These probabilities could potentially be used to weight lexical confidence scores in the computation of quality factors.
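The weighting envisaged in the last sentence could be as simple as the following one-clause sketch; the predicate name and the multiplicative combination are purely illustrative assumptions on my part, not something reported for SYNAPSIS.

% weighted_score(+Confidence, +KSProbability, -Weighted):
% scale a lexical confidence score by a corpus-derived KS probability
% before it enters the quality factor computation (illustrative only).
weighted_score(Confidence, KSProbability, Weighted) :-
    Weighted is Confidence * KSProbability.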

The way in which SYNAPSIS constructs analyses bears a certain similarity to the method of the HWIM system (Woods 1982) which builds an 'island' around the highest-scoring word in the lattice. The crucial difference is that the SYNAPSIS system does not require phrases to be contiguous, whereas a standard island parser does.^ Presumably the possibilities for building spurious discontinuous constituents are less for Italian than for English because of the additional explicit morphosyntactic agreement in Italian. More importantly, the co-presence of semantic constraints and syntactic constraints ought to rule out most of the spurious discontinuities a purely syntactic parser would allow. Desired discontinuities (e.g. questions, topicalizations) would be parsed without difficulty.

^Perhaps SYNAPSIS should be termed an 'archipelago parser' rather than an 'island parser'.

Jollies

Function words cause serious problems for all speech recognition systems. Because most function words are both short and typically unstressed, they are often not recognized at all. If the function words are not recognized they are absent from the lattice. This can cause problems for parsers when they try to build constructions which require function words. Even if the presence of a function word is spotted, it may be very difficult to identify which function word it is. In general, the longer a word is, the easier it is to identify with confidence. The shorter a word is, the harder it is to recognize. So, for example, it is much easier to recognize hippopotamus than an (which could be confused with on, and, a, at, ant, etc.).

In the SYNAPSIS system, words which are considered to have only a functional role are known by the charming name jollies. A robust speech parser ought to be able to proceed without jollies in most cases. On the other hand, it ought to be able to find them if they are present in the lattice since some jollies make a useful contribution to parsing. For example, if they are recognized they help to ensure that the correct path through the lattice is temporally coherent. Not all jollies are short, and some may have good confidence scores associated with them, so it is desirable to use them when they are available.

In SYNAPSIS, "the general philosophy is to ignore a jolly unless there are substantial reasons to consider it" (Giachin and Rullent 1988: 199). All jollies are treated as terminal slots in their KS. There may be syntactic or even semantic constraints on them but they do not contribute to the compositional semantics. Since they are assumed to have no semantic predictive power, jollies are not available for manipulation by the standard operators. Instead, a special operator, JVERIFY, is used specifically for the purpose of filling jolly slots.

The operation of JVERIFY depends on the JOLLY-TYPE of a jolly slot. There are three JOLLY-TYPES: SHORT-OR-INESSENTIAL, LONG-OR-ESSENTIAL, and UNKNOWN. The type of the jolly slot is worked out during parsing on the basis of "the lexical category assigned to the jolly slot, the temporal, morphologic and semantic constraints imposed on that slot by other word hypotheses, and the availability of such data" (ibid.).

If the jolly is of type LONG-OR-ESSENTIAL, it must be found in the lattice. Failure to find it will result in the parse failing just as though it were a content word which were missing.

If the jolly is of type SHORT-OR-INESSENTIAL, it is ignored. That is, the lattice is not searched in order to find it. However, it is necessary to assign a short time period to the slot, just in case a jolly is present. If this time period were not inserted, the correct path through the lattice would not be temporally coherent. To allow for a range of durations the time period is given fuzzy boundaries.

If the jolly is of type UNKNOWN, it is treated much as though it were of type SHORT-OR-INESSENTIAL, but this is followed by a brief search of the lattice to see if any jollies of greater duration than the maximum dummy duration can be found in the lattice. This is done just in case a long jolly with a good confidence score is present. If one is found, it is entered in the slot; otherwise the dummy is left in place.
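The three-way dispatch just described can be pictured as follows. This is my own schematic rendering, not CSELT's implementation; the predicates find_in_lattice/3, fill_slot/2, assign_fuzzy_time_period/1 and find_long_jolly/3 are hypothetical placeholders and are left undefined.

% jverify(+JollyType, +Slot, +Lattice): schematic treatment of jolly slots.
jverify(long_or_essential, Slot, Lattice) :-
    find_in_lattice(Slot, Lattice, Jolly),     % must be found, or the parse fails
    fill_slot(Slot, Jolly).
jverify(short_or_inessential, Slot, _Lattice) :-
    assign_fuzzy_time_period(Slot).            % reserve a short, fuzzily bounded span
jverify(unknown, Slot, Lattice) :-
    assign_fuzzy_time_period(Slot),
    (   find_long_jolly(Slot, Lattice, Jolly)  % accept only a jolly longer than
    ->  fill_slot(Slot, Jolly)                 % the maximum dummy duration
    ;   true                                   % otherwise keep the dummy
    ).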

Figure 11.6 shows a simplified DI, based on the KS in Figure 11.5 and the sentence Da quale monte nasce il Tevere? ("From which mountain does the Tiber originate?"). (The example is taken from Giachin and Rullent 1988: 198.) The DI shows nasce as root, with monte and Tevere as slot fillers. Monte has two slots, neither of which has yet been filled. The SPEC slot will eventually be filled with quale, while the JOLLY slot corresponding to da may remain unfilled unless it is judged to be of type LONG-OR-ESSENTIAL. Tevere also has a JOLLY slot but the jolly has already been classified as 'missing'. Notice that this does not necessarily mean that it is absent from the lattice, although that may be the case. What it means is that the jolly has been judged to be superfluous to requirements.

Type: TO-HAVE-SOURCE
Header: NASCE
Left: MOUNT                       Right: RIVER

  Type: MOUNT                       Type: RIVER
  Header: MONTE                     Header: TEVERE
  Left: JOLLY (to be solved),       Left: JOLLY [missing]
        SPEC (to be solved)         Right: none
  Right: none

Figure 11.6: a simplified DI showing jolly slots

Statistics

The sequential SYNAPSIS parser was implemented in Common Lisp. It makes use of around 150 KSs and has a 1011-word lexicon. No details of the linguistic coverage are available, although the grammar is said to have a branching factor of about 35. SYNAPSIS was tested on 150 lattices produced from normally intoned continuous speech recorded in an office environment. Overall, about 80% of the utterances were analyzed correctly. About 75% of lattices with missing jollies were analyzed correctly. This figure did not increase significantly as the number of missing jollies per utterance increased. Thus, the SYNAPSIS parser may be judged to be a very successful lattice parser by current standards.

11.3.6 The parallel parser

A crucial factor in parsing spoken language is processing speed. The sequential version of SYNAPSIS took an average of about 40 seconds to parse sentences in the test set. (The average sentence length was 7-8 words.) This is clearly too long for most practical purposes. In response to the need for better speed results, the developers of SYNAPSIS implemented a parallel version of their parser. While a full exposition of its detail would be inappropriate here, it is worth mentioning a few of its main features.

The sequential parser is based on a processor which uses the KSs to build DIs. The parallel parser consists of n processors called distributed problem solvers (DPSs), each with the full inferencing capabilities of the sequential parser. However, each DPS only has access to a subset of KSs. Thus, each DPS can be viewed as the expert on a small number of syntactico-semantic constraints. In most cases it will be necessary for the experts to collaborate in order to solve a parsing problem.

Distributing the knowledge base does not automatically yield a speed-up. If anything, the opposite could be expected since there is now a communications overhead. To effect a speed-up it is also necessary to distribute the tasks in such a way that the DPSs are working concurrently. This is achieved by breaking up the parse trees (i.e. the DIs) into one-level sub-trees. For example, the tree in Figure 11.7 would be represented as a collection of sub-trees, as shown in Figure 11.8. (The representation used here is non-standard. Lines connect lower dependents to higher heads. The reason for employing this graphical device is similar to that which motivated the use of non-standard trees in the DLT project, namely the need to represent dependency structure independently of word order. Each node in this tree represents a single word. Left to right ordering is not significant.)

Figure 11.7: a single parse tree

Since a parse tree (DI) can now be distributed amongst several DPSs, it is possible for different parts of it to be developed concurrently. For example, one processor might know about KSs of type MOUNT while another knows about KSs of type RIVER. The left and right branches of the DI shown in Figure 11.6 above could now be grown in parallel.

Figure 11.8: a distributed representation of the same parse tree
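The decomposition into one-level sub-trees is easy to state directly. The following sketch is mine, not part of the CSELT implementation; tree(Head, Children) and unit(Head, ChildHeads) are assumed representations invented for the illustration.

% split(+Tree, -Units): break a dependency tree into one-level sub-trees,
% each pairing a head with the heads of its immediate dependents.
split(tree(Head, Children), [unit(Head, ChildHeads) | Units]) :-
    findall(H, member(tree(H, _), Children), ChildHeads),
    findall(U,
            ( member(Child, Children),
              split(Child, ChildUnits),
              member(U, ChildUnits) ),
            Units).

For example, split(tree(nasce, [tree(monte, []), tree(tevere, [])]), Units) yields one unit each for nasce, monte and tevere, and each unit could then be handled by a different DPS.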

The parallel version of SYNAPSIS has been implemented on a pool of Symbolics Lisp Machines communicating via Ethernet. The system has been shown to work but the relatively slow Ethernet is a major hindrance to recording significant speed-ups. In fact, no parsing speeds for parallel SYNAPSIS are reported in the literature. The designers have signalled their intention to implement the parser on a Transputer-based distributed architecture.

This sketch of the parallel version of SYNAPSIS has necessarily been brief. More details can be found in Giachin and Rullent (1989).

11.4 Summary

SYNAPSIS is unique amongst the parsers reported in this survey in addressing the distinctive problems of parsing spoken, rather than written language. Instead of starting from an input with distinguished words it is necessary to start from a mesh of alternative hypotheses which may not even include all of the words uttered. SYNAPSIS uses a language model based on DG at the syntactic level and caseframes at the semantic level. DG builds on the notion of lexical combination; caseframes build on the notion of lexical concept combination. In order to bring together syntactic and semantic constraints at parse time, DG rules and caseframes are combined to form syntactico-semantic knowledge sources. It is also possible to conceive of the two being brought together elegantly in a unification DG. For example, a unification-based version of Lexicase, with its battery of syntactic, semantic, and case features would provide a theoretically motivated base for a SYNAPSIS-type parser.

The special requirements of speech parsing have led to the development of a parallel version of the SYNAPSIS parser. This also marks SYNAPSIS out as unique in this survey of DG parsers.

I have not provided a formal PARS description of the SYNAPSIS parsing algorithm. PARS is designed for the description of text parsers and would have to be extended significantly to do justice to SYNAPSIS. The purpose of expressing algorithms in PARS is to facilitate comparison of different dependency parsing algorithms. SYNAPSIS is so different from the other parsers that a PARS description would not be of much assistance. This difference is underlined by the observation that although the other parsers differ in respect of the order in which they construct parse trees, each individual parser is only capable of constructing a parse tree in one order for each grammar. If SYNAPSIS were to be used to parse several different utterances of a test sentence, it would most probably add branches to its parse tree in a different order each time. This is because SYNAPSIS is strongly guided by acoustic confidence scores, as well as by the grammar.

In spite of the differences, an examination of SYNAPSIS is profitable in serving to illustrate some novel ways in which DGs can be applied in the solution of practical problems.

The main features of the serial SYNAPSIS parser are summarized in Table 11.1.
Table 11.1: main features of the SYNAPSIS dependency parser

Search origin            bottom, then mixed
Search manner            best-first
Search order             score-driven
Number of passes         one
Search focus             heads and dependents seek each other
Ambiguity management     best-scoring parse only
Chapter 12

Elements of a taxonomy of dependency parsing

    Let the teacher, or the man of science who does not always fully
    appreciate grammar, consider for a moment the mental processes
    a boy is putting himself through when he parses a sentence, and
    he will see that there is in intelligent and accurate parsing a true
    discipline of the understanding. (Laurie 1893: 92)

In this chapter I examine a number of dependency parsing parameters to see how they compare with the corresponding parameters of PSG parsing. In so doing, I outline the elements of a first general taxonomy of dependency parsers.

My approach is driven by the results of the preceding survey of existing dependency parsers. The parameters I shall investigate are those I have used in summarizing the properties of each parser surveyed, namely origin of search (Section 12.1), manner of search (Section 12.2), order of search (Section 12.3), number of passes (Section 12.4), focus of search (Section 12.5), and ambiguity management (Section 12.6). In addition, I shall examine the role of the adjacency constraint in dependency parsing (Section 12.7).

12.1 Search origin

In PSG parsing, search can proceed bottom-up, top-down, or some mixture of the two. At a coarse level, the same is true of dependency parsing. Table 12.1 records the origin (and direction) of search for each of the parsers surveyed.
Table 12.1: origin of search—summary

Dependency Parser        Search origin
Hays (bottom-up)         bottom-up
Hays (top-down)          top-down
Hellwig (PLAIN)          bottom-up
Kielikone (ADP)          bottom-up
DLT (ATN)                top-down
DLT (probabilistic)      bottom-up
Lexicase (Starosta)      top-down
Lexicase (Lindsey)       unspecified
WG (Fraser)              bottom-up
WG (Hudson)              bottom-up
CSELT (SYNAPSIS)         bottom, then mixed
Covington (1 & 2)        bottom-up

Do the familiar terms 'bottom-up' and 'top-down' have their usual meanings when applied to dependency parsers?

A first answer must be 'yes'. Bottom-up parsing starts from the words in a string and uses a grammar to combine the words into constructions. In the case of PSG, the constructions are phrases; in the case of DG, the constructions are head-dependent relata. Top-down parsing starts from the rules in a grammar and attempts to find realizations of structures generated from the rules in the string.

A second answer, however, must be 'no', the terms do not mean exactly the same for dependency and constituency parsers. Whereas in PSG there may be a tree of arbitrary depth between a grammatical start symbol at the 'top' and the word instances at the 'bottom', in DG this is not the case. The start symbol of a DG is a word or, at worst, a word class. There are no nodes intermediate between the 'top' (start) node and the 'bottom' (word) node attached to it. The number of nodes in a dependency tree may not exceed the number of words in the sentence whose structure the tree describes. Whereas PSG trees can be arbitrarily deep (unless the PSG is expressly constrained to prevent this), DGs — in just the way indicated — are necessarily shallow.
In the three following subsections I shall examine these concepts in more detail.

12.1.1 Bottom-up dependency parsing

It is immediately noticeable that the majority of the parsers listed in Table 12.1 search bottom-up, i.e. eight out of twelve, with one unspecified. This probably reflects the general word-centred view adopted by dependency grammarians.

A bottom-up PSG parser attempts to take some contiguous group of words and replace them by a single phrase; a bottom-up DG parser attempts to take a group of words (in many cases, exactly two words), and replace them by whichever word is deemed to be the head. Thus, both kinds of parser effect a reduction. Since a DG parser can only effect reductions by discarding one or more words while retaining a lexical head, there is a strict upper bound on the number of reduction steps required (ignoring any requirements for backtracking), i.e. n - 1, where n is the number of words in a sentence. No such upper bound can be placed on a PSG parser, unless the rules the grammar uses are restricted so that, for a rule of the form α → β, where α and β are strings of symbols, |β| > |α|.

Shift-reduce bottom-up dependency parsing

A shift-reduce dependency parsing algorithm can be defined as follows:

1. Let G be a DG in Gaifman format, except that '*' in the body of each rule is replaced by the symbol corresponding to the head of the rule.

2. Let S be an input string.

3. Shift a word from S onto a stack unless S is empty, in which case go to step 5.

4. Check whether one or more words at the top of the stack exactly matches the body of one of the rules in G. There are two possible outcomes:

   (a) they match, in which case pop the matching words off the stack and push the word which matched the head of the rule back onto the stack. Repeat step 4.

   (b) they do not match, in which case repeat step 3.

5. If there is exactly one element on the stack then succeed, otherwise fail.

This is essentially the algorithm implemented in Prolog in the file shift_reduce.pl in Appendix A.3.
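To make the steps concrete, here is a toy rendering of the algorithm. It is an illustrative sketch of my own, not the code in Appendix A.3; head_rule/2, root/1 and the two-rule grammar are invented for the example, and, as in the traces below, words are represented directly by their word class symbols.

% head_rule(Head, Body): a Gaifman rule with '*' already replaced by the
% head symbol, so the body is the full left-to-right pattern (step 1).
head_rule(n, [det, n]).              % n(det, *)
head_rule(v, [n, v]).                % v(n, *)
root(v).                             % *(v)

parse(Words) :-
    shift_reduce(Words, [], [Root]), % step 5: exactly one element left
    root(Root).

shift_reduce([], Stack, Stack).
shift_reduce([W|Ws], Stack, Final) :-   % step 3: shift the next word
    reduce_all([W|Stack], Reduced),     % step 4: reduce as often as possible
    shift_reduce(Ws, Reduced, Final).

% The stack grows leftwards, so rule bodies are matched in reverse.
% As in step 4, reduction is applied greedily whenever possible; a fuller
% implementation would backtrack over shift/reduce choices.
reduce_all(Stack, Reduced) :-
    head_rule(Head, Body),
    reverse(Body, ReversedBody),
    append(ReversedBody, Rest, Stack),
    !,
    reduce_all([Head|Rest], Reduced).
reduce_all(Stack, Stack).

With this grammar, parse([det, n, v]) succeeds, reducing det n to n and then n v to v.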

In cases where there are equivalent PSG and DG analyses of a sentence, the number of reductions required is identical for shift-reduce parsers of both varieties.

PSG               DG
V → N V P         [1] v(n,*,p)
N → A N           [2] n(a,*)
P → P N           [3] p(*,n)
                  [4] *(v)

PSG and DG analyses of the sentence Tall people sleep in long beds are shown in Figure 12.1.

Figure 12.1: PSG and DG analyses of the sentence Tall people sleep in long beds

The PSG shift-reduce parse trace is given below. Word class assignments are not shown. The number of each rule used to effect a reduction is given in square brackets. ('□' indicates the bottom of the stack.)

□ A
□ A N
□ N          [2]
□ N V
□ N V P
□ N V P A
□ N V P A N
□ N V P N    [2]
□ N V P      [3]
□ V          [1]

The DG shift-reduce parse trace is given below.

□ a
□ a n
□ n          [2]
□ n v
□ n v p
□ n v p a
□ n v p a n
□ n v p n    [2]
□ n v p      [3]
□ v          [1]

In this case, the numbers of shift and reduce operations are identical for PSG and DG systems. The number of shift operations is fixed for all versions of shift-reduce parsing, i.e. it is equal to the number of words in the sentence being parsed. The smallest number of reduce operations possible for any sentence is also the same, in principle, for PSG and DG, namely 1. This is because it is possible either to make all words in a sentence belong to a single phrase, or to make all words in a sentence depend on a single head. The maximum number of reduction operations is also the same for PSG parsing and DG parsing, with one important exception which I shall describe shortly. In PSG parsing, the number of reductions equals the number of phrases. The maximum number of phrases for an arbitrary sentence is achieved with a binary branching phrase structure tree. The number of phrases in a binary branching tree for an n-word sentence is n - 1. In DG parsing, the number of reductions equals the number of primary dependencies in the sentence. (I use the term primary dependencies to mean dependencies which are found by search, rather than by derivation from existing dependencies, as in the case of dependent-sharing.) The maximum number of primary dependencies — in fact, the required number of primary dependencies — in an n word sentence is n - 1.

This equivalence excludes those PSGs which allow unit rewrite rules, i.e. productions having the form

α → β

in which both α and β are single non-terminal symbols. In this case, the maximum number of reductions required is not bounded, given an arbitrary sentence and an arbitrary grammar. I know of no version of DG which would not place an upper bound of n - 1 on the number of reductions.

Incremental bottom-up dependency parsing

The shift-reduce dependency parser I have just described is a reasonably faithful DG version of a well-known PSG parsing algorithm. However, none of the bottom-up DG parsers described in the preceding survey uses this kind of algorithm. The shift-reduce dependency parser is required to wait until all of the dependents of some head are available in a contiguous block at the top of the stack before it can effect a reduction. All of the other bottom-up dependency parsers I have described establish dependency links between heads and dependents as soon as both become available, and independently of any other dependency relations involving the same head. This results in the incremental building of dependency structures. This process is centred on relations rather than constructions.

I believe that the difference between shift-reduce dependency parsing and incremental bottom-up dependency parsing can be characterized in the following way. In shift-reduce parsing, words (or word class labels or feature structures) are put on the stack and grammar rules are used to license reductions. In incremental parsing, sentence words are used to pick out rules headed by these words and these rules are then put on the stack. A slightly more complex general rule is then used to effect reductions. A first characterization of the Rule of Reduction is given below. The rule has two clauses, as follows (α and β are arbitrary strings of dependent symbols, including the empty string):

The Rule of Reduction

1. If a rule of the form X(α,Y,*,β) is the top element of a stack and the next element is a rule of the form Y(*), then pop the top two stack elements and push a new element of the form X(α,*,β) onto the stack.

2. If a rule of the form X(*,α) is the top element of the stack and the next element is a rule of the form Y(*,X,β), then pop the top two stack elements and push a new element of the form Y(*,α,β) onto the stack.

If all words in the input sentence have been read and the only rule on the stack has the form X(*) and there is a rule of the form *(X) in the grammar then succeed. Otherwise fail.

A trace of the bottom-up incremental parse of the sentence Tall people sleep in long beds, using the same grammar as before, is presented below. Stack items are separated by means of the '|' marker. The bracketed numbers indicate which clause of the Rule of Reduction has been applied.

□ a(*)                         tall
□ a(*) | n(a,*)                people
□ n(*)                         [1]
□ n(*) | v(n,*,p)              sleep
□ v(*,p)                       [1]
□ v(*,p) | p(*,n)              in
□ v(*,n)                       [2]
□ v(*,n) | a(*)                long
□ v(*,n) | a(*) | n(a,*)       beds
□ v(*,n) | n(*)                [1]
□ v(*)                         [2]

The similarities with functional application in CG should be readily apparent. Clause [1] of the Rule of Reduction is the dependency correlate of CG backward application and clause [2] of the Rule of Reduction is the dependency correlate of CG forward application.

An implementation of an incremental shift-reduce dependency parser which makes use of the Rule of Reduction can be found in incremental_shift_reduce.pl in Appendix A.3.
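The Rule of Reduction translates almost clause for clause into Prolog. The sketch below is my own illustration, not the code in Appendix A.3; it assumes a representation rule(Head, LeftDeps, RightDeps), so that v(n,*,p) is written rule(v,[n],[p]), and the toy lexicon lex/2 is invented for the example.

% Clause 1: X(Alpha,Y,*,Beta) on top of Y(*) reduces to X(Alpha,*,Beta).
reduce([rule(X, LeftDeps, Beta), rule(Y, [], []) | Rest],
       [rule(X, Alpha, Beta) | Rest]) :-
    append(Alpha, [Y], LeftDeps).
% Clause 2: X(*,Alpha) on top of Y(*,X,Beta) reduces to Y(*,Alpha,Beta).
reduce([rule(X, [], Alpha), rule(Y, [], [X|Beta]) | Rest],
       [rule(Y, [], AlphaBeta) | Rest]) :-
    append(Alpha, Beta, AlphaBeta).

% Toy lexicon for Tall people sleep in long beds.
lex(tall,   rule(a, [],  [])).     % a(*)
lex(people, rule(n, [a], [])).     % n(a,*)
lex(sleep,  rule(v, [n], [p])).    % v(n,*,p)
lex(in,     rule(p, [],  [n])).    % p(*,n)
lex(long,   rule(a, [],  [])).
lex(beds,   rule(n, [a], [])).

% Success: a single saturated rule is left on the stack (the additional
% check for a root rule *(X) is omitted here for brevity).
parse(Words) :-
    incremental(Words, [], [rule(_, [], [])]).

% Shift the rule picked out by each word, then reduce as far as possible.
incremental([], Stack, Stack).
incremental([Word|Words], Stack, Final) :-
    lex(Word, Rule),
    reduce_star([Rule|Stack], Stack1),
    incremental(Words, Stack1, Final).

reduce_star(Stack0, Stack) :-
    (   reduce(Stack0, Stack1)
    ->  reduce_star(Stack1, Stack)
    ;   Stack = Stack0
    ).

The query parse([tall, people, sleep, in, long, beds]) succeeds, passing through exactly the stack states shown in the trace above.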

In conclusion, DG provides a framework which is compatible with both PSG-style shift-reduce parsing (in which the DG formalism provides an upper bound on the number of reductions which the PSG formalism does not) and CG-style (weakly incremental) shift-reduce parsing.

12.1.2 Top-down dependency parsing

The less explored terrain of top-down dependency parsing offers several interesting divergences from PSG parsing.

Three of the parsers in the survey of DG parsers are classified as top-down parsers in Table 12.1. These each implement top-down search in distinct ways which I shall call deep top-down parsing, shallow top-down parsing, and category-driven top-down parsing.
Deep top-down parsing

The DLT ATN parser is an example of a deep top-down dependency parser. Parsing is successful if it is possible to traverse the main VERB network, using the words of the sentence. The (simplified) main VERB network for English would consist of a start state, a jump arc to the SUBJECT network, a verb arc, a jump arc to the OBJECT network, and a final state. The SUBJECT network could involve a number of jump arcs to other networks which could themselves contain jump arcs, and so on. Thus, it is possible for the parser to build quite a deep search tree on the basis of the network before the first word is ever examined. When that word is examined, it has the function of either falsifying the hypothesis developed during the preceding search, or allowing the hypothesis to be developed further.

Abstracting away from the detail of the ATN implementation of this search method, I shall try to show how it might work given a more conventional Gaifman type DG. First though, I shall reconsider non-ATN top-down PSG parsing. A top-down left to right PSG parser begins by selecting the start symbol and expanding each successive left-most symbol until a terminal is encountered. Either this matches the first word of the sentence or the hypothesis has been falsified and another must be tried.

For example, Figure 12.2 shows a PSG analysis of the sentence A cat sleeps on the computer. A top-down PSG parser would begin by selecting the start symbol (S) and seeing how it could be expanded (S → NP VP). It selects the left-most symbol (NP) and finds an expansion for it (NP → Det N). Once again, the left-most symbol is selected but this time it corresponds to a terminal. (For ease of exposition I ignore the distinction between words and word classes.) Now, for the first time, it is possible to establish contact between the hypothesized structure and the actual words of the sentence. An examination of the first word in the sentence reveals that it is a determiner, so the hypothesis may be extended with the expansion of the next left-most symbol, and so on.

Figure 12.2: phrase structure of A cat sleeps on the computer

If this process is used directly with a DG, problems are encountered. The start symbol (v) corresponds to a terminal (sleeps), but this word is not located at the start of the sentence. There are two possible courses of action here. Either the parser can look for a rule to expand for the start symbol, or it can search for the start symbol in the sentence. In this section I shall explore the first course of action, and in my discussion of shallow top-down parsing I shall explore the other.

Assume that the grammar contains a rule

(84)
v(n, *, n)

The left-most symbol can be selected. Like all symbols in a DG, this one must identify a word. Thus, it is necessary to see whether or not this matches the first word in the sentence. Since it does not, it is necessary to find a rule headed by n, e.g. (85).

(85)
n(det, *)

Figure 12.3: dependency structure of A cat sleeps on the computer

Once again, the left-most symbol must be compared with the first sentence word. This time a match is found. It is still necessary to check whether or not 'det' may occur without left side dependents, before going any further. If it can (it can), then it is necessary to try to find the next left-most dependent. This involves selecting the left-most of det's right side dependents (if it has any) and then expanding leftward once again, testing each expansion against the first headless word in the sentence.

Figure 12.3 shows the dependency structure of the sentence. Before word 1 can be parsed, it is necessary to hypothesize word 3 and word 2 (although their actual position in the sentence as words 3 and 2 is not known until parsing successfully completes). Since this parsing method builds a structure of arbitrary depth before it finds a sentence word, I call it deep top-down parsing. This method of dependency parsing has not previously been described in the literature, although it is closely related to top-down PSG parsing. Unfortunately it carries an overhead not found in top-down PSG parsers, namely the necessity to check each left-most symbol against the first sentence word after every expansion.

A right to left variety of deep top-down dependency parser could also be defined.

It is possible for a deep top-down parser to enter a loop from which it cannot escape. The following rules illustrate this, assuming that a right to left deep top-down parser is being used.

(86)
a. p(*,n)
b. n(*,p)
c. n(*)

When searching for an 'n', the first 'n'-headed rule the parser encounters tells it to hypothesize a 'p' (86b). In order to find a 'p' it is necessary to hypothesize an 'n' (86a). So the loop is entered. Since the maximum number of heads possible in a dependency structure is equal to the number of words in the sentence minus one, the length of the hypothesized path may never exceed this number. This test can be used to terminate fruitless searches, whether caused by looping or some other reason.

Shallow top-down parsing

All top-down PSG parsing is 'deep' in the sense I have indicated. However, as I shall show in this section, it is possible to define a 'shallow' top-down dependency parser.

Shallow top-down parsing also begins by selecting the start symbol (i.e. the root symbol). Assume that the start symbol is v and once again the sentence to be parsed is A cat sleeps on the computer. Starting from the left, the sentence is scanned in order to find a v. When sleeps is reached there is a match. This word becomes the hypothesized sentence root. Now the grammar is searched for a rule headed by v. If one is found (e.g. v(n, *, n)), the left-most dependent is selected and the part of the sentence prior to sleeps is searched. This process continues recursively until the first word is found and there is no more left context to search. At this stage, the most deeply embedded right context is selected and searched, once more from the left. When all words in that right context are accounted for, control passes back to the next most deeply embedded process which has a right context to search. In this way all of the words to the left of the root can be parsed. The same process can now begin for the root's right context. Parsing succeeds when heads have been found for all the words in the left and right contexts of the root (and the left and right contexts of all the root's subordinates).

A positive feature of this parser is that it never makes an hypothesis without checking immediately that it is at least lexically plausible. In this way a certain amount of spurious structure-building can be avoided.

The basic operation of the parser is simple: call the parsing procedure divide-conquer with inputs S and W, where S is a symbol and W is a word list. Initially, S is the root symbol. For descriptive simplicity I assume here that no word may have more than one preceding and one following dependent. (This is not a PARS description.)

PROCEDURE divide-conquer(S, W)
  IF S is in W
  THEN call the string to the left of S W';
       call the string to the right of S W'';
       search the grammar for left side (L) and right side (R) dependents for S;
       if L exists, call divide-conquer(L, W');
       if R exists, call divide-conquer(R, W'')
  ELSE fail.

This description is intended to convey a basic sense of how shallow top-down parsing works by recursively calling the same procedure with a shorter word string to search in each call. For obvious reasons I call shallow top-down dependency parsing 'divide and conquer' parsing. The above description omits a number of details which are necessary to the functioning of the parser. In particular, it fails to describe how the algorithm works when confronted with rules allowing more than one dependent on each side of the head. A somewhat more complex algorithm is required to deal with this. It functions, when more than one dependent is hypothesized in the same string, by successively applying the basic divide-conquer procedure, with each dependent and the part of the string still unaccounted for serving as inputs on each procedure call. The parse succeeds if each dependent accounts for different parts of the string, and all of the string is accounted for. A Prolog implementation of the full algorithm can be found in the file divide_conquer.pl in Appendix A.3.
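Under the same simplifying assumption (at most one dependent on each side of a head), the procedure can be rendered in a few lines of Prolog. This sketch is mine and is much cruder than the full algorithm in divide_conquer.pl; deps/3 and root/1 encode a toy grammar invented for the example, 'none' marks an absent dependent, and, as in the traces earlier in this chapter, words are represented by their category symbols.

% divide_conquer(+Symbol, +Words): find Symbol in Words, split the string
% around it, and recursively account for each half.
divide_conquer(none, []).               % nothing sought, nothing left over
divide_conquer(S, Words) :-
    append(Left, [S|Right], Words),     % S is in W; W' = Left, W'' = Right
    deps(S, L, R),                      % grammar lookup for S's dependents
    divide_conquer(L, Left),
    divide_conquer(R, Right).

% Toy grammar: v(n,*,n), n(det,*), det(*).
deps(v,   n,    n).
deps(n,   det,  none).
deps(det, none, none).
root(v).

parse(Words) :-
    root(S),
    divide_conquer(S, Words).

With this grammar, parse([det, n, v, det, n]) succeeds, while parse([det, n, v, det]) fails because the right-hand noun dependent of v cannot be found.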

Suppose that it takes some constant amount of time k to check a word to see whether or not it is the word being sought. In the best case, the word being sought will always be at the start of the string, so the time taken to find each word will be exactly k. The time taken to parse an n-word sentence with an unambiguous grammar is therefore in the order of n in the best case. In the worst case, the word being sought will always be at the end of the search string. Thus, for an n word sentence, it will take kn to find the sentence root. The next time the divide-conquer procedure is called there will be n-1 words to search so this will take time k(n-1). In the worst case, the time taken to parse a sentence, given an unambiguous grammar, will be:

kn + k(n-1) + k(n-2) + ... + 2k + k

Thus, divide and conquer parsing with an unambiguous grammar takes, at worst, time in the order of n².

Now assume that the grammar is ambiguous. In the worst case, any word in a string could be the root of that string. Thus, the time required to find every reading for the sentence is proportional to:

kn × k(n-1) × k(n-2) × ... × 2k × k

The time required to find every parse of an n word sentence with an ambiguous grammar is, in the worst case, proportional to n!. Presumably this figure can be improved by the use of a chart.

The divide and conquer variety of shallow top-down dependency parsing has not previously been described in the literature, unless this is what Hays intended by his top-down parser. As I noted earlier, Hays provides only an outline sketch of his top-down parser and it is not clear if he ever implemented it.

The attraction of the divide and conquer variety of parser lies not in the serial version of the algorithm, but rather in the parallel version. What the parser does is to take a string, divide it in two, decide what to search for in each half, and then proceed to repeat this process for each half. Once a string has been halved, search in one half can take place independently of search in the other. It is necessary for both searches to succeed in order for the original search to succeed, but otherwise there is no connection between the two. Thus, every time a process divides a string it can activate two new processes, one for each substring. The original process simply has to wait to receive the root of the subtrees describing its left and right contexts, in which case it can succeed and inform the process which created it. Alternatively, one of the processes it spawned will fail to find what it was looking for, in which case the original process will die.

Consider the case of parallel divide and conquer parsing with an unambiguous grammar. Each newly created process is assigned to a processor dynamically. The best and worst case parsing times remain the same. In the best case the word being searched for is always found first (O(n)). In the worst case the word being searched for is always at the end of the string (O(n²)). However, the average time ought to be cut significantly because of the possibility of doing at least some search in parallel.

A number of interesting options exist for coping with ambiguity. In principle, it ought to be possible to assign to each word in the sentence as many processes as there are different readings for that word. Each process would then be required to find all possible dependents for a word, given a dependency rule. Thus, all possible dependency trees of depth one would be found concurrently. In this approach, time would be consumed mostly in inter-process communication, rather than in search. Much more work needs to be devoted to this problem before any results can be reported.

Shallow top-down dependency parsing, such as divide and conquer parsing, in its capacity to divide a string into two substrings, each with a separate 'things to look out for' list, appears to have no counterpart in PSG parsing.^

^Notice that this kind of parsing has got a lot of similarities to old-fashioned schoolroom parsing: 'first find the main verb, then find its subject and its objects, then...'

Category-driven top-down parsing

In describing their Lexicase parser, Starosta and Nomura make the following claim:

    Lexicase parsing is bottom-up in the sense that it begins with
    individual words rather than some 'root' node S
    (Starosta and Nomura 1986: 132).

It is true that their parser does not proceed by trying to expand the sentence root. However, it does try to expand nodes which have been designated a priori. For example, the first step of their algorithm reads: "Link each preposition by contextual features with an accessible N, V, or P" (Starosta and Nomura 1986: 131). What is this if not an attempt to build all of the prepositional phrases top-down?

In recognition of the fact that this is not standard top-down parsing, and certainly not standard bottom-up parsing, I call it category-driven top-down dependency parsing. It works by effecting one-level expansions to designated categories in a designated order, not necessarily starting with the root symbol.

12.1.3 Mixed top-down and bottom-up dependency parsing

The CSELT parser implements a mixed top-down and bottom-up strategy. It begins by selecting a word in the sentence, not on the basis of some distinguished start symbol in the grammar but rather on the basis of the recognition confidence score associated with the word. The grammar is then searched to find a rule headed by this kind of word. When a rule is found, it is associated with the word in the lattice. The rule is used to search top-down for dependents for the word. When dependents are found, the cycle repeats itself for each of the dependents of the original word.

The attraction of dependency grammar for mixed top-down and bottom-up parsing is that the distance between 'top' and 'bottom' is so small that opposite search approaches can be interleaved very simply and efficiently.

What each of these top-down, bottom-up, and mixed dependency parsing methods illustrates is the proximity of 'top' and 'bottom' in dependency structures. The start symbol (and every other symbol in the dependency tree) is also a symbol in the string. Here then is potential cause for confusion, even — as we have just seen — amongst designers of dependency parsers. And here, too, is something which clearly distinguishes dependency parsers from PSG parsers. It is this proximity of 'top' and 'bottom' which makes shallow top-down dependency parsing possible. It may be possible to implement a shallow top-down PSG parser, for example, one which uses an X-bar or lexicalized grammar to identify the sentence root and each of its subordinates. However, it is clearly impossible with a conventional CFPSG.

Dependency parsers are tied to the words of the sentence. But, as the deep top-down dependency parser demonstrates, it is possible to ignore this constraint and parse — at least for a while — on the basis of hypothesized, rather than actual words. However, unlike some top-down PSG parsers, a deep top-down dependency parser may never loop indefinitely since every search path which contains more hypothesized symbols than there are actual symbols in the sentence must be terminated.

The principal differences between the origin of search for conventional PSG parsers and dependency parsers may be summarized as follows:
1. The search path between the start symbol in a PSG and the string to be parsed may be arbitrarily long (unless an additional constraint on the grammar prevents this). In a DG, the start symbol is an element of the string. The only search which has to be done is that required to associate a specific instance of a symbol with a general reference to that symbol in the grammar.

2. The only exception to the above generalization obtains in the case of deep top-down dependency parsers which may construct longer search paths involving hypothesized words. The number of hypothesized words is, however, bounded by the number of words in the input string.

3. The co-presence of bottom-up and top-down constraints in actual words allows dependency parsing search to alternate simply and usefully between proceeding top-down and proceeding bottom-up.

12.2 Search manner

There seem to be no significant differences between manner of search (depth-first versus breadth-first) for PSG parsers and manner of search for dependency parsers. Either a parser extends one search path as far as possible (depth-first) or it extends all possible search paths in parallel (breadth-first). The CSELT parser implements best-first search, a variety of depth-first search in which the best-scoring option is selected at each choice point. This too has an exact correlate in PSG parsing. It is to be expected that all other manners of searching problem spaces can also be employed in dependency parsers, e.g. beam search which takes a middle line between depth-first and breadth-first search, selecting a maximum of n paths (the 'beam width') to develop in parallel.

Table 12.2 summarizes the manner of search properties of the dependency parsers surveyed.
Table 12.2: manner of search—summary

Dependency Parser        Search manner
Hays (bottom-up)         depth-first
Hays (top-down)          unspecified
Hellwig (PLAIN)          depth-first
Kielikone (ADP)          depth-first
DLT (ATN)                depth-first
DLT (probabilistic)      breadth-first
Lexicase (Starosta)      breadth-first
Lexicase (Lindsey)       breadth-first
WG (Fraser)              depth-first
WG (Hudson)              breadth-first
CSELT (SYNAPSIS)         best-first
Covington (1 & 2)        depth-first

12.3 Search order

There is a limit to the number of possible search orders for an n word sentence. (By 'search order' I mean the order in which words are considered for inclusion in a sentence structure.) In practice, most parsers implement either left to right or right to left search orders. In the survey of dependency parsers — summarized in Table 12.3 — eight out of twelve parsers operate left to right, with three search orders unspecified. None operates right to left, but I can see no reason in principle why any of these parsers should not be able to search in this way with equal success.

Table 12.3: order of search—summary

Dependency Parser        Search order
Hays (bottom-up)         left to right
Hays (top-down)          unspecified
Hellwig (PLAIN)          left to right
Kielikone (ADP)          left to right
DLT (ATN)                left to right
DLT (probabilistic)      unspecified, unimportant
Lexicase (Starosta)      left to right
Lexicase (Lindsey)       unspecified
WG (Fraser)              left to right
WG (Hudson)              left to right
CSELT (SYNAPSIS)         score-driven
Covington (1 & 2)        left to right

An obvious attraction of searching from left to right is that this is usually the order in which sentences are presented to the parser and it is not necessary to wait until the last word has been typed or spoken before parsing can begin. There is particular interest in left to right parsing when the parser not only considers the words in the order in which they appear in the sentence, but also adds them to the developing syntactic structure in (more or less) that order, thus allowing the sentence to be interpreted incrementally left to right. Interest in incremental interpretation is shared by cognitive scientists who believe this to be the way that people process sentences (e.g. Marslen-Wilson and Tyler 1980) and computational linguists who want to build real time speech or language understanding systems.

A strand of research in CG instigated by Mark Steedman has investigated the possibility of combining categories using logical devices called combinators (Curry and Feys 1958; Turner 1979). This variety of CG is known as Combinatory Categorial Grammar (CCG) (Steedman 1987). An interesting feature of combinators is that the order in which they apply is unimportant; the result is always the same. This (along with the rules of functional composition and type raising) leads to the possibility of producing a strict left to right word-by-word interpretation of any sentence (Haddock 1987; Steedman 1990). Unfortunately, since combinators may apply in any order, they may apply in every order. This leads to the so-called spurious ambiguity problem (also known as the derivational equivalence problem): weighed against the advantage of being able to interpret a sentence left to right incrementally is the disadvantage of having to deal with (i.e. fend off) all of the other possible ways of arriving at the same conclusion. Thus, most effort in the development of CCG parsers has been devoted towards trying to solve the spurious ambiguity problem (Hepple 1987). Different proposed solutions include:

1. Inserting only one of each set of semantically equivalent analyses in a chart (Pareschi and Steedman 1987). This carries an equivalence-checking overhead.

2. Only computing normal form derivations (Hepple and Morrill 1989). This carries a normal form checking overhead.

3. Compiling a left-branching grammar out of a CCG (Bouma 1989). This carries an initial compilation overhead, and possibly increases the size of the grammar.

DGs allow what may be termed ‘weak incremental interpretation’, by which

I mean the following: as soon as two words which bear a direct dependency

relation to each other become available in a sentence (i.e. as soon as the second

word is read), the words can be related and accordingly interpreted. Thus, a

subject can be interpreted as a subject and its referent can be interpreted as

ACTOR, or whatever, as soon as the verb is encountered. There is no need

to wait for the construction of a VP or anything else before interpretation can

take place.^

All of the surveyed DG parsers which operate left to right with a single pass support incremental interpretation in the weak sense defined above.

The CSELT SYNAPSIS and DLT ATN parsers embody unusual search orders. The CSELT parser always selects the highest-scoring word to process next, regardless of its position. The probabilistic DLT parser enters edges in a graph and then tries to navigate through the graph. There is no necessity for the edges to be entered in any specific order, and it is easy to imagine edges being added for all words in parallel.

Order of search options appear to be generally the same as for conventional PSGs, with most parsers opting for a left to right approach in practice.

^Except in the case of shift-reduce dependency parsers of the sort shown in shift_reduce.pl in Appendix A.3.

Starosta and Nomura suggest that the choice of search order should be guided by the prevailing direction of dependencies in the language to be parsed.

[The Lexicase parser] scans from left to right or vice versa, depending on whether the language is verb-initial, verb-medial, or verb-final, but in fact it is a mechanism which works from head to dependent rather than primarily from one end to the other. (Starosta and Nomura 1986: 132)

Order of search is not crucial to the correctness of parses produced but it may have a significant effect on parsing efficiency. This also depends on the search focus of the parser. A left to right parser in which heads seek dependents would have to read up to the final word of a sentence in which all dependents precede their heads before it could build any structure. A left to right parser in which dependents seek heads would build almost all structure before reaching the final word. A parser in which heads and dependents seek each other would not be sensitive to variation in the order of search.

12.4 Number of passes

The number of passes made by parsers in the survey, by which is meant the number of times the read head of each parser scans a sentence during the parse, is summarized in Table 12.4.

Nine of the parsers make a single pass through the sentence. Recall that confining the number of passes to one is a prerequisite for incremental interpretation.

The parsers of Hellwig, Lindsey, and Starosta and Nomura all require more than one, and possibly very many passes. Starosta and Nomura's parser is particularly profligate, since it requires at least eight passes on each placeholder expansion cycle and there may be many such cycles. In general, increasing the number of passes increases the inefficiency of a parser (since the same symbols have to be checked many times) and is best avoided.
Table 12.4: number of passes—summary

Dependency Parser        Number of passes
Hays (bottom-up)         one
Hays (top-down)          one
Hellwig (PLAIN)          at least two
Kielikone (ADP)          one
DLT (ATN)                one
DLT (probabilistic)      one
Lexicase (Starosta)      at least eight
Lexicase (Lindsey)       multi-pass
WG (Fraser)              one
WG (Hudson)              one
CSELT (SYNAPSIS)         one
Covington (1 & 2)        one

In respect of possibilities and consequences, varying the number of passes of a DG parser appears to be identical to varying the number of passes of a PSG parser, except where this interacts with certain search focus variables, as the next section will explain.

12.5 Search focus

So far, the main difference noted between DG parsing and PSG parsing is in the nature of the top-down/bottom-up distinction. This section introduces another major difference which I have chosen to discuss under the heading ‘search focus’. A discussion of PSG parsers would not contain such a section because it is not generally recognized to be of significance for them.^

The basic operation in DG parsing is the establishing of binary dependency relations between words. Suppose that X and Y are two words; there are a number of ways in which they might be considered as candidates to be related by dependency. These differences depend upon what I shall call the ‘focus of search’. The parsers surveyed identify eight different foci of search. These are summarized in Table 12.5.

^HPSG parsing offers the exception to this generalization.

Table 12.5: focus of search—summary

Dependency Parser        Search focus
Hays (bottom-up)         pair-based
Hays (top-down)          heads seek dependents
Hellwig (PLAIN)          dependents seek heads
Kielikone (ADP)          heads seek dependents
DLT (ATN)                network navigation
DLT (probabilistic)      heads and dependents seek each other simultaneously
Lexicase (Starosta)      heads seek dependents
Lexicase (Lindsey)       heads seek dependents
WG (Fraser)              heads seek dependents; then dependents seek heads
WG (Hudson)              heads seek dependents; then dependents seek heads
CSELT (SYNAPSIS)         heads and dependents seek each other
Covington (1 & 2)        dependents seek heads; then heads seek dependents

12.5.1 Network navigation

In network navigation parsers, search is focussed on finding an appropriate next token in the sentence to allow a transition network arc to be traversed. Network navigation parsers are of marginal interest in this context since they focus search on a data structure in the grammar-parser, rather than on the words of the sentence being parsed. The only example of a network navigation parser in the survey is the DLT ATN parser.

12.5.2 Pair selection

Pair selection parsers operate by selecting two words in the sentence to be parsed and consulting a look-up table to find out whether or not a pair of words of the chosen types may contract a dependency relationship. Hays’ bottom-up parser is pair-based. He defined the two major operations required in his parser to be ‘pair selection’ and ‘agreement testing’. Pair selection involved selecting an adjacent pair of words. Agreement testing involved looking up a 4000 × 4000 matrix to find out whether or not the words could be linked and, if so, which was the head and which was the dependent.

The focus of search is thus a pair of words. As we shall see, all of the other parsers focus search on a single word.

12.5.3 Heads seek dependents

Dependent-seeking parsers (Hays’ top-down parser, the Kielikone parser, the Lexicase parsers of Starosta and Nomura, and Lindsay, and my Divide and Conquer parser) always search for dependents for the current word. In the course of searching for a dependent (A) for the current word (B), the word which, in reality, should be the current word’s head (C) may be tested to see if it can be a dependent of the current word. The test will fail and search will move on to consider another word. The inverse dependency relationship will not be tested until word C becomes the current word, at which point the original word B will be found as a dependent for C.

Notice that this approach to search is not tied to either top-down or bottom-up processing, as the surveyed systems illustrate. Starosta and Nomura’s parser is a category-driven top-down parser; the parsers of Hays and myself operate in a shallow top-down fashion; the Kielikone parser operates bottom-up.

As far as I can ascertain, the same strategy is embodied in Proudian and Pollard’s top-down HPSG parser.

In HPSG it is the head constituent of a rule which carries the subcategorization information needed to build the other constituents of the rule. Thus parsing proceeds head first through the phrase structure of a sentence, rather than left to right through the sentence string. (Proudian and Pollard 1985: 168-9)

12.5.4 Dependents seek heads

Hellwig’s parser illustrates the fact that a diametrically opposite search focus also works. In his parser, all search is directed towards finding a head for the current word. Notice, however, that in his system words do not subcategorize for their heads. Rather, it is necessary to go and look in the subcategorization frames (slots) of other words in order to see if the current word can depend on a word (i.e. can fill another word’s slot).

12.5.5 Heads seek dependents or dependents seek heads

As we have seen, the SYNAPSIS lattice parser alternates between top-down and bottom-up processing, according to the current state of the parse and the lattice. It also alternates between searching for dependents (VERIFY and MERGE operations) and searching for heads (the PREDICTION operator). The exact progression from one search focus to the other can not be defined a priori since this depends on the recognition confidence scores in the lattice.

12.5.6 Heads seek dependents and dependents seek heads

The DLT probabilistic parser works by searching an annotated corpus for every occurrence of each word in the sentence. A record is made of all of the upward and downward dependency relations in which each word is found to participate. These relations then serve as templates of relations into which each sentence word could possibly enter. Some pairs of templates will be inverse copies of each other, and these select each other during a process analogous to unification. Thus, all words search for all of their heads and dependents, and they do so — at least in principle — simultaneously.

12.5.7 Heads seek dependents then dependents seek heads

The WG parsers written by Hudson and myself begin by searching for dependents for the current word. Once all available (i.e. adjacent) dependents have been found, the focus of search shifts, and a head is sought for the current word. The insight embodied in this strategy is that, under normal circumstances in a relatively fixed word order language like English, the head of a word does not intervene between that word and its dependents whereas the dependents may intervene between the word and its head.

The rationale for changing the focus of search for the current word is that it allows the parser to construct as much structure involving the current word as could possibly be constructed, given what has been processed so far. In fact, it makes it possible to build structure incrementally in a single linear pass through the sentence. This is not possible with either of the strategies of searching for dependents only or heads only.

The parsers which search for dependents only are Hays’ top-down parser, the Kielikone parser, the Lexicase parsers of Starosta and Nomura, and Lindsay, and my Divide and Conquer parser. I have previously described Hays’ top-down parser and my Divide and Conquer parser as single pass parsers, but this is slightly misleading since the single pass tracks not from left to right, but from root to leaves of the dependency tree. This point was also made by Proudian and Pollard (1985) and quoted above. I have also described the Kielikone parser as a single pass system but this too disguises some important details. Whenever a dependent can not be found for the current word, search suspends (the currently active schema is pushed on the PENDING stack) and another word becomes current. Thus, while words enter the parser one at a time from the left and there is never any attempt to perform the same operation on the same word more than once, words do not become current in strict linear order from left to right through the sentence. The same word can become current for several non-consecutive periods of time. Without the ability to suspend processing of the current word, the Kielikone parser would not be able to parse most sentences. Both of the Lexicase parsers make many passes through the sentence. The motivations and effect are much the same as for the Kielikone parser, although the Kielikone parser achieves its goal with much greater efficiency.

Only Hellwig’s parser searches for a head for the current word without searching for dependents. Once again, experience shows that this strategy will not work for a single pass parser. Hellwig’s parser makes multiple passes through a sentence.

The WG parsers stand in stark contrast to these parsers. By searching first for dependents and then for heads for each word, they are able to parse in a single linear pass from the beginning to the end of the sentence. Once a word ceases to be the current word, it will never become the current word again. Thus, the strategy of seeking dependents and then seeking heads for the current word facilitates weak incremental interpretation.

12.5.8 Dependents seek heads then heads seek dependents

A similar approach is adopted in Covington’s parsers, except that they search for a head for the current word and then for its dependents. I have shown how this strategy, while being perfectly adequate in a parser with no adjacency constraint, fails to work when an adjacency constraint is employed. Covington agrees with this analysis and now advocates searching for dependents before searching for heads (Covington 1990b).

12.6 Ambiguity management

The ways in which the surveyed parsers manage ambiguity are summarized in Table 12.6.

This thesis provides descriptions of a dozen dependency parsers, introduces some new ones and mentions quite a few more in passing. Clearly, a significant amount of effort has been and is being directed towards extending what is known about dependency parsing. However, very little of this effort has yet gone towards developing techniques for managing ambiguity in dependency parsing.
Table 12.6: ambiguity management—summary

Dependency Parser        Ambiguity management
Hays (bottom-up)         first parse only (heuristics guide search)
Hays (top-down)          unspecified
Hellwig (PLAIN)          WFST (‘phrases’ may be discontinuous)
Kielikone (ADP)          chronological backtracking (heuristics guide search)
DLT (ATN)                first parse only
DLT (probabilistic)      highest-scoring parse selected
Lexicase (Starosta)      packing/unpacking
Lexicase (Lindsey)       packing/unpacking
WG (Fraser)              chronological backtracking (early identification of failure)
WG (Hudson)              all trees constructed in parallel
CSELT (SYNAPSIS)         best scoring parse only
Covington (1 & 2)        chronological backtracking

Some information is available on ambiguity management for eleven of the parsers surveyed. Of these, four output at most one parse tree, regardless of how many possible analyses there are for the sentence being parsed. The DLT ATN parser either finds an analysis or fails. It can not undo any incorrect choices which may have led to a dead end in the parsing of an otherwise acceptable sentence. Hays’ bottom-up parser also delivers at best a first parse, but it makes use of some simple heuristics in an attempt to make the best choices at each choice point. Both of the other systems which deliver at most one parse have the capability to deliver a larger number. In fact they may build all or most of any possible alternative parse trees. The DLT probabilistic dependency parser selects the parse which has the best global score, which is some function of the corpus-derived ‘likelihoods’ of all of its component dependencies. The SYNAPSIS lattice parser delivers the parse which is ‘best’ in respect of its global score, which is some function of the recognition confidence scores of its component words.

The Lexicase parsers embody a novel approach to ambiguity management. In slightly different ways they both package up different readings for a word in terms of a ‘placeholder’ or ‘master entry’ which contains only the intersection of all of the different readings. (Since the grammar is fully lexicalized there is no formal difference between lexical and syntactic ambiguity.) As much structure as possible is built on the basis of the partially specified placeholders/master entries. On each successive cycle, placeholders/master entries are unpacked to form disjoint structures which then re-enter the parsing-placeholder expansion cycle independently. The rationale for this process is that as much common structure as possible should be built in a generic and underspecified parse tree before it is split into some number of disjoint more specific structures. This calls for multiple parser passes, but it is supposed to deliver all readings for a sentence, so this may be tolerable. Unfortunately, no published examples are available of this ambiguity management strategy in operation. I have been unable to re-create it to my satisfaction.

Four parsers use chronological backtracking to undo mistakes and, if required, to generate all possible parses. Both of Covington’s parsers make use of Prolog’s backtracking facility. The Kielikone parser uses heuristics to guide search so that backtracking on the way to a first parse is minimized. Of course, if all parses are required, the benefit of the heuristics will be lost. My Bonding parser also uses backtracking to undo mistakes and to generate multiple parses. It uses heuristics, not to guide choice in structure-building, but to spot doomed partial parses and so force backtracking as early as possible, thereby cutting down on the amount of effort devoted to developing fruitless paths. All of these backtracking systems work, but they are far from the state of the art in ambiguity management for PSG parsing.

Hudson’s parser builds all possible parse trees in parallel. Again, this works, but it is not a viable engineering solution since the same sub-structures can be built many times over in the course of a parse.

Hellwig’s dependency WFST parser has the only system for managing ambiguity in this survey which could form the basis of an efficient solution. WFST parsing is known to be an effective way of avoiding duplication of effort in finding all possible parses for some sentence. WFSTs have traditionally been thought of as graphs in which edges span contiguous phrases. Hellwig offers a solution to the problem of how to represent discontinuous collections of dependents in a table. However, there is currently no known solution to the problem of how to represent overlapping collections of dependents — of the sort introduced in shared dependent analyses — in a table.

As mentioned in Chapter 2, Hays offers a brief schematic description of a recognition algorithm based on a WFST (Hays 1964: 516-17). A Prolog reconstruction of that algorithm can be found in hays_recognizer.pl in Appendix A.3. A parser based on the same principles of WFST usage to minimise search can be found in hays_parser.pl in Appendix A.3.

WFST parsers offer a considerable efficiency improvement on most parsers which do not check a data structure of intermediate results before searching. However, even greater efficiency can result if a table is used to record current hypotheses as well as well-formed sub-strings. Such a system is usually known as an active chart parser (often abbreviated to ‘chart parser’). The same hypothesis may be relevant in several different analyses of the same substring. By recording the hypothesis only once, effort can be saved much sooner than in a WFST in which only complete substrings (the result of chains of hypotheses) are entered. The classic reference on chart parsing is Kay (1986).

What does a hypothesis look like in a standard PSG chart parser? Suppose that ‘S → NP VP’ is a rule of the grammar. The following hypotheses may be recorded in a chart.

(a) S → .NP VP
(b) S → NP .VP
(c) S → NP VP.

Hypothesis (a) indicates that a sentence (S) consisting of a noun phrase (NP) followed by a verb phrase (VP) has been hypothesized, but no evidence has yet been found to support it. Hypothesis (b) is similar, except that the movement of the dot in the right hand side of the rule to a position after NP indicates that an NP has been found, thus offering partial support for the hypothesis. The position of the dot at the right extreme of the right hand side in (c) indicates that evidence has been found to support the hypothesis in its entirety; an S consisting of an NP followed by a VP has been found in the string.

Each hypothesis must be associated with a particular substring. It is normal in chart parsing to identify sub-strings as edges in a graph. Thus, the first word in a string is usually identified by the edge which goes from node 0 to node 1; the second word goes from node 1 to node 2, etc. The string consisting of the first three words is represented by the edge which goes from node 0 to node 3. Following Gazdar and Mellish (1989: 194ff), I shall represent hypotheses on edges as follows:

<i,j,H>

where i is the start node, j is the end node, and H is a dotted rule.

To initialize a chart, an inactive edge (i.e. an edge in which the dot is at the extreme right hand side of the rule hypothesis) can be placed in the chart for every word class assignment allowed by the grammar for the words in the sentence.

Search may proceed in a number of different ways. Here I shall mention only one of these. Proceeding bottom-up, the following rule may be applied to introduce fresh hypotheses:

Bottom-up rule of PSG chart parsing

If you are adding edge <i,j,A→W1.> to the chart, then for every rule in the grammar of the form B→A W2, add an edge <i,i,B→.A W2> to the chart. A and B are categories and W1 and W2 are (possibly empty) sequences of categories or words. (Adapted from Gazdar and Mellish 1989: 197.)

The fact that the new edge begins and ends at the same node simply results from the fact that no part of it has yet been attested in the string.

The way in which hypotheses are developed once they enter the chart is by means of application of what Kay calls the fundamental rule:

Fundamental rule of PSG chart parsing

If the chart contains edges <i,j,A→W1.B W2> and <j,k,B→W3.>, where A and B are categories and W1, W2 and W3 are (possibly empty) sequences of categories or words, then add edge <i,k,A→W1 B.W2> to the chart (Gazdar and Mellish 1989: 195).

A version of Gazdar and Mellish’s Prolog implementation of a bottom-up chart parser, slightly modified to enable it to run as a single file under Quintus Prolog, can be found in the file gazdar_mellish.pl in Appendix A.3. (The reason for its inclusion will become clear shortly.)

We could go about reconstructing the notion of a chart parser in the context of dependency parsing in a number of different ways. In what follows I shall adopt a fairly conservative approach which maximizes similarities with PSG chart parsing. First, let us assume that a dot may be placed in the body of a DG rule with the interpretation that everything to the left of the dot has already been attested and nothing to the right of the dot has yet been attested. Thus the following sample dotted dependency rules are possible.

(a) verb(.noun,*,prep)
(b) verb(noun,.*,prep)
(c) verb(noun,*,.prep)
(d) verb(noun,*,prep.)

(Let ‘*’ be a variable instantiated to the same category as the head of the rule in which it occurs.)

Example (a) hypothesizes a verb with a preceding nominal dependent and a following prepositional dependent; no part of the hypothesis has yet gained support. In example (b), the noun has been found, and in (c), the head verb has also been found. In example (d), the dot is at the extreme right hand side of the body of the rule, thus indicating that the whole structure has been attested and the edge is now inactive.

The bottom-up and fundamental rules of PSG chart parsing can also be given a dependency reconstruction.

Bottom-up rule of dependency chart parsing

If you are adding edge <i,j,A(W1.)> to the chart, then for every rule in the grammar of the form B(A,W2), add an edge <i,i,B(.A,W2)> to the chart. A and B are categories and W1 and W2 are sequences of categories or words.

Fundamental rule of dependency chart parsing

If the chart contains edges <i,j,A(W1,.B,W2)> and <j,k,B(W3.)>, where A and B are categories and W1, W2 and W3 are sequences of categories or words, then add edge <i,k,A(W1,B,.W2)> to the chart.
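Stated in Prolog, both rules are short. The sketch below is offered only as an illustration of the two operations and is not the code of nmf_chart.pl: an edge is represented here as edge(From,To,Cat,Found,Needed), where Found holds the categories to the left of the dot and Needed those to its right, and full_form(Head,Body) is a hypothetical relation pairing a head category with the full form body of one of its rules.

% Illustrative only -- not the code of nmf_chart.pl.
% Bottom-up rule: an inactive A edge licenses an empty active edge at its
% left end for every rule whose body begins with A.
bottom_up(edge(I, _J, A, _Found, []), edge(I, I, B, [], [A|W2])) :-
        full_form(B, [A|W2]).

% Fundamental rule: an active edge needing B next combines with an
% inactive B edge that starts where the active edge ends.
fundamental(edge(I, J, A, Found, [B|Needed]),
            edge(J, K, B, _, []),
            edge(I, K, A, NewFound, Needed)) :-
        append(Found, [B], NewFound).

Repeated application of these two operations to a chart initialized with one inactive edge per word class assignment yields the same search behaviour as the prose rules above.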

If these rules of dependency chart parsing are applied, all possible dependency structures (and sub-structures) for an input string can be produced efficiently given a dependency grammar in Gaifman form. The file nmf_chart.pl in Appendix A.3 contains an implementation of this kind of bottom-up dependency chart parser. Careful comparison of this file with gazdar_mellish.pl will reveal that the two are virtually identical in most respects, and particularly in respect of the core parsing algorithm. The only difference worth noting is that dependency grammar rules of the form X(*) have no direct PSG correlates. They can not be used as the basis for hypotheses — equivalent hypotheses have already been entered in the chart at initialization — so they differ from unit rewrite PSG rules which do generate hypotheses. However, this difference does not interfere with the basic control structure of the parsing algorithm.

We shall return to a discussion of this parser in the last chapter.

12.7 Adjacency as a constraint on search

Most of the parsers surveyed assume an adjacency constraint. The effect of


such a constraint is to limit severely the search space of the parser. This is

clearly illustrated in the case of parsers like my Bonding parser which only
needs to look at the top of a stack. This constraint is also built into most

PSG parsers, since phrases are typically contiguous. At the opposite extreme,

parsers like Covington’s adjacency-free parser — which makes no use of an

adjacency constraint — must search anything up to the whole of the rest of a

sentence in order to find the word they are looking for.

Systems like Kielikone and Hudson’s parser operate under an adjacency constraint but use a dummy relation (e.g. ‘visitor’) to capture an otherwise non-adjacent word (such as an extracted wh-word) and establish a link between it and its actual head. This requires the principles of dependency to be defined so as to allow a word to depend on more than one head or to depend on the same head by means of more than one dependency relation (i.e. the moved word must be related by the dummy relation to one head and by the meaningful relation to that head or another head).

I believe that one of the major strengths of DG is that it makes a number of constraints explicit which are usually implicit in PSG. In this way, it allows the grammar writer and the parser designer to consider each constraint independently and to experiment with different versions of the constraints. For example, Hudson found the adjacency constraint to be too tight for his purposes so he revised it. He is not alone; almost all DG theories and a number of DG parsing systems customize the basic DG mechanism in some ways. Tinkering with the basic constraints of PSG in this way is almost unheard of (although when someone does this it tends to revolutionize the way linguists conceive of problems — witness X grammar and GPSG).

I suggest that a potentially fruitful area of research involves refining the adjacency constraint, so as to minimize the search space of a parser while maximizing the number of phenomena which can be covered. The strict adjacency constraint built into many of the parsers surveyed is too strict to allow for the parsing of variable word order languages. However, even variable word order languages do not allow clauses to intermingle, so some constraints must still apply. The definition of these constraints is a live research topic.

Hellwig has taken an interesting step in exploring one way in which well-formed structures violating the strict adjacency constraint may be parsed. This involves increasing the search space during parsing so that the top stack element is not the only one to be examined. However, search in his system is not unconstrained, as in Covington’s system. Instead, Hellwig’s parser searches the top stack element in a first parsing cycle, and then searches the next-to-top element in the next parsing cycle. Thus, the claim implicit in the design of the parser is that an element which is not immediately accessible to its head will not be separated from its head by more than one subtree. In this way, head-dependent pairs which are not adjacent in the standard sense can be found, so long as they conform to the ‘next-but-one constraint’.

However, a cautious note must be sounded here. If real progress is to be achieved in this area, modifications and extensions to the basic Gaifman format of DG rules must be formally defined. Without explicit definition of the systems assumed, all results will be uncertain at best and useless at worst. Regrettably, strict formal definition has been the exception rather than the rule in DG studies. It is to be hoped that as interest in dependency parsing increases, the discipline imposed by the requirements of computers for formal rigour will help to overcome this shortcoming.

12.8 Summary

In this chapter, drawing on the survey of dependency parsers in the preceding chapters, I have tried to identify some of the dimensions of variation in dependency parsing and to draw out some principles and techniques. Variation was found in search origin, search manner, search order, number of passes, search focus, ambiguity management, and in the use of an adjacency constraint on search. Substantial similarities with standard PSG parsing were found. The main differences concern search origin, search focus, and the use of an adjacency constraint.

DG trees can be seen as a special case of PSG trees in which every node directly dominates exactly one terminal symbol. One consequence of this is that traditional terms relating to the origin of search in constituency parsing, such as ‘top-down’ and ‘bottom-up’, can not be borrowed into dependency parsing without some specialization of meaning. I have tried to define these terms for the purposes of dependency parsing, and have added some new distinctions, such as the distinction between ‘deep’ top-down parsing and ‘shallow’ top-down parsing.

Search, in dependency parsing, can focus on a variety of different things. For a given word, the object of search may be to find a head for the word, or a dependent for the word, or both. In my discussion I identify eight different search foci, although others may be possible. The issue of what to search for seems to be particular to dependency parsing. I have shown how the choice of search focus can determine a number of design features, and may even determine whether or not the parser is able to parse successfully.


An adjacency constraint can reduce a large search space so th at it could

hardly be smaller. An adjacency constraint can also prevent a parser from

discovering valid analyses. I have shown how different parsers embody different

attem pts to balance the requirements of constrained search within the context

of natural language phenomena. I have also advocated DG as a particularly

useful framework for exploring this problem.


Most im portantly, I have identified work which still needs to be done. The

management of ambiguity warrants special mention here, since very few depen­

dency parsing systems take this problem seriously. The special requirements

290
of a t least some extended versions of DG m ean th a t, for them , existing tools

for the m anagem ent of am biguity in constituency parsers are likely to be in­

appropriate.

Chapter 13

Conclusion

“Use your head!”

Traditional.

At the beginning of this thesis I set out the formal properties of DGs, as defined by Gaifman. I reported that his version of DG is equivalent to a subclass of the CFPSGs, namely the class in which every phrase contains exactly one category which is a projection of a lexical category. It is exactly this subclass of CFPSGs which most linguists assume in analyses of natural language. The differences between the grammatical systems, then, are not significant either in terms of their formal power or their adequacy for describing natural language. However, it must be added that many — perhaps the majority — of theoretical linguists who use DG have added extensions to the basic formalism, thereby creating new kinds of system of uncertain formal power. In this thesis I have focussed on those dependency systems which have a discernible core which may be expressed in terms of a Gaifman grammar.

The field of PSG parsing evolved — in computer science and in computational linguistics — with the assumption that PSG rules do not distinguish one item in a phrase (the head) as having privileged status. It is only comparatively recently (within the last decade or so) that most phrase structure grammarians have come to assume that every phrase does, indeed, have a head. Thus, head-driven parsing using PSGs has emerged as a live research
topic even more recently. The principal difference between DG and PSG is
that DG rules necessarily identify the head of each construction, whereas PSG

rules only identify the head of a phrase if some additional constraint is supplied
(as in the case of versions of X grammar). Head-marking is intrinsic to DG,

but extrinsic to PSG as originally defined. One would therefore expect to find
a much longer record of work on head-driven parsing in the field of dependency

parsing.

Unfortunately, what emerges from this survey of existing dependency pars­

ing systems does not satisfy these expectations. There has been very little

emphasis in the dependency parsing literature on exploring what is distinctive

about parsing with head-marked rules. Some parsers, (for example, the DLT

ATN and DOG systems) make no special use of heads a t all. On the other

hand, there have been hardly any visible attem pts to relate developments in

dependency parsing to well known and understood results in phrase structure

parsing. Only Hellwig’s W FST parser stands as a deliberate attem p t to bor­

row an existing PSG parsing technique while attem pting to make use of the
headedness of DG rules.

The empirical evidence furnished by this survey is that almost all dependency parsers constructed so far operate bottom-up incrementally. The basic operation of these parsers is to construct pairwise dependency relations. The discovery of larger constructions (phrases) follows as a consequence of this, not as the result of special phrase-building operations. However, there is nothing in all this which could not have a PSG parsing correlate.

By categorizing as many of the parsers surveyed as possible using fairly well-understood parsing terms (e.g. top-down, depth-first search), I have begun to explore the space within which dependency parsing algorithms are located. The most important conclusion to draw here is that the space is — on almost every count — the same as that occupied by PSG parsing algorithms. It has not been necessary to introduce completely new terms to describe what is going on
in dependency parsing algorithms; existing terms will suffice. However, some

minor divergences have come to light as, for example, in the case of top-down

parsing which I have subcategorized into deep and shallow variants. Whereas
deep top-down parsing can be implemented either in a dependency framework

or any PSG framework, shallow top-down parsing appears to be particular to

head-marking frameworks.

Thus, what emerges from the survey is the beginnings of a taxonomy of dependency parsing algorithms, in which it is clear that some configurations of properties have been much more thoroughly explored than others. In this way, I have identified certain clusters of properties which, though commonly reported in the PSG parsing literature, are not represented in this survey. I have attempted to make good some of these deficits by describing what a dependency solution would look like and, especially, by supplying Prolog instantiations of these solutions in the Appendix.

And so we turn to the hypotheses introduced at the start of the thesis to help point up the similarities and differences between dependency parsing and standard PSG parsing.

Hypothesis 1

It is possible to construct a fully functional dependency parser based directly on an established phrase structure parsing algorithm without altering any fundamental aspects of the algorithm.

I have offered at least two existence proofs of this hypothesis in the text. In

the first case, I showed how a shift-reduce parsing algorithm as standardly

applied in PSG parsing could be taken over into dependency parsing. The

PSG and DG versions of the algorithm differ only trivially in the way in which

they represent knowledge. Otherwise, they are identical. If PSG and DG

parses are followed through for the same sentence with equivalent grammars,

the operation of the parsers is identical, shift for shift, reduction for reduction.
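For concreteness, the core of such a shift-reduce dependency recognizer can be written in a handful of Prolog clauses. The sketch below is not the code of shift_reduce.pl in the Appendix: the grammar relation rule/2 (the body of a dependency rule, reversed and with the head position marked) is an assumption made for the illustration, while word_class/2 and root/1 are the relations produced by dg_compile.pl.

% Illustrative shift-reduce dependency recognizer (not shift_reduce.pl).
% Stack items: lex(C) for a shifted word of class C, sat(C) for a
% completed subtree headed by a word of class C.
% rule(Head, RevBody) holds the reversed, head-marked body of a rule,
% e.g. for verb(noun,*,noun): rule(verb, [dep(noun),head(verb),dep(noun)]).

sr_rec(Stack, [Word|Words]) :-              % SHIFT the next word
        word_class(Word, Cat),
        sr_rec([lex(Cat)|Stack], Words).
sr_rec(Stack, Words) :-                     % REDUCE the top of the stack
        rule(Head, RevBody),
        match(RevBody, Stack, Rest),
        sr_rec([sat(Head)|Rest], Words).
sr_rec([sat(Root)], []) :-                  % ACCEPT
        root(Root).

% match(+RevBody, +Stack, -Rest): RevBody lies on top of the stack.
match([], Stack, Stack).
match([head(C)|T], [lex(C)|Stack], Rest) :- match(T, Stack, Rest).
match([dep(C)|T],  [sat(C)|Stack], Rest) :- match(T, Stack, Rest).

A query is posed with an empty stack and the input word list; Prolog's backtracking supplies the nondeterministic choice between shifting and reducing.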

As an even clearer proof of the truth of Hypothesis 1, I borrowed Gazdar and Mellish’s existing implementation of a PSG bottom-up chart parser and showed how, with only the most modest of changes to the code, and none at all to the basic algorithm, it could work given an arbitrary dependency grammar.

This should not be surprising, since this is a very weak hypothesis. It is well understood that dependency rules include phrasal information, so what is to stop them working in combination with phrase-building algorithms? However, it is not the case that arbitrary PSG rules incorporate dependency information. This is the motivation for the stronger hypothesis, Hypothesis 2.

Hypothesis 2

It is possible to construct a fully functional dependency parser using an algorithm which could not be used without substantial modification in a fully functional conventional phrase structure parser.

An existence proof for this hypothesis is provided by the divide-conquer algorithm. This works on the principle that top-down parsing need never hypothesize an expansion without immediately checking it in the string. It works solely because every rule in the dependency grammar explicitly mentions a lexical head, which can always be identified in the rule. This is not the case in an arbitrary PSG. This algorithm is particularly attractive by virtue of the possibilities it raises for dividing up the parse problem and solving (conquering) the different parts in parallel.
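The principle is easy to convey in Prolog. The sketch below is an illustration only and is not the divide and conquer code listed in the Appendix; it does, however, use only the relations produced by dg_compile.pl in Gaifman Prolog form, namely drule/3, word_class/2 and root/1.

% Illustrative divide-and-conquer recognizer over Gaifman Prolog form rules.
dc_recognize(String) :-
        root(Root),
        dc(Root, String).

% dc(+Cat, +String): String is exactly one phrase headed by a word of
% class Cat. A candidate head word is located before a rule expansion is
% hypothesized, so no expansion is proposed without immediate lexical
% support in the string.
dc(Cat, String) :-
        append(Left, [Word|Right], String),
        word_class(Word, Cat),
        drule(Cat, Before, After),
        segments(Before, Left),
        segments(After, Right).

% segments(+Cats, +String): divide String into consecutive substrings,
% one complete phrase per category in Cats, and conquer each recursively.
segments([], []).
segments([C|Cs], String) :-
        append(Phrase, Rest, String),
        dc(C, Phrase),
        segments(Cs, Rest).

Because the left and right contexts of the head are independent once the head word and its rule have been fixed, the recursive calls on Left and Right could in principle be evaluated in parallel.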

However, though Hypothesis 2 has been proven literally, it misses an impor­

ta n t point. It is difficult to study the subject of dependency parsing without

being drawn to this conclusion: it is invidious to contrast PSG parsers with

dependency parsers; the more profitable comparison is th a t between parsers

which make use of the notion ‘head’ and those which do not. While most of the

standard PSG parsing algorithms are not head-driven, a small num ber (which

use head-marked versions of PSG) are. Conversely, although a dependency

rule w ithout head-marking is inconceivable, this survey has shown th a t by no

means all dependency parsers make significant use of information about heads.

295
The overwhelming weight of opinion in linguistic theory supports the mark­

ing of heads in phrases, but remarkably little progress has yet been won by

the introduction of explicitly marked heads in parsing systems. Parsing in the

dependency grammar tradition, which ought to be a rich information source,

turns out to be generally disappointing, not least because the systems which

have been developed have never been systematically related to any other (more

m ainstream) parsing results. I offer this thesis as a first step towards the inte­

gration of dependency parsing with mainstream work on head-driven parsing.

296
Appendix

Prolog Listings

A.1 Introduction

The programs listed in this appendix are written in Quintus Prolog (version 3.0.1). A restricted sub-set of Quintus built-in predicates has been used to encode the algorithms described in the main text. This sub-set is entirely consistent with standard ‘Edinburgh’ syntax (Clocksin and Mellish 1987). However, a small number of non-standard predicates has been utilised to set up the environment in which the main algorithms are located. The most common of these is ensure_loaded/1, which is broadly equivalent to ‘Edinburgh’ reconsult/1. It is used to load the predicates defined in another file. The argument of ensure_loaded/1 may be either a filename (minus Quintus Prolog’s compulsory ‘.pl’ extension) or a term of the form library(X), where X is the name of a Quintus library file. The only such file to be loaded is files, which provides a collection of predicates for manipulating text files. The particular library predicate used in the programs listed here is file_exists/1, which takes as its argument the name of a file. The predicate succeeds if the file exists (i.e. can be found in the current directory by the Prolog system). Most practical Prologs provide a broadly equivalent predicate, although predicate names differ from system to system.

Quintus Prolog requires that all dynamic predicates (i.e. predicates which may be asserted or retracted at runtime) be explicitly declared. This is usually done at the beginning of the file containing the relevant assert/1 or retract/1 predicate calls. Dynamic predicate declarations have the following form:

dynamic Predicate/N.

Predicate is the name of the dynamic predicate; N is its arity. Both
Predicate and N must be instantiated. Each dynamic predicate declaration

may simply be commented out for use with Prologs which do not require such

declarations.
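For example, the declarations at the head of dg_compile.pl (listed in Section A.3 below) take exactly this form:

:- dynamic multi_line/1.
:- dynamic drule/3.
:- dynamic word_class/2.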

The listings set out below present a diverse range of recognition and parsing algorithms which are united in their use of dependency grammars, but divided in the ways in which they manipulate their data structures, including their internal representation of grammars. For this reason a compilation methodology has been used for those algorithms which make use of Gaifman-style dependency grammars (see Chapter 2 for details). The grammar writer writes a Gaifman dependency grammar using Gaifman’s standard notation. This is subsequently compiled into the Prolog-internal representation most appropriate (i.e. efficient) for each algorithm. The compilation process only restructures grammar rules — it does not add or subtract information. The code for the Gaifman dependency grammar rule compiler is listed in the file dg_compile.pl.

Section A.2 indexes each predicate which appears in the listing according to the file in which it is defined. The files themselves are given in alphabetical order according to file name in Section A.3. A sample grammar to illustrate some basic features of the parsers appears in Section A.4.

A.2 Index of predicates
Predicate                                 File
add_spans_including_trees/3               hays_parser.pl
allowed_char/1                            dg_compile.pl
alpha_numeric/1                           dg_compile.pl
append/3                                  lib.pl
assert_if_new/1                           lib.pl
begin_new_line/0                          map_to_dcg.pl
build_cat_list/2                          hays_generator.pl
concat/3                                  lib.pl
conquer/5                                 divide.pl
construct_assignments/0                   map_to_dcg.pl
construct_assignments/2                   map_to_dcg.pl
construct_call/0                          map_to_dcg.pl
construct_embedded_call/0                 map_to_dcg.pl
construct_rules/0                         map_to_dcg.pl
cross_product/3                           lib.pl
dep_write/3                               map_to_dcg.pl
dcg_generate/0                            dcg.pl
dcg_parse/0                               dcg.pl
dg_compile/1                              dg_compile.pl
dg_compile_loop/2                         dg_compile.pl
divide/4                                  divide.pl
divide_conquer/0                          divide.pl
divide_conquer/1                          divide.pl
dot                                       lib.pl
drule/3                                   dg_compile.pl
each_member/2                             lib.pl
each_tree/4                               hays_generator.pl
embedded_stage_two/3                      hays_generator.pl
embedded_x_product/3                      lib.pl
enumerate/0                               hays_generator.pl
enumerate/1                               hays_generator.pl
enumerate_loop/0                          hays_generator.pl
enumerate_surface/1                       hays_generator.pl
extract_any_sub_string_with_trees/4       hays_parser.pl
extract_sub_string_and_trees/5            hays_parser.pl
ff_drule/3                                dg_compile.pl
flush_comment/3                           dg_compile.pl
flushline/2                               dg_compile.pl
generate_one_root/1                       dcg.pl
generate_tree/1                           hays_generator.pl
get_all_chars/1                           dg_compile.pl
get_all_chars2/3                          dg_compile.pl
grammar_present/2                         dg_compile.pl
group/4                                   dg_compile.pl
in_word/2                                 lib.pl
incorporate/2                             dg_compile.pl
init/1                                    divide.pl
initialize_parse_table/2                  hays_parser.pl
known_tree/1                              hays_generator.pl
lower_case/1                              dg_compile.pl
map_to_dcg/2                              map_to_dcg.pl
multi_line/1                              dg_compile.pl
note_grammar_present/2                    dg_compile.pl
numeric/1                                 dg_compile.pl
padding_char/1                            dg_compile.pl
parse_increasing_substrings/1             hays_parser.pl
print_set/1                               dcg.pl
purge_grammar_rules/0                     lib.pl
read_in/1                                 lib.pl
readword/3                                lib.pl
restsent/3                                lib.pl
return_admissible_trees/2                 hays_parser.pl
reverse/2                                 lib.pl
reverse/3                                 lib.pl
rff_drule/3                               dg_compile.pl
root/1                                    dg_compile.pl
saturate/2                                dg_compile.pl
sentence_length/1                         hays_parser.pl
separator/1                               dg_compile.pl
show_complete_tree/0                      hays_parser.pl
spans/3                                   hays_parser.pl
special_char/1                            dg_compile.pl
sr_recognize/0                            shift_reduce.pl
sr_recognize/1                            shift_reduce.pl
sr_recognize_loop/2                       shift_reduce.pl
sr_reduce/2                               shift_reduce.pl
stage_one/1                               hays_generator.pl
stage_two/2                               hays_generator.pl
surface/2                                 hays_generator.pl
tabular_parse/0                           hays_parser.pl
tokenize/1                                dg_compile.pl
upper_case/1                              dg_compile.pl
whittle/5                                 divide.pl
word_class/2                              dg_compile.pl
word_classify/2                           divide.pl
word_exs/3                                map_to_dcg.pl
write_sentence_list/1                     lib.pl
writeln/1                                 lib.pl
A.3 Code listings

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: dcg.pl
%
% WRITTEN BY: Norman M. Fraser
%
% DESCRIPTION: A definite clause grammar incorporating some
% notions from dependency grammar. For more
% information on definite clause grammars see
% Pereira and Warren (1980).
%
% VERSION HISTORY: 1.0 November 28, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%<

%
% LOAD DECLARATIONS
:- ensure_loaded(lib).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%:

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * «
*

* dcg_parse/0.
*

* Parse a string using a definite clause grammar. Return a dependency


* tree if the parse succeeds.
* For example, typing :
*

* I ?- dcg_parse.
* I: the big mouse chased the timid cat.
*

* produces the result :


*
* Parse tree: t_verb(noun(det(*),adj(*),*),*,noun(det(*),adj(*),*))
*
*/
dcg_parse :-
	read_in(String),
	!,
	root(Root),
	Rule =.. [Root,Tree],
	phrase(Rule,String,['.']),
	writeln(['Parse tree: ',Tree]),
	nl.
dcg_parse :-
	writeln('PARSE FAILED'),
	nl.

/***************************************************************************
*
* dcg_generate/0.
*
* Generate all strings (and associated syntactic parse trees) defined
* by the DCG.
*/
dcg_generate :-
	setof(Root,root(Root),Set),
	generate_one_root(Set),
nl.

/***************************************************************************
*
* generate_one_root(+RootList).
*
* Generate all possible strings for a given sentence root.
*/
generate_one_root([]).
generate_one_root([First|Rest]) :-
	Rule =.. [First,Tree],
	setof([String,Tree],phrase(Rule,String),Set),
	print_set(Set),
	generate_one_root(Rest).

/***************************************************************************
*
* print_set(+ResultList).
*
* Print out a list of String/Tree generation result pairs, one
* pair at a time.
*/
print_set([]) :-
	nl.
print_set([[String,Tree]|Rest]) :-
	writeln(['String: ',String]),
	writeln(['Tree: ',Tree]),
	nl,
	print_set(Rest).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% THE GRAMMAR
% A very simple definite clause grammar to illustrate how to
% build dependency trees using DCGs.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

adj(X) --> [Head],
	{ class(Head,adj),
	  X = adj(*) }.

det(X) --> [Head],
	{ class(Head,det),
	  X = det(*) }.

noun(X) --> det(Det), [Head],
	{ class(Head,noun),
	  X = noun(Det,*) }.
noun(X) --> det(Det), adj(Adj), [Head],
	{ class(Head,noun),
	  X = noun(Det,Adj,*) }.

i_verb(X) --> noun(Noun), [Head],
	{ class(Head,i_verb),
	  X = i_verb(Noun,*) }.

t_verb(X) --> noun(Noun1), [Head], noun(Noun2),
	{ class(Head,t_verb),
	  X = t_verb(Noun1,*,Noun2) }.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% VALID SENTENCE ROOTS
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

root(i_verb).
root(t_verb).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% WORD CLASS ASSIGNMENTS
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

class(big,adj).
class(fierce,adj).
class(timid,adj).

class(a,det).
class(the,det).

class(cat,noun).
class(dog,noun).
class(mouse,noun).

class(snored,i_verb).
class(ran,i_verb).

class(chased,t_verb).
class(likes,t_verb).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: dg_compile.pl
%
% WRITTEN BY: Norman M. Fraser
%
% DESCRIPTION: Compile a standard Gaifman format dependency
%              grammar into several different forms, namely:
%              Gaifman Prolog form, full form, and reversed
%              full form.
%
% VERSION HISTORY: 1.0 August 12, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
% library(files) is a Quintus Prolog library. To run with other
% prologs replace call to file_exists/1 in dg_compile/2 with the
% local equivalent.
%
:- ensure_loaded(library(files)).
:- ensure_loaded(lib).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% DYNAMIC PREDICATE DECLARATIONS
:- dynamic multi_line/1.
:- dynamic root/1.
:- dynamic word_class/2.
:- dynamic drule/3.
:- dynamic ff_drule/3.
:- dynamic rff_drule/3.
:- dynamic grammar_present/2.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

/***************************************************************************
*
* dg_compile(+File).
* dg_compile(+Compilation,+File).
*
* Compile a Gaifman dependency grsimmar into a variety of
* Prolog-readable forms. Three compilations are supplied.
*
* Gaifman dependency grammars allow rules of the following three
* varieties:
*
* (i) *(X)
* (ii) X(*)
* (iii) X(Y1,Y2,...,Yi,*,Yj,...,Yn-1,Yn)
*
* GAIFMAN PROLOG FORM
* Gaifmsui Prolog Form (GPF) is the simplest Prolog implementation of

* Gaifman's rule system, therefore it may be regarded as the
* canonical implementation. A grammar in standard Gaifman form can be
* compiled into GPF as follows:
*
* (1) Replace every rule of type 1 with a GPF rule of type 'root(X).'
* (2) Replace every rule of type 2 with a GPF rule of type
* ’drule(X, [],[]).'
* (3) Replace every rule of type 3 with a GPF rule of type
*     'drule(X,A,B).' where A is a Prolog list consisting of Y1-Yi in
*     the same order as they appear in the original rule, and B is a
* Prolog list consisting of Yj-Yn in the same order as they appear
* in the original rule. If nothing precedes in the original rule,
* then A = [] ; if nothing follows in the original rule then B = [].
*
* To compile a Gaifman grammar contained in a file called 'grammar1' into
* GPF, use:
*
* dg_compile(gpf,grammar1).
*
* Since GPF is the default compilation, the same result may be achieved
* using:
*
* dg_compile(grammar1).
*
* FULL FORM
* Full form dependency rules are produced using the following mapping :
*
* (1) Replace every rule of type 1 with a full form rule of type 'root(X).'
* (2) Replace every rule of type 2 with a full form rule of type
* ’ff_drule(X,[X]).'
* (3) Replace every rule of type 3 with a full form rule of type
* 'ff_drule(X,A).’ where A is the Prolog list consisting of the
*     concatenation of Y1-Yi, X, and Yj-Yn in that order.
*
*
* To compile a Gaifman grammar contained in a file called 'grammar1' into
* full form, use:
*
* dg_compile(ff,grammar1).
*
* REVERSED FULL FORM
* Reversed full form dependency rules are produced using the following
* mapping:
*
* (1) Replace every rule of type 1 with a full form rule of type 'root(X).'
* (2) Replace every rule of type 2 with a full form rule of type
* 'rff_drule(X,[X]).'
* (3) Replace every rule of type 3 with a full form rule of type
* 'ff_drule(X,A).' If A is a Prolog list consisting of the
* concatenation of Yl-Yi, X, and Yj-Yn in that order, then A1
* is the mirror image of that list.
*
* To compile a Gaifman grammar contained in a file called 'grammar1' into
* reversed full form, use:
*

* dg_compile(rff,grammar1).
*
* To compile the same source file into all three formats at the same
* time use:
*
* dg_compile(all,grammar1).
*
* The output of dg_compile/l eind dg_compile/2 is written directly to
* the Prolog internal database (user).
*/
dg_compile(File) :-
	dg_compile(gpf,File).

dg_compile(Compilation,File) :-
	(
	    file_exists(File)
	;
	    writeln(['Unknown file: ',File]),
	    abort
	),
	writeln(['Compiling ',File,' into ',Compilation,' format.']),
	see(File),
	retractall(multi_line(_)),
	assert(multi_line(off)),
	tokenize(FirstRule),
	dg_compile_loop(Compilation,FirstRule),
	told,
	note_grammar_present(Compilation,File),
	close_all_streams,
	writeln('Grammar compilation completed.').

dg_compile_loop(Compilation,eof([])).
dg_compile_loop(Compilation,eof(Rule)) :-
	dot,
	phrase(valid_rule(X),Rule),
	incorporate(Compilation,X).
dg_compile_loop(Compilation,[]) :-
	tokenize(Rule),
	dg_compile_loop(Compilation,Rule).
dg_compile_loop(Compilation,FirstRule) :-
	dot,
	phrase(valid_rule(X),FirstRule),
	incorporate(Compilation,X),
	tokenize(NextRule),
	dg_compile_loop(Compilation,NextRule).

incorporate(all,dependency_rule(Head,Before,After)) :-
        assertz(drule(Head,Before,After)),
        append(Before,[Head|After],Phrase),
        assertz(ff_drule(Head,Phrase)),
        reverse(Phrase,RevPhrase),
        assertz(rff_drule(Head,RevPhrase)).

incorporate(gpf,dependency_rule(Head,Before,After)) :-
assertz(drule(Head,Before,After)).

307
incorporate(gpf_sat,dependency_rule(Head,Before,After)) :-
        saturate(Before,Before1),
        saturate(After,After1),
        assertz(gpf_sat_drule(Head,Before1,After1)).
incorporate(ff,dependency_rule(Head,Before,After)) :-
        append(Before,[Head|After],Phrase),
        assertz(ff_drule(Head,Phrase)).
incorporate(ff_sat,dependency_rule(Head,Before,After)) :-
        saturate(Before,Before1),
        saturate(After,After1),
        Head1 =.. [Head,*],
        append(Before1,[Head|After1],Phrase),
        assertz(ff_sat_drule(Head1,Phrase)).
incorporate(rff,dependency_rule(Head,Before,After)) :-
        append(Before,[Head|After],Phrase),
        reverse(Phrase,RevPhrase),
        assertz(rff_drule(Head,RevPhrase)).
incorporate(rff_sat,dependency_rule(Head,Before,After)) :-
        saturate(Before,Before1),
        saturate(After,After1),
        Head1 =.. [Head,*],
        append(Before1,[Head|After1],Phrase),
        reverse(Phrase,RevPhrase),
        assertz(rff_sat_drule(Head1,RevPhrase)).

incorporate(_,sentence_root(Root)) :-
        assertz(root(Root)).
incorporate(_,class_assign(_,[])).
incorporate(_,class_assign(Class,[FirstWord|Rest])) :-
assertz(word_class(FirstWord,Class)),
incorporate(_,class_assign(Class,Rest)).

note_grammar_present(all,Grammar) :-
        note_grammar_present(gpf,Grammar),
        note_grammar_present(ff,Grammar),
        note_grammar_present(rff,Grammar).
note_grammar_present(Format,Grammar) :-
        assert(grammar_present(Format,Grammar)).

saturate([],[]).
saturate([First|Rest],[New|Result]) :-
        New =.. [First,*],
        saturate(Rest,Result).
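
% For example, saturate([n,v],S) binds S to [n(*),v(*)]: each word class
% atom is wrapped as a structure whose single argument is '*', marking it
% as a saturated (complete) head.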

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% TOKENIZE A DEPENDENCY GRAMMAR
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

/***************************************************************************
*
* tokenize(-ListOfTokens)
*

308
* Produce a list of tokens for the current line in the standard input,
♦/
tokenize(Result) :-
        get_all_chars(ListOfChars),
        group(ListOfChars,[],[],Result).

/***************************************************************************
*
* get_all_chars(+Filename,-ListOfChars)
*
* Construct a list of all legitimate characters on the current line
* (in reverse order).
*/
get_all_chars(AllChars) :-
        get0(C),
        get_all_chars2(C,[],AllChars).

get_all_chars2(C,Result,eof(Result)) :-
        end_of_file(C).
get_all_chars2(C,Result,Result) :-
        multi_line(off),
        newline(C).
get_all_chars2(C,Current,Result) :-
        comment(C),
        flushline(C,C1),
        get_all_chars2(C1,Current,Result).
get_all_chars2(ThisChar,[LastChar|Current],Result) :-
        asterisk(ThisChar),
        oblique(LastChar),
        flush_comment(120,120,C1),
        get_all_chars2(C1,Current,Result).
get_all_chars2(C,Current,Result) :-
        close_curly(C),
        retractall(multi_line(_)),
        asserta(multi_line(off)),
        get0(C1),
        get_all_chars2(C1,[C|Current],Result).
get_all_chars2(C,Current,Result) :-
        open_curly(C),
        retractall(multi_line(_)),
        asserta(multi_line(on)),
        get0(C1),
        get_all_chars2(C1,[C|Current],Result).
get_all_chars2(C,Current,Result) :-
        allowed_char(C),
        get0(C1),
        get_all_chars2(C1,[C|Current],Result).
get_all_chars2(C,Current,Result) :-
        write('Illegal character ignored: '),
        put(C),
        write(' (ASCII '),
        write(C),
        write(')'),
        nl,
        get0(C1),
        get_all_chars2(C1,Current,Result).

/* ***************************************************************** *(**:***+**
*
* group(+Inlist.?Current_Word.+Current_List.-Result )
*
* Tokenize a list of character codes,
*/
group(eof(Anything),One,Two,eof(Result)) :-
        group(Anything,One,Two,Result).
group([],[],Result,Result).
group([],Current_List,So_Far,[Current_Atom|So_Far]) :-
        name(Current_Atom,Current_List).
group([H|T],[],So_Far,Result) :-
        special_char(H),
        name(Current_Atom,[H]),
        group(T,[],[Current_Atom|So_Far],Result).
group([H|T],[],So_Far,Result) :-
        padding_char(H),
        group(T,[],So_Far,Result).
group([H|T],Current_List,So_Far,Result) :-
        alpha_numeric(H),
        group(T,[H|Current_List],So_Far,Result).
group([H|T],Current_List,So_Far,Result) :-
        separator(H),
        name(Current_Atom,Current_List),
        group([H|T],[],[Current_Atom|So_Far],Result).

/***********************************************************************
*
* Character manipulation utilities and definitions
*
***********************************************************************/

/***************************************************************************
*
* flushline/0.
*
* Flush the input buffer to the next end of l i n e .
*/
flushline(C,C) :-
        end_of_file(C).
flushline(C,C1) :-
        newline(C),
        get0(C1).
flushline(_,C) :-
        get0(C1),
        flushline(C1,C).

/***************************************************************************
 *
 * flush_comment(+CurrentChar,+PreviousChar,+ReturnChar).
 *
 * Flush the input buffer to the end of the next multiline comment.
 */
flush_comment(C,_,C) :-
        end_of_file(C).
flush_comment(C,C1,C2) :-
        oblique(C),
        asterisk(C1),
        get0(C2).
flush_comment(C1,_,C3) :-
        get0(C2),
        flush_comment(C2,C1,C3).

allowed_char(C) :-
        padding_char(C).
allowed_char(C) :-
        alpha_numeric(C).
allowed_char(C) :-
        special_char(C).

separator(C) :-
        padding_char(C).
separator(C) :-
        special_char(C).

padding_char(C) :-
        space(C).
padding_char(C) :-
        tab_char(C).
padding_char(C) :-
        comma(C).
padding_char(C) :-
        period(C).
padding_char(C) :-
        newline(C).

alpha_numeric(C) :-
        lower_case(C).
alpha_numeric(C) :-
        upper_case(C).
alpha_numeric(C) :-
        underscore(C).
alpha_numeric(C) :-
        numeric(C).

lower_case(C) :-
        C >= 97,
        C =< 122.

upper_case(C) :-
        C >= 65,
        C =< 90.

numeric(C) :-
        C >= 48,
        C =< 57.

special_char(C) :-
        open_bracket(C).
special_char(C) :-
        close_bracket(C).
special_char(C) :-
        colon(C).
special_char(C) :-
        asterisk(C).
special_char(C) :-
        open_curly(C).
special_char(C) :-
        close_curly(C).
special_char(C) :-
        oblique(C).

end_of_file(-1).    % EOF
tab_char(9).        % tab
newline(10).        % nl
space(32).          % ' '
comment(37).        % %
open_bracket(40).   % (
close_bracket(41).  % )
asterisk(42).       % *
comma(44).          % ,
dash(45).           % -
period(46).         % .
oblique(47).        % /
colon(58).          % :
underscore(95).     % _
open_curly(123).    % {
close_curly(125).   % }

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% A DEFINITE CLAUSE GRAMMAR FOR DEPENDENCY GRAMMAR RULES
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%
% VALID RULE TYPES
%
valid_rule(X) -->
        dependency_rule(X).
valid_rule(X) -->
        class_assignment(X).
valid_rule(X) -->
        root_declaration(X).

312
% WORD CLASS ASSIGNMENT RULES
%
class_assignment(X) -->
        [A], colon_string(B), set_of_words(C),
        {atom(A),
         X = class_assign(A,C)}.

colon_string(X) -->
        [':'].

set_of_words(X) -->
        open_set(A), word_list(X), close_set(C).

open_set(X) -->
        ['{'].

close_set(X) -->
        ['}'].

word_list(X) -->
        [A],
        {atom(A),
         X = [A]}.
word_list(X) -->
        [A], word_list(B),
        {atom(A),
         X = [A|B]}.

%
% SENTENCE ROOT RULES
%
root_declaration(X) -->
        asterisk_string(A), open_brkt_string(B), [C], close_brkt_string(D),
        {atom(C),
         X = sentence_root(C)}.

asterisk_string(X) -->
        ['*'].

open_brkt_string(X) -->
        ['('].

close_brkt_string(X) -->
        [')'].

%
% DEPENDENCY RULES
%
dependency_rule(X) -->
        [A], open_brkt_string(B), asterisk_string(C), close_brkt_string(D),
        {atom(A),
         X = dependency_rule(A,[],[])}.
dependency_rule(X) -->
        [A], open_brkt_string(B), word_list(C), asterisk_string(D),
        close_brkt_string(E),
        {atom(A),
         X = dependency_rule(A,C,[])}.
dependency_rule(X) -->
        [A], open_brkt_string(B), asterisk_string(C), word_list(D),
        close_brkt_string(E),
        {atom(A),
         X = dependency_rule(A,[],D)}.
dependency_rule(X) -->
        [A], open_brkt_string(B), word_list(C), asterisk_string(D),
        word_list(E), close_brkt_string(F),
        {atom(A),
         X = dependency_rule(A,C,E)}.
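
% Illustration (a hypothetical grammar fragment, not part of any grammar
% supplied with the thesis): after tokenization the source lines
%
%       v(n,*,n)
%       *(v)
%       n: {man, woman}
%
% become the token lists [v,'(',n,'*',n,')'], ['*','(',v,')'] and
% [n,':','{',man,woman,'}'] (commas and spaces are treated as padding by
% the tokenizer), and phrase(valid_rule(X),Tokens) then instantiates X
% to, respectively, dependency_rule(v,[n],[n]), sentence_root(v) and
% class_assign(n,[man,woman]).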

314
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME:        divide_conquer.pl
%
% WRITTEN BY:      Norman M. Fraser
%
% DESCRIPTION:     Divide & Conquer. A shallow top-down dependency
%                  parser.
%
% VERSION HISTORY: 1.0 December 17, 1990
%                  1.1 August 8, 1992 (NMF)
%                  1.2 January 16, 1993 (NMF)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD.DECLARATIONS
:- ensure_loaded(library(files)).
:- ensure_loaded(lib).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

/**************************************************************************
*
* divide_conquer/l.
* divide_conquer/0.
*
* Parse a string. Version with filenzime argument loads a Gaifman Prolog
* Form grammar. The parser is based on the 'divide and conquer' algorithm.
* The basic idea is to use the head of a rule to split the string to be
* parsed in two and then to recurse down each half in turn.
*/
divide_conquer(File) :-
(
file.exists(File),
purge_grammax_rules,
dg_compile(File)
I
writeln(['ERROR ! Non-existent grammar file : ',File,'.']),
abort
),
divide.conquer.

divide.conquer :-
write('Type the sentence to be parsed (end with a full stop)'),
nl,
w rite ( ': '),
read_in(Sentence),
        word_classify(Sentence,Class_List),
        init(Class_List).
divide_conquer :-
        writeln(['*** PARSER FAILED ***']).

315
/*********************************%*#**************************************
*
 * word_classify(+Classless,-Classified).
 *
 * Take a list of unclassified words and return a list of word classes,
 * basing assignments on the current grammar.
 */
word_classify(['.'],[]).
word_classify([Word|Rest_Words],[Class|Class_List]) :-
        word_classify(Rest_Words,Class_List),
word_class(Word,Class).

/***************************************************************************
*
* init(+String).
*
* Begin the parse.
*/
init(List) :-
        root(Start),
        drule(Start,Left_Deps,Right_Deps),
        divide(List,Left,Right,Start),
        conquer(Start,Left,Left_Deps,[],Report1),
        conquer(Start,Right,Right_Deps,[],Report2),
        writeln(['Root = ',Start]),
        writeln(['Leftside = ',Report1]),
        writeln(['Rightside = ',Report2]).

/***************************************************************************
*
 * divide(+String,-LeftPart,-RightPart,+Head).
 *
 * Find Head in String and return the substring to its left as LeftPart
 * and the substring to its right as RightPart.
 */
divide([],_,_,_).
divide([H|T],[],T,H).
divide([H|T],[H|Left],Right,Root) :-
        divide(T,Left,Right,Root).
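
% For example (the word classes are chosen purely for illustration), if
% the head category is v then
%
%       ?- divide([det,n,v,det,n],Left,Right,v).
%       Left = [det,n],
%       Right = [det,n]
%
% i.e. the string is split around the first occurrence of the head.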

/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * % * s t * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

*
 * conquer(+Head,+String,+Dependents,-Remainder_of_Substring,-Report).
 *
 * Find trees rooted in each of the Dependents in String. These will
 * each depend on Head. Return any of String not accounted for as
 * Remainder. Report what has been found.
 */
conquer(_,[],[],[],[]).                                 % SUCCEED: all satisfied
conquer(_,[],[_|_],_,_) :-                              % FAIL: deps but no words
        !,
        fail.
conquer(Head,[Dep],[Dep],[],[(Dep,Head)]) :-
        drule(Dep,[],[]).                               % SUCCEED: only dep matches only word
conquer(Head,String,[Dep],Remainder,[(Dep,Head)|Report3]) :-
        drule(Dep,Left_Deps,Right_Deps),                % ONE DEP: divide and conquer
        divide(String,Left,Right,Dep),
        conquer(Dep,Left,Left_Deps,[],Report1),
        conquer(Dep,Right,Right_Deps,Remainder,Report2),
        append(Report1,Report2,Report3).
conquer(Head,String,[First_Dep|Rest_Deps],Remainder,Report5) :-   % MANY DEPS
        drule(First_Dep,Left_Deps,Right_Deps),
        divide(String,Left,Right,First_Dep),
        conquer(First_Dep,Left,Left_Deps,[],Report1),
        whittle(First_Dep,Right,Right_Deps,Remainder2,Report2),
        conquer(Head,Remainder2,Rest_Deps,Remainder1,Report3),
        append(Report1,Report2,Report4),
        append(Report3,Report4,Report5).
conquer(Head,[First_Word|Rest_Words],[First_Dep|Rest_Deps],Remainder,
        [(First_Dep,Head)|Report1]) :-
        drule(First_Dep,[],[]),
        conquer(Head,Rest_Words,Rest_Deps,Remainder,Report1).

/***************************************************************************
*

 * whittle(+Head,+String,+Dependents,-Remainder,-Report).
 *
 * A reduced version of conquer/5 for whittling down String when
 * more than one tree must be found in it.
 */
whittle(_,Remainder,[],Remainder,_).
whittle(Head,[Dep1|Rest_Words],[Dep1|Rest_Deps],Remainder,[(Dep1,Head)|Report1]) :-
        drule(Dep1,[],[]),
        whittle(Head,Rest_Words,Rest_Deps,Remainder,Report1).
whittle(Head,String,[Dep1|Rest_Deps],Remainder,Report3) :-
        drule(Dep1,Left_Deps,Right_Deps),
        divide(String,Left,Right,Dep1),
        conquer(Dep1,Left,Left_Deps,[],Report1),
        conquer(Dep1,Right,Right_Deps,Remainder,Report2),
        append(Report1,Report2,Report3).

2317
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%
% FILENAME: gazdar_mellish.pl
%
% WRITTEN BY: Gerald Gazdar & Chris Mellish, with minor
% modifications by Norman M. Fraser.
%
% DESCRIPTION: Contains the concatenation of several files
% (namely: buchart1.pl, chrtlib1.pl, library.pl,
% psgrules.pl, lexicon.pl, examples.pl) from the
% program listings in Gazdar & Mellish (1989).
% Some minor changes have been made to make the
% program run under Quintus Prolog. A few
% predicates which are irrelevant here have
% been removed (mostly from library.pl).
%
% VERSION HISTORY: January 16, 1993 (date created in this form)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% ORIGINAL NOTICE FOLLOWS:
%
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% Example code from the book "Natural Language Processing in Prolog" %
% published by Addison Wesley %
% Copyright (c) 1989, Gerald Gazdar & Christopher Hellish. %
% % % % % % % % % % % % % % % % */. % % % % % % % % % % % % % % % % % % % %
%
% Reproduced by kind permission.
%

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /

%
% buchartl.pl A bottom-up chart parser
%
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /

parse (VO,Vn,String) :-
8tart_chart(V0,Vn,String). % defined in chrtlibl.pl
%
add_edge(V0,V1,Category,Categories,Parse) :-
        edge(V0,V1,Category,Categories,Parse),!.
%
add_edge(V1,V2,Category1,[],Parse) :-
        assert_edge(V1,V2,Category1,[],Parse),
        foreach(rule(Category2,[Category1|Categories]),
                add_edge(V1,V1,Category2,[Category1|Categories],[Category2])),
        foreach(edge(V0,V1,Category2,[Category1|Categories],Parses),
                add_edge(V0,V2,Category2,Categories,[Parse|Parses])).
add_edge(V0,V1,Category1,[Category2|Categories],Parses) :-
        assert_edge(V0,V1,Category1,[Category2|Categories],Parses),
        foreach(edge(V1,V2,Category2,[],Parse),
                add_edge(V0,V2,Category1,Categories,[Parse|Parses])).

318
/**************************************************************************/
%
% chrtlibl.pl Librêiry predicates for database chart parsers
%
/**************************************************************************/

%
% start_chart
%    uses add_edge (defined by particular chart parser) to insert inactive
%    edges for the words (and their respective categories) into the chart
%
start_chart(V0,V0,[]).
start_chart(V0,Vn,[Word|Words]) :-
        V1 is V0+1,
        foreach(word(Category,Word),
                add_edge(V0,V1,Category,[],[Word,Category])),
        start_chart(V1,Vn,Words).
% test
% allows use of test sentences (in examples.pi) with chart parsers
%
test(String)
VO is 1,
initial(Symbol),
parse(VO,Vn,String),
f oreach(edge(VO,V n ,Symbol,[],Parse),
mwrite(Parse)),
retractall(edge(_,_,_,_,_)).
%
% foreach - for each X do Y
%
foreach(X,Y)
X,
do(Y),
fail.
foreach(X,Y)
true.
do(Y) Y , !.
%
% mwrite prints out the mirror image of a tree encoded as a list
%
mwrite(Tree)
mirror(Tree,Image),
write(Image),
nl.
%
% mirror - produces the mirror image of a tree encoded as a list
%
mirror ( □ , [ ] ) !.
mirror(Atom,Atom)
atomic(Atom).
mirror( [X1|X2],Image)
mirror(Xl,Y2),
mirror(X2,Y1),

319
append(Y1,[Y2],Image).
%
% assert_edge
% asserta(edge(...)), but gives option of displaying nature of edge crreaated
%
assert_edge(Vl,V2,Categoryl,[],Parsel)
asserta(edge(Vl,V2,Categoryl,[],Parsel)).
% dbgwrite(inactive(Vl,V2,Categoryl)).
assert_edge(VI,V2,Category1,[Category21 Categories].Parsel)
asserta(edge(Vl,V2,Categoryl,[Category21 Categories],Pars e l ) ) .
% dbgwrite(active(VI,V2,Categoryl,[Category21 Categories])).
%

/*********************************************************%*************'****/
%
% library.pl A collection of utility predicates
%
/**********************************************************************#+***/

%
% '--- > ’ an arrow for rules that distinguishes them from D<CG ( ’— >') iruiles
%
?- op(255,xfx,-- >).
%
% definitions to provide a uniform interface to DCG-style :fule format:
% the 'word' predicate is used by the RTNs and other parse.rs
% the 'rule' clause that subsumes words is used by the cheurt parsers
%
word(Category,Word)
(Category ---> [Word]).
%
rule(Category,[Word])
use_rule,
(Category ---> [Word]).
%
% in order for the clause above to be useful,
% use_rule. needs to be in the file.
%
rule(Mother,List_of.daughters) :-
(Mother --- > Daughters),
n o t (islist(Daughters)),
conj tolist(Daughters,List.of.daughters).
%
% conjtolist - convert a conjunction of terms to a list off terms
%
conjtolist((Term,Terms), [Term I List.of.terms]) !,
conjtolist(Terms,List.of.terms).
conjtolist(Term,[Term]).
%
% islist(X) - if X is a list, C&M 3rd ed. p52-53
%
islist(D) !.
islist([. I . ] ) .
%

320
%
% read_in(X) - convert keyboard input to list X, C&M 3rd ed. p101-103
%
read_in([Word|Words]) :-
        get0(Character1),
        readword(Character1,Word,Character2),
        restsent(Word,Character2,Words).
%
restsent(Word,Character,[]) :-
        lastword(Word),!.
restsent(Word1,Character1,[Word2|Words]) :-
        readword(Character1,Word2,Character2),
        restsent(Word2,Character2,Words).
%
readword(Character1,Word,Character2) :-
        single_character(Character1),!,
        name(Word,[Character1]),
        get0(Character2).
readword(Character1,Word,Character2) :-
        in_word(Character1,Character3),!,
        get0(Character4),
        restword(Character4,Characters,Character2),
        name(Word,[Character3|Characters]).
readword(Character1,Word,Character2) :-
        get0(Character3),
        readword(Character3,Word,Character2).
%
restword(Character1,[Character2|Characters],Character3) :-
        in_word(Character1,Character2),!,
        get0(Character4),
        restword(Character4,Characters,Character3).
restword(Character,[],Character).
%
single_character(33). % !
single_character(44). % ,
single_character(46). % .
single_character(58). % :
single_character(59). % ;
single_character(63). % ?
%
in_word(Character,Character) :-
        Character > 96,
        Character < 123. % a-z
in_word(Character,Character) :-
        Character > 47,
        Character < 58.  % 1-9
in_word(Character1,Character2) :-
        Character1 > 64,
        Character1 < 91,
        Character2 is Character1 + 32. % A-Z
in_word(39,39). % '
in_word(45,45). % -
%
lastword('.').
lastword('!').
lastword('?').

321
%
% testi - get u s e r ’s input and pass it to test predicate, then repeat
%
testi :-
w r i t e C ’End with period and < C R > ’),
read_in(Words),
append(String,[Period],Word s ) ,
nl.
test(String),
nl,
testi.
%
% dbgwrite - a switchable tracing predicate
%
dbgwrite(Term)
dbgon,
write(Term),
nl, ! .
dbgwrite(Term).
%
dbgwrite(Term,Var)
dbgon,
integer(Var),
tab(3 * (Var - 1)),
write(Term),
nl, ! .
dbgwrite(Term,Var)
dbgon,
write(Term), write(" "), write(Var),
nl, ! .
dbgwrite(Term,Var).
%
dbgon. % retract this to switch dbg tracing off

/**************************************************************************/
%
% psgrules.pl An example set of CF-PSG rules
%
/**************************************************************************/

%
% DCG style format
%
/* op(255,xfx,--->). */
%
initial(s). % used by chart parsers
%
s ---> (np, v p ) .
np ---> (det, n b ) .
nb ---> n .
nb ---> (n, rel).
rel ---> (wh, v p ) .
vp ---> iv.
vp ---> (tv, np).

322
vp ---> (dv, np, pp).
vp ---> (sv, s).
pp ---> (p, np).
%

/**************************************************************************/
%
% lexicon.pl An example lexicon
%
/**************************************************************************/

/* :- op(255,xfx,--->). */

np ---> [kim].
np ---> [sandy].
np ---> [lee].
np ---> [bread].
det ---> [a].
det ---> [the].
det ---> [her].
n ---> [consumer].
n ---> [duck].
n ---> [man].
n ---> [woman].
wh ---> [who].
wh ---> [that].
p ---> [to].
iv ---> [died].
iv ---> [ate].
tv ---> [ate].
tv ---> [saw].
tv ---> [gave].
dv ---> [gave].
dv ---> [handed].
sv ---> [knew].

/**************************************************************************/
%
% examples.pl A set of test examples
%
/**************************************************************************/

% A set of test examples - predicate 'test' must be defined for parser
%
test1 :-
        test([kim,died]).
test2 :-
        test([sandy,saw,a,duck]).
test3 :-
        test([kim,knew,sandy,knew,lee,died]).
test4 :-
        test([the,woman,gave,a,duck,to,her,man]).
test5 :-
        test([lee,handed,a,duck,that,died,to,the,woman]).
%

/**************************************************************************/
%
% Necessary addition for Quintus Prolog compatibility
%
/**************************************************************************/

not(X) :-
\+X.

324
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% FILENAME:        hays_parser.pl
%
% WRITTEN BY:      Norman M. Fraser
%
% DESCRIPTION:     A tabular dependency parser based on the
%                  recognition algorithm described by David Hays
%                  in Language 40(4): 516-517, 1964.
%
% VERSION HISTORY: 1.0 August 15, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
:- ensure_loaded(lib).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% DYNAMIC PREDICATE DECLARATIONS
:- dynamic sentence_length/1.
:- dynamic spans/3.

/***************************************************************************
*
* tabular_parse/0.
*
* Parse a string. After initializing the table with each category
* licensed by the string and the grammar, make multiple passes, on
* each pass considering only sub-strings one word longer than in the
* last pass. For each saturated dependency, record (i) the creation
* of a new saturated head and (ii) the tree rooted in that head.
* To conclude, signal either success or failure and, if success,
* return all well-formed trees which span the entire input string.
♦/
tabular.parse :-
retractall(sentence_length(_)),
retractall(spans(_,_,_)),
read_in(String),
        initialize_parse_table(String),
parse_increasing_substrings(l),
(
show_complete_tree
I
writeln('PARSE FAI L E D ')
).

/***************************************************************************
*
* initialize_parse_table(+String).
*
* Given a list of words, initialize the sub-string table with all
* their possible category assignments.

325
*/
initialize_parse_table(WordString) :-
initialize_parse_table(WordString,0).

% initialize_parse_table/2.
initialize_parse_table(['.'],N) :-
        assert(sentence_length(N)).
initialize_parse_table([First|Rest],M) :-
        findall(Class,word_class(First,Class),Bag),
        N is M+1,
        add_spans_including_trees(Bag,M,N),
        initialize_parse_table(Rest,N).

add_spans_including_trees([],_,_).
add_spans_including_trees([First|Rest],M,N) :-
        assert(spans(M,[First,*],N)),
        add_spans_including_trees(Rest,M,N).
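
% For illustration (assuming a grammar whose word class assignments give
% kim the class n and died the class v), the input [kim,died,'.'] seeds
% the table with:
%
%       spans(0,[n,*],1).
%       spans(1,[v,*],2).
%       sentence_length(2).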

/***************************************************************************
*
* parse_increasing_substrings(+Length).
*
 * Extract all strings of length Length from the table and attempt to
* parse them. If parsing succeeds record the head and the dependency
* structure in the table.
♦/
parse_increasing_substrings(N)
sentence_length(N).
parse_increasing_substrings(N) :-
        gpf_sat_drule(Head,Before,After),
        append(Before,[Head|After],Body),
        length(Body,N),
        extract_sub_string_and_trees(N,Start,Body,Trees,Finish),
        NewHead =.. [Head,*],
        NewTree =.. [Head|Trees],
        assert_if_new(spans(Start,[NewHead,NewTree],Finish)),
fail.
parse_increasing_substrings(M)
N is M+1,
parse_increasing_substrings(N).

/***************************************************************************
*
* extract_sub_string_and_trees(+N,-Start,-Result,-Trees,-Finish).
*
* Extract a sub-string from the table, N units long where each unit
* is a single word or a fully-connected dependency tree. Returns both
* sub-string auid the corresponding trees. Also returns the Start and
* Finish addresses of the sub-string.
*/
extract_sub_string_and_trees(N,Sta r t ,Result,T r e e s ,Finish) :-
extract_any_sub_string_with_trees(Start,Result.Trees,F i n i s h ) ,
length(Result.N).

326
% extract_amy_sub_string_with_trees/4.
extract_any_sub_string_with_trees(Start,[Label],[Tree],Finish) :-
        spans(Start,[Label,Tree],Finish).
extract_any_sub_string_with_trees(Start,[Label|Substring],[Tree|TreeList],Finish) :-
        spans(Start,[Label,Tree],Intermed),
        extract_any_sub_string_with_trees(Intermed,Substring,TreeList,Finish).

/***************************************************************************
*
* show_complete_tree/0.
*
* Succeeds if a root edge (and associated tree) spans the whole sentence
* in the sub-string t a b l e . Writes out all spanning trees to the standard
* output,
*/
show_complete_tree :-
        sentence_length(N),
        findall([Label|Tree],spans(0,[Label|Tree],N),TreeBag),
        return_admissible_trees(TreeBag,Admit),
        writeln('PARSE SUCCEEDED'),
        each_member(Admit,writeln).

return_admissible_trees([],[]).
return_admissible_trees([[Label,Tree]|Rest],[Tree|Result]) :-
        Label =.. [Root|_],
        root(Root),
        return_admissible_trees(Rest,Result).
return_admissible_trees([First|Rest],Result) :-
        return_admissible_trees(Rest,Result).

327
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME: hays_recognizer.pl
%
% WRITTEN BY: Norman M. Fraser
%
% DESCRIPTION: A recognizer for determining whether an arbitrary
% string belongs to the language generated by a
% given grammar. This is an implementation of the
% algorithm described by David Hays in Language 40(4):
% 516-517, 1964.
%
% VERSION HISTORY: 1.0 August 8, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
:- ensure_loaded(lib).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% DYNAMIC PREDICATE DECLARATIONS
:- dynamic sentence_length/l.
:- dynamic spans/3.

/***************************************************************************
*
* recognize/0.
*
* Try to recognize a string. After initializing the table with each
* category licensed by the string and the grammar, madce multiple
* passes, on each pass considering only sub-strings one word longer
* than in the last pass. For each new saturated dependency, record
* the creation of a new saturated head. Signal either success or
* failure in recognizing the string.
*/
recognize :-
retractall(sentence_length(_)),
retractall(spans(_,_,_)),
read_in(String),
initialize_table(String),
apply_rules_of_increasing_length(1),
(
complete_spain ->
writeln('PARSE SUCCEEDED')
I
writeln('PARSE F A I L E D ')
).

/***************************************************************************
*
* initialize_table(+CatStr).
*

328
* Given a list of words, initialize the sub-string table with all
* their possible category assignments.
*/
initialize_table(WordString)
initialize_table(WordString,0).

% initialize_table/2.
initialize_table(['.'],N) :-
assert(sentence_length(N)).
initialize_table([First I Rest],M) :-
findall(Class,word_class(First,Class),Bag),
N is M+1,
add_spems(Bag,M,N),
initialize_table(Rest,N).

add_spans([],_,_).
add_spans([First|Rest],M,N) :-
assert(spans(M,First,N)),
add_spans(Rest,M ,N ) .

/***************************************************************************
*
* apply_rules_of_increasing_length(+Length).
*
* Extract all strings of length Length from the table and attempt to
* parse them. If parsing succeeds record the head aind boundaries of
* the new edge in the table.
*/
apply_rules_of_increasing_length(N)
sentence_length(N).
apply_rules_of_increasing_length(N)
gpf_sat_drule(Head,Before,After),
append(Before,[HeadI After],Body),
length(Body,N),
extract_sub_string(N,Start,B o d y ,Fini s h ) ,
New =.. [Head,*],
assert_if_new(spans(Start,N e w ,Finish)),
fail.
apply_rules_of_increasing_length(M) :-
N is M+1,
apply_rules_of_increasing_length(N).

/***************************************************************************
*
* extract_sub_string(+N,-Start,-Result,-Finish).
*
* Extract a sub-string from the table, N units long where each unit
* is a single word or a fully-connected dependency tree. Also returns
* the Start emd Finish addresses of the sub-string.
*/
extract_sub_string(N,Start,Result,Finish) :-
extract_ainy_sub_string(Start,Result,Finish),
length(Result,N).

329
% extract_any_sub_string/3.
extract_any_sub_string(Start,[Label],Finish) :-
        spans(Start,Label,Finish).
extract_any_sub_string(Start,[Label|Substring],Finish) :-
        spans(Start,Label,Intermed),
        extract_any_sub_string(Intermed,Substring,Finish).

/***************************************************************************
*
* complete_span/0.
*
* Succeeds if a root edge spans the whole sentence in the sub-string table.
♦/
complete_span :-
        sentence_length(N),
        spans(0,Label,N),
        Label =.. [Root,*],
        root(Root).

330
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% FILENAME:        hays_generator.pl
%
% WRITTEN BY:      Norman M. Fraser
%
% DESCRIPTION:     Given a dependency grammar in Gaifman Prolog
%                  Form, enumerate all the strings generated by
%                  the grammar. This is an implementation of the
%                  algorithm described by David Hays in Language
%                  40(4): 514-515, 1964.
%
%                  Like many other classes of grammar, Gaifman
%                  grammars can use recursion to produce
%                  infinitely long strings. When presented with
%                  a grammar having this property, Hays' algorithm
%                  will never halt. The version here restricts
%                  enumeration to the set of dependency trees of
%                  depth less than Max, where Max is defined
%                  using max_tree_depth/1.
%
% VERSION HISTORY: 1.0 August 8, 1992

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% LOAD DECLARATIONS
% library(files) is a Quintus Prolog library. To run with other
% Prologs replace the call to file_exists/1 in enumerate/1 with the
% local equivalent.

:- ensure_loaded(library(files)).
:- ensure_loaded(lib).
:- ensure_loaded(dg_compile).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% DYNAMIC PREDICATE DECLARATION
:- dynamic known_tree/l.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

/***************************************************************************
*
* enumerate(+File).
*
* The top level predicate. Enumerates all the strings generated by a
* dependency grammeur in Gaifman Prolog Form contained in File.
*/
enumerate(File) :-
(
file_exists(File),
purge_grammar_rules,
dg.compile(gpf,File)

331
writeln([’ERROR! Non-existent grammar file: ’ .File,’.’]),
abort
).
retractall(known_tree(_)),
enumerate_loop.

/***************************************************#***********************
*

* enumerate/0.
*
* An alternative top level predicate. Enumerates a l l the strings generated
* by the dependency grammar in Gaifman Prolog Form w h i c h has already
* been compiled.
*/
enumerate
(
            grammar_present(gpf,_)
        |
            writeln('ERROR! GPF grammar not loaded.'),
            abort
        ),
        retractall(known_tree(_)),
enumerate.loop.

/**************************************************»*************************
*
* enumerat e_loop/0.
*
* A failure-driven loop which forces backtracking through all possible
* strings generated by the g r a m m a r .
*/
enumerate_loop :-
        generate_tree(Tree),
        (
            known_tree(Tree) ->
            fail
        |
            assert(known_tree(Tree)),
            build_cat_list(Tree,CatStr)
        ),
        enumerate_surface(CatStr),
        fail.
enumerate_loop.

/***************************************************************************
*
* generate.tree(-Tree).
*
* Generating a dependency tree is a two-stage process as described by
* Hays.
*/

332
g;enerate_tree(Tree) :-
stage_one(Root),
stage_two(Root,T r e e ) .

/***************************************************************************
*
* stage_one(-Root).
*
* The first stage retrieves a permissible sentence root from the
♦ g rammar.
*/
stage_one(Root) :-
root(Root).

/***************************************************************************
*
* stage_two(+Root,-Tree).
* stage_two(+Root,-Tree,+N).
*
* The second stage constructs a Tree rooted in Root and well-formed
* according to the rules of the grammar being used. N is a counter
* which keeps track of the depth of the tree. When max_tree_depth(Max),
* N = Max, enumeration is aborted.
*/
stage_two(Root,Tree) :-
stage_two(Root,T r e e ,1).

stage_two(Root,Tree,_) ;-
drule(Root, [],[]),
Tree =.. [Root,*].
stage_two(Root,Tree,N)
drule(Root,Before,After),
embedded_stage_two(Before,BeforeTrees,N ) ,
embedded_stage_two(After,AfterTrees,N),
append([Root IBeforeTrees],[*IAfterTrees],ListOfTrees),
Tree =.. ListOfTrees.
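
% For example, with the illustrative rules drule(v,[n],[n]) and
% drule(n,[],[]) and root(v), stage_two/2 can return the tree
%
%       v(n(*),*,n(*))
%
% where '*' marks the position of each head among its dependents.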

embedded_stage_two(_,[],Max) ;-
max_tree_depth(Max),
writeln(’Maximum depth reached in search tree. Pruning...’),
!.
embedded_stage_two( [] ,[],_).
embedded_stage_two( [Head|Tail],[HeadTreeITailTrees],M) :-
N is M+1,
stage.two(Head,HeadTree,N ) ,
embedded_stage_two(Tail,TailTrees,M ) .

/***************************************************************************
*
* max_tree_depth(-Integer).

* This is required to avoid infinite looping. The maximum may be reset

333
* to smy positive integer value, as required.
*/
max_tree_depth(20).

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

*
* build_cat_list(+Tree,-CatList).
*

* Given a dependency Tree, produce a list of word categories in


* the correct surface order for that tree.
*/
build_cat_list( [] ,_).
build_cat_list(Tree,CatList)
Tree =.. [Root IRest],
each_tree(Root,R e s t ,[],CatList),
! .

each_tree(_, [] ,Result,Result ).
each_tree(Root, [*IRes t ] ,Current,Result) ;-
append(Current,[Root],New),
each_tree(_,Rest,New,Result).
each_tree(Root, [Terminal IRes t ] ,Current,Result) :-
Terminal =.. [Name,*],
append (Current, [Naune] ,New),
each_tree(Root,R e s t ,New,Result).
each_tree(Root,[Tree IRes t ] ,Current,Result) :-
build_cat_list(Tree,Res1),
append(Current,Resl,New),
each_tree(Root,R e s t ,N e w ,Result).

/***************************************************************************
*
* enumerate_surface(+CatList)_.
*
* Find all grammatically possible surface strings which instamtiate a
* list of word categories. Write each of these to the standard output.
*/
enumerate_surface(CatList)
findall(String,surface(CatList,String),A l l ) ,
each_member(A l l ,write_sentence_list),
!.

/***************************************************************************
*
* surface(+CatList,-SurfList).
*
* Return a single list of surface forms (words) for a given list
* of word categories.
*/
surface ([] , [] ).
surface([Cat I R e s t ] ,[Word I Result])
word_class(Word,Cat),
surface(Rest,Result).

334
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME:        incremental_shift_reduce.pl
%
% WRITTEN BY:      Norman M. Fraser
%
% DESCRIPTION:     An incremental bottom-up shift-reduce dependency
%                  parser.
%
% VERSION HISTORY: 1.0 August 8, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
:- ensure_loaded(dg_compile).
:- ensure_loaded(lib).
:- ensure_loaded(library(files)).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

/***************************************************************************
 *
 * incremental_parse(+File).
 *
 * Load a dependency grammar from File in Gaifman Prolog Form, then call
 * incremental_parse/0.
 */
incremental_parse(File) :-
        (
            file_exists(File),
            purge_grammar_rules,
            dg_compile(File)        %% in Gaifman Prolog Form
        |
            writeln(['ERROR! Non-existent grammar file: ',File,'.']),
            abort
        ),
        incremental_parse.

/***************************************************************************
 *
 * incremental_parse/0.
 *
 * Prompt for an input string. Pass the string into the parser.
 */
incremental_parse :-
        write('Please type the string to be parsed'),
        nl,
        write(': '),
        read_in(Input),
        ibu_parse(Input).
/*********************** ******************************************** *********
*
* ibu_parse(+Input).
*
* The top level parse predicate,
*/
ibu_parse(Input) :-
        ibu_parse_loop(Input,[]),
        write('Parse succeeded'),
        nl.
ibu_parse(_) :-
w r i t e ( ’Parse failed’),
nl.

/****************************************************************************
*
* ibu_parse_loop(+Input,-Result).
*
 * The main parse loop. There are three possibilities: terminate,
 * reduce, and shift. Result reporting is suppressed here to emphasize
 * the simplicity of the algorithm.
*/
ibu_parse_loop(['.'],[dr(Root,[],[])]) :-               %% TERMINATE
        root(Root).
ibu_parse_loop(Input,[First|[Second|Rest]]) :-          %% REDUCE
        reduce_inc(First,Second,Result),
        ibu_parse_loop(Input,[Result|Rest]).
ibu_parse_loop([Word|Rest],Stack) :-                    %% SHIFT
        word_class(Word,Class),
        drule(Class,Before_Deps,After_Deps),
        reverse(Before_Deps,Before_Deps1),
        ibu_parse_loop(Rest,[dr(Class,Before_Deps1,After_Deps)|Stack]).

/****************************************************************************
*
* reduce_inc(+StackTop,+StackNext,-NewTop).
*
 * The rules of reduction. The second and third rules basically do the
 * same thing but two clauses are required because of the way in which
 * Prolog constructs lists.
*/
reduce_inc(dr(X,[Y|Alpha],Beta),dr(Y,[],[]),dr(X,Alpha,Beta)).
reduce_inc(dr(X,[],Alpha),dr(Y,[],[X]),dr(Y,[],Alpha)).
reduce_inc(dr(X,[],Alpha),dr(Y,[],[X|Beta]),dr(Y,[],[Alpha|Beta])).
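
/* A worked illustration (the grammar is assumed only for this example):
 * with root(v), word_class(kim,n), word_class(died,v), drule(n,[],[])
 * and drule(v,[n],[]), the call ibu_parse([kim,died,'.']) proceeds:
 *
 *   shift kim    -> stack [dr(n,[],[])]
 *   shift died   -> stack [dr(v,[n],[]),dr(n,[],[])]
 *   reduce       -> first clause of reduce_inc/3 attaches n to v,
 *                   giving stack [dr(v,[],[])]
 *   terminate    -> root(v) holds, so 'Parse succeeded' is reported.
 */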

336
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAMME: lib.pl
%
% WRITTENf BY: Norman M. Fraser
%
% DESCRIPTTION: A library of mostly general-purpose predicates.
% Originally designed for use with a variety
% of programs making use of dependency grammars,
% hence the presence of more specific predicates
% such as gpf_rules_present/0.
%
% VERSIONf HISTORY: 1.0 August 8, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

y****************************************************************************
*
* append(+**Listl,+*List2,+*Result).
*
* Append Liistl and List2 to form Result. Can also be used in reverse
* to split ]Result into pairs of sub-lists.
♦/
/*
append([],List,List).
append([Head|Tail1],List,[Head|Tail2]) :-
        append(Tail1,List,Tail2).
*/

/****************************************************************************
*
* assert_ifJ_new(+Clause).
*
* If clause ! exists in the database then do nothing; otherwise add it.
*/
assert_if_new(Clause) :-
        Clause =.. [Head|Body],
        clause(Head,Body),
        !.
assert_if_new(Clause) :-
        assert(Clause),
        !.

/****************************************************************************
*
* concat(?Prcefix.+Suffix,?Whole).
*
* Append a clharacter string to an atom.
*/
concat(Prefix,SuffixChars,Whole) :-
        name(Prefix,PrefixChars),
        append(PrefixChars,SuffixChars,WholeChars),
        name(Whole,WholeChars).
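
% For example, concat(rule_,"np",X) binds X to the atom rule_np
% (the suffix is supplied as a character-code list).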

/***************************************************************************
*
* cross_product(+Listl,+List2,-Result).
*
* Produces the cross product of two lists. List 1 and List2.
*/
cross_product([],_,[]).
cross_product([H|T],In,Out) :-
embedded_x_product(In,H ,Intermedl),
cross_product(T,In,Intermed2),
append(Intermedl,Intermed2,Out).

% embedded_x_product/3.
embedded_x_product([],_,[]).
embedded_x_product([H|T],Const,[[Const,H]|Result]) :-
        embedded_x_product(T,Const,Result).
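
% For example, cross_product([a,b],[1,2],X) binds X to
% [[a,1],[a,2],[b,1],[b,2]].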

/***************************************************************************
*
* dot/0.
*
* Write a dot to the standard output. Used for registering activity
* lengthy processes.
♦/
dot :-
        write(user,'. '),
        flush_output(user).

/***************************************************************************
*
* each_member(+List,+Predicate).
*
* Applies a Predicate of arity=l to each item in List. Predicate
* will normally have side effects. For example, a typical usage
* would be to write each member of a list: each.member(List,write).
*/
each_member( □ ,_).
each_member([Argument IRest],Predicate)
Term =.. [Predicate,Argument],
call(Term),
each_member(Rest.Predicate).

/***************************************************************************
*
* purge_grammar_rules/0.
*
* Retract dependency grammar rules (of all formats) from the Prolog
* database.

338
*/
purge_grammar_rules :-
        retractall(drule(_,_,_)),
        retractall(gpf_sat_drule(_,_,_)),
        retractall(ff_drule(_,_)),
        retractall(rff_drule(_,_)),
        retractall(root(_)),
retractall(word_class(_,_)).

y***************************************************************************
*
* read_in(-ListOfAtoms).
*
* Read a sentence terminated by a legitimate last character from the
* standard input. Convert input to lower case and filter excluded
* characters. Return a list of atoms terminated by a fullstop.
*
* From Clocksin & Mellish (1987) Programming in Prolog. Berlin:
* Springer-Verlag. (3rd Edition). 101-103.
*/
read_in([Word IWords])
getO(Characterl),
readword(Characterl,Word,Character2),
restsent(Word,Character2,Words).

% Given a word and the word after it, read in the rest of the
% sentence.
restsent(Word,Character,[])
lastword(Word),!.
restsent(Wordl,Character1 , [Word21 Words]) :-
readword(Charact er1,W o r d 2 ,Charact er2),
restsent(Word2,Character2,Words).

% Read in a single word, given an initial character, emd remembering


% that the character came after the word.
readword(Characterl,Word,Character2)
single_character(Characterl), !,
ncime(Word, [Characterl] ),
getO(Character2).
re adword(Charact er1,W o r d ,Charact er2)
in_word(Characterl ,Chéiracter3), !,
getO(Character4),
restword(Character4,Characters,Character2),
name(Word,[Character31 Characters] ).
readword(Character1,W o r d ,Character2)
getO(Chéiracter3),
readword(Charact er3,W o r d ,Charact er2).

restword(Characterl, [Chêiracter21 Characters] ,Character3)


in_word(Characterl,Character2),!,
getO(Character4),
restword(Character4,Characters,Character3).
restword(Character,[],Character).

339
% These cheuracters form words on their own.
single_character(33). % !
single_ charact e r (44)
s ingle_charact e r (46)
s ingle_character(58)
s ingle_character(59)
single_character(63)

% These characters can appear within a word. The second in_word clause
% converts characters to lowercase.
in_word(Character,Character) :-
Character > 96,
Character < 123. % a-z
in_word(Character,Character) :-
Character > 47,
Chsoracter < 5 8 . % 1-9
in_word(Characterl ,Chcuracter2) :-
Characterl > 64,
Characterl < 91,
Ch«uracter2 is Characterl + 32. % A-Z
in_word(39,39). % ’
in_word(45,45). % -

% These words terminate a sentence.


lastword(’.’).
lastword(’ !’).
lastword(’? ’).

/***************************************************************************
*
* reverse(+ForwêirdList ,-BackwaxdList).
*
* Reverse ForwardList to produce BackwsurdList.
*/
reverse(In,Out) :-
reverse (In, [] ,Out).
reverse([],G u t ,O u t ) .
reverse( [First 1Rest],Temp,Out) :-
reverse(Rest,[First ITemp],Out).

y***************************************************************************
*
* writeln(+Data).
*
* Write Data to the standard output ending with a newline, where Data
* is either an atom or a list of atoms.
*/
writeln( [] ) :-
nl.
writeln([H|T]) :-
write(H),
writeln(T).

340
writeln(X)
write(X),
nl.

/***************************************************************************
*
* write_sentence_list(List).
*
* List is a list of atoms. Write each atom to the standard output,
* separated by a space character.
*/
write_sentence_list([])
nl.
write_sentence_list([First IRest]) :-
write(First),
write(' *),
write_sentence_list(Rest).

341
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME:        map_to_dcg.pl
%
% WRITTEN BY:      Norman M. Fraser
%
% DESCRIPTION:     Map a Gaifman-format dependency grammar into
%                  a definite clause grammar.
%
% VERSION HISTORY: 1.0 January 19, 1993
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD.DECLARATIONS
:- ensure_loaded(lib).
:- ensure_loaded(dg_compile).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% DYNAMIC PREDICATE DECLARATIONS
:- dynêunic max_no_deps/2.

/***************************************************************************
*
* map_to_dcg(+InFile,+OutFile).
*
* Read a Gaifman format dependency grammar from InFile. Write a definite
* clause grammar to OutFile.
*/
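% As a rough illustration (the rule and word are chosen only for this
% example, and the exact layout of the generated file may differ): a
% Gaifman rule v(n,*,n) together with the word class assignment
% v: {died} is mapped to DCG clauses of the form
%
%       rule_v(X) -->  rule_n(A1), word_v, rule_n(B1),
%               { X =.. ['v',A1,*,B1]}.
%
%       word_v --> [died].
%
% plus dcg_parse/0 and dcg_parse1/1 driver predicates for each
% permissible sentence root.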
map_to_dcg(InFile,OutFile) :-
dg_compile(InFile),
tell(OutFile),
write('%%% DCG GENERATED FROM THE DEPENDENCY GRAMMAR: ’),
write(InFile),
        write(' %%%'),
nl, nl,
write(':- ensure_loaded(lib).’),
nl, nl,
        write('%% PARSE PREDICATES'),
nl,
construct_call,
nl, nl,
retractall(max_no_deps(_,_)),
        write('%% RULES'),
nl,
construct_rules,
nl, nl,
write('%% WORD CLASS ASSIGNMENTS’),
nl,
construct_assignments,
told.

342
/ ***************************************************************************
*
* construct_call/0.
*
* Construct a ’dcg_parse* predicate for parsing with the grammar.
*/'
construct_call :-
        write('dcg_parse :-'),
        begin_new_line,
        write('write(''Please type the sentence to be parsed''),'),
        begin_new_line,
        write('nl,'),
        begin_new_line,
        write('read_in(Input),'),
        begin_new_line,
        write('dcg_parse1(Input).'),
nl, nl,
construct_embedded_call.

/***************************************************************************
*
* construct_embedded_call/0.
*
* Construct a parse predicate for each different type of root
* allowed by the DG.
♦/
construct_embedded_call :-
retract(root(Root)),
write( ’dcg_parsel(Input) :-’),
begin_new_line,
write( ’phrase(rule_’),
write(Root),
write( ’(Tree),I n p u t ,[.]),’),
begin_new_line,
write ( ’write ( ’ ’PARSE SUCCEEDED : ” ) , ’),
begin_new_line,
write( ’write(Tree),’),
begin_new_line,
write( ’nl, n l . ’),
nl,
construct_embedded_call.
construct_embedded_call :-
write( ’dcg_parse :-’),
begin_new_line,
write ( ’writ e ( ’ ’PARSE FAI L E D ” ) , ’),
begin_new_line,
write( ’nl, n l . ’),
nl.

343
/**************************************************************#**********:**
*
* begin_new_line/0.
*
* Initialize a new line of Prolog code.
*/
begin_new_line
nl,
t ab ( 8 ) .

/**************************************************************:***********'**:
*
* construct_rules/0.
*
* Add a DCG rule for every DG rule in the grammar. Ensure that DCG
* rules return a parse tree as their result.
*/
construct_rules :-
        retract(drule(Head,Pre,Post)),
        write('rule_'),
        write(Head),
        write('(X) -->  '),
        dep_write(Pre,'A',1),
        write(' '),
        write('word_'),
        write(Head),
        write(','),
        dep_write(Post,'B',1),
        nl,
        tab(8),
        write('{ X =.. ['),
        write(''''),
        write(Head),
        write(''''),
        retract(max_no_deps('A',Amax)),
        write_exs('A',1,Amax),
        write(',*'),
        retract(max_no_deps('B',Bmax)),
        write_exs('B',1,Bmax),
        write(']}.'),
        nl,
        construct_rules.
construct_rules.

/***************************************************************************
*
* dep_write/3.
*
* Map a list of dependents for a head onto a list of calls to DCG
* rules.
*/
dep_write([],Prefix,N) :-
        assert(max_no_deps(Prefix,N)).
dep_write([First|Rest],Prefix,M) :-
        write(' '),
        write('rule_'),
        write(First),
        write('('),
        write(Prefix),
        write(M),
        write('),'),
        N is M+1,
        dep_write(Rest,Prefix,N).

/ ***************************************************************************
*
* write_exs/3.
*
* Write result variables from all DCG rules which are called within
* some rmle).
*/
write_exs(_,Max,Max).
write_exs(Prefix,M,Max) :-
        write(','),
        write(Prefix),
        write(M),
        N is M+1,
        write_exs(Prefix,N,Max).

/ * * * * * * * * * * * * i | ( : | i ^ : ( c * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

*
* construct__assignments/0.
* construct_assignraents/2.
*

* Generate a set of DCG word class assignment rules corresponding to


* the DG word class assignments.
*/
construct_assignments :-
        word_class(Word,Class),
        setof(X,word_class(X,Class),Bag),
        construct_assignments(Bag,Class),
        retractall(word_class(_,Class)),
        construct_assignments.
construct_assignments.

construct_assignments(□ ,_)
nl.
construct_assignments([WordlRest],Class)
write( 'word.’) ,
write(Class),
write( ’ — > [ ’),
write(Word),
write(’] . ’),
nl,
construct_assignments(Rest,C l a s s ) .

345
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%
% FILENAME: nmf_chart.pl

WRITTEN BY: Norman M. Fraser


Based in very large measure on a program
written by Gerald Gazdar & Chris Mellish.
All significant differences are identified.

DESCRIPTION: Contains the concatenation of parts of several


files (naunely: buchartl.pl, chrtlibl.pl,
library.pl) from the program listings in Gazdar
& Mellish (1989).
Some minor chainges have been made to make the
program run under Quintus Prolog. A few
predicates which are irrelevant here have
been removed (mostly from library.pl).

The most significant difference between this and


the program written by Gazdar and Mellish is that
their chart parser presupposed a phrase structure
grammar whereas this one presupposes a dependency
grammar.

VERSION HISTORY: January 16, 1993 (date created in this form)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% ORIGINAL NOTICE ON GAZDAR & MELLISH’S MATERIAL FOLLOWS :

% % %% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %% %
Example code from the book "Natural Language Processing in Prolog"
published by Addison Wesley
Copyright (c)1989, Gerald Gazdar & Christopher Mellish.
% % %% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Reproduced by kind permission

:- ensure_loaded(dg_compile).

:- dynamic edge/4.

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /

%
% buchartl.pl A bottora-up chart parser
%
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /

%
% This new initialization predicate loads a dependency grammar (as
% defined in File) in full form.
%
initialize_dchart(File) :- %% NEW PREDICATE

346
(
file_exists(File),
purge _ grammar,rule s ,
        dg_compile(ff,File)        %% load a DG in full form
I
writeln([’ERROR! Non-existent grammar file: ’ .File,’. ’]),
abort
).

dchart_parse(VO,Vn,String)
start_chart(VO,Vn,String). % defined in chrtlibl.pl
%
add_edge(_,_,_,['*'],_).                 %% NEW CLAUSE - no dependents
add_edge(V0,V1,Category,Categories,Parse) :-
        edge(V0,V1,Category,Categories,Parse),!.
add_edge(V1,V2,Category1,[],Parse) :-
        assert_edge(V1,V2,Category1,[],Parse),
        foreach(rule(Category2,[Category1|Categories]),
                add_edge(V1,V1,Category2,[Category1|Categories],[Category2])),
        foreach(edge(V0,V1,Category2,[Category1|Categories],Parses),
                add_edge(V0,V2,Category2,Categories,[Parse|Parses])).
add_edge(V0,V1,Category1,[Category2|Categories],Parses) :-
        assert_edge(V0,V1,Category1,[Category2|Categories],Parses),
        foreach(edge(V1,V2,Category2,[],Parse),
                add_edge(V0,V2,Category1,Categories,[Parse|Parses])).

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /

%
% chrtlibl.pl Library predicates for database chart parsers
%
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /

%
start_chart
uses add_edge (defined by particular chart parser) to insert inactive
% edges for the words (and their respective categories) into the chart
%

start_chart(VO.VO.[]).
start_chart(VO.Vn.[WordlWords])
VI is VO+1,
foreach(word(Category.Wor d ) .
add_edge(VO,Vl.Category,[].[Word.Category])).
start_chart(VI.Vn.Words).
% test
% allows use of test sentences (in examples.pl) with chart parsers
%

test(String)
VO is 1.
% initial(Symbol). %% OLD VERSION
        root(Symbol),                       %% NEW VERSION
        dchart_parse(V0,Vn,String),         %% NAME CHANGE
foreach(edge(VO.Vn.Symbol.[].Parse).
mwrite(Parse)),
retractall(edge(_,_._._,_)).

347
%

% foreach - for each X do Y


%

foreach(X,Y)
X,
do(Y),
fail.
foreach(X,Y)
true.
do(Y) Y,!.
%

% mwrite prints out the mirror image of a tree encoded as a list


%

mwrite(Tree)
mirror(Tree,Image),
write(Image),
nl.
%

% mirror - produces the mirror image of a tree encoded as a list


%

mirror ([],[]) !.
mirror(Atom,Atom)
atomic(Atom).
mirror( [XIIX2],Image)
mirror(XI,Y2),
mirror(X2,Yl),
append(Y1,[Y2],Image).
%

% assert.edge
% asserta(edge(...)), but gives option of displaying nature of edge created
%

assert_edge(VI,V2,Categoryl,[],Parsel)
asserta(edge(Vl,V2,Categoryl,[],Parsel)),
%

dbgwrit e(inactive(Vl,V2,Categoryl)).
assert_edge(VI,V2,Category1 , [Category21 Categories],Parsel)
asserta(edge(Vl,V2,Categoryl,CCategory21 Categories],Parsel)),
%

dbgwrite(active(Vl,V2,Categoryl,[Category21 Categories] )).


%

/**************************************************************************/
%

% library.pl A collection of utility predicates


%
/**************************************************************************/

% *--- >* an arrow for rules that distinguishes them from DCG ( ’— > ’) rules
%

?- op(255,xfx,--->).
%

word(Category,Word)
% (Category > [Word]). %% OLD VERSION

348
        word_class(Word,Category).          %% NEW VERSION
%
% rule(Mother,List_of_daughters) :-        %% OLD VERSION
% (Mother --- > Daughters),
% not(islist(Daughters)),
% conj tolist(Daughters,List_of.daughters).
rule(Head,['*']) :-                         %% NEW VERSION
ff.drule(Head,[Head]).
rule(Head,Dependents)
ff.drule(Head,Dependents),
Dependents \== [Head].
%
% conjtolist - convert a conjunction of terms to a list of terms
% (NOW REDUNDANT)
%conjtolist((Term,Terms), [Term1List.of.terms]) !,
% conjtolist(Terms,List.of.terms).
y.conj tolist (Term, [Term] ).
%
% islist(X) - if X is a list, CAM 3rd ed. p52-S3
%
islist ([]) !.
islist([.|.]).
%
% read.in(X) - convert keyboard input to list X, CAM 3rd ed. plOl-103
%
read.in([WordlWords])
getO(Characterl),
readword(Characterl,W o r d ,Character2),
restsent(Word,Character2,W o r d s ) .
%

restsent(Word,Character,[])
lastword(Word),!.
restsent(Wordl,Characterl,[Word21 Words]) :-
readword(Characterl,Word2,Character2),
restsent(Word2,Character2,Words).
%

readword(Characterl,W o r d ,Character2)
single.character(Characterl), !,
name(Word,[Characterl]),
getO(Character2).
readword(Characterl,Word,Character2)
in.word(Charac t er1 ,Character3),!,
getO(Ch2Lracter4) ,
restword(Character4,Characters,Character2),
neune(Word, [Character31 Characters] ) .
readword(Character1,W o r d ,Character2)
getO(Ch2uracter3) ,
readword(Character3,W o r d ,Character2).
%
restword(Characterl,[Character21 Characters],Character3)
in.word(Characterl,Character2),!,
getO(Character4),
restword(Character4,Characters,Character3).
restword(Character,[],Character).
%

349
single_character(33). % !
single_character(44). % ,
single_character(46). % .
single_character(58). % :
single_character(59). % ;
single_character(63). '/, ?
%

in_word(Character,Character)
Character > 96,
Character < 123. % a-z
in_word(Character,Character)
Character > 47,
Character < 58. % 1-9
in_word(Characterl,Character2)
Characterl > 64,
Characterl < 91,
Character2 is Characterl + 3 2 . % A-Z
in_word(39,39). % ’
in_word(4S,4S). % -
%

lastwordC’.’)•
lastwordC * !’).
lastwordC’? *).
%

% testi - get u s e r ’s input and pass it to test predicates, then repeat


%

testi
wri t e ( ’End with period and < C R > ’),
read_in(Words),
appendCString,[Period],Words),
nl,
test(String),
nl,
testi.
%

% dbgwrite - a switchable tracing predicate


%
dbgwrite(Term) :-
dbgon,
write(Term),
nl, ! .
dbgwrite(Term).
7.
dbgwrit e (T e r m ,Var) :-
dbgon,
integer(Var),
tab(3 * (Var - 1)),
write(Term),
nl, ! .
dbgwrite(Term,Var) :-
dbgon,
write(Term), write(" "), write(Var),
nl, !.
dbgwrite(Term,Var).
%

350
dbgon. % retnracct this to switch dbg tracing off

/***************************************************************************/
%
% examples.pl   A set of test examples
%
/***************************************************************************/
%
% A set of test examples - predicate 'test' must be defined for the parser
%
test1 :-
        test( [kim,died] ).
test2 :-
        test( [sandy,saw,a,duck] ).
test3 :-
        test( [kim,knew,sandy,knew,lee,died] ).
test4 :-
        test( [the,woman,gave,a,duck,to,her,man] ).
test5 :-
        test( [lee,handed,a,duck,that,died,to,the,woman] ).
%
/***************************************************************************/
%
% Necessary addition for Quintus Prolog compatibility
%
/***************************************************************************/
not(X) :-
        \+X.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME:        shift_reduce.pl
%
% WRITTEN BY:      Norman M. Fraser
%
% DESCRIPTION:     A non-incremental shift-reduce dependency
%                  recognizer.
%
% VERSION HISTORY: 1.0  August 8, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% LOAD DECLARATIONS
% library(files) is a Quintus Prolog library. To run with other
% prologs replace the call to file_exists/1 in enumerate/1 with the
% local equivalent.
%
:- ensure_loaded(library(files)).
:- ensure_loaded(lib).
:- ensure_loaded(dg_compile).
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
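% Portability sketch (an assumption, for Prologs without library(files) but
% with ISO catch/3; not the library's own definition): a rough stand-in for
% file_exists/1 could be written as
%
%   file_exists(File) :-
%       catch(see(File), _, fail),
%       seen.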

/***************************************************************************
 *
 * sr_recognize(+File).
 *
 * The top level predicate. Recognize a string in non-incremental bottom-up
 * shift-reduce fashion, using the Gaifman dependency grammar defined in
 * File.
 */
sr_recognize(File) :-
        (
            file_exists(File),
            purge_grammar_rules,
            dg_compile(rff_sat,File)
        ;
            writeln(['ERROR! Non-existent grammar file: ',File,'.']),
            abort
        ),
        read_in(Input),
        sr_recognize_loop(Input,[]).

/***************************************************************************
 *
 * sr_recognize/0.
 *
 * An alternative top level predicate. Assumes a Gaifman dependency
 * grammar in saturated reversed full form has already been loaded.
 */
sr_recognize :-
        (
            grammar_present(rff_sat,_)
        ;
            writeln('ERROR! Saturated reversed full form DG not loaded'),
            abort
        ),
        read_in(Input),
        sr_recognize_loop(Input,[]).

/***************************************************************************
 *
 * sr_recognize_loop(+String,+Stack).
 *
 * The main program loop. Clauses 1 and 2 trap the succeed and fail
 * cases. Clause 3 attempts to reduce the stack. If all else fails,
 * clause 4 shifts the next word from the input onto the stack.
 */
sr_recognize_loop([_],[TreeRoot]) :-
        TreeRoot =.. [Root|_],
        root(Root),
        writeln('RECOGNIZED').
sr_recognize_loop([_],[_]) :-
        writeln('NOT RECOGNIZED').
sr_recognize_loop(Input,Stack) :-                       % Reduce
        sr_reduce(Stack,Result),
        sr_recognize_loop(Input,Result).
sr_recognize_loop([Word|Rest],Stack) :-                 % Shift
        word_class(Word,Class),
        sr_recognize_loop(Rest,[Class|Stack]).

/***************************************************************************
 *
 * sr_reduce(+BeforeStack,-AfterStack).
 *
 * Perform reductions on BeforeStack as licensed by dependency grammar
 * rules in saturated reversed full form.
 */
sr_reduce([],_) :-
        !,
        fail.
sr_reduce(Stack,[Head|Result]) :-
        append(Str,Result,Stack),
        rff_sat_drule(Head,Str).
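% Worked illustration of the shift-reduce cycle. The exact clauses of
% rff_sat_drule/2 are produced by dg_compile/2; the two facts below are an
% assumption used only for exposition. Suppose the compiled grammar1
% contains
%
%   rff_sat_drule('N',   ['N','A'])      % from N(A,*),   dependents reversed
%   rff_sat_drule('Det', ['N','Det'])    % from Det(*,N), dependents reversed
%
% Because shifted categories are pushed onto the front of the stack, the
% stack holds categories in reverse surface order, which is why the rules
% are stored in reversed form. Processing the fragment "the big cat ..."
% then runs:
%
%   shift the               stack = ['Det']
%   shift big               stack = ['A','Det']
%   shift cat               stack = ['N','A','Det']
%   reduce by N(A,*)        pop ['N','A'],   push 'N'   -> ['N','Det']
%   reduce by Det(*,N)      pop ['N','Det'], push 'Det' -> ['Det']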

A.4 Sample grammar

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME:        grammar1
%
% WRITTEN BY:      Norman M Fraser
%
% DESCRIPTION:     A very basic dependency grammar.
%
%                  Gaifman-type dependency grammars allow rules of
%                  the following three varieties:
%
%                  (i)   *(X)
%                  (ii)  X(*)
%                  (iii) X(Y1,Y2,...,Yi,*,Yj,...,Yn-1,Yn)
%
%                  (i) is used to declare permitted sentence roots;
%                  (ii) is used to declare words that may occur
%                  without any dependents; (iii) is used to indicate
%                  that Y1-Yn may depend on X in the order shown.
%
%                  To these I have added rules of the form:
%
%                  (iv)  C: {W1,W2,...,Wn}
%
%                  This is used to assign W1-Wn to category C.
%
% VERSION HISTORY: 1.0  August 12, 1992
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% EXAMPLES
%
% The cat sat on the mat.
% The big cat slept near the fire.
% The big fat cat slept on the mat by the fire.
% The big cat saw the mouse on the mat.
% Who was on the mat?
% The cat saw the mouse with the waistcoat near the fire.
% What was near the fire?
% The big cat gave the mouse a nice little waistcoat.
% The little mouse gave a big waistcoat to the cat by the fire.
%
% SENTENCE ROOT
%
*(DTV)
*(IV)
*(TV)

% DEPENDENCY RULES
%
A(*)

Det(*,N)

DTV(Det,*,Det,Det)
DTV(Det,*,Det,Prep)

IV(Det,*,Prep)

N(*)
N(A,*)
N(A,A,*)
N(*,Prep)
N(A,*,Prep)
N(A,A,*,Prep)

Prep(*,Det)

TV(Det,*,Det)
TV(Det,*,Det,Prep)

% CATEGORY ASSIGNMENT
%
A:    {big, fat, little, nice}

Det:  {a, the}

DTV:  {gave}

IV:   {sat, slept}

N:    {cat, fire, mat, mouse, waistcoat}

Prep: {by, near, on, to, with}

TV:   {caught, saw}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILENAME:        grammar1.dcg
%
% WRITTEN BY:      generated automatically by map_to_dcg/2.
%
% CREATION DATE:   January 19, 1993
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% DCG GENERATED FROM THE DEPENDENCY GRAMMAR: grammar1 %%%

:- ensure_loaded(lib).

%% PARSE PREDICATES

dcg_parse :-
        write('Please type the sentence to be parsed'),
        nl,
        read_in(Input),
        dcg_parse1(Input).
dcg_parse :-
        write('PARSE FAILED'),
        nl, nl.

dcg_parse1(Input) :-
        phrase(rule_DTV(Tree),Input,[_]),
        write('PARSE SUCCEEDED: '),
        write(Tree),
        nl, nl.
dcg_parse1(Input) :-
        phrase(rule_IV(Tree),Input,[_]),
        write('PARSE SUCCEEDED: '),
        write(Tree),
        nl, nl.
dcg_parse1(Input) :-
        phrase(rule_TV(Tree),Input,[_]),
        write('PARSE SUCCEEDED: '),
        write(Tree),
        nl, nl.

%% RULES

rule_A(X) --> word_A,
        { X =.. ['A',*] }.
rule_Det(X) --> word_Det, rule_N(B1),
        { X =.. ['Det',*,B1] }.
rule_DTV(X) --> rule_Det(A1), word_DTV, rule_Det(B1), rule_Det(B2),
        { X =.. ['DTV',A1,*,B1,B2] }.
rule_DTV(X) --> rule_Det(A1), word_DTV, rule_Det(B1), rule_Prep(B2),
        { X =.. ['DTV',A1,*,B1,B2] }.
rule_IV(X) --> rule_Det(A1), word_IV, rule_Prep(B1),
        { X =.. ['IV',A1,*,B1] }.
rule_N(X) --> word_N,
        { X =.. ['N',*] }.
rule_N(X) --> rule_A(A1), word_N,
        { X =.. ['N',A1,*] }.
rule_N(X) --> rule_A(A1), rule_A(A2), word_N,
        { X =.. ['N',A1,A2,*] }.
rule_N(X) --> word_N, rule_Prep(B1),
        { X =.. ['N',*,B1] }.
rule_N(X) --> rule_A(A1), word_N, rule_Prep(B1),
        { X =.. ['N',A1,*,B1] }.
rule_N(X) --> rule_A(A1), rule_A(A2), word_N, rule_Prep(B1),
        { X =.. ['N',A1,A2,*,B1] }.
rule_Prep(X) --> word_Prep, rule_Det(B1),
        { X =.. ['Prep',*,B1] }.
rule_TV(X) --> rule_Det(A1), word_TV, rule_Det(B1),
        { X =.. ['TV',A1,*,B1] }.
rule_TV(X) --> rule_Det(A1), word_TV, rule_Det(B1), rule_Prep(B2),
        { X =.. ['TV',A1,*,B1,B2] }.

%% WORD CLASS ASSIGNMENTS

word_A --> [big].
word_A --> [fat].
word_A --> [little].
word_A --> [nice].

word_Det --> [a].
word_Det --> [the].

word_DTV --> [gave].

word_IV --> [sat].
word_IV --> [slept].

word_N --> [cat].
word_N --> [fire].
word_N --> [mat].
word_N --> [mouse].
word_N --> [waistcoat].

word_Prep --> [by].
word_Prep --> [near].
word_Prep --> [on].
word_Prep --> [to].
word_Prep --> [with].

word_TV --> [caught].
word_TV --> [saw].
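% Illustrative query against the rules above (showing the first solution
% found; the input list omits the sentence-final full stop):
%
%   ?- phrase(rule_IV(Tree),[the,cat,sat,on,the,mat],[]).
%   Tree = 'IV'('Det'(*,'N'(*)),*,'Prep'(*,'Det'(*,'N'(*))))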

Bibliography

Ades, A. and M. Steedman (1982). On the order of words. Linguistics and Philosophy, 4: 517-58.
Ajdukiewicz, K. (1935). Die syntaktische Konnexität. Studia Philosophica, 1: 1-27. English translation by H. Weber in S. McCall (ed.) Polish Logic, 1920-1939, 207-31. Oxford: Oxford University Press.
Anderson, J. M. (1971). The Grammar of Case: Towards a localistic theory. Cambridge Studies in Linguistics 4. Cambridge University Press, Cambridge.
Anderson, J. M. (1977). On Case Grammar: Prolegomena to a theory of grammatical relations. Croom Helm, London.
Anderson, J. M. and J. Durand (1986). Dependency phonology. In J. Durand, editor, Dependency and Non-linear Phonology, pages 1-54. Croom Helm, London.
Andry, F. and S. Thornton (1991). A parser for speech lattices using a UCG grammar. In Proceedings of the 2nd European Conference on Speech Communication and Technology, pages 219-22, Genova.
Andry, F., N. M. Fraser, S. McGlashan, S. Thornton, and N. J. Youd (1992). Making DATR work for speech: lexicon compilation in SUNDIAL. Computational Linguistics, 18(3): 245-67.
Arnold, D. (1986). Eurotra: a European perspective on MT. Proceedings of the IEEE, 74: 979-92.
Arnold, D. and L. des Tombe (1987). Basic theory and methodology in Eurotra. In S. Nirenburg, editor, Machine Translation, pages 114-35. Cambridge University Press, Cambridge.
Arnold, D. and L. Sadler (forthcoming). The theoretical basis of MiMo. Machine Translation.
Atkinson, M., D. Kilby, and I. Roca (1982). Foundations of General Linguistics. Allen and Unwin, London.
Atwell, E., T. O'Donoghue, and C. Souter (1989). The COMMUNAL RAP: a probabilistic approach to natural language parsing. Technical report, University of Leeds.
Avgustinova, T. and K. Oliva (1990). Syntactic description of free word order languages. In COLING-90, pages 311-13, Helsinki.
Bar-Hillel, Y. (1953). A quasi-arithmetical notation for syntactic description. Language, 29: 47-58. Also in: Y. Bar-Hillel (ed.) Language and Information. Reading, Mass.: Addison-Wesley, 61-74.
Bar-Hillel, Y., H. Gaifman, and E. Shamir (1960). On categorial and phrase structure grammars. Bulletin of the Research Council of Israel, 9: Section F, 1-16. Also in: Y. Bar-Hillel (ed.) Language and Information. Reading, Mass.: Addison-Wesley.
Baum, R. (1976). Dependenzgrammatik: Tesnière's Modell der Sprachbeschreibung in wissenschaftsgeschichtlicher und kritischer Sicht. Niemeyer, Tübingen.
Blake, B. J. (1989). Review of Stanley Starosta: The Case for Lexicase. Language, 65: 614-22.
Bloomfield, L. (1914). An Introduction to the Study of Language. Henry Holt and Co, New York.
Bloomfield, L. (1933). Language. Holt, Rinehart, and Winston, New York.
Bouma, G. (1989). Efficient processing of flexible categorial grammar. In Proceedings of the Fourth Conference of the European Chapter of the Association for Computational Linguistics, pages 19-26, Manchester.
Brietzmann, A. and U. Ehrlich (1986). The role of semantic processing in an automatic speech understanding system. In COLING-86, pages 596-98, Bonn.
Brough, D. R. (1986). Word Grammar — parsing methods. Imperial College London ms.
Bruce, B. and M. Moser (1987). Case grammar. In S. C. Shapiro, editor, Encyclopedia of Artificial Intelligence, Volume 1, pages 333-339. John Wiley, Chichester.
Chomsky, N. (1956). Three models for the description of language. IEEE Transactions on Information Theory, 2: 113-24.
Chomsky, N. (1957). Syntactic Structures. Mouton, The Hague.
Chomsky, N. (1962). A transformational approach to syntax. In Proceedings of the Third Texas Conference on Problems of Linguistic Analysis in English, pages 124-58, Austin.
Chomsky, N. (1981). Lectures on Government and Binding. Foris, Dordrecht.
Clocksin, W. and C. Mellish (1987). Programming in Prolog. Springer-Verlag, Berlin, third edition.
Covington, M. A. (1984). Syntactic Theory in the High Middle Ages: Modistic models of sentence structure. Cambridge University Press, Cambridge.
Covington, M. A. (1986). Grammatical theory in the middle ages. In T. Bynon and F. Palmer, editors, Studies in the History of Western Linguistics. Cambridge University Press, Cambridge.
Covington, M. A. (1988). Parsing variable word order languages with unification-based dependency grammar. Technical Report ACMC 01-0022, Advanced Computational Methods Center, University of Georgia.
Covington, M. A. (1990a). A dependency parser for variable word order languages. Technical Report AI-1990-01, Artificial Intelligence Program, University of Georgia.
Covington, M. A. (1990b). Parsing discontinuous constituents in dependency grammar. Computational Linguistics, 16: 234-6.
Covington, M. A., D. Nute, and A. Vellino (1987). Prolog Programming in Depth. Scott, Foresman, Glenview, Illinois.
Curry, H. B. and R. Feys (1958). Combinatory Logic, volume 1. North Holland, Amsterdam.
Dahl, Östen (1980). Some arguments for higher nodes in syntax: a reply to Hudson's 'Constituency and Dependency'. Linguistics, 18: 485-8.
Danieli, M., F. Ferrara, R. Gemello, and C. Rullent (1987). Integrating semantics and flexible syntax by exploiting isomorphism between grammatical and semantical relations. In Proceedings of the Third Conference of the European Chapter of the Association for Computational Linguistics, pages 278-83, Copenhagen.
de Groot, A. W. (1949). Structurele Syntaxis. Service, The Hague.
Devos, M., G. Adriaens, and Y. Willems (1988). The Parallel Expert Parser (PEP): a thoroughly revised descendant of the Word Expert Parser (WEP). In COLING-88, pages 142-7.
Dowty, D. R. (1982). Grammatical relations and Montague grammar. In P. Jacobson and G. Pullum, editors, The Nature of Syntactic Representation. D. Reidel, Dordrecht.
Dowty, D. R. (1988). Type raising, functional composition and non-constituent conjunction. In R. Oehrle, E. Bach, and D. Wheeler, editors, Categorial Grammars and Natural Language Structures. D. Reidel, Dordrecht.
Dowty, D. R., R. E. Wall, and S. Peters (1981). Introduction to Montague Semantics. D. Reidel, Dordrecht, Holland.
Earley, J. (1970). An efficient context-free parsing algorithm. Communications of the Association for Computing Machinery, 13: 94-102.
Emonds, J. E. (1976). A Transformational Approach to English Syntax. Academic Press, New York.
Engel, U. (1977). Syntax der deutschen Gegenwartssprache. Schmidt, Berlin.
Engel, U. and H. Schumacher (1976). Kleines Valenzlexicon deutscher Verben. Narr, Tübingen.
Engelen, B. (1975). Untersuchungen zu Satzbauplan und Wortfeld in der geschriebenen deutschen Sprache der Gegenwart. Hueber, Munich.
Erman, L. D., F. Hayes-Roth, V. R. Lesser, and D. R. Reddy (1981). The Hearsay-II speech-understanding system: integrating knowledge to resolve uncertainty. In N. J. Nilsson and B. L. Webber, editors, Readings in Artificial Intelligence, pages 349-89. Morgan Kaufmann, Los Altos, Ca.
Fabricius-Hansen, C. (1977). Projektet "Dänisch-Deutsch kontrastive Grammatik". In Kontrastiv grammatik i Danmark, pages 170-83. Statens humanistiske forskningsråd, Copenhagen.
Fillmore, C. J. (1968). The case for case. In E. Bach and R. Harms, editors, Universals in Linguistic Theory, pages 1-88. Holt, Rinehart and Winston, New York.
Fillmore, C. J. (1977). The case for case reopened. In P. Cole and J. Sadock, editors, Syntax and Semantics, Volume 8: Grammatical Relations, pages 59-81. Academic Press, New York.
Fissore, L., E. P. Giachin, P. Laface, G. Micca, R. Pieraccini, and C. Rullent (1988). Experimental results on large vocabulary speech recognition and understanding. In ICASSP-88, New York.
Flickinger, D. P. (1987). Lexical rules in the hierarchical lexicon. PhD thesis, Stanford.
Flickinger, D. P., C. J. Pollard, and T. Wasow (1985). Structure-sharing in lexical representation. In Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, pages 262-7, Chicago.
Fraser, N. M. (1985). A word grammar parser. Master's thesis, University College London.
Fraser, N. M. (1988). A word grammar parser: progress report 2. Technical report, University College London.
Fraser, N. M. (1989a). Parsing and dependency grammar. In R. Carston, editor, UCL Working Papers in Linguistics 1, pages 296-319. University College London.
Fraser, N. M. (1989b). Review of Stanley Starosta: The Case for Lexicase. Computational Linguistics, 15: 114-15.
Fraser, N. M. and G. Gilbert (1991a). Effects of system voice quality on user utterances in speech dialogue systems. In Proceedings of the 2nd European Conference on Speech Communication and Technology, pages 57-60, Genova.
Fraser, N. M. and G. N. Gilbert (1991b). Simulating speech systems. Computer Speech and Language, 5: 81-99.
Fraser, N. M. and R. A. Hudson (1990). Word Grammar: an inheritance-based theory of language. In W. Daelemans and G. Gazdar, editors, Proceedings of the International Workshop on Inheritance in Natural Language Processing, pages 58-64, Tilburg.
Fraser, N. M. and R. A. Hudson (1992). Inheritance in word grammar. Computational Linguistics, 18(2): 133-58.
Fraser, N. M. and R. C. Wooffitt (1990). Orienting to rules. In N. Gilbert, editor, Proceedings of the American Association for Artificial Intelligence Workshop on Ethnomethodology, Complex Systems and Interaction Analysis, pages 69-80, Boston.
Fraser, N. M., G. N. Gilbert, G. S. McGlashan, and R. C. Wooffitt (forthcoming). Analyzing Information Exchange. Routledge, London.
Gaifman, H. (1965). Dependency systems and phrase-structure systems. Information and Control, 8: 304-7.
Garey, H. B. (1954). Review of Lucien Tesnière: Esquisse d'une Syntaxe Structurale. Language, 30: 512-13.
Gazdar, G. (1987). Linguistic applications of default inheritance structures. In P. Whitelock, H. Somers, P. Bennet, R. L. Johnson, and M. M. Wood, editors, Linguistic Theory and Computer Applications, pages 37-67. Academic Press, London.
Gazdar, G. (1988). Applicability of indexed grammars to natural languages. In U. Reyle and C. Rohrer, editors, Natural Language Parsing and Linguistic Theories, pages 69-94. D. Reidel, Dordrecht.
Gazdar, G. and C. S. Mellish (1989). Natural Language Processing in PROLOG. Addison-Wesley, Wokingham.
Gazdar, G., E. Klein, G. K. Pullum, and I. Sag (1985). Generalized Phrase Structure Grammar. Basil Blackwell, Oxford.
Gazdar, G., A. Franz, K. Osborne, and R. Evans (1987). Natural Language Processing in the 1980s. CSLI, Stanford, CA.
Giachin, E. P. and C. Rullent (1988). Robust parsing of severely corrupted spoken utterances. In COLING-88, pages 196-201, Budapest.
Giachin, E. P. and C. Rullent (1989). A parallel parser for spoken natural language. In IJCAI-89, pages 1537-42, Detroit.
Goldschlager, L. and A. Lister (1982). Computer Science: a modern introduction. Prentice-Hall, Englewood Cliffs, N.J.
Gorayska, B. (1987). Word Grammar semantic analyser. Technical report, IBM UK Scientific Centre.
Grantham, P. R. (1987). Natural language understanding, a Word Grammar approach to the problems. Master's thesis, Sheffield City Polytechnic.
Grishman, R. (1986). Computational Linguistics. Cambridge University Press, Cambridge.
Gross, M. (1964). The equivalence of models of language used in the fields of mechanical translation and information retrieval. Information Storage and Retrieval, 2: 43-57.
Haddock, N. J. (1987). Incremental interpretation and combinatory categorial grammar. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, pages 661-3, Milan.
Haigh, R., G. Sampson, and E. Atwell (1988). Project APRIL — a progress report. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics, pages 104-112, Buffalo.
Hajic, J. (1987). RUSLAN — an MT system between closely related languages. In Proceedings of the Third Conference of the European Chapter of the Association for Computational Linguistics, pages 113-17, Copenhagen.
Hajicova, E. (1988). Reasons why we use dependency grammar. In COLING-88, page 451, Budapest.
Harris, Z. S. (1951). Methods in Structural Linguistics. University of Chicago Press, Chicago.
Hayes, P. J., A. G. Hauptmann, J. G. Carbonell, and M. Tomita (1986). Parsing spoken language: a semantic caseframe approach. In COLING-86, pages 587-92, Bonn.
Hayes-Roth, F., D. Waterman, and D. Lenat (1983). Building Expert Systems. Addison-Wesley, Reading, Mass.
Hays, D. G. (1961a). Basic principles and technical variations in sentence-structure determination. In C. Cherry, editor, Information Theory, pages 367-76. Butterworths, London.
Hays, D. G. (1961b). Grouping and dependency theories. In H. Edmundson, editor, Proceedings of the National Symposium on Machine Translation, pages 258-66. Prentice-Hall, London.
Hays, D. G. (1961c). Linguistic research at the RAND corporation. In H. Edmundson, editor, Proceedings of the National Symposium on Machine Translation, pages 13-25. Prentice-Hall, London.
Hays, D. G. (1964). Dependency theory: a formalism and some observations. Language, 40: 511-25.
Hays, D. G. (1965). An annotated bibliography of publications on dependency theory. Technical Report RM-4479-PR, The RAND Corporation.
Hays, D. G. (1966a). Connectability calculations, syntactic functions, and Russian syntax. In D. G. Hays, editor, Readings in Automatic Language Processing, pages 107-125. American Elsevier, New York.
Hays, D. G. (1966b). Parsing. In D. G. Hays, editor, Readings in Automatic Language Processing, pages 73-82. American Elsevier, New York.
Hays, D. G. (1967). Introduction to Computational Linguistics. Macdonald, London.
Hays, D. G. and T. W. Ziehe (1961). Studies in machine translation 10: Russian sentence-structure determination. Technical Report RM-2538, The RAND Corporation, Santa Monica, Ca.
Helbig, G. and W. Schenkel (1969). Wörterbuch zur Valenz und Distribution deutscher Verben. Bibliographisches Institut, Leipzig.
Hellwig, P. (1974). Formal-desambiguierte Repraesentation. Vorueberlegungen zur maschinellen Bedeutungsanalyse auf der Grundlage der Valenzidee. University of Heidelberg dissertation.
Hellwig, P. (1985). Program system PLAIN: examples of application. Technical report, University of Surrey, UK.
Hellwig, P. (1986). Dependency Unification Grammar (DUG). In COLING-86, pages 195-8, Bonn.
Hellwig, P. (1988). Chart parsing according to the slot and filler approach. In COLING-88, pages 242-4, Budapest.
Hepple, M. (1987). Methods for parsing combinatory grammars and the spurious ambiguity problem. Master's thesis, University of Edinburgh.
Hepple, M. and G. Morrill (1989). Parsing and derivational equivalence. In Proceedings of the Fourth Conference of the European Chapter of the Association for Computational Linguistics, pages 10-18, Manchester.
Herbst, T., D. Heath, and H.-M. Dederding (1980). Grimm's Grandchildren: Current topics in German linguistics. Longman, London.
Heringer, H.-J. (1970). Theorie der deutschen Syntax. Max Hueber Verlag, Munich.
Hietaranta, P. (1981). On multiple modifiers: a further remark on constituency. Linguistics, 19: 513-16.
Hirschberg, L. (1961). Le repâchement conditionnel de l'hypothèse de projectivité. Technical Report CETIS Report No. 35, EURATOM, Ispra, Italy.
Hjelmslev, L. (1935). La catégorie des cas. Acta Jutlandica, 7: 1-184.
Hjelmslev, L. (1937). La catégorie des cas. Acta Jutlandica, 9: 1-78.
Hockett, C. F. (1958). A Course in Modern Linguistics. Macmillan, New York.
Huddleston, R. D. (1984). An Introduction to the Grammar of English. Cambridge University Press, Cambridge.
Huddleston, R. D. (1988). English Grammar: An Outline. Cambridge University Press, Cambridge.
Hudson, R. A. (1971). English Complex Sentences: An introduction to Systemic Grammar. North-Holland, Amsterdam.
Hudson, R. A. (1976). Arguments for a Non-Transformational Grammar. University of Chicago Press, Chicago.
Hudson, R. A. (1980a). Constituency and dependency. Linguistics, 18: 179-98.
Hudson, R. A. (1980b). A second attack on constituency: a reply to Dahl. Linguistics, 18: 489-504.
Hudson, R. A. (1981a). Pan-lexicalism. Journal of Literary Semantics, 2: 67-78.
Hudson, R. A. (1981b). A reply to Hietaranta's arguments for constituency. Linguistics, 19: 517-20.
Hudson, R. A. (1983). Word Grammar. In Proceedings of the XIIIth International Congress of Linguists, pages 89-101, Tokyo.
Hudson, R. A. (1984). Word Grammar. Basil Blackwell, Oxford.
Hudson, R. A. (1985a). Some basic assumptions about linguistic and non-linguistic knowledge. Quaderni di Semantica, 6: 284-7.
Hudson, R. A. (1985b). Towards a computer testable implementation of word grammar. University College London ms.
Hudson, R. A. (1986a). A Prolog implementation of Word Grammar. In Speech, Hearing and Language: Work in Progress 2, pages 133-50. University College London.
Hudson, R. A. (1986b). Sociolinguistics and the theory of grammar. Linguistics, 24: 1053-78.
Hudson, R. A. (1988a). Coordination and grammatical relations. Journal of Linguistics, 24: 303-42.
Hudson, R. A. (1988b). Extraction and grammatical relations. Lingua, 76: 177-208.
Hudson, R. A. (1989a). English passives, grammatical relations and default inheritance. Lingua, 79: 17-48.
Hudson, R. A. (1989b). Gapping and grammatical relations. Journal of Linguistics, 25: 57-94.
Hudson, R. A. (1989c). Towards a computer-testable Word Grammar of English. In UCL Working Papers in Linguistics, Volume 1, pages 321-39. University College London.
Hudson, R. A. (1990). English Word Grammar. Basil Blackwell, Oxford.
Hudson, R. A. (forthcoming). Do we have heads in our minds? In G. G. Corbett, N. M. Fraser, and S. McGlashan, editors, Heads in Grammatical Theory. Cambridge University Press, Cambridge.
Husserl, E. (1900). Logische Untersuchungen. Niemeyer, Halle. Translated by J. N. Findlay as Logical Investigations. Routledge & Kegan Paul, London, 1970.
Jackendoff, R. S. (1977). X Syntax: A study of phrase structure. MIT Press, Cambridge, Mass. Linguistic Inquiry Monograph 2.
Jappinen, H. and M. Ylilammi (1986). Associative model of morphological analysis: an empirical inquiry. Computational Linguistics, 12: 257-72.
Jappinen, H., E. Nelimarkka, A. Lehtola, and M. Ylilammi (1983). Knowledge engineering approach to morphological analysis. In Proceedings of the First Conference of the European Chapter of the Association for Computational Linguistics, pages 49-51, Pisa.
Jappinen, H., A. Lehtola, and K. Valkonen (1986). Functional structures for parsing dependency constraints. In COLING-86, pages 461-63, Bonn.
Jappinen, H., A. Lehtola, E. Nelimarkka, and K. Valkonen (1987). Dependency analysis of Finnish sentences. Selected reprints. SITRA Foundation, Helsinki.
Jappinen, H., T. Honkela, A. Lehtola, and K. Valkonen (1988a). Hierarchical multilevel processing model for natural language database interface. In Proceedings of the Fourth Conference on Artificial Intelligence Applications, pages 332-7, San Diego. IEEE.
Jappinen, H., E. Lassila, and A. Lehtola (1988b). Locally governed trees and dependency parsing. In COLING-88, pages 275-7, Budapest.
Jefferson, G. (1988). Preliminary notes on a possible metric which provides for a 'standard maximum' silence of approximately one second in conversation. In D. Roger and P. Bull, editors, Conversation, pages 166-96. Multilingual Matters, Clevedon, PA.
Johnson, R., M. King, and L. des Tombe (1985). EUROTRA: A multilingual system under development. Computational Linguistics, 11: 155-69.
Kacnel'son, S. (1948). O grammaticeskoj kategorii. Vestnik Leningradskogo Universiteta, 2: 114-134.
Kaplan, R. M. and J. Bresnan (1982). Lexical functional grammar: a formal system for grammatical representation. In J. Bresnan, editor, The Mental Representation of Grammatical Relations, pages 173-281. MIT Press, Cambridge, Mass.
Kay, M. (1965). Large files in linguistic computing. Technical Report P-3136, Rand Corporation, Santa Monica.
Kay, M. (1985). Parsing in functional unification grammar. In D. R. Dowty, L. Karttunen, and A. M. Zwicky, editors, Natural Language Parsing, pages 251-78. Cambridge University Press, Cambridge.
Kay, M. (1986). Algorithm schemata and data structures in syntactic processing. In B. J. Grosz, K. Sparck Jones, and B. L. Webber, editors, Readings in Natural Language Processing, pages 35-70. Morgan Kaufmann, Los Altos, CA. (First appeared in 1980).
Kettunen, K. (1986). On modelling dependency-oriented parsing. In F. Karlsson, editor, Papers from the Fifth Scandinavian Conference on Computational Linguistics, pages 113-20, Helsinki. University of Helsinki.
Kettunen, K. (1989). Evaluating FUNDPL, a dependency parser for Finnish. University of Helsinki ms.
Kiefer, F. (1968). Mathematical Linguistics in Eastern Europe. American Elsevier, New York.
Kirschner, Z. (1984). On a dependency analysis of English for automatic translation. In P. Sgall, editor, Contributions to Functional Syntax, Semantics and Language Comprehension, pages 335-58. Academia, Prague.
Kodama, T. (1982). Constituency grammar and dependency grammar. In Studies in Foreign Literature 55, pages 15-46. Ritsumeikan University, Kyoto.
Kornai, A. and G. K. Pullum (1990). The X-bar theory of phrase structure. Language, 66: 24-50.
Kunze, J. (1975). Abhängigkeitsgrammatik. Akademie-Verlag, Berlin.
Lakoff, G. (1985). Women, Fire and Dangerous Things: What categories reveal about the mind. Chicago University Press, Chicago.
Lambek, J. (1958). The mathematics of sentence structure. American Mathematical Monthly, 65: 154-70.
Laurie, S. (1893). Lectures on Language and Linguistic Method in the School. James Thin, Edinburgh.
Lecerf, Y. (1960). Analyse automatique. In Enseignement Préparatoire aux Techniques de la Documentation Automatique, pages 179-245. EURATOM, Brussels.
Lehtola, A. (1986). DPL - a computational method for describing grammars and modelling parsers. In F. Karlsson, editor, Papers from the Fifth Scandinavian Conference of Computational Linguistics, pages 151-60, Helsinki. University of Helsinki.
Lehtola, A., H. Jappinen, and E. Nelimarkka (1985). Language-based environment for natural language parsing. In Proceedings of the Second European Conference of the Association for Computational Linguistics, pages 98-106, Geneva.
Lesniewski, S. (1929). Grundzüge eines neuen Systems der Grundlagen der Mathematik. Fundamenta Mathematicae, 14: 1-81.
Levelt, W. J. (1974). Formal Grammars in Linguistics and Psycholinguistics, volume II: Applications in linguistic theory. Mouton, The Hague.
Lindsey, F. (1987). Report on a lexically-driven phrase-building parser. Technical report, University of Hawaii.
Lyons, J. (1968). Introduction to Theoretical Linguistics. Cambridge University Press, Cambridge.
Manaster-Ramer, A. and M. B. Kac (1990). The concept of phrase structure. Linguistics and Philosophy, 13: 325-62.
Marcus, M. P. (1980). A Theory of Syntactic Recognition for Natural Language. MIT Press, Cambridge, Mass.
Marslen-Wilson, W. and L. Tyler (1980). The temporal structure of spoken language understanding. Cognition, 8: 1-74.
Martem'yanov, Y. (1961). The coding of words for an algorithm for syntactic analysis. In Doklady na Konferentsii po Obrabotke Informatsii, Mashinnomu Perevodu i Avtomaticheskomu Chteniyu Teksta. Institute of Scientific Information, Academy of Sciences, Moscow.
Maruyama, H. (1990). Structural disambiguation with constraint propagation. In Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics, pages 31-8, Pittsburgh.
Matsunaga, S. and M. Kohda (1988). Linguistic processing using a dependency structure grammar for speech recognition and understanding. In COLING-88, pages 402-7, Budapest.
Matthews, P. H. (1981). Syntax. Cambridge University Press, Cambridge.
Maxwell, D. and K. Schubert (1989). Metataxis in Practice: Dependency syntax for multilingual machine translation. Distributed Language Translation 6. Foris, Dordrecht.
McGlashan, S. (1992). Dependency unification grammar. PhD thesis, University of Edinburgh.
Mel'cuk, I. A. (1962). Ob algoritme sintaksicheskogo analiza yazykovykh tekstov (obshchie printsipy i nekotory itogi). Mashinny Perevod i Prikladnaya Lingvistika, 7: 45-87.
Mel'cuk, I. A. (1979). Dependency syntax. In I. A. Mel'cuk, editor, Studies in Dependency Syntax, pages 3-21. Karoma, Ann Arbor.
Mel'cuk, I. A. (1988). Dependency Syntax: Theory and practice. SUNY Press, Albany.
Mel'cuk, I. A. and A. K. Zolkovskij (1970). Towards a functioning 'Meaning-Text' model of language. Linguistics, 57: 10-47.
Miller, J. (1985). Syntax and Semantics. Cambridge University Press, Cambridge.
Miller, J. (1990). Review of S. Starosta: The Case for Lexicase. Journal of Linguistics, 26: 235-41.
Nagao, K. (1990). Dependency analyzer: a knowledge-based approach to structural disambiguation. In COLING-90, pages 282-7, Helsinki.
Nelimarkka, E., H. Jappinen, and A. Lehtola (1984a). Parsing an inflectional free word order language with two-way finite automata. In T. O'Shea, editor, Advances in Artificial Intelligence (Proceedings of the 6th European Conference on Artificial Intelligence), Pisa. North Holland.
Nelimarkka, E., H. Jappinen, and A. Lehtola (1984b). Two-way finite automata and dependency theory: a parsing method for inflectional free word order languages. In COLING'84, Stanford.
Nichols, J. (1978). Double dependency? Proceedings of the Chicago Linguistics Society, 14: 326-39.
Nichols, J. (1986). Head-marking and dependent-marking grammar. Language, 62: 56-119.
Niedermair, G. T. (1986). Divided and valency-oriented parsing in speech understanding. In COLING-86, pages 593-5, Bonn.
Nii, H. (1986). Blackboard systems: the blackboard model of problem solving and evolution of blackboard architectures. The AI Magazine, Summer: 38-53; August: 82-106.
Nikula, H. (1976). Verbvalenz: Untersuchungen am Beispiel des deutschen Verbs mit einer kontrastiven Analyse Deutsch-Schwedisch. Acta Universitatis Upsaliensis, Studia Germanica Upsaliensia 15. Almqvist and Wiksell, Uppsala.
Owens, J. (1988). The Foundations of Grammar: An introduction to medieval Arabic grammatical theory. John Benjamins, Amsterdam.
Papp, F. (1966). Mathematical Linguistics in the Soviet Union. Mouton, The Hague.
Pareschi, R. and M. Steedman (1987). A lazy way to chart parse with categorial grammars. In Proceedings of the 25th Annual Conference of the Association for Computational Linguistics, pages 81-8, Stanford.
Peckham, J. (1991). Speech understanding and dialogue over the telephone: an overview of the ESPRIT SUNDIAL project. In Proceedings of the DARPA Workshop on Speech and Language, pages 14-27, Pacific Grove, CA.
Pereira, F. (1981). Extraposition grammars. American Journal of Computational Linguistics, 7: 243-56.
Pereira, F. C. and D. H. Warren (1981). Definite clause grammars for language analysis — a survey of the formalism and a comparison with augmented transition networks. Artificial Intelligence, 13: 231-78.
Pericliev, V. and I. Ilarionov (1986). Testing the projectivity hypothesis. In COLING-86, pages 56-8, Bonn.
Petkevic, V. (1988). New dependency based specification of underlying representations of sentences. In COLING-88, pages 512-14, Budapest.
Phillips, J. D. (1988). Using explicit syntax for disambiguation in speech and script recognition. University of Tübingen ms.
Pickering, M. and G. Barry (1990). Sentence processing without empty categories. Language and Cognitive Processes, 6: 229-59.
Poesio, M. and C. Rullent (1987). Modified caseframe parsing for speech understanding systems. In IJCAI-87, pages 622-5, Milan.
Pollard, C. and I. A. Sag (1988). Information-based Syntax and Semantics. CSLI Lecture Notes 13. CSLI, Stanford, CA.
Proudian, D. and C. Pollard (1985). Parsing head-driven phrase structure grammar. In Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, pages 167-71, Chicago.
Pullum, G. K. (1985). Assuming some version of the X-bar theory. Technical Report SRC-85-01, Syntax Research Center, Cowell College, University of California, Santa Cruz.
Ritchie, G. (1983). Semantics in parsing. In M. King, editor, Parsing Natural Language, pages 199-217. Academic Press, London.
Robins, R. (1979). A Short History of Linguistics. Longman, London, second edition.
Robinson, J. J. (1967). Methods for obtaining corresponding phrase structure and dependency grammars. In Proceedings of the Second International Conference on Computational Linguistics, Grenoble.
Robinson, J. J. (1969). Case, category, and configuration. Journal of Linguistics, 6: 57-80.
Robinson, J. J. (1970). Dependency structures and transformational rules. Language, 46: 259-85.
Ross, J. R. (1967). Constraints on variables in syntax. PhD thesis, Massachusetts Institute of Technology.
Sadler, V. (1989a). The Bilingual Knowledge Bank, a new conceptual basis for MT. DLT report, BSO/Research, Utrecht.
Sadler, V. (1989b). Translating with the Bilingual Knowledge Bank (BKB). DLT report, BSO/Research, Utrecht.
Sadler, V. (1989c). Working with Analogical Semantics. Foris, Dordrecht.
Saito, M. (1989). Scrambling as semantically vacuous A'-movement. In M. R. Baltin and A. S. Kroch, editors, Alternative Conceptions of Phrase Structure, pages 182-200. University of Chicago Press, Chicago.
Schank, R. C. (1972). Conceptual dependency: a theory of natural language understanding. Cognitive Psychology, 3: 552-631.
Schank, R. C. (1975). Conceptual Information Processing. Fundamental Studies in Computer Science 3. North-Holland, Amsterdam.
Schank, R. C. and C. K. Riesbeck, editors (1981). Inside Computer Understanding: Five programs plus miniatures. Lawrence Erlbaum Associates, Hillsdale, NJ.
Schubert, K. (1986). Syntactic tree structure in DLT. Technical report, BSO/Research.
Schubert, K. (1987). Metataxis: contrastive dependency syntax for machine translation. Distributed Language Translation 2. Foris, Dordrecht.
Schumacher, H. (1988). Valenzbibliographie. Institut für deutsche Sprache, Mannheim.
Sgall, P. (1963). The intermediate language in machine translation and the theory of grammar. In 26th Annual Meeting of the American Documentation Institute, pages 41-2, Chicago.
Sgall, P. and J. Panevova (1987). Machine translation, linguistics, and interlingua. In Proceedings of the Third Conference of the European Chapter of the Association for Computational Linguistics, pages 99-108, Copenhagen.
Sgall, P., E. Hajicova, and J. Panevova (1986). The Meaning of the Sentence in its Semantic and Pragmatic Aspects. Academia, Prague.
Shieber, S. M. (1986). An Introduction to Unification-based Approaches to Grammar. CSLI Lecture Notes 4. CSLI, Stanford.
Slutsker, G. (1963). Poluchenie vsekh dopustimykh variantov sintaksicheskogo analiza teksta pri pomoshchi mashiny. Problemy Kibernetiki, 10: 215-25.
Small, S. L. (1983). Parsing as co-operative distributional inference. Understanding through memory interactions. In M. King, editor, Parsing Natural Language, pages 247-76. Academic Press, London.
Somers, H. L. (1987). Valency and Case in Computational Linguistics. Edinburgh Information Technology Series 3. Edinburgh University Press, Edinburgh.
Sommerfeldt, K.-E. and H. Schreiber (1974). Wörterbuch zur Valenz und Distribution deutscher Adjektive. Bibliographisches Institut, Leipzig.
Sommerfeldt, K.-E. and H. Schreiber (1977). Wörterbuch zur Valenz und Distribution deutscher Substantive. Bibliographisches Institut, Leipzig.
Sowa, J. F. (1984). Conceptual Structures. Addison-Wesley, Reading, MA.
Sparck Jones, K. and M. Kay (1973). Linguistics and Information Science. Academic Press, London.
Sperber, D. and D. Wilson (1986). Relevance: Communication and cognition. Basil Blackwell, Oxford.
Starosta, S. (1970). Verbs and case subcategorization. Handout, Linguistics 651, University of Hawaii.
Starosta, S. (1971a). Lexical derivation in a case grammar. University of Hawaii Working Papers in Linguistics, 3: 83-101.
Starosta, S. (1971b). Some lexical redundancy rules for English nouns. Glossa, 5: 167-201.
Starosta, S. (1978). The one per Sent solution. In W. Abraham, editor, Valence, Semantic Case, and Grammatical Relations. Studies in Language Companion Series 1. John Benjamins B.V., Amsterdam.
Starosta, S. (1988). The Case for Lexicase: An Outline of Lexicase Grammatical Theory. Pinter, London.
Starosta, S. (1990). Review of H. L. Somers, Valency and Case in Computational Linguistics. Machine Translation, 5.
Starosta, S. and H. Nomura (1986). Lexicase parsing: a lexicon-driven approach to syntactic analysis. In COLING-86, pages 127-32, Bonn.
Starosta, S. (forthcoming). Lexicase. In E. Brown, editor, The Encyclopedia of Language and Linguistics. Pergamon Press and Aberdeen University Press, Oxford and Aberdeen.
Steedman, M. J. (1985). Dependency and coordination in the grammar of Dutch and English. Language, 61: 523-68.
Steedman, M. J. (1987). Combinatory grammars and parasitic gaps. Natural Language and Linguistic Theory, 5: 403-39.
Steedman, M. J. (1990). Grammar, interpretation, and processing from the lexicon. In W. Marslen-Wilson, editor, Lexical Representation and Process. MIT Press, Cambridge, MA.
Tarvainen, K. (1977). Dependenssikielioppi. Gaudeamus, Helsinki.
Taylor, J. R. (1989). Linguistic Categorization: An essay in cognitive linguistics. Oxford University Press, Oxford.
Tesnière, L. (1953). Esquisse d'une Syntaxe Structurale. Librairie Klincksieck, Paris.
Tesnière, L. (1959). Éléments de Syntaxe Structurale. Librairie Klincksieck, Paris.
Thomason, R. H., editor (1974). Formal Philosophy: Selected papers of Richard Montague. Yale University Press, New Haven.
Turner, D. A. (1979). A new implementation technique for applicative languages. Software Practice and Experience, 9: 31-49.
Turner, K. (1990). Review of Stanley Starosta: The Case for Lexicase. Linguistics, 28: 635-36.
Valkonen, K., H. Jappinen, and A. Lehtola (1987a). Blackboard-based dependency parsing. In IJCAI-87, pages 700-702, Milan.
Valkonen, K., H. Jappinen, A. Lehtola, and M. Ylilammi (1987b). Declarative model for dependency parsing - a view into blackboard methodology. In Proceedings of the Third European Conference of the Association for Computational Linguistics, pages 218-225, Copenhagen.
van der Korst, B. (1988). SEPARSER II: An attribute grammar for technical English. DLT report, BSO/Research, University of Amsterdam.
van Zuijlen, J. M. (1986a). Comparison of an ATN and a DCG performing the first stage of the IL word analysis. DLT report, BSO/Research, Utrecht.
van Zuijlen, J. M. (1986b). A DCG for the first stages of the IL-word grammar. DLT report, BSO/Research, Utrecht.
van Zuijlen, J. M. (1988). A technique for the compact representation of multiple analyses in dependency grammar. DLT report, BSO/Research, Utrecht.
van Zuijlen, J. M. (1989a). The application of simulated annealing to dependency grammar parsing. DLT report, BSO/Research, Utrecht.
van Zuijlen, J. M. (1989b). Probabilistic methods in dependency parsing. In Proceedings of the International Workshop on Parsing Technologies, pages 142-51, Pittsburgh. Carnegie Mellon University.
van Zuijlen, J. M. (1990). Notes on a probabilistic parsing experiment. DLT report, BSO/Language Systems, Utrecht.
Vater, H. (1975). Toward a generative dependency grammar. Lingua, 36: 121-45.
Whitehead, A. and B. Russell (1925). Principia Mathematica. Cambridge University Press, Cambridge.
Wilks, Y. (1975). An intelligent analyser and understander of English. Communications of the Association for Computing Machinery, 18: 264-74.
Winograd, T. (1983). Language as a Cognitive Process, volume 1: Syntax. Addison-Wesley, Reading, MA.
Wirth, N. (1975). Algorithms + Data Structures = Programs. Prentice Hall, Englewood Cliffs, NJ.
Witkam, A. (1983). Distributed language translation. Feasibility study of a multilingual facility for videotex information networks. Technical report, BSO/Research.
Witkam, A. (1989). Distributed Language Translation, another MT system. In I. D. Kelly, editor, Progress in Machine Translation: Natural Language and Personal Computers, pages 133-42. Sigma Press, Wilmslow.
Woods, W. A. (1970). Transition network grammars for natural language analysis. Communications of the Association for Computing Machinery, 13: 591-6.
Woods, W. A. (1982). Optimal search strategies for speech understanding control. Artificial Intelligence, 18: 295-326.
Woods, W. A. (1987). Augmented transition network grammar. In S. C. Shapiro, editor, Encyclopaedia of Artificial Intelligence, pages 323-33. Wiley, New York.
