2020.acl-main.577

This paper presents a novel approach to Named Entity Recognition (NER) by reformulating it as a dependency parsing task, allowing for the identification of both flat and nested entities. Utilizing a biaffine model on top of a multi-layer BiLSTM, the system scores potential entity spans and ranks them according to their scores, achieving state-of-the-art results on multiple NER benchmarks. The model demonstrates significant accuracy improvements, particularly in handling nested entities, and the authors provide the code as open source.
Named Entity Recognition as Dependency Parsing

Juntao Yu, Queen Mary University, London, UK, [email protected]
Bernd Bohnet, Google Research, Netherlands, [email protected]
Massimo Poesio, Queen Mary University, London, UK, [email protected]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6470–6476, July 5 - 10, 2020. © 2020 Association for Computational Linguistics

Abstract

Named Entity Recognition (NER) is a fundamental task in Natural Language Processing, concerned with identifying spans of text expressing references to entities. NER research is often focused on flat entities only (flat NER), ignoring the fact that entity references can be nested, as in [Bank of [China]] (Finkel and Manning, 2009). In this paper, we use ideas from graph-based dependency parsing to provide our model a global view on the input via a biaffine model (Dozat and Manning, 2017). The biaffine model scores pairs of start and end tokens in a sentence which we use to explore all spans, so that the model is able to predict named entities accurately. We show that the model works well for both nested and flat NER through evaluation on 8 corpora and achieving SoTA performance on all of them, with accuracy gains of up to 2.2 percentage points.

1 Introduction

‘Nested Entities’ are named entities containing references to other named entities as in [Bank of [China]], in which both [China] and [Bank of China] are named entities. Such nested entities are frequent in data sets like ACE 2004, ACE 2005 and GENIA (e.g., 17% of NEs in GENIA are nested (Finkel and Manning, 2009)), although the more widely used sets such as CONLL 2002, 2003 and ONTONOTES only contain so-called flat named entities and nested entities are ignored.

The current SoTA models all adopt a neural network architecture without hand-crafted features, which makes them more adaptable to different tasks, languages and domains (Lample et al., 2016; Chiu and Nichols, 2016; Peters et al., 2018; Devlin et al., 2019; Ju et al., 2018; Sohrab and Miwa, 2018; Straková et al., 2019). In this paper, we introduce a method to handle both types of NEs in one system by adopting ideas from the biaffine dependency parsing model of Dozat and Manning (2017). For dependency parsing, the system predicts a head for each token and assigns a relation to the head-child pairs. In this work, we reformulate NER as the task of identifying start and end indices, as well as assigning a category to the span defined by these pairs. Our system uses a biaffine model on top of a multi-layer BiLSTM to assign scores to all possible spans in a sentence. After that, instead of building dependency trees, we rank the candidate spans by their scores and return the top-ranked spans that comply with constraints for flat or nested NER. We evaluated our system on three nested NER benchmarks (ACE 2004, ACE 2005, GENIA) and five flat NER corpora (CONLL 2002 (Dutch, Spanish), CONLL 2003 (English, German), and ONTONOTES). The results show that our system achieved SoTA results on all three nested NER corpora, and on all five flat NER corpora with substantial gains of up to 2.2 percentage points (absolute) compared to the previous SoTA. We provide the code as open source (footnote 1).

Footnote 1: The code is available at https://github.com/juntaoy/biaffine-ner

2 Related Work

Flat Named Entity Recognition. The majority of flat NER models are based on a sequence labelling approach. Collobert et al. (2011) introduced a neural NER model that uses CNNs to encode tokens combined with a CRF layer for the classification. Many other neural systems followed this approach but used instead LSTMs to encode the input and a CRF for the prediction (Lample et al., 2016; Ma and Hovy, 2016; Chiu and Nichols, 2016). These latter models were later extended to use context-dependent embeddings such as ELMo (Peters et al., 2018). Clark et al. (2018) quite successfully used cross-view training (CVT) paired with multi-task learning. This method yields impressive gains for
a number of NLP applications including NER. Devlin et al. (2019) invented BERT, a bidirectional transformer architecture for the training of language models. BERT and its siblings provided better language models that turned again into higher scores for NER.

Lample et al. (2016) cast NER as transition-based dependency parsing using a Stack-LSTM. They compare with an LSTM-CRF model which turns out to be a very strong baseline. Their transition-based system uses two transitions (shift and reduce) to mark the named entities and handles flat NER while our system has been designed to handle both nested and flat entities.

Nested Named Entity Recognition. Early work on nested NER, motivated particularly by the GENIA corpus, includes (Shen et al., 2003; Alex et al., 2007; Finkel and Manning, 2009). Finkel and Manning (2009) also proposed a constituency parsing-based approach. In the last years, we saw an increasing number of neural models targeting nested NER as well. Ju et al. (2018) suggested an LSTM-CRF model to predict nested named entities. Their algorithm iteratively continues until no further entities are predicted. Lin et al. (2019) tackle the problem in two steps: they first detect the entity head, and then they infer the entity boundaries as well as the category of the named entity. Straková et al. (2019) tag the nested named entity by a sequence-to-sequence model exploring combinations of context-based embeddings such as ELMo, BERT, and Flair. Zheng et al. (2019) use a boundary aware network to solve the nested NER. Similar to our work, Sohrab and Miwa (2018) enumerate exhaustively all possible spans up to a defined length by concatenating the LSTM outputs for the start and end position and then using this to calculate a score for each span. Apart from the different network and word embedding configurations, the main difference between their model and ours is therefore the use of the biaffine model. Due to the biaffine model, we get a global view of the sentence while Sohrab and Miwa (2018) concatenate the output of the LSTMs of possible start and end positions up to a distinct length. Dozat and Manning (2017) demonstrated that the biaffine mapping performs significantly better than just the concatenation of pairs of LSTM outputs.

Figure 1: The network architectures of our system. (Layers, bottom to top: BERT, fastText & Char Embeddings; BiLSTM; FFNN_Start and FFNN_End; Biaffine Classifier.)

3 Methods

Our model is inspired by the dependency parsing model of Dozat and Manning (2017). We use both word embeddings and character embeddings as input, and feed the output into a BiLSTM and finally to a biaffine classifier.

Figure 1 shows an overview of the architecture. To encode words, we use both BERT_Large and fastText embeddings (Bojanowski et al., 2016). For BERT we follow the recipe of Kantor and Globerson (2019) to obtain the context dependent embeddings for a target token with 64 surrounding tokens each side. For the character-based word embeddings, we use a CNN to encode the characters of the tokens. The concatenation of the word and character-based word embeddings is fed into a BiLSTM to obtain the word representations ($x$).

After obtaining the word representations from the BiLSTM, we apply two separate FFNNs to create different representations ($h_s$/$h_e$) for the start/end of the spans. Using different representations for the start/end of the spans allows the system to learn to identify the start/end of the spans separately. This improves accuracy compared to the model which directly uses the outputs of the LSTM since the context of the start and end of the entity are different. Finally, we employ a biaffine model over the sentence to create a $l \times l \times c$ scoring tensor ($r_m$), where $l$ is the length of the sentence and $c$ is the number of NER categories + 1 (for non-entity). We compute the score for a span $i$ by:

$$h_s(i) = \mathrm{FFNN}_s(x_{s_i})$$
$$h_e(i) = \mathrm{FFNN}_e(x_{e_i})$$
$$r_m(i) = h_s(i)^{\top} U_m h_e(i) + W_m (h_s(i) \oplus h_e(i)) + b_m$$
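As a rough illustration of the scoring equations above, the following pure-Python sketch computes the category scores for every span whose start is not after its end. It is not the authors' implementation: the function name `biaffine_scores`, the nested-list layout of `U` (d × c × d), `W` (2d × c) and `b` (length c), and the dictionary return type are assumptions made for this example; `h_s` and `h_e` stand for the outputs of the two FFNNs.

```python
def biaffine_scores(h_s, h_e, U, W, b):
    """Score every span (start <= end) for every category.

    h_s, h_e: per-token start/end representations, each a list of l vectors of size d.
    U: d x c x d nested list, W: 2d x c nested list, b: list of c biases.
    Returns {(start, end): [score for each of the c categories]}.
    """
    l, d, c = len(h_s), len(h_s[0]), len(b)
    scores = {}
    for start in range(l):
        for end in range(start, l):  # only spans with start <= end are valid
            concat = h_s[start] + h_e[end]  # h_s (+) h_e, a vector of size 2d
            span_scores = []
            for k in range(c):
                # Bilinear term: h_s^T U_k h_e
                bilinear = sum(
                    h_s[start][p] * U[p][k][q] * h_e[end][q]
                    for p in range(d)
                    for q in range(d)
                )
                # Linear term over the concatenation, plus the bias
                linear = sum(concat[p] * W[p][k] for p in range(2 * d))
                span_scores.append(bilinear + linear + b[k])
            scores[(start, end)] = span_scores
    return scores
```

For a sentence of length l the dictionary holds l(l + 1)/2 entries, one per valid span. A practical system would compute the same quantities with batched tensor operations; the explicit loops here only make the terms of the equation visible.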

In the definition of $r_m(i)$, $s_i$ and $e_i$ are the start and end indices of the span $i$, $U_m$ is a $d \times c \times d$ tensor, $W_m$ is a $2d \times c$ matrix and $b_m$ is the bias.

The tensor $r_m$ provides scores for all possible spans that could constitute a named entity under the constraint that $s_i \le e_i$ (the start of an entity is not after its end). We assign each span a NER category $y'$:

$$y'(i) = \arg\max r_m(i)$$

We then rank all the spans that have a category other than "non-entity" by their category scores ($r_m(i_{y'})$) in descending order and apply the following post-processing constraints: For nested NER, an entity is selected as long as it does not clash the boundaries of higher ranked entities. We denote an entity $i$ to clash boundaries with another entity $j$ if $s_i < s_j \le e_i < e_j$ or $s_j < s_i \le e_j < e_i$, e.g. in "the Bank of China", the entity "the Bank of" clashes boundary with the entity "Bank of China", hence only the span with the higher category score will be selected. For flat NER, we apply one more constraint, in which any entity containing or inside an entity ranked before it will not be selected. The learning objective of our named entity recognizer is to assign a correct category (including the non-entity) to each valid span. Hence it is a multi-class classification problem and we optimise our models with softmax cross-entropy:

$$p_m(i_c) = \frac{\exp(r_m(i_c))}{\sum_{\hat{c}=1}^{C} \exp(r_m(i_{\hat{c}}))}$$

$$\mathrm{loss} = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{i_c} \log p_m(i_c)$$

4 Experiments

Data Set. We evaluate our system on both nested and flat NER. For the nested NER task, we use the ACE 2004 (footnote 2), ACE 2005 (footnote 3), and GENIA (Kim et al., 2003) corpora; for flat NER, we test our system on the CONLL 2002 (Tjong Kim Sang, 2002), CONLL 2003 (Tjong Kim Sang and De Meulder, 2003) and ONTONOTES (footnote 4) corpora.

For ACE 2004 and ACE 2005 we follow the same settings of Lu and Roth (2015) and Muis and Lu (2017) to split the data into 80%, 10%, 10% for train, development and test set respectively. To make a fair comparison we also used the same documents as in Lu and Roth (2015) for each split.

For GENIA, we use the GENIA v3.0.2 corpus. We preprocess the dataset following the same settings of Finkel and Manning (2009) and Lu and Roth (2015) and use a 90%/10% train/test split. For this evaluation, since we do not have a development set, we train our system for 50 epochs and evaluate on the final model.

For CONLL 2002 and CONLL 2003, we evaluate on all four languages (English, German, Dutch and Spanish). We follow Lample et al. (2016) to train our system on the concatenation of the train and development set.

For ONTONOTES, we evaluate on the English corpus and follow Strubell et al. (2017) to use the same train, development and test split as used in the CoNLL 2012 shared task for coreference resolution (Pradhan et al., 2012).

Evaluation Metric. We report recall, precision and F1 scores for all evaluations. The named entity is considered correct when both boundary and category are predicted correctly.

Hyperparameters. We use a unified setting for all of the experiments. Table 1 shows the hyperparameters for our system.

Parameter                  Value
BiLSTM size                200
BiLSTM layer               3
BiLSTM dropout             0.4
FFNN size                  150
FFNN dropout               0.2
BERT size                  1024
BERT layer                 last 4
fastText embedding size    300
Char CNN size              50
Char CNN filter widths     [3,4,5]
Char embedding size        8
Embeddings dropout         0.5
Optimiser                  Adam
learning rate              1e-3

Table 1: Major hyperparameters for our models.

Footnote 2: https://catalog.ldc.upenn.edu/LDC2005T09
Footnote 3: https://catalog.ldc.upenn.edu/LDC2006T06
Footnote 4: https://catalog.ldc.upenn.edu/LDC2013T19
Footnote 5: In Sohrab and Miwa (2018), the last 10% of the training set is used as a development set; we include their result mainly because their system is similar to ours.
Footnote 6: The revised version is provided by the shared task organiser in 2006 with more consistent annotations. We confirmed with the author of Akbik et al. (2018) that they used the revised version.
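The ranking and post-processing constraints described in Section 3 can be sketched as follows. This is a minimal illustration rather than the released code: the helper names `clashes`, `nests` and `decode`, the `(start, end, label, score)` tuple layout and the example scores are assumptions, and each candidate is compared only against spans that have already been selected, which is one possible reading of "higher ranked entities".

```python
def clashes(a, b):
    """True if spans a and b partially overlap (boundary clash)."""
    (si, ei), (sj, ej) = a, b
    return si < sj <= ei < ej or sj < si <= ej < ei


def nests(a, b):
    """True if one span contains the other (including identical spans)."""
    (si, ei), (sj, ej) = a, b
    return (si <= sj and ej <= ei) or (sj <= si and ei <= ej)


def decode(candidates, flat=False):
    """Select entities from candidates = [(start, end, label, score), ...].

    Candidates are assumed to already exclude the non-entity category.
    """
    selected = []
    ranked = sorted(candidates, key=lambda cand: cand[3], reverse=True)
    for start, end, label, score in ranked:
        span = (start, end)
        chosen_spans = [(s, e) for s, e, _, _ in selected]
        if any(clashes(span, other) for other in chosen_spans):
            continue  # nested and flat NER both forbid boundary clashes
        if flat and any(nests(span, other) for other in chosen_spans):
            continue  # flat NER additionally forbids nesting
        selected.append((start, end, label, score))
    return selected


# Tokens: the(0) Bank(1) of(2) China(3); the scores are made up for illustration.
example = [(1, 3, "ORG", 0.9), (3, 3, "GPE", 0.8), (0, 2, "ORG", 0.6)]
nested_result = decode(example)
flat_result = decode(example, flat=True)
```

With these toy scores, the nested setting keeps "Bank of China" and "China" and drops the clashing span "the Bank of", while the flat setting keeps only "Bank of China".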

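A small worked version of the softmax cross-entropy objective of Section 3 is given below. It assumes one-hot gold labels, so the inner sum over categories reduces to the log-probability of the gold category for each span. The names `softmax` and `span_cross_entropy` and the list-based inputs are illustrative choices, not part of the original system.

```python
import math


def softmax(scores):
    """Turn the category scores of one span into probabilities."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [x / total for x in exps]


def span_cross_entropy(span_scores, gold):
    """Sum of -log p(gold category) over all spans.

    span_scores: one list of category scores per span.
    gold: index of the gold category for each span.
    """
    return -sum(math.log(softmax(r)[y]) for r, y in zip(span_scores, gold))
```

For a single span with two equal scores the loss is log 2, since both categories receive probability 0.5.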
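The exact-match criterion of the Evaluation Metric paragraph can be illustrated with a short sketch. It assumes that entities are represented as `(start, end, category)` tuples (for a corpus-level score a sentence identifier would be added to each tuple); `prf1` is a hypothetical helper name, and the official evaluation scripts of the individual corpora may differ in detail.

```python
def prf1(gold, pred):
    """Precision, recall and F1 where an entity counts as correct only if
    both its boundaries and its category match a gold entity exactly."""
    gold, pred = set(gold), set(pred)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A prediction with the right boundaries but the wrong category therefore counts as both a false positive and a false negative.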
Model                                    P     R     F1
ACE 2004
Katiyar and Cardie (2018)                73.6  71.8  72.7
Wang et al. (2018)                       -     -     73.3
Wang and Lu (2018)                       78.0  72.4  75.1
Straková et al. (2019)                   -     -     84.4
Luan et al. (2019)                       -     -     84.7
Our model                                87.3  86.0  86.7
ACE 2005
Katiyar and Cardie (2018)                70.6  70.4  70.5
Wang et al. (2018)                       -     -     73.0
Wang and Lu (2018)                       76.8  72.3  74.5
Lin et al. (2019)                        76.2  73.6  74.9
Fisher and Vlachos (2019)                82.7  82.1  82.4
Luan et al. (2019)                       -     -     82.9
Straková et al. (2019)                   -     -     84.3
Our model                                85.2  85.6  85.4
GENIA
Katiyar and Cardie (2018)                79.8  68.2  73.6
Wang et al. (2018)                       -     -     73.9
Ju et al. (2018)                         78.5  71.3  74.7
Wang and Lu (2018)                       77.0  73.3  75.1
Sohrab and Miwa (2018) (footnote 5)      93.2  64.0  77.1
Lin et al. (2019)                        75.8  73.9  74.8
Luan et al. (2019)                       -     -     76.2
Straková et al. (2019)                   -     -     78.3
Our model                                81.8  79.3  80.5

Table 2: State of the art comparison on ACE 2004, ACE 2005 and GENIA corpora for nested NER.

Model                                    P     R     F1
ONTONOTES
Chiu and Nichols (2016)                  86.0  86.5  86.3
Strubell et al. (2017)                   -     -     86.8
Clark et al. (2018)                      -     -     88.8
Fisher and Vlachos (2019)                -     -     89.2
Our model                                91.1  91.5  91.3
CONLL 2003 English
Chiu and Nichols (2016)                  91.4  91.9  91.6
Lample et al. (2016)                     -     -     90.9
Strubell et al. (2017)                   -     -     90.7
Devlin et al. (2019)                     -     -     92.8
Straková et al. (2019)                   -     -     93.4
Our model                                93.7  93.3  93.5
CONLL 2003 German
Lample et al. (2016)                     -     -     78.8
Straková et al. (2019)                   -     -     85.1
Our model                                88.3  84.6  86.4
CONLL 2003 German revised (footnote 6)
Akbik et al. (2018)                      -     -     88.3
Our model                                92.4  88.2  90.3
CONLL 2002 Spanish
Lample et al. (2016)                     -     -     85.8
Straková et al. (2019)                   -     -     88.8
Our model                                90.6  90.0  90.3
CONLL 2002 Dutch
Lample et al. (2016)                     -     -     81.7
Akbik et al. (2019)                      -     -     90.4
Straková et al. (2019)                   -     -     92.7
Our model                                94.5  92.8  93.7

Table 3: State of the art comparison on CONLL 2002, CONLL 2003, ONTONOTES corpora for flat NER.

5 Results on Nested NER

Using the constraints for nested NER, we first evaluate our system on nested named entity corpora: ACE 2004, ACE 2005 and GENIA. Table 2 shows the results. Both ACE 2004 and ACE 2005 contain 7 NER categories and have a relatively high ratio of nested entities (about 1/3 of the named entities are nested). Our results outperform the previous SoTA system by 2% (ACE 2004) and 1.1% (ACE 2005), respectively. GENIA differs from ACE 2004 and ACE 2005 and uses five medical categories such as DNA or RNA. For the GENIA corpus our system achieved an F1 score of 80.5% and improved the SoTA by 2.2% absolute. Our hypothesis is that for GENIA the high accuracy gain is due to our structural prediction approach and that sequence-to-sequence models rely more on the language model embeddings which are less informative for categories such as DNA, RNA. Our system achieved SoTA results on all three corpora for nested NER and demonstrates well the advantages of a structural prediction over sequence labelling approach.

6 Results on Flat NER

We evaluate our system on five corpora for flat NER (CONLL 2002 (Dutch, Spanish), CONLL 2003 (English, German) and ONTONOTES). Unlike most of the systems that treat flat NER as a sequence labelling task, our system predicts named entities by considering all possible spans and ranking them. The ONTONOTES corpus consists of documents from 7 different domains and is annotated with 18
fine-grained named entity categories. To predict named entities for this corpus is more difficult than for CONLL 2002 and CONLL 2003. These corpora use coarse-grained named entity categories (only 4 categories). The sequence-to-sequence models usually perform better on the CONLL 2003 English corpus (see Table 3), e.g. the systems of Chiu and Nichols (2016) and Strubell et al. (2017). In contrast, our system is less sensitive to the domain and the granularity of the categories. As shown in Table 3, our system achieved an F1 score of 91.3% on the ONTONOTES corpus, which is very close to our system performance on the CONLL 2003 corpus (93.5%). On the multi-lingual data, our system achieved F1 scores of 86.4% for German, 90.3% for Spanish and 93.7% for Dutch. Our system outperforms the previous SoTA results by large margins of 2.1%, 1.5%, 1.3% and 1% on ONTONOTES, Spanish, German and Dutch corpora respectively and is slightly better than the SoTA on the English data set. In addition, we also tested our system on the revised version of the German data to compare with the model by Akbik et al. (2018). Our system again achieved a substantial gain of 2% when compared with their system.

7 Ablation Study

To evaluate the contribution of individual components of our system, we further remove selected components and use ONTONOTES for evaluation (see Table 4). We choose ONTONOTES for our ablation study as it is the largest corpus.

                 F1    ∆
Our model        89.9
- biaffine       89.1  0.8
- BERT emb       87.5  2.4
- fastText emb   89.5  0.4
- Char emb       89.8  0.1

Table 4: The comparison between our full model and ablated models on ONTONOTES development set.

Biaffine Classifier. We replace the biaffine mapping with a CRF layer and convert our system into a sequence labelling model. The CRF layer is frequently used in models for flat NER, e.g. (Lample et al., 2016). When we replace the biaffine model of our system with a CRF layer, the performance drops by 0.8 percentage points (Table 4). The large performance difference shows the benefit of adding a biaffine model and confirms our hypothesis that the dependency parsing framework is an important factor for the high accuracy of our system.

Contextual Embeddings. We ablate BERT embeddings and as expected, after removing BERT embeddings, the system performance drops by 2.4 percentage points (see Table 4). This shows that BERT embeddings are one of the most important factors for the accuracy.

Context Independent Embeddings. We remove the context-independent fastText embedding from our system. The context-independent embedding contributes 0.4% towards the score of our full system (Table 4). This suggests that even with the BERT embeddings enabled, the context-independent embeddings can still make a quite noticeable improvement to a system.

Character Embeddings. Finally, we remove the character embeddings. As we can see from Table 4, the impact of character embeddings is quite small. One explanation would be that English is not a morphologically rich language and hence does not benefit largely from character-level information, and the BERT embeddings themselves are based on word pieces that already capture some character-level information.

Overall, the biaffine mapping and the BERT embedding together contributed most to the high accuracy of our system.

8 Conclusion

In this paper, we reformulate NER as a structured prediction task and adopt a SoTA dependency parsing approach for nested and flat NER. Our system uses contextual embeddings as input to a multi-layer BiLSTM. We employ a biaffine model to assign scores for all spans in a sentence. Further constraints are used to predict nested or flat named entities. We evaluated our system on eight named entity corpora. The results show that our system achieves SoTA on all of the eight corpora. We demonstrate that advanced structured prediction techniques lead to substantial improvements for both nested and flat NER.

Acknowledgments

This research was supported in part by the DALI project, ERC Grant 695662.

References

Alan Akbik, Tanja Bergmann, and Roland Vollgraf. 2019. Pooled contextualized embeddings for named entity recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 724–728, Minneapolis, Minnesota. Association for Computational Linguistics.

Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1638–1649, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Beatrice Alex, Barry Haddow, and Claire Grover. 2007. Recognising nested named entities in biomedical text. In Proc. of BioNLP, pages 65–72.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Jason PC Chiu and Eric Nichols. 2016. Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4:357–370.

Kevin Clark, Minh-Thang Luong, Christopher D. Manning, and Quoc Le. 2018. Semi-supervised sequence modeling with cross-view training. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1914–1925, Brussels, Belgium. Association for Computational Linguistics.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493–2537.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics.

Timothy Dozat and Christopher Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of 5th International Conference on Learning Representations (ICLR).

Jenny Rose Finkel and Christopher D. Manning. 2009. Nested named entity recognition. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 141–150, Singapore. Association for Computational Linguistics.

Joseph Fisher and Andreas Vlachos. 2019. Merge and label: A novel neural network architecture for nested NER. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5840–5850, Florence, Italy. Association for Computational Linguistics.

Meizhi Ju, Makoto Miwa, and Sophia Ananiadou. 2018. A neural layered model for nested named entity recognition. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1446–1459, New Orleans, Louisiana. Association for Computational Linguistics.

Ben Kantor and Amir Globerson. 2019. Coreference resolution with entity equalization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 673–677, Florence, Italy. Association for Computational Linguistics.

Arzoo Katiyar and Claire Cardie. 2018. Nested named entity recognition revisited. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 861–871, New Orleans, Louisiana. Association for Computational Linguistics.

J.-D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii. 2003. GENIA corpus—a semantically annotated corpus for bio-textmining. Bioinformatics, 19(suppl 1):i180–i182.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 260–270. Association for Computational Linguistics.

Hongyu Lin, Yaojie Lu, Xianpei Han, and Le Sun. 2019. Sequence-to-nuggets: Nested entity mention detection via anchor-region networks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5182–5192, Florence, Italy. Association for Computational Linguistics.

Wei Lu and Dan Roth. 2015. Joint mention extraction and classification with mention hypergraphs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 857–867, Lisbon, Portugal. Association for Computational Linguistics.

Yi Luan, Dave Wadden, Luheng He, Amy Shah, Mari Ostendorf, and Hannaneh Hajishirzi. 2019. A general framework for information extraction using dynamic span graphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3036–3046, Minneapolis, Minnesota. Association for Computational Linguistics.

Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1064–1074, Berlin, Germany. Association for Computational Linguistics.

Aldrian Obaja Muis and Wei Lu. 2017. Labeling gaps between words: Recognizing overlapping mentions with mention separators. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2608–2618, Copenhagen, Denmark. Association for Computational Linguistics.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke S. Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Annual Conference of the North American Chapter of the Association for Computational Linguistics.

Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. 2012. CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In Proceedings of the Sixteenth Conference on Computational Natural Language Learning (CoNLL 2012), Jeju, Korea.

Dan Shen, Jie Zhang, Guodong Zhou, Jian Su, and Chew-Lim Tan. 2003. Effective adaptation of a Hidden Markov Model-based Named Entity Recognizer for the biomedical domain. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine.

Mohammad Golam Sohrab and Makoto Miwa. 2018. Deep exhaustive model for nested named entity recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2843–2849, Brussels, Belgium. Association for Computational Linguistics.

Jana Straková, Milan Straka, and Jan Hajič. 2019. Neural architectures for nested NER through linearization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5326–5331, Florence, Italy. Association for Computational Linguistics.

Emma Strubell, Patrick Verga, David Belanger, and Andrew McCallum. 2017. Fast and accurate entity recognition with iterated dilated convolutions. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2670–2680, Copenhagen, Denmark. Association for Computational Linguistics.

Erik F. Tjong Kim Sang. 2002. Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002).

Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pages 142–147.

Bailin Wang and Wei Lu. 2018. Neural segmental hypergraphs for overlapping mention recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 204–214, Brussels, Belgium. Association for Computational Linguistics.

Bailin Wang, Wei Lu, Yu Wang, and Hongxia Jin. 2018. A neural transition-based model for nested mention recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1011–1017, Brussels, Belgium. Association for Computational Linguistics.

Changmeng Zheng, Yi Cai, Jingyun Xu, Ho-fung Leung, and Guandong Xu. 2019. A boundary-aware neural model for nested named entity recognition. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 357–366, Hong Kong, China. Association for Computational Linguistics.
