Extracting named entities with SpanRuler
In many NLP applications, including semantic parsing, we start looking for meaning in a text by identifying the entities it mentions and their types. For that reason, an entity extraction component is usually one of the first components we place in an NLP pipeline. Named entities play a key role in understanding the meaning of user text.
We’ll start our semantic parsing pipeline the same way, by extracting the named entities from our corpus. To understand what sort of entities we want to extract, we’ll first get to know the ATIS dataset.
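Before turning to the dataset, here is a minimal sketch of what rule-based entity extraction with spaCy's built-in span_ruler component can look like. It assumes spaCy 3.3 or later is installed. The CITY label, the two patterns, and the sample sentence are invented for illustration only; they are not taken from the ATIS corpus or from the pipeline built later in the chapter.

```python
import spacy

# A blank English pipeline: tokenizer only, so no trained model is needed.
nlp = spacy.blank("en")

# Add the built-in SpanRuler component (assumes spaCy >= 3.3).
# With its default settings, matches are stored in doc.spans["ruler"].
ruler = nlp.add_pipe("span_ruler")

# Hypothetical patterns and label, chosen only for this illustration.
patterns = [
    # A string pattern matches the exact phrase.
    {"label": "CITY", "pattern": "Boston"},
    # A token pattern matches "new york" regardless of casing.
    {"label": "CITY", "pattern": [{"LOWER": "new"}, {"LOWER": "york"}]},
]
ruler.add_patterns(patterns)

# Made-up utterance in the style of a flight-booking request.
doc = nlp("show me flights from Boston to New York")

# Collect the text and label of every span the ruler found.
matches = [(span.text, span.label_) for span in doc.spans["ruler"]]
print(matches)
```

Because the component is driven by patterns rather than by a trained model, the entity types it can find are exactly the ones we write rules for, which is why it helps to study the corpus before deciding on the patterns.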
Getting to know the ATIS dataset
Throughout this chapter, we’ll work with the ATIS (Airline Travel Information Systems) corpus. ATIS is a well-known dataset and one of the standard benchmarks for intent classification. It consists of utterances from customers who want to book a flight or get information about flights, including flight costs, destinations, and timetables.
No matter what the NLP task is, you should always go over your corpus with the naked eye. We want...