Overview of spaCy conventions
Calling nlp on our text makes spaCy run a pipeline of processing steps over it. The first step is always tokenization, which produces a Doc object. Then, depending on the components we choose to add to the pipeline, the Doc can be further processed by components such as a tagger, a parser, and an entity recognizer. We call this a language processing pipeline. Each component receives the Doc, processes it, and returns it so that it can be passed on to the next component. This process is shown in the following diagram:
Figure 2.1 – A high-level view of the processing pipeline
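To make this component contract concrete, here is a minimal sketch of a custom pipeline component, assuming spaCy v3's `@Language.component` registration API. The component name `doc_length_logger` and the sample sentence are our own illustrations, not part of spaCy itself:

```python
import spacy
from spacy.language import Language

# A minimal custom component: it receives the Doc from the previous
# component, may inspect or modify it, and must return it so the next
# component in the pipeline can run.
@Language.component("doc_length_logger")  # hypothetical name, for illustration
def doc_length_logger(doc):
    print(f"Doc has {len(doc)} tokens")
    return doc

nlp = spacy.blank("en")                       # a bare pipeline: tokenizer only
nlp.add_pipe("doc_length_logger", last=True)  # append our component to the end
doc = nlp("Each component returns the processed Doc.")
```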
A spaCy pipeline object is created when we load a language model. In the following code segment, we load an English model and initialize a pipeline:
- First, we import spaCy and use `spacy.load` to return a `Language` class instance:

  ```python
  import spacy

  nlp = spacy.load("en_core_web_md")
  ```

- Now we can use this `Language` instance...
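For example, here is a brief sketch of inspecting and using the loaded pipeline. The component list shown in the comment is typical for en_core_web_md but varies by model version, and the sample sentence is our own:

```python
# List the pipeline components in execution order.
print(nlp.pipe_names)
# e.g. ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

# Run the full pipeline on a text and inspect the annotations it produced.
doc = nlp("spaCy ran every pipeline component on this sentence.")
print([(token.text, token.pos_) for token in doc])
```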