Creating a pipeline component using extension attributes
To create our component, we will use the @Language.factory decorator. A component factory is a callable that takes settings and returns a pipeline component function. The @Language.factory decorator also adds the name of the custom component to the registry, making it possible to use the .add_pipe() method to add the component to the pipeline.
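As a minimal sketch of how a factory is registered and then added by name, consider the example below. The component name "token_counter", the function create_token_counter, and the verbose setting are hypothetical and chosen only for illustration; a blank English pipeline is assumed so that no trained model needs to be downloaded.

```python
import spacy
from spacy.language import Language


# Register a factory under a (hypothetical) name. spaCy passes nlp and name
# automatically; the remaining arguments come from the config.
@Language.factory("token_counter", default_config={"verbose": False})
def create_token_counter(nlp, name, verbose):
    def token_counter(doc):
        # A pipeline component receives a Doc and must return it.
        if verbose:
            print(f"{name}: {len(doc)} tokens")
        return doc

    return token_counter


nlp = spacy.blank("en")
# Because the factory is registered, the component can be added by its name.
nlp.add_pipe("token_counter", config={"verbose": False})
doc = nlp("Book a flight")
```

After the call to add_pipe(), the name appears in nlp.pipe_names, and every text processed by nlp passes through the component.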
spaCy allows you to set custom attributes and methods on the Doc, Span, and Token objects; they become available through the Doc._, Span._, and Token._ namespaces. In our case, we will register an intent attribute on Doc, accessed as Doc._.intent, taking advantage of spaCy’s data structures to store our data.
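The following sketch shows one way to register and use such an extension attribute. The default value of None and the "bookFlight" label are assumptions made for this example, not values taken from the text.

```python
import spacy
from spacy.tokens import Doc

# Register the extension once; has_extension() guards against registering
# it a second time, which would raise an error.
if not Doc.has_extension("intent"):
    Doc.set_extension("intent", default=None)

nlp = spacy.blank("en")
doc = nlp("I want to book a flight")

# Custom attributes live under the "._" namespace of the object.
doc._.intent = "bookFlight"  # hypothetical intent label
print(doc._.intent)
```

Each Doc carries its own value, so a newly created Doc starts with the default until a component or your own code sets it.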
We will implement the component logic inside a Python class. spaCy expects the __init__() method to take the nlp and name arguments (spaCy fills them in automatically), and the __call__() method should receive a Doc and return it.
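To make this contract concrete, here is a generic skeleton of a class-based component. The LengthFlagger class, the "length_flagger" name, the is_long attribute, and the five-token threshold are all hypothetical and serve only to show the shape of __init__() and __call__().

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc


@Language.factory("length_flagger")
class LengthFlagger:
    def __init__(self, nlp, name):
        # nlp and name are supplied by spaCy when the component is created.
        self.name = name
        if not Doc.has_extension("is_long"):
            Doc.set_extension("is_long", default=False)

    def __call__(self, doc):
        # Store a result on the Doc, then return the same Doc.
        doc._.is_long = len(doc) > 5
        return doc


nlp = spacy.blank("en")
nlp.add_pipe("length_flagger")
doc = nlp("Find me a cheap flight to Berlin tomorrow")
print(doc._.is_long)
```

Decorating the class directly works because calling the class with nlp and name returns an instance, and that instance is itself callable on a Doc.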
Let’s create the IntentComponent class:
- First, we create the class. Inside the...