Creating patterns with SpanRuler
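spaCy’s SpanRuler component allows us to add spans to Doc.spans and/or Doc.ents using token-based rules or exact phrase matches. Doc.spans is a dictionary-like container of named span groups, while Doc.ents is the document’s sequence of entity spans.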
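As a minimal sketch of the two pattern styles, the snippet below registers one token-based rule and one exact phrase pattern on a blank English pipeline and leaves the component on its defaults. It assumes spaCy 3.3.1 or later (where the span_ruler factory is available) and the default span group key "ruler"; the sample sentence and the "Acme Bank" phrase are made up for illustration.

```python
import spacy

# A blank English pipeline is enough here: rule-based matching only needs the tokenizer.
nlp = spacy.blank("en")

# With the default config, matches are written to doc.spans["ruler"], not to doc.ents.
ruler = nlp.add_pipe("span_ruler")

patterns = [
    # Token-based rule: any single token whose lowercase form is "chime".
    {"label": "ORG", "pattern": [{"LOWER": "chime"}]},
    # Exact phrase match (illustrative phrase, not from the original text).
    {"label": "ORG", "pattern": "Acme Bank"},
]
ruler.add_patterns(patterns)

doc = nlp("I opened an account with Chime after leaving Acme Bank.")

# Collect the matched spans from the default span group.
found = [(span.text, span.label_) for span in doc.spans["ruler"]]
print(found)
print(doc.ents)
```

Because annotate_ents is left at its default here, doc.ents stays empty and the matches are only visible under doc.spans["ruler"].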
SpanRuler is not a matcher; it’s a pipeline component that we can add to our pipeline via nlp.add_pipe. When it finds a match, the match is appended to doc.ents or doc.spans. If adding to doc.ents, ent_type will be the label we pass in the pattern. Let’s see it in action:
- First, define a pattern for the entity:
    patterns = [{"label": "ORG", "pattern": [{"LOWER": "chime"}]}]
- Now, we add the SpanRuler component. By default, it adds the spans to doc.spans, and we want to add them to doc.ents, so we specify that in the config:
    span_ruler = nlp.add_pipe("span_ruler", config={"annotate_ents": True})
- Now, we can add the...