Creating patterns with PhraseMatcher
When processing financial, medical, or legal text, we often have long lists or dictionaries of terms, and we want to scan the text for them. As we saw in the previous section, Matcher patterns are handcrafted: we coded each token individually. That works for a handful of patterns, but with a long list of phrases Matcher is not very handy, because writing out every term token by token is impractical.
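To make the contrast concrete, here is a minimal Matcher sketch. It assumes spaCy v3 and uses a blank English pipeline (only the tokenizer is needed); the medical terms and the MEDICAL_TERM label are hypothetical. Every multi-word term needs its own list of per-token dicts:

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough: Matcher only needs tokens here
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Each term is spelled out token by token, one dict per token
# (hypothetical terms for illustration)
matcher.add("MEDICAL_TERM", [
    [{"LOWER": "heart"}, {"LOWER": "attack"}],
    [{"LOWER": "high"}, {"LOWER": "blood"}, {"LOWER": "pressure"}],
])

doc = nlp("The patient had high blood pressure before the heart attack.")
# matcher(doc) returns (match_id, start, end) tuples; sort by start position
matches = [doc[start:end].text for _, start, end in sorted(matcher(doc), key=lambda m: m[1])]
print(matches)  # ['high blood pressure', 'heart attack']
```

Two terms are manageable this way, but the pattern list grows by one hand-written token sequence for every entry in the dictionary.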
spaCy offers a solution for matching text against long dictionaries: the PhraseMatcher class. Let's get started with a basic example of using PhraseMatcher to match terms defined in a list:
- Import the library and the class, and instantiate the nlp pipeline as usual:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
```

- Now we can instantiate the PhraseMatcher object and call nlp.make_doc() on the terms one by one to create patterns:

```python
matcher = PhraseMatcher...
```
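For reference, here is a self-contained PhraseMatcher sketch along the lines of these steps. It assumes spaCy v3 (where PhraseMatcher.add takes a key and a list of Doc patterns) and swaps in a blank English pipeline for en_core_web_sm so it runs without a downloaded model; the term list and the MEDICAL_TERM label are hypothetical:

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank English pipeline stands in for en_core_web_sm;
# building phrase patterns only requires the tokenizer
nlp = spacy.blank("en")

# Hypothetical term list; in practice this could hold thousands of entries
terms = ["heart attack", "acetaminophen", "blood pressure"]

matcher = PhraseMatcher(nlp.vocab)
# nlp.make_doc() only tokenizes, which is much cheaper than running a full pipeline
patterns = [nlp.make_doc(term) for term in terms]
matcher.add("MEDICAL_TERM", patterns)

doc = nlp("After the heart attack, the doctor prescribed acetaminophen and monitored her blood pressure.")
# Each match is a (match_id, start, end) tuple; sort by start position
found = [doc[start:end].text for _, start, end in sorted(matcher(doc), key=lambda m: m[1])]
print(found)  # ['heart attack', 'acetaminophen', 'blood pressure']
```

Unlike the Matcher version, adding a new term means appending one string to the list; the token-level pattern is derived from the tokenizer automatically. By default PhraseMatcher compares the ORTH attribute, so this sketch matches case-sensitively.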