Training an EntityLinker component with spaCy
The first step in training the model is to create the knowledge base (KB). We want to build a pipeline that detects whether a mention of Taylor refers to Taylor Swift (singer), Taylor Lautner (actor), or Taylor Fritz (tennis player). Each of them has their own page and identifier on Wikidata, so we will use Wikidata as our KB source. To create the KB, we instantiate the InMemoryLookupKB class, passing it the shared Vocab object and the size of the embeddings that we'll use to encode the entities. Let’s create our KB:
- First, we will choose the Language object (en_core_web_md) and add a SpanRuler component to match all the taylor mentions (this will be used to create the corpus):

      import spacy
      nlp = spacy.load("en_core_web_md")
      ruler = nlp.add_pipe("span_ruler", after="ner")
      patterns = [{"label": "PERSON", "pattern": [{"LOWER": "taylor"}...
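
To make the KB construction described at the start of this section concrete, here is a minimal sketch of instantiating InMemoryLookupKB with the shared Vocab and an embedding size, then registering three candidate entities under one alias. Several things here are assumptions and not part of the original walkthrough: a blank English pipeline stands in for en_core_web_md so the sketch runs without a model download, the entity identifiers are hypothetical placeholders (swap in the real Wikidata QIDs), the entity vectors are zero-filled dummies, and the frequencies and prior probabilities are made up for illustration.

```python
import spacy
from spacy.kb import InMemoryLookupKB

# Assumption: a blank pipeline replaces en_core_web_md to keep the sketch
# lightweight; the KB only needs the pipeline's shared Vocab.
nlp = spacy.blank("en")

# Assumption: 300 is chosen to match the width of the en_core_web_md vectors.
ENTITY_VECTOR_LENGTH = 300

kb = InMemoryLookupKB(vocab=nlp.vocab, entity_vector_length=ENTITY_VECTOR_LENGTH)

# Hypothetical placeholder identifiers; replace them with real Wikidata QIDs.
entities = {
    "Q_TAYLOR_SWIFT": "singer",
    "Q_TAYLOR_LAUTNER": "actor",
    "Q_TAYLOR_FRITZ": "tennis player",
}

# Register each entity with a dummy frequency and a zero-filled vector.
# In a real setup the vector would encode the entity's description.
for entity_id in entities:
    kb.add_entity(
        entity=entity_id,
        freq=1,
        entity_vector=[0.0] * ENTITY_VECTOR_LENGTH,
    )

# One alias maps to all three candidates. The priors are illustrative only
# and must not sum to more than 1.
kb.add_alias(
    alias="Taylor",
    entities=list(entities),
    probabilities=[0.4, 0.3, 0.3],
)

print(kb.get_size_entities(), kb.get_size_aliases())
```

Calling kb.get_alias_candidates("Taylor") afterwards returns one candidate per registered entity, which is the set the EntityLinker later has to choose from for each matched mention.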