Understanding the entity linking task
Entity linking is the task of identifying which entity a textual mention refers to and linking that mention to the corresponding entry in a knowledge base. For example, the mention Washington can refer to the person George Washington or to the US state. With entity linking (or entity resolution), our goal is to map each mention to the correct real-world entity. As spaCy’s documentation says, the EntityLinker architecture requires three main components:
- A knowledge base (KB) to store the unique identifiers, synonyms, and prior probabilities
- A candidate generation step to produce the likely identifiers
- A machine learning model to select the most likely ID from the list of candidates
In the KB, each textual mention (alias) is represented by a Candidate object, which may or may not resolve to a specific entity. Each candidate (alias, entity) pair is assigned a prior probability.
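To make these three components concrete, here is a minimal pure-Python sketch. It is not spaCy's API: the ToyKnowledgeBase class, its add_alias and get_candidates methods, and the link_by_prior function are hypothetical names introduced only for illustration. The identifiers are used as example IDs and the prior probabilities are invented values. The final step simply picks the candidate with the highest prior, which stands in for the machine learning model that would normally also use the surrounding context.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    """One (alias, entity) pair together with its prior probability."""
    alias: str
    entity_id: str
    prior_prob: float


@dataclass
class ToyKnowledgeBase:
    """Hypothetical stand-in for a KB: maps each alias to its candidates."""
    aliases: dict[str, list[Candidate]] = field(default_factory=dict)

    def add_alias(self, alias: str, entity_ids: list[str],
                  probabilities: list[float]) -> None:
        # One prior per entity, and the priors for an alias cannot exceed 1.
        if len(entity_ids) != len(probabilities):
            raise ValueError("need exactly one probability per entity ID")
        if sum(probabilities) > 1.0 + 1e-9:
            raise ValueError("prior probabilities for an alias must sum to <= 1")
        self.aliases[alias] = [
            Candidate(alias, entity_id, prob)
            for entity_id, prob in zip(entity_ids, probabilities)
        ]

    def get_candidates(self, alias: str) -> list[Candidate]:
        # Candidate generation: look up the likely identifiers for a mention.
        return self.aliases.get(alias, [])


def link_by_prior(kb: ToyKnowledgeBase, mention: str) -> str | None:
    """Pick the candidate with the highest prior (a stand-in for the model)."""
    candidates = kb.get_candidates(mention)
    if not candidates:
        return None  # the mention is not in the KB, so it stays unlinked
    return max(candidates, key=lambda c: c.prior_prob).entity_id


kb = ToyKnowledgeBase()
# Example IDs and made-up priors: "Q23" for the person, "Q1223" for the state.
kb.add_alias("Washington", ["Q23", "Q1223"], [0.3, 0.6])

best = link_by_prior(kb, "Washington")
unknown = link_by_prior(kb, "Atlantis")
```

Here best holds the ID with the larger prior, and unknown is None because the mention has no candidates. A prior-only choice always returns the same entity for a given alias, which is why a real linker also needs a model that reads the sentence context.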
In the spaCy EntityLinker architecture, first, we initialize a KB with...