Installing spaCy’s language models
A spaCy installation doesn’t include the statistical language models that the spaCy pipeline tasks require. A spaCy language model contains knowledge about a specific language, collected from a set of resources. Language models let us perform a variety of NLP tasks, including part-of-speech (POS) tagging and named entity recognition (NER).
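Because models are installed separately from spaCy itself, it can be useful to check for one before trying to load it. The following is a minimal sketch, assuming that a downloaded model is installed as an importable Python package named after the model; the helper names model_is_installed and install_hint are hypothetical and not part of spaCy.

```python
import importlib.util


def model_is_installed(model_name: str) -> bool:
    """Return True if a package named `model_name` can be found.

    Assumption: a downloaded spaCy model is installed as a regular
    Python package whose name equals the model name.
    """
    return importlib.util.find_spec(model_name) is not None


def install_hint(model_name: str) -> str:
    """Build the spaCy CLI command that downloads the given model."""
    return f"python -m spacy download {model_name}"


if __name__ == "__main__":
    name = "en_core_web_sm"
    if model_is_installed(name):
        print(f"{name} is installed and can be loaded with spacy.load('{name}').")
    else:
        print(f"{name} is missing. Install it with: {install_hint(name)}")
```

The check uses only the standard library, so it runs even when spaCy is not installed yet.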
Each language has its own models, and several models may be available for the same language. Models are named according to the convention [lang]_[name], where the [name] part usually encodes the model's capabilities, the genre of its training text, and its size. For example, the pt_core_news_sm model is a small Portuguese pipeline trained on news text. Large models can require a lot of disk space: for example, en_core_web_lg takes up 382 MB, while en_core_web_md needs 31 MB and en_core_web_sm only 12 MB.
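The naming convention can be illustrated by splitting a model name into its parts. This is only a sketch: it assumes that the [name] part always has exactly three underscore-separated components (capabilities, genre, size), which holds for names such as en_core_web_sm but may not hold for every model. The ModelName type and the parse_model_name function are hypothetical helpers, not part of spaCy.

```python
from typing import NamedTuple

# Common size suffixes and their meaning (illustrative, not exhaustive).
SIZE_LABELS = {"sm": "small", "md": "medium", "lg": "large"}


class ModelName(NamedTuple):
    lang: str          # language code, e.g. "en"
    capabilities: str  # e.g. "core"
    genre: str         # type of training text, e.g. "web"
    size: str          # size suffix, e.g. "sm"


def parse_model_name(model_name: str) -> ModelName:
    """Split a model name of the form [lang]_[capabilities]_[genre]_[size]."""
    parts = model_name.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected model name format: {model_name!r}")
    return ModelName(*parts)


if __name__ == "__main__":
    parsed = parse_model_name("en_core_web_sm")
    label = SIZE_LABELS.get(parsed.size, parsed.size)
    print(f"{parsed.lang}: {label} {parsed.capabilities} pipeline, genre {parsed.genre}")
```

For en_core_web_sm, the parsed fields are lang "en", capabilities "core", genre "web", and size "sm".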
It is a good practice to match the model...